
Okay, how's this: unless the author of the original blog post is deliberately deceiving the audience by putting a bunch of points on top of each other, one can tell there is no line that most of the points lie near.

I agree that a scatter plot is not the best for showing data with that many data points, but frankly it's kind of irrelevant. The point the author was making was just that it isn't that hard to cook up some data that is not highly correlated, but will be if you bin and average it.
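To make the bin-and-average effect concrete, here is a minimal sketch on synthetic data. The slope, noise level, sample size, and bin count are made-up assumptions, not the blog author's data; the only point is that the same weakly correlated points look almost perfectly correlated once each bin is reduced to its mean.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


random.seed(0)  # arbitrary seed, only for repeatability

# Hypothetical data: a weak linear signal buried in a lot of noise.
n = 10_000
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
ys = [0.3 * x + random.gauss(0.0, 3.0) for x in xs]

raw_r = pearson(xs, ys)

# Sort the points into 10 equal-width bins on x, then keep only the bin means.
bins = [[] for _ in range(10)]
for x, y in zip(xs, ys):
    bins[min(int(x), 9)].append((x, y))

x_means = [mean(p[0] for p in b) for b in bins]
y_means = [mean(p[1] for p in b) for b in bins]
binned_r = pearson(x_means, y_means)

print(f"raw correlation:    {raw_r:.2f}")
print(f"binned correlation: {binned_r:.2f}")
```

Averaging shrinks the noise inside each bin by roughly the square root of the bin size while leaving the trend between bins untouched, so the correlation of the bin means says very little about how scattered the individual points are.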



It's not a matter of being deliberate. He's just being lazy, using the standard Excel scatterplot. To fix that graph, he needs to carefully choose an opacity, which he didn't do.

See the discussion here: http://news.ycombinator.com/item?id=4027337

I stand by the claim made in my blog post. Don't use scatterplots; use a density plot instead.

Incidentally, according to a comment the author made, the correlation is actually 0.3. That's far better than his graph suggests.
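For anyone who wants to see what the density-plot suggestion buys you, here is a rough sketch using made-up Gaussian points (not the data from the post). The function name `density_grid`, the grid size, and the plotting range are all assumptions chosen for illustration. It counts points per grid cell and prints a crude text heat map; an opaque scatterplot would draw the same marker whether a cell holds one point or hundreds.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed, only for repeatability

# Hypothetical data: 20,000 points piled up around the origin.
points = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(20_000)]


def density_grid(points, lo, hi, cells):
    """Count how many points fall in each cell of a cells x cells grid."""
    width = (hi - lo) / cells
    counts = Counter()
    for x, y in points:
        if lo <= x < hi and lo <= y < hi:
            col = min(int((x - lo) / width), cells - 1)
            row = min(int((y - lo) / width), cells - 1)
            counts[(col, row)] += 1
    return counts


cells = 20
grid = density_grid(points, -4.0, 4.0, cells)
peak = max(grid.values())

# Darker characters mean more points in the cell.
shades = " .:-=+*#%@"
for row in range(cells - 1, -1, -1):
    print("".join(shades[grid[(col, row)] * 9 // peak] for col in range(cells)))

print("most crowded cell holds", peak, "points")
```

The per-cell counts are the information that overplotting throws away. A real density plot (or a scatterplot with a well-chosen opacity) maps those counts to color instead of characters.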


I agree he could have presented his data better. I also recognize that there is some correlation. Nonetheless, the story was about how averaging can be used to make a correlation appear stronger than it is, and going from 0.3 to 0.99 certainly meets that criterion. I was responding to the claim that the data might already have a strong correlation (presumably on the order of 0.99), and I was arguing that no "natural" strongly correlated data would have a scatter plot like that.

I might add that discussions like these illustrate a broader problem with the Hacker News community. Rather than discussing the facts of the story, many of the posts are instead about the less relevant detail of how the author chose to present them, despite the fact that the author's point is still effectively made (and if the 0.3 had been included there wouldn't have been any doubt).

BTW: tmoertel did some analysis on the data: http://news.ycombinator.com/item?id=5125851 and the correlation appears to be around 0.2, which is pretty weak.



