Re: Experiments with large N

Huge samples are very nice if you can get 'em, though such is not always the case, alas.

So one thing I would like to see from people who do have a gigantic N is an analysis of the point at which the data reach an asymptote. In other words, if you've collected data from 1,000,000 people, at what earlier point in your sampling could you have stopped and still come to identical conclusions with valid statistics?

Obviously, the answer to this question will be different for different types of studies with different types of variance and so forth. But having the large N allows one to perform this calculation, so that next time one does a similar study, one could reasonably stop after reaching a smaller and more manageable sample size.
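
To make the idea concrete, here is a rough sketch of one way such a calculation might go. It assumes the conclusion of interest is simply the sample mean, and that "identical conclusions" means a subsample mean landing within some tolerance of the full-sample mean. The function name smallest_stable_n and its parameters (tol, n_resamples, agree) are made up for illustration, and the data in the usage lines are simulated, not taken from any of the studies mentioned. A real study would substitute its own statistic and its own criterion.

```python
import random
import statistics


def smallest_stable_n(data, sizes, tol, n_resamples=200, agree=0.95, seed=0):
    """Return the smallest n in `sizes` at which subsample means match the full mean.

    A subsample "matches" when its mean is within `tol` of the full-sample mean.
    The size n is accepted when at least a fraction `agree` of `n_resamples`
    random subsamples (drawn without replacement) match. Returns None if no
    candidate size qualifies. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    full_mean = statistics.fmean(data)
    for n in sorted(sizes):
        if n > len(data):
            break  # cannot draw more observations than were collected
        hits = 0
        for _ in range(n_resamples):
            subsample = rng.sample(data, n)
            if abs(statistics.fmean(subsample) - full_mean) <= tol:
                hits += 1
        if hits / n_resamples >= agree:
            return n
    return None


if __name__ == "__main__":
    # Simulated stand-in for a large data set (an assumption, not real ratings).
    sim = random.Random(1)
    ratings = [sim.gauss(0.0, 1.0) for _ in range(20000)]
    sizes = [10, 50, 100, 200, 500, 1000, 2000]
    print(smallest_stable_n(ratings, sizes, tol=0.1))
```

The answer depends heavily on the tolerance and on the variance of the measure, which is exactly why it would be useful to see it worked out on the real data sets.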

Has anybody already done this for those large samples that were recently discussed? It would be really helpful for those who cannot always collect such samples.



Malcolm Slaney wrote:
This music paper has 380k subjects :-)

Ben Marlin collected another 30k subjects for this music-recommendation study.

The underlying data for both papers is available for academic researchers (fully anonymized, both by song and by user). Send me email if you want more information.

- Malcolm

On Dec 1, 2007, at 5:43 PM, Matt Wright wrote:

Trevor Cox recently published the results of an online experiment about listeners' ratings of sound files on a six-point scale ("not horrible", "bad", "really bad", "awful", "really awful", and "horrible"). To date he has 130,000 subjects (!) and about 1.5 million data points:


Here's the website for his experiment: http://www.sound101.org

Clearly this is related to the "effect of visual stimuli on the horribleness of awful sounds" that Kelly Fitz pointed out.


On Jun 29, 2007, at 12:32 AM, Massimo Grassi wrote:
So far it looks like the experiment with the largest N (513!) is "The role of contrasting temporal amplitude patterns in the perception of speech" (Healy and Warren, JASA), but I haven't yet checked the methodology to see whether it is a between- or a within-subject design.