Matt Wright mentions a "novel online experimental paradigm" of Honing, H. (2006). Online listening tests are not so novel: I have been using a very similar method since 1999. However, what concerns me about many of the growing number of online tests is their apparent disregard for conventional testing methodology, in particular control of the listener's environment.
In any listening test it is vital that all listeners are presented with the same stimuli. Given the huge variation in computer speakers and acoustic environments into which computers speak, the only method I have developed thus far is to insist upon the use of good quality headphones. Of course there will be some variation, but this variation will be far less than if no such stipulation is made, and participants take part using anything from laptop speakers to formal listening rooms.
A case can be made that for some experiments, involving rhythm for example, this variation is not significant. However, for experiments such as Honing's, where listeners are asked to concentrate on certain timbral aspects, the lack of information about listening environments calls into question any conclusions drawn from the data. A quick look at the test interface suggests that this information is lacking because nothing was gathered on listeners' audio equipment or environments.
If results from an online test are to be taken seriously, and the results published in serious journals, it is vital that all participants receive the same stimuli. To this end I strongly recommend insisting upon good quality headphones as a delivery method. By asking participants what headphones they use, you can verify that they are not deliberately ignoring instructions and using speakers, and also collect data to answer any future studies which may call into question certain types of headphones for this purpose.
We have formally examined the difference between taking a listening test in a controlled environment, and taking the same test in an uncontrolled environment (using headphones, a web interface and strict guidelines). Our conclusions were that there was almost no difference in the results, with a slight increase in self-indicated listener confidence in their ratings for the uncontrolled situation. This suggests that for listening tests in which listener stress may play an important factor, taking part in an uncontrolled but carefully stipulated environment might actually improve subject performance.
Given the many unknowns about online listening tests, it is also vital that they are preceded by at least a small-scale offline test. The results can then be compared as they come in, and any significant differences rapidly identified, hopefully before too much wasted effort by online participants.
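One way such a running comparison might be sketched is with a simple permutation test on the difference in mean ratings between the offline pilot and the online responses received so far. This is only an illustration, not necessarily the analysis used in the study described above. The function name, the choice of test statistic and the example ratings are all assumptions made for the sketch.

```python
import random
from statistics import mean


def permutation_test(offline, online, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference in mean ratings.

    Returns an estimated p-value: the proportion of random relabellings
    of the pooled ratings whose absolute mean difference is at least as
    large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(online) - mean(offline))
    pooled = list(offline) + list(online)
    n_off = len(offline)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign ratings to the "offline" and "online" groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_off:]) - mean(pooled[:n_off]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Made-up ratings purely for illustration; not data from any study.
    offline_pilot = [4, 5, 3, 4, 4, 5, 3, 4]
    online_so_far = [4, 4, 3, 5, 4, 3, 4, 5, 4, 4]
    p = permutation_test(offline_pilot, online_so_far)
    print(f"Estimated p-value for online vs offline difference: {p:.3f}")
```

Re-running the comparison as online responses accumulate would flag a large discrepancy early. Repeated checking inflates the chance of a false alarm, so a small p-value is a prompt to investigate rather than a final verdict.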
There are various papers available about online psychological experiments, but as far as I know our research is the only study which attempts to quantify the difference between online and offline testing for listening experiments. I have therefore just made our poster on the subject (from an ASA meeting last year) available on my website - there are a couple of useful references. See http://www.disley.org/mustech/ASA-remote-poster.pdf - obviously being a poster, there is no great detail, but I hope it is useful. If anyone would find a paper on the same subject useful, please suggest an appropriate journal to me.
Returning to the question of large N: this particular experiment had only 23 participants. The wider experiment had about 100, although for language reasons we used the data from only 59 of them. My research has also suggested that, for many of the psychoacoustic topics I investigate, the nationality and native language of listeners are extremely important, with significant differences in understanding of timbral adjectives between, for example, UK and US English speakers. But that's another topic entirely!
(Dr Alastair Disley, Audio Lab, University of York, UK)