
Re: [AUDITORY] Question: same/different judgments across domains.



I agree with the points made so far. I had already drafted this message, as "some general points I think are compatible", before I read Mattson's message. I'm sending it anyway, because one or two points have not yet been made, and while others are obvious or have now been said, it may be helpful to have them in one place. I can supply references for most of these points if you would like them, but much of this is easily found in literature that may be of more relevance to you.

- Listening strategies are unavoidable, so even if you try to produce an unbiased initial situation, participants are likely to develop a strategy during the experiment that is tuned to the particular stimuli (including their range of variation) and task. The strategy may or may not vary significantly between individuals, depending on stimulus construction and presentation.
- What do you want to generalise your results to? Responses to short sounds heard out of context may not generalise to responses to longer sounds, and the same sound can be interpreted very differently in different contexts. Ideally, both your presentation context and the sound stimuli themselves should reflect the situations you want your experiment to be relevant to.
- Another consideration is the definition of your categories. Is it the domains (e.g. speech, music, natural environment) you are interested in, or detection of different timbres? If it's the domains, then it would seem reasonable to let awareness of the domain be part of the experiment, since expectations tend to drive perception of ambiguous stimuli. But it sounds as though timbre rather than domain may be the point. If so, note that timbre can change within a 250-ms excerpt in all three domains mentioned, so it is worth deciding whether you want natural dynamic variation or not (and perhaps whether to use stimuli long enough for those functional categories to be meaningful).
- Do you care about thresholds, or about what people normally do above threshold? Either way, exactly where and how in a sound chunk a particular change occurs is sometimes critical, and sometimes of no apparent importance at all. This is perhaps particularly true for speech, for example for f0 contours vis-à-vis the syllable structures that carry them, and for the perceived function of the utterance (which would normally require it to be heard in context). And though it sounds as though you are probably planning a psychoacoustic experiment, the fact that you've asked the question suggests you might in time want a functionally meaningful task, perhaps in addition to a 4IAX task (a sketch of a 4IAX trial follows this list). If you do, this could affect stimulus choice now.
- Stimulus variation can strongly affect responses, presumably by helping or hindering attention to particular acoustic properties. You can assess this by blocking stimulus presentations, so that listeners hear only one type of stimulus in a block, or by presenting the full range within each block (the second sketch after this list generates both kinds of order). With subtle differences, you will likely get different results.
- While stimuli of constant duration look nicely controlled, and can be best for some experiments (probably including discrimination and threshold tasks), it is worth considering the amount of information conveyed within a stimulus. Generalising across genres, musical notes are typically rather slower than spontaneous conversational speech: at a normal speech rate, 250 ms can (but does not always) span more than one syllable and often more than one word, while fast music can fit several notes into 250 ms, although in much music single notes are longer than that. (There is no 1:1 relation between phonemes, syllables, words, notes and phrases.)
- In longer stretches of sound, temporal properties (e.g. the amplitude envelope, and factors that affect rhythm and metre) strongly affect perceptual responses, and how listeners hear them is culturally sensitive. (Again, this bears on generalisation to real life.)
- Relatedly, and following on from Bob's email, speech, music and environmental sounds can and typically do include harmonic, inharmonic and aperiodic sounds, and of course silence too (albeit often in different proportions). For speech it is easy to stick to stimuli that have an f0, but generalisation to normal conditions may then be somewhat limited; I'd predict, but don't know, the same for environmental sounds. (The third sketch after this list is a crude way to screen excerpts for periodicity.)
- Finally, where do singing and rap fit in?
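
To make the 4IAX point concrete, here is a minimal sketch of one trial, in Python. It is illustrative only: a and b stand for whatever stimulus objects you use, and nothing here is tied to any particular presentation software.

    import random

    def make_4iax_trial(a, b):
        # One pair is physically identical, the other contains the
        # change; the listener reports which pair differed, so every
        # response is objectively right or wrong.
        same_pair = (a, a)
        diff_pair = (a, b) if random.random() < 0.5 else (b, a)
        if random.random() < 0.5:
            return {"intervals": same_pair + diff_pair, "correct_pair": 2}
        return {"intervals": diff_pair + same_pair, "correct_pair": 1}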
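
For the blocking point, a sketch that generates either a blocked or a fully mixed presentation order (the type labels and the seed are placeholders):

    import random

    def presentation_order(stimuli_by_type, blocked=True, seed=0):
        # stimuli_by_type maps a label (e.g. "speech") to a list of
        # stimuli. Blocked: one type per block, block order shuffled.
        # Mixed: everything interleaved in one shuffled list.
        rng = random.Random(seed)
        if blocked:
            order = []
            blocks = list(stimuli_by_type.values())
            rng.shuffle(blocks)
            for block in blocks:
                block = list(block)  # leave the caller's lists intact
                rng.shuffle(block)
                order.extend(block)
            return order
        mixed = [s for block in stimuli_by_type.values() for s in block]
        rng.shuffle(mixed)
        return mixed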
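
And for the f0 point, a crude periodicity screen based on the peak of the normalised autocorrelation. The frequency range and threshold are placeholders to tune on your own material, and a proper pitch tracker would do better:

    import numpy as np

    def has_clear_f0(x, sr, fmin=60.0, fmax=500.0, threshold=0.5):
        # Peak of the normalised autocorrelation within the plausible
        # f0 lag range; clearly periodic excerpts score near 1,
        # aperiodic (noisy) excerpts score low.
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        ac = ac / (ac[0] + 1e-12)  # normalise by lag-0 energy
        lo = int(sr / fmax)
        hi = min(int(sr / fmin), len(ac) - 1)
        return float(ac[lo:hi + 1].max()) > threshold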

Many of these issues cannot be resolved to produce perfectly controlled stimuli: you have to make (sometimes very tough) decisions about your focus and about what is practical, after which other decisions are likely to be influenced by the earlier ones. It is useful, though, to be aware that you are making those early decisions before the design is finalised!

I hope this helps, and good luck!

Sarah


On 09/05/2021 16:14, Mattson ogg wrote:
Hi Max,

I looked at this a bit in grad school, particularly with very brief sounds, though mostly focusing on onsets, because I was interested in getting at “when” listeners can recognize what they hear and subsequently engage any potentially different listening strategies (i.e., in the real world you more often hear and recognize a sound from what is basically its onset than by dropping in on the middle of an acoustic event).

Anyway, I think the thread raises some very good points - I'd just add that it depends what question you (they) are asking. I kept it fairly high level. At around 25 ms, listeners can only barely tell different sound classes apart, but I think by 250 ms you do have different listening strategies, and the same acoustic dimension can carry different kinds of information for different classes, so it depends on what you're interested in (e.g., pitch is quite variable within a given vowel and can cue different speakers or emotions; it often varies less within an instrument note and is not as useful for identifying instruments; and it is basically absent for many noisy environmental sounds). So, in my opinion, the trickier thing in limited time windows is controlling things so the comparisons are meaningful for your question, because in my experience there's always some compromise here, given how different those sound classes are. Speech, I think, is interesting and tricky here because it's particularly slippery: acoustically rich and variable from moment to moment.

Anyhow, since you asked for some recommendations, here are links to a few papers of mine that dig into this and could be helpful - all looking at slightly different questions with multiple sound classes on limited time scales. Perhaps there's a better way to treat some of these issues, but this general approach seemed like a fairly straightforward starting place to me:



(A follow-up to the two previous ones should be on some arXiv soonish? Whenever I can get around to it! heh)

On Sun, May 9, 2021 at 12:30 AM Jan Schnupp <000000e042a1ec30-dmarc-request@xxxxxxxxxxxxxxx> wrote:
Same/different judgments are always a bad idea. Unless the stimuli are actually identical, they are not the same, so the observer has to make some sort of "close enough" judgment, which always involves a bit of a fudge in their minds. It is much better to play three sounds and ask which was the odd one out, or two pairs and ask which pair was more different. In those cases you have a much less ambiguous way of declaring a response objectively correct or incorrect; there is no internal "close enough" criterion that may vary from subject to subject or from domain to domain. Playing with duration is tricky: certain categories of sounds have characteristic temporal envelopes, and if you make them "much shorter than they should be" then they are no longer good representatives of their domain or category.
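
To make that concrete, here is a minimal odd-one-out trial in Python (standard and deviant stand for any two stimuli; the point is that the correct answer is fixed by construction, not by the listener's criterion):

    import random

    def make_oddity_trial(standard, deviant):
        # Two copies of the standard plus one deviant, position drawn
        # at random; a response is correct iff it names that position,
        # with no internal "close enough" criterion involved.
        position = random.randrange(3)
        intervals = [standard, standard, standard]
        intervals[position] = deviant
        return {"intervals": intervals, "correct_position": position}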
Good luck with your experiment. 
Jan 


On Sat, May 8, 2021, 12:34 PM Max Henry <max.henry@xxxxxxxxxxxxxx> wrote:
Hi folks. Long time listener, first time caller...

Some friends of mine are setting up an experiment with same/different judgements between pairs of sounds. They want to test sounds from a variety of domains: speech, music, natural sounds, etc.

One of the researchers suggested that listeners will have different listening strategies depending on the domain, and that this might pose a problem for the experiment: our sensitivity to differences in pitch, for example, might be very acute for musical sounds but much less so for speech sounds.

I have a hunch that sufficiently short stimuli might sidestep the problem. That is, if I played you 250 milliseconds of speech, or 250 milliseconds of music, you would not necessarily use any particular domain-specific listening strategy to tell the difference. It would simply be “sound.”

I suspect this is because a sound that's sufficiently short can stay entirely in echoic memory. For longer sounds, you have to consolidate the information somehow, and how you consolidate it depends on the domain the sound falls into. For speech sounds, we can throw away the acute pitch information.
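
(For concreteness, this is roughly how I imagine cutting such excerpts - a Python sketch where all the durations are placeholders, with short raised-cosine ramps so the cut itself isn't an audible cue.)

    import numpy as np

    def excerpt(x, sr, start_s=0.0, dur_s=0.250, ramp_s=0.005):
        # Cut a short chunk of a mono signal and apply raised-cosine
        # onset/offset ramps so the edit doesn't add clicks.
        i0 = int(start_s * sr)
        seg = np.array(x[i0:i0 + int(dur_s * sr)], dtype=float)
        n = int(ramp_s * sr)
        ramp = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, n)))
        seg[:n] *= ramp
        seg[-n:] *= ramp[::-1]
        return seg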

But that’s just a hunch. I’m wondering if this rings true for any of you, that is to say, if it reminds you of any particular research. I’d love to read about it.

It's been a pleasure to follow these e-mails. I'm glad to finally have an excuse to write. Wishing you all well.

Max Henry (he/his)
Graduate Researcher and Teaching Assistant
Music Technology Area
McGill University.