FW: modifying speech (Fatima Husain)


Subject: FW: modifying speech
From:    Fatima Husain  <fhusain(at)CNS.BU.EDU>
Date:    Thu, 1 Mar 2001 10:49:58 -0500

Dear List,

Due to requests, I am posting a summary of the responses I received. I am
not attaching the sound samples P. Belin sent, but if anyone needs them,
email me. Thanks to everyone who responded, including Jont Allen (no, I
wasn't aware of the abstract, but I will go read it).

--fatima

> 1]
> Date: Wed, 28 Feb 2001 08:07:39 -0800
> From: Aniruddh Patel <apatel(at)nsi.edu>
> To: Fatima Husain <fhusain(at)cns.bu.edu>
> Subject: Re: Modifying speech
>
> Hi Fatima,
>
> You could try spectrally rotated speech. See:
>
> Scott, S.K., Blank, C.C., Rosen, S., & Wise, R.J.S. (2000). Identification
> of a pathway for intelligible speech in the left temporal lobe. Brain,
> 123, 2400-2406.
>
> Regards,
> Ani
>
> 2]
> Date: Wed, 28 Feb 2001 16:35:05 -0000
> From: Franck Ramus <f.ramus(at)ucl.ac.uk>
> To: Fatima Husain <fhusain(at)cns.bu.edu>
> Subject: Re: Modifying speech
>
> Dear Fatima,
>
> Perhaps the resynthesis method I have used will be what you need. Listen
> to the stimuli on this page:
> http://www.ehess.fr/centres/lscp/persons/ramus/resynth/ecoute.htm
> Here, my point was to delexicalise the sentences and make them
> unintelligible, but of course you can do what you want (scramble some
> words, not others, etc.). The drawback of this method is that it requires
> precise phonetic labelling of the sentences, but this can be done more or
> less automatically with mbrola utilities (ask me if you want more details
> about this). The accompanying paper is:
> Ramus, F., & Mehler, J. (1999). Language identification with
> suprasegmental cues: A study based on speech resynthesis.
> Journal of the Acoustical Society of America, 105(1), 512-521.
> You can download it from my publications page:
> http://www.ehess.fr/centres/lscp/persons/ramus/pub.htm
>
> All the best,
>
> Franck Ramus
> Institute of Cognitive Neuroscience
> 17 Queen Square
> London WC1N 3AR
> GB
> tel: (+44) 20 7679 1138
> fax: (+44) 20 7813 2835
> f.ramus(at)ucl.ac.uk
>
> 3]
> Date: Wed, 28 Feb 2001 11:44:08 -0500
> From: Marc Joanisse <marcj(at)uwo.ca>
> To: Fatima Husain <fhusain(at)cns.bu.edu>, AUDITORY(at)lists.mcgill.ca
> Subject: Re: Modifying speech
>
> Fatima,
>
> Sophie Scott and colleagues have used spectrally rotated speech and
> vocoding for this purpose: Scott, S.K., Blank, C.C., Rosen, S., & Wise,
> R.J.S. (2000). Brain, 123, 2400-2406. The rotation technique is described
> in a paper by B. Blesser (1972), J. Speech and Hearing Research, 15, 5-41.
> It involves amplitude-modulating a speech waveform so that high and low
> frequencies are swapped. After lowpass filtering, the resulting signal is
> not recognizable as speech and is incomprehensible. The vocoding technique
> is described by Shannon et al. (1995), Science, 270, 303-304.
>
> One caveat is that rotated speech might not be ideal for tasks involving
> isolated words. When I tried using reversed speech as a control for a
> speech discrimination task, subjects reported using speech-like cues while
> doing the task. I'd think it would work much better when applied to longer
> passages, which is what Scott et al. used it for.
>
> A second thing to try is modified sinewave speech. Removing the first or
> second sinewave 'formant' from a 3-sinewave speech stimulus pretty much
> removes its intelligibility while simulating some - but not all - of the
> spectral and temporal characteristics of the original. I have been using
> this in my own imaging studies with some success. It's not perfect, since
> sinewave stimuli lack the full spectral characteristics of actual speech.
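[Editorial note: as a rough illustration of the two manipulations Marc
Joanisse mentions above, here is a numpy sketch of spectral rotation and
noise vocoding. This is not the Blesser (1972) or Shannon et al. (1995)
implementation: the rotation is done by mirroring FFT bins rather than by
ring modulation, brick-wall FFT filters stand in for proper filterbanks,
and the sample rate, band edges, channel count, and envelope cutoff are
assumed values.]

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Crude brick-wall bandpass via the FFT (keeps bins in [lo, hi] Hz)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def rotate_spectrum(x, fs, band_hz=4000.0):
    """Spectral rotation: energy at f moves to band_hz - f, so high and low
    frequencies within [0, band_hz] are swapped.  Blesser did this with
    amplitude (ring) modulation plus lowpass filtering; flipping FFT bins
    is a frequency-domain shortcut with the same effect on the spectrum."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    k = np.nonzero(f <= band_hz)[0]
    Y = np.zeros_like(X)
    Y[k] = X[k[::-1]]          # mirror the in-band bins
    return np.fft.irfft(Y, n=len(x))

def noise_vocode(x, fs, n_bands=4, lo=100.0, hi=4000.0, env_cut_hz=30.0,
                 seed=0):
    """Noise vocoding: divide the speech band into log-spaced channels,
    extract each channel's amplitude envelope (rectify + smooth), and use
    it to modulate noise filtered into the same channel."""
    edges = np.geomspace(lo, hi, n_bands + 1)
    noise = np.random.default_rng(seed).standard_normal(len(x))
    out = np.zeros(len(x))
    for b_lo, b_hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(x, fs, b_lo, b_hi)
        env = np.clip(fft_bandpass(np.abs(band), fs, 0.0, env_cut_hz),
                      0.0, None)
        out += env * fft_bandpass(noise, fs, b_lo, b_hi)
    return out
```

With four channels the vocoded output keeps the temporal envelope structure
of the original in each band but replaces the fine spectral detail with
noise; rotation preserves spectrotemporal complexity while destroying
intelligibility.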
> My instinct is that the importance of this depends on what areas of
> cortex you are interested in imaging. The idea comes from a paper by Mody
> et al. (1997), J. Exp. Child Psych., 64, 199-231, where they compared
> speech and nonspeech discrimination in children with dyslexia.
>
> Good luck,
>
> -Marc-
>
> 4]
> Date: Wed, 28 Feb 2001 12:36:59 -0500
> From: Pascal BELIN <pascal(at)BIC.MNI.MCGILL.CA>
> To: AUDITORY(at)lists.mcgill.ca
> Subject: Re: Modifying speech
>
> Dear Fatima and List,
>
> I can think of two control sounds that might 'be generated from normal
> speech, retain more or less of the spectral features of the normal
> speech, yet is not pseudo-word like'.
>
> One is the amplitude-modulated noise that has been used for a while now
> in Robert Zatorre's lab (and others): you simply modulate white noise by
> the amplitude of the speech signal. You come up with something that has
> a very similar amplitude waveform to the original signal, but not the
> spectral content. This is a very 'low-level' control, and it might not do
> the job of keeping some of the spectral features.
>
> So another possibility, which we used recently in a neuroimaging study of
> voice perception, is to use 'scrambled speech'. Here, the signal is
> transformed in Fourier space; then, for each window of the FFT, the phase
> and amplitude components are randomized (phase with phase and amplitude
> with amplitude), and an inverse FFT is performed. You end up with a
> signal which has the same energy as the original one, and a very similar
> waveform (depending on the size of the FFT window, a very important
> parameter). It is very similar to the scrambling used in the object
> recognition literature, and in fact the spectrogram of these scrambled
> stimuli looks like a visual scramble of the original spectrogram. Yet the
> spectral structure is also dramatically modified, though perhaps less
> than for the AM-noise.
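[Editorial note: the two controls Pascal Belin describes above can be
sketched along the following lines. This is an illustration built from his
description, not the code used in his lab; the envelope cutoff, window
size, and the exact randomization scheme (permuting magnitudes among bins
and phases among bins, independently) are assumptions.]

```python
import numpy as np

def fft_lowpass(x, fs, cut_hz):
    """Brick-wall lowpass via the FFT."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[f > cut_hz] = 0.0
    return np.fft.irfft(X, n=len(x))

def am_noise(x, fs, env_cut_hz=30.0, seed=0):
    """White noise modulated by the speech signal's amplitude envelope:
    a very similar amplitude waveform, none of the spectral content."""
    env = np.clip(fft_lowpass(np.abs(x), fs, env_cut_hz), 0.0, None)
    return np.random.default_rng(seed).standard_normal(len(x)) * env

def scramble(x, win=1024, seed=0):
    """Window-by-window FFT scrambling: within each window, shuffle the
    magnitudes among bins and the phases among bins, then inverse-FFT.
    Energy per window is (nearly) preserved, and the window size controls
    how closely the result tracks the original waveform."""
    rng = np.random.default_rng(seed)
    y = x.astype(float).copy()
    for start in range(0, len(x) - win + 1, win):
        X = np.fft.rfft(x[start:start + win])
        mag = rng.permutation(np.abs(X))
        ph = rng.permutation(np.angle(X))
        y[start:start + win] = np.fft.irfft(mag * np.exp(1j * ph), n=win)
    return y   # any tail shorter than one window is left unchanged
```

As Belin notes, the window size matters: short windows keep the scrambled
signal's coarse temporal structure close to the original, while long
windows smear it out.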
> Attached are a sample of speech and its scrambled version.
>
> Hope this helps.
>
> Pascal BELIN, PhD
> Neuropsychology/Cognitive Neuroscience Unit
> Montreal Neurological Institute
> McGill University, 3801 University Street
> Montreal, Quebec, Canada H3A2B4
> phone: (514) 398-8519 (8504)
> fax: (514) 398-1338
> http://www.zlab.mcgill.ca/
>
> <<scrambled.wav>> <<original.wav>>
>
> 5]
> Date: Wed, 28 Feb 2001 15:44:44 -0500
> From: Jont Allen <jba(at)research.att.com>
> To: Fatima Husain <fhusain(at)cns.bu.edu>,
>     AUDITORY mailing list <AUDITORY(at)lists.mcgill.ca>
> Subject: Re: Modifying speech
>
> Fatima Husain wrote:
>
> > Dear List,
> >
> > Sorry to barge into an interesting discussion, but -
> > My lab wants to image subjects listening to normal and modified speech.
> > We are trying to investigate semantic memory.
>
> If you accept that the phoneme is the lowest order of semantics, then
> that would suggest one might test using nonsense CV and CVC sounds. They
> can be identified as parts of words (subjects can think of words that
> start with a given CV, for example).
>
> Finally, have you seen the very interesting work of Cyma Van Petten?
> Look at her abstract, JASA page 2643, Vol. 108, No. 5, Pt. 2, Nov. 2000,
> Abstract 5aSC3 (Newport Beach, CA meeting, Friday, Dec. 8, 2000).
>
> --
> Jont B. Allen
> AT&T Labs-Research, Shannon Laboratory, E161
> 180 Park Ave., Florham Park, NJ 07932-0971
> 973/360-8545 voice, x7111 fax, http://www.research.att.com/~jba
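[Editorial note: Joanisse's modified sinewave-speech manipulation in
message 3 above (dropping one of three sinewave 'formants') can be
sketched as follows. The frequency tracks and amplitudes here are
hypothetical placeholders; real sinewave speech uses formant tracks
measured from an actual utterance.]

```python
import numpy as np

def sinewave_stimulus(tracks_hz, amps, fs, keep=None):
    """Sum of time-varying sinusoids, one per 'formant' track.
    tracks_hz: per-formant arrays of instantaneous frequency in Hz.
    keep: e.g. [True, False, True] omits the second 'formant', as in
    the manipulation Joanisse describes."""
    if keep is None:
        keep = [True] * len(tracks_hz)
    out = np.zeros(len(tracks_hz[0]))
    for track, amp, k in zip(tracks_hz, amps, keep):
        if k:
            phase = 2.0 * np.pi * np.cumsum(track) / fs  # integrate f -> phase
            out += amp * np.sin(phase)
    return out

# Hypothetical three-'formant' stimulus, 0.5 s at 16 kHz:
# F1 rises 400 -> 700 Hz, F2 falls 1800 -> 1200 Hz, F3 stays near 2500 Hz.
fs = 16000
n = fs // 2
tracks = [np.linspace(400.0, 700.0, n),
          np.linspace(1800.0, 1200.0, n),
          np.full(n, 2500.0)]
amps = [1.0, 0.7, 0.4]
full = sinewave_stimulus(tracks, amps, fs)
no_f2 = sinewave_stimulus(tracks, amps, fs, keep=[True, False, True])
```

Removing one track leaves the remaining sinusoids' spectrotemporal motion
intact while, per Joanisse's observation, largely abolishing
intelligibility.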


This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University