
Re: Granular synthesis and auditory segmentation



Dear List,
    The topic of granularity and segmentation relates to some experiments I
did in the early 1990s to find the reason for the high intelligibility of
differentiated clipped speech.  I discovered that fragments (grains) of
waveforms taken from intervals as small as the period between
unidirectional zero crossings (halfwaves) contain sufficient information to
characterize speech as well as other sounds.  From this discovery I
developed an algorithm that recognizes transitions in fragment shape and uses
them to locate phonetic segment boundaries.  To test the validity of the
approach, I compared the timbre and intelligibility of words and sentences
reconstructed from the fragment data with those of the original
utterances.  The reconstruction algorithm uses information derived from the
segment analyzer as follows: (1) from each segment select a halfwave
fragment, (2) label each segment's phonetic class, (3) extract each
segment's prosody (pitch and envelope information), and (4) label whether
it is voiced or unvoiced.  The intelligibility of the reconstructed waveforms
has generally been quite good, even for fricatives and whispers.  I concluded
that the method was on the right track.
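   To make the halfwave idea concrete, a rough Python sketch (using NumPy) of
the fragment extraction might look like the following.  It is only an
illustration of one reading of the description above, not the original APL
software; the segment analyzer itself (transition detection, phonetic
labeling, voicing decisions) is not shown, and the function and field names
are merely placeholders.

import numpy as np

def halfwave_fragments(signal):
    # Split a waveform into halfwaves: runs of samples between successive
    # zero crossings, so that every fragment lies entirely on one side of
    # zero.  Each crossing is unidirectional (up-going or down-going).
    s = np.asarray(signal, dtype=float)
    sign = np.sign(s)
    sign[sign == 0] = 1.0                    # treat exact zeros as positive
    crossings = np.where(np.diff(sign) != 0)[0] + 1
    bounds = np.concatenate(([0], crossings, [len(s)]))
    return [s[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

# Quick check on a decaying 220 Hz tone sampled at 16 kHz: each halfwave
# should run about 36 samples (half of 16000/220).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t) * np.exp(-3 * t)
fragments = halfwave_fragments(tone)
print(len(fragments), [len(f) for f in fragments[1:5]])

# Each analyzed segment would then carry the items listed above for the
# reconstruction step: a representative halfwave, a phonetic-class label,
# prosody (pitch and envelope), and a voiced/unvoiced flag.  The values here
# are placeholders; in the real analysis they come from the segment analyzer.
segment = {
    "halfwave": fragments[1],
    "phonetic_class": None,
    "pitch_hz": 220.0,
    "envelope": float(np.max(np.abs(fragments[1]))),
    "voiced": True,
}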
   The problem is that details of these experiments and results are
currently unpublished except for a poster paper I presented at the 1991
Whistler IEEE Workshop on speech coding applications.  The object of the
paper was to show that the segmentation algorithm could have applications in
speech coding, since the redundancy of granular phonetic information permits
very high compression of speech data.  However,
despite the promising results, I found that building a model suitable for
demonstrating a commercial speech coder was beyond my personal resources.
Nevertheless, I think I have shown that waveform fragments that are smaller
than the conventional grain size contain the timbre information from which
a variety of sounds may be synthesized.  In other words, you can bypass the
time-frequency limit by ignoring it.
   For anyone familiar with the APL language, I could send some software to
play with.  The best way to understand this is to hear it.

   Best wishes,

    John Bates


>Has anyone systematically explored the use of
>"granular synthesis" in manipulating auditory
>streaming and segregation?
>
>I'd like to see connections between Bregman's
>interesting ASA work and mesoscopic auditory
>textures.
>
>I expect such textures to be helpful in controlling
>auditory grouping and segregation, and since my own
>work maps visual textures to auditory textures in a
>way closely related to granular synthesis, there is
>an obvious connection that could be of practical
>interest.
>
>Best wishes,
>
>Peter Meijer
