musical tones in speech
The following information is essential when comparing Alain de Cheveigné's
(see end of mail) and my results on "musical tones in speech".
I put three questions to Bob Ladd, who is not a regular reader of the list
but a leading expert in intonational phonology (see address below). He
answered as follows:
1) Do "all glottal vibrations" in a section of speech have any relation to
speech targets?
No, that's the whole point. Most of them are transitions. Perhaps the
clearest explanation of this notion and its empirical consequences is
in Chapter 2 of Pierrehumbert and Beckman's "Japanese Tone Structure"
(MIT Press, 1988), esp. section 2.2.1.
2) Do "maxima or minima of contiguous voiced portions" (presumably extracted
by software) have any relation to speech targets?
Well, this is closer to what Jacques Terken and I were looking at, but
(a) if the software is stupid, it will be misled by local F0
perturbations (e.g. it will find the first glottal cycle after a
voiceless stop as a local maximum), and (b) I'm puzzled by Alain's
reference to "breath groups", since contiguous voiced portions the size
of breath groups are bound to be interrupted by short stretches of
voicelessness, unless of course his materials are carefully controlled
segmentally. But yes, if you did some intelligent pre-processing of the
laryngograph signal you could take the automatically extracted maxima
and minima as a first approximation to targets (or certainly a first
approximation to the kinds of targets Jacques and I were looking at).
3) Is a software extraction of speech targets at all possible today, or in
the near future?
No. It's still in many ways a theoretical question, not an empirical one.
Bob Ladd's address is:
Dept. of Theoretical and Applied Linguistics
University of Edinburgh
Edinburgh EH8 9LL
Concerning many issues related to these questions I can also recommend a
book Bob published:
D. Robert Ladd, Intonational Phonology, Cambridge University Press 1996.
Neuroscience of Music
S-671 95 Klässbol
Replying to my message from Tuesday, May 08, 2001, Alain de Cheveigné wrote
on Thursday, May 10, 2001:
"I happen to be working with several databases of speech recorded with a
laryngograph signal (which allows accurate F0 to be estimated). Together
they contain 1.75 hours of speech of which half is voiced, pronounced by 38
speakers (19 male, 19 female) of Japanese (30), English (4) and French (4).
The data have been carefully labeled with an accurate period estimation
method with sub-sample resolution, and the estimates checked visually.
A histogram of F0 values with 1/4 semitone bins shows no obvious structure
related to the musical scale. A 4-bin histogram of values modulo one
semitone is essentially flat. The remarkable statistics of "hand-marked
end- and turning points of the contour" are apparently not reflected in raw
He added the same day, in reply to my question,
"At what points in a sentence did you extract f0 ?"
"At all points for which there was regular glottal vibration. By raw I mean
that no "speech target" selection process was involved.
I also tried doing statistics of maxima or minima of contiguous voiced
portions (which roughly correspond to "breath groups") as a rough but
plausible target selection process. No sign of a note-related structure in
the distribution of values."
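
For readers who want to try the two analyses Alain describes on their own
data, here is a minimal Python sketch. It is not his code. The 440 Hz
reference, the placement of the bin edges (bin 0 starts exactly on a note),
the use of None to mark unvoiced frames, and all function names are my
assumptions; none of them are stated in his message. It also does none of
the "intelligent pre-processing" Bob Ladd asks for, so the extrema it finds
can be misled by local F0 perturbations.

```python
import math

REF_HZ = 440.0  # assumed reference pitch (A4); the source does not name one


def semitones(f0_hz, ref_hz=REF_HZ):
    """F0 in equal-tempered semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def mod_semitone_histogram(f0_values_hz, bins_per_semitone=4):
    """Count F0 values by their position within a semitone.

    With bins_per_semitone=4 this gives 1/4-semitone bins folded modulo one
    semitone. A scale-related structure would show up as an uneven
    distribution; a flat one suggests none.
    """
    counts = [0] * bins_per_semitone
    for f0 in f0_values_hz:
        frac = semitones(f0) % 1.0  # fractional position within the semitone
        # Clamp in case rounding pushes frac up to exactly 1.0.
        idx = min(int(frac * bins_per_semitone), bins_per_semitone - 1)
        counts[idx] += 1
    return counts


def voiced_extrema(f0_track):
    """Return (min, max) of F0 for each contiguous voiced run.

    f0_track is a sequence of F0 values in Hz, with None for unvoiced frames
    (an assumed convention).
    """
    extrema, run = [], []
    for f0 in list(f0_track) + [None]:  # trailing None flushes the last run
        if f0 is not None:
            run.append(f0)
        elif run:
            extrema.append((min(run), max(run)))
            run = []
    return extrema
```

To repeat the second test, one would pass the minima or maxima returned by
voiced_extrema to mod_semitone_histogram and compare the result with the
histogram of all voiced frames.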