Subject: musical tones in speech
From: Martin Braun <nombraun(at)POST.NETLINK.SE>
Date: Fri, 11 May 2001 15:31:38 +0200

The following information is essential when comparing Alain de Cheveigné's (see end of mail) and my results on "musical tones in speech". I put three questions to Bob Ladd, who is not a regular reader of the list but a leading expert in intonational phonology (see address below). He answered as follows:

1) Do "all glottal vibrations" in a section of speech have any relation to speech targets?

No, that's the whole point. Most of them are transitions. Perhaps the clearest explanation of this notion and its empirical consequences is in Chapter 2 of Pierrehumbert and Beckman's "Japanese Tone Structure" (MIT Press, 1988), esp. section 2.2.1.

2) Do "maxima or minima of contiguous voiced portions" (presumably extracted by software) have any relation to speech targets?

Well, this is closer to what Jacques Terken and I were looking at, but (a) if the software is stupid, it will be misled by local F0 perturbations (e.g. it will find the first glottal cycle after a voiceless stop as a local maximum), and (b) I'm puzzled by Alain's reference to "breath groups", since contiguous voiced portions the size of breath groups are bound to be interrupted by short stretches of voicelessness, unless of course his materials are carefully controlled segmentally. But yes, if you did some intelligent pre-processing of the laryngograph signal you could take the automatically extracted maxima and minima as a first approximation to targets (or certainly a first approximation to the kinds of targets Jacques and I were looking at).

3) Is a software extraction of speech targets at all possible today, or in the near future?

No. It's still in many ways a theoretical question, not an empirical one.

Bob Ladd's address is:

Bob Ladd
Dept. of Theoretical and Applied Linguistics
University of Edinburgh
Edinburgh EH8 9LL
bob(at)ling.ed.ac.uk
http://www.ling.ed.ac.uk/~bob

Concerning many issues related to these questions I can also recommend a book Bob published: D. Robert Ladd, Intonational Phonology, Cambridge University Press 1996.

Martin

Martin Braun
Neuroscience of Music
Gansbyn 14
S-671 95 Klässbol
Sweden
nombraun(at)post.netlink.se

Replying to my message from Tuesday, May 08, 2001, Alain de Cheveigné wrote on Thursday, May 10, 2001:

"I happen to be working with several databases of speech recorded with a laryngograph signal (which allows accurate F0 to be estimated). Together they contain 1.75 hours of speech of which half is voiced, pronounced by 38 speakers (19 male, 19 female) of Japanese (30), English (4) and French (4). The data have been carefully labeled with an accurate period estimation method with sub-sample resolution, and the estimates checked visually. A histogram of F0 values with 1/4 semitone bins shows no obvious structure related to the musical scale. A 4-bin histogram of values modulo one semitone is essentially flat. The remarkable statistics of "hand-marked end- and turning points of the contour" are apparently not reflected in raw F0 contours."

He added the same day, in reply to my question "At what points in a sentence did you extract f0?":

"At all points for which there was regular glottal vibration. By raw I mean that no "speech target" selection process was involved. I also tried doing statistics of maxima or minima of contiguous voiced portions (which roughly correspond to "breath groups") as a rough but plausible target selection process. No sign of a note-related structure in the distribution of values."
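
For readers who want to try the two statistics Alain describes, here is a minimal Python sketch. It is not his code, and it rests on assumptions that are not in his message: the semitone grid is anchored at A4 = 440 Hz, bin 0 is centered on the grid notes (rather than starting at them), and an F0 track is a plain list with None marking unvoiced frames. The function names are made up for illustration.

```python
import math

REF_HZ = 440.0  # assumption: equal-tempered grid anchored at A4 = 440 Hz


def modulo_semitone_histogram(f0_values, bins=4, ref_hz=REF_HZ):
    """Count F0 values by their position within a semitone.

    Each value is converted to semitones relative to ref_hz and reduced
    modulo one semitone. Bin 0 is centered on the grid notes (assumption),
    so a speaker who hit scale notes would pile up in bin 0.
    """
    counts = [0] * bins
    half_bin = 0.5 / bins
    for f0 in f0_values:
        semitones = 12.0 * math.log2(f0 / ref_hz)
        frac = (semitones + half_bin) % 1.0  # shift so bin 0 straddles the note
        idx = min(int(frac * bins), bins - 1)  # guard against frac rounding to 1.0
        counts[idx] += 1
    return counts


def voiced_run_extrema(track):
    """Return (min, max) F0 for each contiguous voiced run.

    track is a list of F0 values in Hz, with None for unvoiced frames.
    No smoothing is applied, so local perturbations (for example the first
    cycle after a voiceless stop) will be picked up as extrema.
    """
    extrema = []
    run = []
    for value in track:
        if value is None:
            if run:
                extrema.append((min(run), max(run)))
                run = []
        else:
            run.append(value)
    if run:
        extrema.append((min(run), max(run)))
    return extrema
```

A flat result from modulo_semitone_histogram corresponds to what Alain reports for raw F0 values. Feeding it the minima or maxima returned by voiced_run_extrema gives the rough "target selection" variant, with the caveats Bob Ladd raises in his answer to question 2.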