Dissertation announcement: Music-Listening Systems (Eric Scheirer)

Subject: Dissertation announcement: Music-Listening Systems
From:    Eric Scheirer  <eds(at)media.mit.edu>
Date:    Fri, 28 Apr 2000 12:58:42 -0400

Dear Auditory list,

I have recently completed my dissertation and submitted it to the Media Laboratory. A one-sentence description of the contents is: "Computational models of the auditory scene analysis of fully-complex musical signals, and their use in explaining human semantic judgments of music." I hope someone on the list finds it interesting and/or useful.

It may be downloaded from my web site at

    http://sound.media.mit.edu/~eds/thesis/

I am also pleased to mail paper copies (once they return from the bindery) to those who would like them, as long as my supply lasts. Please send your mailing address via (off-list) email.

Abstract and TOC follow.

Best to all,

-- Eric

+-----------------+
| Eric Scheirer   |A-7b5 D7b9|G-7 C7|Cb C-7b5 F7#9|Bb |B-7 E7|
|eds(at)media.mit.edu|  < http://sound.media.mit.edu/~eds >
| 617 253 1750    |A A/G# F#-7 F#-/E|Eb-7b5 D7b5|Db|C7b5 B7b5|Bb|
+-----------------+

Scheirer, E. D. (2000). Music-Listening Systems. Unpublished Ph.D. dissertation, MIT Media Laboratory, June 2000. 248 pp.

Abstract:

When human listeners are confronted with musical sounds, they rapidly and automatically orient themselves in the music. Even musically untrained listeners have an exceptional ability to make rapid judgments about music from very short examples, such as determining the music's style, performer, beat, complexity, and emotional impact. However, there are presently no theories of music perception that can explain this behavior, and it has proven very difficult to build computer music-analysis tools with similar capabilities.

This dissertation examines the psychoacoustic origins of the early stages of music listening in humans, using both experimental and computer-modeling approaches. The results of this research enable the construction of automatic machine-listening systems that can make human-like judgments about short musical stimuli.
New models are presented that explain the perception of musical tempo, the perceived segmentation of sound scenes into multiple auditory images, and the extraction of musical features from complex musical sounds. These models are implemented as signal-processing and pattern-recognition computer programs, using the principle of *understanding without separation*. Two experiments with human listeners study the rapid assignment of high-level judgments to musical stimuli, and it is demonstrated that many of the experimental results can be explained with a multiple-regression model on the extracted musical features.

From a theoretical standpoint, the thesis shows how theories of music perception can be grounded in a principled way upon psychoacoustic models in a computational-auditory-scene-analysis framework. Further, the perceptual theory presented is more relevant to everyday listeners and situations than are previous cognitive-structuralist approaches to music perception and cognition. From a practical standpoint, the various models form a set of computer signal-processing and pattern-recognition tools that can mimic human perceptual abilities on a variety of musical tasks, such as tapping along with the beat, parsing music into sections, making semantic judgments about musical examples, and estimating the similarity of two pieces of music.

---

Music-Listening Systems
Table of Contents

CHAPTER 1  Introduction
1.1. Organization

CHAPTER 2  Background
2.1. Psychoacoustics
2.1.1. Pitch theory and models
2.1.2. Computational auditory scene analysis
2.1.3. Spectral-temporal pattern analysis
2.2. Music Psychology
2.2.1. Pitch, melody, and tonality
2.2.2. Perception of chords: tonal consonance and tonal fusion
2.2.3. The perception of musical timbre
2.2.4. Music and emotion
2.2.5. Perception of musical structure
2.2.6. Epistemology/general perception of music
2.2.7. Musical experts and novices
2.3. Musical signal processing
2.3.1. Pitch-tracking
2.3.2. Automatic music transcription
2.3.3. Representations and connections to perception
2.3.4. Tempo and beat-tracking models
2.3.5. Audio classification
2.4. Recent cross-disciplinary approaches
2.5. Chapter summary

CHAPTER 3  Approach
3.1. Definitions
3.1.1. The auditory stimulus
3.1.2. Properties, attributes and features of the auditory stimulus
3.1.3. Mixtures of sounds
3.1.4. Attributes of mixtures
3.1.5. The perceived qualities of music
3.2. The musical surface
3.3. Representations and computer models in perception research
3.3.1. Representation and Music-AI
3.3.2. On components
3.4. Understanding without Separation
3.4.1. Bottom-up vs. Top-Down Processing
3.5. Chapter summary

CHAPTER 4  Musical Tempo
4.1. A Psychoacoustic Demonstration
4.2. Description of a Beat-tracking Model
4.2.1. Frequency analysis and envelope extraction
4.2.2. Resonators and tempo analysis
4.2.3. Phase determination
4.2.4. Comparison with autocorrelation methods
4.3. Implementation and Complexity
4.3.1. Program parameters
4.3.2. Behavior tuning
4.4. Validation
4.4.1. Qualitative performance
4.4.2. Validation Experiment
4.5. Discussion
4.5.1. Processing level
4.5.2. Prediction and Retrospection
4.5.3. Tempo vs. Rhythm
4.5.4. Comparison to other psychoacoustic models
4.6. Chapter summary

CHAPTER 5  Musical Scene Analysis
5.1. The dynamics of subband periodicity
5.2. Processing model
5.2.1. Frequency analysis and hair-cell modeling
5.2.2. Modulation analysis
5.2.3. Dynamic clustering analysis: goals
5.2.4. Dynamic clustering analysis: cluster model
5.2.5. Dynamic clustering analysis: time-series labeling
5.2.6. Dynamic cluster analysis: channel-image assignment
5.2.7. Limitations of this clustering model
5.2.8. Feature analysis
5.3. Model implementation
5.3.1. Implementation details
5.3.2. Summary of free parameters
5.4. Psychoacoustic tests
5.4.1. Grouping by common frequency modulation
5.4.2. The temporal coherence boundary
5.4.3. Alternating wideband and narrowband noise
5.4.4. Comodulation release from masking
5.5. General discussion
5.5.1. Complexity of the model
5.5.2. Comparison to other models
5.5.3. Comparison to auditory physiology
5.5.4. The role of attention
5.5.5. Evaluation of performance for complex sound scenes
5.6. Chapter summary and conclusions

CHAPTER 6  Musical Features
6.1. Signal representations of real music
6.2. Feature-based models of musical perceptions
6.3. Feature extraction
6.3.1. Features based on auditory image configuration
6.3.2. Tempo and beat features
6.3.3. Psychoacoustic features based on image segmentation
6.4. Feature interdependencies
6.5. Chapter summary

CHAPTER 7  Musical Perceptions
7.1. Semantic features of short musical stimuli
7.1.1. Overview of procedure
7.1.2. Subjects
7.1.3. Materials
7.1.4. Detailed procedure
7.1.5. Dependent measures
7.1.6. Results
7.2. Modeling semantic features
7.2.1. Modeling mean responses
7.2.2. Intersubject differences in model prediction
7.2.3. Comparison to other feature models
7.3. Experiment II: Perceived similarity of short musical stimuli
7.3.1. Overview of procedure
7.3.2. Subjects
7.3.3. Materials
7.3.4. Detailed procedure
7.3.5. Dependent measures
7.3.6. Results
7.4. Modeling perceived similarity
7.4.1. Predicting similarity from psychoacoustic features
7.4.2. Predicting similarity from semantic judgments
7.4.3. Individual differences
7.4.4. Multidimensional scaling
7.5. Experiment III: Effect of interface
7.6. General discussion
7.7. Applications
7.7.1. Music retrieval by example
7.7.2. Parsing music into sections
7.7.3. Classifying music by genre
7.8. Chapter summary

CHAPTER 8  Conclusion
8.1. Summary of results
8.2. Contributions
8.3. Future work
8.3.1. Applications of tempo-tracking
8.3.2. Applications of music-listening systems
8.3.3. Continued evaluation of image-formation model
8.3.4. Experimental methodology
8.3.5. Data modeling and individual differences
8.3.6. Integrating sensory and symbolic models

Appendix A: Musical Stimuli

Appendix B: Synthesis Code
B.1. McAdams oboe
B.2. Temporal coherence threshold
B.3. Alternating wideband and narrowband noise
B.4. Comodulation release from masking

References
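As a rough illustration of the multiple-regression modeling the abstract describes (predicting mean semantic judgments from extracted psychoacoustic features), the following sketch fits an ordinary least-squares model. It is a minimal stand-in, not the dissertation's implementation: the feature count, the data, and the coefficients are all invented for illustration.

```python
import numpy as np

# Synthetic stand-in data: 8 musical excerpts, 3 extracted
# psychoacoustic features per excerpt (e.g. tempo, loudness,
# spectral shape). All numbers here are invented for illustration.
rng = np.random.default_rng(0)
features = rng.uniform(size=(8, 3))                       # rows: excerpts
judgments = features @ np.array([1.5, -0.8, 0.3]) + 0.1   # mean ratings

# Ordinary least squares with an intercept column, i.e. a
# multiple regression of semantic judgments on the features.
X = np.column_stack([np.ones(len(features)), features])
coefs, *_ = np.linalg.lstsq(X, judgments, rcond=None)

# Fraction of rating variance the regression explains (R^2).
predicted = X @ coefs
r2 = 1 - np.sum((judgments - predicted) ** 2) / np.sum(
    (judgments - judgments.mean()) ** 2)
print(round(r2, 3))
```

With real data the interest lies in which features carry significant weight and how much variance remains unexplained across listeners; here the synthetic judgments are a noiseless linear function of the features, so the fit is exact.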

This message came from the mail archive
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University