
Ph.D. thesis announcement



Dear List,

I am pleased to announce the availability of my Ph.D. thesis:

Modeling Judgements of Environmental Sounds by means of Artificial Neural
Networks
Holger U. Prante
Berlin University of Technology, Germany, 2001

The document is available in paperback (ISBN 3-89825-364-3) or as a download
from http://www.dissertation.de/PDF/hup507.pdf [2.5 MB].

Best Regards,
Holger U. Prante
prante@t-online.de

-------------------------------------------------------------------------
Abstract
-------------------------------------------------------------------------
The thesis pursues the following objectives:
(a) to collect and to evaluate pairs of adjectives for sound (quality)
assessment.
(b) to select and binaurally record environmental sounds.
(c) to perform a listening test to obtain assessments of the recordings (b)
using the adjectives (a), to examine the principal hearing dimensions by
means of factor analysis applied to the subjects' assessments, and to
correlate these dimensions with physical parameters extracted from the
sounds.
(d) to set up and to compare TEMPORAL supervised and unsupervised neural
networks for reproducing the hearing dimensions (c).
(e) to compare the predictions (d) with results from multiple linear
regression using "classical" psychoacoustic parameters (loudness, roughness,
sharpness etc.) extracted from the sounds.

Results:
(a) In a pre-study, 384 pairs of adjectives were collected from the
literature. A cluster analysis was performed to identify corresponding pairs
of adjectives. As an outcome of the analysis, 12 semantic clusters were
formed, which can be represented by 24 pairs of adjectives.
(b) Subjects were asked to give sound examples which correspond best to the
adjectives. 25 sounds were selected to represent the 12 semantic clusters.
The sounds were recorded by means of a dummy head on digital audio tape
[available on CD: Environmental sounds for psychoacoustic testing, K.
Johannsen and H. Prante, Supplement to acta acoustica ACUSTICA 87 (2) 2001].
(c) In the listening test, 20 subjects assessed the environmental sounds
using the semantic differentials. The factor analysis produced 6 dimensions
explaining 72% of the variance in the data. Each dimension is named after
the adjective with the highest factor loading: pleasant, metallic,
scratching, powerful, fluctuating, and distinct. Technically, these
dimensions represent soft / dull sounds (pleasant), sounds with strong high
frequency content (metallic), sounds with slow (fluctuating) or fast
(scratching) modulation, sounds with high loudness (powerful) and sounds
with high kurtosis (distinct).
(d) Two types of artificial neural networks were investigated: a temporal
supervised and a temporal unsupervised one. The supervised network was
implemented as an FIR network using temporal backpropagation [Wan, 1994] and
the unsupervised network was realized as a temporal self-organizing feature
map [Chappell & Taylor, 1993]. The sounds were passed through an auditory
model [Slaney, 1994], and the resulting auditory spectra were taken as input
values for both connectionist prediction models. Factor scores from (c) were
taken as targets. Using cross-validation, the supervised network showed its
highest prediction score for the dimension powerful with 80% correct
predictions. The unsupervised method also performed best on the dimension
powerful, with 88% correct predictions.
(e) As a reference for the neural network models, an alternative approach
from classical statistics was applied. Percentiles and other statistical
quantities for several psychoacoustic parameters were calculated from the
sounds and fed into a multiple linear regression analysis. The
cross-validation results are the highest in this study for the dimensions
powerful and metallic, which both reached 100% correct predictions within
the pre-defined boundaries (correlation coefficients 0.96 and 0.97,
respectively).
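
As an illustration of the procedure in (e), the following minimal Python
sketch fits a multiple linear regression and scores it by cross-validation,
counting a prediction as correct when it falls within a tolerance of the
target. It is not the code used in the thesis. The leave-one-out scheme, the
`tolerance` parameter (standing in for the "pre-defined boundaries"), all
function names, and the toy data are assumptions made for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features, targets):
    """Ordinary least squares with intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, feature):
    """Evaluate intercept + sum of coefficient * feature value."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], feature))


def leave_one_out_hit_rate(features, targets, tolerance):
    """Fraction of held-out items predicted within +/- tolerance (assumed criterion)."""
    hits = 0
    for i in range(len(targets)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        coeffs = fit_linear(train_x, train_y)
        if abs(predict(coeffs, features[i]) - targets[i]) <= tolerance:
            hits += 1
    return hits / len(targets)


# Toy data (hypothetical): two "psychoacoustic parameters" per sound and a
# factor score that depends on them linearly.
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
toy_targets = [1.0 + 2.0 * a - b for a, b in toy_features]
toy_rate = leave_one_out_hit_rate(toy_features, toy_targets, tolerance=1e-6)
```

With real factor scores the hit rate depends strongly on how the tolerance is
chosen, so it should be reported together with the correlation coefficient.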

Conclusions:
- The better the pre-processor, the better the outcome of the classifier.
The results of the multiple regression analysis show that even a rather
simple (linear) classifier using "problem-adjusted" pre-processing can
predict the main cognitive factors of real-world sounds with up to 100%
correct predictions.
- The analysis of the trained FIR neural networks shows that the connecting
weights are trained to act as all-pass filters. This way the fluctuating
incoming signals cancel each other out in the hidden layer. Due to the
adjusted bias at the hidden units, a DC output is produced, as required by
the constant supervised target value. This outcome reflects the flexibility
of the FIR network and indicates possible applications, e.g. in the domain
of active noise control.
- The temporal-SOFM algorithm can be regarded as a stochastic clustering
algorithm. The thesis shows that the algorithm is well suited for grouping
high-dimensional temporal data in pattern classification and recognition
applications.
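
The cancellation described in the second conclusion can be pictured with a
small Python sketch of a linear FIR unit, in which each connection is a
short FIR filter rather than a single weight. The weights below are
hand-picked for illustration and are not the trained weights from the
thesis. A pure one-sample delay and its sign-inverted copy are both all-pass
filters, and feeding the same signal into both inputs is a simplifying
assumption. The function names are hypothetical.

```python
def fir_filter(weights, signal):
    """y[t] = sum_k weights[k] * signal[t - k], with zeros before t = 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


def fir_unit(synapse_weights, inputs, bias):
    """Linear FIR unit: bias plus the sum of the FIR-filtered input channels.

    The nonlinear activation of a full FIR network is omitted here.
    """
    filtered = [fir_filter(w, x) for w, x in zip(synapse_weights, inputs)]
    return [bias + sum(ch[t] for ch in filtered)
            for t in range(len(inputs[0]))]


# A fluctuating input presented on two channels (illustrative values).
fluctuating = [1.0, -2.0, 3.0, -1.0, 0.5]

# Channel 1: one-sample delay; channel 2: one-sample delay with sign flip.
# Both pass all frequencies with unit gain, yet their outputs cancel.
weights = [[0.0, 1.0], [0.0, -1.0]]

dc_output = fir_unit(weights, [fluctuating, fluctuating], bias=0.7)
```

In this toy case the filtered channels sum to zero at every time step, so
only the bias remains at the output, which is the kind of constant (DC)
response that a constant supervised target asks for.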
----------------------------------------------------------------------------