auditory memory and sound classification

Dear list,

During my PhD thesis I worked on developing a general audio classifier
for multimedia applications. I started from the supposition that modeling
some aspects of the way humans classify audio into semantic classes (in our
context, a semantic class means speech, music, jazz, man, action, etc.) would
be advantageous when building an audio classifier. In two experiments I
conducted on human discrimination between speech and music, and between male
and female speech, the context and the duration of the stimuli strongly
affected the subjects' discrimination judgments. For example, a short speech
stimulus (70 ms) presented after a 900 ms music stimulus went unnoticed by the
subjects in the majority of cases (similar to spectral contrast effects in
speech recognition).
From an engineering perspective, I assumed that some kind of integration of
spectral activity over time takes place (suppose, for instance, a simple
model of the ear that estimates the energy in different frequency bands and
transfers this information to the cortex for processing). I then assumed an
auditory memory model consisting of some kind of mean (accumulation) and
variance (surprise) of the past spectral energy in different frequency bands.
This model, although very simplistic and based only on intuition, showed
interesting properties for general audio classification in multimedia
indexing applications when used as the basis for audio feature extraction.
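The memory model described above (a per-band mean as accumulation and a
per-band variance as surprise) could be sketched roughly as follows. This is a
minimal illustration, not the exact formulation used in the thesis; it assumes
exponential forgetting with a hypothetical factor `alpha`, equal-width
frequency bands computed with a naive DFT, and a feature vector formed by
concatenating the current means and variances. The names `band_energies` and
`AuditoryMemory` are hypothetical.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy of one frame in n_bands equal-width frequency bands.

    Uses a naive DFT over the non-negative frequency bins. Equal-width bands
    are an illustrative assumption (mel or ERB bands are common alternatives).
    """
    n = len(frame)
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    # Split the bins into n_bands contiguous groups and sum the power in each.
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    return [sum(power[edges[b]:edges[b + 1]]) for b in range(n_bands)]


class AuditoryMemory:
    """Per-band exponentially weighted mean (accumulation) and variance (surprise)."""

    def __init__(self, n_bands, alpha=0.1):
        self.alpha = alpha  # hypothetical forgetting factor: larger = shorter memory
        self.mean = [0.0] * n_bands
        self.var = [0.0] * n_bands
        self.started = False

    def update(self, energies):
        """Fold one frame of band energies into the memory; return its state."""
        if not self.started:
            # The first frame initialises the memory; there is no surprise yet.
            self.mean = list(energies)
            self.started = True
        else:
            for b, x in enumerate(energies):
                diff = x - self.mean[b]
                incr = self.alpha * diff
                self.mean[b] += incr
                # Exponentially weighted variance update around the moving mean.
                self.var[b] = (1.0 - self.alpha) * (self.var[b] + diff * incr)
        # Feature vector for a classifier: the current memory state.
        return self.mean + self.var
```

In use, each new analysis frame would be passed through `band_energies` and
then `AuditoryMemory.update`, so the classifier sees a memory state that is
updated continuously by new acoustic activity rather than the current frame
alone. A smaller `alpha` gives a longer memory, in which a brief event (such
as the 70 ms speech burst after 900 ms of music) shifts the mean only slightly.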

I was wondering whether we can suppose, from a scientific point of view, that
human subjects classifying a stimulus into high-level concepts (rain,
explosion, speech) base their judgments on the state of an auditory memory
(for instance, the integration and correlation of past spectral activity)
that is continuously updated by new acoustic activity. I would also like to
know whether such auditory memory models exist.

I am seeking clarification, directions, and references about the effect of
auditory memory models on the human perception of general sounds if

