>I turn off the stimulus and in my head I "re-listen" [*] to the two examples again.

>My question: what would be a 'better' word to more accurately 
>describe the [*] re-listen used above.

What you are doing is "replaying" the acoustic mental image constructed in your mind (brain) while listening. In this case, your acoustic image is most probably still present in working memory (short-term memory), so you do not need to activate your long-term memory (involved in recall from long-term storage).
By definition, auditory echoic memory is a system that receives auditory stimuli and holds them for a short period of time in sensory stores; it therefore represents the earliest stage of auditory sensory memory.
See http://www.dissertationen.unizh.ch/2004/gaab/Gaabthesis.pdf 

If stored long-term, it acquires the status of a mental REPRESENTATION, that is, a mental image tied to the sensory modality involved in its perception.
By the way, a linguistic sign is defined as an association between an acoustic image and a meaning attached to it through convention.
Obviously, the representations that our cognitive system elaborates, categorizes, and relates to one another are not to be reduced to mirror images of material reality. Let's not fall into naive empiricism.

>Am I now "thinking" the 'sounds'; thinking 'about' the sounds?
>When someone speaks to me, I almost immediately replay what they are 
>saying picking out the key aspects / features creating a hierarchy of 
>dealing with ideas which are to be handled in a quasi-sequential fashion, 
>Or is this what is partially meant by 'cognition'?

Not partially but entirely (unless you define cognition as restricted to conscious mental operations).

