There are several issues here:
1) Different mics are known to behave quite differently in 'reactive' (i.e. enclosed or semi-enclosed) spaces (ref: Angelo Farina), and thus vary in their ability to reproduce the salient plane-wave characteristics. Although the same mics are being used to record both human speakers and loudspeakers, which would seem to rule out this factor, it might still be pertinent, because the two types of source may differ in their relationships with local surfaces. (In other words, theoretically you could find some mics which yield similar results for both types of source.)
2) Loudspeakers sound more like loudspeakers than they do anything else. A given speaker's diffusion characteristic, and hence its acoustic relationships with local features, remain fairly constant irrespective of programme material (to a point!).
3) A loudspeaker is (generally) designed to be part of a 'sound field reproduction system' rather than a sounding object in itself. The exception is speakers which are in fact designed to be part of a musical instrument, such as guitar cabs; these could be called 'production systems' rather than re-production systems. As such, one design intention is to minimise resonances at all frequencies, whereas all the 'sounding objects' the speaker is meant to imitate rely on resonance for the majority of their energetic output. Because of acoustic coupling (air-transformer) effects, an object in resonance interacts with its local environment quite differently from a sounding object which is not in resonance.
4) In fact, following from 3), an important criterion in speaker enclosure design is to minimise speaker-room interactions, inhibiting the localisability of the speaker itself in favour of the localisability of the phantom image.
My guess is that loudspeakers should be most localisable when they are a) distorting in one or more ways (compressing, clipping, resonating, etc.), or b) playing material which is electronically generated and never in itself contained any 'spatial information' (with apologies for using the term 'information' in this way).
This highlights the problem with the (generally held) notion that it is possible to record 'a sound' accurately yet strip it of any spatial attribute, leaving what is often called 'direct sound'. Considering what might actually constitute 'direct sound' shows the term to be a theoretical entity: useful, but about as likely to exist as a 'point source'.
It is unclear whether the sense of moving through an environment that you are after would be improved by better localisability of sounding objects in that environment, or whether some other characteristic of the environment itself might directly appeal to spatial perception in such a way as to generate a sense of 'out there-ness'. (I suspect the latter).
Lastly, from an ecological point of view, there's the question of whether a highly symmetrical environment such as the corridor you describe is particularly 'natural': did we evolve our best acuity for this type of environment? I suspect not. I also suspect that a highly symmetrical environment is second only to an anechoic one in the difficulty it presents for localisation. Its reflections and resonances are simply too homogeneous, which is tantamount to saying 'lacking in potential information'. (I've done some informal experimenting in large, empty circular spaces.)
The bottom line is that loudspeakers don't sound like 'real objects' (unless they are actually being real objects). They imitate a wide variety of real objects fairly well in terms of frequency response, dynamic range, etc., but none of their design criteria have anything to do with spatial perception.
It would be interesting to speculate about what conditions you would need in order to achieve similar results for 'real' (human) speakers and loudspeakers.