Key aspects of the categorical perception of initial stop consonants by human and animal listeners are replicated by a ``synthetic listener'' in the form of a computational auditory model. This model is a composite of a physiologically faithful ``front end'' [M. J. Pont and R. I. Damper, J. Acoust. Soc. Am. 89, 1213--1228 (1991)] and an artificial neural network (ANN) ``back end.'' ANNs are trained on neurograms (i.e., the front end's auditory-nerve representation) for the 0- and 80-ms end points of three voice-onset time continua (bilabial, alveolar, and velar), and tested on the intermediate (10, 20, ..., 70 ms) patterns. Replication of the behavioral data---in terms of warping of similarity space and shift of phoneme boundary with place of articulation---is robust in that it is maintained across a variety of ANN architectures and learning algorithms. Unlike real listeners, the synthetic listener can be analyzed easily and manipulated purposefully to reveal the acoustic and auditory features underlying the voicing contrast. A contribution analysis of the weights and activations of the ANNs highlights the auditory features. Replacement of the Pont--Damper front end by a simple Fourier-based spectral analysis abolishes the boundary-shift effect, although similarity warping is maintained. This suggests that restructuring of the acoustic features by the peripheral auditory system is crucial to phonetic classification.
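The train-on-endpoints, test-on-intermediates design can be sketched as follows. This is a minimal illustration, not the paper's code: the real inputs are auditory-nerve neurograms from the Pont--Damper front end, whereas here we substitute hypothetical random feature vectors for the two endpoints, stand in for the ANNs with a single logistic unit trained by gradient descent, and model intermediate VOT patterns as simple linear interpolation between the endpoints. All dimensions, rates, and prototypes are assumptions for illustration only.

```python
# Hedged sketch (not the authors' implementation): train on the 0-ms and
# 80-ms VOT endpoints only, then classify intermediate (10..70 ms) patterns.
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # toy stand-in for a neurogram's dimensionality (assumption)

# Hypothetical endpoint prototypes; the paper uses auditory-nerve neurograms.
proto_0ms = rng.normal(0.0, 1.0, DIM)   # voiced endpoint (0-ms VOT)
proto_80ms = rng.normal(2.0, 1.0, DIM)  # voiceless endpoint (80-ms VOT)

# Training set: noisy copies of the two endpoints only.
X = np.vstack([proto_0ms + 0.1 * rng.normal(size=(20, DIM)),
               proto_80ms + 0.1 * rng.normal(size=(20, DIM))])
y = np.array([0] * 20 + [1] * 20)  # 0 = voiced, 1 = voiceless

# Logistic regression by gradient descent (a stand-in for the paper's ANNs).
w, b = np.zeros(DIM), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Test on the intermediate VOTs, modeled here as linear interpolation.
probs = []
for vot in range(10, 80, 10):
    t = vot / 80.0
    x = (1 - t) * proto_0ms + t * proto_80ms
    probs.append(float(1.0 / (1.0 + np.exp(-(x @ w + b)))))
    print(f"VOT {vot:2d} ms -> P(voiceless) = {probs[-1]:.2f}")
```

In this toy setup the response typically changes sharply rather than linearly across the continuum, which is the qualitative signature of categorical perception the abstract refers to; the actual boundary location and its dependence on place of articulation are properties of the real neurogram inputs, not of this sketch.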