When a tone or a resolved harmonic passes through a filterbank there is a phase transition, so that filters with neighbouring CFs respond at different phases. When you have two such components the brain is faced with the task of comparing the outputs of two sets of auditory filters, where the filters in each set respond at a wide range of different phases. So to know whether the two tones are in phase or not, the brain would have to know which filters in one set to compare with which filters in the other. No wonder it’s difficult!
When you apply AM to a sound, all filters wax and wane at the same time. The same is true when you present a filtered pulse train containing only unresolved harmonics. If you have two AM sounds or two groups of filtered pulse trains at the same time, then the brain can fairly easily tell whether they are in phase or not (because it’s comparing two sets of auditory filters where the filters in any one set are responding at the same phase).
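To put rough numbers on the contrast, here is a minimal sketch. It is not the model from the paper cited in this post. It assumes a 4th-order gammatone approximation to each auditory filter, with the Glasberg and Moore ERB formula for bandwidth, and it ignores any fixed delay. The function names, the CFs (800 to 1200 Hz), the 1000-Hz tone and the 10-Hz modulation rate are all illustrative choices. It compares the spread of response phase across filters for a pure tone with the spread of envelope phase when each filter is driven by an AM carrier at its own CF.

```python
import math

def erb_hz(cf_hz):
    # Equivalent rectangular bandwidth (Glasberg & Moore formula); an assumed filter width.
    return 24.7 * (4.37 * cf_hz / 1000.0 + 1.0)

def gammatone_phase_deg(freq_hz, cf_hz, order=4):
    # Phase (degrees) of an assumed gammatone filter centred on cf_hz,
    # evaluated at freq_hz, using H(f) ~ (1 + j(f - cf)/b)^(-order).
    b = 1.019 * erb_hz(cf_hz)
    return -order * math.degrees(math.atan((freq_hz - cf_hz) / b))

def tone_phase_spread(tone_hz, cfs_hz):
    # Range of response phases across filters for a single pure tone.
    phases = [gammatone_phase_deg(tone_hz, cf) for cf in cfs_hz]
    return max(phases) - min(phases)

def envelope_phase_spread(mod_hz, cfs_hz):
    # Each filter sees a carrier at its own CF with sidebands at CF +/- mod_hz.
    # In this approximation the two sidebands are shifted by equal and opposite
    # phases, so the envelope phase equals the upper-sideband phase.
    phases = [gammatone_phase_deg(cf + mod_hz, cf) for cf in cfs_hz]
    return max(phases) - min(phases)

cfs = [800.0, 900.0, 1000.0, 1100.0, 1200.0]
tone_spread = tone_phase_spread(1000.0, cfs)
env_spread = envelope_phase_spread(10.0, cfs)
print(f"tone phase spread across filters:     {tone_spread:.1f} deg")
print(f"envelope phase spread across filters: {env_spread:.1f} deg")
```

Under these assumptions the tone's phase varies by more than a full cycle across the five filters, while the envelope phase varies by only a few degrees. The exact values depend entirely on the assumed filter shape and modulation rate.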
A model incorporating this idea, and which accounts for several phenomena reported in the literature, is as follows:
R.P. Carlyon and S. Shamma (2003). "An account of monaural phase sensitivity", J. Acoust. Soc. Am., 114, 333-348.
Not to be outdone, I can also provide a pdf on request :-)