In terms of the ``theory of psychological testing,'' physical noise measurement procedures can be regarded as ``test'' variables and the related annoyance as the criterion variable. The concepts of ``reliability,'' ``validity'' and ``equivalence'' can then be used to assess the precision of different noise measurements. Data from a 1969 investigation of aircraft noise in the vicinity of Munich Airport, in which both physical and annoyance data were collected for the same subjects, show that the measures $L_{eq1}$, $L_{s}$, $L_{eq3}$, $L_{eq4}$, NNI and FB1 meet the equivalence criteria of test theory, that is, they cannot be distinguished statistically, and their reliability coefficients are close to one. Other measures such as D10, H81, log N and $L_{eq10}$ are not equivalent to the former six. However, even the former six are only of moderate validity (about 0.5). From the standpoint of psychological testing, therefore, physical noise measurement is only a poor predictor of individual annoyance, but it suffices for predicting group averages. Moreover, it can be concluded from the theory that attempts to enhance validity by modifying the already highly reliable physical measuring procedures cannot succeed. It would be more effective to increase the reliability of the psychological procedures used to measure annoyance.
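
The closing argument about validity and reliability can be made explicit with the standard attenuation relation of classical test theory. The following is a sketch only. The symbols are assumptions introduced here and do not appear in the original: $r_{xy}$ is the observed validity (correlation between a noise measure $x$ and annoyance $y$), $r_{xx}$ and $r_{yy}$ are the reliabilities of the two measurements, and $\rho_{T_x T_y}$ is the correlation between their true scores.

```latex
% Attenuation relation of classical test theory (symbols are assumptions, see lead-in)
\[
  r_{xy} \;=\; \rho_{T_x T_y}\,\sqrt{r_{xx}\, r_{yy}}
  \qquad\Longrightarrow\qquad
  r_{xy} \;\le\; \sqrt{r_{xx}\, r_{yy}} .
\]
% With the physical measure already highly reliable (r_xx close to 1):
\[
  r_{xx} \approx 1
  \quad\Longrightarrow\quad
  r_{xy} \;\approx\; \rho_{T_x T_y}\,\sqrt{r_{yy}} .
\]
% Hypothetical illustration (values not taken from the study):
% raising r_xx from 0.95 to 1.00 scales r_xy by only
\[
  \sqrt{\frac{1.00}{0.95}} \;\approx\; 1.026 .
\]
```

Under these assumptions, once $r_{xx}$ is close to one, the only reliability term that can still raise the observed validity is $r_{yy}$, the reliability of the annoyance measurement.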