The acoustic--phonetic characteristics of the word accent in common Japanese are based on the moraic structure, by which segmental and prosodic features are temporally coordinated. The present study investigates the temporal organization of these features, with an emphasis on the influence of ``moraic phonemes.'' The speech material consists of words of varying length, phonetic composition, and accent type, embedded in a carrier sentence. The timing of the prosodic features is represented by the onset and offset of the accent command, estimated from the $F_0$ contour using a quantitative model of the process of $F_0$ contour generation. The timing of the segmental features is represented by the acoustic--phonetic boundaries, detected by referring to the frequency--time--intensity patterns. The timing of the accent command is strongly correlated with the onset of the vowel rather than with the onset of the mora-initial consonant. At the same time, the segmental composition, especially the presence or absence of moraic phonemes, is found to exert a strong, systematic influence on the timing of the accent command relative to the segmental boundaries. The results are formulated as a quantitative model of timing for use in speech synthesis.
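As an illustration of what an accent command with an onset and an offset contributes to the $F_0$ contour, the following is a minimal sketch. It assumes that the quantitative model referred to above is of the command--response type, in which each accent command is a rectangular pulse passed through a critically damped second-order system and added to the logarithm of $F_0$. The function names, the parameter values (`beta = 20.0` per second, ceiling `gamma = 0.9`), and the timing values in the usage note are illustrative assumptions, not values taken from the study.

```python
import math


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism at time t (seconds).

    Assumed form: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0,
    and 0 before the step. beta and gamma are illustrative defaults.
    """
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def accent_component(t, onset, offset, amplitude, beta=20.0, gamma=0.9):
    """Contribution of one accent command to ln F0 at time t.

    The command is a pulse of the given amplitude lasting from onset to
    offset, so its response is the difference of two step responses.
    """
    rise = accent_response(t - onset, beta, gamma)
    fall = accent_response(t - offset, beta, gamma)
    return amplitude * (rise - fall)


def relative_timing(command_time, segment_boundary):
    """Command time minus a segmental boundary time, in seconds.

    A negative value means the command precedes the boundary.
    """
    return command_time - segment_boundary
```

For example, `relative_timing(0.12, 0.15)` expresses a hypothetical command onset at 0.12 s relative to a hypothetical vowel onset at 0.15 s, which is the kind of quantity the timing analysis compares across segmental conditions.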