To realize speech visualization from which one can read any word, the neural-network processing used to extract features of manner and place of articulation has been improved. First, the minimum number of frames, and their positions, that maximize the automatic recognition rate were determined for four three-layered perceptrons: the networks for manner of articulation, nasals, plosives, and fricatives. The results show that two or three frames within an interval of 100 ms are necessary. Next, since the network coefficients near the beginning of a word differ somewhat from those in the middle or at the end, two sets of coefficients were estimated: one for word-initial segments and one for the others. By using the network outputs for visualization, consonantal patterns appear naturally without any recognition technique. Finally, reading tests were carried out on 520 words spoken by each of ten speakers, for a total of 5200 words. Considering that the subjects were never told the correct answers during the tests and that the training time was only 30 h, correct answers reached 90%--92%.
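The extraction step described above can be sketched as a three-layered perceptron whose input is the concatenation of a few frames taken within a 100 ms window. This is a minimal illustration, not the paper's implementation: the layer sizes, the 16-coefficient-per-frame feature vector, and the random weights are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ThreeLayerPerceptron:
    """Input -> hidden -> output, i.e. a three-layered perceptron.
    Layer sizes here are illustrative assumptions, not the paper's values."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.standard_normal((n_hidden, n_in)) * 0.1
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_out, n_hidden)) * 0.1
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)      # hidden layer
        return softmax(self.W2 @ h + self.b2)   # output scores

# Three frames sampled within a 100 ms window; 16 spectral
# coefficients per frame is an assumed feature dimension.
frames = rng.standard_normal((3, 16))
x = frames.ravel()                               # concatenate frames into one input
net = ThreeLayerPerceptron(n_in=48, n_hidden=20, n_out=3)
p = net.forward(x)                               # e.g. nasal / plosive / fricative scores
```

In the paper's scheme, four such networks run in parallel (manner of articulation, nasals, plosives, fricatives), and their continuous outputs drive the visualization directly rather than being thresholded into a recognition decision.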