
III

THE METHOD OF VOICE IDENTIFICATION

Voice identification is a multifaceted process that requires the use of both the aural and visual senses. In the typical voice identification case the examiner is given several recordings: one or more recordings of the voice to be identified and one or more recorded voice samples of one or more suspects. It is from these recordings that the examiner must make the determination about the identity of the unknown voice.

The first step is to evaluate the recording of the unknown voice, checking that it contains enough speech with which to work and that the recording is sufficiently clear in the frequency range required for analysis. The volume of the recorded voice signal must be significantly higher than that of the environmental noise. The greater the number of obscuring events, such as noise, music, and other speakers, the longer the sample of speech must be. Some examiners report that they reject as many as sixty percent of the cases submitted to them, one of the main reasons for rejection being the poor quality of the recording of the unknown voice.
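The requirement that the voice signal stand well above the environmental noise can be expressed as a signal-to-noise ratio. The following is a minimal sketch in Python, not a procedure described in this text: the function names, the idea of measuring a noise-only stretch of the recording, and the 15 dB threshold are all illustrative assumptions.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels.

    `speech` is a stretch of the recording that contains the voice;
    `noise` is a stretch that contains only background noise.
    """
    noise_level = rms(noise)
    if noise_level == 0:
        return math.inf  # perfectly silent background
    return 20.0 * math.log10(rms(speech) / noise_level)

def is_usable(speech, noise, min_snr_db=15.0):
    # min_snr_db is a hypothetical cutoff chosen for illustration;
    # the text gives no numeric threshold.
    return snr_db(speech, noise) >= min_snr_db
```

In practice an examiner's judgment also weighs the kind of interference (music, other speakers) and the length of the sample, which a single number like this does not capture.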

Once the unknown voice sample has been determined to be suitable for analysis, the examiner then turns his attention to the voice samples of the suspects. Here also, the recordings must be of sufficient clarity to allow comparison, although at this stage, the recording process is usually so closely controlled that the quality of recording is not a problem.

The examiner can work only with speech samples that contain the same text as the unknown recording. Under the best of circumstances the suspects will repeat, several times, the text of the recording of the unknown speaker, and these words will be recorded in a manner similar to the recording of the unknown speaker. For example, if the recording of the unknown speaker was a bomb threat made to a recorded telephone line, then each of the suspects would repeat the threat, word for word, to a recorded telephone line. This provides the examiner not only with the same speech sounds for comparison but also with valuable information about the way each speech sound completes the transition to the next sound.

At times a voice sample must be obtained without the knowledge of the suspect. It is possible to make an identification from a surreptitious recording, but the amount of speech necessary for the comparison is usually much greater. If the suspect is being engaged in conversation for the purpose of obtaining a voice sample, the conversation must be steered so that the suspect repeats as many of the words and phrases found in the text of the unknown recording as possible.

The worst exemplar recordings with which an examiner must work are those of random speech. With such recordings a large sample of speech is needed to improve the chances of obtaining a sufficient amount of comparable speech.

As in any other form of identification analysis, the lower the quality of the evidence with which the examiner has to work, the greater the amount of evidence and time needed to complete the analysis and the less likely a positive conclusion.

Once the evidence has been determined to be sufficient to perform the analysis, the examiner begins the two-step process of voice sample comparison: one step aural (listening) and the other spectrographic (visual). These are two different but interwoven and equally important analytical methods which the examiner combines to reach the final conclusion. The first step is an aural comparison of the voice samples. Here the examiner compares both single speech sounds and series of speech sounds of the known and unknown samples. At this stage the examiner is conducting a number of tasks: comparing for similarities and differences, screening out less useful portions of the samples, and indexing the samples for further analysis. An example of the initial aural comparison is screening the samples for pronunciation similarities or discrepancies: the word "the," for instance, may be said with a short "a" sound or a long "e" sound. If the word is not pronounced in the same manner in both samples, it loses comparison value.

Once the examiner has located those portions to be used for the analysis, a more detailed aural comparison is undertaken. This comparison can be accomplished in many different ways. One of the most commonly used methods of aural comparison is rerecording a speech sound sample of the unknown followed immediately by a rerecording of the same speech sounds of the suspect. This is repeated several times so that the final product is a recording of specific speech sounds, in alternating order, by the unknown speaker followed by the suspect. Such comparisons have been greatly facilitated by digital audio recording equipment, which allows for the digital recording, storage, and repeated playback of only the speech sounds to be examined.
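With digital audio, the alternating comparison recording described above amounts to concatenating two short clips in turn. The sketch below assumes each clip is a plain list of samples already cut to the same speech sounds; the function name, the `repeats` count, and the inserted silence between clips are hypothetical choices made for illustration.

```python
def alternate_clips(unknown_clip, suspect_clip, repeats=3, gap_samples=0):
    """Build one sequence: unknown, suspect, unknown, suspect, ...

    Each clip is followed by `gap_samples` samples of silence so the
    listener can tell where one speaker ends and the next begins.
    """
    silence = [0] * gap_samples
    sequence = []
    for _ in range(repeats):
        sequence.extend(unknown_clip)
        sequence.extend(silence)
        sequence.extend(suspect_clip)
        sequence.extend(silence)
    return sequence

# Toy example with two-sample "clips":
# alternate_clips([1, 2], [3, 4], repeats=2, gap_samples=1)
```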

During the aural comparison the examiner studies the psycholinguistic features of the speaker's voice. A large number of qualities and traits are examined, from such general traits as accent and dialect to inflection, syllable grouping, and breath patterns. The examiner also scrutinizes the samples for signs of speech pathologies and peculiar speech habits.

The second step in the voice identification process is the spectrographic analysis of the recorded samples. The sound spectrograph is an automatic sound wave analyzer with a high-quality, fully functional tape recorder. The speech samples to be analyzed are recorded on the sound spectrograph. The recording is then analyzed in two-and-one-half-second segments. The product is a spectrogram, a graphic display of the recorded signal on the basis of time and frequency with a general indication of amplitude.
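A spectrogram of this kind can be approximated in software with a short-time Fourier transform. The sketch below is an illustration only and does not reproduce how the sound spectrograph itself works: the function names, the Hann window, and the frame and hop sizes are assumptions, and the direct transform is written for clarity rather than speed. Only the two-and-one-half-second segment length comes from the text.

```python
import cmath
import math

SEGMENT_SECONDS = 2.5  # segment length stated in the text

def split_segments(samples, sample_rate, seconds=SEGMENT_SECONDS):
    """Cut a recording into consecutive segments of the given duration."""
    size = int(sample_rate * seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum per frame.

    Rows run along time, columns along frequency bins 0..frame_len // 2,
    and each value is a rough indication of amplitude.
    """
    # Hann window (an assumed choice) to reduce leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform of one windowed frame.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

def bin_to_hz(k, sample_rate, frame_len):
    """Centre frequency in hertz of frequency bin k."""
    return k * sample_rate / frame_len
```

Plotting the rows against time and the columns against frequency, with darkness proportional to magnitude, gives a display comparable in layout to the spectrogram described here.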

The spectrograms of the unknown speaker are then visually compared to the spectrograms of the suspects. Only those speech sounds which are the same are compared. The comparisons of the spectrograms are based on the displayed patterns representing the psychoacoustical features of the captured speech. The examiner studies the bandwidths, mean frequencies, and trajectory of vowel formants; vertical striations, distribution of formant energy, and nasal resonances; stops, plosives, and fricatives; interformant features; the relation of all features present as affected during articulatory changes; and any peculiar acoustic patterning. The examiner looks not only for similarities but also for differences. The differences are closely examined to determine whether they are due to pronunciation differences or are indicative of different speakers.

When the analysis is complete the examiner integrates his findings from both the aural and spectrographic analyses into one of five standard conclusions: a positive identification, a probable identification, a positive elimination, a probable elimination, or no decision. In order to arrive at a positive identification the examiner must find a minimum of twenty speech sounds which possess sufficient aural and spectrographic similarities. There can be no differences, either aural or spectrographic, that cannot be accounted for.

The probable identification conclusion is reached when there are fewer than twenty similarities and no unexplained differences. This conclusion is usually reached when working with small samples, random speech samples, or recordings of lower quality. The result of positive elimination is rendered when twenty differences are found between the samples that cannot be attributed to anything other than different voices having produced the samples. A probable elimination decision is usually reached when working with limited text or a recording of lower quality. The no decision conclusion is used when the quality of the recording is so poor that there is insufficient information with which to work or when there are too few common speech sounds suitable for comparison.
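The five conclusions and the twenty-sound thresholds can be summarized as a simple decision rule. The sketch below is a simplification under stated assumptions, not the examiner's actual procedure: the function and parameter names are hypothetical, the text does not say how to treat a sample with fewer than twenty unexplained differences (it is mapped to probable elimination here), and the `usable` flag stands in for the examiner's judgment about recording quality and the amount of comparable speech.

```python
THRESHOLD = 20  # twenty speech sounds, as stated in the text

def conclude(similarities, unexplained_differences, usable=True):
    """Map counts of compared speech sounds to one of five conclusions.

    similarities: speech sounds with sufficient aural and spectrographic
        similarity.
    unexplained_differences: differences that cannot be accounted for
        by anything other than different speakers.
    usable: False when the recording is too poor or there are too few
        common speech sounds (an assumed flag set by the examiner).
    """
    if not usable or (similarities == 0 and unexplained_differences == 0):
        return "no decision"
    if unexplained_differences >= THRESHOLD:
        return "positive elimination"
    if unexplained_differences > 0:
        # Assumption: the text gives no count for this case.
        return "probable elimination"
    if similarities >= THRESHOLD:
        return "positive identification"
    return "probable identification"
```

For example, `conclude(20, 0)` yields a positive identification, while `conclude(12, 0)` yields only a probable identification because fewer than twenty similar speech sounds were found.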
