APPENDIX 1

The following are summaries of studies of spectrographic voice identification and an FBI survey of forensic cases.

Greenwald, M., "The Effects of Decreased Frequency Bandwidth on Speaker Identification by Aural and Spectrographic Examination of Speech Samples", Master's Thesis, Michigan State University, 1979

Hall, M. C., "Spectrographic Analysis of Interspeaker and Intraspeaker Variables of Professional Mimicry", Master's Thesis, Michigan State University, 1975

Hazen, B., "Effects of Different Phonetic Contexts on Spectrographic Speaker Identification", 54 J. Acoust. Soc. Am. 650, 1973

Hollien, H., & McGlone, R., "The Effect of Disguise on Voiceprint Identification", in Proceedings of the Carnahan Crime Countermeasures Conference, University of Kentucky, University of Kentucky Press, Lexington, KY, 1976

Kersta, L. G., "Voiceprint Identification", 196 Nature 1253, Dec. 29, 1962

Reich et al., "Effects of Selected Vocal Disguises upon Spectrographic Speaker Identification", 60 J. Acoust. Soc. Am. 919, 1976

Reich & Duke, "Effects of Selected Vocal Disguises upon Speaker Identification by Listening", 66 J. Acoust. Soc. Am. 1023, 1979

Smrkovski, L. L., "Collaborative Study of Speaker Identification by the Voiceprint Method", 58 J. AOAC 453, 1975

Smrkovski, L. L., "Study of Speaker Identification by Aural and Visual Examination of Non-Contemporary Speech Samples", 59 J. AOAC 927, 1976

Stevens et al., "Speaker Authentication and Identification: A Comparison of Spectrographic and Auditory Presentations of Speech Material", 44 J. Acoust. Soc. Am. 1596, 1968

Tosi et al., "Experiment on Voice Identification", 51 J. Acoust. Soc. Am. 2030, 1972

Tosi & Greenwald, "Voice Identification by Subjective Methods of Minority Group Voices", Paper presented at the 6th Meeting of the International Association of Voice Identification, New Orleans, La., 1978

Young, M. A., & Campbell, R. A., "Effects of Context on Talker Identification", 42 J. Acoust. Soc. Am. 1250, 1967

KERSTA

1962

Examiners: 8 high school girls
Training duration: 1 week
Method: visual
Speaker population: 123
Number of words: 10 words excerpted from sentences
Context type: isolated, random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 2000
Type of decision: forced decisions; limited sample; limited time; random context; no aural examination; examiners lacked sufficient experience
Results: closed trials, range of errors for false ID: 0.35 to 1.0%; 10 words excerpted: 0.00 to 2.0%

YOUNG & CAMPBELL

1967

Examiners: 7 PhD candidates in ASC, 3 assistant professors in ASC
Training duration: 1 week
Method: visual
Speaker population: 5 adult males
Number of words: 2 words (you/it) in isolation and excerpted from 4 short sentences
Context type: 1 word in isolation; 2 words from random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 1046
Type of decision: forced decisions; limited sample; random context; no aural examination; examiners not trained
Results: closed trials, range of errors for false ID: "you" in isolation: 10.4 to 18.0%; "it" in isolation: 22.7 to 33.0%; "you/it" from random context in trial 1 of 15: mean error 62.7%

STEVENS

1968

Examiners: college students (6 in the open trials, 4 in the closed trials)
Training duration: 1 week
Method: aural vs. visual, but not combined
Speaker population: 24 males
Number of words: catalogue of 11 words in different random order; only 1 word used in most trials
Context type: 1 to 4 words
Temporal sequence: noncontemporary (1 week)
Type of trial: closed and open
Total number of trials: 216
Type of decision: forced decisions; limited sample (1 to 4 words); random context; no aural examination; examiners not trained
Results: open trials, range of errors for false ID for 4 examiners/1 word: visual trials 31.0 to 47.0%; aural trials 6.0 to 8.0%. Closed trials, range of errors for false ID with 1 to 4 discrete words: visual trials 20.0 to 30.0%; aural trials 5.0 to 18.0%

TOSI ET AL

1968 - 1970

Examiners: 29 of various backgrounds
Training duration: 1 month
Method: visual
Speaker population: 250 males randomly selected from a population of 25,000
Number of words: 6 and 9 words
Context type: isolated, fixed and random context
Temporal sequence: contemporary and noncontemporary (1 month)
Type of trial: closed and open
Total number of trials: 34,992
Type of decision: forced decisions, but examiners were allowed to rate their confidence level; limited sample; limited time; no aural examination; examiners lacked sufficient experience
Results: range of errors for false ID across all trials: 0.51 to 6.43%. When only the "fairly certain" and "almost certain" decisions are combined, the false ID error falls to 2.4%.

HAZEN

1972

Examiners: college students (7 panels of 2)
Training duration: 5 lectures and 3 practice sessions
Method: visual
Speaker population: 60 males
Number of words: 5 words in the same context; 5 words physically excerpted from random conversation
Context type: fixed and random context
Temporal sequence: contemporary
Type of trial: closed and open
Total number of trials: 280
Type of decision: forced decisions; limited sample (5 words); no aural examination; random and fixed context; examiners lacked sufficient experience; used the most dissimilar spectrographic utterances; compared sounds from totally different words while studying changing phonetic context; examiners could not evaluate the effects of coarticulation because of questionable word boundaries
Results: closed trials, errors for false ID: fixed context range 10.0 to 30.0%, mean 20.0%; random context range 50.0 to 90.0%, mean 74.29%. Open trials, errors for false ID: fixed context range 16 to 66%, mean 42.86%; random context range 66 to 100%, mean 83%.

SMRKOVSKI

1974

Examiners: 7 police and private
Training duration: more than 2 years' experience / less than 2 years' experience
Method: combined aural and visual
Speaker population: 7 male and female
Number of words: 38 to 54 words
Context type: fixed context
Temporal sequence: noncontemporary (1 week)
Type of trial: open
Total number of trials: 84
Type of decision: no forced decisions; 1 to 5 conclusions allowed; no time limit; aural and visual examination; trained and experienced examiners
Results: open trials. Trainees with less than 2 years' experience: false ID 0.0%, false elim. 5.0%, no decision 25.0%. Examiners with more than 2 years' experience: false ID 0.0%, false elim. 0.0%, no decision 22.0%.

SMRKOVSKI

1975

Examiners: 12 scientists, police and private
Training duration: novice: no training; trainee: less than 2 yr; professional: more than 2 yr
Method: combined aural and visual
Speaker population: 20 male and female
Number of words: 9 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 120
Type of decision: no forced decisions; 1 to 5 conclusions allowed; no time limit; aural and visual examination; words compared in context; novices, trainees and experienced examiners
Results: open trials, errors. Novices: false ID 5.0%, false elim. 25.0%, no decision 2.5%. Trainees: false ID 0.0%, false elim. 0.0%, no decision 2.5%. Professionals: false ID 0.0%, false elim. 0.0%, no decision 7.5%.

HALL

1975

Examiners: 4 professionals and 20 college graduates
Training duration: IAVI-certified voice identification examiners
Method: combined aural and visual / visual only
Speaker population: a professional mimic and 6 celebrity voices
Number of words: mimic (mean of 25 sec.), celebrities (mean of 35 min.)
Context type: quasi-fixed and random context
Temporal sequence: contemporary/noncontemporary
Type of trial: open
Total number of trials: aural (20/examiner), visual (200/examiner)
Type of decision: same, different or undecided; 5 IAVI classifications
Results: Interspeaker variability does not exist between a mimicked, disguised voice and the natural voice of the subject mimicked. Intraspeaker variability is minute and not significant when comparing the mimic's disguised voice with the mimic's natural voice. Aurally, the smaller the signal-to-noise ratio within the recording and the more similar the context, the greater the accuracy in distinguishing between speakers.

AURAL EXAMINATION (grand means):
                  Right   Wrong   Undec.
Grad. students    0.74    0.18    0.08
Professionals     0.92    0.082   0.0

HOLLIEN/McGLONE

1975-76

Examiners: 5 faculty, 1 graduate student
Training duration: "the authors were familiar with the 'voiceprint' method of speaker identification"
Method: visual only (spectrograms were cut and mounted)
Speaker population: 25 faculty and graduate students of the University of Florida
Number of words: 7 words
Context type: fixed context: "I do not set the same store"
Temporal sequence: contemporary
Type of trial: open
Total number of trials: 25/examiner
Type of decision: record a match or indicate that none was possible
Results: ". . . even skilled auditors such as these were unable to match correctly the disguised speech to the reference (normal) samples as much as 25% of the time . . . these groups were able to disguise their voices in such manners that their identification by the 'voiceprint' technique became little more than a matter of chance."

REICH ET AL

1976

Examiners: 2 PhD candidates in speech science, 2 PhD candidates in speech pathology
Training duration: 3 courses in speech science plus previous experience with speech spectrograms; 4 weeks at 10-15 hr/wk
Method: visual only (words excerpted and mounted)
Speaker population: 40 adult males (mean age: 27.3 yrs)
Number of words: 9 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 105 (7 matching tasks, each with 15 known and 15 unknown samples)
Type of decision: 1 to 5 certainty scale
Results: The examiners were able to match speakers with a moderate degree of accuracy (55.67%) when there was no attempt at vocal disguise. Disguised speech significantly interfered with speaker identification. Further research is needed . . . in which the examiners may listen to the voice as well as view the spectrograms.

ROTHMAN

1977

Examiners: 30 listeners, 6 visual examiners
Training duration: none
Method: Study I: aural; Study II: visual (0 to 8 kHz)
Speaker population: 12
Number of words: four 2-second speech segments
Context type: random context
Temporal sequence: contemporary/noncontemporary (1 wk)
Type of trial: open
Total number of trials: 5 visual, 38 aural
Type of decision: same/different for each of the contemporary and noncontemporary segments
Results: 94% correct identifications were obtained for contemporary speech segments and 42% for noncontemporary speech segments. 58.45% correct identifications were obtained when comparing different speakers. All examiners achieved 100% correct matching in the visual pretest. The aural method is clearly superior to the spectrographic or 'voiceprint' method.

McGLONE, HOLLIEN & HOLLIEN

1977

Examiners: 4 phoneticians
Training duration: experienced
Method: visual measurement of formants and fundamental frequency (fo)
Speaker population: 23 adult males
Number of words: 7 words ("I do not set the same store")
Context type: fixed (normal and disguised) context
Temporal sequence: contemporary
Type of trial:
Total number of trials: 46/phonetician
Type of decision:
Results: Considerable variability in fo was found between normal and disguised speech. The group's mean bandwidth differences (f1, f2, f3) were large and also showed considerable variability. Phonetic means also differed.

HOULIHAN - Study I

1977

Examiners: 21 undergraduate students
Training duration: series of lectures and discussions on phonetics, acoustics, sound spectrography and speaker identification
Method: visual only
Speaker population: 9 female, 5 male undergraduates; homogeneous age and geographic background
Number of words: 9 words
Context type: fixed context; 5 voice conditions (normal, lowered, falsetto, whispered and muffled)
Temporal sequence: contemporary
Type of trial: open
Total number of trials: 18 matches
Type of decision: same/different
Results: correct identifications:
              F-voice   M-voice
Normal         100%       95%
Lowered         85%       95%
Falsetto        95%       90%
Whispered        5%       98%
Muffled         75%      100%
Range: 39 to 70% correct; mean: 58.8%; std. dev.: 8.7%

HOULIHAN - Study II

1977

Examiners: 7 students from Experimental Phonetics
Training duration: completion of Study I with feedback
Method: visual only
Speaker population: 8 female, 8 male (mean age: 25.3 yrs)
Number of words: 8 words
Context type: fixed context: "There's a bomb in the main post office"
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 16/examiner
Type of decision: examiners were instructed to consider the sets in a particular order; all examiners considered undisguised sets before disguised sets
Results: correct identifications:
              F-voice   M-voice
Normal          71%      100%
Lowered         85%      100%
Falsetto       100%       67%
Whispered       71%       71%
Muffled         85%      100%
The results suggest that minimally trained examiners have little difficulty with spectrographic identification in closed, contemporary, undisguised trials. The results do not suggest that female voices are more difficult to identify than male voices.

TOSI ET AL

1979

Examiners: professionals and students
Training duration: IAVI-certified voice examiners and 2 weeks of training, respectively
Method: aural only, visual only and combined aural/visual
Speaker population: Chicano (25 female and 25 male)
Number of words: four sentences (approximately 2.4 seconds) in Spanish
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open, randomized
Total number of trials: 600/examiner
Type of decision: same, different or no opinion, each qualified by a self-confidence percentage from 51 to 100%
Results: For both student and professional examiners, the mean error rates for elimination and identification were higher for noisy samples than for quiet samples. However, the professional examiners' errors came from aural-only examinations, whereas spectrographic/aural examinations produced 0.0% errors. Professional examiners used the 'no opinion' option more often.

REICH

1979

Examiners: 24 undergraduate students, 3 doctoral students, 3 professors of Speech and Hearing Science
Training duration: brief lecture; 120 discrimination trials identical to the experiment
Method: aural only
Speaker population: 40 adult males (mean age: 27.3 yrs)
Number of words: 9 words (it, is, on, you, and, the, I, to, me)
Context type: fixed context
Temporal sequence: noncontemporary (2+ weeks)
Type of trial: open
Total number of trials: 18 matches
Type of decision: same/different (1 to 5 certainty)
Results: Both groups discriminated speakers with a moderately high degree of accuracy: 92% correct for undisguised speech. Accuracy in disguised trials ranged from 59 to 81%, depending on the disguise. The study recommended further research on the combined aural/spectrographic method.

GREENWALD

1979

Examiners: 3 professionals, 5 trainees (less than 2 years' experience)
Training duration: professionals: 8 yrs each; trainees: less than 2 yrs
Method: aural only, visual only and combined aural/visual
Speaker population: 12 female, 12 male; American Midwest dialect
Number of words: 24 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 192 discriminations per examiner
Type of decision: the five IAVI alternatives
Results: The eight examiners made 1536 decisions in all. The professional examiners produced no errors of false identification or false elimination. Restricting the bandwidth (240 Hz-2 kHz, 240 Hz-2.5 kHz, 240 Hz-3 kHz and 240 Hz-4 kHz) did not increase errors but did increase the percentage of 'no decisions'. Examiner training has a strong effect on error rate. Across all trials, trainees produced 6.1% false identifications and 4.1% false eliminations; at 240 Hz-4 kHz, however, they produced 0.0% errors of false identification or elimination.

KOENIG - FBI SURVEY

1986

Examiners: Federal Bureau of Investigation voice identification examiners
Training duration: minimum of 2 yrs experience, completion of at least 100 actual voice comparison cases, formal approval by other trained examiners
Method: combined aural/visual method
Speaker population: actual criminal cases
Number of words: varied with each case
Context type:
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 2000 forensic comparisons
Type of decision: very similar; very dissimilar; no decision (low confidence)
Results:
                           Number   Percent
No decision/low conf.       1304     65.2
Elimination                  378     18.9
Identification               318     15.9
Errors:
False elimination              2      0.53
False identification           1      0.31
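The survey's percentages appear to use two different denominators. The three decision categories add up to all 2000 comparisons, while the error rates match the number of errors divided by the number of decisions of that type (2 of 378 eliminations, 1 of 318 identifications) rather than by 2000. The short Python check below recomputes the table under that assumption; the denominators are inferred from the arithmetic and are not stated in the survey summary, and the variable names are illustrative only.

```python
# Recompute the Koenig (1986) FBI survey percentages from the raw counts.
# Assumption (inferred from the arithmetic, not stated in the summary):
# decision shares are taken over all 2000 comparisons, while each error
# rate is taken over the number of decisions of that type.

TOTAL = 2000

decisions = {
    "no decision/low confidence": 1304,
    "elimination": 378,
    "identification": 318,
}

# Each error maps to (error count, assumed denominator).
errors = {
    "false elimination": (2, decisions["elimination"]),
    "false identification": (1, decisions["identification"]),
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


decision_pct = {name: pct(n, TOTAL) for name, n in decisions.items()}
error_pct = {name: pct(n, base) for name, (n, base) in errors.items()}

if __name__ == "__main__":
    # The three decision categories should account for every comparison.
    assert sum(decisions.values()) == TOTAL
    for name, p in {**decision_pct, **error_pct}.items():
        print(f"{name}: {p}%")
```

Dividing the errors by 2000 instead would give 0.1% and 0.05%, which do not match the reported 0.53% and 0.31%; that mismatch is the basis for the assumed per-category denominators.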
