
VIII
APPENDIX 1
The following are summaries of studies of spectrographic voice identification and an FBI survey of forensic cases.
Greenwald, M., "The Effects of Decreased Frequency Bandwidth on Speaker Identification by Aural and Spectrographic Examination of Speech Samples", Master's Thesis, Michigan State University, 1979
Hall, M. C., "Spectrographic Analysis of Interspeaker and Intraspeaker Variables of Professional Mimicry", Master's Thesis, Michigan State University, 1975
Hazen, B., "Effects of Different Phonetic Contexts on Spectrographic Speaker Identification", 54 J. Acoust. Soc. Am. 650, 1973
Hollien, H., & McGlone, R., "The Effect of Disguise on Voiceprint Identification", In the Proceedings of the Carnahan Crime Countermeasures Conference, University of Kentucky, University of Kentucky Press, Lexington, KY, 1976
Kersta, L. G., "Voiceprint Identification", 196 Nature Magazine 1253, Dec. 29, 1962
Reich, et al., "Effects of Selected Vocal Disguises upon Spectrographic Speaker Identification", 60 J. Acoust. Soc. Am. 919, 1976
Reich & Duke, "Effects of selected vocal disguises upon speaker identification by listening", 66 J. Acoust. Soc. Am. 1023, 1979
Smrkovski, L. L., "Collaborative Study of Speaker Identification by the Voiceprint Method", 58 J. AOAC 453, 1975
Smrkovski, L. L., "Study of Speaker Identification by Aural and Visual Examination of Non-Contemporary Speech Samples", 59 J. AOAC 927, 1976
Stevens, et al., "Speaker Authentication and Identification: A Comparison of Spectrographic and Auditory Presentations of Speech Material", 44 J. Acoust. Soc. Am. 1596, 1968
Tosi, et al., "Experiment on Voice Identification", 51 J. Acoust. Soc. Am. 2030, 1972
Tosi & Greenwald, "Voice Identification by Subjective Methods of Minority Group Voices", Paper presented at the 6th Meeting of the International Association of Voice Identification, New Orleans, La., 1978
Young, M. A., & Campbell, R. A., "Effects of Context on Talker Identification", 42 J. Acoust. Soc. Am. 1250, 1967

KERSTA
1962
Examiners: 8 high school girls
Training duration: 1 week
Method: visual
Speaker population: 123
Number of words: 10 words excerpted from sentences
Context type: isolated random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 2000
Type of decision: forced decisions; limited sample; limited time; random context; no aural examination; examiners lacked sufficient experience
Results: closed trials, range of errors for false ID: 0.35 to 1.0%; 10 words excerpted: 0.00 to 2.0%

YOUNG & CAMPBELL
1967
Examiners: 7 PhD candidates in ASC; 3 assistant professors in ASC
Training duration: 1 week
Method: visual
Speaker population: 5 adult males
Number of words: 2 words (you/it) in isolation & excerpted from 4 short sentences
Context type: 1 word in isolation; 2 words from random context
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 1046
Type of decision: forced decisions; limited sample; random context; no aural examination; examiners not trained
Results: closed trials, range of errors for false ID:
"you" in isolation: 10.4 to 18.0%
"it" in isolation: 22.7 to 33.0%
"you/it" from random context in trial 1 of 15: mean error 62.7%

STEVENS
1968
Examiners: college students (6 in the open trials, 4 in the closed trials)
Training duration: 1 week
Method: aural vs. visual, but not combined
Speaker population: 24 males
Number of words: catalogue of 11 words in different random order; only 1 word used in most trials
Context type: 1 to 4 words
Temporal sequence: noncontemporary (1 week)
Type of trial: closed & open
Total number of trials: 216
Type of decision: forced decisions; limited sample (1 to 4 words); random context; no aural examination; examiners not trained
Results:
open trials, range of errors for false ID for 4 examiners/1 word: visual trials 31.0 to 47.0%; aural trials 6.0 to 8.0%
closed trials, range of errors for false ID, 1 to 4 discrete words: visual trials 20.0 to 30.0%; aural trials 5.0 to 18.0%

TOSI ET AL
1968 - 1970
Examiners:
Training duration: 1 month
Method: visual
Speaker population: 250 males randomly selected from a population of 25,000
Number of words: 6 & 9 words
Context type: isolated, fixed and random context
Temporal sequence: contemporary & noncontemporary (1 month)
Type of trial: closed & open
Total number of trials: 34,992
Type of decision: forced decisions, but allowed to rate confidence level; limited sample; limited time; no aural examination; examiners lacked sufficient experience
Results: range of errors for all trials, false ID: 0.51 to 6.43%. When only 'fairly & almost' certain decisions are combined, the error of false ID reduces to 2.4%.

HAZEN
1972
Examiners: college students (7 panels of 2)
Training duration: 5 lectures and 3 practice sessions
Method: visual
Speaker population: 60 males
Number of words: 5 words in the same context; 5 words physically excerpted from random conversation
Context type: fixed and random context
Temporal sequence: contemporary
Type of trial: closed & open
Total number of trials: 280
Type of decision: forced decisions; limited sample (5 words); no aural examination; random & fixed context; examiners lacked sufficient experience; used the most dissimilar spectrographic utterances; compared sounds from totally different words studying changing phonetic context; examiners could not evaluate effects of coarticulation due to questionable word boundaries
Results:
closed trials, errors for false ID: fixed context range 10.0 to 30.0%, mean 20.0%; random context range 50.0 to 90.0%, mean 74.29%
open trials, errors for false ID: fixed context range 16 to 66%, mean 42.86%; random context range 66 to 100%, mean 83%

SMRKOVSKI
1974
Examiners: 7 police & private
Training duration: more than 2 years experience/less than 2 years experience
Method: combined aural and visual
Speaker population: 7 male & female
Number of words: 38 to 54 words
Context type: fixed context
Temporal sequence: noncontemporary (1 week)
Type of trial: open
Total number of trials: 84
Type of decision: no forced decisions; allowed 1 to 5 conclusions; no limited time; aural & visual examination; trained and experienced examiners
Results: open trials
trainees w/less than 2 yr experience: false ID 0.0%; false elim. 5.0%; no decision 25.0%; 0.35 to 1.0%
examiners w/more than 2 yr experience: false ID 0.0%; false elim. 0.0%; no decision 22.0%

SMRKOVSKI
1975
Examiners: 12 scientists, police and private
Training duration: novice: no training; trainee: < 2 yr; professional: > 2 yr
Method: combined aural and visual
Speaker population: 20 male & female
Number of words: 9 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 120
Type of decision: no forced decisions; allowed 1 to 5 conclusions; no limited time; aural & visual examination; compared words in context; trainees, novices and experienced examiners
Results: open trials, errors
novices: false ID 5.0%; false elim. 25.0%; no decision 2.5%
trainees: false ID 0.0%; false elim. 0.0%; no decision 2.5%
professionals: false ID 0.0%; false elim. 0.0%; no decision 7.5%

HALL
1975
Examiners: 4 professional and 20 college graduates
Training duration: IAVI certified voice identification examiner
Method: combined aural and visual / visual only
Speaker population: professional mimic and 6 celebrity voices
Number of words: mimic (mean of 25 sec.), celebrities (mean of 35 min.)
Context type: quasi-fixed and random context
Temporal sequence: contemporary/noncontemporary
Type of trial: open
Total number of trials: aural (20/examiner), visual (200/examiner)
Type of decision: same, different or undecided; 5 IAVI classifications
Results: Interspeaker variability does not exist between a mimicked, disguised voice and the natural voice of the subject mimicked. Intraspeaker variabilities are minute and not significant when comparing the mimic's voice and the natural voice of the mimic. Aurally: the smaller the signal-to-noise ratio within the recording and the more similar the context, the greater the percentage of accuracy in distinguishing between speakers.
AURAL EXAMINATION, grand means (right / wrong / undecided):
Grad. students: 0.74 / 0.18 / 0.08
Professional: 0.92 / 0.082 / 0.0

HOLLIEN/McGLONE
1975-76
Examiners: 5 faculty; 1 graduate student
Training duration: "the authors were familiar with the 'voiceprint' method of speaker identification"
Method: visual only (spectrograms were cut & mounted)
Speaker population: 25 faculty and graduate students of the University of Florida
Number of words: 7 words
Context type: "I do not set the same store"
Temporal sequence: contemporary
Type of trial: open
Total number of trials: 25/examiner
Type of decision: record a match/indicate none was possible
Results: ". . . even skilled auditors such as these were unable to match correctly the disguised speech to the reference (normal) samples as much as 25% of the time . . . these groups were able to disguise their voices in such manners that their identification by the 'voiceprint' technique became little more than a matter of chance."

REICH ET AL
1976
Examiners: 2 PhD candidates in speech science; 2 PhD candidates in speech pathology
Training duration: 3 courses in speech science plus previous experience with speech spectrograms: 4 weeks at 10-15 hr/wk
Method: visual only (words excerpted and mounted)
Speaker population: 40 adult males (mean: 27.3 yrs)
Number of words: 9 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 105 (7 matching tasks w/15 known & 15 unknown)
Type of decision: 1 to 5 certainty scale
Results: The examiners were able to match speakers with a moderate degree of accuracy (55.67%) when there was no attempt to vocally disguise. Disguised speech significantly interfered with speaker identification. Further research is needed . . . in which the examiners may listen to the voice as well as view the spectrograms.

ROTHMAN
1977
Examiners: 30 listeners; 6 visual examiners
Training duration: none
Method: Study I: aural; Study II: visual (0 to 8 kHz)
Speaker population: 12
Number of words: four 2-second speech segments
Context type: random context
Temporal sequence: contemporary/noncontemporary (1 wk)
Type of trial: open
Total number of trials: 5 visual; 38 aural
Type of decision: same/different for each contemporary and noncontemporary
Results: 94% correct identifications were obtained for contemporary speech segments. 42% correct identifications were obtained for noncontemporary speech segments. 58.45% correct identifications were obtained when comparing different speakers. All examiners in pretest visual achieved 100% correct matching. Aural method is clearly superior to the spectrographic or 'voiceprint' method.

McGLONE, HOLLIEN & HOLLIEN
1977
Examiners: 4 phoneticians
Training duration: experienced
Method: visual measurement of formant fundamental frequency to obtain fo
Speaker population: 23 adult males
Number of words: 7 words ("I do not set the same store")
Context type: fixed (normal & disguised) context
Temporal sequence: contemporary
Type of trial:
Total number of trials: 46/phonetician
Type of decision:
Results: A great amount of variability in the fo was found between normal and disguised speech. The mean bandwidth differences (f1, f2, f3) for the group were large and also demonstrated considerable variability. Phonetic means also differed.

HOULIHAN - Study I
1977
Examiners: 21 undergraduate students
Training duration: series of lectures & discussions on phonetics, acoustics, sound spectrography and speaker identification
Method: visual only
Speaker population: 9 female, 5 male undergraduates; homogeneous age and geographic background
Number of words: 9 words
Context type: fixed context; 5 voice conditions (normal, lowered, falsetto, whispered and muffled)
Temporal sequence: contemporary
Type of trial: open
Total number of trials: 18 matches
Type of decision: same/different
Results: correct identifications (F-voice / M-voice):
normal: 100% / 95%
lowered: 85% / 95%
falsetto: 95% / 90%
whispered: 5% / 98%
muffled: 75% / 100%
range: 39 to 70% correct; mean: 58.8%; Std. D.: 8.7%
|
HOULIHAN - Study II
1977
Examiners: 7 students from Experimental Phonetics
Training duration: completion of Exp. I with feedback
Method: visual only
Speaker population: 8 female, 8 male (mean age: 25.3 yrs)
Number of words: 8 words
Context type: fixed context: "There's a bomb in the main post office"
Temporal sequence: contemporary
Type of trial: closed
Total number of trials: 16/examiner
Type of decision: instructed to consider the sets in a particular order. All examiners considered undisguised before disguised.
Results: correct identifications (F-voice / M-voice):
normal: 71% / 100%
lowered: 85% / 100%
falsetto: 100% / 67%
whispered: 71% / 71%
muffled: 85% / 100%
The results suggest that minimally trained examiners have little difficulty with spectrographic identification in closed, contemporary, undisguised trials. Results do not suggest that female voices are more difficult to identify than male voices.
|
TOSI ET AL
1979
Examiners: professional and students
Training duration: IAVI certified voice examiners and 2 weeks of training, respectively
Method: aural only, visual only and aural/visual combined
Speaker population: Chicano (25 female and 25 male)
Number of words: four sentences, approximately 2.4 seconds, in Spanish
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open - randomized
Total number of trials: 600/examiner
Type of decision: same, different, no opinion; qualified percentage of self-confidence from 51 to 100%
Results: For both student and professional examiners, the mean percentage of errors of elimination and identification was greater for noisy samples than for quiet samples. However, the professional examiners' errors were due to aural-only examinations, whereas spectrographic/aural examinations produced 0.0% errors. The 'no opinion' option was used more by professional examiners.
|
REICH
1979
Examiners: 24 undergraduate students, 3 doctoral students, 3 professors of Speech and Hearing Science
Training duration: brief lecture; 120 discrimination trials identical to the experiment
Method: aural only
Speaker population: 40 adult males (mean age: 27.3 yrs)
Number of words: 9 words (it, is, on, you, and, the, I, to, me)
Context type: fixed context
Temporal sequence: noncontemporary (2 weeks +)
Type of trial: open
Total number of trials: 18 matches
Type of decision: same/different (1 to 5 certainty)
Results: Both groups were able to discriminate speakers with moderately high degrees of accuracy, 92% correct for undisguised. Disguised trials ranged from 59 to 81% depending on the disguise. Recommended further research to study the combined aural/spectrographic method.
|
GREENWALD
1979
Examiners: 3 professional, 5 trainees (less than 2 years experience)
Training duration: professionals: 8 yrs each; trainees: < 2 yrs
Method: aural only, visual only and aural/visual combined
Speaker population: 12 female, 12 male; American midwest dialect
Number of words: 24 words
Context type: fixed context
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 192 discrimination types
Type of decision: the five IAVI alternatives
Results: Professional examiners produced no errors of false identification or elimination. 1536 decisions by all eight examiners. Restricting the bandwidth (240-2K, 240-2.5K, 240-3K, and 240-4K) does not increase the errors but does increase the percentage of 'no decisions'. The training of the examiner strongly affects the error rate. Trainees produced errors as follows: 6.1% false identification and 4.1% false elimination for all trials. However, at 240-4 kHz, 0.0% errors of false identification or elimination.

KOENIG - FBI SURVEY
1986
Examiners: Federal Bureau of Investigation voice identification examiners
Training duration: minimum of 2 yrs experience; completion of at least 100 actual voice comparison cases; formal approval by other trained examiners
Method: combined aural/visual method
Speaker population: actual criminal cases
Number of words: varied with each case
Context type:
Temporal sequence: noncontemporary
Type of trial: open
Total number of trials: 2000 forensic comparisons
Type of decision: very similar; very dissimilar; no decision (low confidence)
Results (number / percent):
no/low conf.: 1304 / 65.2
elimination: 378 / 18.9
identification: 318 / 15.9
errors:
false elim.: 2 / 0.53
false id.: 1 / 0.31
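
The percentages in the FBI survey results can be reproduced from the reported counts. The sketch below assumes (an inference from the arithmetic, not something the summary states) that the decision percentages are taken over all 2000 comparisons, while each error percentage is taken over the decisions of its own type: false eliminations over eliminations, and false identifications over identifications. The variable and function names are illustrative only.

```python
# Counts as reported in the Koenig FBI survey summary.
TOTAL_COMPARISONS = 2000
decisions = {
    "no/low confidence": 1304,
    "elimination": 378,
    "identification": 318,
}
errors = {
    "elimination": 2,     # false eliminations
    "identification": 1,  # false identifications
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of each decision type among all comparisons.
decision_share = {
    name: percent(count, TOTAL_COMPARISONS) for name, count in decisions.items()
}

# Assumption: each error rate is relative to the decisions of the same type,
# not to the 2000 comparisons overall.
error_rate = {
    name: percent(count, decisions[name]) for name, count in errors.items()
}

if __name__ == "__main__":
    for name, share in decision_share.items():
        print(f"{name}: {share}% of comparisons")
    for name, rate in error_rate.items():
        print(f"false {name}: {rate}% of {name} decisions")
```

Under that assumption the computed values agree with the reported figures (65.2, 18.9, 15.9, 0.53 and 0.31). Taken over all 2000 comparisons instead, the three errors together would amount to 0.15%.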
|