Innovative research in the Department of Linguistics suggests that dynamic features of speech could provide a clue to forensic speaker identification.
Recognising a voice is a familiar experience for most people – identifying a friend’s voice over the telephone, recognising the voice of a well-known personality on the radio, hearing the voice of a colleague call out from behind. But why do voices sound distinctive? Given our ability to recognise individuals, it seems reasonable to assume that voices are unique, but it has not been scientifically demonstrated that all voices are measurably distinctive. In spite of the impression given by televised crime shows, as yet there is no technique available to identify a speaker with 100% reliability.
This is a serious problem for forensic speaker identification, a branch of forensic phonetics in which a phonetician is asked to identify an unknown speaker whose voice has been recorded during the commission of a crime, for example a bomb threat, ransom demand, hoax emergency call or drug deal. The phonetician compares the incriminating recording with samples of speech from a suspect, with a view to identifying the perpetrator or eliminating the suspect. These cases are often controversial, and since the extent to which an individual’s voice is idiosyncratic has not yet been established, research in this area is crucial.
A key problem in attempting to characterise a speaker is that each individual’s voice can vary greatly. We change our voices depending on who we are talking to, how formal the situation is, the emotion we wish to express and whether there is background noise. Speakers’ voices also change if they are tired, drunk or have a cold or sore throat, and of course speakers can disguise their voices. So a voice is much more complicated to capture than a fingerprint, which is a fixed, unchanging feature of an individual.
DyViS: investigating speech
A team of researchers in the Department of Linguistics – Dr Kirsty McDougall, Dr Gea de Jong, Toby Hudson and Professor Francis Nolan – is carrying out innovative research in speaker identification in the DyViS project (Dynamic Variability in Speech: A Forensic Phonetic Study of British English), funded by the Economic and Social Research Council (ESRC).
To investigate the problem of variation within a speaker’s voice, the DyViS team has compiled a large-scale database of recordings of southern British English spoken across a range of speaking styles. Speakers participated in several tasks: a mock police interview where they were required to ‘lie’ about a particular scenario, a telephone call with a friend involving a more casual and relaxed style of speech, and a number of reading tasks. All of the speaking tasks included a particular selection of words that the participants had to utter in different contexts. These data enable the researchers to investigate how phonetic features of these words change for a given individual across the different speaking styles, and to what extent these features can be used to distinguish individuals.
Identifying the speaker
One particular feature being examined is a phenomenon known as ‘formant frequency dynamics’. Formant frequencies are the resonances of the vocal tract during speech – the frequencies at which vibrations of air are at maximum amplitude in the vocal tract in speech sounds such as vowels. Formant frequencies appear as roughly horizontal dark bands on a spectrogram, a computer-generated representation of the acoustic speech signal. These frequencies are powerful cues to speaker identity since they are determined by both the physical dimensions of a speaker’s vocal tract and the way the speaker configures the vocal organs to produce each sound.
Previous research on speaker differences has typically measured the formant frequencies only at the centre of the sound. The DyViS research goes beyond these ‘static’ measures to investigate the dynamics of formant frequencies, which reflect the movement of a person’s speech organs and are likely to reveal more fine-grained differences among speakers. Just as people exhibit personal styles for walking, running and other skilled motor activities, they move their vocal organs in individual ways when producing speech.
Dr McDougall’s experiments have investigated the speaker-distinguishing potential of the formant frequency dynamics of the vowel sound in spoken words like ‘bike’ and ‘hike’, of the vowel sound in ‘who’d’, and of sequences containing an ‘r’ sound preceded and followed by vowel sounds, such as ‘a route’ and ‘a rack’. The work shows that formant frequency dynamics carry considerable speaker-specific information. Taking measurements along the formant contours on either side of the centre of a speech sound, rather than at the centre alone, significantly improves speaker discrimination.
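As a rough illustration of the difference between a ‘static’ and a ‘dynamic’ measure, the Python sketch below compares a single reading at the centre of a sound with several readings taken along its formant contour. The function names, the equally spaced sampling points, the linear interpolation and the second-formant (F2) values are all assumptions made for illustration. They are not the DyViS team’s method or data.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate the contour (times, values) at time t."""
    if t <= times[0]:
        return float(values[0])
    if t >= times[-1]:
        return float(values[-1])
    i = bisect_right(times, t)  # index of the first sample later than t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def static_measure(times, values):
    """One formant reading at the temporal centre of the sound."""
    midpoint = (times[0] + times[-1]) / 2
    return interpolate(times, values, midpoint)


def dynamic_measures(times, values, n_points=3):
    """Formant readings at n_points equally spaced interior time points."""
    start, end = times[0], times[-1]
    step = (end - start) / (n_points + 1)
    return [interpolate(times, values, start + k * step)
            for k in range(1, n_points + 1)]


# Hypothetical F2 tracks (Hz) for the same vowel from two invented speakers,
# sampled every 50 ms. These numbers are made up for the example.
times_ms = [0, 50, 100, 150, 200]
speaker_a = [1200, 1300, 1500, 1700, 1800]  # steep rise
speaker_b = [1400, 1450, 1500, 1550, 1600]  # shallow rise

# The centre reading is identical for both speakers...
print(static_measure(times_ms, speaker_a), static_measure(times_ms, speaker_b))

# ...but the readings along the contour differ.
print(dynamic_measures(times_ms, speaker_a))
print(dynamic_measures(times_ms, speaker_b))
```

In this invented case the two contours cross at the midpoint, so the centre reading cannot tell the speakers apart, while the readings either side of the centre can. Real formant tracks would first have to be estimated from recordings with speech analysis software, a step this sketch leaves out.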
Forensic phonetics
Together with research into other features of speech being investigated by the DyViS team, this work points to new ways of extracting a speaker’s ‘signature’ from the speech signal. Findings from the DyViS project suggest that dynamic features of speech could provide a clue in speaker identification, with clear forensic applications: comparing voices and speech for purposes of identification, and analysing speech recordings.
The research also has important implications for phonetic theory. Current models of speech production and perception do not provide a good explanation of the role of individual variation in speech communication. The analysis of dynamic features of speech being undertaken by the DyViS team will lead to important theoretical developments in these areas, contributing to our understanding of how individual speakers can communicate with the same language yet sound so different from each other.