Speaker identification for security systems using reinforcement-trained pRAM neural network architectures

Abstract

Speaker identification may be employed as part of a security system requiring user authentication. In this case, the claimed identity of the user is already known, for example from a magnetic card and PIN, and an utterance is requested to confirm that identity. A fast response is necessary in the confirmation phase, and a fast registration process for new users is desirable. The TESPAR (Time Encoded Signal Processing and Recognition) digital language is used to preprocess the speech signal. TESPAR compresses an arbitrary-length utterance to a 29-element vector, and this vector must be unique for a given speaker. The advantage of TESPAR is that it provides a very high degree of compression and, unlike spectrogram methods, requires no convolution or filtering operations on the speech signal, since the TESPAR process extracts the formant and vocal tract information implicitly. A speaker cannot be identified directly from a single TESPAR vector because the relationship between the vector’s components is highly non-linear, so the vectors are not linearly separable. These characteristics suggest that classification using a neural network will provide an effective solution. Good classification performance has been achieved using a multi-layer perceptron (MLP) network [1]; however, the processing time was too long for a practical system. A hardware solution was therefore investigated, and the pRAM (probabilistic RAM) neuron was chosen because it existed in integrated circuit (VLSI) form and because it was compatible with the TESPAR output, since both use binary-coded signals. Four pRAM neural network architectures are presented in order to explain how each architecture performs a classification and to show where the difficulties lie in separating different speakers using their TESPAR representations.
Approximately 97% of classifications were correct, which is similar to results obtained elsewhere [2] and slightly better than the MLP network mentioned above. No speech recognition stage was used in obtaining these results, so the performance relates only to identifying a speaker’s voice and is therefore independent of the spoken phrase. This has been achieved in a hardware-realisable system which may be incorporated into a smart-card or similar application.
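
The abstract does not spell out how a pRAM neuron operates, so the following is only an illustrative sketch under stated assumptions, not the architecture or training rule used in the paper. It assumes the usual description of a pRAM: a binary input pattern addresses one memory location, that location stores a firing probability, and the neuron outputs 1 with that probability. The reward-penalty style update, the class name `PRAMNeuron`, the methods `fire` and `reinforce`, the parameters `rho` and `lam`, and the 3-bit toy input are all hypothetical choices made for illustration; a real system would take its binary input from the 29-element TESPAR vector.

```python
import random


class PRAMNeuron:
    """Toy pRAM neuron: one stored firing probability per binary input address."""

    def __init__(self, n_inputs, rng=None):
        self.rng = rng if rng is not None else random.Random()
        # Assumption: every location starts at 0.5 (no initial preference).
        self.alpha = [0.5] * (2 ** n_inputs)
        self.last_address = None
        self.last_output = None

    @staticmethod
    def address(bits):
        """Pack a sequence of 0/1 inputs into an integer memory address."""
        addr = 0
        for b in bits:
            addr = (addr << 1) | (1 if b else 0)
        return addr

    def fire(self, bits):
        """Output 1 with the probability stored at the addressed location."""
        u = self.address(bits)
        out = 1 if self.rng.random() < self.alpha[u] else 0
        self.last_address, self.last_output = u, out
        return out

    def reinforce(self, reward, penalty, rho=0.1, lam=0.05):
        """Assumed reward-penalty update, applied only to the location just used.

        On reward the stored probability moves towards the output just emitted;
        on penalty it moves (more slowly, scaled by lam) towards the opposite output.
        """
        u, a = self.last_address, self.last_output
        delta = rho * ((a - self.alpha[u]) * reward
                       + lam * ((1 - a) - self.alpha[u]) * penalty)
        self.alpha[u] = min(1.0, max(0.0, self.alpha[u] + delta))


def train(neuron, samples, epochs=500):
    """Reward the neuron when its stochastic output matches the target bit."""
    for _ in range(epochs):
        for bits, target in samples:
            out = neuron.fire(bits)
            reward = 1 if out == target else 0
            neuron.reinforce(reward, 1 - reward)


# Toy task: fire for pattern (1, 0, 1), stay silent for pattern (0, 1, 0).
samples = [((1, 0, 1), 1), ((0, 1, 0), 0)]
neuron = PRAMNeuron(n_inputs=3, rng=random.Random(0))
train(neuron, samples)
p_fire = neuron.alpha[PRAMNeuron.address((1, 0, 1))]
p_silent = neuron.alpha[PRAMNeuron.address((0, 1, 0))]
print(f"P(fire | 101) = {p_fire:.3f}, P(fire | 010) = {p_silent:.3f}")
```

In this toy setting the stored probability for the rewarded pattern drifts towards 1 and the other towards 0. Only a scalar reward or penalty signal is needed per step, which is one reason such a rule is plausible for a hardware realisation; whether the paper uses this exact form is not stated in the abstract.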

Cite this paper

@inproceedings{Clarkson1999SpeakerIF, title={Speaker identification for security systems using reinforcement-trained pRAM neural network architectures}, author={Trevor G. Clarkson and Chris Christodoulou and Yelin Guan and Denise Gorse and David A. Romano-Critchley and John G. Taylor}, year={1999} }