Text Independent Speaker Identification In a Distant Talking Multi-microphone Environment Using Generalized Gaussian Mixture Model
Author : P. Soundarya Mala, Dr. V. Sailaja, Shuaib Akram
International Journal of Scientific & Engineering Research, IJSER - Volume 2, Issue 4, April-2011
ISSN 2229-5518

Abstract -- In a speaker identification system, the goal is to determine which one of a group of known voices best matches an unknown input voice. The field of speaker identification has recently seen significant advancement, but improvements have tended to focus on near-field speech, ignoring the more realistic setting of far-field speech. In this paper, we use far-field speech recorded with multiple microphones for speaker identification. For this we develop a model for each speaker's speech. In developing the model, it is customary to consider that the voice of the individual speaker is characterized by a Generalized Gaussian model. The model parameters are estimated using the EM algorithm. Speaker identification is carried out by maximizing the likelihood function of the individual speakers. The efficiency of the proposed model is studied through an accuracy measure in experiments on a 25-speaker database. This model performs much better than earlier speaker identification algorithms.

Keywords -- Generalized Gaussian model, EM algorithm, Mel Frequency Cepstral Coefficients.

Speaker recognition is the process of recognizing who is speaking on the basis of information extracted from the speech signal. It has a number of applications, such as verification of access permissions for corporate database search and voice mail, government lawful intercept and forensics, government corrections, financial services, telecom and call centers, health care, transportation, security, distance learning, and entertainment and consumer products [2].

The growing need for automation in complex work environments and the increased need for voice-operated services in many commercial areas have motivated recent efforts to bring laboratory speech processing algorithms into practice. Many existing systems for speaker identification have demonstrated good performance and achieve high classification accuracy when close-talking microphones are used. In adverse distant-talking environments, however, performance is significantly degraded by a variety of factors, such as the distance between the speaker and the microphone, the location of the microphone or the noise source, the direction of the speaker, and the quality of the microphone. To deal with these problems, microphone-array-based speaker recognizers have been successfully applied to improve identification accuracy through speech enhancement [3][4][6].
In speaker identification, since there is no identity claim, the system identifies the most likely speaker of the test speech signal. Speaker identification can be further classified into closed-set identification and open-set identification. The task of identifying a speaker who is known a priori to be a member of the set of N enrolled speakers is known as closed-set speaker identification. The limitation of this system is that a test speech signal from an unknown speaker will still be identified as one of the N enrolled speakers, so there is a risk of false identification. Therefore, closed-set mode should be employed only in applications that are certain to be used exclusively by the enrolled speakers. On the other hand, a speaker identification system that is able to handle a speaker who may be from outside the set of N enrolled speakers is known as open-set speaker identification. In this case, the closed-set speaker identification system first identifies the speaker closest to the test speech data. Speaker identification is also divided into text-independent and text-dependent speaker identification. Of these two, text-independent speaker identification is more complicated in an open test.
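
The closed-set versus open-set distinction above can be sketched in a few lines. This is only an illustration: the excerpt says the open-set system first runs closed-set identification but does not describe what follows, so the score threshold used here to reject out-of-set speakers is an assumption (a common convention), and the function names and score values are hypothetical.

```python
def closed_set_identify(scores):
    """Return the index of the enrolled speaker with the highest score.

    A closed-set system always returns one of the enrolled speakers,
    even if the true speaker is not enrolled.
    """
    return max(range(len(scores)), key=scores.__getitem__)


def open_set_identify(scores, threshold):
    """Closed-set identification followed by an accept/reject step.

    ASSUMPTION: the reject step is a simple threshold on the best score;
    the source excerpt does not specify how rejection is done.
    Returns the speaker index, or None if the best score is too low.
    """
    best = closed_set_identify(scores)
    return best if scores[best] >= threshold else None


# Hypothetical per-speaker scores (e.g. log-likelihoods) for one test utterance.
scores = [-345.0, -328.0, -393.0]
closed_result = closed_set_identify(scores)
accepted = open_set_identify(scores, threshold=-330.0)
rejected = open_set_identify(scores, threshold=-300.0)
```

With the closed-set function, an impostor would still be mapped to the best-scoring enrolled speaker, which is the false-identification risk described above.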

Speaker Identification:
Given speech inputs $X_1, X_2, \ldots, X_C$ recorded simultaneously through C microphones, the speaker who pronounced $X_1, X_2, \ldots, X_C$ is identified among the registered speakers S = {1, 2, ..., C} by equation (1). Each speaker k is modeled by a GGMM $\lambda_k$.

$$\hat{S} = \arg\max_{1 \le k \le C} p(\lambda_k \mid X_1, X_2, \ldots, X_C) = \arg\max_{1 \le k \le C} \frac{p(X_1, X_2, \ldots, X_C \mid \lambda_k)\, P(\lambda_k)}{p(X_1, X_2, \ldots, X_C)} \tag{1}$$

The second form follows from Bayes' rule. Assuming equal prior probabilities (i.e. $P(\lambda_k) = 1/C$) and conditional independence between the speech inputs $X_1, X_2, \ldots, X_C$ given the speaker model $\lambda_k$, and noting that $p(X_1, X_2, \ldots, X_C)$ is the same for all speakers, equation (1) can be simplified to

$$\hat{S} = \arg\max_{1 \le k \le C} \prod_{c=1}^{C} p(X_c \mid \lambda_k) \tag{2}$$

Taking the logarithm of equation (2), we obtain

$$\hat{S} = \arg\max_{1 \le k \le C} \sum_{c=1}^{C} \log p(X_c \mid \lambda_k) \tag{3}$$
The identity of the speaker can thus be determined from the sum of the hypothesis log-likelihood scores obtained from the C microphones. In a distant-talking environment, however, the log-likelihood score in equation (3) is itself expected to be degraded, i.e. its reliability cannot be ensured. Furthermore, a variety of causes, such as the location of the speaker or the noise, the direction of the speaker, and the distance, can affect each microphone differently. The identification result obtained from one microphone can therefore be better than that from the others. In such cases, the simple integration is greatly affected by an incorrect classification on a single channel. Thus, we propose a new integration method that re-scores the hypothesis scores, measures the distance between them, and combines them.
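
The simple integration in equation (3) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and score values are hypothetical, speakers are indexed from 0, and the code assumes the per-microphone log-likelihoods log p(X_c | λ_k) have already been computed by the speaker models. The number of microphones and the number of speakers are kept as separate dimensions here.

```python
def fuse_log_likelihoods(log_likelihoods):
    """Sum log-likelihood scores over microphones, as in equation (3).

    log_likelihoods[c][k] is assumed to hold log p(X_c | lambda_k)
    for microphone c and speaker model k.
    """
    n_speakers = len(log_likelihoods[0])
    fused = [0.0] * n_speakers
    for channel_scores in log_likelihoods:
        if len(channel_scores) != n_speakers:
            raise ValueError("every microphone needs one score per speaker")
        for k, score in enumerate(channel_scores):
            fused[k] += score
    return fused


def identify_speaker(log_likelihoods):
    """Return (best speaker index, fused scores) using the plain sum rule."""
    fused = fuse_log_likelihoods(log_likelihoods)
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Hypothetical scores: 3 microphones (rows), 4 enrolled speakers (columns).
example = [
    [-120.0, -100.0, -130.0, -125.0],  # microphone 0 favours speaker 1
    [-118.0, -105.0, -128.0, -122.0],  # microphone 1 favours speaker 1
    [-90.0, -140.0, -135.0, -138.0],   # microphone 2 strongly favours speaker 0
]
best, fused = identify_speaker(example)
per_channel = [max(range(len(row)), key=row.__getitem__) for row in example]
```

In this made-up example two of the three microphones individually favour speaker 1, yet the summed score selects speaker 0 because one channel dominates the sum. This is the sensitivity to a single channel described above.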
