Method of Speech Signal Compression in Speaker Identification Systems

IJSER Content Writer
Authors: A. Raimy, K. Konate, NM Bykov
International Journal of Scientific & Engineering Research Volume 3, Issue 1, January-2012
ISSN 2229-5518

Abstract: In this paper we present a technique for improving the efficiency of a speech signal compression algorithm without losing the individual features of speech production. Compression in this case means deleting from the digital signal those quantization steps that can be predicted. We propose to decrease the number of those quantization steps using a modified linear prediction algorithm with variable order. This decreases compression time and saves computer resources.

Index Terms: speech signal compression, quantization steps, linear prediction algorithm, computer resources.

1 INTRODUCTION
The task of efficiently representing the speech signal is one of the vital tasks in speaker identification. For example, an automatic speaker recognition system may be installed on a LAN or WAN server, where it authorizes a terminal to access the network according to the voice of the subscriber. There are two ways of processing the information in this case:
1) extract the identity features of the speaker from the speech signal on the subscriber's terminal and transfer them to the server for a decision on admission;
2) compress the speech signal, without losing the information about the speaker's identity, into a password wav-file, and transfer it across the network to the server, where the identification procedure is carried out.
The advantage of the first approach is a shorter transmission time over the network. Its main drawbacks are that it reduces the confidentiality of the speaker identification procedure and that every terminal needs a system for the primary analysis and description of the speaker's signal features. Thus, the second approach is more effective for information processing, both in the number of computations required for the compression and in allowing the use of ASP technologies for the selection of informative features and for decision-making.

Analysis of known works
According to the well-known methods of signal compression, and given the statistical characteristics of the speech signal, the parameters of the analog-to-digital converter (ADC) are chosen according to the rules presented in [1, 2]: the sampling frequency is determined by the upper limit frequency of the signal, the quantization range by the variance, and the quantization step by the signal-to-noise ratio and the required precision. Since the speech signal is not stationary, the parameters of the ADC are chosen approximately for the worst case, which is rarely encountered. As a result, the inherent redundancy of the speech signal is compounded by the redundancy of the discrete transformation, and a new problem arises: eliminating the ADC's redundancy. In the numerous variants of pulse modulation and adaptive coding that are used today to eliminate encoding redundancy, the sample rate remains constant and equal to the Nyquist rate, and redundancy is eliminated by analyzing the values of neighboring signal samples.

The aim of the research
The aim of the research is to increase the efficiency of the speech signal compression algorithm, without losing the information related to the personal characteristics of the speaker, by removing those samples that can be predicted.

2 THEORETICAL FOUNDATIONS OF THE PROPOSED METHOD

In this work we propose to reduce the number of signal samples by using a modified method of variable-order linear prediction. The peculiarity of the proposed method is a two-step processing of the speech signal, which reduces the time needed for wav-file compression. The process is carried out in two steps:
1. Preliminary compression;
2. Final compression.
At the first stage the wav-file is processed using an original technique, which approximates the speech signal by a polyline and allows the degree of its deviation from the original signal to be set. At the second stage the wav-file areas that were not affected by the preliminary compression are approximated by a polynomial, whose order is determined by the accuracy required to restore the speech signal from the archive file.
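Since the speech signal is a continuous function $s(t)$ whose spectrum is limited by the upper frequency $F$, it is defined by the sequence of its samples, whose time interval is calculated using the following formula:

$$\Delta t = \frac{1}{2F}.$$
Thus the signal $s(t)$ can be described as follows:

$$s(t) = \sum_{k=-\infty}^{\infty} s(k\Delta t)\,\frac{\sin 2\pi F (t - k\Delta t)}{2\pi F (t - k\Delta t)},$$

where $\dfrac{\sin 2\pi F (t - k\Delta t)}{2\pi F (t - k\Delta t)}$ is the sample function and $k$ assumes the discrete values $k = 0, \pm 1, \pm 2, \ldots$

For a limited duration $T$ of the speech signal the number of signal samples is defined by the expression:
$$N = \frac{T}{\Delta t} = 2FT.$$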
Taking into account the quasi-stationarity of the signal, and the fact that real-time processing is not critical for data collection systems, a method of reducing the encoding redundancy of the speech signal using the ADC has been developed.
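As a minimal sketch of the sampling relations used in this section, assuming the classical sampling theorem with interval 1/(2F), the following Python functions compute the sampling interval, the number of samples for a signal of duration T, and one value of the band-limited signal rebuilt from its samples. The function names are illustrative and do not come from the paper, and a finite sum stands in for the infinite series.

```python
import math


def sampling_interval(upper_freq_hz):
    """Sampling interval dt = 1 / (2F) for a signal band-limited to F hertz."""
    return 1.0 / (2.0 * upper_freq_hz)


def sample_count(upper_freq_hz, duration_s):
    """Number of samples N = 2 * F * T for a signal of duration T seconds."""
    return round(2.0 * upper_freq_hz * duration_s)


def sinc_reconstruct(samples, dt, t):
    """Value at time t of the series sum_k s(k*dt) * sin(u) / u,
    with u = pi * (t - k*dt) / dt, truncated to the available samples."""
    total = 0.0
    for k, s_k in enumerate(samples):
        u = math.pi * (t - k * dt) / dt
        # The sample function equals 1 where its argument is zero.
        total += s_k if u == 0.0 else s_k * math.sin(u) / u
    return total


# Example: telephone-band speech (F = 4 kHz, an assumed value), 2 seconds long.
dt = sampling_interval(4000.0)
n = sample_count(4000.0, 2.0)
print(dt, n)
```

At the sampling instants the truncated series returns the stored sample (up to rounding), because every other term of the sum vanishes there.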
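Minimizing the error of the restored signal consists in finding those fixed values of the argument $t_i$ that ensure convergence of the polyline with vertices $s(t_i)$ towards the function $s(t)$, so that over the entire range of the argument the absolute error does not exceed the permissible value.
The approximating function $\hat{s}(t)$ at these points can be presented as follows:

$$\hat{s}(t) = s(t_0) + a_0 (t - t_0) \quad \text{for } t_0 \le t \le t_1,$$
$$\hat{s}(t) = s(t_1) + a_1 (t - t_1) \quad \text{for } t_1 \le t \le t_2,$$
$$\ldots,$$
$$\hat{s}(t) = s(t_{n-1}) + a_{n-1} (t - t_{n-1}) \quad \text{for } t_{n-1} \le t \le t_n,$$

where the coefficients $a_i$ can be defined as follows:

$$a_0 = \frac{s(t_1) - s(t_0)}{t_1 - t_0},$$

$$a_1 = \frac{s(t_2) - s(t_1)}{t_2 - t_1}.$$
In general:
$$\hat{s}(t) = s(t_i) + a_i (t - t_i) \quad \text{for } t_i \le t \le t_{i+1},$$
where $a_i = \dfrac{s(t_{i+1}) - s(t_i)}{t_{i+1} - t_i}$.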

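The approximation error is determined by the remainder term of the interpolation formula. In this case, the line segment within the time interval $[t_i, t_{i+1}]$ is defined by the expression:
$$\hat{s}(t) = s(t_i) + \frac{s(t_{i+1}) - s(t_i)}{t_{i+1} - t_i}\,(t - t_i),$$
and the remainder term of the function's expansion on the same interval will be:

$$R(t) = \frac{s''(\xi)}{2}\,(t - t_i)(t - t_{i+1}), \quad \xi \in [t_i, t_{i+1}],$$
where $s''(\xi)$ is the second derivative of the given function within the interval.
If it is known that $|s''(\xi)| = M_2$ and $|(t - t_i)(t - t_{i+1})| = \Delta t^2 / 4$ are maximal, with $\Delta t = t_{i+1} - t_i$, then
$$|R(t)|_{\max} = \frac{M_2\,\Delta t^2}{8}.$$
Letting $|R(t)|_{\max} = \varepsilon$, the allowable absolute error, we get the formula for the sampling interval
$$\Delta t = \sqrt{\frac{8\varepsilon}{M_2}}.$$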
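The relation between the allowable error and the spacing of polyline vertices can be checked numerically. The sketch below assumes the standard remainder bound for linear interpolation, M2 * dt^2 / 8, where M2 is the maximum absolute second derivative of the signal; the function names and the test tone are illustrative choices, not taken from the paper.

```python
import math


def max_interp_error(m2, dt):
    """Upper bound M2 * dt^2 / 8 on the linear-interpolation error."""
    return m2 * dt * dt / 8.0


def interval_for_error(m2, eps):
    """Largest vertex spacing whose error bound does not exceed eps."""
    return math.sqrt(8.0 * eps / m2)


def sine_m2(amplitude, freq_hz):
    """Maximum |second derivative| of amplitude * sin(2*pi*freq_hz*t)."""
    return amplitude * (2.0 * math.pi * freq_hz) ** 2


# Example: unit-amplitude 1 kHz tone, allowable error 1% of the amplitude.
m2 = sine_m2(1.0, 1000.0)
dt = interval_for_error(m2, 0.01)
print(f"vertex spacing: {dt * 1e6:.1f} microseconds")
```

For speech, M2 would have to be estimated from the upper frequency of the signal band and the signal amplitude; the pure tone here only makes the bound easy to verify.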
Once the upper frequency of the signal bandwidth is given, we can determine the deviation of the real signal value from the predicted one. Based on the above, an algorithm implementing the pre-compression of voice information was created. It includes the following steps:
1. Set the level $\varepsilon$ of allowable absolute error of the recovered signal;
2. Set the minimum size $N_{\min}$ of the compression buffer;
3. For the current point, determine the prediction coefficient;
4. If the deviation of the coefficient does not exceed $\varepsilon$, include the current sample in the compression buffer, increase the value $n$ of the buffer counter by 1 and go to Item 3. If the inequality is not fulfilled, check the buffer counter $n$: if $n < N_{\min}$, set $n = 0$ and go to Item 3; if $n \ge N_{\min}$, the compression of the buffer is carried out;
5. If the end of the wav-file has not been reached, go to Item 3.
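The steps above can be sketched in Python as a greedy polyline compressor. This is one possible reading of the procedure, not the authors' implementation: instead of testing the deviation of the prediction coefficient, it tests directly whether every sample of a run lies within the allowable error of the straight line joining the run's end points, which guarantees the error bound on restoration. The names `polyline_compress`, `polyline_restore`, `eps` and `min_run` are hypothetical.

```python
def _fits(samples, i, j, eps):
    """True if all samples strictly between i and j lie within eps of the chord."""
    slope = (samples[j] - samples[i]) / (j - i)
    return all(abs(samples[i] + slope * (k - i) - samples[k]) <= eps
               for k in range(i + 1, j))


def polyline_compress(samples, eps, min_run=3):
    """Return the indices of the samples kept as polyline vertices."""
    n = len(samples)
    if n <= 2:
        return list(range(n))
    kept = [0]
    start = 0
    while start < n - 1:
        end = start + 1
        # Grow the segment while its interior samples remain predictable.
        while end + 1 < n and _fits(samples, start, end + 1, eps):
            end += 1
        if end - start - 1 >= min_run:
            kept.append(end)                        # drop the interior samples
        else:
            kept.extend(range(start + 1, end + 1))  # run too short: keep all
        start = end
    return kept


def polyline_restore(indices, values, n):
    """Rebuild n samples by linear interpolation between the kept vertices."""
    out = [0.0] * n
    for i, v in zip(indices, values):
        out[i] = v
    for (i, vi), (j, vj) in zip(zip(indices, values),
                                zip(indices[1:], values[1:])):
        for k in range(i + 1, j):
            out[k] = vi + (vj - vi) * (k - i) / (j - i)
    return out


# Usage: a straight ramp collapses to its two end points.
ramp = [float(k) for k in range(10)]
print(polyline_compress(ramp, eps=0.01))
```

The minimum run length plays the role of the minimum buffer size: short predictable runs are stored verbatim, since replacing them with a segment would save little.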
Linear prediction is used to implement the second step of compression [3, 4]. The signal is presented in digital form $s(n)$, $n = 1, \ldots, N$, where $N$ is the number of signal samples, obtained by sampling it at a certain frequency $F$. This signal $s(n)$, $n = 1, \ldots, N$, can be presented as a linear combination of the preceding values of the signal and some influence $u(n)$:
$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n),$$
where $G$ is the amplification coefficient and $p$ is the order of prediction.
Then, knowing the values of the signal $s(n)$, the problem reduces to finding the coefficients $a_k$ and $G$. For the estimate we will use the least squares method, treating the signal $s(n)$ as deterministic.
The values of the signal $s(n)$ are expressed through its estimated values $\hat{s}(n)$ by the following formula:
$$\hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k).$$
Then the prediction error can be described as follows:
$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k).$$
Using the least squares method, the parameters $a_k$ are selected so as to minimize the average or the sum of squares of the prediction error. To find the coefficients $a_k$, we use the matrix method [5, 7] known as the Durbin method.
The coefficients of linear prediction and the prediction error are calculated by the following algorithm:
1. The speech signal is segmented into stationary intervals;
2. For each interval, a system of linear equations is formed and solved by the matrix method or by the Durbin method using the autocorrelation function (the method is selected by the user);
3. The prediction error is calculated.
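Steps 2 and 3 can be illustrated with the Levinson-Durbin recursion applied to the autocorrelation function of one interval. The sketch below is a textbook version under an assumed sign convention, in which the predicted sample is the sum of a[k-1] * s(n-k) for k = 1..p; the paper's own convention and its variable-order selection rule are not reproduced here, and all function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (assumed stationary) interval."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err): predictor coefficients a[0..order-1] and the residual
    energy, for the assumed convention s_hat(n) = sum_k a[k-1] * s(n-k).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # the interval is already predicted exactly (or is silent)
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def prediction_error(frame, a):
    """Residual e(n) = s(n) - s_hat(n) for n >= len(a)."""
    p = len(a)
    return [frame[n] - sum(a[k] * frame[n - 1 - k] for k in range(p))
            for n in range(p, len(frame))]


# Usage on a short made-up interval with a second-order predictor.
frame = [0.0, 0.9, 1.2, 0.7, -0.1, -0.8, -1.1, -0.6, 0.2, 0.9]
coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
residual = prediction_error(frame, coeffs)
```

Because each pass of the recursion yields the predictor of the next order, the residual energy can be monitored after every pass and the recursion stopped once it is small enough, which is one natural way to obtain a variable prediction order.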
