Saturday, January 25, 2020

Speaker Recognition System Pattern Classification

Speaker Recognition System Pattern Classification A Study on Speaker Recognition System and Pattern classification Techniques Dr E.Chandra,  K.Manikandan,  M.S.Kalaivani Abstract Speaker Recognition is the process of identifying a person through his/her voice signals or speech waves. Pattern classification plays a vital role in speaker recognition. Pattern classification is the process of grouping the patterns, which are sharing the same set of properties. This paper deals with speaker recognition system and over view of Pattern classification techniques DTW, GMM and SVM. Keywords Speaker Recognition System, Dynamic Time Warping (DTW), Gaussian Mixture Model (GMM), Support Vector Machine (SVM). INTRODUCTION Speaker Recognition is the process of identifying a person through his/her voice signals [1] or speech waves. It can be classified into two categories, speaker identification and speaker verification. In speaker identification task, a speech utterance of an unknown speaker is compared with set of valid users. The best match is used to identify the speaker. Similarly, in speaker verification the unknown speaker first claims identity, and the claimed model is then used for identification. If the match is above a predefined threshold, the identity claim is accepted The speech used for these task can be either text dependent or text independent. In text dependent application the system has the prior knowledge of the text to be spoken. The user will speak the same text as it is in the predefined text. In a text-independent application, there is no prior knowledge by the system of the text to be spoken. Pattern classification plays a vital role in speaker recognition. The term Pattern defines the objects of interest. In this paper the sequence of acoustic vectors, extracted from input speech are taken as patterns. Pattern classification is the process of grouping the patterns, which are sharing the same set of properties. It plays a vital role in speaker recognition system. The result of pattern classification decides whether to accept or reject a speaker. Several research efforts have been done in pattern classification. Most of the works based on generative model. There are Dynamic Time Warping (DTW) [3], Hidden Markov Models (HMM) , Vector Quantization (VQ) [4], Gaussian mixture model (GMM) [5] and so forth. Generative model is for randomly generating observed data, with some hidden parameters. Because of the randomly generating observed data functions, they are not able to provide a machine that can directly optimize discrimination. Support vector machine was introducing as an alternative classifier for speaker verification. [6]. In machine learning SVM is a new tool, which is used for hard classification problems in several fields of application. This tool is capable to deal with the samples of higher dimensionality. In speaker verification binary decision is needed, since SVM is discriminative binary classifier it can classify a complete utterance in a single step. This paper is planned as follows. In section 2: speaker recognition system, in section 3, Pattern Classification, AND overview of DTW, GMM, and SVM techniques .section 4: Conclusion. SPEAKER RECOGNITION SYSTEM Speaker recognition categorized into verification and identification. Speaker Recognition system consists of two stages .speaker verification and speaker identification. Speaker verification is 1:1 match, where the voice print is matched with one template. But speaker identification is 1:N match, where the input speech is matched with more than one templates. Speaker verification consists of five steps. 1. Input data acquisition 2.feature extraction 3.pattern matching 4.decision making 5.generate speaker models. Fig 1: Speaker recognition system In the first step sample speech is acquired in a controlled manner from the user. The speaker recognition system will process the speech signals and extract the speaker discriminatory information. This information forms a speaker model. At the time of verification process, a sample voice print is acquired from the user. The speaker recognition system will extract the features from the input speech and compared withpredefined model. This process is called pattern matching. DC Offset Removal and Silence Removal Speech data are discrete-time speech signals, carry some redundant constant offset called DC offset [8].The values of DC offset affect the information ,extracted from the speech signals. Silence frames are audio frames of background noise with low energy level .silence removal is the process of discarding the silence period from the speech. The signal energy in each speech frame is calculated by using equation (1). M – Number of samples in a speech frames, N- Total number of speech frames. Threshold level is determined by using the equation (2) Threshold = Emin + 0.1 (Emax – Emin) (2) Emax and Emin are the lowest and greatest values of the N segments. Fig 2. Speech Signal before Silence Removal Fig 3. Speech Signal after Silence Removal This technique is used to enhance the high frequencies of the speech signal. The aim of this technique is to spectrally flatten the speech signal that is to increase the relative energy of its high frequency spectrum. The following two factors decides the need of Pre-emphasis technique.1.Speech Signals generally contains more speaker specific information in higher frequencies [9]. 2. If the speech signal energy decreases the frequency increases .This made the feature extraction process to focus all the aspects of the voice signals. Pre-emphasis is implemented as first order finite Impulse Response filter, defined as H(Z) = 1-0.95 Z-1 (3) The below example represents speech signals before and after Pre-emphasizing. Fig 4. Speech Signal before Pre-emphasizing Fig 5. Speech Signal after Pre-emphasizing Windowing and Feature Extraction: The technique windowing is used to minimize the signal discontinuities at beginning and end of each frame. It is used to smooth the signal and makes the frame more flexible for spectral analysis. The following equation is used in windowing technique. y1(n) = x (n)w(n), 0 ≠¤Ãƒ ¯Ã¢â€š ¬Ã‚  n ≠¤Ãƒ ¯Ã¢â€š ¬Ã‚  N-1 (4) N- Number of samples in each frame. The equation for Hamming window is(5) There is large variability in the speech signal, which are taken for processing. to reduce this variability ,feature extraction technique is needed. MFCC has been widely used as the feature extraction technique for automatic speaker recognition. Davis and Mermelstein reported that Mel-frequency cepstral Coefficients (MFCC) provided better performance than other features in 1980 [10]. Fig 6. Feature Extraction MFCC technique divides the input signal into short frames and apply the windowing techniques, to discard the discontinuities at edges of the frames. In fast Fourier transform (FFT) phase, it converts the signal to frequency domain and after that Mel scale filter bank is applied to the resultant frames. After that, Logarithm of the signal is passed to the inverse DFT function converting the signal back to time domain. PATTERN CLASSIFICATION Pattern classification involves in computing a match score in speaker recognition system. The term match score refers the similarity of the input feature vectors to some model. Speaker models are built from the features extracted from the speech signal. Based on the feature extraction a model of the voice is generated and stored in the speaker recognition system. To validate a user the matching algorithm compares the input voice signal with the model of the claimed user. In this paper three techniques in pattern classification have been compared. Those three major techniques are DTW, GMM and SVM. Dynamic Time Warping: This well known algorithm is used in many areas. It is currently used in Speech recognition,sign language recognition and gestures recognition, handwriting and online signature matching ,data mining and time series clustering, surveillance , protein sequence alignment and chemical engineering , music and signal processing . Dynamic Time Warping algorithm is proposed by Sadaoki Furui in 1981.This algorithm measures the similarity between two series which may vary in time and speed. This algorithm finds an optimal match between two given sequences. The average of the two patterns is taken to form a new template. This process is repeated until all the training utterances have been combined into a single template. This technique matches a test input from a multi-dimensional feature vector T= [ t1, t2†¦tI] with a reference template R= [ r1, r2†¦rj]. It finds the function w(i) as shown in the below figure. In Speaker Recognition system Every input speech is compared with the utte rance in the database .For each comparison, the distance measure is calculated .In the measurements lower distance indicates higher similarity. Fig 7. . Dynamic Time Warping Gaussian mixture model: Gaussian mixture model is the most commonly used classifier in speaker recognition system.It is a type of density model which comprises a number of component functions. These functions are combined to provide a multimodal density. This model is often used for data clustering. It uses an alternative algorithm that converges to a local optimum. In this method the distribution of the feature vector x is modeled clearly using mixture of M Gaussians. mui- represent the mean and covariance of the i th mixture. x1, x2†¦xn, Training data ,M-number of mixture. The task is parameter estimation which best matches the distribution of the training feature vectors given in the input speech. The well known method is maximum likehood estimation. It finds the model parameters which maximize the likehood of GMM. Therefore, the testing data which gain a maximum score will recognize as speaker. Support Vector Machine: Support machine was proposed in 1990 and it is one of the best machine learning algorithms. This is used in many pattern classification problems. such as image recognition, speech recognition, text categorization, face detection and faulty card detection, etc. The basic idea of support vector machine is to find the optimal linear decision surface based on the concept of structural risk minimization. It is a binary classification method. The decision surface refers the weighted combination of elements in a training dataset. These elements are called support vectors. These vectors define the boundary between two classes. In a binary problem +1 and -1 are taken as two classes. The size of the margin should be maximized to characterize the boundary between two classes. The below example explains pattern classification by using SVM. In the fig 3(a), there are two different kinds of patterns taken for process. A line is drawn to separate these two patterns. In the fig 3(b),by using a single line the patterns are separated, the patterns are presented in two dimensional space. The similar representation in one dimensional space in the fig 3(c), a point can be used to separate patterns in one dimensional space. a plane that separates these patterns in 3-D space ,represented in the fig 3(d),is called separating hyper plane. . The next task a plane should be selected from the set of planes whose margin is maximum. The plane with the maximum margin i.e. perpendicular distance from the marginal line is known as optimal hyper plane or maximum margin hyper plane as shown in fig 3(f). The patterns that lie on the edges of the plane are called support vectors While classify the patterns, there may exist some errors in the representation, as shown in the fig 3(g), such types of errors are called soft margin. Sometimes ,these errors can be ignored to some threshold value. The patterns that can be easily separated using line or Plane are called linearly Separable patterns .Non-linear separable patterns (fig-j,k,l)are difficult to classify. These patterns are classified by using kernel functions . In order to classify non-linear separable patterns the original data’s are mapped to higher dimensional space using kernel function. CONCLUSION In this paper we have explained about speaker recognition system and discussed about three major pattern classification techniques, Dynamic Time Warping, Gaussian mixture model and Support Vector Machine. SVM will work efficiently on fixed length vectors. To implement SVM the input data should be normalized for better performance. In future, we have planned to implement these techniques in speaker recognition system and evaluate the performance. The performance of the models will also be evaluated by incrementing the amounts of training data. REFERENCES [1] Campbell, J.P., Speaker Recognition: A Tutorial, Proc. Of the IEEE, vol. 85,no. 9, 1997, pp. 1437-1462. [2] Sadaoki Furui., Recent advances in speaker recognition,Pattern Recognition Letters. 1997,18 (9): 859-72. [3] Sakoe, H.and Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, Acoustics,Speech, and Signal Processing, IEEE Transactions on Volume 26, Issue 1, Feb 1978 Page 43 49. [4] Lubkin, J. and Cauwenberghs, G., VLSI Implementation of Fuzzy Adaptive Resonance and Learning Vector Quantization, Int. J. Analog Integrated Circuits and Signal Processing, vol. 30 (2), 2002,pp. 149-157. [5] Reynolds, D. A. and Rose, R. C. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3, 1995, pp 72–83. [6] Solera, U.R., Martà ­n-Iglesias, D., Gallardo-Antolà ­n, A., Pelà ¡ez-Moreno, C. and Dà ­az-de-Marà ­a, F, Robust ASR using Support Vector Machines, Speech Communication, Volume 49 Issue 4, 2007. [7] Temko, A.; Monte, E.; Nadeu, C., Comparison of Sequence Discriminant Support Vector Machines for Acoustic Event Classification, ICASSP 2006 Proceedings, 2006 IEEE International Conference on Volume 5, Issue , 14-19 May 2006 [8] Shang, S.; Mirabbasi, S.; Saleh, R., A technique for DCoffset removal and carrier phase error compensation in integrated wireless receivers Circuits and Systems, ISCAS apos;03. Proceedings of the 2003 International Symposium onVolume 1, Issue , 25-28 May 2003 Page I-173 I-176 vol.1 [9] Vergin, R.; Oapos;Shaughnessy, D., Pre-emphasis and speech recognition lectrical and Computer Engineering†,Canadian Conference on Volume 2, Issue , 5-8 Sep 1995 [10] Davis, S. B. and Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. on Acoustic, Speech and Signal Processing, ASSP-28, 1980, No. 4. [11] Sadaoki Furui., Cepstral analysis technique for automatic speaker verification, IEEE Trans. ASSP 29, 1981,pages 254-272. BIOGRAPHIES Dr.E.Chandra received her B.Sc., from Bharathiar University, Coimbatore in 1992 and received M.Sc., from Avinashilingam University ,Coimbatore in 1994. She obtained her M.Phil. In the area of Neural Networks from Bharathiar University, in 1999. She obtained her PhD degree in the area of Speech recognition system from Alagappa University Karikudi in 2007. She has totally 15 yrs of experience in teaching including 6 months in the industry. Presently she is working as Director, Department of Computer Applications in D. J. Academy for Managerial Excellence, Coimbatore. She has published more than 30 research papers in National, International Journals and Conferences in India and abroad. She has guided more than 20 M.Phil. Research Scholars. Currently 3 M.Phil Scholars and 8 PhD Scholars are working under her guidance. She has delivered lectures to various Colleges. She is a Board of studies member of various Institutions. Her research interest lies in the area of Data Mining, Artificial Intelligence, Neural Networks, Speech Recognition Systems, Fuzzy Logic and Machine Learning Techniques. She is an active and Life member of CSI, Society of Statistics and Computer Applications. Currently she is Management Committee member of CSI Coimbatore Chapter. K. Manikandan received his Bsc from Bharathidhasan University, Tiruchirappalli in1998 and received his MCA from Bharathiadsan University, Tiruchirappalli in 2001. He received M.Phil in the area of soft computing from Bharathiyar university, Coimbatore in 2004. He has 12 years of experience in teaching. Currently, he is working as a Assistant Professor, Department Of Computer Science, PSG College of arts and Science, Coimbatore and pursuing PhD in Bharathiar University, Coimbatore.He has presented research papers in National and International Conferences and published a paper in International Journal. His Research Interest is Soft Computing . He is Life a member of IAENG. He has guided more than 4 M.Phil Research Scholars. Currently 3 M.Phil Scholars are working under his guidance. He has delivered lectures to various Colleges. M.S.Kalaivani received her BCA from P.S.G College of Arts and Science, Coimbatore, in 2005 and received her MCA from National Institute of Technology, Tiruchirappalli in 2008.She has 4 years of working experience at software industry. Presently, she is working as a Research Scholar, Department of Computer Science, P.S.G. College of Arts and Science, Coimbatore. Her research interests are Machine Learning and Fuzzy logic.

Friday, January 17, 2020

Interpersonal communication Essay

Interpersonal communication is defined as an interaction between involving two or more participants, providing immediate feedback to each other. It serves a purpose especially in building relationships. Interpersonal communication is a transactional process. It is not a one way activity like a monologue. Rather, it is interactive, and ongoing. In watching the sitcom, Scrubs, communication doesn’t commence when the characters start talking. It starts the moment viewers and actors come face to face in the boob tube. The actors do not convey messages solely through words, but through actions and facial expressions too. For instance, the scrunching face of one actor may already be interpreted by the audience as an expression of disgust or dislike. Interpersonal communication is also ambiguous. The significance of the words articulated is interpreted distinctively by each receiver. The particular line â€Å"well good news is, I don’t have to eat my wife’s cooking anymore, right? † uttered by the patient was understood differently by the pair of doctors standing in his bedside. The female physician laughed so hard because she found it funny, but the male physician furrowed his brows. The understanding of a person may be affected by various factors. His culture, personality, upbringing, gender and even intelligence are just some of the reasons for the disparity in interpretation. The ambiguity of interpersonal communication is also a cause of dispute. In a lover’s feud for example, the female might be fuming mad when his partner chides about her weight. She might take it as a sign that he is not attracted to him anymore. Whereas, the bewildered boyfriend’s initial goal was perhaps to make her less conscious of her body by joking about it. In addition, interpersonal communications have a content and relationship dimension. The meaning of a line or phrase is dependent on the context and the circumstance involved. Just like in the sitcom line mentioned above, where the man commented about his wife’s cooking, the connotation will change if the man is not ill and in bed. For me, what he said was meant to make the hospital mood lighter. But, if he were talking to an attractive woman at a cafe, it might be interpreted as flirting. Interpersonal communication may be viewed as symmetrical or complementary. Symmetry suggests that the behaviour of one person is mirrored by another, while the term complementary refers to contrasting reactions. Both were evident in the sitcom Scrubs. The patient-doctor relationship is usually symmetrical in the show. The physician wants to cure the patient’s sickness, and the patient wants to be treated. Complementarity arises due to the different power positions. The physician, who is an expert on medical care instructs his patient. The patient oftentimes, becomes a passive receiver of information. When the relationship is complementary, there is a chance that the two parties would intensify each other. For instance, when the patient told his doctor that he wanted to get out of bed to see the talent show, the doctor of course declined. The patient looked downcast and ready to protest, but it turned out that the doctor was only kidding him initially. Interpersonal communication is a series of punctuated events. After each statement or idea, there is a reaction. A person does not respond only after a lengthy narrative is finished, but on each word, sentence or paragraph mentioned. In a sitcom for example, viewers do not watch the whole episode and laugh only when it ends. But, they chuckle on each line that they find funny. In addition, the series of reactions, on when to laugh is arbitrarily set by the viewer. I do not find other dialogues ticklish, and thus I do not giggle a bit, even if others do. However, live sitcoms like Scrubs exploit this aspect by adapting to and adopting the viewer’s point of view. Since communication is a transactional process, it is easy to catch the audience’s empathy and adjust to their mood. A laughing spiel is often followed by serious dialogue. Interpersonal communication is inevitable. In a situation where interaction is possible, one cannot not communicate. It is hard not to respond to someone who is conveying a message to you. But, I personally find this point rather contentious. As a television viewer, I sometimes watch simply to absorb information. In watching the weather news, I feel no empathy for what I am hearing. I am simply a passive funnel of ideas. In this sense, the news reporter has given me weather data, but has not elicited any reaction from me. Interpersonal communication is irreversible. Something that has been said cannot be taken back. The meaning of the words that has been transmitted and digested by the other party cannot be reversed. In sitcoms for instance, if viewers are offended by a racial joke, it is hard to appease them. The only way to do it is through a public apology. Interpersonal communication is unrepeatable. The exact line containing exactly the same words can of course be uttered twice, but the underlying situation is constantly changing and there is no certainty that it can be reconstructed. Due to the unrepeatable aspect of interpersonal communication, one has to be aware of himself. At such, one has to be conscious of using strong words, like â€Å"hate† and giving commitments.

Thursday, January 9, 2020

Bending Water with Static Electricity

When two objects are rubbed against each other, some of the electrons from one object jump to the other. The object that gains electrons becomes more negatively charged; the one that loses electrons becomes more positively charged. The opposite charges attract each other in a way that you can actually see. One way to collect charge is to comb your hair with a nylon comb or rub it with a balloon. The comb or balloon will become attracted to your hair, while the strands of your hair (all the same charge) repel each other. The comb or balloon will also attract a stream of water, which carries an electrical charge. Difficulty: EasyTime Required: minutes What You Need Aside from water, all you need for this experiment is dry hair and a comb. The trick is using a comb that picks up charge from your hair. Choose nylon, not wood or metal. If you dont have a comb, a latex balloon works equally well. Water faucetNylon comb or latex balloon Heres How Comb dry hair with a nylon comb or rub it with an inflated latex balloon.Turn on the tap so that a narrow stream of water is flowing (1 to 2 mm across, flowing smoothly).Move the balloon or teeth of the comb close to the water (not in it). As you approach the water, the stream will begin to bend toward your comb.Experiment!Does the amount of bend depend on how close the comb is to the water?If you adjust the flow, does it affect how much the stream bends?Do combs made from other materials work equally well?How does a comb compare with a balloon?Do you get the same effect from everyones hair or does some hair release more charge than others?Can you get your hair close enough to the water to repel it without getting it wet? Tip This activity will work better when the humidity is low. When humidity is high, water vapor catches some of the electrons that would jump between objects. For the same reason, your hair needs to be completely dry when you comb it.