Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. With a view to giving a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed fo...

Bibliographic Details
Main Authors:	Al-Kaltakchi, Musab T. S. (Author), Al-Raheem Taha, Haithem Abd (Author), Abd Shehab, Mohanad (Author), Abdullah, Mohamed A.M. (Author)
Format:	EJournal Article
Published:	Institute of Advanced Engineering and Science, 2020-05-01.
Subjects:	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
Online Access:	Get fulltext


LEADER	02739 am a22003253u 4500
001	ijeecs20106_13706
042			\|a dc
100	1	0	\|a Al-Kaltakchi, Musab T. S. \|e author
700	1	0	\|a Al-Raheem Taha, Haithem Abd \|e author
700	1	0	\|a Abd Shehab, Mohanad \|e author
700	1	0	\|a Abdullah, Mohamed A.M. \|e author
245	0	0	\|a Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database
260			\|b Institute of Advanced Engineering and Science, \|c 2020-05-01.
500			\|a https://ijeecs.iaescore.com/index.php/IJEECS/article/view/20106
520			\|a In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. With a view to giving a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled with Gaussian Mixture Models coupled with a Universal Background Model (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features in identifying female speakers compared with male speakers. Furthermore, feature warping performs better than the CMVN method.
540			\|a Copyright (c) 2019 Institute of Advanced Engineering and Science
540			\|a http://creativecommons.org/licenses/by-nc/4.0
546			\|a eng
690			\|a Pattern recognition; Signal processing
690			\|a Cepstral mean variance normalization (CMVN); Gaussian mixture model (GMM); Mel frequency cepstral coefficients (MFCCs); Power normalized cepstral coefficients (PNCCs); Speaker recognition
655	7		\|a info:eu-repo/semantics/article \|2 local
655	7		\|a info:eu-repo/semantics/publishedVersion \|2 local
786	0		\|n Indonesian Journal of Electrical Engineering and Computer Science; Vol 18, No 2: May 2020; 782-789
786	0		\|n 2502-4760
786	0		\|n 2502-4752
786	0		\|n 10.11591/ijeecs.v18.i2
787	0		\|n https://ijeecs.iaescore.com/index.php/IJEECS/article/view/20106/13706
856	4	1	\|u https://ijeecs.iaescore.com/index.php/IJEECS/article/view/20106/13706 \|z Get fulltext

Similar Items