Volume 20, Issue 7, 2012, Pages 1990-2001

Low-variance multitaper MFCC features: A case study in robust speaker verification

Author keywords

Mel frequency cepstral coefficient (MFCC); multitaper; small variance estimation; speaker verification

Indexed keywords

AUDIO APPLICATIONS; AUTO REGRESSIVE PROCESS; BIAS AND VARIANCE; FREQUENCY DOMAINS; GAUSSIAN MIXTURE MODEL; JOINT FACTOR ANALYSIS; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MULTITAPER; MULTITAPER METHODS; MULTITAPERS; PARAMETER SELECTION; ROBUST SPEAKER VERIFICATION; SIGNAL SPECTRUM; SPEAKER VERIFICATION; SPECTRAL LEAKAGE; TIME DOMAIN; UNIVERSAL BACKGROUND MODEL

EID: 84860850285     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2191960     Document Type: Article
Times cited: 131

References (51)
  • 1
    • 0017851927
    • On the use of windows for harmonic analysis with the discrete Fourier transform
    • Jan.
    • F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proc. IEEE, vol. 66, no. 1, pp. 51-84, Jan. 1978.
    • (1978) Proc. IEEE , vol.66 , Issue.1 , pp. 51-84
    • Harris, F.J.1
  • 2
    • 0016495091
    • Linear prediction: A tutorial review
    • Apr.
    • J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol. 63, no. 4, pp. 561-580, Apr. 1975.
    • (1975) Proc. IEEE , vol.63 , Issue.4 , pp. 561-580
    • Makhoul, J.1
  • 3
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 4, pp. 357-366, Aug. 1980. (Pubitemid 11464930)
    • (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis, S.B.1    Mermelstein, P.2
  • 6
    • 34347376319
    • Temporal structure normalization of speech feature for robust speech recognition
    • DOI 10.1109/LSP.2006.891341
    • X. Xiao, E.-S. Chng, and H. Li, "Temporal structure normalization of speech feature for robust speech recognition," IEEE Signal Process. Lett., vol. 14, no. 7, pp. 500-503, Jul. 2007. (Pubitemid 47018924)
    • (2007) IEEE Signal Processing Letters , vol.14 , Issue.7 , pp. 500-503
    • Xiao, X.1    Chng, E.S.2    Li, H.3
  • 9
    • 70349223791
    • Optimal cepstrum estimation using multiple windows
    • Taipei, Taiwan, Apr.
    • M. Hansson-Sandsten and J. Sandberg, "Optimal cepstrum estimation using multiple windows," in Proc. ICASSP '09, Taipei, Taiwan, Apr. 2009, pp. 3077-3080.
    • (2009) Proc. ICASSP '09 , pp. 3077-3080
    • Hansson-Sandsten, M.1    Sandberg, J.2
  • 11
    • 70350125882
    • An overview of text-independent speaker recognition: From features to supervectors
    • Jan.
    • T. Kinnunen and H. Li, "An overview of text-independent speaker recognition: From features to supervectors," Speech Comm., vol. 52, no. 1, pp. 12-40, Jan. 2010.
    • (2010) Speech Comm. , vol.52 , Issue.1 , pp. 12-40
    • Kinnunen, T.1    Li, H.2
  • 12
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • DOI 10.1006/dspr.1999.0361
    • D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, no. 1, pp. 19-41, Jan. 2000. (Pubitemid 30592166)
    • (2000) Digital Signal Processing: A Review Journal , vol.10 , Issue.1 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 13
    • 33645887246
    • Support vector machines using GMM supervectors for speaker verification
    • May
    • W. M. Campbell, D. E. Sturim, and D. A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Process. Lett., vol. 13, no. 5, pp. 308-311, May 2006.
    • (2006) IEEE Signal Process. Lett. , vol.13 , Issue.5 , pp. 308-311
    • Campbell, W.M.1    Sturim, D.E.2    Reynolds, D.A.3
  • 17
    • 51449111842
    • Speaker recognition with session variability normalization based on MLLR adaptation transforms
    • Sep.
    • A. Stolcke, S. S. Kajarekar, L. Ferrer, and E. Shriberg, "Speaker recognition with session variability normalization based on MLLR adaptation transforms," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 1987-1998, Sep. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 1987-1998
    • Stolcke, A.1    Kajarekar, S.S.2    Ferrer, L.3    Shriberg, E.4
  • 18
    • 77955790894
    • GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition
    • Aug.
    • C. H. You, K. A. Lee, and H. Li, "GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1300-1312, Aug. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.6 , pp. 1300-1312
    • You, C.H.1    Lee, K.A.2    Li, H.3
  • 20
    • 33846259282
    • Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold
    • Mar.
    • A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, pp. 412-424, Mar. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.2 , pp. 412-424
    • Davis, A.1    Nordholm, S.2    Togneri, R.3
  • 21
    • 78649989192
    • Robust voice activity detection using long-term signal variability
    • Mar.
    • P. K. Ghosh, A. Tsiartas, and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 3, pp. 600-613, Mar. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.3 , pp. 600-613
    • Ghosh, P.K.1    Tsiartas, A.2    Narayanan, S.3
  • 23
    • 17244374373
    • Multitapering and a wavelet variant of MFCC in speech recognition
    • DOI 10.1049/ip-vis:20051004
    • L. P. Ricotti, "Multitapering and a wavelet variant of MFCC in speech recognition," IEE Proc. Vis., Image Signal Process., vol. 152, no. 1, pp. 29-35, Feb 2005. (Pubitemid 40527360)
    • (2005) IEE Proceedings: Vision, Image and Signal Processing , vol.152 , Issue.1 , pp. 29-35
    • Ricotti, L.P.1
  • 24
    • 0020189541
    • Spectrum estimation and harmonic analysis
    • Sep.
    • D. J. Thomson, "Spectrum estimation and harmonic analysis," Proc. IEEE, vol. 70, no. 9, pp. 1055-1096, Sep. 1982.
    • (1982) Proc. IEEE , vol.70 , Issue.9 , pp. 1055-1096
    • Thomson, D.J.1
  • 25
    • 0029184722
    • Minimum bias multiple taper spectral estimation
    • Jan
    • K. S. Riedel and A. Sidorenko, "Minimum bias multiple taper spectral estimation," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 188-195, Jan 1995.
    • (1995) IEEE Trans. Signal Process. , vol.43 , Issue.1 , pp. 188-195
    • Riedel, K.S.1    Sidorenko, A.2
  • 26
    • 0031095319
    • A multiple window method for estimation of peaked spectra
    • PII S1053587X97018710
    • M. Hansson and G. Salomonsson, "A multiple window method for estimation of peaked spectra," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 778-781, Mar. 1997. (Pubitemid 127765966)
    • (1997) IEEE Transactions on Signal Processing , vol.45 , Issue.3 , pp. 778-781
    • Hansson, M.1    Salomonsson, G.2
  • 28
    • 0028378535
    • The variance of multitaper spectrum estimates for real gaussian processes
    • Feb.
    • A. T. Walden, E. McCoy, and D. B. Percival, "The variance of multitaper spectrum estimates for real gaussian processes," IEEE Trans. Signal Process., vol. 42, no. 2, pp. 479-482, Feb. 1994.
    • (1994) IEEE Trans. Signal Process. , vol.42 , Issue.2 , pp. 479-482
    • Walden, A.T.1    McCoy, E.2    Percival, D.B.3
  • 29
    • 0026965779
    • On the performance advantage of multitaper spectral analysis
    • Dec.
    • T. P. Bronez, "On the performance advantage of multitaper spectral analysis," IEEE Trans. on Sign. Proc., vol. 40, no. 12, pp. 2941-2946, Dec. 1992.
    • (1992) IEEE Trans. on Sign. Proc. , vol.40 , Issue.12 , pp. 2941-2946
    • Bronez, T.P.1
  • 30
    • 84908144695
    • The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms
    • Jun.
    • P. D. Welch, "The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms," IEEE Trans. Audio Electroacoust., vol. AU-15, no. 2, pp. 70-73, Jun. 1967.
    • (1967) IEEE Trans. Audio Electroacoust. , vol.AU-15 , Issue.2 , pp. 70-73
    • Welch, P.D.1
  • 31
    • 84860843356
    • Multitaper analysis of fundamental frequency variations during voiced fricatives
    • Dec.
    • C. H. Shadle and G. Ramsay, "Multitaper analysis of fundamental frequency variations during voiced fricatives," in Proc. 6th Int. Seminar Speech Product., Dec. 2003, p. CD-6.
    • (2003) Proc. 6th Int. Seminar Speech Product.
    • Shadle, C.H.1    Ramsay, G.2
  • 32
    • 33847668886
    • Multitaper covariance estimation and spectral denoising
    • 1599939, Conference Record of The Thirty-Ninth Asilomar Conference on Signals, Systems and Computers
    • N. Erdol and T. Gunes, "Multitaper covariance estimation and spectral denoising," in Proc. Conf. Rec. 39th Asilomar Conf. Signals, Syst., Comput., Nov. 2005, pp. 1144-1147. (Pubitemid 46350492)
    • (2005) Conference Record - Asilomar Conference on Signals, Systems and Computers , vol.2005 , pp. 1144-1147
    • Erdol, N.1    Gunes, T.2
  • 33
    • 79959826333
    • What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering
    • Japan, Sep.
    • T. Kinnunen, R. Saeidi, J. Sandberg, and M. Hansson-Sandsten, "What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering," in Proc. Interspeech, Makuhari, Japan, Sep. 2010, pp. 2734-2737.
    • (2010) Proc. Interspeech, Makuhari , pp. 2734-2737
    • Kinnunen, T.1    Saeidi, R.2    Sandberg, J.3    Hansson-Sandsten, M.4
  • 37
    • 85032751338
    • Jackknifing multitaper spectrum estimates
    • DOI 10.1109/MSP.2007.4286561
    • D. J. Thomson, "Jackknifing multitaper spectrum estimates," IEEE Signal Process. Mag., vol. 24, no. 4, pp. 20-30, Jul. 2007. (Pubitemid 47316164)
    • (2007) IEEE Signal Processing Magazine , vol.24 , Issue.4 , pp. 20-30
    • Thomson, D.J.1
  • 38
    • 70350488536
    • On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling
    • Nov
    • T. Gerkmann and R. Martin, "On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4165-4174, Nov 2009.
    • (2009) IEEE Trans. Signal Process. , vol.57 , Issue.11 , pp. 4165-4174
    • Gerkmann, T.1    Martin, R.2
  • 39
    • 0000120766
    • Estimating the dimension of a model
    • Mar.
    • G. Schwarz, "Estimating the dimension of a model," Ann. Statist., vol. 6, pp. 461-464, Mar. 1978.
    • (1978) Ann. Statist. , vol.6 , pp. 461-464
    • Schwarz, G.1
  • 40
    • 0002537922
    • Algorithm 808: ARFIT - A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models
    • DOI 10.1145/382043.382316
    • T. Schneider and A. Neumaier, "Algorithm 808: ARfit-a Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models," ACM Trans. Math. Softw., vol. 27, pp. 58-65, 2001. (Pubitemid 33609115)
    • (2001) ACM Transactions on Mathematical Software , vol.27 , Issue.1 , pp. 58-65
    • Schneider, T.1    Neumaier, A.2
  • 41
    • 0033884857
    • Score normalization for text-independent speaker verification systems
    • DOI 10.1006/dspr.1999.0360
    • R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score normalization for text-independent speaker verification systems," Digital Signal Process., vol. 10, no. 1-3, pp. 42-54, Jan. 2000. (Pubitemid 30592165)
    • (2000) Digital Signal Processing: A Review Journal , vol.10 , Issue.1 , pp. 42-54
    • Auckenthaler, R.1    Carey, M.2    Lloyd-Thomas, H.3
  • 42
    • 33745210768
    • Modelling session variability in text-independent speaker verification
    • Lisbon, Portugal, Sep.
    • R. Vogt, B. Baker, and S. Sridharan, "Modelling session variability in text-independent speaker verification," in Proc. Interspeech '05, Lisbon, Portugal, Sep. 2005, pp. 3117-3120.
    • (2005) Proc. Interspeech '05 , pp. 3117-3120
    • Vogt, R.1    Baker, B.2    Sridharan, S.3
  • 44
    • 77952192470
    • Temporally weighted linear prediction features for tackling additive noise in speaker verification
    • Jun.
    • R. Saeidi, J. Pohjalainen, T. Kinnunen, and P. Alku, "Temporally weighted linear prediction features for tackling additive noise in speaker verification," IEEE Signal Process. Lett., vol. 17, no. 6, pp. 599-602, Jun. 2010.
    • (2010) IEEE Signal Process. Lett. , vol.17 , Issue.6 , pp. 599-602
    • Saeidi, R.1    Pohjalainen, J.2    Kinnunen, T.3    Alku, P.4
  • 45
    • 79959832654
    • Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions
    • Makuhari, Japan, Sep.
    • J. Pohjalainen, R. Saeidi, T. Kinnunen, and P. Alku, "Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions," in Proc. Interspeech '10, Makuhari, Japan, Sep. 2010, pp. 1477-1480.
    • (2010) Proc. Interspeech '10 , pp. 1477-1480
    • Pohjalainen, J.1    Saeidi, R.2    Kinnunen, T.3    Alku, P.4
  • 50
    • 29044433161
    • NIST and NFI-TNO evaluations of automatic speaker recognition
    • DOI 10.1016/j.csl.2005.07.001, PII S088523080500032X, Odyssey 2004: The Speaker and Language Recognition Workshop Odyssey-04
    • D. A. van Leeuwen, A. F. Martin, M. A. Przybocki, and J. S. Bouten, "NIST and NFI-TNO evaluations of automatic speaker recognition," Comput. Speech Lang., vol. 20, pp. 128-158, Apr.-Jul. 2006. (Pubitemid 41787534)
    • (2006) Computer Speech and Language , vol.20 , Issue.2-3 SPEC. ISS. , pp. 128-158
    • Van Leeuwen, D.A.1    Martin, A.F.2    Przybocki, M.A.3    Bouten, J.S.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.