SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 5, 2012, Pages 1608-1616

CASA-based robust speaker identification

(3) Zhao, Xiaojia (a); Shao, Yang (a); Wang, Deliang (a)

a The Ohio State University (United States)

Author keywords

Computational auditory scene analysis (CASA); gammatone frequency cepstral coefficient (GFCC); ideal binary mask; robust speaker identification

Indexed keywords

AUDITORY PERCEPTION; CEPSTRAL COEFFICIENTS; COMPUTATIONAL AUDITORY SCENE ANALYSIS; IDEAL BINARY MASK; MARGINALIZATION; NOISY SPEECH; PERFORMANCE IMPROVEMENTS; RELATED SYSTEMS; ROBUST SPEAKER IDENTIFICATION; SPEAKER CHARACTERISTICS; SPEAKER RECOGNITION SYSTEM; TIME FREQUENCY; SPEECH PROCESSING; SPEECH RECOGNITION

EID: 84859024513 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2186803 Document Type: Article

Times cited: 161

References (39)

1
- 65249170183
- Speaker model clustering for efficient speaker identification in large population applications
- May
- V. R. Apsingekar and P. L. De Leon, "Speaker model clustering for efficient speaker identification in large population applications," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 848-853, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 848-853
- Apsingekar, V.R.¹ De Leon, P.L.²

2
- 0003684441
- Cambridge MA: MIT Press
- A. S. Bregman, Auditory Scene Analysis. Cambridge, MA: MIT Press, 1990.
- (1990) Auditory Scene Analysis
- Bregman, A.S.¹

3
- 0031233424
- Speaker recognition: A tutorial
- PII S0018921997069478
- J. P. Campbell, "Speaker recognition: A tutorial," Proc. IEEE, vol. 85, no. 9, pp. 1437-1462, Sep. 1997. (Pubitemid 127745630)
- (1997) Proceedings of the IEEE , vol.85 , Issue.9 , pp. 1437-1462
- Campbell, J.P.¹

4
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- DOI 10.1016/S0167-6393(00)00034-0, PII S0167639300000340
- M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001. (Pubitemid 32284867)
- (2001) Speech Communication , vol.34 , Issue.3 , pp. 267-285
- Cooke, M.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

5
- 34547539772
- [Online]
- M. Cooke and T. Lee, "Speech separation and recognition competition," 2006 [Online]. Available: http://www.dcs.shef.ac.uk/~martin/SpeechSeparationChallenge.htm
- (2006) Speech Separation and Recognition Competition
- Cooke, M.¹ Lee, T.²

6
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980. (Pubitemid 11464930)
- (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

7
- 85135252448
- Missing features detection and handling for robust speaker verification
- M. El-Maliki and A. Drygajlo, "Missing features detection and handling for robust speaker verification," in Proc. Eurospeech, 1999, pp. 975-978.
- (1999) Proc. Eurospeech , pp. 975-978
- El-Maliki, M.¹ Drygajlo, A.²

8
- 69949172494
- 40 years of progress in automatic speaker recognition
- S. Furui, "40 years of progress in automatic speaker recognition," Lecture Notes Comput. Sci., vol. 5558, pp. 1050-1059, 2009.
- (2009) Lecture Notes Comput. Sci. , vol.5558 , pp. 1050-1059
- Furui, S.¹

9
- 0019555090
- Cepstral analysis technique for automatic speaker verification
- S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 254-272, Apr. 1981. (Pubitemid 11495877)
- (1981) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-29 , Issue.2 , pp. 254-272
- Furui, S.¹

10
- 0036296948
- Noise-robust open-set speaker recognition using noise-dependent Gaussian mixture classifier
- Y. Gong, "Noise-robust open-set speaker recognition using noise-dependent Gaussian mixture classifier," in Proc. ICASSP, 2002, pp. 133-136.
- (2002) Proc. ICASSP , pp. 133-136
- Gong, Y.¹

11
- 0028517164
- RASTA processing of speech
- Oct.
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 578-589, Oct. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

12
- 85008056718
- HMM-based multipitch tracking for noisy and reverberant speech
- Jul.
- Z. Jin and D. L. Wang, "HMM-based multipitch tracking for noisy and reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1091-1102, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1091-1102
- Jin, Z.¹ Wang, D.L.²

13
- 65249103478
- A supervised learning approach to monaural segregation of reverberant speech
- May
- Z. Jin and D. L. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
- Jin, Z.¹ Wang, D.L.²

14
- 0002560960
- A database for speaker-independent digit recognition
- R. G. Leonard, "A database for speaker-independent digit recognition," in Proc. ICASSP, 1984, pp. 328-331.
- (1984) Proc. ICASSP , pp. 328-331
- Leonard, R.G.¹

15
- 78049408631
- Robust speaker identification using an auditory-based feature
- Q. Li and Y. Huang, "Robust speaker identification using an auditory-based feature," in Proc. ICASSP, 2010, pp. 4514-4517.
- (2010) Proc. ICASSP , pp. 4514-4517
- Li, Q.¹ Huang, Y.²

16
- 58149196390
- On the optimality of ideal binary time-frequency masks
- Y. Li and D. L. Wang, "On the optimality of ideal binary time-frequency masks," Speech Commun., vol. 51, pp. 230-239, 2009.
- (2009) Speech Commun. , vol.51 , pp. 230-239
- Li, Y.¹ Wang, D.L.²

17
- 0030125219
- Speaker recognition using HMM composition in noisy environments
- DOI 10.1006/csla.1996.0007
- T. Matsui, T. Kanno, and S. Furui, "Speaker recognition using HMM composition in noisy environments," Comput. Speech Lang., vol. 10, pp. 107-116, 1996. (Pubitemid 126346924)
- (1996) Computer Speech and Language , vol.10 , Issue.2 , pp. 107-116
- Matsui, T.¹ Kanno, T.² Furui, S.³

18
- 0003789815
- 5th ed. San Diego, CA: Academic
- B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed. San Diego, CA: Academic, 2003.
- (2003) An Introduction to the Psychology of Hearing
- Moore, B.C.J.¹

19
- 0003513556
- 2nd ed. Upper Saddle River, NJ: Prentice-Hall
- A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
- (1999) Discrete-Time Signal Processing
- Oppenheim, A.V.¹ Schafer, R.W.² Buck, J.R.³

20
- 85009227702
- Analysis of the Aurora large vocabulary evaluations
- N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proc. Eurospeech, 2003, pp. 337-340.
- (2003) Proc. Eurospeech , pp. 337-340
- Parihar, N.¹ Picone, J.²

21
- 0009804718
- Auditory models as preprocessors for speech recognition
- M. E. H. Schouten, Ed. Berlin, Germany: Mouton de Gruyter
- R. D. Patterson, J. Holdsworth, and M. Allerhand, "Auditory models as preprocessors for speech recognition," in The Auditory Processing of Speech: From Sounds to Words, M. E. H. Schouten, Ed. Berlin, Germany: Mouton de Gruyter, 1992, pp. 67-83.
- (1992) The Auditory Processing of Speech: From Sounds to Words , pp. 67-83
- Patterson, R.D.¹ Holdsworth, J.² Allerhand, M.³

22
- 4143152576
- [Online]
- M. Przybocki and A. Martin, "The NIST Year 2002 Speaker Recognition Evaluation Plan," 2002 [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/sre/2002/2002-spkrec-evalplan-v60.pdf
- (2002) The NIST Year 2002 Speaker Recognition Evaluation Plan
- Przybocki, M.¹ Martin, A.²

23
- 51449083412
- Robust speaker identification using combined feature selection and missing data recognition
- D. Pullella, M. Kühne, and R. Togneri, "Robust speaker identification using combined feature selection and missing data recognition," in Proc. ICASSP, 2008, pp. 4833-4836.
- (2008) Proc. ICASSP , pp. 4833-4836
- Pullella, D.¹ Kühne, M.² Togneri, R.³

24
- 4644336054
- Reconstruction of missing features for robust speech recognition
- B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, pp. 275-296, 2004.
- (2004) Speech Commun. , vol.43 , pp. 275-296
- Raj, B.¹ Seltzer, M.L.² Stern, R.M.³

25
- 85075924869
- Comparison of background normalization methods for text-independent speaker verification
- D. A. Reynolds, "Comparison of background normalization methods for text-independent speaker verification," in Proc. Eurospeech, 1997, pp. 963-966.
- (1997) Proc. Eurospeech , pp. 963-966
- Reynolds, D.A.¹

26
- 0029355999
- Speaker identification and verification using Gaussian mixture speaker models
- D. A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," Speech Commun., vol. 17, pp. 91-108, 1995.
- (1995) Speech Commun. , vol.17 , pp. 91-108
- Reynolds, D.A.¹

27
- 0033884858
- Speaker verification using adapted Gaussian mixture models
- DOI 10.1006/dspr.1999.0361
- D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, pp. 19-41, 2000. (Pubitemid 30592166)
- (2000) Digital Signal Processing: A Review Journal , vol.10 , Issue.1 , pp. 19-41
- Reynolds, D.A.¹ Quatieri, T.F.² Dunn, R.B.³

28
- 0033889739
- Speaker verification by human listeners: Experiments comparing human and machine performance using the NIST 1998 speaker evaluation data
- A. Schmidt-Nielsen and T. H. Crystal, "Speaker verification by human listeners: Experiments comparing human and machine performance using the NIST 1998 speaker evaluation data," Digital Signal Process., vol. 10, pp. 249-266, 2000.
- (2000) Digital Signal Process. , vol.10 , pp. 249-266
- Schmidt-Nielsen, A.¹ Crystal, T.H.²

29
- 34547499683
- Incorporating auditory feature uncertainties in robust speaker identification
- Y. Shao, S. Srinivasan, and D. L. Wang, "Incorporating auditory feature uncertainties in robust speaker identification," in Proc. ICASSP, 2007, pp. 277-280.
- (2007) Proc. ICASSP , pp. 277-280
- Shao, Y.¹ Srinivasan, S.² Wang, D.L.³

30
- 51449101666
- Robust speaker identification using auditory features and computational auditory scene analysis
- Y. Shao and D. L. Wang, "Robust speaker identification using auditory features and computational auditory scene analysis," in Proc. ICASSP, 2008, pp. 1589-1592.
- (2008) Proc. ICASSP , pp. 1589-1592
- Shao, Y.¹ Wang, D.L.²

31
- 33947649051
- Robust speaker recognition using binary time-frequency masks
- Y. Shao and D. L. Wang, "Robust speaker recognition using binary time-frequency masks," in Proc. ICASSP, 2006, pp. 645-648.
- (2006) Proc. ICASSP , pp. 645-648
- Shao, Y.¹ Wang, D.L.²

32
- 36248960119
- Higher-level features in speaker recognition
- Speaker Classification I: Fundamentals, Features and Methods
- E. Shriberg, "Higher-level features in speaker recognition," Lecture Notes Comput. Sci., vol. 4343, pp. 241-259, 2007. (Pubitemid 350121206)
- (2007) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.LNAI 4343 , pp. 241-259
- Shriberg, E.¹

33
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- DOI 10.1016/j.specom.2006.09.003, PII S0167639306001129
- S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006. (Pubitemid 44634774)
- (2006) Speech Communication , vol.48 , Issue.11 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.³

34
- 56249136428
- Transforming binary uncertainties for robust speech recognition
- Sep.
- S. Srinivasan and D. L. Wang, "Transforming binary uncertainties for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2130-2140, Sep. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2130-2140
- Srinivasan, S.¹ Wang, D.L.²

35
- 38849170676
- Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms, ETSI ES 202 050 v1.1.4, European Telecommunications Standards Institute
- ETSI Standard, "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms," ETSI ES 202 050 v1.1.4, European Telecommunications Standards Institute, 2005.
- (2005) Speech Processing, Transmission and Quality Aspects (STQ)

36
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- P. Divenyi, Ed. Norwell, MA: Kluwer
- D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA: Kluwer, 2005, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

37
- 82255178542
- Hoboken, NJ: Wiley-IEEE
- D. L. Wang and G. J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Hoboken, NJ: Wiley-IEEE, 2006.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications
- Wang, D.L.¹ Brown, G.J.²

38
- 77957744636
- Robust speaker recognition using denoised vocal source and vocal tract features
- Jan.
- N. Wang, P. C. Ching, N. Zheng, and T. Lee, "Robust speaker recognition using denoised vocal source and vocal tract features," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 196-205, Jan. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.1 , pp. 196-205
- Wang, N.¹ Ching, P.C.² Zheng, N.³ Lee, T.⁴

39
- 0034848879
- Text-dependent speaker verification under noisy conditions using parallel model combination
- L. P. Wong and M. Russell, "Text-dependent speaker verification under noisy conditions using parallel model combination," in Proc. ICASSP, 2001, pp. 457-460. (Pubitemid 32839286)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 457-460
- Wong, L.P.¹ Russell, M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.