SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 8, 2011, Pages 2624-2632

Voice activity detection based on an unsupervised learning framework

(4) Ying, Dongwen a Yan, Yonghong a Dang, Jianwu b,c Soong, Frank K d

a INSTITUTE OF ACOUSTICS (China)

b TIANJIN UNIVERSITY (China)

c JAPAN ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

d MICROSOFT RESEARCH ASIA (China)

Author keywords

Model based Gaussian clustering; sequential Gaussian mixture model (GMM); speech presence probability; unsupervised learning; voice activity detection (VAD)

Indexed keywords

CONSTRUCT MODELS; EM ALGORITHMS; GAUSSIAN MIXTURE MODEL; GAUSSIANS; GSM AMR; MODEL CONSTRUCTION; SEMI-SUPERVISED; SEMI-SUPERVISED LEARNING; STATISTICAL MODELS; SUB-BANDS; TIME FREQUENCY DOMAIN; TIMIT DATABASE; VOICE ACTIVITY DETECTION; VOICE ACTIVITY DETECTION (VAD); VOICE ACTIVITY DETECTORS;

ALGORITHMS; DETECTORS; GAUSSIAN DISTRIBUTION; SUPERVISED LEARNING; UNSUPERVISED LEARNING;

SPEECH RECOGNITION;

EID: 80053614636 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2011.2125953 Document Type: Article

Times cited: (92)

References (35)

1
- 0029290274
- Study of a voice activity detector and its influence on a noise reduction system
- Apr.
- R. Jeannes and G. Faucon, "Study of a voice activity detector and its influence on a noise reduction system," Speech Commun., vol. 16, no. 3, pp. 245-254, Apr. 1995.
- (1995) Speech Commun. , vol.16 , Issue.3 , pp. 245-254
- Jeannes, R.¹ Faucon, G.²

2
- 0019606509
- An improved endpoint detector for isolated word recognition
- Aug.
- F. Lamel, R. Rabiner, E. Rosenberg, and G. Wilpon, "An improved endpoint detector for isolated word recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 4, pp. 777-785, Aug. 1981.
- (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.4 , pp. 777-785
- Lamel, F.¹ Rabiner, R.² Rosenberg, E.³ Wilpon, G.⁴

3
- 80053579910
- Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) ETSI
- Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) Speech Traffic Channels, ETSI EN 301 708 Rec., ETSI, 1999.
- (1999) Speech Traffic Channels, ETSI EN 301 708 Rec.

4
- 80053587930
- Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; ETSI
- Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms, ETSI ES 202 050 Rec., ETSI, 2002.
- (2002) Compression algorithms, ETSI ES 202 050 Rec.

5
- 0035125193
- Wavelet speech enhancement based on the Teager energy operator
- DOI 10.1109/97.889636
- M. Bahoura and J. Rouat, "Wavelet speech enhancement based on the Teager energy operator," IEEE Signal Process. Lett., vol. 8, no. 1, pp. 10-12, Jan. 2001. (Pubitemid 32130849)
- (2001) IEEE Signal Processing Letters , vol.8 , Issue.1 , pp. 10-12
- Bahoura, M.¹ Rouat, J.²

6
- 1842476689
- Efficient voice activity detection algorithms using long-term speech information
- J. Ramírez and J. C. Segura et al., "Efficient voice activity detection algorithms using long-term speech information," Speech Commun., vol. 42, no. 3, pp. 271-287, 2004.
- (2004) Speech Commun. , vol.42 , Issue.3 , pp. 271-287
- Ramírez, J.¹ Segura, J.C.²

7
- 27744483317
- An effective subband OSF-based VAD with noise reduction for robust speech recognition
- DOI 10.1109/TSA.2005.853212
- J. Ramírez and J. C. Segura et al., "An effective subband OSF-based VAD with noise reduction for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 13, no. 6, pp. 1119-1129, Nov. 2005. (Pubitemid 41605016)
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.6 , pp. 1119-1129
- Ramirez, J.¹ Segura, J.C.² Benitez, C.³ De La Torre, A.⁴ Rubio, A.⁵

8
- 84946217050
- ITU
- Coding of speech at 8 kbit/s using conjugate structure algebraic code-excited linear prediction. Annex B: A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70, ITU, 1996.
- (1996) Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code-Excited Linear Prediction. Annex B: A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70

9
- 0026907622
- Voice activity detection using a periodicity measure
- R. Tucker, "Voice activity detection using a periodicity measure," Proc. Inst. Elect. Eng. 1992, pp. 377-380, 1992. (Pubitemid 23558809)
- (1992) IEE Proceedings, Part I: Communications, Speech and Vision , vol.139 , Issue.4 , pp. 377-380
- Tucker, R.¹

10
- 85133167678
- Study of noise robust voice activity detection based on periodic component to aperiodic component ratio
- Pittsburgh, PA
- K. Ishizuka and T. Nakatani, "Study of noise robust voice activity detection based on periodic component to aperiodic component ratio," in Proc. SAPA'06, Pittsburgh, PA, 2006, pp. 65-70.
- (2006) Proc. SAPA'06 , pp. 65-70
- Ishizuka, K.¹ Nakatani, T.²

11
- 0036476655
- Speech pause detection for noise spectrum estimation by tracking power envelope dynamics
- DOI 10.1109/89.985548, PII S1063667602015237
- M. Marzinzik and B. Kollmeier, "Speech pause detection for noise spectrum estimation by tracking power envelope dynamics," IEEE Trans. Speech Audio Process., vol. 10, no. 2, pp. 109-118, Feb. 2002. (Pubitemid 34295270)
- (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.2 , pp. 109-118
- Marzinzik, M.¹ Kollmeier, B.²

12
- 33846259282
- Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold
- DOI 10.1109/TSA.2005.855842
- A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Trans. Speech Audio Process., vol. 14, no. 2, pp. 412-423, Mar. 2006. (Pubitemid 46405343)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.2 , pp. 412-423
- Davis, A.¹ Nordholm, S.² Togneri, R.³

13
- 0035274536
- Robust voice activity detection using higher-order statistics in the LPC residual domain
- DOI 10.1109/89.905996, PII S1063667601013244
- E. Nemer, R. Goubran, and S. Mahmoud, "Robust voice activity detection using higher-order statistics in the LPC residual domain," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 217-231, Mar. 2001. (Pubitemid 32300847)
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.3 , pp. 217-231
- Nemer, E.¹ Goubran, R.² Mahmoud, S.³

14
- 0031636164
- A voice activity detector employing soft decision based noise spectrum adaptation
- Seattle, WA
- J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. Int. Conf. Acoust., Speech, Signal Process., Seattle, WA, 1998, vol. 1, pp. 365-368.
- (1998) Proc. Int. Conf. Acoust., Speech, Signal Process. , vol.1 , pp. 365-368
- Sohn, J.¹ Sung, W.²

15
- 0032762471
- A statistical model-based voice activity detection
- Jan.
- J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1-3, Jan. 1999.
- (1999) IEEE Signal Process. Lett. , vol.6 , Issue.1 , pp. 1-3
- Sohn, J.¹ Kim, N.S.² Sung, W.³

16
- 0035481845
- Analysis and improvement of a statistical model-based voice activity detector
- Oct.
- Y. Cho and A. Kondoz, "Analysis and improvement of a statistical model-based voice activity detector," IEEE Signal Process. Lett., vol. 8, no. 10, pp. 276-279, Oct. 2001.
- (2001) IEEE Signal Process. Lett. , vol.8 , Issue.10 , pp. 276-279
- Cho, Y.¹ Kondoz, A.²

17
- 70350433096
- Jointly Gaussian PDF-based likelihood ratio test for voice activity detection
- Nov.
- J. Górriz, J. Ramírez, E. Lang, and C. Puntonet, "Jointly Gaussian PDF-based likelihood ratio test for voice activity detection," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 8, pp. 1565-1578, Nov. 2008.
- (2008) IEEE Trans. Audio Speech Lang. Process. , vol.16 , Issue.8 , pp. 1565-1578
- Górriz, J.¹ Ramírez, J.² Lang, E.³ Puntonet, C.⁴

18
- 10744220144
- A new Kullback-Leibler VAD for speech recognition in noise
- Feb.
- J. Ramírez, J. C. Segura, M. C. Benítez, A. de la Torre, and A. Rubio, "A new Kullback-Leibler VAD for speech recognition in noise," IEEE Signal Process. Lett., vol. 11, no. 2, pp. 666-669, Feb. 2004.
- (2004) IEEE Signal Process. Lett. , vol.11 , Issue.2 , pp. 666-669
- Ramírez, J.¹ Segura, J.C.² Benítez, M.C.³ De La Torre, A.⁴ Rubio, A.⁵

19
- 0042863279
- A soft voice activity detector based on a Laplacian-Gaussian model
- Sep.
- S. Gazor and W. Zhang, "A soft voice activity detector based on a Laplacian-Gaussian model," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 498-505, Sep. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.5 , pp. 498-505
- Gazor, S.¹ Zhang, W.²

20
- 23344452899
- Statistical voice activity detection using a multiple observation likelihood ratio test
- DOI 10.1109/LSP.2005.855551
- J. Ramírez and J. C. Segura, "Statistical voice activity detection using a multiple observation likelihood ratio test," IEEE Signal Process. Lett., vol. 12, no. 10, pp. 689-692, Oct. 2005. (Pubitemid 41448576)
- (2005) IEEE Signal Processing Letters , vol.12 , Issue.10 , pp. 689-692
- Ramirez, J.¹ Segura, J.C.² Benitez, C.³ Garcia, L.⁴ Rubio, A.⁵

21
- 33744532633
- Voice activity detection based on multiple statistical models
- DOI 10.1109/TSP.2006.874403
- J. Chang, N. Kim, and S. Mitra, "Voice activity detection based on multiple statistical models," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 1965-1976, Jun. 2006. (Pubitemid 43811393)
- (2006) IEEE Transactions on Signal Processing , vol.54 , Issue.6 , pp. 1965-1976
- Chang, J.-H.¹ Kim, N.S.² Mitra, S.K.³

22
- 0036508040
- Robust endpoint detection and energy normalization for real-time speech and speaker recognition
- DOI 10.1109/TSA.2002.1001979, PII S106366760203972X
- Q. Li, J. Zheng, A. Tsai, and Q. Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition," IEEE Trans. Speech Audio Process., vol. 10, no. 3, pp. 146-157, Mar. 2002. (Pubitemid 34692538)
- (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.3 , pp. 146-157
- Li, Q.¹ Zheng, J.² Tsai, A.³ Zhou, Q.⁴

23
- 66149135598
- Change point detection in GARCH models for voice activity detection
- Jul.
- R. Tahmasbi and S. Rezaei, "Change point detection in GARCH models for voice activity detection," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 1038-1046, Jul. 2008.
- (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.5 , pp. 1038-1046
- Tahmasbi, R.¹ Rezaei, S.²

24
- 58049171791
- Cambridge, MA: MIT Press, ch. 1
- O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning. Cambridge, MA: MIT Press, 2006, ch. 1, pp. 1-12.
- (2006) Semi-Supervised Learning , pp. 1-12
- Chapelle, O.¹ Schölkopf, B.² Zien, A.³

25
- 0034832359
- Assessing local noise level estimation methods: Application to noise robust ASR
- DOI 10.1016/S0167-6393(00)00051-0
- C. Ris and S. Dupont, "Assessing local noise level estimation methods: Application to noise robust ASR," Speech Commun., vol. 34, pp. 141-158, 2001. (Pubitemid 32874674)
- (2001) Speech Communication , vol.34 , Issue.1-2 , pp. 141-158
- Ris, C.¹ Dupont, S.²

26
- 33947664527
- Auto-segmentation based partitioning and clustering approach to robust end pointing
- Toulouse, France
- Y. Shi, F. K. Soong, and J. L. Zhou, "Auto-segmentation based partitioning and clustering approach to robust end pointing," in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, 2006, pp. 793-796.
- (2006) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 793-796
- Shi, Y.¹ Soong, F.K.² Zhou, J.L.³

27
- 0006923547
- Noise adaptation in a hidden Markov model speech recognition system
- D. Van Compernolle, "Noise adaptation in a hidden Markov model speech recognition system," Comput. Speech Lang., vol. 3, pp. 151-168, 1989.
- (1989) Comput. Speech Lang. , vol.3 , pp. 151-168
- Van Compernolle, D.¹

28
- 16444383160
- Survey of clustering algorithms
- May
- R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645-678, May 2005.
- (2005) IEEE Trans. Neural Netw. , vol.16 , Issue.3 , pp. 645-678
- Xu, R.¹ Wunsch, D.²

29
- 84898462184
- Incremental learning of temporally coherent Gaussian Mixture Models
- O. Arandjelovic and R. Cipolla, "Incremental learning of temporally coherent Gaussian Mixture Models," in Proc. BMVC, 2005.
- (2005) Proc. BMVC
- Arandjelovic, O.¹ Cipolla, R.²

30
- 0031103160
- On-Line Adaptive Learning of the Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate
- Q. Huo and C. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate," IEEE Trans. Speech Audio Process., vol. 5, no. 2, pp. 161-172, Mar. 1997. (Pubitemid 127746048)
- (1997) IEEE Transactions on Speech and Audio Processing , vol.5 , Issue.2 , pp. 161-172
- Huo, Q.¹ Lee, C.-H.²

31
- 0027797470
- On-line estimation of hidden Markov model parameters based on the Kullback-Leibler information measure
- Aug.
- V. Krishnamurthy and J. Moore, "On-line estimation of hidden Markov model parameters based on the Kullback-Leibler information measure," IEEE Trans. Signal Process., vol. 41, no. 8, pp. 2557-2573, Aug. 1993.
- (1993) IEEE Trans. Signal Process. , vol.41 , Issue.8 , pp. 2557-2573
- Krishnamurthy, V.¹ Moore, J.²

32
- 0003419545
- Gaithersburg MD, prototype as of Dec.
- J. S. Garofolo, Getting Started With the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database, Nat. Inst. Standards Technol. (NIST), Gaithersburg, MD, prototype as of Dec. 1988.
- (1988) Getting Started With the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database, Nat. Inst. Standards Technol. (NIST)
- Garofolo, J.S.¹

33
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- Jul.
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, Jul. 1993.
- (1993) Speech Commun. , vol.12 , Issue.3 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

34
- 80053605786
- Digital Cellular Telecommunications System (Phase 2+); Adaptive Multi Rate (AMR) Speech; ANSI-C Code for AMR Speech Codec
- Digital Cellular Telecommunications System (Phase 2+); Adaptive Multi Rate (AMR) Speech; ANSI-C Code for AMR Speech Codec, 1998.
- (1998)

35
- 80053598228
- ITU, Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code-Excited Linear Prediction. Annex I: Reference Fixed-Point Implementation for Integrating G.729 CS-ACELP Speech Coding Main Body With Annexes B, D and E, Int. Telecommun. Union
- ITU, Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code-Excited Linear Prediction. Annex I: Reference Fixed-Point Implementation for Integrating G.729 CS-ACELP Speech Coding Main Body With Annexes B, D and E, Int. Telecommun. Union, 2000.
- (2000)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.