Volume 19, Issue 8, 2011, Pages 2624-2632

Voice activity detection based on an unsupervised learning framework

Author keywords

Model based Gaussian clustering; sequential Gaussian mixture model (GMM); speech presence probability; unsupervised learning; voice activity detection (VAD)

Indexed keywords

CONSTRUCT MODELS; EM ALGORITHMS; GAUSSIAN MIXTURE MODEL; GAUSSIANS; GSM AMR; MODEL CONSTRUCTION; SEMI-SUPERVISED; SEMI-SUPERVISED LEARNING; STATISTICAL MODELS; SUB-BANDS; TIME FREQUENCY DOMAIN; TIMIT DATABASE; VOICE ACTIVITY DETECTION; VOICE ACTIVITY DETECTION (VAD); VOICE ACTIVITY DETECTORS;

EID: 80053614636     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2011.2125953     Document Type: Article
Times cited: 92

References (35)
  • 1
    • 0029290274
    • Study of a voice activity detector and its influence on a noise reduction system
    • Apr.
    • R. Jeannes and G. Faucon, "Study of a voice activity detector and its influence on a noise reduction system," Speech Commun., vol. 16, no. 3, pp. 245-254, Apr. 1995.
    • (1995) Speech Commun. , vol.16 , Issue.3 , pp. 245-254
    • Jeannes, R.1    Faucon, G.2
  • 3
    • 80053579910
    • Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) ETSI
    • Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) Speech Traffic Channels, ETSI EN 301 708 Rec., ETSI, 1999.
    • (1999) Speech Traffic Channels, ETSI EN 301 708 Rec.
  • 4
    • 80053587930
    • Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; ETSI
    • Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms, ETSI ES 202 050 Rec., ETSI, 2002.
    • (2002) Compression algorithms, ETSI ES 202 050 Rec.
  • 5
    • 0035125193
    • Wavelet speech enhancement based on the Teager energy operator
    • DOI 10.1109/97.889636
    • M. Bahoura and J. Rouat, "Wavelet speech enhancement based on the Teager energy operator," IEEE Signal Process. Lett., vol. 8, no. 1, pp. 10-12, Jan. 2001. (Pubitemid 32130849)
    • (2001) IEEE Signal Processing Letters , vol.8 , Issue.1 , pp. 10-12
    • Bahoura, M.1    Rouat, J.2
  • 6
    • 1842476689
    • Efficient voice activity detection algorithms using long-term speech information
    • J. Ramírez and J. C. Segura et al., "Efficient voice activity detection algorithms using long-term speech information," Speech Commun., vol. 42, no. 3, pp. 271-287, 2004.
    • (2004) Speech Commun. , vol.42 , Issue.3 , pp. 271-287
    • Ramírez, J.1    Segura, J.C.2
  • 10
    • 85133167678
    • Study of noise robust voice activity detection based on periodic component to aperiodic component ratio
    • Pittsburgh, PA
    • K. Ishizuka and T. Nakatani, "Study of noise robust voice activity detection based on periodic component to aperiodic component ratio," in Proc. SAPA'06, Pittsburgh, PA, 2006, pp. 65-70.
    • (2006) Proc. SAPA'06 , pp. 65-70
    • Ishizuka, K.1    Nakatani, T.2
  • 11
    • 0036476655
    • Speech pause detection for noise spectrum estimation by tracking power envelope dynamics
    • DOI 10.1109/89.985548, PII S1063667602015237
    • M. Marzinzik and B. Kollmeier, "Speech pause detection for noise spectrum estimation by tracking power envelope dynamics," IEEE Trans. Speech Audio Process., vol. 10, no. 2, pp. 109-118, Feb. 2002. (Pubitemid 34295270)
    • (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.2 , pp. 109-118
    • Marzinzik, M.1    Kollmeier, B.2
  • 12
    • 33846259282
    • Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold
    • DOI 10.1109/TSA.2005.855842
    • A. Davis, S. Nordholm, and R. Togneri, "Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold," IEEE Trans. Speech Audio Process., vol. 14, no. 2, pp. 412-423, Mar. 2006. (Pubitemid 46405343)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.2 , pp. 412-423
    • Davis, A.1    Nordholm, S.2    Togneri, R.3
  • 13
    • 0035274536
    • Robust voice activity detection using higher-order statistics in the LPC residual domain
    • DOI 10.1109/89.905996, PII S1063667601013244
    • E. Nemer, R. Goubran, and S. Mahmoud, "Robust voice activity detection using higher-order statistics in the LPC residual domain," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 217-231, Mar. 2001. (Pubitemid 32300847)
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.3 , pp. 217-231
    • Nemer, E.1    Goubran, R.2    Mahmoud, S.3
  • 14
    • 0031636164
    • A voice activity detector employing soft decision based noise spectrum adaptation
    • Seattle, WA
    • J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. Int. Conf. Acoust., Speech, Signal Process., Seattle, WA, 1998, vol. 1, pp. 365-368.
    • (1998) Proc. Int. Conf. Acoust., Speech, Signal Process. , vol.1 , pp. 365-368
    • Sohn, J.1    Sung, W.2
  • 15
    • 0032762471
    • A statistical model-based voice activity detection
    • Jan.
    • J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1-3, Jan. 1999.
    • (1999) IEEE Signal Process. Lett. , vol.6 , Issue.1 , pp. 1-3
    • Sohn, J.1    Kim, N.S.2    Sung, W.3
  • 16
    • 0035481845
    • Analysis and improvement of a statistical model-based voice activity detector
    • Oct.
    • Y. Cho and A. Kondoz, "Analysis and improvement of a statistical model-based voice activity detector," IEEE Signal Process. Lett., vol. 8, no. 10, pp. 276-279, Oct. 2001.
    • (2001) IEEE Signal Process. Lett. , vol.8 , Issue.10 , pp. 276-279
    • Cho, Y.1    Kondoz, A.2
  • 17
    • 70350433096
    • Jointly Gaussian PDF-based likelihood ratio test for voice activity detection
    • Nov.
    • J. Górriz, J. Ramírez, E. Lang, and C. Puntonet, "Jointly Gaussian PDF-based likelihood ratio test for voice activity detection," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 8, pp. 1565-1578, Nov. 2008.
    • (2008) IEEE Trans. Audio Speech Lang. Process. , vol.16 , Issue.8 , pp. 1565-1578
    • Górriz, J.1    Ramírez, J.2    Lang, E.3    Puntonet, C.4
  • 19
    • 0042863279
    • A soft voice activity detector based on a Laplacian-Gaussian model
    • Sep.
    • S. Gazor and W. Zhang, "A soft voice activity detector based on a Laplacian-Gaussian model," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 498-505, Sep. 2003.
    • (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.5 , pp. 498-505
    • Gazor, S.1    Zhang, W.2
  • 20
    • 23344452899
    • Statistical voice activity detection using a multiple observation likelihood ratio test
    • DOI 10.1109/LSP.2005.855551
    • J. Ramírez and J. C. Segura, "Statistical voice activity detection using a multiple observation likelihood ratio test," IEEE Signal Process. Lett., vol. 12, no. 10, pp. 689-692, Oct. 2005. (Pubitemid 41448576)
    • (2005) IEEE Signal Processing Letters , vol.12 , Issue.10 , pp. 689-692
    • Ramirez, J.1    Segura, J.C.2    Benitez, C.3    Garcia, L.4    Rubio, A.5
  • 21
    • 33744532633
    • Voice activity detection based on multiple statistical models
    • DOI 10.1109/TSP.2006.874403
    • J. Chang, N. Kim, and S. Mitra, "Voice activity detection based on multiple statistical models," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 1965-1976, Jun. 2006. (Pubitemid 43811393)
    • (2006) IEEE Transactions on Signal Processing , vol.54 , Issue.6 , pp. 1965-1976
    • Chang, J.-H.1    Kim, N.S.2    Mitra, S.K.3
  • 22
    • 0036508040
    • Robust endpoint detection and energy normalization for real-time speech and speaker recognition
    • DOI 10.1109/TSA.2002.1001979, PII S106366760203972X
    • Q. Li, J. Zheng, A. Tsai, and Q. Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition," IEEE Trans. Speech Audio Process., vol. 10, no. 3, pp. 146-157, Mar. 2002. (Pubitemid 34692538)
    • (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.3 , pp. 146-157
    • Li, Q.1    Zheng, J.2    Tsai, A.3    Zhou, Q.4
  • 23
    • 66149135598
    • Change point detection in GARCH models for voice activity detection
    • Jul.
    • R. Tahmasbi and S. Rezaei, "Change point detection in GARCH models for voice activity detection," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 1038-1046, Jul. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.5 , pp. 1038-1046
    • Tahmasbi, R.1    Rezaei, S.2
  • 25
    • 0034832359
    • Assessing local noise level estimation methods: Application to noise robust ASR
    • DOI 10.1016/S0167-6393(00)00051-0
    • C. Ris and S. Dupont, "Assessing local noise level estimation methods: Application to noise robust ASR," Speech Commun., vol. 34, pp. 141-158, 2001. (Pubitemid 32874674)
    • (2001) Speech Communication , vol.34 , Issue.1-2 , pp. 141-158
    • Ris, C.1    Dupont, S.2
  • 26
    • 33947664527
    • Auto-segmentation based partitioning and clustering approach to robust end pointing
    • Toulouse, France
    • Y. Shi, F. K. Soong, and J. L. Zhou, "Auto-segmentation based partitioning and clustering approach to robust end pointing," in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, 2006, pp. 793-796.
    • (2006) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 793-796
    • Shi, Y.1    Soong, F.K.2    Zhou, J.L.3
  • 27
    • 0006923547
    • Noise adaptation in a hidden Markov model speech recognition system
    • D. Van Compernolle, "Noise adaptation in a hidden Markov model speech recognition system," Comput. Speech Lang., vol. 3, pp. 151-168, 1989.
    • (1989) Comput. Speech Lang. , vol.3 , pp. 151-168
    • Van Compernolle, D.1
  • 28
    • 16444383160
    • Survey of clustering algorithms
    • May
    • R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645-678, May 2005.
    • (2005) IEEE Trans. Neural Netw. , vol.16 , Issue.3 , pp. 645-678
    • Xu, R.1    Wunsch, D.2
  • 29
    • 84898462184
    • Incremental learning of temporally coherent Gaussian Mixture Models
    • O. Arandjelovic and R. Cipolla, "Incremental learning of temporally coherent Gaussian Mixture Models," in Proc. BMVC, 2005.
    • (2005) Proc. BMVC
    • Arandjelovic, O.1    Cipolla, R.2
  • 30
    • 0031103160
    • On-Line Adaptive Learning of the Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate
    • Q. Huo and C. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate," IEEE Trans. Speech Audio Process., vol. 5, no. 2, pp. 161-172, Mar. 1997. (Pubitemid 127746048)
    • (1997) IEEE Transactions on Speech and Audio Processing , vol.5 , Issue.2 , pp. 161-172
    • Huo, Q.1    Lee, C.-H.2
  • 31
    • 0027797470
    • On-line estimation of hidden Markov model parameters based on the Kullback-Leibler information measure
    • Aug.
    • V. Krishnamurthy and J. Moore, "On-line estimation of hidden Markov model parameters based on the Kullback-Leibler information measure," IEEE Trans. Signal Process., vol. 41, no. 8, pp. 2557-2573, Aug. 1993.
    • (1993) IEEE Trans. Signal Process. , vol.41 , Issue.8 , pp. 2557-2573
    • Krishnamurthy, V.1    Moore, J.2
  • 33
    • 0027623210
    • Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • Jul.
    • A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, Jul. 1993.
    • (1993) Speech Commun. , vol.12 , Issue.3 , pp. 247-251
    • Varga, A.1    Steeneken, H.2
  • 34
    • 80053605786
    • Digital Cellular Telecommunications System (Phase 2+); Adaptive Multi Rate (AMR) Speech; ANSI-C Code for AMR Speech Codec
    • Digital Cellular Telecommunications System (Phase 2+); Adaptive Multi Rate (AMR) Speech; ANSI-C Code for AMR Speech Codec, 1998.
    • (1998)
  • 35
    • 80053598228
    • ITU, Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code-Excited Linear Prediction. Annex I: Reference Fixed-Point Implementation for Integrating G.729 CS-ACELP Speech Coding Main Body With Annexes B, D and E, Int. Telecommun. Union
    • ITU, Coding of Speech at 8 kbit/s Using Conjugate Structure Algebraic Code-Excited Linear Prediction. Annex I: Reference Fixed-Point Implementation for Integrating G.729 CS-ACELP Speech Coding Main Body With Annexes B, D and E, Int. Telecommun. Union, 2000.
    • (2000)


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.