Volume 4, Issue 5, 2010, Pages 798-807

Low-complexity variable frame rate analysis for speech recognition and voice activity detection

Author keywords

Distributed speech recognition; frame selection; noise robust speech recognition; variable frame rate; voice activity detection (VAD)

Indexed keywords

DISTRIBUTED SPEECH RECOGNITION; FRAME SELECTION; NOISE ROBUST SPEECH RECOGNITION; VARIABLE FRAME RATE; VOICE ACTIVITY DETECTION;

EID: 77956733652     PISSN: 19324553     EISSN: None     Source Type: Journal    
DOI: 10.1109/JSTSP.2010.2057192     Document Type: Article
Times cited: 93

References (40)
  • 1
    • 77956774989
    • Automatic Speech Recognition on Mobile Devices and Over Communication Networks, Z.-H. Tan and B. Lindberg, Eds. London, U.K.: Springer-Verlag, 2008.
  • 2
    • 51449115700
    • Embedded speech recognition applications in mobile phones: Status, trends, and challenges
    • Las Vegas, NV
    • J. Cohen, "Embedded speech recognition applications in mobile phones: Status, trends, and challenges," in Proc. ICASSP'08, Las Vegas, NV, 2008, pp. 5352-5355.
    • (2008) Proc. ICASSP'08 , pp. 5352-5355
    • Cohen, J.1
  • 4
    • 0033690878
    • On the use of variable frame rate analysis in speech recognition
    • Q. Zhu and A. Alwan, "On the use of variable frame rate analysis in speech recognition," in Proc. IEEE ICASSP, 2000, pp. 3264-3267.
    • (2000) Proc. IEEE ICASSP , pp. 3264-3267
    • Zhu, Q.1    Alwan, A.2
  • 5
    • 0028739811
    • A new variable frame rate analysis method for speech recognition
    • Dec
    • P. Le Cerf and D. Van Compernolle, "A new variable frame rate analysis method for speech recognition," IEEE Signal Process. Lett., vol.1, no.12, pp. 185-187, Dec. 1994.
    • (1994) IEEE Signal Process. Lett. , vol.1 , Issue.12 , pp. 185-187
    • Le Cerf, P.1    Van Compernolle, D.2
  • 7
    • 0000500861
    • The use of variable frame rate analysis in speech recognition
    • K. M. Ponting and S. M. Peeling, "The use of variable frame rate analysis in speech recognition," Comput. Speech Lang., vol.5, no.2, pp. 169-179, 1991.
    • (1991) Comput. Speech Lang. , vol.5 , Issue.2 , pp. 169-179
    • Ponting, K.M.1    Peeling, S.M.2
  • 8
    • 42549121633
    • Singing voice recognition considering high-pitched and prolonged sounds
    • Florence, Italy
    • A. Sasou, "Singing voice recognition considering high-pitched and prolonged sounds," in Proc. EUSIPCO, Florence, Italy, 2006.
    • (2006) Proc. EUSIPCO
    • Sasou, A.1
  • 9
    • 4544286862
    • Entropy-based variable frame rate analysis of speech signals and its application to ASR
    • H. You, Q. Zhu, and A. Alwan, "Entropy-based variable frame rate analysis of speech signals and its application to ASR," in Proc. IEEE ICASSP, 2004, pp. 549-552.
    • (2004) Proc. IEEE ICASSP , pp. 549-552
    • You, H.1    Zhu, Q.2    Alwan, A.3
  • 10
    • 42549096974
    • An energy search approach to variable frame rate front-end processing for robust ASR
    • Lisbon, Portugal
    • J. Epps and E. Choi, "An energy search approach to variable frame rate front-end processing for robust ASR," in Proc. Eurospeech'05, Lisbon, Portugal, 2005.
    • (2005) Proc. Eurospeech'05
    • Epps, J.1    Choi, E.2
  • 11
    • 85009214271
    • Discriminative analysis for feature reduction in automatic speech recognition
    • E. L. Bocchieri and J. G. Wilpon, "Discriminative analysis for feature reduction in automatic speech recognition," in Proc. IEEE ICASSP, 1992, pp. 501-504.
    • (1992) Proc. IEEE ICASSP , pp. 501-504
    • Bocchieri, E.L.1    Wilpon, J.G.2
  • 12
    • 33847629729
    • On noise masking for automatic missing data speech recognition: A survey and discussion
    • Jul.
    • C. Cerisara, S. Demange, and J.-P. Haton, "On noise masking for automatic missing data speech recognition: A survey and discussion," Comput. Speech Lang., vol.21, no.3, pp. 443-457, Jul. 2007.
    • (2007) Comput. Speech Lang. , vol.21 , Issue.3 , pp. 443-457
    • Cerisara, C.1    Demange, S.2    Haton, J.-P.3
  • 13
    • 84892174007
    • Weighted Viterbi algorithm and state duration modelling for speech recognition in noise
    • Seattle, WA
    • N. B. Yoma, F. R. McInnes, and M. A. Jack, "Weighted Viterbi algorithm and state duration modelling for speech recognition in noise," in Proc. ICASSP'98, Seattle, WA, 1998, pp. 709-712.
    • (1998) Proc. ICASSP'98 , pp. 709-712
    • Yoma, N.B.1    McInnes, F.R.2    Jack, M.A.3
  • 14
    • 0242552300
    • Partial splicing packet loss concealment for distributed speech recognition
    • Oct
    • Z.-H. Tan, P. Dalsgaard, and B. Lindberg, "Partial splicing packet loss concealment for distributed speech recognition," IEE Electron. Lett., vol.39, no.22, pp. 1619-1620, Oct. 2003.
    • (2003) IEE Electron. Lett. , vol.39 , Issue.22 , pp. 1619-1620
    • Tan, Z.-H.1    Dalsgaard, P.2    Lindberg, B.3
  • 15
    • 42549131394
    • Exploiting temporal correlation of speech for error-robust and bandwidth-flexible distributed speech recognition
    • May
    • Z.-H. Tan, P. Dalsgaard, and B. Lindberg, "Exploiting temporal correlation of speech for error-robust and bandwidth-flexible distributed speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.4, pp. 1391-1403, May 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1391-1403
    • Tan, Z.-H.1    Dalsgaard, P.2    Lindberg, B.3
  • 16
    • 0038669544
    • The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Paris, France
    • H. G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR, Paris, France, 2000.
    • (2000) Proc. ISCA ITRW ASR
    • Hirsch, H.G.1    Pearce, D.2
  • 18
    • 85009201789
    • Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition
    • Geneva, Switzerland, Sep
    • J. Macias-Guarasa, J. Ordonez, J. M. Montero, J. Ferreiros, R. Cordoba, and L. F. D. Haro, "Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition," in Proc. Eurospeech'03, Geneva, Switzerland, Sep. 2003.
    • (2003) Proc. Eurospeech'03
    • Macias-Guarasa, J.1    Ordonez, J.2    Montero, J.M.3    Ferreiros, J.4    Cordoba, R.5    Haro, L.F.D.6
  • 19
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-27, no.2, pp. 113-120, Feb. 1979.
    • (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 20
    • 54349123450
    • A comparison of three non-linear observation models for noisy speech features
    • Geneva, Switzerland, Sep
    • J. Droppo, L. Deng, and A. Acero, "A comparison of three non-linear observation models for noisy speech features," in Proc. Eurospeech'03, Geneva, Switzerland, Sep. 2003.
    • (2003) Proc. Eurospeech'03
    • Droppo, J.1    Deng, L.2    Acero, A.3
  • 21
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • Apr
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-29, no.2, pp. 254-272, Apr. 1981.
    • (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 22
    • 0032141206
    • Cepstral domain segmental feature vector normalization for noise robust speech recognition
    • O. Viikki and K. Laurila, "Cepstral domain segmental feature vector normalization for noise robust speech recognition," Speech Commun., vol.25, no.1-3, pp. 133-147, 1998.
    • (1998) Speech Commun , vol.25 , Issue.1-3 , pp. 133-147
    • Viikki, O.1    Laurila, K.2
  • 23
    • 85009124169
    • Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition
    • Aalborg, Denmark, Sep
    • R. Sarikaya and J. H. L. Hansen, "Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition," in Proc. Eurospeech'01, Aalborg, Denmark, Sep. 2001.
    • (2001) Proc. Eurospeech'01
    • Sarikaya, R.1    Hansen, J.H.L.2
  • 24
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • Jul.
    • R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol.9, no.5, pp. 504-512, Jul. 2001.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 26
    • 0142009990
    • Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise
    • Oct
    • Q. Zhu and A. Alwan, "Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise," Comput., Speech, Lang., vol.17, no.4, pp. 381-402, Oct. 2003.
    • (2003) Comput., Speech, Lang. , vol.17 , Issue.4 , pp. 381-402
    • Zhu, Q.1    Alwan, A.2
  • 27
    • 85017287487
    • Linear discriminant analysis for improved large vocabulary continuous speech recognition
    • San Francisco, CA
    • R. Haeb-Umbach and H. Ney, "Linear discriminant analysis for improved large vocabulary continuous speech recognition," in Proc. ICASSP'92, San Francisco, CA, 1992, pp. 13-16.
    • (1992) Proc. ICASSP'92 , pp. 13-16
    • Haeb-Umbach, R.1    Ney, H.2
  • 28
    • 4544321132
    • Efficient and robust distributed speech recognition (DSR) over wireless fading channels: 2D-DCT compression, iterative bit allocation, short BCH code and interleaving
    • Montreal, QC, Canada
    • W.-H. Hsu and L.-S. Lee, "Efficient and robust distributed speech recognition (DSR) over wireless fading channels: 2D-DCT compression, iterative bit allocation, short BCH code and interleaving," in Proc. IEEE ICASSP'04, Montreal, QC, Canada, 2004, pp. 69-72.
    • (2004) Proc. IEEE ICASSP'04 , pp. 69-72
    • Hsu, W.-H.1    Lee, L.-S.2
  • 29
    • 67650275632
    • A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition
    • Antwerp, Belgium
    • B. J. Borgstrom and A. Alwan, "A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition," in Proc. Interspeech'07, Antwerp, Belgium, 2007.
    • (2007) Proc. Interspeech'07
    • Borgstrom, B.J.1    Alwan, A.2
  • 30
    • 10744220144
    • A new Kullback-Leibler VAD for speech recognition in noise
    • Feb.
    • J. Ramirez, C. Segura, C. Benitez, A. Torre, and A. Rubio, "A new Kullback-Leibler VAD for speech recognition in noise," IEEE Signal Process. Lett., vol.11, no.2, pp. 266-269, Feb. 2004.
    • (2004) IEEE Signal Process. Lett. , vol.11 , Issue.2 , pp. 266-269
    • Ramirez, J.1    Segura, C.2    Benitez, C.3    Torre, A.4    Rubio, A.5
  • 31
    • 85009078216
    • Entropy based voice activity detection in very noisy conditions
    • Aalborg, Denmark, Sep
    • P. Renevey and A. Drygajlo, "Entropy based voice activity detection in very noisy conditions," in Proc. Eurospeech'01, Aalborg, Denmark, Sep. 2001.
    • (2001) Proc. Eurospeech'01
    • Renevey, P.1    Drygajlo, A.2
  • 32
    • 20844456665
    • A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems
    • D. Vlaj, B. Kotnik, B. Horvat, and Z. Kacic, "A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems," EURASIP J. Appl. Signal Process., vol.4, pp. 487-497, 2005.
    • (2005) EURASIP J. Appl. Signal Process. , vol.4 , pp. 487-497
    • Vlaj, D.1    Kotnik, B.2    Horvat, B.3    Kacic, Z.4
  • 33
    • 51449114537
    • Applying support vector machines to voice activity detection
    • Denver, CO
    • E. Dong, G. Liu, Y. Zhou, and X. Zhang, "Applying support vector machines to voice activity detection," in Proc. ICSLP'02, Denver, CO, 2002.
    • (2002) Proc. ICSLP'02
    • Dong, E.1    Liu, G.2    Zhou, Y.3    Zhang, X.4
  • 34
    • 70749138955
    • Robust voiced/unvoiced classification using novel features and Gaussian mixture model
    • Montreal, QC, Canada
    • J. K. Shah, A. N. Iyer, B. Y. Smolenski, and R. E. Yantorno, "Robust voiced/unvoiced classification using novel features and Gaussian mixture model," in Proc. ICASSP'04, Montreal, QC, Canada, 2004.
    • (2004) Proc. ICASSP'04
    • Shah, J.K.1    Iyer, A.N.2    Smolenski, B.Y.3    Yantorno, R.E.4
  • 35
    • 0033693061
    • Speech/non-speech classification using multiple features for robust endpoint detection
    • Orlando, FL
    • W.-H. Shin, B.-S. Lee, Y.-K. Lee, and J.-S. Lee, "Speech/non-speech classification using multiple features for robust endpoint detection," in Proc. ICASSP'02, Orlando, FL, 2002, pp. 1399-1402.
    • (2002) Proc. ICASSP'02 , pp. 1399-1402
    • Shin, W.-H.1    Lee, B.-S.2    Lee, Y.-K.3    Lee, J.-S.4
  • 36
    • 77956761247
    • Speech processing, transmission and quality aspects (STQ), distributed speech recognition, advanced front-end feature extraction algorithm, compression algorithm, ETSI, ES 202 050 v1.1.1, 2002.
  • 37
    • 77956751487
    • Coding of speech at 8 kbit/s using conjugate structure algebraic code-excited linear-prediction (CS-ACELP) Annex B: A silence compression scheme, ITU, ITU Recommendation G.729, 1996.
  • 39
    • 51449100230
    • A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme
    • Las Vegas, NV
    • M. Fujimoto, K. Ishizuka, and T. Nakatani, "A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme," in Proc. ICASSP'08, Las Vegas, NV, 2008, pp. 4441-4444.
    • (2008) Proc. ICASSP'08 , pp. 4441-4444
    • Fujimoto, M.1    Ishizuka, K.2    Nakatani, T.3
  • 40
    • 77950091897
    • Voice activity detection based on statistical models and machine learning approaches
    • J. W. Shin, J.-H. Chang, and N. S. Kim, "Voice activity detection based on statistical models and machine learning approaches," Comput., Speech, Lang., vol.24, no.3, pp. 515-530, 2010.
    • (2010) Comput., Speech, Lang. , vol.24 , Issue.3 , pp. 515-530
    • Shin, J.W.1    Chang, J.-H.2    Kim, N.S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.