SCOPUS Information Search Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 4, Issue 5, 2010, Pages 798-807

Low-complexity variable frame rate analysis for speech recognition and voice activity detection

Author keywords

Distributed speech recognition; frame selection; noise robust speech recognition; variable frame rate; voice activity detection (VAD)

Indexed keywords

DISTRIBUTED SPEECH RECOGNITION; FRAME SELECTION; NOISE ROBUST SPEECH RECOGNITION; VARIABLE FRAME RATE; VOICE ACTIVITY DETECTION; LINGUISTICS; SIGNAL DETECTION; SIGNAL TO NOISE RATIO; SPEECH PROCESSING; SPEECH RECOGNITION

EID: 77956733652 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2010.2057192 Document Type: Article

Times cited : (93)

References (40)

1
- 77956774989
- Automatic Speech Recognition on Mobile Devices and Over Communication Networks, Z.-H. Tan and B. Lindberg, Eds.. London, U.K.: Springer-Verlag, 2008
- Automatic Speech Recognition on Mobile Devices and Over Communication Networks, Z.-H. Tan and B. Lindberg, Eds.. London, U.K.: Springer-Verlag, 2008.

2
- 51449115700
- Embedded speech recognition applications in mobile phones: Status, trends, and challenges
- Las Vegas, NV
- J. Cohen, "Embedded speech recognition applications in mobile phones: Status, trends, and challenges," in Proc. ICASSP'08, Las Vegas, NV, 2008, pp. 5352-5355.
- (2008) Proc. ICASSP'08 , pp. 5352-5355
- Cohen, J.¹

3
- 0003424145
- 2nd ed. New York: Wiley-IEEE Press
- J. Deller, J. Hansen, and J. Proakis, Discrete-Time Processing of Speech Signals, 2nd ed. New York: Wiley-IEEE Press, 1999.
- (1999) Discrete-Time Processing of Speech Signals
- Deller, J.¹ Hansen, J.² Proakis, J.³

4
- 0033690878
- On the use of variable frame rate analysis in speech recognition
- Q. Zhu and A. Alwan, "On the use of variable frame rate analysis in speech recognition," in Proc. IEEE ICASSP, 2000, pp. 3264-3267.
- (2000) Proc. IEEE ICASSP , pp. 3264-3267
- Zhu, Q.¹ Alwan, A.²

5
- 0028739811
- A new variable frame rate analysis method for speech recognition
- Dec
- P. Le Cerf and D. Van Compernolle, "A new variable frame rate analysis method for speech recognition," IEEE Signal Process. Lett., vol.1, no.12, pp. 185-187, Dec. 1994.
- (1994) IEEE Signal Process. Lett. , vol.1 , Issue.12 , pp. 185-187
- Le Cerf, P.¹ Van Compernolle, D.²

6
- 78649280037
- Optimal frame rate analysis for speech recognition
- Dec
- S. J. Young and D. Rainton, "Optimal frame rate analysis for speech recognition," in Proc. IEE Colloquium on Techniques for Speech Process., Dec. 1990.
- (1990) Proc. IEE Colloquium on Techniques for Speech Process.
- Young, S.J.¹ Rainton, D.²

7
- 0000500861
- The use of variable frame rate analysis in speech recognition
- K. M. Pointing and S. M. Peeling, "The use of variable frame rate analysis in speech recognition," Comput. Speech Lang., vol.5, no.2, pp. 169-179, 1991.
- (1991) Comput. Speech Lang. , vol.5 , Issue.2 , pp. 169-179
- Pointing, K.M.¹ Peeling, S.M.²

8
- 42549121633
- Singing voice recognition considering high-pitched and prolonged sounds
- Florence, Italy
- A. Sasou, "Singing voice recognition considering high-pitched and prolonged sounds," in Proc. EUSIPCO, Florence, Italy, 2006.
- (2006) Proc. EUSIPCO
- Sasou, A.¹

9
- 4544286862
- Entropy-based variable frame rate analysis of speech signals and its application to ASR
- H. You, Q. Zhu, and A. Alwan, "Entropy-based variable frame rate analysis of speech signals and its application to ASR," in Proc. IEEE ICASSP, 2004, pp. 549-552.
- (2004) Proc. IEEE ICASSP , pp. 549-552
- You, H.¹ Zhu, Q.² Alwan, A.³

10
- 42549096974
- An energy search approach to variable frame rate front-end processing for robust ASR
- Lisbon, Portugal
- J. Epps and E. Choi, "An energy search approach to variable frame rate front-end processing for robust ASR," in Proc. Eurospeech'05, Lisbon, Portugal, 2005.
- (2005) Proc. Eurospeech'05
- Epps, J.¹ Choi, E.²

11
- 85009214271
- Discriminative analysis for feature reduction in automatic speech recognition
- E. L. Bocchieri and J. G. Wilpon, "Discriminative analysis for feature reduction in automatic speech recognition," in Proc. IEEE ICASSP, 1992, pp. 501-504.
- (1992) Proc. IEEE ICASSP , pp. 501-504
- Bocchieri, E.L.¹ Wilpon, J.G.²

12
- 33847629729
- On noise masking for automatic missing data speech recognition: A survey and discussion
- Jul.
- C. Cerisara, S. Demange, and J.-P. Haton, "On noise masking for automatic missing data speech recognition: A survey and discussion," Comput. Speech Lang., vol.21, no.3, pp. 443-457, Jul. 2007.
- (2007) Comput. Speech Lang. , vol.21 , Issue.3 , pp. 443-457
- Cerisara, C.¹ Demange, S.² Haton, J.-P.³

13
- 84892174007
- Weighted Viterbi algorithm and state duration modelling for speech recognition in noise
- Seattle, WA
- N. B. Yoma, F. R. McInnes, and M. A. Jack, "Weighted Viterbi algorithm and state duration modelling for speech recognition in noise," in Proc. ICASSP'98, Seattle, WA, 1998, pp. 709-712.
- (1998) Proc. ICASSP'98 , pp. 709-712
- Yoma, N.B.¹ McInnes, F.R.² Jack, M.A.³

14
- 0242552300
- Partial splicing packet loss concealment for distributed speech recognition
- Oct
- Z.-H. Tan, P. Dalsgaard, and B. Lindberg, "Partial splicing packet loss concealment for distributed speech recognition," IEE Electron. Lett., vol.39, no.22, pp. 1619-1620, Oct. 2003.
- (2003) IEE Electron. Lett. , vol.39 , Issue.22 , pp. 1619-1620
- Tan, Z.-H.¹ Dalsgaard, P.² Lindberg, B.³

15
- 42549131394
- Exploiting temporal correlation of speech for error-robust and bandwidth-flexible distributed speech recognition
- May
- Z.-H. Tan, P. Dalsgaard, and B. Lindberg, "Exploiting temporal correlation of speech for error-robust and bandwidth-flexible distributed speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.4, pp. 1391-1403, May 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1391-1403
- Tan, Z.-H.¹ Dalsgaard, P.² Lindberg, B.³

16
- 0038669544
- The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Paris, France
- H. G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR, Paris, France, 2000.
- (2000) Proc. ISCA ITRW ASR
- Hirsch, H.G.¹ Pearce, D.²

17
- 0003483593
- Cambridge, U.K.: Cambridge Univ. Speech Group
- S. J. Young et al., HTK: Hidden Markov Model Toolkit V3.2.1, Reference Manual. Cambridge, U.K.: Cambridge Univ. Speech Group, 2004.
- (2004) HTK: Hidden Markov Model Toolkit V3.2.1, Reference Manual
- Young, S.J.¹

18
- 85009201789
- Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition
- Geneva, Switzerland, Sep
- J. Macias-Guarasa, J. Ordonez, J. M. Montero, J. Ferreiros, R. Cordoba, and L. F. D. Haro, "Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition," in Proc. Eurospeech'03, Geneva, Switzerland, Sep. 2003.
- (2003) Proc. Eurospeech'03
- Macias-Guarasa, J.¹ Ordonez, J.² Montero, J.M.³ Ferreiros, J.⁴ Cordoba, R.⁵ Haro, L.F.D.⁶

19
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-27, no.2, pp. 113-120, Feb. 1979. (Pubitemid 9467471)
- (1979) IEEE Trans Acoust Speech Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
- Boll, S.F.¹

20
- 54349123450
- A comparison of three non-linear observation models for noisy speech features
- Geneva, Switzerland, Sep
- J. Droppo, L. Deng, and A. Acero, "A comparison of three non-linear observation models for noisy speech features," in Proc. Eurospeech'03, Geneva, Switzerland, Sep. 2003.
- (2003) Proc. Eurospeech'03
- Droppo, J.¹ Deng, L.² Acero, A.³

21
- 0019555090
- Cepstral analysis technique for automatic speaker verification
- Apr
- S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-29, no.2, pp. 254-272, Apr. 1981.
- (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.2 , pp. 254-272
- Furui, S.¹

22
- 0032141206
- Cepstral domain segmental feature vector normalization for noise robust speech recognition
- O. Viikki and K. Laurila, "Cepstral domain segmental feature vector normalization for noise robust speech recognition," Speech Commun., vol.25, no.1-3, pp. 133-147, 1998.
- (1998) Speech Commun , vol.25 , Issue.1-3 , pp. 133-147
- Viikki, O.¹ Laurila, K.²

23
- 85009124169
- Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition
- Aalborg, Denmark, Sep
- R. Sarikaya and J. H. L. Hansen, "Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition," in Proc. Eurospeech'01, Aalborg, Denmark, Sep. 2001.
- (2001) Proc. Eurospeech'01
- Sarikaya, R.¹ Hansen, J.H.L.²

24
- 0035396555
- Noise power spectral density estimation based on optimal smoothing and minimum statistics
- Jul.
- R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol.9, no.5, pp. 504-512, Jul. 2001.
- (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.5 , pp. 504-512
- Martin, R.¹

25
- 42549139762
- MVA processing of speech features
- Jan
- C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.1, pp. 257-270, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 257-270
- Chen, C.-P.¹ Bilmes, J.A.²

26
- 0142009990
- Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise
- Oct
- Q. Zhu and A. Alwan, "Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise," Comput., Speech, Lang., vol.17, no.4, pp. 381-402, Oct. 2003.
- (2003) Comput., Speech, Lang. , vol.17 , Issue.4 , pp. 381-402
- Zhu, Q.¹ Alwan, A.²

27
- 85017287487
- Linear discriminant analysis for improved large vocabulary continuous speech recognition
- San Francisco, CA
- R. Haeb-Umbach and H. Ney, "Linear discriminant analysis for improved large vocabulary continuous speech recognition," in Proc. ICASSP'92, San Francisco, CA, 1992, pp. 13-16.
- (1992) Proc. ICASSP'92 , pp. 13-16
- Haeb-Umbach, R.¹ Ney, H.²

28
- 4544321132
- Efficient and robust distributed speech recognition (DSR) over wireless fading channels: 2D-DCT compression, iterative bit allocation, short BCH code and interleaving
- Montreal, QC, Canada
- W.-H. Hsu and L.-S. Lee, "Efficient and robust distributed speech recognition (DSR) over wireless fading channels: 2D-DCT compression, iterative bit allocation, short BCH code and interleaving," in Proc. IEEE ICASSP'04, Montreal, QC, Canada, 2004, pp. 69-72.
- (2004) Proc. IEEE ICASSP'04 , pp. 69-72
- Hsu, W.-H.¹ Lee, L.-S.²

29
- 67650275632
- A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition
- Antwerp, Belgium
- B. J. Borgstrom and A. Alwan, "A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition," in Proc. Interspeech'07, Antwerp, Belgium, 2007.
- (2007) Proc. Interspeech'07
- Borgstrom, B.J.¹ Alwan, A.²

30
- 10744220144
- A new Kullback-Leibler VAD for speech recognition in noise
- Feb.
- J. Ramirez, C. Segura, C. Benitez, A. Torre, and A. Rubio, "A new Kullback-Leibler VAD for speech recognition in noise," IEEE Signal Process. Lett., vol.11, no.2, pp. 266-269, Feb. 2004.
- (2004) IEEE Signal Process. Lett. , vol.11 , Issue.2 , pp. 266-269
- Ramirez, J.¹ Segura, C.² Benitez, C.³ Torre, A.⁴ Rubio, A.⁵

31
- 85009078216
- Entropy based voice activity detection in very noisy conditions
- Aalborg, Denmark, Sep
- P. Renevey and A. Drygajlo, "Entropy based voice activity detection in very noisy conditions," in Proc. Eurospeech'01, Aalborg, Denmark, Sep. 2001.
- (2001) Proc. Eurospeech'01
- Renevey, P.¹ Drygajlo, A.²

32
- 20844456665
- A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems
- D. Vlaj, B. Kotnik, B. Horvat, and Z. Kacic, "A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems," EURASIP J. Appl. Signal Process., vol.4, pp. 487-497, 2005.
- (2005) EURASIP J. Appl. Signal Process. , vol.4 , pp. 487-497
- Vlaj, D.¹ Kotnik, B.² Horvat, B.³ Kacic, Z.⁴

33
- 51449114537
- Applying support vector machines to voice activity detection
- Denver, CO
- E. Dong, G. Liu, Y. Zhou, and X. Zhang, "Applying support vector machines to voice activity detection," in Proc. ICSLP'02, Denver, CO, 2002.
- (2002) Proc. ICSLP'02
- Dong, E.¹ Liu, G.² Zhou, Y.³ Zhang, X.⁴

34
- 70749138955
- Robust voiced/unvoiced classification using novel features and Gaussian mixture model
- Montreal, QC, Canada
- J. K. Shah, A. N. Iyer, B. Y. Smolenski, and R. E. Yantorno, "Robust voiced/unvoiced classification using novel features and Gaussian mixture model," in Proc. ICASSP'04, Montreal, QC, Canada, 2004.
- (2004) Proc. ICASSP'04
- Shah, J.K.¹ Iyer, A.N.² Smolenski, B.Y.³ Yantorno, R.E.⁴

35
- 0033693061
- Speech/non-speech classification using multiple features for robust endpoint detection
- Orlando, FL
- W.-H. Shin, B.-S. Lee, Y.-K. Lee, and J.-S. Lee, "Speech/non-speech classification using multiple features for robust endpoint detection," in Proc. ICASSP'02, Orlando, FL, 2002, pp. 1399-1402.
- (2002) Proc. ICASSP'02 , pp. 1399-1402
- Shin, W.-H.¹ Lee, B.-S.² Lee, Y.-K.³ Lee, J.-S.⁴

36
- 77956761247
- Speech processing, transmission and quality aspects (STQ), distributed speech recognition, advanced front-end feature extraction algorithm, compression algorithm, ETSI, ES 202 050 v1.1.1, 2002
- Speech processing, transmission and quality aspects (STQ), distributed speech recognition, advanced front-end feature extraction algorithm, compression algorithm, ETSI, ES 202 050 v1.1.1, 2002.

37
- 77956751487
- Coding of speech at 8 kbit/s using conjugate structure algebraic code-excited linear-prediction (CS-ACELP) Annex B: A silence compression scheme, ITU, ITU Recommendation G.729, 1996
- Coding of speech at 8 kbit/s using conjugate structure algebraic code-excited linear-prediction (CS-ACELP) Annex B: A silence compression scheme, ITU, ITU Recommendation G.729, 1996.

38
- 0003657168
- ITU, ITU Recommendation G.723.1, Annex A: Silence compression scheme
- ITU, ITU Recommendation G.723.1, Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. Annex A: Silence compression scheme, 1996.
- (1996) Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 Kbit/s

39
- 51449100230
- A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme
- Las Vegas, NV
- M. Fujimoto, K. Ishizuka, and T. Nakatani, "A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme," in Proc. ICASSP'08, Las Vegas, NV, 2008, pp. 4441-4444.
- (2008) Proc. ICASSP'08 , pp. 4441-4444
- Fujimoto, M.¹ Ishizuka, K.² Nakatani, T.³

40
- 77950091897
- Voice activity detection based on statistical models and machine learning approaches
- J. W. Shin, J.-H. Chang, and N. S. Kim, "Voice activity detection based on statistical models and machine learning approaches," Comput., Speech, Lang., vol.24, no.3, pp. 515-530, 2010.
- (2010) Comput., Speech, Lang. , vol.24 , Issue.3 , pp. 515-530
- Shin, J.W.¹ Chang, J.-H.² Kim, N.S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.