SCOPUS Information Retrieval Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 4, Issue 5, 2010, Pages 834-844

Long-term spectro-temporal and static harmonic features for voice activity detection

(3) Fukuda, Takashi ᵃ; Ichikawa, Osamu ᵃ; Nishimura, Masafumi ᵃ

ᵃ IBM Research Tokyo (Japan)

Author keywords

Average phoneme duration; harmonic structure; long term temporal information; voice activity detection (VAD)

Indexed keywords

ACOUSTIC INFORMATION; AUTOMATIC SPEECH RECOGNITION SYSTEM; AVERAGE PHONEME DURATION; CEPSTRAL DOMAIN; ERROR REDUCTION; HARMONIC STRUCTURE; HARMONIC STRUCTURES; HUMAN VOICE; LOW SNR; MODEL-BASED; NOISE ROBUSTNESS; STATISTICAL MODELS; STRUCTURE-BASED; TEMPORAL FEATURES; TEMPORAL INFORMATION; VOICE ACTIVITY DETECTION; WORD ERROR RATE;

HARMONIC ANALYSIS;

SPEECH RECOGNITION;

EID: 77956739501
PISSN: 19324553
EISSN: None
Source Type: Journal
DOI: 10.1109/JSTSP.2010.2069750
Document Type: Article

Times cited: 42

References (26)

1
- 0442270734
- ITU-T Rec. G.729-Annex B
- "A silence compression scheme for G.729 optimized for terminals conforming to Rec. V.70," 1996, ITU-T Rec. G.729-Annex B.
- (1996) A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Rec. V.70

2
- 0442317753
- ETSI EN 301 708 Rec.
- "Voice activity detector (VAD) for adaptive multi-rate (AMR) speech traffic channels," 1999, ETSI EN 301 708 Rec.
- (1999) Voice Activity Detector (VAD) for Adaptive Multi-rate (AMR) Speech Traffic Channels

3
- 0442317754
- ETSI ES 202 050 Rec.
- "Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithm," 2002, ETSI ES 202 050 Rec.
- (2002) Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithm

4
- 0032762471
- A statistical model-based voice activity detection
- J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol.6, no.1, pp. 1-3, Jan. 1999.
- (1999) IEEE Signal Process. Lett. , vol.6 , Issue.1 , pp. 1-3
- Sohn, J.¹ Kim, N.S.² Sung, W.³

5
- 0035481845
- Analysis and improvement of a statistical model-based voice activity detector
- Y. D. Cho and A. Kondoz, "Analysis and improvement of a statistical model-based voice activity detector," IEEE Signal Process. Lett., vol.8, no.10, pp. 276-278, Oct. 2001.
- (2001) IEEE Signal Process. Lett. , vol.8 , Issue.10 , pp. 276-278
- Cho, Y.D.¹ Kondoz, A.²

6
- 84867208777
- Study of integration of statistical model-based voice activity detection and noise suppression
- M. Fujimoto, K. Ishizuka, and T. Nakatani, "Study of integration of statistical model-based voice activity detection and noise suppression," Proc. Interspeech, pp. 2008-2011, 2008.
- (2008) Proc. Interspeech , pp. 2008-2011
- Fujimoto, M.¹ Ishizuka, K.² Nakatani, T.³

7
- 0034854659
- Robust speech/non-speech detection using LDA applied to MFCC
- A. Martin, D. Charlet, and L. Mauuary, "Robust speech/non-speech detection using LDA applied to MFCC," in Proc. ICASSP, 2001, vol.I, pp. 237-240.
- (2001) Proc. ICASSP , vol.1 , pp. 237-240
- Martin, A.¹ Charlet, D.² Mauuary, L.³

8
- 27644475276
- An improved voice activity detection using higher order statistics
- K. Li, M. N. S. Swamy, and M. O. Ahmad, "An improved voice activity detection using higher order statistics," IEEE Trans. Speech Audio Process., vol.13, no.5, pp. 965-974, May 2005.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 965-974
- Li, K.¹ Swamy, M.N.S.² Ahmad, M.O.³

9
- 33947627138
- Robust endpoint detection for speech recognition based on discriminative feature extraction
- K. Yamamoto, F. Jabloun, K. Reinhard, and A. Kawamura, "Robust endpoint detection for speech recognition based on discriminative feature extraction," in Proc. ICASSP, 2006, vol.I, pp. 805-808.
- (2006) Proc. ICASSP , vol.1 , pp. 805-808
- Yamamoto, K.¹ Jabloun, F.² Reinhard, K.³ Kawamura, A.⁴

10
- 0032658253
- TRAPS-Classifiers of temporal patterns
- H. Hermansky and S. Sharma, "TRAPS-Classifiers of temporal patterns," in Proc. ICASSP, 1999, vol.I, pp. 289-292.
- (1999) Proc. ICASSP , vol.1 , pp. 289-292
- Hermansky, H.¹ Sharma, S.²

11
- 84867218137
- Short- and long-term dynamic features for robust speech recognition
- T. Fukuda, O. Ichikawa, and M. Nishimura, "Short- and long-term dynamic features for robust speech recognition," Proc. Interspeech, pp. 2262-2265, 2008.
- (2008) Proc. Interspeech , pp. 2262-2265
- Fukuda, T.¹ Ichikawa, O.² Nishimura, M.³

12
- 1842476689
- Efficient voice activity detection algorithms using long-term speech information
- J. Ramirez, J. C. Segura, C. Benitez, A. Torre, and A. Rubio, "Efficient voice activity detection algorithms using long-term speech information," Speech Commun., vol.42, pp. 271-287, 2004.
- (2004) Speech Commun , vol.42 , pp. 271-287
- Ramirez, J.¹ Segura, J.C.² Benitez, C.³ Torre, A.⁴ Rubio, A.⁵

13
- 51449092667
- Robust automatic continuous-speech recognition based on a voiced-unvoiced decision
- H. Tolba and D. O'Shaughnessy, "Robust automatic continuous-speech recognition based on a voiced-unvoiced decision," Proc. ICSLP, 1998, paper 0342.
- (1998) Proc. ICSLP , paper 0342
- Tolba, H.¹ O'Shaughnessy, D.²

14
- 0034841228
- Perceptual harmonic cepstral coefficients for speech recognition in noisy environment
- L. Gu and K. Rose, "Perceptual harmonic cepstral coefficients for speech recognition in noisy environment," in Proc. ICASSP, 2001, vol.1, pp. 125-128.
- (2001) Proc. ICASSP , vol.1 , pp. 125-128
- Gu, L.¹ Rose, K.²

15
- 85164649882
- Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio
- K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki, "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio," Proc. Interspeech, pp. 230-233, 2007.
- (2007) Proc. Interspeech , pp. 230-233
- Ishizuka, K.¹ Nakatani, T.² Fujimoto, M.³ Miyazaki, N.⁴

16
- 77956755408
- Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection
- Y. Guo, Q. Fu, and Y. Yan, "Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection," Proc. Interspeech, pp. 2949-2952, 2007.
- (2007) Proc. Interspeech , pp. 2949-2952
- Guo, Y.¹ Fu, Q.² Yan, Y.³

17
- 0021645331
- Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-32, no.6, pp. 1109-1121, Dec. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

18
- 0027957839
- Effect of temporal envelope smearing on speech reception
- R. Drullman, J. M. Festen, and R. Plomp, "Effect of temporal envelope smearing on speech reception," J. Acoust. Soc. Amer., vol.95, pp. 1053-1064, 1994.
- (1994) J. Acoust. Soc. Amer. , vol.95 , Issue.2 , pp. 1053-1064
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

19
- 0028287770
- Effect of reducing slow temporal modulations on speech reception
- R. Drullman, J. M. Festen, and R. Plomp, "Effect of reducing slow temporal modulations on speech reception," J. Acoust. Soc. Amer., vol.95, pp. 2670-2680, 1994.
- (1994) J. Acoust. Soc. Amer. , vol.95 , pp. 2670-2680
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

20
- 0032676337
- On the relative importance of various components of the modulation spectrum for automatic speech recognition
- N. Kanedera, T. Arai, H. Hermansky, and M. Pavel, "On the relative importance of various components of the modulation spectrum for automatic speech recognition," Speech Commun., vol.28, no.1, pp. 43-55, 1999.
- (1999) Speech Commun , vol.28 , Issue.1 , pp. 43-55
- Kanedera, N.¹ Arai, T.² Hermansky, H.³ Pavel, M.⁴

21
- 0038694713
- The analysis of speech in different temporal integration windows: Cerebral lateralization as asymmetric sampling in time
- D. Poeppel, "The analysis of speech in different temporal integration windows: Cerebral lateralization as asymmetric sampling in time," Speech Commun., vol.41, pp. 245-255, 2003.
- (2003) Speech Commun , vol.41 , pp. 245-255
- Poeppel, D.¹

22
- 51449116408
- Local peak enhancement combined with noise reduction algorithms for robust automatic speech recognition in automobiles
- O. Ichikawa, T. Fukuda, and M. Nishimura, "Local peak enhancement combined with noise reduction algorithms for robust automatic speech recognition in automobiles," in Proc. IEEE ICASSP, 2008, pp. 4865-4868.
- (2008) Proc. IEEE ICASSP , pp. 4865-4868
- Ichikawa, O.¹ Fukuda, T.² Nishimura, M.³

23
- 44849110632
- Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance
- N. Kitaoka, K. Yamamoto, T. Kusamizu, S. Nakagawa, T. Yamada, S. Tsuge, C. Miyajima, T. Nishiura, M. Nakayama, Y. Denda, M. Fujimoto, T. Takiguchi, S. Tamura, S. Kuroiwa, K. Takeda, and S. Nakamura, "Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance," in Proc. IEEE Workshop ASRU, 2007, pp. 607-612.
- (2007) Proc. IEEE Workshop ASRU , pp. 607-612
- Kitaoka, N.¹ Yamamoto, K.² Kusamizu, T.³ Nakagawa, S.⁴ Yamada, T.⁵ Tsuge, S.⁶ Miyajima, C.⁷ Nishiura, T.⁸ Nakayama, M.⁹ Denda, Y.¹⁰ Fujimoto, M.¹¹ Takiguchi, T.¹² Tamura, S.¹³ Kuroiwa, S.¹⁴ Takeda, K.¹⁵ Nakamura, S.¹⁶

24
- 33646799204
- Data collection and evaluation of AURORA-2 Japanese corpus
- S. Nakamura, K. Yamamoto, K. Takeda, S. Kuroiwa, N. Kitaoka, T. Yamada, M. Mizumachi, T. Nishiura, M. Fujimoto, A. Saso, and T. Endo, "Data collection and evaluation of AURORA-2 Japanese corpus," in Proc. IEEE Workshop ASRU, 2003, pp. 619-623.
- (2003) Proc. IEEE Workshop ASRU , pp. 619-623
- Nakamura, S.¹ Yamamoto, K.² Takeda, K.³ Kuroiwa, S.⁴ Kitaoka, N.⁵ Yamada, T.⁶ Mizumachi, M.⁷ Nishiura, T.⁸ Fujimoto, M.⁹ Saso, A.¹⁰ Endo, T.¹¹

25
- 0037401288
- Toward improving speech detection robustness for speech recognition in adverse environment
- L. Karray and A. Martin, "Toward improving speech detection robustness for speech recognition in adverse environment," Speech Commun., vol.40, pp. 261-276, 2003.
- (2003) Speech Commun , vol.40 , pp. 261-276
- Karray, L.¹ Martin, A.²

26
- 44849131058
- Censrec2: Corpus and evaluation environments for in car continuous digit speech recognition
- S. Nakamura, M. Fujimoto, and K. Takeda, "Censrec2: Corpus and evaluation environments for in car continuous digit speech recognition," Proc. Interspeech, pp. 2330-2333, 2006.
- (2006) Proc. Interspeech , pp. 2330-2333
- Nakamura, S.¹ Fujimoto, M.² Takeda, K.³

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS DB.