SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2013, Pages 704-708

A robust frontend for VAD: Exploiting contextual, discriminative and spectral cues of human voice

(3) Van Segbroeck, Maarten ᵃ; Tsiartas, Andreas ᵃ; Narayanan, Shrikanth S. ᵃ

ᵃ University of Southern California (United States)

Author keywords

Noise robust features; Speech activity detection

Indexed keywords

SIGNAL PROCESSING; SPEECH COMMUNICATION; SPEECH PROCESSING;

MULTI-LAYER PERCEPTRON CLASSIFIERS; NOISE ROBUST; PERIODICITY STRUCTURES; ROBUST SIGNAL PROCESSING; SPECTRAL VARIABILITY; SPECTRO-TEMPORAL MODULATIONS; SPEECH ACTIVITY DETECTIONS; VOICE ACTIVITY DETECTION;

CONTINUOUS SPEECH RECOGNITION;

EID: 84906246377 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (62)

References (35)

1
- 33745188004
- Voice activity detection based on optimally weighted combination of multiple features
- Y. Kida and T. Kawahara, "Voice activity detection based on optimally weighted combination of multiple features," in Proc. Interspeech, 2005, pp. 2621-2624.
- (2005) Proc. Interspeech , pp. 2621-2624
- Kida, Y.¹ Kawahara, T.²

2
- 51449100230
- A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme
- M. Fujimoto, K. Ishizuka, and T. Nakatani, "A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme," in Proc. ICASSP, 2008.
- (2008) Proc. ICASSP
- Fujimoto, M.¹ Ishizuka, K.² Nakatani, T.³

3
- 84878535284
- Developing a speech activity detection system for the DARPA RATS program
- T. Ng, B. Zhang, L. Nguyen, S. Matsoukas, X. Zhou, N. Mesgarani, K. Vesely, and P. Matejka, "Developing a speech activity detection system for the DARPA RATS program," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Ng, T.¹ Zhang, B.² Nguyen, L.³ Matsoukas, S.⁴ Zhou, X.⁵ Mesgarani, N.⁶ Vesely, K.⁷ Matejka, P.⁸

4
- 84878590831
- Acoustic and data-driven features for robust speech activity detection
- A. Thomas, S. Mallidi, T. Janu, H. Hermansky, N. Mesgarani, X. Zhou, S. Shamma, T. Ng, B. Zhang, L. Nguyen, and S. Matsoukas, "Acoustic and data-driven features for robust speech activity detection," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Thomas, A.¹ Mallidi, S.² Janu, T.³ Hermansky, H.⁴ Mesgarani, N.⁵ Zhou, X.⁶ Shamma, S.⁷ Ng, T.⁸ Zhang, B.⁹ Nguyen, L.¹⁰ Matsoukas, S.¹¹

5
- 85073251381
- The RATS radio traffic collection system
- K. Walker and S. Strassel, "The RATS radio traffic collection system," in Odyssey 2012-The Speaker and Language Recognition Workshop, 2012.
- (2012) Odyssey 2012-The Speaker and Language Recognition Workshop
- Walker, K.¹ Strassel, S.²

6
- 0019053271
- Comparison of parametric representations for monosyllabic word recognitions in continuously spoken sentences
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognitions in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug. 1980.
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , Issue.4 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

7
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
- (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

8
- 34547499683
- Incorporating auditory feature uncertainties in robust speaker identification
- Y. Shao, S. Srinivasan, and D. Wang, "Incorporating auditory feature uncertainties in robust speaker identification," in Proc. ICASSP, 2002, pp. 277-280.
- (2002) Proc. ICASSP , pp. 277-280
- Shao, Y.¹ Srinivasan, S.² Wang, D.³

9
- 84890520795
- Power-normalized cepstral coefficients (PNCC) for robust speech recognition
- C. Kim and R. M. Stern, "Power-normalized cepstral coefficients (PNCC) for robust speech recognition," in Proc. ICASSP, 2012.
- (2012) Proc. ICASSP
- Kim, C.¹ Stern, R.M.²

10
- 70349223037
- An auditory-based feature for robust speech recognition
- Y. Shao, Z. Jin, D. L. Wang, and S. Srinivasan, "An auditory-based feature for robust speech recognition," in Proc. ICASSP, 2009.
- (2009) Proc. ICASSP
- Shao, Y.¹ Jin, Z.² Wang, D.L.³ Srinivasan, S.⁴

11
- 34547539413
- Gammatone features and feature combination for large vocabulary speech recognition
- R. Schluter, I. Bezrukov, H. Wagner, and H. Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in Proc. ICASSP, 2007.
- (2007) Proc. ICASSP
- Schluter, R.¹ Bezrukov, I.² Wagner, H.³ Ney, H.⁴

12
- 0003235731
- TRAPS - classifiers of temporal patterns
- H. Hermansky and S. Sharma, "TRAPS - classifiers of temporal patterns," in Proc. Interspeech, Sydney, Australia, Nov. 1998, pp. 1003-1006.
- (1998) Proc. Interspeech , pp. 1003-1006
- Hermansky, H.¹ Sharma, S.²

13
- 0034848926
- Tandem acoustic modeling in large-vocabulary recognition
- D. Ellis, R. Singh, and S. Sivadas, "Tandem acoustic modeling in large-vocabulary recognition," in Proc. ICASSP, 2001.
- (2001) Proc. ICASSP
- Ellis, D.¹ Singh, R.² Sivadas, S.³

14
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- F. Grezl, M. Karafiat, S. Kontar, and J. Cernocky, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc. ICASSP, 2007.
- (2007) Proc. ICASSP
- Grezl, F.¹ Karafiat, M.² Kontar, S.³ Cernocky, J.⁴

15
- 0032136330
- Robust speech recognition using the modulation spectrogram
- B. Kingsbury, N. Morgan, and S. Greenberg, "Robust speech recognition using the modulation spectrogram," Speech Communication, vol. 25, pp. 117-132, Aug. 1998.
- (1998) Speech Communication , vol.25 , pp. 117-132
- Kingsbury, B.¹ Morgan, N.² Greenberg, S.³

16
- 0032658253
- Temporal patterns (TRAPs) in ASR of noisy speech
- H. Hermansky and S. Sharma, "Temporal patterns (TRAPs) in ASR of noisy speech," in Proc. ICASSP, vol. 1, Phoenix, Arizona, U.S.A., Mar. 1997, pp. 289-292.
- (1997) Proc. ICASSP , vol.1 , pp. 289-292
- Hermansky, H.¹ Sharma, S.²

17
- 84906249759
- Spectro-temporal Gabor features as a front end for ASR
- M. Kleinschmidt, "Spectro-temporal Gabor features as a front end for ASR," in Proc. Forum Acusticum Sevilla, 2002.
- (2002) Proc. Forum Acusticum Sevilla
- Kleinschmidt, M.¹

18
- 34047272330
- Discrimination of speech from non-speech based on multiscale spectro-temporal modulations
- N. Mesgarani, M. Slaney, and S. Shamma, "Discrimination of speech from non-speech based on multiscale spectro-temporal modulations," IEEE Transactions on Audio, Speech and Language Processing, 2006.
- (2006) IEEE Transactions on Audio, Speech and Language Processing
- Mesgarani, N.¹ Slaney, M.² Shamma, S.³

19
- 33745213373
- Multi-resolution RASTA filtering for TANDEM-based ASR
- H. Hermansky and P. Fousek, "Multi-resolution RASTA filtering for TANDEM-based ASR," in Proc. Interspeech, Lisbon, Portugal, Oct. 2005, pp. 361-364.
- (2005) Proc. Interspeech , pp. 361-364
- Hermansky, H.¹ Fousek, P.²

20
- 84867220821
- Multi-stream spectro-temporal features for robust speech recognition
- S. Zhao and N. Morgan, "Multi-stream spectro-temporal features for robust speech recognition," in Proc. Interspeech, 2008, pp. 898-901.
- (2008) Proc. Interspeech , pp. 898-901
- Zhao, S.¹ Morgan, N.²

21
- 70450182191
- Tandem representations of spectral envelope and modulation frequency features for ASR
- S. Thomas, S. Ganapathy, and H. Hermansky, "Tandem representations of spectral envelope and modulation frequency features for ASR," in Proc. Interspeech, 2009, pp. 2955-2958.
- (2009) Proc. Interspeech , pp. 2955-2958
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

22
- 84878395103
- Longer features: They do a speech detector good
- T. Tsai and N. Morgan, "Longer features: They do a speech detector good," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Tsai, T.¹ Morgan, N.²

23
- 84890541926
- A robust frontend for ASR: Combining denoising, noise masking and feature normalization
- M. Van Segbroeck and S. Narayanan, "A robust frontend for ASR: combining denoising, noise masking and feature normalization," in Proc. ICASSP, 2013.
- (2013) Proc. ICASSP
- Van Segbroeck, M.¹ Narayanan, S.²

24
- 84865769808
- Comparing different flavors of spectro-temporal features for ASR
- B. T. Meyer, S. V. Ravuri, M. R. Schadler, and N. Morgan, "Comparing different flavors of spectro-temporal features for ASR," in Proc. Interspeech, 2011, pp. 1269-1272.
- (2011) Proc. Interspeech , pp. 1269-1272
- Meyer, B.T.¹ Ravuri, S.V.² Schadler, M.R.³ Morgan, N.⁴

25
- 84865769808
- Comparing different flavors of spectro-temporal features for ASR
- B. Meyer, S. Ravuri, M. Schadler, and N. Morgan, "Comparing different flavors of spectro-temporal features for ASR," in Proc. Interspeech, 2011, pp. 1269-1272.
- (2011) Proc. Interspeech , pp. 1269-1272
- Meyer, B.¹ Ravuri, S.² Schadler, M.³ Morgan, N.⁴

26
- 0036642777
- Use of voicing features in HMM-based speech recognition
- D. L. Thomson and R. Chengalvarayan, "Use of voicing features in HMM-based speech recognition," Speech Communication, vol. 37, no. 3, pp. 197-211, 2002.
- (2002) Speech Communication , vol.37 , Issue.3 , pp. 197-211
- Thomson, D.L.¹ Chengalvarayan, R.²

27
- 85009188485
- Extraction methods of voicing feature for robust speech recognition
- A. Zolnay, R. Schluter, and H. Ney, "Extraction methods of voicing feature for robust speech recognition," in Proceedings of EUROSPEECH, 2003, pp. 497-500.
- (2003) Proceedings of EUROSPEECH , pp. 497-500
- Zolnay, A.¹ Schluter, R.² Ney, H.³

28
- 77956739501
- Long-term spectrotemporal and static harmonic features for voice activity detection
- T. Fukuda, O. Ichikawa, and M. Nishimura, "Long-term spectrotemporal and static harmonic features for voice activity detection," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 834-844, 2010.
- (2010) IEEE Journal of Selected Topics in Signal Processing , vol.4 , Issue.5 , pp. 834-844
- Fukuda, T.¹ Ichikawa, O.² Nishimura, M.³

29
- 84865802934
- Robust voice activity detector for real world applications using harmonicity and modulation frequency
- E. Chuangsuwanich and J. Glass, "Robust voice activity detector for real world applications using harmonicity and modulation frequency," in Twelfth Annual Conference of the International Speech Communication Association, 2011.
- (2011) Twelfth Annual Conference of the International Speech Communication Association
- Chuangsuwanich, E.¹ Glass, J.²

30
- 0242693169
- The correlogram: A visual display of periodicity
- S. Granqvist and B. Hammarberg, "The correlogram: A visual display of periodicity," The Journal of the Acoustical Society of America, vol. 114, 2003.
- (2003) The Journal of the Acoustical Society of America , vol.114
- Granqvist, S.¹ Hammarberg, B.²

31
- 4544315110
- Robust speech recognition using cepstral domain missing data techniques and noisy masks
- H. Van Hamme, "Robust speech recognition using cepstral domain missing data techniques and noisy masks," in Proc. ICASSP, Montreal, Canada, May 2004, pp. 213-216.
- (2004) Proc. ICASSP , pp. 213-216
- Van Hamme, H.¹

32
- 78049530924
- M. Van Segbroeck, "Robust large vocabulary continuous speech recognition using missing data techniques," Ph.D. dissertation, K.U.Leuven, ESAT, Jan. 2010.
- (2010) Robust Large Vocabulary Continuous Speech Recognition Using Missing Data Techniques
- Van Segbroeck, M.¹

33
- 0003719446
- J. M. Pickett, The sounds of speech communication: A primer of acoustic phonetics and speech perception. University Park Press Baltimore, 1980.
- (1980) The Sounds of Speech Communication: A Primer of Acoustic Phonetics and Speech Perception
- Pickett, J.M.¹

34
- 78649989192
- Robust voice activity detection using long-term signal variability
- P. K. Ghosh, A. Tsiartas, and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 600-613, 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.3 , pp. 600-613
- Ghosh, P.K.¹ Tsiartas, A.² Narayanan, S.³

35
- 84906213162
- H. Goldberg, RATS Evaluation Plan, 2011, https://rats.saic.com/index.php/EvaluationProtocols.
- (2011) RATS Evaluation Plan
- Goldberg, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.