SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2013, Pages 728-731

Speech activity detection on YouTube using deep neural networks

Authors (3): Ryant, Neville (a); Liberman, Mark (a); Yuan, Jiahong (a)

(a) UNIVERSITY OF PENNSYLVANIA (United States)

Author keywords

Deep neural networks; Segmentation; Speech activity detection; Voice activity detection

Indexed keywords

IMAGE SEGMENTATION; SPEECH RECOGNITION;

CEPSTRAL FEATURES; DEEP NEURAL NETWORKS; ENVIRONMENTAL CONDITIONS; GAUSSIAN MIXTURE MODEL (GMMS); MEL-FREQUENCY CEPSTRAL COEFFICIENTS; SPEECH ACTIVITY DETECTIONS; STATIONARY NOISE; VOICE ACTIVITY DETECTION;

SPEECH PROCESSING;

EID: 84906228076 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (119)

References (16)

1
- 0011990786
- The meeting project at ICSI
- N. Morgan, D. Baron, J. Edwards, D. Ellis, D. Gelbart, A. Janin, T. Pfau, E. Shriberg, and A. Stolcke, "The Meeting Project at ICSI," in Proceedings of the First International Conference on Human Language Technology Research, 2001, pp. 1-7.
- (2001) Proceedings of the First International Conference on Human Language Technology Research , pp. 1-7
- Morgan, N.¹ Baron, D.² Edwards, J.³ Ellis, D.⁴ Gelbart, D.⁵ Janin, A.⁶ Pfau, T.⁷ Shriberg, E.⁸ Stolcke, A.⁹

2
- 0141469852
- Multispeaker speech activity detection for the ICSI meeting recorder
- T. Pfau, D. P. Ellis, and A. Stolcke, "Multispeaker Speech Activity Detection for the ICSI Meeting Recorder," in Proceedings of Automatic Speech Recognition and Understanding, 2001, pp. 107-110.
- (2001) Proceedings of Automatic Speech Recognition and Understanding , pp. 107-110
- Pfau, T.¹ Ellis, D.P.² Stolcke, A.³

3
- 0036293830
- An overview of automatic speaker recognition technology
- D. A. Reynolds, "An overview of automatic speaker recognition technology," in Proceedings of ICASSP, vol. 4, 2002, pp. 4072-4075.
- (2002) Proceedings of ICASSP , vol.4 , pp. 4072-4075
- Reynolds, D.A.¹

4
- 84873315510
- Unsupervised speech activity detection using voicing measures and perceptual spectral flux
- IEEE
- S. Sadjadi and J. Hansen, "Unsupervised speech activity detection using voicing measures and perceptual spectral flux," Signal Processing Letters, IEEE, vol. 20, pp. 197-200, 2013.
- (2013) Signal Processing Letters , vol.20 , pp. 197-200
- Sadjadi, S.¹ Hansen, J.²

5
- 34047272330
- Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
- N. Mesgarani, M. Slaney, and S. A. Shamma, "Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 3, pp. 920-930, 2006.
- (2006) Audio, Speech, and Language Processing, IEEE Transactions on , vol.14 , Issue.3 , pp. 920-930
- Mesgarani, N.¹ Slaney, M.² Shamma, S.A.³

6
- 84878535284
- Developing a speech activity detection system for the DARPA RATS program
- T. Ng, B. Zhang, L. Nguyen, S. Matsoukas, K. Vesely, P. Matejka, X. Zhu, and N. Mesgarani, "Developing a speech activity detection system for the DARPA RATS program," in Proceedings of InterSpeech, 2012.
- (2012) Proceedings of InterSpeech
- Ng, T.¹ Zhang, B.² Nguyen, L.³ Matsoukas, S.⁴ Vesely, K.⁵ Matejka, P.⁶ Zhu, X.⁷ Mesgarani, N.⁸

7
- 79959838316
- Voice activity detection based on conditional random fields using multiple features
- A. Saito, Y. Nankaku, A. Lee, and K. Tokuda, "Voice activity detection based on conditional random fields using multiple features," in Proceedings of InterSpeech, 2010, pp. 2086-2089.
- (2010) Proceedings of InterSpeech , pp. 2086-2089
- Saito, A.¹ Nankaku, Y.² Lee, A.³ Tokuda, K.⁴

8
- 80051623447
- Speaker diarization of heterogeneous web video files: A preliminary study
- P. Clement, T. Bazillon, and C. Fredouille, "Speaker diarization of heterogeneous web video files: A preliminary study," in Proceedings of ICASSP, 2011, pp. 4432-4435.
- (2011) Proceedings of ICASSP , pp. 4432-4435
- Clement, P.¹ Bazillon, T.² Fredouille, C.³

9
- 84878610785
- Speech/nonspeech segmentation in web videos
- A. Misra, "Speech/nonspeech segmentation in web videos," in Proceedings of InterSpeech, 2012.
- (2012) Proceedings of InterSpeech
- Misra, A.¹

10
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

11
- 84937454179
- Creating HAVIC: Heterogeneous audio visual internet collection
- S. Strassel, A. Morris, J. Fiscus, C. Caruso, H. Lee, P. Over, J. Fiumara, B. Shaw, B. Antonishek, and M. Michel, "Creating HAVIC: Heterogeneous Audio Visual Internet Collection," in Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012.
- (2012) Proceedings of the Eighth International Conference on Language Resources and Evaluation
- Strassel, S.¹ Morris, A.² Fiscus, J.³ Caruso, C.⁴ Lee, H.⁵ Over, P.⁶ Fiumara, J.⁷ Shaw, B.⁸ Antonishek, B.⁹ Michel, M.¹⁰

12
- 70450198180
- XTrans: A speech annotation and transcription tool
- M. L. Glenn, S. M. Strassel, and H. Lee, "XTrans: A speech annotation and transcription tool," Proceedings of InterSpeech, 2009.
- (2009) Proceedings of InterSpeech
- Glenn, M.L.¹ Strassel, S.M.² Lee, H.³

13
- 33745577702
- The rich transcription 2005 spring meeting recognition evaluation
- J. Fiscus, N. Radde, J. Garofolo, A. Le, J. Ajot, and C. Laprun, "The Rich Transcription 2005 Spring Meeting Recognition Evaluation," Machine Learning for Multimodal Interaction, pp. 369-389, 2006.
- (2006) Machine Learning for Multimodal Interaction , pp. 369-389
- Fiscus, J.¹ Radde, N.² Garofolo, J.³ Le, A.⁴ Ajot, J.⁵ Laprun, C.⁶

14
- 23344452899
- Statistical voice activity detection using a multiple observation likelihood ratio test
- IEEE
- J. Ramirez, J. C. Segura, C. Benitez, L. Garcia, and A. Rubio, "Statistical voice activity detection using a multiple observation likelihood ratio test," Signal Processing Letters, IEEE, vol. 12, no. 10, pp. 689-692, 2005.
- (2005) Signal Processing Letters , vol.12 , Issue.10 , pp. 689-692
- Ramirez, J.¹ Segura, J.C.² Benitez, C.³ Garcia, L.⁴ Rubio, A.⁵

15
- 77956509090
- Rectified linear units improve restricted Boltzmann machines
- V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. 27th International Conference on Machine Learning, 2010, pp. 807-814.
- (2010) Proc. 27th International Conference on Machine Learning , pp. 807-814
- Nair, V.¹ Hinton, G.E.²

16
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.