SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume , Issue , 2013, Pages 483-487

Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies

(4) Eyben, Florian (a); Weninger, Felix (a); Squartini, Stefano (b); Schuller, Bjorn (a)

a TECHNICAL UNIVERSITY OF MUNICH (Germany)

b UNIVERSITÀ POLITECNICA DELLE MARCHE (Italy)

Author keywords

Long Short Term Memory; Neural Networks; Speech Detection; Voice Activity Detection

Indexed keywords

DATA-DRIVEN APPROACH; EQUAL ERROR RATE; LONG SHORT-TERM MEMORY; LONG-TERM RECORDING; REFERENCE ALGORITHM; SPEECH DETECTION; SPONTANEOUS SPEECH; VOICE ACTIVITY DETECTION;

BRAIN; MOTION PICTURES; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SIGNAL PROCESSING; SIGNAL TO NOISE RATIO; SPEECH RECOGNITION;

SPEECH;

EID: 84890443834 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2013.6637694 Document Type: Conference Paper

Times cited : (234)

References (26)

1
- 0033903480
- Robust voice activity detection algorithm for estimating noise spectrum
- K. Woo, T. Yang, K. Park, and C. Lee, "Robust voice activity detection algorithm for estimating noise spectrum," IET Electronics Letters, 2000.
- (2000) IET Electronics Letters
- Woo, K.¹ Yang, T.² Park, K.³ Lee, C.⁴

2
- 79953283970
- AR-GARCH in presence of noise: Parameter estimation and its application to voice activity detection
- S. Mousazadeh and I. Cohen, "AR-GARCH in Presence of Noise: Parameter Estimation and Its Application to Voice Activity Detection," IEEE Transactions on Audio Speech and Language Processing, vol. 19, no. 4, pp. 916-926, 2011.
- (2011) IEEE Transactions on Audio Speech and Language Processing , vol.19 , Issue.4 , pp. 916-926
- Mousazadeh, S.¹ Cohen, I.²

3
- 84878610785
- Speech/nonspeech segmentation in web videos
- Portland, USA. September, ISCA
- A. Misra, "Speech/nonspeech segmentation in web videos," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Misra, A.¹

4
- 84878535284
- Developing a speech activity detection system for the DARPA RATS program
- Portland, USA. September, ISCA
- T. Ng, B. Zhang, L. Nguyen, S. Matsoukas, X. Zhou, N. Mesgarani, K. Veselý, and P. Matějka, "Developing a speech activity detection system for the DARPA RATS program," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Ng, T.¹ Zhang, B.² Nguyen, L.³ Matsoukas, S.⁴ Zhou, X.⁵ Mesgarani, N.⁶ Veselý, K.⁷ Matějka, P.⁸

5
- 0032762471
- A statistical model-based voice activity detection
- J. Sohn and N. Kim, "A statistical model-based voice activity detection," IEEE Signal Processing Letters, vol. 6, no. 1, pp. 1-3, 1999.
- (1999) IEEE Signal Processing Letters , vol.6 , Issue.1 , pp. 1-3
- Sohn, J.¹ Kim, N.²

6
- 23344452899
- Statistical voice activity detection using a multiple observation likelihood ratio test
- J. Ramirez, J. Segura, C. Benitez, L. Garcia, and A. Rubio, "Statistical voice activity detection using a multiple observation likelihood ratio test," IEEE Signal Processing Letters, vol. 12, no. 10, pp. 689-692, 2005.
- (2005) IEEE Signal Processing Letters , vol.12 , Issue.10 , pp. 689-692
- Ramirez, J.¹ Segura, J.² Benitez, C.³ Garcia, L.⁴ Rubio, A.⁵

7
- 4544379392
- On the decision-directed estimation approach of Ephraim and Malah
- I. Cohen, "On the decision-directed estimation approach of Ephraim and Malah," in Proc. of ICASSP. IEEE, 2004, vol. I, pp. 1-293.
- (2004) Proc. of ICASSP. IEEE , vol.1 , pp. 1-293
- Cohen, I.¹

8
- 1842476689
- Efficient voice activity detection algorithms using long-term speech information
- J. Ramirez, J. Segura, M. Benitez, A. De La Torre, and A. Rubio, "Efficient voice activity detection algorithms using long-term speech information," Speech Communication, vol. 42, no. 3, pp. 271-287, 2004.
- (2004) Speech Communication , vol.42 , Issue.3 , pp. 271-287
- Ramirez, J.¹ Segura, J.² Benitez, M.³ De La Torre, A.⁴ Rubio, A.⁵

9
- 0041360463
- Noise spectrum estimation in adverse environment: Improved minima controlled recursive averaging
- I. Cohen, "Noise spectrum estimation in adverse environment: Improved minima controlled recursive averaging," IEEE Trans. Audio Speech Processing, vol. 11, no. 5, pp. 466-475, 2003.
- (2003) IEEE Trans. Audio Speech Processing , vol.11 , Issue.5 , pp. 466-475
- Cohen, I.¹

10
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9(8), pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

11
- 33745194565
- Non-linear estimation of voice activity to improve automatic recognition of noisy speech
- Lisbon, Portugal. September, ISCA
- R. Gemello, F. Mana, and R.D. Mori, "Non-linear estimation of voice activity to improve automatic recognition of noisy speech," in Proc. of INTERSPEECH 2005, Lisbon, Portugal. September 2005, pp. 2617-2620, ISCA.
- (2005) Proc. of INTERSPEECH 2005 , pp. 2617-2620
- Gemello, R.¹ Mana, F.² Mori, R.D.³

12
- 0041914606
- Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
- S.C. Kremer and J.F. Kolen, Eds., IEEE Press
- S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, S.C. Kremer and J.F. Kolen, Eds. 2001, IEEE Press.
- (2001) A Field Guide to Dynamical Recurrent Neural Networks
- Hochreiter, S.¹ Bengio, Y.² Frasconi, P.³ Schmidhuber, J.⁴

13
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Apr.
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
- (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

14
- 78650977476
- openSMILE - the Munich versatile and fast open-source audio feature extractor
- Florence, Italy, ACM
- F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE - the Munich versatile and fast open-source audio feature extractor," in Proc. ACM Multimedia (MM), Florence, Italy. 2010, pp. 1459-1462, ACM.
- (2010) Proc. ACM Multimedia (MM) , pp. 1459-1462
- Eyben, F.¹ Wöllmer, M.² Schuller, B.³

15
- 70349287581
- Multidimensional recurrent neural networks
- Porto, Portugal, September
- A. Graves, S. Fernández, and J. Schmidhuber, "Multidimensional recurrent neural networks," in Proc. of the 2007 International Conference on Artificial Neural Networks, Porto, Portugal, September 2007.
- (2007) Proc. of the 2007 International Conference on Artificial Neural Networks
- Graves, A.¹ Fernández, S.² Schmidhuber, J.³

16
- 51449106187
- Department of Psychology, Ohio State University (Distributor), Columbus, OH, USA
- M.A. Pitt, L. Dilley, K. Johnson, S. Kiesling, W. Raymond, E. Hume, and E. Fosler-Lussier, Buckeye Corpus of Conversational Speech (2nd release), Department of Psychology, Ohio State University (Distributor), Columbus, OH, USA, 2007, [www.buckeyecorpus.osu.edu].
- (2007) Buckeye Corpus of Conversational Speech (2nd Release)
- Pitt, M.A.¹ Dilley, L.² Johnson, K.³ Kiesling, S.⁴ Raymond, W.⁵ Hume, E.⁶ Fosler-Lussier, E.⁷

17
- 0003548585
- J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, N.L. Dahlgren, and V. Zue, "TIMIT acoustic-phonetic continuous speech corpus," 1993.
- (1993) TIMIT Acoustic-phonetic Continuous Speech Corpus
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵ Dahlgren, N.L.⁶ Zue, V.⁷

18
- 80051621128
- Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and long short-term memory
- Prague, Czech Republic
- F. Weninger, B. Schuller, M. Wöllmer, and G. Rigoll, "Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and Long Short-Term Memory," in Proc. of ICASSP, Prague, Czech Republic, 2011, pp. 5840-5843.
- (2011) Proc. of ICASSP , pp. 5840-5843
- Weninger, F.¹ Schuller, B.² Wöllmer, M.³ Rigoll, G.⁴

19
- 84877658023
- The MediaEval 2012 Affect Task: Violent scenes detection in Hollywood movies
- Pisa, Italy
- C.H. Demarty, C. Penet, G. Gravier, and M. Soleymani, "The MediaEval 2012 Affect Task: Violent scenes detection in Hollywood Movies," in Proc. of MediaEval 2012 Workshop, Pisa, Italy, 2012.
- (2012) Proc. of MediaEval 2012 Workshop
- Demarty, C.H.¹ Penet, C.² Gravier, G.³ Soleymani, M.⁴

20
- 84878543378
- Speaker-dependent voice activity detection robust to background speech noise
- Portland, USA. September, ISCA
- S. Matsuda, N. Ito, K. Tsujino, H. Kashioka, and S. Sagayama, "Speaker-dependent voice activity detection robust to background speech noise," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Matsuda, S.¹ Ito, N.² Tsujino, K.³ Kashioka, H.⁴ Sagayama, S.⁵

21
- 80051622763
- A modified MAP criterion based on hidden Markov model for voice activity detection
- May, IEEE
- S. Deng, J. Han, T. Zheng, and G. Zheng, "A modified MAP criterion based on hidden Markov model for voice activity detection," in Proc. of ICASSP. May 2011, pp. 5220-5223, IEEE.
- (2011) Proc. of ICASSP , pp. 5220-5223
- Deng, S.¹ Han, J.² Zheng, T.³ Zheng, G.⁴

22
- 85008579584
- Multiple acoustic model-based discriminative likelihood ratio weighting for voice activity detection
- Aug.
- Y. Suh and H. Kim, "Multiple acoustic model-based discriminative likelihood ratio weighting for voice activity detection," Signal Processing Letters, vol. 19, no. 8, pp. 507-510, Aug. 2012.
- (2012) Signal Processing Letters , vol.19 , Issue.8 , pp. 507-510
- Suh, Y.¹ Kim, H.²

23
- 80055089790
- Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection
- M. Fujimoto, S. Watanabe, and T. Nakatani, "Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection," Speech Communication, vol. 54, no. 2, pp. 229-244, 2012.
- (2012) Speech Communication , vol.54 , Issue.2 , pp. 229-244
- Fujimoto, M.¹ Watanabe, S.² Nakatani, T.³

24
- 84878548167
- Speech activity detection for noisy data using adaptation techniques
- Portland, USA. September, ISCA
- M.K. Omar, "Speech activity detection for noisy data using adaptation techniques," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Omar, M.K.¹

25
- 84878390907
- Voice activity detection using speech recognizer feedback
- Portland, USA. September, ISCA
- K. Thambiratnam, W. Zhu, and F. Seide, "Voice activity detection using speech recognizer feedback," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Thambiratnam, K.¹ Zhu, W.² Seide, F.³

26
- 84878590831
- Acoustic and data-driven features for robust speech activity detection
- Portland, USA. September, ISCA
- S. Thomas, S.H. Mallidi, T. Janu, H. Hermansky, N. Mesgarani, X. Zhou, S. Shamma, T. Ng, B. Zhang, L. Nguyen, and S. Matsoukas, "Acoustic and data-driven features for robust speech activity detection," in Proc. of INTERSPEECH 2012, Portland, USA. September 2012, ISCA.
- (2012) Proc. of INTERSPEECH 2012
- Thomas, S.¹ Mallidi, S.H.² Janu, T.³ Hermansky, H.⁴ Mesgarani, N.⁵ Zhou, X.⁶ Shamma, S.⁷ Ng, T.⁸ Zhang, B.⁹ Nguyen, L.¹⁰ Matsoukas, S.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.