SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume: None, Issue: None, 2013, Pages 7596-7599

Audio-visual deep learning for noise robust speech recognition

Authors (2): Huang, Jingᵃ; Kingsbury, Brianᵃ

ᵃ IBM T. J. Watson Research Center (United States)

Author keywords

Audio visual speech recognition; Deep belief networks; Noise robustness

Indexed keywords

AUDIO VISUAL SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION; DECISION FUSION METHODS; DEEP BELIEF NETWORK (DBN); DEEP BELIEF NETWORKS; GAUSSIAN MIXTURE MODEL; NOISE ROBUST SPEECH RECOGNITION; NOISE ROBUSTNESS;

SIGNAL PROCESSING; SPEECH RECOGNITION;

ACOUSTIC NOISE;

EID: 84890465549 PISSN: 1520-6149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2013.6639140 Document Type: Conference Paper

Times cited: 196

References (24)

1
- 0001048664
- Visual contribution to speech intelligibility in noise
- W.H. Sumby and I. Pollack (1954), "Visual contribution to speech intelligibility in noise," in J. Acoustical Society America, 26: 212-215.
- (1954) J. Acoustical Society America , vol.26 , pp. 212-215
- Sumby, W.H.¹ Pollack, I.²

2
- 0032074310
- Audio-visual integration in multi-modal communication
- T. Chen and R.R. Rao (1998), "Audio-visual integration in multi-modal communication," in Proc. IEEE, 86(5): 837-852.
- (1998) Proc. IEEE , vol.86 , Issue.5 , pp. 837-852
- Chen, T.¹ Rao, R.R.²

3
- 84994350739
- Multi-stream speech recognition: Ready for prime time?
- A. Janin, D. Ellis and N. Morgan (1999), "Multi-stream speech recognition: Ready for prime time?" in Proc. Europ. Conf. Speech Technol., pp. 591-594.
- (1999) Proc. Europ. Conf. Speech Technol. , pp. 591-594
- Janin, A.¹ Ellis, D.² Morgan, N.³

4
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- S. Dupont and J. Luettin (2000), "Audio-visual speech modeling for continuous speech recognition," in IEEE Trans. Multimedia, 2(3): 141-151.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

5
- 0036502797
- A review of speech-based bimodal recognition
- C.C. Chibelushi, F. Deravi, and J.S.D. Mason (2002), "A review of speech-based bimodal recognition," in IEEE Trans. Multimedia, 4(1): 23-37.
- (2002) IEEE Trans. Multimedia , vol.4 , Issue.1 , pp. 23-37
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.D.³

6
- 0036874527
- Noise adaptive stream weighting in audio-visual speech recognition
- M. Heckmann, F. Berthommier, and K. Kroschel (2002), "Noise adaptive stream weighting in audio-visual speech recognition," in EURASIP J. Appl. Signal Process., 2002(11): 1260-1273.
- (2002) EURASIP J. Appl. Signal Process., 2002 , Issue.11 , pp. 1260-1273
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

7
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A.W. Senior (2003), "Recent advances in the automatic recognition of audio-visual speech," in Proc. IEEE, 91(9): 1306-1326.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

8
- 84890472914
- Audio-visual speech recognition
- G. Potamianos (2006), "Audio-Visual Speech Recognition," in Encyclopedia of Language and Linguistics, Second Edition, (Speech Technology Section-Computer Understanding of Speech), K. Brown (Ed. In Chief), Elsevier, Oxford, United Kingdom, ISBN: 0-08-044299-4, 2006.
- (2006) Encyclopedia of Language and Linguistics, Second Edition, (Speech Technology Section-Computer Understanding of Speech)
- Potamianos, G.¹

9
- 10444261199
- Audio-visual speech recognition using an infrared headset
- J. Huang, G. Potamianos, J. Connell and C. Neti (2004), "Audio-visual speech recognition using an infrared headset," in Speech Communication, 44(4): 83-96.
- (2004) Speech Communication , vol.44 , Issue.4 , pp. 83-96
- Huang, J.¹ Potamianos, G.² Connell, J.³ Neti, C.⁴

10
- 85009111979
- Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition
- E. Marcheret, S. Chu, V. Goel, G. Potamianos (2004), "Efficient Likelihood Computation in Multi-Stream HMM Based Audio-Visual Speech Recognition," in Int. Conf. Speech and Language Processing, 2004.
- (2004) Int. Conf. Speech and Language Processing
- Marcheret, E.¹ Chu, S.² Goel, V.³ Potamianos, G.⁴

11
- 0036296863
- Minimum phone error and I-smoothing for improved discriminative training
- D. Povey and P. C. Woodland, "Minimum Phone Error and I-smoothing for Improved Discriminative Training," in Proceedings of ICASSP, 2002.
- (2002) Proceedings of ICASSP
- Povey, D.¹ Woodland, P.C.²

12
- 33646788786
- fMPE: Discriminatively trained features for speech recognition
- D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, G. Zweig, "fMPE: Discriminatively trained features for speech recognition," in Proceedings of ICASSP, 2005.
- (2005) Proceedings of ICASSP
- Povey, D.¹ Kingsbury, B.² Mangu, L.³ Saon, G.⁴ Soltau, H.⁵ Zweig, G.⁶

13
- 34047244134
- Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition
- J. Huang and D. Povey, "Discriminatively Trained Features Using fMPE for Multi-Stream Audio-Visual Speech Recognition," in Proceedings of Interspeech, 2005.
- (2005) Proceedings of Interspeech
- Huang, J.¹ Povey, D.²

14
- 70450172282
- Combined discriminative training for multi-stream HMM-based audio-visual speech recognition
- J. Huang and K. Visweswariah, "Combined Discriminative Training for Multi-Stream HMM-based Audio-Visual Speech Recognition," in Proceedings of Interspeech, 2009.
- (2009) Proceedings of Interspeech
- Huang, J.¹ Visweswariah, K.²

15
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," in IEEE Signal Processing Magazine, 29(6): 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

16
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- A. Mohamed, G. Hinton, G. Penn, "Understanding how Deep Belief Networks perform acoustic modelling," in Proceedings of ICASSP, 2012.
- (2012) Proceedings of ICASSP
- Mohamed, A.¹ Hinton, G.² Penn, G.³

17
- 85135321224
- See me, hear me: Integrating automatic speech recognition and lipreading
- P. Duchnowski, U. Meier, and A. Waibel, "See me, hear me: Integrating automatic speech recognition and lipreading," in Proceedings of ICSLP, 1994.
- (1994) Proceedings of ICSLP
- Duchnowski, P.¹ Meier, U.² Waibel, A.³

18
- 0029725863
- Adaptive bimodal sensor fusion for automatic speechreading
- U. Meier, W. Hurst and P. Duchnowski, "Adaptive Bimodal Sensor Fusion for Automatic Speechreading," in Proceedings of ICASSP, 1996.
- (1996) Proceedings of ICASSP
- Meier, U.¹ Hurst, W.² Duchnowski, P.³

19
- 0041624571
- Audio-visual speech recognition using red exclusion and neural networks
- T. Lewis and D. Powers, "Audio-Visual Speech Recognition using Red Exclusion and Neural Networks," in Journal of Research and Practice in Information Technology, 2003.
- (2003) Journal of Research and Practice in Information Technology
- Lewis, T.¹ Powers, D.²

20
- 26844502130
- Speech recognition by integrating audio, visual and contextual features based on neural networks
- M. Kim, J. Ryu, and E. Kim, "Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks," in Advances in Natural Computation, Lecture Notes in Computer Science, 2005.
- (2005) Advances in Natural Computation, Lecture Notes in Computer Science
- Kim, M.¹ Ryu, J.² Kim, E.³

21
- 80053437179
- Multimodal deep learning
- J. Ngiam, A. Khosla, J. Nam, H. Lee and A. Ng, "Multimodal Deep Learning," in International Conference on Machine Learning, 2011.
- (2011) International Conference on Machine Learning
- Ngiam, J.¹ Khosla, A.² Nam, J.³ Lee, H.⁴ Ng, A.⁵

22
- 0141814785
- Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
- A. Garg, G. Potamianos, C. Neti, T. Huang, "Frame-Dependent Multi-Stream Reliability Indicators for Audio-Visual Speech Recognition," in Int. Conf. Acoustics, Speech and Signal Processing, 2003.
- (2003) Int. Conf. Acoustics, Speech and Signal Processing
- Garg, A.¹ Potamianos, G.² Neti, C.³ Huang, T.⁴

23
- 44949227080
- Adaptive multimodal fusion by uncertainty compensation
- V. Pitsikalis, A. Katsamanis, G. Papandreou, and P. Maragos, "Adaptive multimodal fusion by uncertainty compensation," in Proceedings of ICSLP, 2006.
- (2006) Proceedings of ICSLP
- Pitsikalis, V.¹ Katsamanis, A.² Papandreou, G.³ Maragos, P.⁴

24
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y. Teh, "A Fast Learning Algorithm for Deep Belief Nets," in Neural Computation, vol. 18, pp. 1527-1554, 2006.
- (2006) Neural Computation , vol.18 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.