SCOPUS Information Search Platform

Applied Intelligence

Volume 42, Issue 4, 2015, Pages 722-737

Audio-visual speech recognition using deep learning

(5) Noda, Kuniaki a Yamaguchi, Yuki b Nakadai, Kazuhiro c Okuno, Hiroshi G b Ogata, Tetsuya a

a WASEDA UNIVERSITY (Japan)

b KYOTO UNIVERSITY (Japan)

c HONDA RESEARCH INSTITUTE JAPAN CO LTD (Japan)

Author keywords

Audio visual speech recognition; Deep learning; Feature extraction; Multi stream HMM

Indexed keywords

ACOUSTIC NOISE; AUDIO ACOUSTICS; FEATURE EXTRACTION; HIDDEN MARKOV MODELS; LEARNING SYSTEMS; NEURAL NETWORKS; SIGNAL TO NOISE RATIO; SPEECH ANALYSIS; VOCABULARY CONTROL;

AUDIO VISUAL SPEECH RECOGNITION; CONVOLUTIONAL NEURAL NETWORK; DEEP LEARNING; GENERALIZATION CAPABILITY; ISOLATED WORD RECOGNITION; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MULTI-STREAM HMM; RECOGNITION ALGORITHM;

SPEECH RECOGNITION;

EID: 84939956018 PISSN: 0924669X EISSN: 15737497 Source Type: Journal
DOI: 10.1007/s10489-014-0629-7 Document Type: Article

Times cited : (567)

References (52)

1
- 84906225505
- Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
- Lyon, France
- Abdel-Hamid O, Jiang H. (2013) Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition. In: Proceedings of the 14th Annual Conference of the International Speech Communication Association. Lyon, France
- (2013) In: Proceedings of the 14th Annual Conference of the International Speech Communication Association
- Abdel-Hamid, O.¹ Jiang, H.²

2
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- Kyoto
- Abdel-Hamid O, Mohamed A, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, pp 4277–4280
- (2012) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

3
- 4544329810
- Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 5, Montreal
- Aleksic PS, Katsaggelos AK (2004) Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 5, Montreal, pp 917–920
- (2004) pp 917–920
- Aleksic, P.S.¹ Katsaggelos, A.K.²

4
- 84977800621
- Evidence of correlation between acoustic and visual features of speech. In: Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco
- Barker J, Berthommier F (1999) Evidence of correlation between acoustic and visual features of speech. In: Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp 5–9
- (1999) pp 5–9
- Barker, J.¹ Berthommier, F.²

5
- 69349090197
- Learning deep architectures for AI
- Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
- (2009) Found Trends Mach Learn , vol.2 , Issue.1
- Bengio, Y.¹

6
- 0030355935
- A new ASR approach based on independent processing and recombination of partial frequency bands. In: Proceedings of the 4th International Conference on Spoken Language Processing, vol 1, Philadelphia
- Bourlard H, Dupont S (1996) A new ASR approach based on independent processing and recombination of partial frequency bands. In: Proceedings of the 4th International Conference on Spoken Language Processing, vol 1, Philadelphia, pp 426–429
- (1996) pp 426–429
- Bourlard, H.¹ Dupont, S.²

7
- 84939943424
- Multi-stream speech recognition
- Bourlard H, Dupont S, Ris C (1996) Multi-stream speech recognition. IDIAP research report
- (1996) IDIAP research report
- Bourlard, H.¹ Dupont, S.² Ris, C.³

8
- 0003573244
- Springer US, Boston
- Bourlard H a, Morgan N (1994) Connectionist speech recognition: a hybrid approach. Springer US, Boston
- (1994) Connectionist speech recognition: a hybrid approach
- Bourlard, H.¹ Morgan, N.²

9
- 0022920273
- Seeing speech: Investigations into the synthesis and recognition of visible speech movements using automatic image processing and computer graphics
- London
- Brooke N, Petajan ED (1986) Seeing speech: Investigations into the synthesis and recognition of visible speech movements using automatic image processing and computer graphics. In: Proceedings of the International Conference on Speech Input and Output, Techniques and Applications, London, pp 104–109
- (1986) Proceedings of the International Conference on Speech Input and Output, Techniques and Applications, pp. 104-109
- Brooke, N.¹ Petajan, E.D.²

10
- 84894294885
- Deep learning with COTS HPC. In: Proceedings of the 30th international conference on machine learning, Atlanta
- Coates A, Huval B, Wang T, Wu DJ, Ng AY, Catanzaro B (2013) Deep learning with COTS HPC. In: Proceedings of the 30th international conference on machine learning, Atlanta, pp 1337–1345
- (2013) pp 1337–1345
- Coates, A.¹ Huval, B.² Wang, T.³ Wu, D.J.⁴ Ng, A.Y.⁵ Catanzaro, B.⁶

11
- 0035363218
- Active appearance models
- Cootes T, Edwards G, Taylor C (2001) Active appearance models. IEEE Trans Pattern Anal Mach Intell 23(6):681–685
- (2001) IEEE Trans Pattern Anal Mach Intell , vol.23 , Issue.6 , pp. 681-685
- Cootes, T.¹ Edwards, G.² Taylor, C.³

12
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- Dahl GE, Acero A (2012) Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 20(1):30–42
- (2012) IEEE Trans Audio Speech Lang Process , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Acero, A.²

13
- 84905259759
- Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition
- Florence
- Feng X, Zhang Y, Glass J (2014) Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, pp 1759–1763
- (2014) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1759-1763
- Feng, X.¹ Zhang, Y.² Glass, J.³

14
- 63449120701
- Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition. In: Proceedings of the 10th International Conference on Multimodal Interfaces, Chania
- Gurban M, Thiran JP, Drugman T, Dutoit T (2008) Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition. In: Proceedings of the 10th International Conference on Multimodal Interfaces, Chania, pp 237–240
- (2008) pp 237–240
- Gurban, M.¹ Thiran, J.P.² Drugman, T.³ Dutoit, T.⁴

15
- 85009284526
- DCT-based video features for audio-visual speech recognition. In: Proceedings of the 7th International Conference on Spoken Language Processing, vol 3, Denver
- Heckmann M, Kroschel K, Savariaux C (2002) DCT-based video features for audio-visual speech recognition. In: Proceedings of the 7th International Conference on Spoken Language Processing, vol 3, Denver, pp 1925–1928
- (2002) pp 1925–1928
- Heckmann, M.¹ Kroschel, K.² Savariaux, C.³

16
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 3, Istanbul
- Hermansky H, Ellis D, Sharma S (2000) Tandem connectionist feature extraction for conventional HMM systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 3, Istanbul, pp 1635–1638
- (2000) pp 1635–1638
- Hermansky, H.¹ Ellis, D.² Sharma, S.³

17
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- Hinton G, Deng L, Yu D, Dahl G, Mohamed A, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Proc Mag 29:82–97
- (2012) IEEE Signal Proc Mag , vol.29 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

18
- 33746600649
- Reducing the dimensionality of data with neural networks
- Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–7
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

19
- 84890465549
- Audio-visual deep learning for noise robust speech recognition
- Vancouver
- Huang J, Kingsbury B (2013) Audio-visual deep learning for noise robust speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, pp 7596–7599
- (2013) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 7596-7599
- Huang, J.¹ Kingsbury, B.²

20
- 84994350739
- Multi-stream speech recognition: Ready for prime time?
- Budapest, Hungary
- Janin A, Ellis D, Morgan N (1999) Multi-stream speech recognition: Ready for prime time? In: Proceedings of the 6th European Conference on Speech Communication and Technology. Budapest, Hungary
- (1999) In: Proceedings of the 6th European Conference on Speech Communication and Technology
- Janin, A.¹ Ellis, D.² Morgan, N.³

21
- 84887042736
- Using very deep autoencoders for content-based image retrieval
- Bruges, Belgium
- Krizhevsky A, Hinton GE (2011) Using very deep autoencoders for content-based image retrieval. In: Proceedings of the 19th European Symposium on Artificial Neural Networks. Bruges, Belgium
- (2011) In: Proceedings of the 19th European Symposium on Artificial Neural Networks
- Krizhevsky, A.¹ Hinton, G.E.²

22
- 84939939168
- Imagenet classification with deep convolutional neural networks
- Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems
- (2012) Advances in Neural Information Processing Systems
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.³

23
- 0024874951
- Construction of a large-scale Japanese speech database and its management system
- Glasgow
- Kuwabara H, Takeda K, Sagisaka Y, Katagiri S, Morikawa S, Watanabe T (1989) Construction of a large-scale Japanese speech database and its management system. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Glasgow, pp 560–563
- (1989) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 560-563
- Kuwabara, H.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Morikawa, S.⁵ Watanabe, T.⁶

24
- 82955182641
- Improving visual features for lip-reading
- Hakone, Japan
- Lan Y, Theobald BJ, Harvey R, Ong EJ, Bowden R (2010) Improving visual features for lip-reading. In: Proceedings of the International Conference on Auditory-Visual Speech Processing. Hakone, Japan
- (2010) In: Proceedings of the International Conference on Auditory-Visual Speech Processing
- Lan, Y.¹ Theobald, B.J.² Harvey, R.³ Ong, E.J.⁴ Bowden, R.⁵

25
- 84867135575
- Building high-level features using large scale unsupervised learning. In: Proceedings of the 29th International Conference on Machine Learning, Edinburgh
- Le QV, Ranzato M, Monga R, Devin M, Chen K, Corrado GS, Dean J, Ng AY (2012) Building high-level features using large scale unsupervised learning. In: Proceedings of the 29th International Conference on Machine Learning, Edinburgh, pp 81–88
- (2012) pp 81–88
- Le, Q.V.¹ Ranzato, M.² Monga, R.³ Devin, M.⁴ Chen, K.⁵ Corrado, G.S.⁶ Dean, J.⁷ Ng, A.Y.⁸

26
- 5044231640
- Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, Washington
- LeCun Y, Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, Washington, pp 97–104
- (2004) pp 97–104
- LeCun, Y.¹ Bottou, L.²

27
- 0032203257
- Gradient-based learning applied to document recognition
- LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
- (1998) Proc IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

28
- 71149119164
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th International Conference on Machine Learning, Montreal
- Lee H, Grosse R, Ranganath R, Ng AY (2009) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th International Conference on Machine Learning, Montreal, pp 609–616
- (2009) pp 609–616
- Lee, H.¹ Grosse, R.² Ranganath, R.³ Ng, A.Y.⁴

29
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems 22, Vancouver
- Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems 22, Vancouver, pp 1096–1104
- (2009) pp 1096–1104
- Lee, H.¹ Pham, P.² Largman, Y.³ Ng, A.Y.⁴

30
- 0032822143
- A comparative study of neural network based feature extraction paradigms
- Lerner B, Guterman H, Aladjem M, Dinstein I (1999) A comparative study of neural network based feature extraction paradigms. Pattern Recogn Lett 20(1):7–14
- (1999) Pattern Recogn Lett , vol.20 , Issue.1 , pp. 7-14
- Lerner, B.¹ Guterman, H.² Aladjem, M.³ Dinstein, I.⁴

31
- 0029765665
- Visual speech recognition using active shape models and hidden Markov models. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 2, Atlanta
- Luettin J, Thacker N, Beet S (1996) Visual speech recognition using active shape models and hidden Markov models. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 2, Atlanta, pp 817–820
- (1996) pp 817–820
- Luettin, J.¹ Thacker, N.² Beet, S.³

32
- 84977820250
- Recurrent neural network feature enhancement: The 2nd chime challenge
- Maas AL, O’Neil TM, Hannun AY, Ng AY (2013) Recurrent neural network feature enhancement: The 2nd chime challenge. In: Proceedings of the 2nd International Workshop on Machine Listening in Multisource Environments. Vancouver, Canada
- (2013) In: Proceedings of the 2nd International Workshop on Machine Listening in Multisource Environments
- Maas, A.L.¹ O’Neil, T.M.² Hannun, A.Y.³ Ng, A.Y.⁴

33
- 77956541496
- Deep learning via Hessian-free optimization
- Martens J (2010) Deep learning via Hessian-free optimization. In: Proceedings of the 27th International Conference on Machine Learning, Haifa, pp 735–742
- (2010) Proceedings of the 27th International Conference on Machine Learning, pp. 735-742
- Martens, J.¹

34
- 0036472941
- Extraction of visual features for lipreading
- Matthews I, Cootes T, Bangham J, Cox S, Harvey R (2002) Extraction of visual features for lipreading. IEEE Trans Pattern Anal Mach Intell 24(2):198–213
- (2002) IEEE Trans Pattern Anal Mach Intell , vol.24 , Issue.2 , pp. 198-213
- Matthews, I.¹ Cootes, T.² Bangham, J.³ Cox, S.⁴ Harvey, R.⁵

35
- 84908265391
- A comparison of model and transform-based visual features for audio-visual LVCSR
- Tokyo, Japan
- Matthews I, Potamianos G, Neti C, Luettin J (2001) A comparison of model and transform-based visual features for audio-visual LVCSR. In: Proceedings of the IEEE International Conference on Multimedia and Expo. Tokyo, Japan
- (2001) In: Proceedings of the IEEE International Conference on Multimedia and Expo
- Matthews, I.¹ Potamianos, G.² Neti, C.³ Luettin, J.⁴

36
- 84055211743
- Acoustic modeling using deep belief networks
- Mohamed A, Dahl GE, Hinton GE (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
- (2012) IEEE Trans Audio Speech Lang Process , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

37
- 80053437179
- Multimodal deep learning
- Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning
- (2011) In: Proceedings of the 28th International Conference on Machine Learning
- Ngiam, J.¹ Khosla, A.² Kim, M.³ Nam, J.⁴ Lee, H.⁵ Ng, A.Y.⁶

38
- 84977817207
- CUBLAS library version 6.0 user guide
- NVIDIA Corporation (2014) CUBLAS library version 6.0 user guide. CUDA Toolkit Documentation
- (2014) CUDA Toolkit Documentation
- Corporation, N.V.I.D.I.A.¹

39
- 84906273908
- Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
- Lyon, France
- Palaz D, Collobert R, Magimai.-Doss M (2013) Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks. In: Proceedings of the 14th Annual Conference of the International Speech Communication Association. Lyon, France
- (2013) In: Proceedings of the 14th Annual Conference of the International Speech Communication Association
- Palaz, D.¹ Collobert, R.² Magimai.-Doss, M.³

40
- 0000255539
- Fast exact multiplication by the Hessian
- Pearlmutter B (1994) Fast exact multiplication by the Hessian. Neural Comput 6(1):147–160
- (1994) Neural Comput , vol.6 , Issue.1 , pp. 147-160
- Pearlmutter, B.¹

41
- 0028194709
- Renals S, Morgan N, Bourlard H, Cohen M, Franco H (1994) Connectionist probability estimators in HMM speech recognition. IEEE Trans Speech Audio Process 2(1):161–174
- (1994) IEEE Trans Speech Audio Process , vol.2 , Issue.1 , pp. 161-174
- Renals, S.¹ Morgan, N.² Bourlard, H.³ Cohen, M.⁴ Franco, H.⁵

42
- 0004762797
- Exploiting sensor fusion architectures and stimuli complementarity in av speech recognition
- Stork D, Hennecke M, (eds), Springer, Berlin Heidelberg
- Robert-Ribes J, Piquemal M, Schwartz JL, Escudier P (1996) Exploiting sensor fusion architectures and stimuli complementarity in av speech recognition. In: Stork D, Hennecke M (eds) Speechreading by Humans and Machines. Springer, Berlin Heidelberg, pp 193–210
- (1996) Speechreading by Humans and Machines , pp. 193-210
- Robert-Ribes, J.¹ Piquemal, M.² Schwartz, J.L.³ Escudier, P.⁴

43
- 84867593213
- Auto-encoder bottleneck features using deep belief networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto
- Sainath TN, Kingsbury B, Ramabhadran B (2012) Auto-encoder bottleneck features using deep belief networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, pp 4153–4156
- (2012) pp 4153–4156
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³

44
- 0035791204
- Feature analysis for automatic speechreading
- Cannes
- Scanlon P, Reilly R (2001) Feature analysis for automatic speechreading. In: Proceedings of the IEEE 4th Workshop on Multimedia Signal Processing, Cannes, pp 625–630
- (2001) Proceedings of the IEEE 4th Workshop on Multimedia Signal Processing, pp. 625-630
- Scanlon, P.¹ Reilly, R.²

45
- 0036631778
- Fast curvature matrix-vector products for second-order gradient descent
- Schraudolph NN (2002) Fast curvature matrix-vector products for second-order gradient descent. Neural Comput 14(7):1723–38
- (2002) Neural Comput , vol.14 , Issue.7 , pp. 1723-1738
- Schraudolph, N.N.¹

46
- 0004213132
- Auditory toolbox: A MATLAB toolbox for auditory modeling work version 2
- Slaney M (1998) Auditory toolbox: A MATLAB toolbox for auditory modeling work version 2. Interval Research Corporation
- (1998) Interval Research Corporation
- Slaney, M.¹

47
- 80053459857
- Generating text with recurrent neural networks. In: Proceedings of the 28th International Conference on Machine Learning, Bellevue
- Sutskever I, Martens J, Hinton G (2011) Generating text with recurrent neural networks. In: Proceedings of the 28th International Conference on Machine Learning, Bellevue, pp 1017–1024
- (2011) pp 1017–1024
- Sutskever, I.¹ Martens, J.² Hinton, G.³

48
- 56449089103
- Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, New York
- Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, New York, pp 1096–1103
- (2008) pp 1096–1103
- Vincent, P.¹ Larochelle, H.² Bengio, Y.³ Manzagol, P.A.⁴

49
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
- (2010) J Mach Learn Res , vol.11 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.A.⁵

50
- 0032178592
- Quantitative association of vocal-tract and facial behavior
- Yehia H, Rubin P, Vatikiotis-Bateson E (1998) Quantitative association of vocal-tract and facial behavior. Speech Comm 26:23–43
- (1998) Speech Comm , vol.26 , pp. 23-43
- Yehia, H.¹ Rubin, P.² Vatikiotis-Bateson, E.³

51
- 77950563943
- Automatic speech recognition improved by two-layered audio-visual integration for robot audition. In: Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robots, Paris
- Yoshida T, Nakadai K, Okuno HG (2009) Automatic speech recognition improved by two-layered audio-visual integration for robot audition. In: Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robots, Paris, pp 604–609
- (2009) pp 604–609
- Yoshida, T.¹ Nakadai, K.² Okuno, H.G.³

52
- 84939954092
- The HTK Book (for HTK Version 3.4)
- Young S, Evermann G, Gales M, Hain T, Liu XA, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P (2009) The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.