SCOPUS Information Search Platform

2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

2013, Pages 344-349

DNN acoustic modeling with modular multi-lingual feature extraction networks

(4) Gehring, Jonas a; Nguyen, Quoc Bao a; Metze, Florian b; Waibel, Alex a,b

a Karlsruhe Institute of Technology (Germany)

b Carnegie Mellon University ^* (United States)

Author keywords

Deep Neural Networks; Large Vocabulary Speech Recognition; Low Resource Acoustic Modeling; Multi Lingual Acoustic Modeling

Indexed keywords

ACOUSTIC FEATURES; ACOUSTIC MODEL; ACOUSTIC MODEL TRAININGS; CONVERSATIONAL TELEPHONE SPEECH; DEEP NEURAL NETWORKS; FEATURE EXTRACTOR; HIGH-LEVEL FEATURES; MULTIPLE LANGUAGES;

FEATURE EXTRACTION; NEURAL NETWORKS; SPEECH RECOGNITION; SPEECH TRANSMISSION;

LINGUISTICS;

EID: 84893642465
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2013.6707754
Document Type: Conference Paper

Times cited : (10)

References (28)

1
- 0024634603
- Phoneme recognition using time-delay neural networks
- DOI 10.1109/29.21701
- A. Waibel, T. Hanazawa, G.E. Hinton, K. Shikano, and K.J. Lang, "Phoneme recognition using time-delay neural networks," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328-339, 1989. (Pubitemid 19065785)
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.J.⁵

2
- 84951490428
- Review of neural networks for speech recognition
- R.P. Lippmann, "Review of neural networks for speech recognition," Neural computation, vol. 1, no. 1, pp. 1-38, 1989.
- (1989) Neural Computation , vol.1 , Issue.1 , pp. 1-38
- Lippmann, R.P.¹

3
- 0003573244
- Springer
- H.A. Bourlard and N. Morgan, Connectionist speech recognition: A hybrid approach, vol. 247, Springer, 1994.
- (1994) Connectionist Speech Recognition: A Hybrid Approach , vol.247
- Bourlard, H.A.¹ Morgan, N.²

4
- 0039105699
- Multi-state time delay neural networks for continuous speech recognition
- P. Haffner and A. Waibel, "Multi-state time delay neural networks for continuous speech recognition," Advances in Neural Information Processing Systems, pp. 135-135, 1993.
- (1993) Advances in Neural Information Processing Systems , pp. 135-135
- Haffner, P.¹ Waibel, A.²

5
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G.E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

6
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, 2011, pp. 437-440.
- (2011) Proc. Interspeech , pp. 437-440
- Seide, F.¹ Li, G.² Yu, D.³

7
- 84874226274
- The language-independent bottleneck features
- IEEE
- K. Vesely, M. Karafiát, F. Grezl, M. Janda, and E. Egorova, "The language-independent bottleneck features," in Spoken Language Technology Workshop (SLT), 2012 IEEE. IEEE, 2012, pp. 336-341.
- (2012) Spoken Language Technology Workshop (SLT), 2012 IEEE , pp. 336-341
- Vesely, K.¹ Karafiát, M.² Grezl, F.³ Janda, M.⁴ Egorova, E.⁵

8
- 84890539009
- Multilingual acoustic models using distributed deep neural networks
- G Heigold, V Vanhoucke, A Senior, P Nguyen, M Ranzato, M Devin, and J Dean, "Multilingual acoustic models using distributed deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
- Heigold, G.¹ Vanhoucke, V.² Senior, A.³ Nguyen, P.⁴ Ranzato, M.⁵ Devin, M.⁶ Dean, J.⁷

9
- 0024939480
- Modularity and scaling in large phonemic neural networks
- DOI 10.1109/29.45535
- A Waibel, H Sawai, and K Shikano, "Modularity and scaling in large phonemic neural networks," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 12, pp. 1888-1898, 1989. (Pubitemid 20642700)
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.12 , pp. 1888-1898
- Waibel Alexander¹ Sawai Hidefumi² Shikano Kiyohiro³

10
- 27144439262
- Data-derived nonlinear mapping for feature extraction in HMM
- Citeseer
- Hynek Hermansky, Sangita Sharma, and Pratibha Jain, "Data-derived nonlinear mapping for feature extraction in HMM," in Proc. ASRU. Citeseer, 1999, vol. 99.
- (1999) Proc. ASRU , vol.99
- Hermansky, H.¹ Sharma, S.² Jain, P.³

11
- 70450217311
- Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system
- F. Valente, M. Magimai-Doss, C. Plahl, and S.V. Ravuri, "Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system," in Proc. Interspeech, 2009, pp. 2963-2966.
- (2009) Proc. Interspeech , pp. 2963-2966
- Valente, F.¹ Magimai-Doss, M.² Plahl, C.³ Ravuri, S.V.⁴

12
- 84906273176
- Modular combination of deep neural networks for acoustic modeling
- to appear
- J. Gehring, W. Lee, K. Kilgour, I. Lane, Y. Miao, and A. Waibel, "Modular combination of deep neural networks for acoustic modeling," in Proc. Interspeech, 2013, to appear.
- (2013) Proc. Interspeech
- Gehring, J.¹ Lee, W.² Kilgour, K.³ Lane, I.⁴ Miao, Y.⁵ Waibel, A.⁶

13
- 84874278045
- Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR
- IEEE
- P. Swietojanski, A. Ghoshal, and S. Renals, "Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR," in Spoken Language Technology Workshop (SLT), 2012 IEEE. IEEE, 2012, pp. 246-251.
- (2012) Spoken Language Technology Workshop (SLT), 2012 IEEE , pp. 246-251
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

14
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- IEEE
- F. Grézl, M. Karafiát, S. Kontár, and J. Cernocky, "Probabilistic and bottle-neck features for LVCSR of meetings," in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on. IEEE, 2007, vol. 4, pp. IV-757.
- (2007) Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, IEEE International Conference on , vol.4
- Grézl, F.¹ Karafiát, M.² Kontár, S.³ Cernocky, J.⁴

15
- 84890482429
- Extracting deep bottleneck features using stacked auto-encoders
- IEEE
- J Gehring, Y Miao, F Metze, and A Waibel, "Extracting deep bottleneck features using stacked auto-encoders," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
- Gehring, J.¹ Miao, Y.² Metze, F.³ Waibel, A.⁴

16
- 84864073449
- Greedy layer-wise training of deep networks
- Y Bengio, P Lamblin, D Popovici, and H Larochelle, "Greedy layer-wise training of deep networks," Advances in neural information processing systems, vol. 19, pp. 153, 2007.
- (2007) Advances in Neural Information Processing Systems , vol.19 , pp. 153
- Bengio, Y.¹ Lamblin, P.² Popovici, D.³ Larochelle, H.⁴

17
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," The Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
- (2010) The Journal of Machine Learning Research , vol.11 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.A.⁵

18
- 84878559540
- An investigation on initialization schemes for multilayer perceptron training using multilingual data and their effect on ASR performance
- N.T. Vu, W. Breiter, F. Metze, and T. Schultz, "An investigation on initialization schemes for multilayer perceptron training using multilingual data and their effect on ASR performance," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Vu, N.T.¹ Breiter, W.² Metze, F.³ Schultz, T.⁴

19
- 84867224965
- On the use of a multilingual neural network front-end
- S. Scanzio, P. Laface, L. Fissore, R. Gemello, and F. Mana, "On the use of a multilingual neural network front-end," in Proc. Interspeech, 2008, pp. 2711-2714.
- (2008) Proc. Interspeech , pp. 2711-2714
- Scanzio, S.¹ Laface, P.² Fissore, L.³ Gemello, R.⁴ Mana, F.⁵

20
- 84055163920
- Roles of pre-training and finetuning in context-dependent DBN-HMMs for real-world speech recognition
- D. Yu, L. Deng, and G. Dahl, "Roles of pre-training and finetuning in context-dependent DBN-HMMs for real-world speech recognition," in Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
- (2010) Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning
- Yu, D.¹ Deng, L.² Dahl, G.³

21
- 84890498592
- Warped minimum variance distortionless response based bottle neck features for LVCSR
- K. Kilgour, T. Seytzer, Q.B. Nguyen, and A. Waibel, "Warped minimum variance distortionless response based bottle neck features for LVCSR," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
- Kilgour, K.¹ Seytzer, T.² Nguyen, Q.B.³ Waibel, A.⁴

22
- 84890461500
- Multilingual training of deep-neural networks
- A. Ghoshal, P. Swietojanski, and S. Renals, "Multilingual training of deep-neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
- Ghoshal, A.¹ Swietojanski, P.² Renals, S.³

23
- 84893672075
- The fundamental frequency variation spectrum
- K Laskowski, M Heldner, and J Edlund, "The fundamental frequency variation spectrum," Proceedings of FONETIK 2008, pp. 29-32, 2008.
- (2008) Proceedings of FONETIK 2008 , pp. 29-32
- Laskowski, K.¹ Heldner, M.² Edlund, J.³

24
- 84893656667
- Models of tone for tonal and non-tonal languages
- IEEE, submitted for review
- F. Metze, Z.A. Sheik, A. Waibel, J. Gehring, K. Kilgour, Q.B. Nguyen, and V.H. Nguyen, "Models of tone for tonal and non-tonal languages," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2013, submitted for review.
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
- Metze, F.¹ Sheik, Z.A.² Waibel, A.³ Gehring, J.⁴ Kilgour, K.⁵ Nguyen, Q.B.⁶ Nguyen, V.H.⁷

25
- 84893675002
- Retrieved 2013-06-29
- "IARPA, Office for Incisive Analysis, Babel Program," http://www.iarpa.gov/Programs/ia/Babel/babel.html Retrieved 2013-06-29.
- IARPA, Office for Incisive Analysis, Babel Program

26
- 0030643785
- The Karlsruhe-Verbmobil speech recognition engine
- IEEE
- M. Finke, P. Geutner, H. Hild, T. Kemp, K. Ries, and M. Westphal, "The Karlsruhe-Verbmobil speech recognition engine," in Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on. IEEE, 1997, vol. 1, pp. 83-86.
- (1997) Acoustics, Speech, and Signal Processing, 1997, ICASSP-97, 1997 IEEE International Conference on , vol.1 , pp. 83-86
- Finke, M.¹ Geutner, P.² Hild, H.³ Kemp, T.⁴ Ries, K.⁵ Westphal, M.⁶

27
- 84857819132
- Theano: A CPU and GPU math expression compiler
- June, Oral Presentation
- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: A CPU and GPU Math Expression Compiler," in Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010, Oral Presentation.
- (2010) Proceedings of the Python for Scientific Computing Conference (SciPy)
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

28
- 84874282188
- Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- IEEE
- Jinyu Li, Dong Yu, Jui-Ting Huang, and Yifan Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Spoken Language Technology Workshop (SLT), 2012 IEEE. IEEE, 2012, pp. 131-136.
- (2012) Spoken Language Technology Workshop (SLT) , pp. 131-136
- Li, J.¹ Yu, D.² Huang, J.-T.³ Gong, Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.