SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 7-11

The IBM 2016 English conversational telephone speech recognition system

(4) Saon, George (a); Sercu, Tom (a); Rennie, Steven (a); Kuo, Hong Kwang J. (a)

(a) IBM T. J. Watson Research Center (United States)

Author keywords

Conversational speech recognition; Convolutional neural networks; Recurrent neural networks

Indexed keywords

COMPUTATIONAL LINGUISTICS; CONVOLUTION; MODELING LANGUAGES; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SPEECH COMMUNICATION; SPEECH PROCESSING; TELEPHONE SETS;

ACOUSTIC AND LANGUAGE MODELS; CONVERSATIONAL SPEECH RECOGNITION; CONVERSATIONAL TELEPHONE SPEECH RECOGNITION; CONVOLUTIONAL NEURAL NETWORK; HIERARCHICAL NEURAL NETWORKS; LANGUAGE MODEL; LONG SHORT TERM MEMORY; WORD ERROR RATE;

SPEECH RECOGNITION;

EID: 84994201246; PISSN: 2308457X; EISSN: 19909772; Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-1460; Document Type: Conference Paper

Times cited : (85)

References (27)

1
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

2
- 84906214784
- Exploring convolutional neural network structures and optimization techniques for speech recognition
- O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimization techniques for speech recognition," in INTERSPEECH, 2013, pp. 3366-3370.
- (2013) INTERSPEECH, pp. 3366-3370
- Abdel-Hamid, O.¹ Deng, L.² Yu, D.³

3
- 84890525984
- Deep convolutional neural networks for lvcsr
- T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for lvcsr," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 8614-8618.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, pp. 8614-8618
- Sainath, T.N.¹ Mohamed, A.-R.² Kingsbury, B.³ Ramabhadran, B.⁴

4
- 84910072497
- Unfolded recurrent neural networks for speech recognition
- G. Saon, H. Soltau, A. Emami, and M. Picheny, "Unfolded recurrent neural networks for speech recognition," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
- (2014) Fifteenth Annual Conference of the International Speech Communication Association
- Saon, G.¹ Soltau, H.² Emami, A.³ Picheny, M.⁴

5
- 84959115289
- A time delay neural network architecture for efficient modeling of long temporal contexts
- V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Povey, D.² Khudanpur, S.³

6
- 84928545733
- Deepspeech: Scaling up end-to-end speech recognition
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., "Deepspeech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014.
- (2014) arXiv preprint arXiv:1412.5567
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷ Satheesh, S.⁸ Sengupta, S.⁹ Coates, A.¹⁰

7
- 84946084790
- Learning acoustic frame labeling for speech recognition with recurrent neural networks
- H. Sak, A. Senior, K. Rao, O. Irsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4280-4284.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, pp. 4280-4284
- Sak, H.¹ Senior, A.² Rao, K.³ Irsoy, O.⁴ Graves, A.⁵ Beaufays, F.⁶ Schalkwyk, J.⁷

8
- 84964489732
- Eesen: End-to-end speech recognition using deep rnn models and wfst-based decoding
- Y. Miao, M. Gowayyed, and F. Metze, "Eesen: End-to-end speech recognition using deep rnn models and wfst-based decoding," arXiv preprint arXiv:1507.08240, 2015.
- (2015) arXiv preprint arXiv:1507.08240
- Miao, Y.¹ Gowayyed, M.² Metze, F.³

9
- 84964507635
- Deep bi-directional recurrent networks over spectral windows
- A.-r. Mohamed, F. Seide, D. Yu, J. Droppo, A. Stolcke, G. Zweig, and G. Penn, "Deep bi-directional recurrent networks over spectral windows," in Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015.
- (2015) Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE
- Mohamed, A.-R.¹ Seide, F.² Yu, D.³ Droppo, J.⁴ Stolcke, A.⁵ Zweig, G.⁶ Penn, G.⁷

10
- 84890543083
- Speech recognition with deep recurrent neural networks
- A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645-6649.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

11
- 84959129849
- The IBM 2015 English conversational speech recognition system
- G. Saon, H.-K. Kuo, S. Rennie, and M. Picheny, "The IBM 2015 English conversational speech recognition system," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
- (2015) Sixteenth Annual Conference of the International Speech Communication Association
- Saon, G.¹ Kuo, H.-K.² Rennie, S.³ Picheny, M.⁴

12
- 84892421248
- Maxout networks
- I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," arXiv preprint arXiv:1302.4389, 2013.
- (2013) arXiv preprint arXiv:1302.4389
- Goodfellow, I.J.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

13
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Kingsbury, B.¹ Sainath, T.² Soltau, H.³

14
- 84973324686
- Very deep multilingual convolutional neural networks for lvcsr
- T. Sercu, C. Puhrsch, B. Kingsbury, and Y. LeCun, "Very deep multilingual convolutional neural networks for lvcsr," Proc. ICASSP, 2016.
- (2016) Proc. ICASSP
- Sercu, T.¹ Puhrsch, C.² Kingsbury, B.³ LeCun, Y.⁴

15
- 84994202024
- Advances in very deep convolutional neural networks for lvcsr
- T. Sercu and V. Goel, "Advances in very deep convolutional neural networks for lvcsr," arXiv, 2016.
- (2016) arXiv
- Sercu, T.¹ Goel, V.²

16
- 84978755117
- Very deep convolutional networks for large-scale image recognition
- arXiv:1409.1556
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR arXiv:1409.1556, 2014.
- (2014) CoRR
- Simonyan, K.¹ Zisserman, A.²

17
- 84905265980
- Joint training of convolutional and non-convolutional neural networks
- H. Soltau, G. Saon, and T. N. Sainath, "Joint training of convolutional and non-convolutional neural networks," in Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Soltau, H.¹ Saon, G.² Sainath, T.N.³

18
- 84990044091
- Torch7: A matlab-like environment for machine learning
- R. Collobert, K. Kavukcuoglu, and C. Farabet, "Torch7: A matlab-like environment for machine learning," in BigLearn, NIPS Workshop, no. EPFL-CONF-192376, 2011.
- (2011) BigLearn, NIPS Workshop, No. EPFL-CONF-192376
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

19
- 84890543852
- Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription
- H. Su, G. Li, D. Yu, and F. Seide, "Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription," Proc. ICASSP, 2013.
- (2013) Proc. ICASSP
- Su, H.¹ Li, G.² Yu, D.³ Seide, F.⁴

20
- 0033329799
- An empirical study of smoothing techniques for language modeling
- S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Computer Speech & Language, vol. 13, no. 4, pp. 359-393, 1999.
- (1999) Computer Speech & Language, vol. 13, no. 4, pp. 359-393
- Chen, S.F.¹ Goodman, J.²

21
- 0012611072
- Entropy-based pruning of backoff language models
- A. Stolcke, "Entropy-based pruning of backoff language models," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998, pp. 270-274.
- (1998) Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp. 270-274
- Stolcke, A.¹

22
- 84863387613
- Shrinking exponential language models
- S. F. Chen, "Shrinking exponential language models," in Proc. NAACL-HLT, 2009, pp. 468-476.
- (2009) Proc. NAACL-HLT, pp. 468-476
- Chen, S.F.¹

23
- 0142166851
- A neural probabilistic language model
- Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
- (2003) Journal of Machine Learning Research, vol. 3, pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Jauvin, C.⁴

24
- 85055309630
- A neural syntactic language model
- A. Emami, "A neural syntactic language model," Ph.D. dissertation, Johns Hopkins University, Baltimore, MD, USA, 2006.
- (2006) Ph.D. dissertation, Johns Hopkins University, Baltimore, MD, USA
- Emami, A.¹

25
- 33847610331
- Continuous space language models
- H. Schwenk, "Continuous space language models," Computer Speech & Language, vol. 21, no. 3, pp. 492-518, 2007.
- (2007) Computer Speech & Language, vol. 21, no. 3, pp. 492-518
- Schwenk, H.¹

26
- 44849092930
- Empirical study of neural network language models for Arabic speech recognition
- A. Emami and L. Mangu, "Empirical study of neural network language models for Arabic speech recognition," in Proc. ASRU, 2007, pp. 147-152.
- (2007) Proc. ASRU, pp. 147-152
- Emami, A.¹ Mangu, L.²

27
- 84878422162
- Large scale hierarchical neural network language models
- H.-K. J. Kuo, E. Arisoy, A. Emami, and P. Vozila, "Large scale hierarchical neural network language models," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Kuo, H.-K.J.¹ Arisoy, E.² Emami, A.³ Vozila, P.⁴

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.