SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 3141-3144

The IBM 2015 English conversational telephone speech recognition system

Authors (4): Saon, George (a); Kuo, Hong Kwang J. (a); Rennie, Steven (a); Picheny, Michael (a)

(a) IBM T. J. Watson Research Center (United States)

Author keywords

Conversational speech recognition; Convolutional neural networks; Recurrent neural networks

Indexed keywords

CONVOLUTION; MODELING LANGUAGES; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION; TELEPHONE SETS;

CONVERSATIONAL SPEECH RECOGNITION; CONVERSATIONAL TELEPHONE SPEECH RECOGNITION; CONVOLUTIONAL NEURAL NETWORK; EVALUATION TEST; JOINT MODELING; LANGUAGE MODEL; OUTPUT LAYER; WORD ERROR RATE;

SPEECH RECOGNITION;

EID: 84959129849 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 84

References (30)

1
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

2
- 0031187171
- Speech recognition by machines and humans
- R. P. Lippmann, "Speech recognition by machines and humans," Speech communication, vol. 22, no. 1, pp. 1-15, 1997.
- (1997) Speech Communication , vol.22 , Issue.1 , pp. 1-15
- Lippmann, R.P.¹

3
- 84976224306
- Performance of the IBM LVCSR system on the Switchboard corpus
- F. Liu, M. Monkowski, M. Novak, M. Padmanabhan, M. Picheny, and P. Rao, "Performance of the IBM LVCSR system on the Switchboard corpus," in Proceedings of Speech Research Symposium, 1995, p. 189.
- (1995) Proceedings of Speech Research Symposium , pp. 189
- Liu, F.¹ Monkowski, M.² Novak, M.³ Padmanabhan, M.⁴ Picheny, M.⁵ Rao, P.⁶

4
- 0012236195
- The CU-HTK March 2000 HUB5E transcription system
- Baltimore
- T. Hain, P. Woodland, G. Evermann, and D. Povey, "The CU-HTK March 2000 HUB5E transcription system," in Proc. Speech Transcription Workshop, vol. 1. Baltimore, 2000.
- (2000) Proc. Speech Transcription Workshop , vol.1
- Hain, T.¹ Woodland, P.² Evermann, G.³ Povey, D.⁴

5
- 33646798740
- The IBM 2004 conversational telephony system for rich transcription
- H. Soltau, B. Kingsbury, L. Mangu, D. Povey, G. Saon, and G. Zweig, "The IBM 2004 conversational telephony system for rich transcription," in Acoustics, Speech and Signal Processing (ICASSP), 2005 IEEE International Conference on, 2005, pp. 205-208.
- (2005) Acoustics, Speech and Signal Processing (ICASSP), 2005 IEEE International Conference on , pp. 205-208
- Soltau, H.¹ Kingsbury, B.² Mangu, L.³ Povey, D.⁴ Saon, G.⁵ Zweig, G.⁶

6
- 79951796005
- The IBM Attila speech recognition toolkit
- H. Soltau, G. Saon, and B. Kingsbury, "The IBM Attila speech recognition toolkit," in Proc. of IEEE Workshop on Spoken Language Technology (SLT), 2010, pp. 97-102.
- (2010) Proc. of IEEE Workshop on Spoken Language Technology (SLT) , pp. 97-102
- Soltau, H.¹ Saon, G.² Kingsbury, B.³

7
- 33646788786
- fMPE: Discriminatively trained features for speech recognition
- D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, and G. Zweig, "fMPE: Discriminatively trained features for speech recognition," in Proc. of ICASSP, 2005, pp. 961-964.
- (2005) Proc. of ICASSP , pp. 961-964
- Povey, D.¹ Kingsbury, B.² Mangu, L.³ Saon, G.⁴ Soltau, H.⁵ Zweig, G.⁶

8
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Vesely, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

9
- 84910069984
- 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
- F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, "1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
- (2014) Fifteenth Annual Conference of the International Speech Communication Association
- Seide, F.¹ Fu, H.² Droppo, J.³ Li, G.⁴ Yu, D.⁵

10
- 84928545733
- arXiv preprint arXiv:1412.5567
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., "Deepspeech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014.
- (2014) Deepspeech: Scaling Up End-to-end Speech Recognition
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷ Satheesh, S.⁸ Sengupta, S.⁹ Coates, A.¹⁰

11
- 84905240378
- Sequence training of multiple deep neural networks for better performance and faster training speed
- IEEE
- P. Zhou, L. Dai, and H. Jiang, "Sequence training of multiple deep neural networks for better performance and faster training speed," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 5627-5631.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 5627-5631
- Zhou, P.¹ Dai, L.² Jiang, H.³

12
- 84959160199
- arXiv preprint arXiv:1406.7806
- A. L. Maas, A. Y. Hannun, C. T. Lengerich, P. Qi, D. Jurafsky, and A. Y. Ng, "Increasing deep neural network acoustic model size for large vocabulary continuous speech recognition," arXiv preprint arXiv:1406.7806, 2014.
- (2014) Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition
- Maas, A.L.¹ Hannun, A.Y.² Lengerich, C.T.³ Qi, P.⁴ Jurafsky, D.⁵ Ng, A.Y.⁶

13
- 84905265980
- Joint training of convolutional and non-convolutional neural networks
- H. Soltau, G. Saon, and T. N. Sainath, "Joint training of convolutional and non-convolutional neural networks," in Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Soltau, H.¹ Saon, G.² Sainath, T.N.³

14
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, 2013.
- (2013) Proc. ASRU
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

15
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Kingsbury, B.¹ Sainath, T.² Soltau, H.³

16
- 84892421248
- arXiv preprint arXiv:1302.4389
- I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," arXiv preprint arXiv:1302.4389, 2013.
- (2013) Maxout Networks
- Goodfellow, I.J.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

17
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

18
- 84905239342
- Improving deep neural network acoustic models using generalized maxout networks
- IEEE
- X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 215-219.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 215-219
- Zhang, X.¹ Trmal, J.² Povey, D.³ Khudanpur, S.⁴

19
- 84946693226
- Annealed dropout training of deep networks
- IEEE
- S. Rennie, V. Goel, and S. Thomas, "Annealed dropout training of deep networks," in Spoken Language Technology (SLT), IEEE Workshop on. IEEE, 2014.
- (2014) Spoken Language Technology (SLT), IEEE Workshop on
- Rennie, S.¹ Goel, V.² Thomas, S.³

20
- 84890454527
- Low-rank matrix factorization for deep neural network training with high-dimensional output targets
- T. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, "Low-rank matrix factorization for deep neural network training with high-dimensional output targets," in Proc. of ICASSP, 2013.
- (2013) Proc. of ICASSP
- Sainath, T.¹ Kingsbury, B.² Sindhwani, V.³ Arisoy, E.⁴ Ramabhadran, B.⁵

21
- 84890525984
- Deep convolutional neural networks for LVCSR
- IEEE
- T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 8614-8618.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on , pp. 8614-8618
- Sainath, T.N.¹ Mohamed, A.-R.² Kingsbury, B.³ Ramabhadran, B.⁴

22
- 84910072497
- Unfolded recurrent neural networks for speech recognition
- G. Saon, H. Soltau, A. Emami, and M. Picheny, "Unfolded recurrent neural networks for speech recognition," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
- (2014) Fifteenth Annual Conference of the International Speech Communication Association
- Saon, G.¹ Soltau, H.² Emami, A.³ Picheny, M.⁴

23
- 0033329799
- An empirical study of smoothing techniques for language modeling
- S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Computer Speech & Language, vol. 13, no. 4, pp. 359-393, 1999.
- (1999) Computer Speech & Language , vol.13 , Issue.4 , pp. 359-393
- Chen, S.F.¹ Goodman, J.²

24
- 0012611072
- Entropy-based pruning of backoff language models
- A. Stolcke, "Entropy-based pruning of backoff language models," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998, pp. 270-274.
- (1998) Proc. DARPA Broadcast News Transcription and Understanding Workshop , pp. 270-274
- Stolcke, A.¹

25
- 84863387613
- Shrinking exponential language models
- S. F. Chen, "Shrinking exponential language models," in Proc. NAACL-HLT, 2009, pp. 468-476.
- (2009) Proc. NAACL-HLT , pp. 468-476
- Chen, S.F.¹

26
- 0142166851
- A neural probabilistic language model
- Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
- (2003) Journal of Machine Learning Research , vol.3 , pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Jauvin, C.⁴

27
- 85055309630
- Ph.D. dissertation, Johns Hopkins University, Baltimore, MD, USA
- A. Emami, "A neural syntactic language model," Ph.D. dissertation, Johns Hopkins University, Baltimore, MD, USA, 2006.
- (2006) A Neural Syntactic Language Model
- Emami, A.¹

28
- 33847610331
- Continuous space language models
- H. Schwenk, "Continuous space language models," Computer Speech & Language, vol. 21, no. 3, pp. 492-518, 2007.
- (2007) Computer Speech & Language , vol.21 , Issue.3 , pp. 492-518
- Schwenk, H.¹

29
- 44849092930
- Empirical study of neural network language models for Arabic speech recognition
- A. Emami and L. Mangu, "Empirical study of neural network language models for Arabic speech recognition," in Proc. ASRU, 2007, pp. 147-152.
- (2007) Proc. ASRU , pp. 147-152
- Emami, A.¹ Mangu, L.²

30
- 84878422162
- Large scale hierarchical neural network language models
- H.-K. J. Kuo, E. Arisoy, A. Emami, and P. Vozila, "Large scale hierarchical neural network language models," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Kuo, H.-K.J.¹ Arisoy, E.² Emami, A.³ Vozila, P.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.