SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2015-August, 2015, pages 4280-4284

Learning acoustic frame labeling for speech recognition with recurrent neural networks

Authors (7): Sak, Hasim (a); Senior, Andrew (a); Rao, Kanishka (a); Irsoy, Ozan (a); Graves, Alex (a); Beaufays, Francoise (a); Schalkwyk, Johan (a)

(a) Google Inc. (United States)

Author keywords

acoustic modeling; CTC; LSTM; RNN

Indexed keywords

AUDIO SIGNAL PROCESSING; SPEECH COMMUNICATION; SPEECH RECOGNITION; TELEPHONE SETS

ACOUSTIC MODEL; CONTEXT DEPENDENT; CONVENTIONAL APPROACH; DISCRIMINATIVE TRAINING; FINITE STATE TRANSDUCERS; LARGE VOCABULARY SPEECH RECOGNITION; LSTM; TEMPORAL CLASSIFICATION

LONG SHORT-TERM MEMORY

EID: 84946084790 | Print ISSN: 1520-6149 | E-ISSN: none | Source type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178778 | Document type: Conference Paper

Times cited: 229

References (25)

1. H. Bourlard and N. Morgan, Connectionist Speech Recognition. Kluwer Academic Publishers, 1994. (Scopus ID: 0003573244)

2. A. Graves, N. Jaitly, and A. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 273-278. (Scopus ID: 84893701254)

3. H. Sak, A. Senior, and F. Beaufays, "Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling," in INTERSPEECH 2014, 2014. (Scopus ID: 84910046405)

4. H. Sak, O. Vinyals, G. Heigold, A. Senior, E. McDermott, R. Monga, and M. Mao, "Sequence discriminative distributed training of long short-term memory recurrent neural networks," in INTERSPEECH, 2014. (Scopus ID: 84910072094)

5. H. Sak, A. Senior, and F. Beaufays, "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition," arXiv e-prints, Feb. 2014. (Scopus ID: 84908677215)

6. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 369-376. (Scopus ID: 33749259827)

7. A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proceedings of ICASSP, 2013. (Scopus ID: 84890543083)

8. F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in INTERSPEECH, 2011, pp. 437-440. (Scopus ID: 84865801985)

9. G. Dahl, D. Yu, and L. Deng, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011. (Scopus ID: 84905237729)

10. G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, no. 1, pp. 30-42, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1109/TASL.2011.2134090 (Scopus ID: 84055222005)

11. N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, "Application of pretrained deep neural networks to large vocabulary speech recognition," in INTERSPEECH, 2012. (Scopus ID: 84878539964)

12. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012. (Scopus ID: 85032751458)

13. L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989. (Scopus ID: 0024610919)

14. Y. Normandin, "Hidden Markov models, maximum mutual information, and the speech recognition problem," Ph.D. dissertation, McGill University, Montreal, Canada, 1991. (Scopus ID: 0003459132)

15. D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Cambridge, England, 2004. (Scopus ID: 4544265717)

16. B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3761-3764. (Scopus ID: 70349213445)

17. B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in INTERSPEECH, 2012. (Scopus ID: 84878379108)

18. H. Su, G. Li, D. Yu, and F. Seide, "Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 6664-6668. (Scopus ID: 84890543852)

19. K. Veselý, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in INTERSPEECH, 2013. (Scopus ID: 84906274730)

20. G. Heigold, "A log-linear discriminative modeling framework for speech recognition," Ph.D. dissertation, RWTH Aachen University, Aachen, Germany, Jun. 2010. (Scopus ID: 80051640064)

21. M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997. (Scopus ID: 0031268931)

22. F. Eyben, M. Wollmer, B. Schuller, and A. Graves, "From speech to letters using a novel neural network architecture for grapheme based ASR," in Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on. IEEE, 2009, pp. 376-380. (Scopus ID: 77949404053)

23. Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng, "Building high-level features using large scale unsupervised learning," in International Conference on Machine Learning, 2012, pp. 81-88. (Scopus ID: 84867135575)

24. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks," in Advances in Neural Information Processing Systems (NIPS), 2012. (Scopus ID: 84877760312)

25. G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean, "Multilingual acoustic models using distributed deep neural networks," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, Vancouver, Canada, Apr. 2013. (Scopus ID: 84890539009)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.