SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 2751-2755

Purely sequence-trained neural networks for ASR based on lattice-free MMI

(8) Povey, Daniel a,b Peddinti, Vijayaditya a Galvez, Daniel c Ghahremani, Pegah a Manohar, Vimal a Na, Xingyu d Wang, Yiming a Khudanpur, Sanjeev a,b

a Johns Hopkins University (United States)

b JOHNS HOPKINS UNIVERSITY (United States)

c Department of Obstetrics Gynecology (United States)

d Lele Innovation and Intelligence Technology (Beijing) Co (China)

Author keywords

Neural networks; Sequence discriminative training

Indexed keywords

COMPUTATIONAL LINGUISTICS; ENTROPY; IMAGE CODING; NEURAL NETWORKS; PROGRAM PROCESSORS; SPEECH COMMUNICATION; SPEECH PROCESSING; SPEECH RECOGNITION; STEREOPHONIC BROADCASTING;

DISCRIMINATIVE TRAINING; FORWARD / BACKWARD ALGORITHMS; MAXIMUM MUTUAL INFORMATION; N-GRAM LANGUAGE MODELS; OBJECTIVE FUNCTIONS; SPACE AND TIME COMPLEXITY; TRAINED NEURAL NETWORKS; WORD ERROR RATE REDUCTIONS;

COMPLEX NETWORKS;

EID: 84994310412 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-595 Document Type: Conference Paper

Times cited : (1006)

References (23)

1
- 70349213445
- Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
- B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in Proceedings of ICASSP. IEEE, apr 2009, pp. 3761-3764.
- (2009) Proceedings of ICASSP. IEEE , pp. 3761-3764
- Kingsbury, B.¹

2
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Veselý, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proceedings of INTERSPEECH, 2013, pp. 2345-2349.
- (2013) Proceedings of INTERSPEECH , pp. 2345-2349
- Veselý, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

3
- 84890543852
- Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription
- H. Su, G. Li, D. Yu, and F. Seide, "Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription," in Proceedings of ICASSP. IEEE, may 2013, pp. 6664-6668.
- (2013) Proceedings of ICASSP. IEEE , pp. 6664-6668
- Su, H.¹ Li, G.² Yu, D.³ Seide, F.⁴

4
- 4544265717
- Ph.D. dissertation, Cambridge University
- D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Cambridge University, 2005.
- (2005) Discriminative Training for Large Vocabulary Speech Recognition
- Povey, D.¹

5
- 33749259827
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 369-376.
- (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 369-376
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

6
- 84946084790
- Learning acoustic frame labeling for speech recognition with recurrent neural networks
- H. Sak, A. Senior, K. Rao, O. Irsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4280-4284.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference On. IEEE , pp. 4280-4284
- Sak, H.¹ Senior, A.² Rao, K.³ Irsoy, O.⁴ Graves, A.⁵ Beaufays, F.⁶ Schalkwyk, J.⁷

7
- 84928545733
- arXiv preprint arXiv:1412.5567
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., "Deep speech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014.
- (2014) Deep Speech: Scaling Up End-to-end Speech Recognition
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷ Satheesh, S.⁸ Sengupta, S.⁹ Coates, A.¹⁰

8
- 84959112739
- Fast and accurate recurrent neural network acoustic models for speech recognition
- H. Sak, A. Senior, K. Rao, and F. Beaufays, "Fast and accurate recurrent neural network acoustic models for speech recognition," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
- (2015) Sixteenth Annual Conference of the International Speech Communication Association
- Sak, H.¹ Senior, A.² Rao, K.³ Beaufays, F.⁴

9
- 84994286302
- Acoustic modelling with CD-CTC-sMBR LSTM RNNs
- A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath, and K. Rao, "Acoustic modelling with CD-CTC-sMBR LSTM RNNs," in ASRU, 2015.
- (2015) ASRU
- Senior, A.¹ Sak, H.² de Chaumont Quitry, F.³ Sainath, T.N.⁴ Rao, K.⁵

10
- 34047266376
- Advances in speech transcription at IBM under the DARPA EARS program
- S. Chen, B. Kingsbury, L. Mangu, D. Povey, G. Saon, H. Soltau, and G. Zweig, "Advances in speech transcription at IBM under the DARPA EARS program," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 5, pp. 1596-1608, Sep. 2006.
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.5 , pp. 1596-1608
- Chen, S.¹ Kingsbury, B.² Mangu, L.³ Povey, D.⁴ Saon, G.⁵ Soltau, H.⁶ Zweig, G.⁷

11
- 85075929453
- Speech recognition with weighted finite-state transducers
- M. Mohri, F. Pereira, and M. Riley, "Speech recognition with weighted finite-state transducers," in Springer Handbook of Speech Processing. Springer, 2008, pp. 559-584.
- (2008) Springer Handbook of Speech Processing. Springer , pp. 559-584
- Mohri, M.¹ Pereira, F.² Riley, M.³

12
- 0002197352
- An n log n algorithm for minimizing states in a finite automaton
- J. Hopcroft, "An n log n algorithm for minimizing states in a finite automaton," Theory of Machines and Computations, pp. 189-196, 1971.
- (1971) Theory of Machines and Computations , pp. 189-196
- Hopcroft, J.¹

13
- 84867616340
- Generating exact lattices in the WFST framework
- D. Povey, M. Hannemann, G. Boulianne, L. Burget, A. Ghoshal, M. Janda, M. Karafiát, S. Kombrink, P. Motlicek, Y. Qian et al., "Generating exact lattices in the WFST framework," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012, pp. 4213-4216.
- (2012) Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference On. IEEE , pp. 4213-4216
- Povey, D.¹ Hannemann, M.² Boulianne, G.³ Burget, L.⁴ Ghoshal, A.⁵ Janda, M.⁶ Karafiát, M.⁷ Kombrink, S.⁸ Motlicek, P.⁹ Qian, Y.¹⁰

14
- 84959115289
- A time delay neural network architecture for efficient modeling of long temporal contexts
- V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Povey, D.² Khudanpur, S.³

15
- 84994327921
- (2016) Readme file in code repository to replicate the experiments in this paper. [Online]. Available: https://github.com/vijayaditya/kaldi/blob/chain_paper_results/egs/README_paper_results
- (2016) Readme File in Code Repository to Replicate the Experiments in This Paper

16
- 84959118622
- Audio augmentation for speech recognition
- T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech recognition," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Ko, T.¹ Peddinti, V.² Povey, D.³ Khudanpur, S.⁴

17
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, Dec 2013, pp. 55-59.
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

18
- 84959101589
- Pronunciation and silence probability modeling for ASR
- G. Chen, H. Xu, M. Wu, D. Povey, and S. Khudanpur, "Pronunciation and silence probability modeling for ASR," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Chen, G.¹ Xu, H.² Wu, M.³ Povey, D.⁴ Khudanpur, S.⁵

19
- 84908677215
- H. Sak, A. Senior, and F. Beaufays, "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition," Feb. 2014. [Online]. Available: http://arxiv.org/abs/1402.1128
- (2014) Long Short-term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
- Sak, H.¹ Senior, A.² Beaufays, F.³

20
- 84946076428
- TED-LIUM: An automatic speech recognition dedicated corpus
- A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: an automatic speech recognition dedicated corpus," in LREC, 2012, pp. 125-129.
- (2012) LREC , pp. 125-129
- Rousseau, A.¹ Deléglise, P.² Estève, Y.³

21
- 84994300353
- Far-field ASR without parallel data
- V. Peddinti, V. Manohar, Y. Wang, D. Povey, and S. Khudanpur, "Far-field ASR without parallel data," in Proceedings of Interspeech, 2016. [Online]. Available: http://www.danielpovey.com/files/2016_interspeech_ami.pdf
- (2016) Proceedings of Interspeech
- Peddinti, V.¹ Manohar, V.² Wang, Y.³ Povey, D.⁴ Khudanpur, S.⁵

22
- 84964507635
- Deep bi-directional recurrent networks over spectral windows
- A.-r. Mohamed, F. Seide, D. Yu, J. Droppo, A. Stolcke, G. Zweig, and G. Penn, "Deep bi-directional recurrent networks over spectral windows," in Proceedings of ASRU. ASRU, 2015.
- (2015) Proceedings of ASRU
- Mohamed, A.-R.¹ Seide, F.² Yu, D.³ Droppo, J.⁴ Stolcke, A.⁵ Zweig, G.⁶ Penn, G.⁷

23
- 84959129849
- G. Saon, H.-K. J. Kuo, S. Rennie, and M. Picheny, "The IBM 2015 English Conversational Telephone Speech Recognition System," may 2015. [Online]. Available: http://arxiv.org/abs/1505.05899
- (2015) The IBM 2015 English Conversational Telephone Speech Recognition System
- Saon, G.¹ Kuo, H.-K.J.² Rennie, S.³ Picheny, M.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.