SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 3214-3218

A time delay neural network architecture for efficient modeling of long temporal contexts

(3) Peddinti, Vijayaditya (a); Povey, Daniel (a); Khudanpur, Sanjeev (a)

(a) Whiting School of Engineering (United States)

Author keywords

Acoustic modeling; Recurrent neural networks; Time delay neural networks

Indexed keywords

FEEDFORWARD NEURAL NETWORKS; LEARNING ALGORITHMS; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SPEECH COMMUNICATION; TIME DELAY;

ACOUSTIC EVENTS; ACOUSTIC MODEL; FEED-FORWARD NETWORK; RECURRENT NETWORKS; SUB-SAMPLING; TIME DELAY NEURAL NETWORKS; TRAINING DATA; TRAINING TIME;

NETWORK ARCHITECTURE;

EID: 84959115289 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (1131)

References (32)

1
- 84908677215
- H. Sak, A. Senior, and F. Beaufays, "Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition," Feb. 2014. [Online]. Available: http://arxiv.org/abs/1402.1128
- (2014) Long Short-term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
- Sak, H.¹ Senior, A.² Beaufays, F.³

2
- 0024634603
- Phoneme recognition using time-delay neural networks
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, Mar. 1989.
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.⁵

3
- 84935413199
- Modular construction of time-delay neural networks for speech recognition
- A. Waibel, "Modular construction of time-delay neural networks for speech recognition," Neural Computation, vol. 1, no. 1, pp. 39-46, 1989.
- (1989) Neural Computation , vol.1 , Issue.1 , pp. 39-46
- Waibel, A.¹

4
- 51449120120
- Boosted MMI for model and feature-space discriminative training
- D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah, "Boosted MMI for model and feature-space discriminative training," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2008, pp. 4057-4060.
- (2008) Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

5
- 84921731072
- Fast adaptation of deep neural network based on discriminant codes for speech recognition
- S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q.-F. Liu, "Fast Adaptation of Deep Neural Network based on Discriminant Codes for Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. PP, no. 99, pp. 1-1, 2014.
- (2014) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.PP , Issue.99 , pp. 1
- Xue, S.¹ Abdel-Hamid, O.² Jiang, H.³ Dai, L.⁴ Liu, Q.-F.⁵

6
- 84858984756
- IVector-based discriminative adaptation for automatic speech recognition
- M. Karafiat, L. Burget, P. Matejka, O. Glembek, and J. Cernocky, "iVector-based discriminative adaptation for automatic speech recognition," in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, Dec. 2011, pp. 152-157.
- (2011) 2011 IEEE Workshop on Automatic Speech Recognition & Understanding , pp. 152-157
- Karafiat, M.¹ Burget, L.² Matejka, P.³ Glembek, O.⁴ Cernocky, J.⁵

7
- 85016587886
- Switchboard: Telephone speech corpus for research and development
- J. Godfrey, E. Holliman, and J. McDaniel, "Switchboard: telephone speech corpus for research and development," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, Mar. 1992, pp. 517-520.
- (1992) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 517-520
- Godfrey, J.¹ Holliman, E.² McDaniel, J.³

8
- 0032658253
- Temporal patterns (TRAPS) in ASR of noisy speech
- H. Hermansky and S. Sharma, "Temporal patterns (TRAPS) in ASR of noisy speech," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, Mar. 1999, pp. 289-292.
- (1999) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 289-292
- Hermansky, H.¹ Sharma, S.²

9
- 4544303183
- Speech discrimination based on multiscale spectro-temporal modulations
- N. Mesgarani, S. Shamma, and M. Slaney, "Speech discrimination based on multiscale spectro-temporal modulations," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, May 2004, pp. I-601-4.
- (2004) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. I601-I604
- Mesgarani, N.¹ Shamma, S.² Slaney, M.³

10
- 84896734479
- Deep scattering spectrum
- J. Andén and S. Mallat, "Deep scattering spectrum," IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4114-4128, Aug. 2014.
- (2014) Signal Processing, IEEE Transactions on , vol.62 , Issue.16 , pp. 4114-4128
- Andén, J.¹ Mallat, S.²

11
- 70349212558
- Phoneme recognition using spectral envelope and modulation frequency features
- S. Thomas, S. Ganapathy, and H. Hermansky, "Phoneme recognition using spectral envelope and modulation frequency features," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2009, pp. 4453-4456.
- (2009) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4453-4456
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

12
- 84890543083
- Speech recognition with deep recurrent neural networks
- A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2013, pp. 6645-6649.
- (2013) Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE , pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

13
- 84910072497
- Unfolded recurrent neural networks for speech recognition
- G. Saon, H. Soltau, A. Emami, and M. Picheny, "Unfolded recurrent neural networks for speech recognition," in Proceedings of INTERSPEECH, 2014.
- (2014) Proceedings of INTERSPEECH
- Saon, G.¹ Soltau, H.² Emami, A.³ Picheny, M.⁴

14
- 84905248329
- Adaptation of multilingual stacked bottle-neck neural network structure for new language
- F. Grezl, M. Karafiat, and K. Vesely, "Adaptation of multilingual stacked bottle-neck neural network structure for new language," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 7654-7658.
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 7654-7658
- Grezl, F.¹ Karafiat, M.² Vesely, K.³

15
- 84905239342
- Improving deep neural network acoustic models using generalized maxout networks
- X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, May 2014, pp. 215-219.
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). , pp. 215-219
- Zhang, X.¹ Trmal, J.² Povey, D.³ Khudanpur, S.⁴

16
- 0019053271
- Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences
- S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

17
- 84944790942
- Parallel training of deep neural networks with natural gradient and parameter averaging
- D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of deep neural networks with natural gradient and parameter averaging," CoRR, vol. abs/1410.7455, 2014. [Online]. Available: http://arxiv.org/abs/1410.7455
- (2014) Parallel Training of Deep Neural Networks with Natural Gradient and Parameter Averaging
- Povey, D.¹ Zhang, X.² Khudanpur, S.³

18
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

19
- 77955810266
- Minimum Bayes risk acoustic model estimation and adaptation
- M. Gibson, "Minimum Bayes risk acoustic model estimation and adaptation," Ph.D. dissertation, University of Sheffield, 2008.
- (2008) Minimum Bayes Risk Acoustic Model Estimation and Adaptation
- Gibson, M.¹

20
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proceedings of INTERSPEECH, 2013, pp. 2345-2349.
- (2013) Proceedings of INTERSPEECH , pp. 2345-2349
- Vesely, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

21
- 84976219984
- An i-vector based time delay neural network architecture for far field recognition
- V. Peddinti, G. Chen, D. Povey, and S. Khudanpur, "An i-vector based time delay neural network architecture for far field recognition," in Proceedings of INTERSPEECH, 2015. [Online]. Available: http://www.danielpovey.com/files/2015_interspeech_aspire.pdf
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Chen, G.² Povey, D.³ Khudanpur, S.⁴

22
- 84959176266
- Semi-supervised maximum mutual information training of deep neural network acoustic models
- V. Manohar, D. Povey, and S. Khudanpur, "Semi-supervised maximum mutual information training of deep neural network acoustic models," in Proceedings of INTERSPEECH, 2015. [Online]. Available: http://www.danielpovey.com/files/2015_interspeech_entropy.pdf
- (2015) Proceedings of INTERSPEECH
- Manohar, V.¹ Povey, D.² Khudanpur, S.³

23
- 84959118622
- Audio augmentation for speech recognition
- T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech recognition," in Proceedings of INTERSPEECH, 2015. [Online]. Available: http://www.danielpovey.com/files/2015_interspeech_augmentation.pdf
- (2015) Proceedings of INTERSPEECH
- Ko, T.¹ Peddinti, V.² Povey, D.³ Khudanpur, S.⁴

24
- 84959101589
- Pronunciation and silence probability modeling for ASR
- G. Chen, H. Xu, M. Wu, D. Povey, and S. Khudanpur, "Pronunciation and silence probability modeling for ASR," in Proceedings of INTERSPEECH, 2015. [Online]. Available: http://www.danielpovey.com/files/2015_interspeech_silprob.pdf
- (2015) Proceedings of INTERSPEECH
- Chen, G.¹ Xu, H.² Wu, M.³ Povey, D.⁴ Khudanpur, S.⁵

25
- 84905265980
- Joint training of convolutional and non-convolutional neural networks
- H. Soltau, G. Saon, and T. Sainath, "Joint training of convolutional and non-convolutional neural networks," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 5572-5576.
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 5572-5576
- Soltau, H.¹ Saon, G.² Sainath, T.³

26
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2011, pp. 24-29.
- (2011) Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop On. IEEE , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

27
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011.
- (2011) IEEE 2011 Workshop on Automatic Speech Recognition and Understanding
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

28
- 0023776398
- The DARPA 1000-word resource management database for continuous speech recognition
- P. Price, W. Fisher, J. Bernstein, and D. Pallett, "The DARPA 1000-word resource management database for continuous speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, Apr. 1988, pp. 651-654.
- (1988) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 651-654
- Price, P.¹ Fisher, W.² Bernstein, J.³ Pallett, D.⁴

29
- 0012330750
- The design for the Wall Street Journal-based CSR corpus
- D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proceedings of the Workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 357-362.
- (1992) Proceedings of the Workshop on Speech and Natural Language. Association for Computational Linguistics , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

30
- 84946076428
- TED-LIUM: An automatic speech recognition dedicated corpus
- A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: An automatic speech recognition dedicated corpus," in LREC, 2012, pp. 125-129.
- (2012) LREC , pp. 125-129
- Rousseau, A.¹ Deléglise, P.² Estève, Y.³

31
- 84946015916
- Librispeech: An ASR corpus based on public domain audio books
- V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.
- (2015) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE
- Panayotov, V.¹ Chen, G.² Povey, D.³ Khudanpur, S.⁴

32
- 84959118000
- The Fisher corpus: A resource for the next generations of speech-to-text
- C. Cieri, D. Miller, and K. Walker, "The Fisher corpus: A resource for the next generations of speech-to-text," in LREC, vol. 4, 2004, pp. 69-71.
- (2004) LREC , vol.4 , pp. 69-71
- Cieri, C.¹ Miller, D.² Walker, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.