SCOPUS Information Search Platform

2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Volume , Issue , 2016, Pages 539-546

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs

(6) Peddinti, Vijayaditya (a); Chen, Guoguo (a); Manohar, Vimal (a); Ko, Tom (b); Povey, Daniel (a); Khudanpur, Sanjeev (a)

a Whiting School of Engineering (United States)

b Huawei Noah's Ark Lab (China)

Author keywords

far field speech recognition; iVectors; recurrent neural network language models; time delay neural networks

Indexed keywords

COMPUTATIONAL LINGUISTICS; FEEDFORWARD NEURAL NETWORKS; NEURAL NETWORKS; PROGRAM PROCESSORS; RECURRENT NEURAL NETWORKS; REVERBERATION; SPEECH; TIME DELAY;

DISTRIBUTED OPTIMIZATION; ENVIRONMENT ADAPTATION; FAR FIELD; I-VECTORS; LONG-TERM INTERACTION; REVERBERANT ENVIRONMENT; STATE OF THE ART; TIME DELAY NEURAL NETWORKS;

SPEECH RECOGNITION;

EID: 84964483822; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2015.7404842; Document Type: Conference Paper

Times cited : (115)

References (35)

1
- 85032751613
- Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition
- T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 114-126, Nov 2012
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 114-126
- Yoshioka, T.¹ Sehr, A.² Delcroix, M.³ Kinoshita, K.⁴ Maas, R.⁵ Nakatani, T.⁶ Kellermann, W.⁷

2
- 0024634603
- Phoneme recognition using time-delay neural networks
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, Mar. 1989
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.⁵

3
- 84959115289
- A time delay neural network architecture for efficient modeling of long temporal contexts
- V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Povey, D.² Khudanpur, S.³

4
- 84921731072
- Fast adaptation of deep neural network based on discriminant codes for speech recognition
- S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q.-F. Liu, "Fast Adaptation of Deep Neural Network based on Discriminant Codes for Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. PP, no. 99, pp. 1-1, 2014
- (2014) IEEE/ACM Transactions on Audio, Speech, and Language Processing , Issue.99 , pp. 1
- Xue, S.¹ Abdel-Hamid, O.² Jiang, H.³ Dai, L.⁴ Liu, Q.-F.⁵

5
- 84858984756
- iVector-based discriminative adaptation for automatic speech recognition
- M. Karafiat, L. Burget, P. Matejka, O. Glembek, and J. Cernocky, "iVector-based discriminative adaptation for automatic speech recognition," in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, Dec. 2011, pp. 152-157
- (2011) 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE , pp. 152-157
- Karafiat, M.¹ Burget, L.² Matejka, P.³ Glembek, O.⁴ Cernocky, J.⁵

6
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, Dec 2013, pp. 55-59
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

7
- 84928158251
- Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition
- M. J. Alam, V. Gupta, P. Kenny, and P. Dumouchel, "Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition," Proc. of IEEE REVERB Workshop, 2014
- (2014) Proc. of IEEE REVERB Workshop
- Alam, M.J.¹ Gupta, V.² Kenny, P.³ Dumouchel, P.⁴

8
- 84959081447
- (accessed March 23, 2015)
- Automatic Speech recognition In Reverberant Environments (ASpIRE) Challenge, 2015 (accessed March 23, 2015), http://www.iarpa.gov/index.php/working-with-iarpa/prize-challenges/306-automatic-speech-in-reverberant-environments-aspire-challenge
- (2015) Automatic Speech Recognition in Reverberant Environments (ASpIRE) Challenge

9
- 84959118000
- The Fisher corpus: A resource for the next generations of speech-to-text
- C. Cieri, D. Miller, and K. Walker, "The Fisher corpus: a resource for the next generations of speech-to-text," in LREC, vol. 4, 2004, pp. 69-71
- (2004) LREC , vol.4 , pp. 69-71
- Cieri, C.¹ Miller, D.² Walker, K.³

10
- 85083954109
- Parallel training of deep neural networks with natural gradient and parameter averaging
- D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging," in Proceedings of the ICLR Workshop, 2015
- (2015) Proceedings of the ICLR Workshop
- Povey, D.¹ Zhang, X.² Khudanpur, S.³

11
- 0019053271
- Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences
- S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

12
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

13
- 84959075954
- Reverberation robust acoustic modeling using i-vectors with time delay neural networks
- V. Peddinti, G. Chen, D. Povey, and S. Khudanpur, "Reverberation robust acoustic modeling using i-vectors with time delay neural networks," in Proceedings of Interspeech, 2015
- (2015) Proceedings of Interspeech
- Peddinti, V.¹ Chen, G.² Povey, D.³ Khudanpur, S.⁴

14
- 84905239342
- Improving deep neural network acoustic models using generalized maxout networks
- X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, May 2014, pp. 215-219
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE , pp. 215-219
- Zhang, X.¹ Trmal, J.² Povey, D.³ Khudanpur, S.⁴

15
- 78049391669
- Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition
- S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, and T. Yamada, "Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition." in LREC, 2000
- (2000) LREC
- Nakamura, S.¹ Hiyane, K.² Asano, F.³ Nishiura, T.⁴ Yamada, T.⁵

16
- 84893622444
- The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech
- K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, A. Sehr, W. Kellermann, and R. Maas, "The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech," in Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013 IEEE Workshop on. IEEE, 2013, pp. 1-4
- (2013) Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013 IEEE Workshop On. IEEE , pp. 1-4
- Kinoshita, K.¹ Delcroix, M.² Yoshioka, T.³ Nakatani, T.⁴ Sehr, A.⁵ Kellermann, W.⁶ Maas, R.⁷

17
- 70449564650
- A binaural room impulse response database for the evaluation of dereverberation algorithms
- M. Jeub, M. Schafer, and P. Vary, "A binaural room impulse response database for the evaluation of dereverberation algorithms," in Digital Signal Processing, 2009 16th International Conference on. IEEE, 2009, pp. 1-5
- (2009) Digital Signal Processing, 2009 16th International Conference On. IEEE , pp. 1-5
- Jeub, M.¹ Schafer, M.² Vary, P.³

18
- 84959118622
- Audio augmentation for speech recognition
- T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech recognition," in Proceedings of INTERSPEECH, 2015
- (2015) Proceedings of INTERSPEECH
- Ko, T.¹ Peddinti, V.² Povey, D.³ Khudanpur, S.⁴

19
- 78649514658
- Robust speech/non-speech classification in heterogeneous multimedia content
- M. Huijbregts and F. de Jong, "Robust speech/non-speech classification in heterogeneous multimedia content," Speech Communication, vol. 53, no. 2, pp. 143-153, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167639310001421
- (2011) Speech Communication , vol.53 , Issue.2 , pp. 143-153
- Huijbregts, M.¹ De Jong, F.²

20
- 77955810266
- Ph.D. dissertation, University of Sheffield
- M. Gibson, "Minimum Bayes risk acoustic model estimation and adaptation," Ph.D. dissertation, University of Sheffield, 2008
- (2008) Minimum Bayes Risk Acoustic Model Estimation and Adaptation
- Gibson, M.¹

21
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proceedings of INTERSPEECH, 2013, pp. 2345-2349
- (2013) Proceedings of INTERSPEECH , pp. 2345-2349
- Vesely, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

22
- 84959176266
- Semi-supervised maximum mutual information training of deep neural network acoustic models
- V. Manohar, D. Povey, and S. Khudanpur, "Semi-supervised maximum mutual information training of deep neural network acoustic models," in Proceedings of INTERSPEECH, 2015
- (2015) Proceedings of INTERSPEECH
- Manohar, V.¹ Povey, D.² Khudanpur, S.³

23
- 84933559263
- Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the reverb challenge
- M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, T. Hori, T. Nakatani et al., "Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the reverb challenge," in Proc. IEEE REVERB Workshop, 2014
- (2014) Proc. IEEE REVERB Workshop
- Delcroix, M.¹ Yoshioka, T.² Ogawa, A.³ Kubo, Y.⁴ Fujimoto, M.⁵ Ito, N.⁶ Kinoshita, K.⁷ Espi, M.⁸ Hori, T.⁹ Nakatani, T.¹⁰

24
- 84959082206
- Dual system combination approach for various reverberant environments with dereverberation techniques
- Y. Tachioka, T. Narita, F. Weninger, and S. Watanabe, "Dual system combination approach for various reverberant environments with dereverberation techniques," in Proc. of IEEE REVERB Workshop, 2014
- (2014) Proc. of IEEE REVERB Workshop
- Tachioka, Y.¹ Narita, T.² Weninger, F.³ Watanabe, S.⁴

25
- 84959101589
- Pronunciation and silence probability modeling for ASR
- G. Chen, H. Xu, M. Wu, D. Povey, and S. Khudanpur, "Pronunciation and silence probability modeling for ASR," in Proceedings of INTERSPEECH, 2015
- (2015) Proceedings of INTERSPEECH
- Chen, G.¹ Xu, H.² Wu, M.³ Povey, D.⁴ Khudanpur, S.⁵

26
- 19944415893
- Implicit modelling of pronunciation variation in automatic speech recognition
- T. Hain, "Implicit modelling of pronunciation variation in automatic speech recognition," Speech Communication, vol. 46, no. 2, pp. 171-188, 2005
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 171-188
- Hain, T.¹

27
- 0032639915
- Improvements in recognition of conversational telephone speech
- B. Peskin, M. Newman, D. McAllaster, V. Nagesha, H. Richards, S. Wegmann, M. Hunt, and L. Gillick, "Improvements in recognition of conversational telephone speech," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1. IEEE, 1999, pp. 53-56
- (1999) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 53-56
- Peskin, B.¹ Newman, M.² McAllaster, D.³ Nagesha, V.⁴ Richards, H.⁵ Wegmann, S.⁶ Hunt, M.⁷ Gillick, L.⁸

28
- 0012236195
- The CUHTK March 2000 Hub5e transcription system
- T. Hain, P. Woodland, G. Evermann, and D. Povey, "The CUHTK March 2000 Hub5e transcription system," in Proceedings of Speech Transcription Workshop, vol. 1, 2000
- (2000) Proceedings of Speech Transcription Workshop , vol.1
- Hain, T.¹ Woodland, P.² Evermann, G.³ Povey, D.⁴

29
- 0043272135
- Automatic learning of word pronunciation from data
- E. Fosler, M. Weintraub, S. Wegmann, Y.-H. Kao, S. Khudanpur, C. Galles, and M. Saraclar, "Automatic learning of word pronunciation from data," in Proceedings of the International Conference on Spoken Language Processing, 1996
- (1996) Proceedings of the International Conference on Spoken Language Processing
- Fosler, E.¹ Weintraub, M.² Wegmann, S.³ Kao, Y.-H.⁴ Khudanpur, S.⁵ Galles, C.⁶ Saraclar, M.⁷

30
- 84891308106
- SRILM-an extensible language modeling toolkit
- A. Stolcke et al., "SRILM-an extensible language modeling toolkit." in Proceedings of INTERSPEECH, 2002
- (2002) Proceedings of INTERSPEECH
- Stolcke, A.¹

31
- 84946015916
- Librispeech: An ASR corpus based on public domain audio books
- V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: an ASR corpus based on public domain audio books," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015
- (2015) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE
- Panayotov, V.¹ Chen, G.² Povey, D.³ Khudanpur, S.⁴

32
- 84901784231
- Rnnlm-recurrent neural network language modeling toolkit
- T. Mikolov, S. Kombrink, A. Deoras, L. Burget, and J. Cernocky, "Rnnlm-recurrent neural network language modeling toolkit," in Proc. of the 2011 ASRU Workshop, 2011, pp. 196-201
- (2011) Proc. of the 2011 ASRU Workshop , pp. 196-201
- Mikolov, T.¹ Kombrink, S.² Deoras, A.³ Burget, L.⁴ Cernocky, J.⁵

33
- 84858966958
- Strategies for training large scale neural network language models
- T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černocky, "Strategies for training large scale neural network language models," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011, pp. 196-201
- (2011) Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop On. IEEE , pp. 196-201
- Mikolov, T.¹ Deoras, A.² Povey, D.³ Burget, L.⁴ Černocky, J.⁵

34
- 0003465475
- Learning internal representations by error propagation
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," DTIC Document, Tech. Rep., 1985
- (1985) DTIC Document
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

35
- 84905240726
- Efficient lattice rescoring using recurrent neural network language models
- X. Liu, Y. Wang, X. Chen, M. J. Gales, and P. C. Woodland, "Efficient lattice rescoring using recurrent neural network language models," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 4908-4912
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On. IEEE , pp. 4908-4912
- Liu, X.¹ Wang, Y.² Chen, X.³ Gales, M.J.⁴ Woodland, P.C.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.