2016, Pages 539-546

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs

Author keywords

far field speech recognition; iVectors; recurrent neural network language models; time delay neural networks

Indexed keywords

COMPUTATIONAL LINGUISTICS; FEEDFORWARD NEURAL NETWORKS; NEURAL NETWORKS; PROGRAM PROCESSORS; RECURRENT NEURAL NETWORKS; REVERBERATION; SPEECH; TIME DELAY

EID: 84964483822     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ASRU.2015.7404842     Document Type: Conference Paper
Times cited: 115

References (35)
  • 1
    • 85032751613
    • T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 114-126, Nov 2012
  • 3
    • 84959115289
    • V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015
  • 7
    • 84928158251
    • M. J. Alam, V. Gupta, P. Kenny, and P. Dumouchel, "Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition," Proc. of IEEE REVERB Workshop, 2014
  • 9
    • 84959118000
    • C. Cieri, D. Miller, and K. Walker, "The fisher corpus: A resource for the next generations of speech-to-text," in LREC, vol. 4, 2004, pp. 69-71
  • 10
    • 85083954109
    • D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging," in Proceedings of the ICLR Workshop, 2015
  • 11
    • 0019053271
    • S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980
  • 13
    • 84959075954
    • V. Peddinti, G. Chen, D. Povey, and S. Khudanpur, "Reverberation robust acoustic modeling using i-vectors with time delay neural networks," in Proceedings of Interspeech, 2015
  • 15
    • 78049391669
    • S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, and T. Yamada, "Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition," in LREC, 2000
  • 19
    • 78649514658
    • M. Huijbregts and F. de Jong, "Robust speech/non-speech classification in heterogeneous multimedia content," Speech Communication, vol. 53, no. 2, pp. 143-153, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167639310001421
  • 22
    • 84959176266
    • V. Manohar, D. Povey, and S. Khudanpur, "Semi-supervised maximum mutual information training of deep neural network acoustic models," in Proceedings of INTERSPEECH, 2015
  • 24
    • 84959082206
    • Y. Tachioka, T. Narita, F. Weninger, and S. Watanabe, "Dual system combination approach for various reverberant environments with dereverberation techniques," in Proc. of IEEE REVERB Workshop, 2014
  • 26
    • 19944415893
    • T. Hain, "Implicit modelling of pronunciation variation in automatic speech recognition," Speech Communication, vol. 46, no. 2, pp. 171-188, 2005
  • 30
    • 84891308106
    • A. Stolcke et al., "SRILM-an extensible language modeling toolkit," in Proceedings of INTERSPEECH, 2002
  • 34
    • 0003465475
    • D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," DTIC Document, Tech. Rep., 1985


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.