SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

2013, Pages 6744-6748

Revisiting hybrid and GMM-HMM system combination techniques

(3) Swietojanski, Pawel (a); Ghoshal, Arnab (a); Renals, Steve (a)

(a) University of Edinburgh (United Kingdom)

Author keywords

deep neural networks; hybrid; system combination; tandem; TED

Indexed keywords

DEEP NEURAL NETWORKS; HYBRID; SYSTEM COMBINATION; TANDEM; TED;

HIDDEN MARKOV MODELS; NEURAL NETWORKS;

SIGNAL PROCESSING;

EID: 84890492591 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2013.6638967 Document Type: Conference Paper

Times cited : (59)

References (37)

1
- 0030638031
- A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
- JG Fiscus, "A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)," in Proc. IEEE ASRU, 1997, pp. 347-352
- (1997) Proc. IEEE ASRU , pp. 347-352
- Fiscus, J.G.¹

2
- 85061808589
- Explicit word error minimization in n-best list rescoring
- A Stolcke, Y Konig, and M Weintraub, "Explicit word error minimization in n-best list rescoring," in EUROSPEECH, 1997
- (1997) EUROSPEECH
- Stolcke, A.¹ Konig, Y.² Weintraub, M.³

3
- 0034296009
- Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
- L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: word error minimization and other applications of confusion networks," Computer Speech and Language, vol. 14, no. 4, pp. 373-400, 2000
- (2000) Computer Speech and Language , vol.14 , Issue.4 , pp. 373-400
- Mangu, L.¹ Brill, E.² Stolcke, A.³

4
- 0141477960
- Posterior probability decoding, confidence estimation and system combination
- G Evermann and PC Woodland, "Posterior probability decoding, confidence estimation and system combination," in Proc. NIST Speech Transcription Workshop, 2000
- (2000) Proc. NIST Speech Transcription Workshop
- Evermann, G.¹ Woodland, P.C.²

5
- 44949249226
- Generating complementary systems for speech recognition
- C Breslin and MJF Gales, "Generating complementary systems for speech recognition," in INTERSPEECH, 2006
- (2006) INTERSPEECH
- Breslin, C.¹ Gales, M.²

6
- 58149202339
- Directed decision trees for generating complementary systems
- C. Breslin and M. J. F. Gales, "Directed decision trees for generating complementary systems," Speech Communication, vol. 51, no. 3, pp. 284-295, 2009
- (2009) Speech Communication , vol.51 , Issue.3 , pp. 284-295
- Breslin, C.¹ Gales, M.J.F.²

7
- 0009129790
- Adaptively growing hierarchical mixtures of experts
- J. Fritsch, M. Finke, and A. Waibel, "Adaptively growing hierarchical mixtures of experts," in Advances in Neural Information Processing Systems, 1997, pp. 459-465
- (1997) Advances in Neural Information Processing Systems , pp. 459-465
- Fritsch, J.¹ Finke, M.² Waibel, A.³

8
- 27744451514
- Product of gaussians for speech recognition
- MJF Gales and SS Airey, "Product of gaussians for speech recognition," Computer Speech and Language, vol. 20, 2006
- (2006) Computer Speech and Language , vol.20
- Gales, M.J.F.¹ Airey, S.S.²

9
- 34547548235
- Probabilistic and bottleneck features for LVCSR of meetings
- F. Grezl, M Karafiat, S. Kontar, and J. Cernocky, "Probabilistic and bottleneck features for LVCSR of meetings," in Proc. IEEE ICASSP, 2007
- (2007) Proc. IEEE ICASSP
- Grezl, F.¹ Karafiat, M.² Kontar, S.³ Cernocky, J.⁴

10
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H Hermansky, DPW Ellis, and S Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE ICASSP, 2000
- (2000) Proc. IEEE ICASSP
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

11
- 84874245054
- Transcription of multigenre media archives using out-of-domain data
- P. Bell, M. Gales, P. Lanchantin, X. Liu, Y. Long, S. Renals, P. Swietojanski, and P. Woodland, "Transcription of multigenre media archives using out-of-domain data," in Proc. IEEE Workshop on Spoken Language Technology, Miami, 2012
- (2012) Proc. IEEE Workshop on Spoken Language Technology, Miami
- Bell, P.¹ Gales, M.² Lanchantin, P.³ Liu, X.⁴ Long, Y.⁵ Renals, S.⁶ Swietojanski, P.⁷ Woodland, P.⁸

12
- 0003573244
- Connectionist Speech Recognition: A Hybrid Approach
- H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic Publishers, 1994
- (1994) Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, H.¹ Morgan, N.²

13
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- GE Dahl, D Yu, L Deng, and A Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, no. 1, pp. 30-42, 2012
- (2012) IEEE Transactions on Audio, Speech & Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

14
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- April
- MJF Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, no. 2, pp. 75-98, April 1998
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.¹

15
- 85045373614
- Overview of the IWSLT 2012 evaluation campaign
- Hong Kong, HK, December
- M Federico, M. Cettolo, L. Bentivogli, M. Paul, and S. Stuker, "Overview of the IWSLT 2012 evaluation campaign," in Proc. of the International Workshop on Spoken Language Translation, Hong Kong, HK, December 2012
- (2012) Proc. of the International Workshop on Spoken Language Translation
- Federico, M.¹ Cettolo, M.² Bentivogli, L.³ Paul, M.⁴ Stuker, S.⁵

16
- 0028204660
- Combining TDNN and HMM in a hybrid system for improved continuous-speech recognition
- Jan.
- C. Dugast, L. Devillers, and X. Aubert, "Combining TDNN and HMM in a hybrid system for improved continuous-speech recognition," Speech and Audio Processing, IEEE Transactions on, vol. 2, no. 1, pp. 217-223, Jan. 1994
- (1994) Speech and Audio Processing, IEEE Transactions on , vol.2 , Issue.1 , pp. 217-223
- Dugast, C.¹ Devillers, L.² Aubert, X.³

17
- 0028194709
- Connectionist probability estimators in HMM speech recognition
- S Renals, N Morgan, H Bourlard, M Cohen, and H Franco, "Connectionist probability estimators in HMM speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 1, pp. 161-174, 1994
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.1 , pp. 161-174
- Renals, S.¹ Morgan, N.² Bourlard, H.³ Cohen, M.⁴ Franco, H.⁵

18
- 0002384092
- Large vocabulary continuous speech recognition using a hybrid connectionist/HMM system
- 1994
- M. Hochberg, S. Renals, T. Robinson, and D. Kershaw, "Large vocabulary continuous speech recognition using a hybrid connectionist/HMM system," in Proc. ICSLP, Yokohama, 1994
- (1994) Proc. ICSLP, Yokohama
- Hochberg, M.¹ Renals, S.² Robinson, T.³ Kershaw, D.⁴

19
- 0028288775
- A hybrid segmental neural net/hidden Markov model system for continuous speech recognition
- Jan.
- G. Zavaliagkos, Y. Zhao, R. Schwartz, and J. Makhoul, "A hybrid segmental neural net/hidden Markov model system for continuous speech recognition," Speech and Audio Processing, IEEE Transactions on, vol. 2, no. 1, pp. 151-160, Jan. 1994
- (1994) Speech and Audio Processing, IEEE Transactions on , vol.2 , Issue.1 , pp. 151-160
- Zavaliagkos, G.¹ Zhao, Y.² Schwartz, R.³ Makhoul, J.⁴

20
- 0029732695
- Multilayer perceptrons for state-dependent weightings of HMM likelihoods
- Y. J. Chung and C. K. Un, "Multilayer perceptrons for state-dependent weightings of HMM likelihoods," Speech Communication, vol. 18, no. 1, pp. 79-89, 1996
- (1996) Speech Communication , vol.18 , Issue.1 , pp. 79-89
- Chung, Y.J.¹ Un, C.K.²

21
- 84867593213
- Autoencoder bottleneck features using deep belief networks
- T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Autoencoder bottleneck features using deep belief networks," in Proc. IEEE ICASSP, 2012
- (2012) Proc. IEEE ICASSP
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³

22
- 84878539964
- Application of pretrained deep neural networks to large vocabulary speech recognition
- N Jaitly, P Nguyen, A Senior, and V Vanhoucke, "Application of pretrained deep neural networks to large vocabulary speech recognition," in Interspeech, 2012
- (2012) Interspeech
- Jaitly, N.¹ Nguyen, P.² Senior, A.³ Vanhoucke, V.⁴

23
- 79959814724
- Scarf: A segmental conditional random field toolkit for speech recognition
- G Zweig and P Nguyen, "Scarf: A segmental conditional random field toolkit for speech recognition," in Interspeech, 2010, pp. 2858-2861
- (2010) Interspeech , pp. 2858-2861
- Zweig, G.¹ Nguyen, P.²

24
- 0034825241
- Multi-stream adaptive evidence combination for noise robust ASR
- A Morris, A Hagen, H Glotin, and H Bourlard, "Multi-stream adaptive evidence combination for noise robust ASR," Speech Communication, vol. 34, no. 1-2, pp. 25-40, 2001
- (2001) Speech Communication , vol.34 , Issue.1-2 , pp. 25-40
- Morris, A.¹ Hagen, A.² Glotin, H.³ Bourlard, H.⁴

25
- 79953250475
- Minimum Bayes risk decoding and system combination based on a recursion for edit distance
- October
- H Xu, D Povey, L Mangu, and J Zhu, "Minimum Bayes risk decoding and system combination based on a recursion for edit distance," Computer Speech and Language, vol. 25, no. 4, pp. 802-828, October 2011
- (2011) Computer Speech and Language , vol.25 , Issue.4 , pp. 802-828
- Xu, H.¹ Povey, D.² Mangu, L.³ Zhu, J.⁴

26
- 85001124710
- Wit3: Web inventory of transcribed and translated talks
- Trento, Italy, May
- M. Cettolo, C. Girardi, and M. Federico, "Wit3: Web inventory of transcribed and translated talks," in Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), Trento, Italy, May 2012, pp. 261-268
- (2012) Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT) , pp. 261-268
- Cettolo, M.¹ Girardi, C.² Federico, M.³

27
- 84890543632
- The UEDIN systems for the IWSLT 2012 evaluation
- E. Hasler, P. Bell, A. Ghoshal, B. Haddow, P. Koehn, F. McInnes, S. Renals, and P. Swietojanski, "The UEDIN systems for the IWSLT 2012 evaluation," in Proc. IWSLT, 2012
- (2012) Proc. IWSLT
- Hasler, E.¹ Bell, P.² Ghoshal, A.³ Haddow, B.⁴ Koehn, P.⁵ McInnes, F.⁶ Renals, S.⁷ Swietojanski, P.⁸

28
- 84874276847
- The Kaldi speech recognition toolkit
- December
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlíček, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. IEEE ASRU, December 2011
- (2011) Proc. IEEE ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlíček, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

29
- 84873443879
- Theano: A CPU and GPU math expression compiler
- J Bergstra, O Breuleux, F Bastien, P Lamblin, R Pascanu, G Desjardins, J Turian, D Warde-Farley, and Y Bengio, "Theano: A CPU and GPU math expression compiler," in Proc. SciPy, 2010
- (2010) Proc. SciPy
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

30
- 51449120120
- Boosted MMI for model and feature-space discriminative training
- D Povey, D Kanevsky, B Kingsbury, B Ramabhadran, G Saon, and K Visweswariah, "Boosted MMI for model and feature-space discriminative training," in Proc. IEEE ICASSP, 2008, pp. 4057-4060
- (2008) Proc. IEEE ICASSP , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

31
- 85008520364
- Transcribing meetings with the AMIDA systems
- T. Hain, L. Burget, J. Dines, P.N. Garner, F. Grezl, A.E. Hannani, M. Huijbregts, M. Karafiat, M. Lincoln, and V. Wan, "Transcribing meetings with the AMIDA systems," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 486-498, 2012
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.2 , pp. 486-498
- Hain, T.¹ Burget, L.² Dines, J.³ Garner, P.N.⁴ Grezl, F.⁵ Hannani, A.E.⁶ Huijbregts, M.⁷ Karafiat, M.⁸ Lincoln, M.⁹ Wan, V.¹⁰

32
- 84872560515
- Practical recommendations for gradient-based training of deep architectures
- Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," arXiv:1206.5533, 2012
- (2012) Practical Recommendations for Gradient-based Training of Deep Architectures
- Bengio, Y.¹

33
- 84874278045
- Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR
- Miami
- P. Swietojanski, A. Ghoshal, and S. Renals, "Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR," in Proc. IEEE Workshop on Spoken Language Technology, Miami, 2012
- (2012) Proc. IEEE Workshop on Spoken Language Technology
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

34
- 33745805403
- A fast learning algorithm for deep belief nets
- GE Hinton, S Osindero, and Y Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, 2006
- (2006) Neural Computation , vol.18
- Hinton, G.E.¹ Osindero, S.² Teh, Y.³

35
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F Seide, G Li, X Chen, and D Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011
- (2011) Proc. IEEE ASRU
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

36
- 84055211743
- Acoustic modeling using deep belief networks
- A Mohamed, GE Dahl, and GE Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, 2012
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.1
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

37
- 0033350721
- Products of experts
- GE Hinton, "Products of experts," in Proc. Int. Conf. Artificial Neural Networks (ICANN), 1999, vol. 1, pp. 1-6.
- (1999) Proc. Int. Conf. Artificial Neural Networks (ICANN) , vol.1 , pp. 1-6
- Hinton, G.E.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.