SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 22, Issue 11, 2014, Pages 1660-1669

Regression-based context-dependent modeling of deep neural networks for speech recognition

(2) Wang, Guangsen a; Sim, Khe Chai a

a NATIONAL UNIVERSITY OF SINGAPORE (Singapore)

Author keywords

Articulatory features; context dependent modeling; deep neural network; logistic regression

Indexed keywords

DECISION TREES; REGRESSION ANALYSIS; TELEPHONE SETS; TREES (MATHEMATICS);

ARTICULATORY FEATURES; BROADCAST NEWS TRANSCRIPTIONS; CONTEXT DEPENDENT MODELING; CONTEXT-DEPENDENT MODELS; DEEP NEURAL NETWORKS; LOG-POSTERIOR PROBABILITY; LOGISTIC REGRESSIONS; WORD ERROR RATE REDUCTIONS;

SPEECH RECOGNITION;

EID: 84916199887 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2014.2344855 Document Type: Article

Times cited : (18)

References (31)

1
- 0030244826
- A review of large-vocabulary continuous-speech recognition
- Sep
- S. Young, "A review of large-vocabulary continuous-speech recognition," IEEE Signal Process. Mag., vol. 13, no. 5, pp. 45-57, Sep. 1996.
- (1996) IEEE Signal Process. Mag , vol.13 , Issue.5 , pp. 45-57
- Young, S.¹

2
- 0027683813
- Shared-distribution hidden Markov models for speech recognition
- Oct
- M. Hwang and X. Huang, "Shared-distribution hidden Markov models for speech recognition," IEEE Trans. Speech Audio Process., vol. 1, no. 4, pp. 414-420, Oct. 1993.
- (1993) IEEE Trans. Speech Audio Process , vol.1 , Issue.4 , pp. 414-420
- Hwang, M.¹ Huang, X.²

3
- 0002144369
- Tree-based state tying for high accuracy acoustic modelling
- S. J. Young, J. J. Odell, and P. C. Woodland, "Tree-based state tying for high accuracy acoustic modelling," in Proc. HLT, 1994, pp. 307-312.
- (1994) Proc. HLT , pp. 307-312
- Young, S.J.¹ Odell, J.J.² Woodland, P.C.³

4
- 0034273299
- Robust decision tree state tying for continuous speech recognition
- Sep
- W. Reichl and W. Chou, "Robust decision tree state tying for continuous speech recognition," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 555-566, Sep. 2000.
- (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.5 , pp. 555-566
- Reichl, W.¹ Chou, W.²

5
- 84867620524
- An investigation of tied-mixture GMM based triphone state clustering
- G. Wang and K. C. Sim, "An investigation of tied-mixture GMM based triphone state clustering," in Proc. ICASSP, 2012, pp. 4717-4720.
- (2012) Proc. ICASSP , pp. 4717-4720
- Wang, G.¹ Sim, K.C.²

6
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

7
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
- (2006) Neural Comput , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

8
- 84055222005
- Context-dependent pretrained deep neural networks for large vocabulary speech recognition
- Jan.
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

9
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, 2011, pp. 24-29.
- (2011) Proc. ASRU , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

10
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novák, and A. Rahman Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in Proc. ASRU, 2011, pp. 30-35.
- (2011) Proc. ASRU , pp. 30-35
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novák, P.⁵ Rahman Mohamed, A.⁶

11
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- Nov.
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
- (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

12
- 84055211743
- Acoustic modeling using deep belief networks
- Jan.
- A. Mohamed, G. E. Dahl, and G. E. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

13
- 0025419316
- Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition
- Apr
- K. F. Lee, "Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 4, pp. 599-609, Apr. 1990.
- (1990) IEEE Trans. Acoust., Speech, Signal Process , vol.38 , Issue.4 , pp. 599-609
- Lee, K.F.¹

14
- 67650819459
- Karlsruhe, Germany: Karlsruhe Insti. of Technol
- F. Metze, Articulatory features for conversational speech recognition. Karlsruhe, Germany: Karlsruhe Insti. of Technol., 2005.
- (2005) Articulatory Features for Conversational Speech Recognition
- Metze, F.¹

15
- 84893699565
- Context-dependent modelling of deep neural network using logistic regression
- G. Wang and K. C. Sim, "Context-dependent modelling of deep neural network using logistic regression," in Proc. IEEE Workshop Autom. Speech Recogn. Understand., 2013, pp. 338-343.
- (2013) Proc IEEE Workshop Autom. Speech Recogn. Understand , pp. 338-343
- Wang, G.¹ Sim, K.C.²

16
- 84916235883
- Multiple codebook semi-continuous hidden Markov models for speaker-independent continuous speech recognition
- X. Huang, H. Hon, and K. Lee, "Multiple codebook semi-continuous hidden Markov models for speaker-independent continuous speech recognition," Carnegie Mellon Univ., Computer Science Dept., Tech. Rep. v. 89-136, 1989.
- (1989) Carnegie Mellon Univ., Computer Science Dept., Tech. Rep , vol.89 , Issue.136
- Huang, X.¹ Hon, H.² Lee, K.³

17
- 0003940203
- Cambridge Univ. Engineering Dept, Tech. Rep.
- M. J. F. Gales, "The generation and use of regression class trees for MLLR adaptation," Cambridge Univ. Engineering Dept., 1996, Tech. Rep.
- (1996) The Generation and Use of Regression Class Trees for MLLR Adaptation
- Gales, M.J.F.¹

18
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, 2011.
- (2011) Speech Commun , vol.53 , Issue.6 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

19
- 0013344078
- Training products of experts by minimizing contrastive divergence
- G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
- (2002) Neural Comput , vol.14 , Issue.8 , pp. 1771-1800
- Hinton, G.E.¹

20
- 77949342620
- Discriminative product-of-expert acoustic mapping for cross-lingual phone recognition
- K. C. Sim, "Discriminative product-of-expert acoustic mapping for cross-lingual phone recognition," in Proc. ASRU, 2009, pp. 546-551.
- (2009) Proc. ASRU , pp. 546-551
- Sim, K.C.¹

21
- 77949394249
- Phoneme recognition based on long temporal context. Brno, Czech Republic: Brno Univ. of Technology
- P. Schwarz, Phoneme recognition based on long temporal context. Brno, Czech Republic: Brno Univ. of Technology, Faculty of Inf. Technol., 2008.
- (2008) Faculty of Inf. Technol
- Schwarz, P.¹

22
- 56249116597
- An overview on automatic speech attribute transcription (ASAT)
- C.-H. Lee, M. Clements, S. Dusan, E. Fosler-Lussier, K. Johnson, B.-H. Juang, and L. Rabiner, "An overview on automatic speech attribute transcription (ASAT)," in Proc. Interspeech, 2007.
- (2007) Proc. Interspeech
- Lee, C.-H.¹ Clements, M.² Dusan, S.³ Fosler-Lussier, E.⁴ Johnson, K.⁵ Juang, B.-H.⁶ Rabiner, L.⁷

23
- 84875405186
- Exploiting deep neural networks for detection-based speech recognition
- Apr
- S. M. Siniscalchi, D. Yu, L. Deng, and C.-H. Lee, "Exploiting deep neural networks for detection-based speech recognition," Neurocomput., vol. 106, pp. 148-157, Apr. 2013.
- (2013) Neurocomput , vol.106 , pp. 148-157
- Siniscalchi, S.M.¹ Yu, D.² Deng, L.³ Lee, C.-H.⁴

24
- 0028234947
- A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features
- L. Deng and D. X. Sun, "A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features," J. Acoust. Soc. Amer., vol. 95, no. 5, pp. 2702-2719, 1994.
- (1994) J. Acoust. Soc. Amer , vol.95 , Issue.5 , pp. 2702-2719
- Deng, L.¹ Sun, D.X.²

25
- 0001865554
- The TDT-3 text and speech corpus
- Morgan Kaufmann
- D. Graff, C. Cieri, S. Strassel, and N. Martey, "The TDT-3 Text And Speech Corpus," in Proc. DARPA Broadcast News Workshop, 1999, pp. 57-60, Morgan Kaufmann.
- (1999) Proc. DARPA Broadcast News Workshop , pp. 57-60
- Graff, D.¹ Cieri, C.² Strassel, S.³ Martey, N.⁴

26
- 60749097551
- Cambridge, U.K.: Cambridge Univ. Engineering Dept
- S. J. Young et al., The HTK Book, version 3.4. Cambridge, U.K.: Cambridge Univ. Engineering Dept., 2009.
- (2009) The HTK Book, Version 3.4

27
- 4544265717
- Ph.D. dissertation Cambridge Univ.
- D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Cambridge Univ., 2004.
- (2004) Discriminative Training for Large Vocabulary Speech Recognition
- Povey, D.¹

28
- 0031222490
- MMIE training of large vocabulary recognition systems
- V. Valtchev, J. J. Odell, P. C. Woodland, and S. J. Young, "MMIE training of large vocabulary recognition systems," Speech Commun., vol. 22, no. 4, pp. 303-314, 1997.
- (1997) Speech Commun , vol.22 , Issue.4 , pp. 303-314
- Valtchev, V.¹ Odell, J.J.² Woodland, P.C.³ Young, S.J.⁴

29
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. IEEE Workshop Autom. Speech Recogn. Understand., 2011.
- (2011) Proc IEEE Workshop Autom. Speech Recogn. Understand
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

30
- 84255177123
- Deep and wide: Multiple layers in automatic speech recognition
- Jan.
- N. Morgan, "Deep and wide: Multiple layers in automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 7-13, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 7-13
- Morgan, N.¹

31
- 33646818291
- Constructing ensembles of ASR systems using randomized decision trees
- O. Siohan, B. Ramabhadran, and B. Kingsbury, "Constructing ensembles of ASR systems using randomized decision trees," in Proc. Int. Conf. Acoust. Speech, Signal Process., 2005.
- (2005) Proc. Int. Conf. Acoust. Speech, Signal Process
- Siohan, O.¹ Ramabhadran, B.² Kingsbury, B.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.