



2013, Pages 285-290

Hybrid acoustic models for distant and multichannel large vocabulary speech recognition

Author keywords

Beamforming; Deep Neural Networks; Distant Speech Recognition; Meeting recognition; Microphone Arrays

Indexed keywords

CONVENTIONAL SYSTEMS; DEEP NEURAL NETWORKS; DISTANT SPEECH RECOGNITION; GAUSSIAN MIXTURE MODELS (GMMS); HYBRID ACOUSTIC MODEL; LARGE VOCABULARY SPEECH RECOGNITION; MEETING RECOGNITION; MICROPHONE ARRAYS

EID: 84893704659; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2013.6707744     Document Type: Conference Paper
Times cited: 106

References (33)
  • 2
    • 47749152568
    • The rich transcription 2007 meeting recognition evaluation
    • R Stiefelhagen, R Bowers, and J Fiscus, Eds. number 4625 in Lecture Notes in Computer Science Volume
    • J Fiscus, J Ajot, and J Garofolo, "The rich transcription 2007 meeting recognition evaluation," in Multimodal Technologies for Perception of Humans, R Stiefelhagen, R Bowers, and J Fiscus, Eds., number 4625 in Lecture Notes in Computer Science Volume, pp. 373-389. 2008.
    • (2008) Multimodal Technologies for Perception of Humans , pp. 373-389
    • Fiscus, J.1    Ajot, J.2    Garofolo, J.3
  • 4
    • 84893665400
    • The SRI-ICSI spring 2007 meeting and lecture recognition system
    • R Stiefelhagen, R Bowers, and J Fiscus, Eds. number 4625 in Lecture Notes in Computer Science Volume
    • A Stolcke, X Anguera, K Boakye, O Cetin, A Janin, M Magimai-Doss, C Wooters, and J Zheng, "The SRI-ICSI Spring 2007 meeting and lecture recognition system," in Multimodal Technologies for Perception of Humans, R Stiefelhagen, R Bowers, and J Fiscus, Eds., number 4625 in Lecture Notes in Computer Science Volume, pp. 373-389. 2008.
    • (2008) Multimodal Technologies for Perception of Humans , pp. 373-389
    • Stolcke, A.1    Anguera, X.2    Boakye, K.3    Cetin, O.4    Janin, A.5    Magimai-Doss, M.6    Wooters, C.7    Zheng, J.8
  • 5
    • 0016990291
    • The generalized correlation method for estimation of time delay
    • CH Knapp and GC Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320-327, 1976.
    • (1976) IEEE Trans. Acoust., Speech, Signal Process , vol.24 , Issue.4 , pp. 320-327
    • Knapp, C.H.1    Carter, G.C.2
  • 7
    • 0141603901
    • Superdirective microphone arrays
    • M Brandstein and D Ward, Eds., Springer
    • J Bitzer and KU Simmer, "Superdirective microphone arrays," in Microphone Arrays, M Brandstein and D Ward, Eds., pp. 19-38. Springer, 2001.
    • (2001) Microphone Arrays , pp. 19-38
    • Bitzer, J.1    Simmer, K.U.2
  • 9
    • 50449096811
    • Subband likelihood-maximizing beamforming for speech recognition in reverberant environments
    • M Seltzer and R Stern, "Subband likelihood-maximizing beamforming for speech recognition in reverberant environments," IEEE Trans. Audio, Speech, Language Process., vol. 14, pp. 2109-2121, 2006.
    • (2006) IEEE Trans. Audio, Speech, Language Process , vol.14 , pp. 2109-2121
    • Seltzer, M.1    Stern, R.2
  • 11
    • 84867195294
    • Multi-source far-distance microphone selection and combination for automatic transcription of lectures
    • M Wölfel, C Fügen, S Ikbal, and J McDonough, "Multi-source far-distance microphone selection and combination for automatic transcription of lectures," in Proc ICSLP, 2006.
    • (2006) Proc ICSLP
    • Wölfel, M.1    Fügen, C.2    Ikbal, S.3    McDonough, J.4
  • 12
    • 80051654520
    • Making the most from multiple microphones in meeting recognition
    • A Stolcke, "Making the most from multiple microphones in meeting recognition," in Proc IEEE ICASSP, 2011.
    • (2011) Proc IEEE ICASSP
    • Stolcke, A.1
  • 13
    • 84865729496
    • An analysis of automatic speech recognition with multiple microphones
    • D Marino and T Hain, "An analysis of automatic speech recognition with multiple microphones," in INTERSPEECH, 2011, pp. 1281-1284.
    • (2011) Interspeech , pp. 1281-1284
    • Marino, D.1    Hain, T.2
  • 17
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F Seide, G Li, X Chen, and D Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011.
    • (2011) Proc. IEEE ASRU
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 18
    • 85032750883
    • Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors
    • IEEE
    • K Kumatani, J McDonough, and B Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 127-140, 2012.
    • (2012) Signal Processing Magazine , vol.29 , Issue.6 , pp. 127-140
    • Kumatani, K.1    McDonough, J.2    Raj, B.3
  • 21
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • DOI 10.1162/neco.2006.18.7.1527
    • G Hinton, S Osindero, and Y Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006. (Pubitemid 44024729)
    • (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 23
    • 85037535397
    • Multiple dimension Levenshtein edit distance calculations for evaluating automatic speech recognition systems during simultaneous speech
    • JG Fiscus, J Ajot, N Radde, and C Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating automatic speech recognition systems during simultaneous speech," in Proc. LREC, 2006.
    • (2006) Proc. LREC
    • Fiscus, J.G.1    Ajot, J.2    Radde, N.3    Laprun, C.4
  • 24
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • MJF Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech, Audio Process., vol. 7, no. 3, pp. 272-281, 1999.
    • (1999) IEEE Trans. Speech, Audio Process , vol.7 , Issue.3 , pp. 272-281
    • Gales, M.J.F.1
  • 29
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • GE Dahl, D Yu, L Deng, and A Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 30-42, 2012.
    • (2012) IEEE Trans. Audio, Speech, Language Process , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 32
    • 84890492030
    • An investigation of deep neural networks for noise robust speech recognition
    • M Seltzer, D Yu, and Y Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, 2013.
    • (2013) Proc. ICASSP
    • Seltzer, M.1    Yu, D.2    Wang, Y.3
  • 33


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.