SCOPUS Information Retrieval Platform

2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

2013, Pages 285-290

Hybrid acoustic models for distant and multichannel large vocabulary speech recognition

Authors (3): Swietojanski, Pawel (a); Ghoshal, Arnab (a); Renals, Steve (a)

(a) University of Edinburgh (United Kingdom)

Author keywords

Beamforming; Deep Neural Networks; Distant Speech Recognition; Meeting recognition; Microphone Arrays

Indexed keywords

CONVENTIONAL SYSTEMS; DEEP NEURAL NETWORKS; DISTANT SPEECH RECOGNITION; GAUSSIAN MIXTURE MODEL (GMMs); HYBRID ACOUSTIC MODEL; LARGE VOCABULARY SPEECH RECOGNITION; MEETING RECOGNITION; MICROPHONE ARRAYS; BEAMFORMING; HIDDEN MARKOV MODELS; MICROPHONES; NEURAL NETWORKS; SPEECH RECOGNITION

EID: 84893704659 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2013.6707744 Document Type: Conference Paper

Times cited: 106

References (33)

1
- 50449083999
- M Wölfel and J McDonough, Distant Speech Recognition, Wiley, 2009.

2
- 47749152568
- J Fiscus, J Ajot, and J Garofolo, "The rich transcription 2007 meeting recognition evaluation," in Multimodal Technologies for Perception of Humans, R Stiefelhagen, R Bowers, and J Fiscus, Eds., number 4625 in Lecture Notes in Computer Science, pp. 373-389, 2008.

3
- 85008520364
- T Hain, L Burget, J Dines, PN Garner, F Grezl, AE Hannani, M Huijbregts, M Karafiat, M Lincoln, and V Wan, "Transcribing meetings with the AMIDA systems," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 2, pp. 486-498, 2012.

4
- 84893665400
- A Stolcke, X Anguera, K Boakye, O Cetin, A Janin, M Magimai-Doss, C Wooters, and J Zheng, "The SRI-ICSI Spring 2007 meeting and lecture recognition system," in Multimodal Technologies for Perception of Humans, R Stiefelhagen, R Bowers, and J Fiscus, Eds., number 4625 in Lecture Notes in Computer Science, pp. 373-389, 2008.

5
- 0016990291
- CH Knapp and GC Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320-327, 1976.

6
- 50449086237
- X Anguera, C Wooters, and J Hernando, "Acoustic beamforming for speaker diarization of meetings," IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 7, pp. 2011-2021, 2007.

7
- 0141603901
- J Bitzer and KU Simmer, "Superdirective microphone arrays," in Microphone Arrays, M Brandstein and D Ward, Eds., pp. 19-38, Springer, 2001.

8
- 67651154520
- K Kumatani, J McDonough, B Rauch, D Klakow, PN Garner, and W Li, "Beamforming with a maximum negentropy criterion," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 5, pp. 994-1008, 2009.

9
- 50449096811
- M Seltzer and R Stern, "Subband likelihood-maximizing beamforming for speech recognition in reverberant environments," IEEE Trans. Audio, Speech, Language Process., vol. 14, pp. 2109-2121, 2006.

10
- 85008590333
- T Hori, S Araki, T Yoshioka, M Fujimoto, S Watanabe, T Oba, A Ogawa, K Otsuka, D Mikami, K Kinoshita, T Nakatani, A Nakamura, and J Yamato, "Low-latency real-time meeting recognition and understanding using distant microphones and omni-directional camera," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 2, pp. 499-513, 2012.

11
- 84867195294
- M Wölfel, C Fügen, S Ikbal, and J McDonough, "Multi-source far-distance microphone selection and combination for automatic transcription of lectures," in Proc. ICSLP, 2006.

12
- 80051654520
- A Stolcke, "Making the most from multiple microphones in meeting recognition," in Proc. IEEE ICASSP, 2011.

13
- 84865729496
- D Marino and T Hain, "An analysis of automatic speech recognition with multiple microphones," in Proc. INTERSPEECH, 2011, pp. 1281-1284.

14
- 84924139705
- S Renals, H Bourlard, J Carletta, and A Popescu-Belis, Multimodal Signal Processing, Cambridge University Press, 2012.

15
- 84879854889
- Y Bengio, A Courville, and P Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798-1828, 2013.

16
- 85032751458
- G Hinton, L Deng, D Yu, GE Dahl, A-R Mohamed, N Jaitly, A Senior, V Vanhoucke, P Nguyen, TN Sainath, and B Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.

17
- 84858976070
- F Seide, G Li, X Chen, and D Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011.

18
- 85032750883
- K Kumatani, J McDonough, and B Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127-140, 2012.

19
- 0003573244
- H Bourlard and N Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic, 1994.

20
- 0028194709
- S Renals, N Morgan, H Bourlard, M Cohen, and H Franco, "Connectionist probability estimators in HMM speech recognition," IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 161-174, 1994.

21
- 33745805403
- G Hinton, S Osindero, and Y Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006. DOI: 10.1162/neco.2006.18.7.1527

22
- 34547548235
- F Grézl, M Karafiát, S Kontár, and J Černocký, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc. ICASSP, 2007, vol. 4, pp. IV-757-IV-760. DOI: 10.1109/ICASSP.2007.367023

23
- 85037535397
- JG Fiscus, J Ajot, N Radde, and C Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating automatic speech recognition systems during simultaneous speech," in Proc. LREC, 2006.

24
- 0032638856
- MJF Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, 1999.

25
- 85009231870
- A Adami, L Burget, S Dupont, H Garudadri, F Grezl, H Hermansky, P Jain, S Kajarekar, N Morgan, and S Sivadas, "Qualcomm-ICSI-OGI features for ASR," in Proc. ICSLP, 2002, pp. 21-24.

26
- 51449120120
- D Povey, D Kanevsky, B Kingsbury, B Ramabhadran, G Saon, and K Visweswariah, "Boosted MMI for model and feature-space discriminative training," in Proc. IEEE ICASSP, 2008, pp. 4057-4060.

27
- 84874276847
- D Povey, A Ghoshal, G Boulianne, L Burget, O Glembek, N Goel, M Hannemann, P Motlíček, Y Qian, P Schwarz, J Silovský, G Stemmer, and K Veselý, "The Kaldi speech recognition toolkit," in Proc. IEEE ASRU, 2011.

28
- 84873443879
- J Bergstra, O Breuleux, F Bastien, P Lamblin, R Pascanu, G Desjardins, J Turian, D Warde-Farley, and Y Bengio, "Theano: A CPU and GPU math expression compiler," in Proc. SciPy, 2010.

29
- 84055222005
- GE Dahl, D Yu, L Deng, and A Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 30-42, 2012.

30
- 84874278045
- P Swietojanski, A Ghoshal, and S Renals, "Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR," in Proc. IEEE Workshop on Spoken Language Technology, Miami, Florida, USA, Dec. 2012.

31
- 84906274730
- K Veselý, A Ghoshal, L Burget, and D Povey, "Sequence-discriminative training of deep neural networks," in Proc. INTERSPEECH, 2013.

32
- 84890492030
- M Seltzer, D Yu, and Y Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, 2013.

33
- 84890461500
- A Ghoshal, P Swietojanski, and S Renals, "Multilingual training of deep neural networks," in Proc. IEEE ICASSP, 2013, pp. 7319-7323.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.