2014, Pages 172-176

Neural networks for distant speech recognition

Author keywords

AMI corpus; beam forming; convolutional neural networks; distant speech recognition; ICSI corpus; maxout networks; meetings; rectifier unit

Indexed keywords

MARKOV PROCESSES; MICROPHONES; NEURAL NETWORKS; REVERBERATION; SIGNAL PROCESSING

EID: 84904512262     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/HSCMA.2014.6843274     Document Type: Conference Paper
Times cited: 43

References (50)
  • 2
    • 0025543907
    • Speech recognition in noisy environments with the aid of microphone arrays
    • D Van Compernolle, W Ma, F Xie, and M Van Diest, "Speech recognition in noisy environments with the aid of microphone arrays," Speech Commun., vol. 9, pp. 433-442, 1990.
    • (1990) Speech Commun. , vol.9 , pp. 433-442
    • Van Compernolle, D.1    Ma, W.2    Xie, F.3    Van Diest, M.4
  • 4
    • 0030676367
    • Microphone array based speech recognition with different talker-array positions
    • M Omologo, M Matassoni, P Svaizer, and D Giuliani, "Microphone array based speech recognition with different talker-array positions," in Proc IEEE ICASSP, 1997, pp. 227-230.
    • (1997) Proc IEEE ICASSP , pp. 227-230
    • Omologo, M.1    Matassoni, M.2    Svaizer, P.3    Giuliani, D.4
  • 5
    • 33846217002
    • The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments
    • M Lincoln, I McCowan, J Vepa, and HK Maganti, "The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments," in Proc IEEE ASRU, 2005.
    • (2005) Proc IEEE ASRU
    • Lincoln, M.1    McCowan, I.2    Vepa, J.3    Maganti, H.K.4
  • 6
    • 84890443591
    • Recognition of overlapping speech using digital MEMS microphone arrays
    • E Zwyssig, F Faubel, S Renals, and M Lincoln, "Recognition of overlapping speech using digital MEMS microphone arrays," in Proc IEEE ICASSP, 2013.
    • (2013) Proc IEEE ICASSP
    • Zwyssig, E.1    Faubel, F.2    Renals, S.3    Lincoln, M.4
  • 7
    • 85032751613
    • Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition
    • T Yoshioka, A Sehr, M Delcroix, K Kinoshita, R Maas, T Nakatani, and W Kellermann, "Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 114-126, 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 114-126
    • Yoshioka, T.1    Sehr, A.2    Delcroix, M.3    Kinoshita, K.4    Maas, R.5    Nakatani, T.6    Kellermann, W.7
  • 8
    • 85032750883
    • Microphone array processing for distant speech recognition: From close-talking microphones to farfield sensors
    • K Kumatani, J McDonough, and B Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to farfield sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127-140, 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 127-140
    • Kumatani, K.1    McDonough, J.2    Raj, B.3
  • 10
    • 35948981862
    • Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus
    • J Carletta, "Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus," Language Resources & Evaluation, vol. 41, pp. 181-190, 2007.
    • (2007) Language Resources & Evaluation , vol.41 , pp. 181-190
    • Carletta, J.1
  • 13
    • 0036296863
    • Minimum phone error and I-smoothing for improved discriminative training
    • D Povey and PC Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc IEEE ICASSP, 2002, pp. 105-108.
    • (2002) Proc IEEE ICASSP , pp. 105-108
    • Povey, D.1    Woodland, P.C.2
  • 15
    • 34547548235
    • Probabilistic and bottle-neck features for LVCSR of meetings
    • F Grézl, M Karafiát, S Kontár, and J Černocký, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc IEEE ICASSP, 2007, vol. 4, pp. IV-757-IV-760.
    • (2007) Proc IEEE ICASSP , vol.4
    • Grézl, F.1    Karafiát, M.2    Kontár, S.3    Černocký, J.4
  • 16
    • 50449092852
    • Bridging the gap: Towards a unified framework for handsfree speech recognition using microphone arrays
    • ML Seltzer, "Bridging the gap: Towards a unified framework for handsfree speech recognition using microphone arrays," in Proc HSCMA, 2008.
    • (2008) Proc HSCMA
    • Seltzer, M.L.1
  • 17
    • 4344607755
    • Likelihood-maximizing beamforming for robust hands-free speech recognition
    • M Seltzer, B Raj, and R Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition," IEEE Trans. Speech & Audio Process., vol. 12, pp. 489-498, 2004.
    • (2004) IEEE Trans. Speech & Audio Process. , vol.12 , pp. 489-498
    • Seltzer, M.1    Raj, B.2    Stern, R.3
  • 18
    • 50449096811
    • Subband likelihood-maximizing beamforming for speech recognition in reverberant environments
    • M Seltzer and R Stern, "Subband likelihood-maximizing beamforming for speech recognition in reverberant environments," IEEE Trans. Audio, Speech, & Lang. Process., vol. 14, pp. 2109-2121, 2006.
    • (2006) IEEE Trans. Audio, Speech, & Lang. Process. , vol.14 , pp. 2109-2121
    • Seltzer, M.1    Stern, R.2
  • 19
    • 84865729496
    • An analysis of automatic speech recognition with multiple microphones
    • D Marino and T Hain, "An analysis of automatic speech recognition with multiple microphones," in Proc Interspeech, 2011, pp. 1281-1284.
    • (2011) Proc Interspeech , pp. 1281-1284
    • Marino, D.1    Hain, T.2
  • 23
    • 0029308753
    • Neural networks for statistical recognition of continuous speech
    • N Morgan and H Bourlard, "Neural networks for statistical recognition of continuous speech," Proc IEEE, vol. 83, pp. 742-772, 1995.
    • (1995) Proc IEEE , vol.83 , pp. 742-772
    • Morgan, N.1    Bourlard, H.2
  • 26
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • GE Dahl, D Yu, L Deng, and A Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans Audio, Speech & Lang. Process., vol. 20, pp. 30-42, 2012.
    • (2012) IEEE Trans Audio, Speech & Lang Process , vol.20 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 27
    • 84893704659
    • Hybrid acoustic models for distant and multichannel large vocabulary speech recognition
    • P Swietojanski, A Ghoshal, and S Renals, "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition," in Proc IEEE ASRU, 2013.
    • (2013) Proc IEEE ASRU
    • Swietojanski, P.1    Ghoshal, A.2    Renals, S.3
  • 28
    • 84874282188
    • Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
    • J Li, D Yu, J-T Huang, and Y Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc IEEE SLT, 2012, pp. 131-136.
    • (2012) Proc IEEE SLT , pp. 131-136
    • Li, J.1    Yu, D.2    Huang, J.-T.3    Gong, Y.4
  • 30
    • 84893651518
    • Deep maxout neural networks for speech recognition
    • M Cai, Y Shi, and J Liu, "Deep maxout neural networks for speech recognition," in Proc ASRU, 2013.
    • (2013) Proc ASRU
    • Cai, M.1    Shi, Y.2    Liu, J.3
  • 31
    • 84893701756
    • Deep maxout networks for low-resource speech recognition
    • Y Miao, F Metze, and S Rawat, "Deep maxout networks for low-resource speech recognition," in Proc. IEEE ASRU, 2013.
    • (2013) Proc. IEEE ASRU
    • Miao, Y.1    Metze, F.2    Rawat, S.3
  • 34
    • 84867605836
    • Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
    • O Abdel-Hamid, A-R Mohamed, J Hui, and G Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc IEEE ICASSP, 2012, pp. 4277-4280.
    • (2012) Proc IEEE ICASSP , pp. 4277-4280
    • Abdel-Hamid, O.1    Mohamed, A.-R.2    Hui, J.3    Penn, G.4
  • 35
    • 0032203257
    • Gradient-based learning applied to document recognition
    • Y LeCun, L Bottou, Y Bengio, and P Haffner, "Gradient-based learning applied to document recognition," Proc IEEE, vol. 86, pp. 2278-2324, 1998.
    • (1998) Proc IEEE , vol.86 , pp. 2278-2324
    • LeCun, Y.1    Bottou, L.2    Bengio, Y.3    Haffner, P.4
  • 37
    • 0025254722
    • A time-delay neural network architecture for isolated word recognition
    • KJ Lang, AH Waibel, and GE Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Networks, vol. 3, pp. 23-43, 1990.
    • (1990) Neural Networks , vol.3 , pp. 23-43
    • Lang, K.J.1    Waibel, A.H.2    Hinton, G.E.3
  • 38
    • 84990059834
    • Rectified linear units improve restricted Boltzmann machines
    • V Nair and G Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc ICML, 2010, pp. 131-136.
    • (2010) Proc ICML , pp. 131-136
    • Nair, V.1    Hinton, G.2
  • 41
    • 84903707061
    • Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech
    • JG Fiscus, J Ajot, N Radde, and C Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech," in Proc LREC, 2006.
    • (2006) Proc LREC
    • Fiscus, J.G.1    Ajot, J.2    Radde, N.3    Laprun, C.4
  • 44
    • 79951563340
    • Understanding the difficulty of training deep feedforward neural networks
    • X Glorot and Y Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc AISTATS, 2010.
    • (2010) Proc AISTATS
    • Glorot, X.1    Bengio, Y.2
  • 45
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G Hinton, S Osindero, and Y Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
    • (2006) Neural Computation , vol.18 , pp. 1527-1554
    • Hinton, G.1    Osindero, S.2    Teh, Y.3
  • 46
    • 84863380535
    • Unsupervised feature learning for audio classification using convolutional deep belief networks
    • H Lee, P Pham, Y Largman, and A Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Proc NIPS 22, 2009, pp. 1096-1104.
    • (2009) Proc NIPS , vol.22 , pp. 1096-1104
    • Lee, H.1    Pham, P.2    Largman, Y.3    Ng, A.4
  • 49
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • MJF Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans Speech and Audio Process., vol. 7, pp. 272-281, 1999.
    • (1999) IEEE Trans Speech and Audio Process , vol.7 , pp. 272-281
    • Gales, M.J.F.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.