Volume 21, Issue 9, 2014, Pages 1120-1124

Convolutional neural networks for distant speech recognition

Author keywords

AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings

Indexed keywords

CONVOLUTION; MICROPHONES; NEURAL NETWORKS; SPEECH RECOGNITION

EID: 84901999583     PISSN: 10709908     EISSN: None     Source Type: Journal    
DOI: 10.1109/LSP.2014.2325781     Document Type: Article
Times cited: 235

References (43)
  • 2
    • 80051654520
    • Making the most from multiple microphones in meeting recognition
    • A. Stolcke, "Making the most from multiple microphones in meeting recognition," in Proc. IEEE ICASSP, 2011
    • (2011) Proc. IEEE ICASSP
    • Stolcke, A.1
  • 3
    • 85032750883
    • Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors
    • K. Kumatani, J. McDonough, and B. Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127-140, 2012
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 127-140
    • Kumatani, K.1    McDonough, J.2    Raj, B.3
  • 8
    • 0029308753
    • Neural networks for statistical recognition of continuous speech
    • N. Morgan and H. Bourlard, "Neural networks for statistical recognition of continuous speech," in Proc. IEEE, 1995, vol. 83, no. 5, pp. 742-772
    • (1995) Proc. IEEE , vol.83 , Issue.5 , pp. 742-772
    • Morgan, N.1    Bourlard, H.2
  • 11
    • 84055222005
    • Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech Lang. Process., vol. 20, no. 1, pp. 30-42, 2012
    • (2012) IEEE Trans. Audio, Speech Lang. Process. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 12
    • 0033709098
    • Tandem connectionist feature extraction for conventional HMM systems
    • H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE ICASSP, 2000, pp. 1635-1638
    • (2000) Proc. IEEE ICASSP , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.P.W.2    Sharma, S.3
  • 13
    • 33745528628
    • Using MLP features in SRI's conversational speech recognition system
    • Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan, "Using MLP features in SRI's conversational speech recognition system," in Proc. Eurospeech, 2005
    • (2005) Proc. Eurospeech
    • Zhu, Q.1    Stolcke, A.2    Chen, B.Y.3    Morgan, N.4
  • 16
    • 84893704659
    • Hybrid acoustic models for distant and multichannel large vocabulary speech recognition
    • P. Swietojanski, A. Ghoshal, and S. Renals, "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition," in Proc. IEEE ASRU, Dec. 2013
    • (2013) Proc. IEEE ASRU
    • Swietojanski, P.1    Ghoshal, A.2    Renals, S.3
  • 17
    • 84874282188
    • Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
    • J. Li, D. Yu, J.-T. Huang, and Y. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. IEEE SLT, 2012, pp. 131-136
    • (2012) Proc. IEEE SLT , pp. 131-136
    • Li, J.1    Yu, D.2    Huang, J.-T.3    Gong, Y.4
  • 18
    • 0002263996
    • Convolutional networks for images, speech and time series
    • Y. LeCun and Y. Bengio, "Convolutional networks for images, speech and time series," in The Handbook of Brain Theory and Neural Networks. Cambridge, MA, USA: MIT Press, 1995, pp. 255-258
    • (1995) The Handbook of Brain Theory and Neural Networks , pp. 255-258
    • Lecun, Y.1    Bengio, Y.2
  • 19
    • 0032203257
    • Gradient-based learning applied to document recognition
    • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998
    • (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2278-2324
    • Lecun, Y.1    Bottou, L.2    Bengio, Y.3    Haffner, P.4
  • 21
    • 0025254722
    • A time-delay neural network architecture for isolated word recognition
    • K. J. Lang, A. H. Waibel, and G. E. Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Netw., vol. 3, no. 1, pp. 23-43, 1990
    • (1990) Neural Netw. , vol.3 , Issue.1 , pp. 23-43
    • Lang, K.J.1    Waibel, A.H.2    Hinton, G.E.3
  • 22
  • 24
    • 84906273908
    • Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
    • D. Palaz, R. Collobert, and M. Magimai-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," in Proc. Interspeech, 2013
    • (2013) Proc. Interspeech
    • Palaz, D.1    Collobert, R.2    Magimai-Doss, M.3
  • 25
    • 84867605836
    • Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
    • O. Abdel-Hamid, A.-R. Mohamed, J. Hui, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE ICASSP, 2012, pp. 4277-4280
    • (2012) Proc. IEEE ICASSP , pp. 4277-4280
    • Abdel-Hamid, O.1    Mohamed, A.-R.2    Hui, J.3    Penn, G.4
  • 28
    • 35948981862
    • Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus
    • J. Carletta, "Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus," Lang. Res. Eval. J., vol. 41, no. 2, pp. 181-190, 2007
    • (2007) Lang. Res. Eval. J. , vol.41 , Issue.2 , pp. 181-190
    • Carletta, J.1
  • 29
    • 0001595997
    • Neural network classifiers estimate Bayesian a posteriori probabilities
    • M. D. Richard and R. P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Comput., vol. 3, no. 4, pp. 461-483, 1991
    • (1991) Neural Comput. , vol.3 , Issue.4 , pp. 461-483
    • Richard, M.D.1    Lippmann, R.P.2
  • 30
    • 51249118803
    • Unsupervised learning of invariant feature hierarchies with applications to object recognition
    • M. A. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in IEEE CVPR, 2007
    • (2007) IEEE CVPR
    • Ranzato, M.A.1    Huang, F.J.2    Boureau, Y.-L.3    Lecun, Y.4
  • 31
    • 84902000293
    • "NumPy Reference," Mar. 2014 [Online]. Available: http://docs.scipy.org/doc/numpy/numpy-ref-1.8.1.pdf, [Online; accessed 27-March-2014]
    • (2014) NumPy Reference
  • 32
    • 84906214784
    • Exploring convolutional neural network structures and optimisation techniques for speech recognition
    • O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimisation techniques for speech recognition," in Proc. Interspeech, 2013, ISCA
    • (2013) Proc. Interspeech
    • Abdel-Hamid, O.1    Deng, L.2    Yu, D.3
  • 33
    • 0033316361
    • Hierarchical models of object recognition in cortex
    • DOI 10.1038/14819
    • M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neurosci., vol. 2, pp. 1019-1025, 1999 (Pubitemid 30599567)
    • (1999) Nature Neuroscience , vol.2 , Issue.11 , pp. 1019-1025
    • Riesenhuber, M.1    Poggio, T.2
  • 34
    • 84903707061
    • Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech
    • J. G. Fiscus, J. Ajot, N. Radde, and C. Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech," in Proc. LREC, 2006
    • (2006) Proc. LREC
    • Fiscus, J.G.1    Ajot, J.2    Radde, N.3    Laprun, C.4
  • 37
    • 85009224911
    • From switchboard to fisher: Telephone collection protocols, their uses and yields
    • C. Cieri, D. Miller, and K. Walker, "From switchboard to fisher: Telephone collection protocols, their uses and yields," in Proc. Eurospeech, 2003
    • (2003) Proc. Eurospeech
    • Cieri, C.1    Miller, D.2    Walker, K.3
  • 38
    • 0033329799
    • An empirical study of smoothing techniques for language modeling
    • S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Comput. Speech Lang., vol. 13, no. 4, pp. 359-393, 1999
    • (1999) Comput. Speech Lang. , vol.13 , Issue.4 , pp. 359-393
    • Chen, S.F.1    Goodman, J.2
  • 40
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, 1999
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.3 , pp. 272-281
    • Gales, M.1
  • 41
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • DOI 10.1162/neco.2006.18.7.1527
    • G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, pp. 1527-1554, 2006 (Pubitemid 44024729)
    • (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 43
    • 84863380535
    • Unsupervised feature learning for audio classification using convolutional deep belief networks
    • H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," Adv. Neural Inf. Process. Syst. 22, pp. 1096-1104, 2009.
    • (2009) Adv. Neural Inf. Process. Syst. , vol.22 , pp. 1096-1104
    • Lee, H.1    Pham, P.2    Largman, Y.3    Ng, A.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.