



Volume 2015-August, 2015, Pages 4624-4628

Speech acoustic modeling from raw multichannel waveforms

Author keywords

acoustic modeling; Automatic speech recognition; beamforming; convolutional neural networks

Indexed keywords

ACOUSTIC NOISE; AUDIO SIGNAL PROCESSING; BEAMFORMING; CONVOLUTION; DEEP NEURAL NETWORKS; FILTER BANKS; NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION

EID: 84946030537     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2015.7178847     Document Type: Conference Paper
Times cited: 224

References (20)
  • 1
    • 84876231242
    • Imagenet classification with deep convolutional neural networks
    • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, "Imagenet classification with deep convolutional neural networks," in NIPS, 2012, pp. 1097-1105
    • (2012) NIPS , pp. 1097-1105
    • Krizhevsky, A.1    Sutskever, I.2    Hinton, G.E.3
  • 2
    • 84994264999
    • Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
    • Dimitri Palaz, Ronan Collobert, and Mathew Magimai Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," Interspeech, 2014
    • (2014) Interspeech
    • Palaz, D.1    Collobert, R.2    Magimai Doss, M.3
  • 3
    • 84910065702
    • Acoustic modeling with deep neural networks using raw time signal for LVCSR
    • Singapore, Sept
    • Zoltán Tüske, Pavel Golik, Ralf Schlüter, and Hermann Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Interspeech, Singapore, Sept. 2014
    • (2014) Interspeech
    • Tüske, Z.1    Golik, P.2    Schlüter, R.3    Ney, H.4
  • 5
    • 80051609011
    • Learning a better representation of speech soundwaves using restricted Boltzmann machines
    • Navdeep Jaitly and Geoffrey Hinton, "Learning a better representation of speech soundwaves using restricted Boltzmann machines," in ICASSP. IEEE, 2011, pp. 5884-5887
    • (2011) ICASSP. IEEE , pp. 5884-5887
    • Jaitly, N.1    Hinton, G.2
  • 6
    • 84893622444
    • The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech
    • Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Armin Sehr, Walter Kellermann, and Roland Maas, "The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech," in WASPAA. IEEE, 2013, pp. 1-4
    • (2013) WASPAA. IEEE , pp. 1-4
    • Kinoshita, K.1    Delcroix, M.2    Yoshioka, T.3    Nakatani, T.4    Sehr, A.5    Kellermann, W.6    Maas, R.7
  • 7
    • 84890541701
    • The second CHiME speech separation and recognition challenge: Datasets, tasks and baselines
    • Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, and Marco Matassoni, "The second CHiME speech separation and recognition challenge: Datasets, tasks and baselines," in ICASSP. IEEE, 2013, pp. 126-130
    • (2013) ICASSP. IEEE , pp. 126-130
    • Vincent, E.1    Barker, J.2    Watanabe, S.3    Le Roux, J.4    Nesta, F.5    Matassoni, M.6
  • 9
    • 80052067786
    • Reverberant speech segregation based on multipitch tracking and classification
    • Zhaozhang Jin and DeLiang Wang, "Reverberant speech segregation based on multipitch tracking and classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2328-2337, 2011
    • (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.8 , pp. 2328-2337
    • Jin, Z.1    Wang, D.2
  • 11
    • 84893688455
    • Learning filter banks within a deep neural network framework
    • Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, and Bhuvana Ramabhadran, "Learning filter banks within a deep neural network framework," in ASRU. IEEE, 2013, pp. 297-302
    • (2013) ASRU. IEEE , pp. 297-302
    • Sainath, T.N.1    Kingsbury, B.2    Mohamed, A.-R.3    Ramabhadran, B.4
  • 12
    • 0020596154
    • Cepstral analysis synthesis on the mel frequency scale
    • Satoshi Imai, "Cepstral analysis synthesis on the mel frequency scale," in ICASSP. IEEE, 1983, vol. 8, pp. 93-96
    • (1983) ICASSP. IEEE , vol.8 , pp. 93-96
    • Imai, S.1
  • 13
    • 84893704659
    • Hybrid acoustic models for distant and multichannel large vocabulary speech recognition
    • Pawel Swietojanski, Arnab Ghoshal, and Steve Renals, "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition," in ASRU. IEEE, 2013, pp. 285-290
    • (2013) ASRU. IEEE , pp. 285-290
    • Swietojanski, P.1    Ghoshal, A.2    Renals, S.3
  • 16
    • 34547539413
    • Gammatone features and feature combination for large vocabulary speech recognition
    • Ralf Schlüter, Ilja Bezrukov, Hermann Wagner, and Hermann Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in ICASSP. IEEE, 2007, vol. 4, pp. IV-649
    • (2007) ICASSP. IEEE , vol.4 , pp. IV-649
    • Schlüter, R.1    Bezrukov, I.2    Wagner, H.3    Ney, H.4
  • 20
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • John Duchi, Elad Hazan, and Yoram Singer, "Adaptive subgradient methods for online learning and stochastic optimization," The Journal of Machine Learning Research, vol. 12, pp. 2121-2159, 2011
    • (2011) The Journal of Machine Learning Research , vol.12 , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.