SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 26-30

Convolutional neural networks for acoustic modeling of raw time signal in LVCSR

(4) Golik, Pavel (a); Tüske, Zoltán (a); Schlüter, Ralf (a); Ney, Hermann (a, b)

(a) RWTH Aachen University (Germany)

(b) Laboratoire Interdisciplinaire des Sciences du Numérique (France)

Author keywords

Acoustic modeling; Convolutional neural networks; Raw time signal

Indexed keywords

CONVOLUTION; EXTRACTION; FEATURE EXTRACTION; FILTER BANKS; NEURAL NETWORKS; SPEECH COMMUNICATION;

ACOUSTIC MODEL; ANALYSIS WINDOWS; AUTOMATIC SPEECH RECOGNITION; CONVOLUTIONAL NEURAL NETWORK; DEEP NEURAL NETWORKS; PERFORMANCE GAPS; TIME SIGNALS; TIME-FREQUENCY DECOMPOSITION;

SPEECH RECOGNITION;

EID: 84959110637 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 106

References (21)

1
- 84910065702
- Acoustic modeling with deep neural networks using raw time signal for LVCSR
- Singapore, Sep.
- Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proc. Interspeech, Singapore, Sep. 2014, pp. 890-894.
- (2014) Proc. Interspeech, pp. 890-894
- Tüske, Z.¹ Golik, P.² Schlüter, R.³ Ney, H.⁴

2
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Aug.
- S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug. 1980.
- (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, Issue 4, pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

3
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
- (1990) Journal of the Acoustical Society of America, vol. 87, Issue 4, pp. 1738-1752
- Hermansky, H.¹

4
- 34547539413
- Gammatone features and feature combination for large vocabulary speech recognition
- Honolulu, HI, USA, Apr.
- R. Schlüter, I. Bezrukov, H. Wagner, and H. Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Honolulu, HI, USA, Apr. 2007, pp. 649-652.
- (2007) Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 649-652
- Schlüter, R.¹ Bezrukov, I.² Wagner, H.³ Ney, H.⁴

5
- 33747649659
- Filter bank design based on discriminative feature extraction
- Adelaide, Australia, Apr.
- A. Biem and S. Katagiri, "Filter bank design based on discriminative feature extraction," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Adelaide, Australia, Apr. 1994, pp. 485-488.
- (1994) Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 485-488
- Biem, A.¹ Katagiri, S.²

6
- 84893688455
- Learning filter banks within a deep neural network framework
- Olomouc, Czech Republic, Dec.
- T. N. Sainath, B. Kingsbury, A.-r. Mohamed, and B. Ramabhadran, "Learning filter banks within a deep neural network framework," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic, Dec. 2013, pp. 297-302.
- (2013) Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 297-302
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.-R.³ Ramabhadran, B.⁴

7
- 84906273908
- Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
- Lyon, France, Aug.
- D. Palaz, R. Collobert, and M. Magimai.-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," in Proc. Interspeech, Lyon, France, Aug. 2013, pp. 1766-1770.
- (2013) Proc. Interspeech, pp. 1766-1770
- Palaz, D.¹ Collobert, R.² Magimai.-Doss, M.³

8
- 84946023646
- Convolutional neural networks-based continuous speech recognition using raw speech signal
- accepted for publication, Australia, Apr.
- D. Palaz, M. Magimai.-Doss, and R. Collobert, "Convolutional neural networks-based continuous speech recognition using raw speech signal," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Brisbane, Australia, Apr. 2015, accepted for publication.
- (2015) Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Brisbane
- Palaz, D.¹ Magimai.-Doss, M.² Collobert, R.³

9
- 0024634603
- Phoneme recognition using time-delay neural networks
- Mar.
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328-339, Mar. 1989.
- (1989) Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, Issue 3, pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.⁵

10
- 0000494467
- Handwritten digit recognition with a back-propagation network
- D. Touretzky, Ed. Denver, CO: Morgan Kaufmann
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems 2, D. Touretzky, Ed., vol. 2. Denver, CO: Morgan Kaufmann, 1990.
- (1990) Advances in Neural Information Processing Systems 2, vol. 2
- LeCun, Y.¹ Boser, B.² Denker, J.S.³ Henderson, D.⁴ Howard, R.E.⁵ Hubbard, W.⁶ Jackel, L.D.⁷

11
- 84876231242
- ImageNet classification with deep convolutional neural networks
- F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, Inc.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097-1105.
- (2012) Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

12
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- Kyoto, Japan, Mar.
- O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Kyoto, Japan, Mar. 2012, pp. 4277-4280.
- (2012) Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

13
- 84906257050
- Neural network acoustic models for the DARPA RATS program
- Lyon, France, Aug.
- H. Soltau, H. Kuo, L. Mangu, G. Saon, and T. Beran, "Neural network acoustic models for the DARPA RATS program," in Proc. Interspeech, Lyon, France, Aug. 2013, pp. 3092-3096.
- (2013) Proc. Interspeech, pp. 3092-3096
- Soltau, H.¹ Kuo, H.² Mangu, L.³ Saon, G.⁴ Beran, T.⁵

14
- 84893654379
- Improvements to deep convolutional neural networks for LVCSR
- Olomouc, Czech Republic, Dec.
- T. N. Sainath, B. Kingsbury, A. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, and B. Ramabhadran, "Improvements to deep convolutional neural networks for LVCSR," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Olomouc, Czech Republic, Dec. 2013, pp. 315-320.
- (2013) Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 315-320
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.³ Dahl, G.E.⁴ Saon, G.⁵ Soltau, H.⁶ Beran, T.⁷ Aravkin, A.Y.⁸ Ramabhadran, B.⁹

15
- 77956509090
- Rectified linear units improve restricted Boltzmann machines
- Haifa, Israel, Jun.
- V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. of the 27th Int. Conf. on Machine Learning, Haifa, Israel, Jun. 2010, pp. 807-814.
- (2010) Proc. of the 27th Int. Conf. on Machine Learning, pp. 807-814
- Nair, V.¹ Hinton, G.E.²

16
- 84959129131
- Accessed: 2015-03-27
- (2013) Quaero Programme. Accessed: 2015-03-27. [Online]. Available: http://www.quaero.org
- (2013) Quaero Programme

17
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- Honolulu, HI, USA, Dec.
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, HI, USA, Dec. 2011, pp. 24-29.
- (2011) Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

18
- 84878410921
- RASR-the RWTH Aachen University open source speech recognition toolkit
- Honolulu, HI, USA, Dec.
- D. Rybach, S. Hahn, P. Lehnen, D. Nolden, M. Sundermeyer, Z. Tüske, S. Wiesler, R. Schlüter, and H. Ney, "RASR-the RWTH Aachen University open source speech recognition toolkit," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, HI, USA, Dec. 2011.
- (2011) Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- Rybach, D.¹ Hahn, S.² Lehnen, P.³ Nolden, D.⁴ Sundermeyer, M.⁵ Tüske, Z.⁶ Wiesler, S.⁷ Schlüter, R.⁸ Ney, H.⁹

19
- 84905222840
- RASR/NN: The RWTH neural network toolkit for speech recognition
- Florence, Italy, May
- S. Wiesler, A. Richard, P. Golik, R. Schlüter, and H. Ney, "RASR/NN: The RWTH neural network toolkit for speech recognition," in IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy, May 2014, pp. 3313-3317.
- (2014) IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3313-3317
- Wiesler, S.¹ Richard, A.² Golik, P.³ Schlüter, R.⁴ Ney, H.⁵

20
- 33745213373
- Multi-resolution RASTA filtering for TANDEM-based ASR
- Lisbon, Portugal, Sep.
- H. Hermansky and P. Fousek, "Multi-resolution RASTA filtering for TANDEM-based ASR," in Proc. Interspeech, Lisbon, Portugal, Sep. 2005, pp. 361-364.
- (2005) Proc. Interspeech, pp. 361-364
- Hermansky, H.¹ Fousek, P.²

21
- 84910036228
- Robust CNN-based speech recognition with Gabor filter kernels
- Singapore, Sep.
- S. Chang and N. Morgan, "Robust CNN-based speech recognition with Gabor filter kernels," in Proc. Interspeech, Singapore, Sep. 2014, pp. 905-909.
- (2014) Proc. Interspeech, pp. 905-909
- Chang, S.¹ Morgan, N.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.