SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 3434-3438

Acoustic modelling from the signal domain using CNNs

(4) Ghahremani, Pegah (a); Manohar, Vimal (a); Povey, Daniel (a, b); Khudanpur, Sanjeev (a, b)

a Johns Hopkins University (United States)

b JOHNS HOPKINS UNIVERSITY (United States)

Author keywords

Network In Network nonlinearity; Raw waveform; Statistic extraction layer

Indexed keywords

SPEECH COMMUNICATION; SPEECH PROCESSING; SPEECH RECOGNITION;

ACOUSTIC MODELLING; ADAPTATION METHODS; FEATURE EXTRACTOR; IN NETWORKS; INTERMEDIATE LAYERS; OBJECTIVE FUNCTIONS; SPEECH RECOGNITION SYSTEMS; WAVE FORMS;

NETWORK ARCHITECTURE;

EID: 84994235770 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-1495 Document Type: Conference Paper

Times cited: (88)

References (25)

1
- 0038133939
- Distance measures for speech recognition, psychological and instrumental
- P. Mermelstein, "Distance measures for speech recognition, psychological and instrumental," Pattern recognition and artificial intelligence, vol. 116, pp. 374-388, 1976.
- (1976) Pattern Recognition and Artificial Intelligence , vol.116 , pp. 374-388
- Mermelstein, P.¹

2
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752, 1990.
- (1990) Journal of the Acoustical Society of America , vol.87 , pp. 1738-1752
- Hermansky, H.¹

3
- 84893688455
- Learning filter banks within a deep neural network framework
- T. N. Sainath, B. Kingsbury, A.-R. Mohamed, and B. Ramabhadran, "Learning filter banks within a deep neural network framework," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 297-302.
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop On. IEEE , pp. 297-302
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.-R.³ Ramabhadran, B.⁴

4
- 84910065702
- Acoustic modeling with deep neural networks using raw time signal for LVCSR
- Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Tüske, Z.¹ Golik, P.² Schlüter, R.³ Ney, H.⁴

5
- 84946030537
- Speech acoustic modeling from raw multichannel waveforms
- Y. Hoshen, R. J. Weiss, and K. W. Wilson, "Speech acoustic modeling from raw multichannel waveforms," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4624-4628.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference On. IEEE , pp. 4624-4628
- Hoshen, Y.¹ Weiss, R.J.² Wilson, K.W.³

6
- 84955059475
- Analysis of CNN-based speech recognition system using raw speech as input
- D. Palaz, R. Collobert et al., "Analysis of CNN-based Speech Recognition System using Raw Speech as Input," in Proceedings of Interspeech, no. EPFL-CONF-210029, 2015.
- (2015) Proceedings of Interspeech, No. EPFL-CONF-210029
- Palaz, D.¹ Collobert, R.²

7
- 84959168440
- Learning the speech front-end with raw waveform CLDNNs
- T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in Proc. Interspeech, 2015.
- (2015) Proc. Interspeech
- Sainath, T.N.¹ Weiss, R.J.² Senior, A.³ Wilson, K.W.⁴ Vinyals, O.⁵

8
- 0024634603
- Phoneme recognition using time-delay neural networks
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, "Phoneme recognition using time-delay neural networks," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, Mar 1989.
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.J.⁵

9
- 84994310412
- Purely sequence-trained neural networks for ASR based on lattice-free MMI
- D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar, X. Na, Y. Wang, and S. Khudanpur, "Purely sequence-trained neural networks for ASR based on lattice-free MMI," in Submitted to Interspeech, 2016. [Online]. Available: http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
- (2016) Submitted to Interspeech
- Povey, D.¹ Peddinti, V.² Galvez, D.³ Ghahremani, P.⁴ Manohar, V.⁵ Na, X.⁶ Wang, Y.⁷ Khudanpur, S.⁸

10
- 85016587886
- Switchboard: Telephone speech corpus for research and development
- J. J. Godfrey et al., "Switchboard: Telephone speech corpus for research and development," in ICASSP, 1992.
- (1992) ICASSP
- Godfrey, J.J.¹

11
- 84994264999
- Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
- D. Palaz, R. Collobert, and M. M. Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," arXiv preprint arXiv:1304.1018, 2013.
- (2013) arXiv preprint arXiv:1304.1018
- Palaz, D.¹ Collobert, R.² Doss, M.M.³

12
- 84959098603
- Architectures for deep neural network based acoustic models defined over windowed speech waveforms
- M. Bhargava and R. Rose, "Architectures for deep neural network based acoustic models defined over windowed speech waveforms," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
- (2015) Sixteenth Annual Conference of the International Speech Communication Association
- Bhargava, M.¹ Rose, R.²

13
- 84973310829
- Network in network
- M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
- (2013) arXiv preprint arXiv:1312.4400
- Lin, M.¹ Chen, Q.² Yan, S.³

14
- 0030263447
- Mean and variance adaptation within the MLLR framework
- M. J. F. Gales and P. C. Woodland, "Mean and Variance Adaptation Within the MLLR Framework," Computer Speech and Language, vol. 10, pp. 249-264, 1996.
- (1996) Computer Speech and Language , vol.10 , pp. 249-264
- Gales, M.J.F.¹ Woodland, P.C.²

15
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, no. 4, pp. 788-798, 2011.
- (2011) Audio, Speech, and Language Processing, IEEE Transactions on , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

16
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in ASRU, 2013, pp. 55-59.
- (2013) ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

17
- 84964483822
- JHU ASpIRE system: Robust LVCSR with TDNNs, ivector Adaptation, and RNN-LMs
- V. Peddinti, G. Chen, V. Manohar, T. Ko, D. Povey, and S. Khudanpur, "JHU ASpIRE system: Robust LVCSR with TDNNs, ivector Adaptation, and RNN-LMs," in ASRU, 2015.
- (2015) ASRU
- Peddinti, V.¹ Chen, G.² Manohar, V.³ Ko, T.⁴ Povey, D.⁵ Khudanpur, S.⁶

18
- 84959142471
- Robust i-vector based adaptation of DNN acoustic model for speech recognition
- S. Garimella, A. Mandal, N. Strom, B. Hoffmeister, S. Matsoukas, and S. H. K. Parthasarathi, "Robust i-vector based adaptation of DNN acoustic model for speech recognition," in Proceedings of Interspeech, 2015.
- (2015) Proceedings of Interspeech
- Garimella, S.¹ Mandal, A.² Strom, N.³ Hoffmeister, B.⁴ Matsoukas, S.⁵ Parthasarathi, S.H.K.⁶

19
- 84959118622
- Audio augmentation for speech recognition
- T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech recognition," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Ko, T.¹ Peddinti, V.² Povey, D.³ Khudanpur, S.⁴

20
- 84896734479
- Deep scattering spectrum
- J. Andén and S. Mallat, "Deep scattering spectrum," Signal Processing, IEEE Transactions on, vol. 62, no. 16, pp. 4114-4128, 2014.
- (2014) Signal Processing, IEEE Transactions on , vol.62 , Issue.16 , pp. 4114-4128
- Andén, J.¹ Mallat, S.²

21
- 84905239342
- Improving Deep Neural Network Acoustic Models using Generalized Maxout Networks
- X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving Deep Neural Network Acoustic Models using Generalized Maxout Networks," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, May 2014, pp. 215-219.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 215-219
- Zhang, X.¹ Trmal, J.² Povey, D.³ Khudanpur, S.⁴

22
- 84959110637
- Convolutional neural networks for acoustic modeling of raw time signal in LVCSR
- P. Golik, Z. Tüske, R. Schlüter, and H. Ney, "Convolutional Neural Networks for Acoustic Modeling of Raw Time Signal in LVCSR," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
- (2015) Sixteenth Annual Conference of the International Speech Communication Association
- Golik, P.¹ Tüske, Z.² Schlüter, R.³ Ney, H.⁴

23
- 84959115289
- A time delay neural network architecture for efficient modeling of long temporal contexts
- V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Povey, D.² Khudanpur, S.³

24
- 0012330750
- The design for the Wall Street Journal-based CSR corpus
- D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proceedings of the workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 357-362.
- (1992) Proceedings of the Workshop on Speech and Natural Language , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

25
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal et al., "The Kaldi Speech Recognition Toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.