SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2013, Pages 1766-1770

Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks

(3) Palaz, Dimitri a,b Collobert, Ronan a Magimai Doss, Mathew a

a IDIAP RESEARCH INSTITUTE (Switzerland)

Author keywords

Artificial neural networks; Automatic speech recognition; Convolutional neural networks; Data driven feature extraction; Phonemes

Indexed keywords

FEATURE EXTRACTION; IMAGE PROCESSING; LEARNING SYSTEMS; SPEECH RECOGNITION;

AUTOMATIC SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION SYSTEM; CONDITIONAL PROBABILITIES; CONVOLUTIONAL NEURAL NETWORK; MACHINE LEARNING TECHNIQUES; MULTI LAYER PERCEPTRON; PHONEME CLASSIFICATION; PHONEMES;

NEURAL NETWORKS;

EID: 84906273908 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (111)

References (23)

1
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition, " Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

2
- 80053558787
- Natural language processing (almost) from scratch
- R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch, " The Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
- (2011) The Journal of Machine Learning Research , vol.12 , pp. 2493-2537
- Collobert, R.¹ Weston, J.² Bottou, L.³ Karlen, M.⁴ Kavukcuoglu, K.⁵ Kuksa, P.⁶

3
- 0024634603
- Phoneme recognition using time-delay neural networks
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks, " Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328-339, Mar. 1989.
- (1989) Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.⁵

4
- 0002291365
- Generalization and network design strategies
- Y. LeCun, "Generalization and network design strategies, " in Connectionism in Perspective, R. Pfeifer, Z. Schreter, F. Fogelman, and L. Steels, Eds. Zurich, Switzerland: Elsevier, 1989.
- (1989) Connectionism in Perspective
- Lecun, Y.¹

5
- 84985742249
- Linear predictive hidden Markov models and the speech signal
- A. Poritz, "Linear predictive hidden Markov models and the speech signal, " in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82., vol. 7, May 1982, pp. 1291-1294.
- (1982) Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82 , vol.7 , pp. 1291-1294
- Poritz, A.¹

6
- 13244265597
- Revisiting autoregressive hidden Markov modeling of speech signals
- Y. Ephraim and W. J. J. Roberts, "Revisiting autoregressive hidden Markov modeling of speech signals, " IEEE Signal Processing Letters, vol. 12, no. 2, pp. 166-169, Feb. 2005.
- (2005) IEEE Signal Processing Letters , vol.12 , Issue.2 , pp. 166-169
- Ephraim, Y.¹ Roberts, W.J.J.²

7
- 54349106040
- Switching linear dynamical systems for noise robust speech recognition
- B. Mesot and D. Barber, "Switching linear dynamical systems for noise robust speech recognition, " IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, pp. 1850-1858, Aug. 2008.
- (2008) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.6 , pp. 1850-1858
- Mesot, B.¹ Barber, D.²

8
- 0028195651
- Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization
- H. Sheikhzadeh and L. Deng, "Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization, " Speech and Audio Processing, IEEE Transactions on, vol. 2, no. 1, pp. 80-89, 1994.
- (1994) Speech and Audio Processing, IEEE Transactions on , vol.2 , Issue.1 , pp. 80-89
- Sheikhzadeh, H.¹ Deng, L.²

9
- 70450190485
- Tuning support vector machines for robust phoneme classification with acoustic waveforms
- J. Yousafzai, Z. Cvetkovic, and P. Sollich, "Tuning support vector machines for robust phoneme classification with acoustic waveforms, " in INTERSPEECH, 2009, pp. 2391-2394.
- (2009) Interspeech , pp. 2391-2394
- Yousafzai, J.¹ Cvetkovic, Z.² Sollich, P.³

10
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks, " Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 14-22, Jan. 2012.
- (2012) Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

11
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, " in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4277-4280.
- (2012) Acoustics, Speech and Signal Processing (ICASSP) , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.-R.² Jiang, H.³ Penn, G.⁴

12
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, " Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

13
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, " Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

14
- 5044231640
- Learning methods for generic object recognition with invariance to pose and lighting
- Y. LeCun, F. J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting, " in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. II-97.
- (2004) Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition , vol.2
- Lecun, Y.¹ Huang, F.J.² Bottou, L.³

15
- 84878919540
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks, " in Advances in Neural Information Processing Systems 25, 2012, pp. 1106-1114.
- (2012) Advances in Neural Information Processing Systems , vol.25 , pp. 1106-1114
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.³

16
- 0000583248
- Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition
- J. Bridle, "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition, " in Neuro-computing: Algorithms, Architectures and Applications, NATO ASI series ed., F. Fogelman Soulie and J. Herault, Eds., 1990, pp. 227-236.
- (1990) Neuro-computing: Algorithms, Architectures and Applications , pp. 227-236
- Bridle, J.¹

17
- 33847215211
- Stochastic gradient learning in neural networks
- L. Bottou, "Stochastic gradient learning in neural networks, " in Proceedings of Neuro-Nîmes 91. Nimes, France: EC2, 1991.
- (1991) Proceedings of Neuro-Nîmes 91
- Bottou, L.¹

18
- 0024768209
- Speaker-independent phone recognition using hidden Markov models
- K. F. Lee and H. W. Hon, "Speaker-independent phone recognition using hidden Markov models, " IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 11, pp. 1641-1648, 1989.
- (1989) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.37 , Issue.11 , pp. 1641-1648
- Lee, K.F.¹ Hon, H.W.²

19
- 0003822743
- The HTK book
- S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, "The HTK book, " Cambridge University Engineering Department, vol. 3, 2002.
- (2002) The HTK Book , vol.3
- Young, S.¹ Evermann, G.² Kershaw, D.³ Moore, G.⁴ Odell, J.⁵ Ollason, D.⁶ Valtchev, V.⁷ Woodland, P.⁸

20
- 0029306621
- Continuous speech recognition
- N. Morgan and H. Bourlard, "Continuous speech recognition, " Signal Processing Magazine, IEEE, vol. 12, no. 3, pp. 24-42, May 1995.
- (1995) Signal Processing Magazine , vol.12 , Issue.3 , pp. 24-42
- Morgan, N.¹ Bourlard, H.²

21
- 84888340666
- Torch7: A matlab-like environment for machine learning
- R. Collobert, K. Kavukcuoglu, and C. Farabet, "Torch7: A matlab-like environment for machine learning, " in BigLearn, NIPS Workshop, 2011.
- (2011) BigLearn, NIPS Workshop
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

22
- 70349212558
- Phoneme recognition using spectral envelope and modulation frequency features
- S. Thomas, S. Ganapathy, and H. Hermansky, "Phoneme recognition using spectral envelope and modulation frequency features, " in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, 2009, pp. 4453-4456.
- (2009) Acoustics, Speech and Signal Processing, 2009, ICASSP 2009 , pp. 4453-4456
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

23
- 0030648914
- Global training of document processing systems using graph transformer networks
- L. Bottou, Y. Bengio, and Y. LeCun, "Global training of document processing systems using graph transformer networks, " in Proc. of Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 490-494.
- (1997) Proc. of Computer Vision and Pattern Recognition. Puerto-Rico , pp. 490-494
- Bottou, L.¹ Bengio, Y.² Lecun, Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.