SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume: n/a, Issue: n/a, 2014, Pages 2346-2350

Phone classification by a hierarchy of invariant representation layers

Authors (5): Zhang, Chiyuan (a); Voinea, Stephen (a); Evangelopoulos, Georgios (a, b); Rosasco, Lorenzo (a, b); Poggio, Tomaso (a, b)

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Auditory cortex; Convolutional network; Invariance; Phonetic classification

Indexed keywords

CLASSIFICATION (OF INFORMATION); COMPLEX NETWORKS; FEATURE EXTRACTION; INVARIANCE; SPEECH; SPEECH COMMUNICATION; TELEPHONE SETS

AUDITORY CORTEX; CONVOLUTIONAL NETWORKS; EMPIRICAL COMPARISONS; INVARIANT REPRESENTATION; PHONE CLASSIFICATIONS; PHONETIC CLASSIFICATION; SAMPLE COMPLEXITY; VOCAL TRACT LENGTHS

SPEECH RECOGNITION

EID: 84910037127; PISSN: 2308-457X; EISSN: 1990-9772; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 5

References (38)

1
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug. 1980.
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, Issue 4, pp. 357-366
- Davis, S.; Mermelstein, P.

2
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," The Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
- (1990) The Journal of the Acoustical Society of America, vol. 87, Issue 4, pp. 1738-1752
- Hermansky, H.

3
- 0031187171
- Speech recognition by machines and humans
- R. P. Lippmann, "Speech recognition by machines and humans," Speech Communication, vol. 22, no. 1, pp. 1-15, Jul. 1997.
- (1997) Speech Communication, vol. 22, Issue 1, pp. 1-15
- Lippmann, R.P.

4
- 77949396605
- Comparing human and machine recognition performance on a VCV corpus
- O. Scharenborg and M. P. Cooke, "Comparing human and machine recognition performance on a VCV corpus," in ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery", Aalborg, Denmark, 2008.
- (2008) ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery"
- Scharenborg, O.; Cooke, M.P.

5
- 84897584256
- Phonetic feature encoding in human superior temporal gyrus
- N. Mesgarani, C. Cheung, K. Johnson, and E. F. Chang, "Phonetic feature encoding in human superior temporal gyrus," Science, vol. 343, no. 6174, pp. 1006-1010, Jan. 2014.
- (2014) Science, vol. 343, Issue 6174, pp. 1006-1010
- Mesgarani, N.; Cheung, C.; Johnson, K.; Chang, E.F.

6
- 82855178812
- Hierarchical representations in the auditory cortex
- T. O. Sharpee, C. A. Atencio, and C. E. Schreiner, "Hierarchical representations in the auditory cortex," Curr. Opin. Neurobiol., vol. 21, no. 5, pp. 761-767, Jun. 2011.
- (2011) Curr. Opin. Neurobiol., vol. 21, Issue 5, pp. 761-767
- Sharpee, T.O.; Atencio, C.A.; Schreiner, C.E.

7
- 85032751341
- Hearing is believing: Biologically inspired methods for robust automatic speech recognition
- R. Stern and N. Morgan, "Hearing is believing: Biologically inspired methods for robust automatic speech recognition," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 34-43, Nov. 2012.
- (2012) IEEE Signal Process. Mag., vol. 29, Issue 6, pp. 34-43
- Stern, R.; Morgan, N.

8
- 78049398611
- Sparse coding for speech recognition
- G. Sivaram, S. K. Nemala, M. Elhilali, T. D. Tran, and H. Hermansky, "Sparse coding for speech recognition," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 4346-4349.
- (2010) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4346-4349
- Sivaram, G.; Nemala, S.K.; Elhilali, M.; Tran, T.D.; Hermansky, H.

9
- 84878538214
- Are sparse representations rich enough for acoustic modeling?
- O. Vinyals and L. Deng, "Are sparse representations rich enough for acoustic modeling?" in Proc. INTERSPEECH 2012, 13th Annual Conf. of the ISCA, Portland, Oregon, 2012.
- (2012) Proc. INTERSPEECH 2012, 13th Annual Conf. of the ISCA
- Vinyals, O.; Deng, L.

10
- 84858975144
- A convex hull approach to sparse representations for exemplar-based speech recognition
- T. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran, and P. Shah, "A convex hull approach to sparse representations for exemplar-based speech recognition," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Dec. 2011.
- (2011) IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
- Sainath, T.; Nahamoo, D.; Kanevsky, D.; Ramabhadran, B.; Shah, P.

11
- 77952744810
- Sparse representations in audio and music: From coding to source separation
- M. Plumbley, T. Blumensath, L. Daudet, R. Gribonval, and M. Davies, "Sparse representations in audio and music: From coding to source separation," Proceedings of the IEEE, vol. 98, no. 6, pp. 995-1005, June 2010.
- (2010) Proceedings of the IEEE, vol. 98, Issue 6, pp. 995-1005
- Plumbley, M.; Blumensath, T.; Daudet, L.; Gribonval, R.; Davies, M.

12
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, 2000, pp. 1635-1638.
- (2000) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 1635-1638
- Hermansky, H.; Ellis, D.P.W.; Sharma, S.

13
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- F. Grezl, M. Karafiat, S. Kontar, and J. Cernocky, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, Apr. 2007, pp. 757-760.
- (2007) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 757-760
- Grezl, F.; Karafiat, M.; Kontar, S.; Cernocky, J.

14
- 84865785753
- Improved bottleneck features using pretrained deep neural networks
- D. Yu and M. L. Seltzer, "Improved bottleneck features using pretrained deep neural networks," in Proc. INTERSPEECH 2011, 12th Annual Conference of the ISCA, 2011, pp. 237-240.
- (2011) Proc. INTERSPEECH 2011, 12th Annual Conference of the ISCA, pp. 237-240
- Yu, D.; Seltzer, M.L.

15
- 84890482429
- Extracting deep bottleneck features using stacked auto-encoders
- J. Gehring, Y. Miao, F. Metze, and A. Waibel, "Extracting deep bottleneck features using stacked auto-encoders," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013, pp. 3377-3381.
- (2013) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3377-3381
- Gehring, J.; Miao, Y.; Metze, F.; Waibel, A.

16
- 84904482232
- Unsupervised learning of invariant representations in hierarchical architectures
- F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, and T. Poggio, "Unsupervised learning of invariant representations in hierarchical architectures," CoRR, vol. abs/1311.4158, 2013.
- (2013) CoRR, vol. abs/1311.4158
- Anselmi, F.; Leibo, J.Z.; Rosasco, L.; Mutch, J.; Tacchetti, A.; Poggio, T.

17
- 33645410496
- Receptive fields, binocular interaction and functional architecture in the cat's visual cortex
- D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," Journal of Physiology, vol. 160, no. 1, pp. 106-154, Jan. 1962.
- (1962) Journal of Physiology, vol. 160, Issue 1, pp. 106-154
- Hubel, D.H.; Wiesel, T.N.

18
- 0003548585
- DARPA TIMIT acoustic-phonetic continuous speech corpus
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, "DARPA TIMIT acoustic-phonetic continuous speech corpus," National Institute of Standards and Technology, 1990.
- (1990) National Institute of Standards and Technology
- Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S.; Dahlgren, N.L.; Zue, V.

19
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.
- (2012) IEEE Signal Processing Magazine, vol. 29, Issue 6, pp. 82-97
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B.

20
- 84055211743
- Acoustic modeling using deep belief networks
- A.-R. Mohamed, G. E. Dahl, and G. E. Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Transactions on Audio, Speech & Language Processing, vol. 20, Issue 1, pp. 14-22
- Mohamed, A.-R.; Dahl, G.E.; Hinton, G.E.

21
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks
- H. Lee, P. T. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems (NIPS) 22, 2009, pp. 1096-1104.
- (2009) Advances in Neural Information Processing Systems (NIPS), vol. 22, pp. 1096-1104
- Lee, H.; Pham, P.T.; Largman, Y.; Ng, A.Y.

22
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 4277-4280.
- (2012) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4277-4280
- Abdel-Hamid, O.; Mohamed, A.-R.; Jiang, H.; Penn, G.

23
- 84890543083
- Speech recognition with deep recurrent neural networks
- A. Graves, A.-R. Mohamed, and G. E. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6645-6649.
- (2013) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645-6649
- Graves, A.; Mohamed, A.-R.; Hinton, G.E.

24
- 84893701254
- Hybrid speech recognition with deep bidirectional LSTM
- A. Graves, N. Jaitly, and A.-R. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Dec. 2013, pp. 273-278.
- (2013) IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 273-278
- Graves, A.; Jaitly, N.; Mohamed, A.-R.

25
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011, pp. 24-29.
- (2011) IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 24-29
- Seide, F.; Li, G.; Chen, X.; Yu, D.

26
- 84905267489
- Deep scattering spectrum
- J. Anden and S. Mallat, "Deep scattering spectrum," 2013, IEEE Trans. Signal Processing (submitted). [Online]. Available: http://arxiv.org/abs/1304.6763.
- (2013) IEEE Trans. Signal Processing
- Anden, J.; Mallat, S.

27
- 84878539964
- Application of pretrained deep neural networks to large vocabulary speech recognition
- N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, "Application of pretrained deep neural networks to large vocabulary speech recognition," in Proc. INTERSPEECH 2012, 13th Annual Conf. of the ISCA, Portland, Oregon, 2012.
- (2012) Proc. INTERSPEECH 2012, 13th Annual Conf. of the ISCA
- Jaitly, N.; Nguyen, P.; Senior, A.; Vanhoucke, V.

28
- 84878919540
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS) 25, 2012, pp. 1106-1114.
- (2012) Advances in Neural Information Processing Systems (NIPS), vol. 25, pp. 1106-1114
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E.

29
- 84893681011
- Vocal Tract Length Perturbation (VTLP) improves speech recognition
- N. Jaitly and G. E. Hinton, "Vocal Tract Length Perturbation (VTLP) improves speech recognition," in Proc. ICML Workshop on Deep Learning for Audio, Speech and Language, 2013.
- (2013) Proc. ICML Workshop on Deep Learning for Audio, Speech and Language
- Jaitly, N.; Hinton, G.E.

30
- 84976447702
- A comparison of the data requirements of automatic speech recognition systems and human listeners
- R. K. Moore, "A comparison of the data requirements of automatic speech recognition systems and human listeners," in Proc. EUROSPEECH, 8th European Conf. on Speech Communication and Technology, Geneva, Switzerland, 2003, pp. 2582-2584.
- (2003) Proc. EUROSPEECH, 8th European Conf. on Speech Communication and Technology, pp. 2582-2584
- Moore, R.K.

31
- 80053971654
- Video-based descriptors for object recognition
- T. Lee and S. Soatto, "Video-based descriptors for object recognition," Image and Vision Computing, vol. 29, no. 10, pp. 639-652, Sep. 2011.
- (2011) Image and Vision Computing, vol. 29, Issue 10, pp. 639-652
- Lee, T.; Soatto, S.

32
- 80055110996
- Groups and Symmetries: From Finite Groups to Lie Groups
- Y. Kosmann-Schwarzbach, Groups and Symmetries: From Finite Groups to Lie Groups, ser. Universitext. Springer, 2010.
- (2010) Universitext, Springer
- Kosmann-Schwarzbach, Y.

33
- 84963012029
- Some theorems on distribution functions
- H. Cramer and H. Wold, "Some theorems on distribution functions," Journal of the London Mathematical Society, vol. 1-11, no. 4, pp. 290-294, Oct. 1936.
- (1936) Journal of the London Mathematical Society, vol. 1-11, Issue 4, pp. 290-294
- Cramer, H.; Wold, H.

34
- 0022014331
- Spatiotemporal energy models for the perception of motion
- E. Adelson and J. Bergen, "Spatiotemporal energy models for the perception of motion," Journal of the Optical Society of America A, vol. 2, no. 2, pp. 284-299, Feb. 1985.
- (1985) Journal of the Optical Society of America A, vol. 2, Issue 2, pp. 284-299
- Adelson, E.; Bergen, J.

35
- 0033316361
- Hierarchical models of object recognition
- M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition," Nature Neuroscience, vol. 2, no. 11, pp. 1019-1025, Nov. 2000.
- (2000) Nature Neuroscience, vol. 2, Issue 11, pp. 1019-1025
- Riesenhuber, M.; Poggio, T.

36
- 59849113779
- A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data
- R. E. Turner, T. C. Walters, J. J. M. Monaghan, and R. D. Patterson, "A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data," J. Acoust. Soc. Am., vol. 125, no. 4, pp. 2374-2386, Apr. 2009.
- (2009) J. Acoust. Soc. Am., vol. 125, Issue 4, pp. 2374-2386
- Turner, R.E.; Walters, T.C.; Monaghan, J.J.M.; Patterson, R.D.

37
- 0029725604
- A parametric approach to vocal tract length normalization
- E. Eide and H. Gish, "A parametric approach to vocal tract length normalization," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, May 1996, pp. 346-348.
- (1996) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 346-348
- Eide, E.; Gish, H.

38
- 0024768209
- Speaker-independent phone recognition using hidden Markov models
- K.-F. Lee and H.-W. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
- (1989) IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, Issue 11, pp. 1641-1648
- Lee, K.-F.; Hon, H.-W.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.