SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

2014, Pages 5582-5586

Data Augmentation for deep neural network acoustic modeling

(3) Cui, Xiaodong (a); Goel, Vaibhava (a); Kingsbury, Brian (a)

(a) IBM T. J. Watson Research Center (United States)

Author keywords

automatic speech recognition; data augmentation; deep neural networks; stochastic feature mapping; vocal tract length perturbation

Indexed keywords

MAPPING; NEURAL NETWORKS; SPEECH RECOGNITION; STOCHASTIC SYSTEMS;

AUTOMATIC SPEECH RECOGNITION; DATA AUGMENTATION; DEEP NEURAL NETWORKS; STOCHASTIC FEATURES; VOCAL TRACT LENGTHS;

SIGNAL PROCESSING;

EID: 84905247925 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2014.6854671 Document Type: Conference Paper

Times cited: 66

References (17)

1
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

2
- 84945900998
- Best practices for convolutional neural networks applied to visual document analysis
- P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in International Conference on Document Analysis and Recognition (ICDAR), 2003, pp. 958-963.
- (2003) International Conference on Document Analysis and Recognition (ICDAR) , pp. 958-963
- Simard, P.Y.¹ Steinkraus, D.² Platt, J.C.³

3
- 84893681011
- Vocal tract length perturbation (VTLP) improves speech recognition
- N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves speech recognition," in International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing, 2013.
- (2013) International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing
- Jaitly, N.¹ Hinton, G.E.²

4
- 84876231242
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems (NIPS), 2012, pp. 1106-1114.
- (2012) Neural Information Processing Systems (NIPS) , pp. 1106-1114
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

5
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Interspeech, 2011, pp. 437-440.
- (2011) Interspeech , pp. 437-440
- Seide, F.¹ Li, G.² Yu, D.³

6
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Automatic Speech Recognition and Understanding Workshop (ASRU), 2011, pp. 24-29.
- (2011) Automatic Speech Recognition and Understanding Workshop (ASRU) , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

7
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- November
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," in IEEE Signal Processing Magazine, November 2012, pp. 82-97.
- (2012) IEEE Signal Processing Magazine , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

8
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Interspeech, 2012.
- (2012) Interspeech
- Kingsbury, B.¹ Sainath, T.N.² Soltau, H.³

9
- 84905249354
- http://www.iarpa.gov/Programs/ia/Babel/babel.html.

10
- 84890507010
- Developing speech recognition systems for corpus indexing under the IARPA Babel program
- J. Cui, X. Cui, B. Ramabhadran, J. Kim, B. Kingsbury, J. Mamou, L. Mangu, M. Picheny, T. N. Sainath, and A. Sethy, "Developing speech recognition systems for corpus indexing under the IARPA Babel program," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6753-6757.
- (2013) International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 6753-6757
- Cui, J.¹ Cui, X.² Ramabhadran, B.³ Kim, J.⁴ Kingsbury, B.⁵ Mamou, J.⁶ Mangu, L.⁷ Picheny, M.⁸ Sainath, T.N.⁹ Sethy, A.¹⁰

11
- 84890537373
- A high-performance Cantonese keyword search system
- B. Kingsbury, J. Cui, X. Cui, M. J. F. Gales, K. Knill, J. Mamou, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schluter, A. Sethy, and P. C. Woodland, "A high-performance Cantonese keyword search system," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 8277-8281.
- (2013) International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 8277-8281
- Kingsbury, B.¹ Cui, J.² Cui, X.³ Gales, M.J.F.⁴ Knill, K.⁵ Mamou, J.⁶ Mangu, L.⁷ Nolden, D.⁸ Picheny, M.⁹ Ramabhadran, B.¹⁰ Schluter, R.¹¹ Sethy, A.¹² Woodland, P.C.¹³

12
- 0031647824
- A frequency warping approach to speaker normalization
- L. Lee and R. Rose, "A frequency warping approach to speaker normalization," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 1, pp. 49-60, 1998.
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.1 , pp. 49-60
- Lee, L.¹ Rose, R.²

13
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 272-281, 1999.
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.3 , pp. 272-281
- Gales, M.J.F.¹

14
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, pp. 75-98, 1998.
- (1998) Computer Speech and Language , vol.12 , pp. 75-98
- Gales, M.J.F.¹

15
- 79951796005
- The IBM Attila speech recognition toolkit
- H. Soltau, G. Saon, and B. Kingsbury, "The IBM Attila speech recognition toolkit," in Spoken Language Technology Workshop (SLT), 2010, pp. 97-101.
- (2010) Spoken Language Technology Workshop (SLT) , pp. 97-101
- Soltau, H.¹ Saon, G.² Kingsbury, B.³

16
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
- (1995) Computer Speech and Language , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

17
- 84865265602
- Hidden Markov acoustic modeling with bootstrap and restructuring for low-resourced languages
- X. Cui, J. Xue, X. Chen, P. A. Olsen, P. L. Dognin, U. V. Chaudhari, J. R. Hershey, and B. Zhou, "Hidden Markov acoustic modeling with bootstrap and restructuring for low-resourced languages," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 8, pp. 2252-2264, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.8 , pp. 2252-2264
- Cui, X.¹ Xue, J.² Chen, X.³ Olsen, P.A.⁴ Dognin, P.L.⁵ Chaudhari, U.V.⁶ Hershey, J.R.⁷ Zhou, B.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.