SCOPUS Information Retrieval Platform

Volume 2015-January, 2015, Pages 3586-3589

Audio augmentation for speech recognition

Author keywords

Data augmentation; Deep neural network; Speech recognition

Indexed keywords

SPEECH; SPEECH COMMUNICATION;

AUDIO SIGNAL; AUGMENTATION METHODS; COMMON STRATEGY; DATA AUGMENTATION; DEEP NEURAL NETWORKS; IMPLEMENTATION COST; ORIGINAL SIGNAL; TRAINING DATA;

SPEECH RECOGNITION;

EID: 84959118622
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 1345

References (17)

1
- 84928545733
- CoRR, abs/1412.5567
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng, "Deep speech: Scaling up end-to-end speech recognition," CoRR, abs/1412.5567, 2014.
- (2014) Deep Speech: Scaling Up End-to-end Speech Recognition
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷ Satheesh, S.⁸ Sengupta, S.⁹ Coates, A.¹⁰ Ng, A.Y.¹¹

2
- 77949375556
- Support vector machines for noise robust ASR
- M. J. F. Gales, A. Ragni, H. AlDamarki, and C. Gautier, "Support vector machines for noise robust ASR," in ASRU, 2009, pp. 205-210.
- (2009) ASRU, pp. 205-210
- Gales, M.J.F.¹ Ragni, A.² AlDamarki, H.³ Gautier, C.⁴

3
- 84893681011
- Vocal tract length perturbation (VTLP) improves speech recognition
- N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves speech recognition," in International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing, 2013.
- (2013) International Conference on Machine Learning (ICML) Workshop on Deep Learning for Audio, Speech, and Language Processing
- Jaitly, N.¹ Hinton, G.E.²

4
- 84905247925
- Data augmentation for deep neural network acoustic modeling
- X. Cui, V. Goel, and B. Kingsbury, "Data augmentation for deep neural network acoustic modeling," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 100-104.
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 100-104
- Cui, X.¹ Goel, V.² Kingsbury, B.³

5
- 84910031125
- Data augmentation for low resource languages
- A. Ragni, K. M. Knill, S. P. Rath, and M. J. F. Gales, "Data augmentation for low resource languages," in Interspeech, 2014.
- (2014) Interspeech
- Ragni, A.¹ Knill, K.M.² Rath, S.P.³ Gales, M.J.F.⁴

6
- 84893642825
- Elastic spectral distortion for low resource speech recognition with deep neural networks
- N. Kanda, R. Takeda, and Y. Obuchi, "Elastic spectral distortion for low resource speech recognition with deep neural networks," in ASRU, 2013.
- (2013) ASRU
- Kanda, N.¹ Takeda, R.² Obuchi, Y.³

7
- 84959115289
- A time delay neural network architecture for efficient modeling of long temporal contexts
- V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proceedings of INTERSPEECH, 2015.
- (2015) Proceedings of INTERSPEECH
- Peddinti, V.¹ Povey, D.² Khudanpur, S.³

9
- 84959085793
- accessed March 25, 2015
- SoX, audio manipulation tool, (accessed March 25, 2015). [Online]. Available: http://sox.sourceforge.net/
- Audio Manipulation Tool

10
- 84946076428
- TED-LIUM: An automatic speech recognition dedicated corpus
- A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: An automatic speech recognition dedicated corpus," in LREC, 2012, pp. 125-129.
- (2012) LREC, pp. 125-129
- Rousseau, A.¹ Deléglise, P.² Estève, Y.³

11
- 84946015916
- Librispeech: An ASR corpus based on public domain audio books
- V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.
- (2015) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE
- Panayotov, V.¹ Chen, G.² Povey, D.³ Khudanpur, S.⁴

13
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

14
- 84858984756
- IEEE, Dec.
- M. Karafiat, L. Burget, P. Matejka, O. Glembek, and J. Cernocky, in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, Dec., pp. 152-157.
- 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 152-157
- Karafiat, M.¹ Burget, L.² Matejka, P.³ Glembek, O.⁴ Cernocky, J.⁵

15
- 84944790942
- CoRR abs/1410.7455
- D. Povey, X. Zhang, and S. Khudanpur, "Parallel training of deep neural networks with natural gradient and parameter averaging," CoRR, vol. abs/1410.7455, 2014. [Online]. Available: http://arxiv.org/abs/1410.7455
- (2014) Parallel Training of Deep Neural Networks with Natural Gradient and Parameter Averaging
- Povey, D.¹ Zhang, X.² Khudanpur, S.³

17
- 84905252790
- A pitch extraction algorithm tuned for automatic speech recognition
- P. Ghahremani, B. BabaAli, D. Povey, K. Riedhammer, J. Trmal, and S. Khudanpur, "A pitch extraction algorithm tuned for automatic speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 2494-2498.
- (2014) Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 2494-2498
- Ghahremani, P.¹ BabaAli, B.² Povey, D.³ Riedhammer, K.⁴ Trmal, J.⁵ Khudanpur, S.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.