SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2017-August, 2017, Pages 2411-2415

Dynamic layer normalization for adaptive neural acoustic modeling in speech recognition

(3) Kim, Taesup (a); Song, Inchul (b); Bengio, Yoshua (a)

(a) Université de Montréal (Canada)

(b) Samsung Advanced Institute of Technology (SAIT) (South Korea)

Author keywords

Adaptive acoustic model; Dynamic layer normalization; Speech recognition

Indexed keywords

DEEP NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION;

ACOUSTIC MODEL; ACOUSTIC VARIABILITY; ADAPTATION PARAMETERS; BENCHMARK DATASETS; CHANNEL NOISE; LARGE VOCABULARY; MODEL SIZE; TRAINING SPEED;

SPEECH RECOGNITION;

EID: 85039151782
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2017-556
Document Type: Conference Paper

Times cited : (48)

References (29)

1
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, Vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

2
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., Vol. 9, no. 8, pp. 1735-1780, Nov. 1997. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735
- (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

3
- 33646258991
- A. Graves, S. Fernandez, and J. Schmidhuber, Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition, 2005, pp. 799-804.
- (2005) Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , pp. 799-804
- Graves, A.¹ Fernandez, S.² Schmidhuber, J.³

4
- 84910046405
- Long short-term memory recurrent neural network architectures for large scale acoustic modeling
- H. Sak, A. W. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." in Interspeech, 2014, pp. 338-342.
- (2014) Interspeech , pp. 338-342
- Sak, H.¹ Senior, A.W.² Beaufays, F.³

5
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors." in ASRU, 2013, pp. 55-59.
- (2013) ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

6
- 84905259145
- I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription
- V. Gupta, P. Kenny, P. Ouellet, and T. Stafylakis, "I-vector-based speaker adaptation of deep neural networks for french broadcast audio transcription," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 6334-6338.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On. IEEE , pp. 6334-6338
- Gupta, V.¹ Kenny, P.² Ouellet, P.³ Stafylakis, T.⁴

7
- 84973380342
- Speaker-aware training of lstm-rnns for acoustic modelling
- T. Tan, Y. Qian, D. Yu, S. Kundu, L. Lu, K. C. Sim, X. Xiao, and Y. Zhang, "Speaker-aware training of lstm-rnns for acoustic modelling," in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 5280-5284.
- (2016) Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference On. IEEE , pp. 5280-5284
- Tan, T.¹ Qian, Y.² Yu, D.³ Kundu, S.⁴ Lu, L.⁵ Sim, K.C.⁶ Xiao, X.⁷ Zhang, Y.⁸

8
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Trans. Audio, Speech & Language Processing, Vol. 19, no. 4, pp. 788-798, 2011. [Online]. Available: http://dx.doi.org/10.1109/TASL.2010.2064307
- (2011) IEEE Trans. Audio, Speech & Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

9
- 84890521103
- Speaker adaptation of context dependent deep neural networks
- H. Liao, "Speaker adaptation of context dependent deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7947-7951.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On. IEEE , pp. 7947-7951
- Liao, H.¹

10
- 84983119674
- Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
- P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Spoken Language Technology Workshop (SLT), 2014 IEEE. IEEE, 2014, pp. 171-176.
- (2014) Spoken Language Technology Workshop (SLT), 2014 IEEE , pp. 171-176
- Swietojanski, P.¹ Renals, S.²

11
- 84976435936
- Learning hidden unit contributions for unsupervised acoustic model adaptation
- P. Swietojanski, J. Li, and S. Renals, "Learning hidden unit contributions for unsupervised acoustic model adaptation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 8, pp. 1450-1463, 2016.
- (2016) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.8 , pp. 1450-1463
- Swietojanski, P.¹ Li, J.² Renals, S.³

12
- 84938688160
- Speaker adaptive training of deep neural network acoustic models using i-vectors
- Y. Miao, H. Zhang, and F. Metze, "Speaker adaptive training of deep neural network acoustic models using i-vectors," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Vol. 23, no. 11, pp. 1938-1949, 2015.
- (2015) IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) , vol.23 , Issue.11 , pp. 1938-1949
- Miao, Y.¹ Zhang, H.² Metze, F.³

13
- 85039174342
- Layer Normalization
- L. J. Ba, R. Kiros, and G. E. Hinton, "Layer normalization," CoRR, Vol. abs/1607.06450, 2016. [Online]. Available: http://arxiv.org/abs/1607.06450
- (2016) CoRR
- Ba, L.J.¹ Kiros, R.² Hinton, G.E.³

14
- 84990067826
- Texture networks: Feed-forward synthesis of textures and stylized images
- D. Ulyanov, V. Lebedev, A. Vedaldi, and V. S. Lempitsky, "Texture networks: Feed-forward synthesis of textures and stylized images," CoRR, Vol. abs/1603.03417, 2016. [Online]. Available: http://arxiv.org/abs/1603.03417
- (2016) CoRR
- Ulyanov, D.¹ Lebedev, V.² Vedaldi, A.³ Lempitsky, V.S.⁴

15
- 84990034290
- Perceptual losses for real-time style transfer and super-resolution
- J. Johnson, A. Alahi, and F. Li, "Perceptual losses for real-time style transfer and super-resolution," CoRR, Vol. abs/1603.08155, 2016. [Online]. Available: http://arxiv.org/abs/1603.08155
- (2016) CoRR
- Johnson, J.¹ Alahi, A.² Li, F.³

16
- 85039172195
- Instance Normalization: The missing ingredient for fast stylization
- D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, "Instance normalization: The missing ingredient for fast stylization," CoRR, Vol. abs/1607.08022, 2016. [Online]. Available: http://arxiv.org/abs/1607.08022
- (2016) CoRR
- Ulyanov, D.¹ Vedaldi, A.² Lempitsky, V.S.³

17
- 85028600965
- A learned representation for artistic style
- V. Dumoulin, J. Shlens, and M. Kudlur, "A learned representation for artistic style," CoRR, Vol. abs/1610.07629, 2016. [Online]. Available: http://arxiv.org/abs/1610.07629
- (2016) CoRR
- Dumoulin, V.¹ Shlens, J.² Kudlur, M.³

18
- 0012330750
- The design for the wall street journal-based csr corpus
- D. B. Paul and J. M. Baker, "The design for the wall street journal-based csr corpus," in Proceedings of the workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 357-362.
- (1992) Proceedings of the Workshop on Speech and Natural Language , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

19
- 85020205851
- Enhancing the tedlium corpus with selected data for language modeling and more ted talks
- A. Rousseau, P. Deléglise, and Y. Estève, "Enhancing the tedlium corpus with selected data for language modeling and more ted talks." in LREC, 2014, pp. 3935-3939.
- (2014) LREC , pp. 3935-3939
- Rousseau, A.¹ Deléglise, P.² Estève, Y.³

20
- 84969584486
- Batch Normalization: Accelerating deep network training by reducing internal covariate shift
- S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift." in ICML, ser. JMLR Workshop and Conference Proceedings, F. R. Bach and D. M. Blei, Eds., Vol. 37. JMLR.org, 2015, pp. 448-456.
- (2015) ICML, Ser. JMLR Workshop and Conference Proceedings , vol.37 , pp. 448-456
- Ioffe, S.¹ Szegedy, C.²

21
- 85030994434
- D. Ha, A. Dai, and Q. V. Le, "Hypernetworks," arXiv preprint arXiv:1609.09106, 2016.
- (2016) Hypernetworks
- Ha, D.¹ Dai, A.² Le, Q.V.³

22
- 0031268931
- Bidirectional recurrent neural networks
- M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, Vol. 45, no. 11, pp. 2673-2681, Nov 1997.
- (1997) IEEE Transactions on Signal Processing , vol.45 , Issue.11 , pp. 2673-2681
- Schuster, M.¹ Paliwal, K.K.²

23
- 84893701254
- Hybrid speech recognition with deep bidirectional lstm
- A. Graves, N. Jaitly, and A.-r. Mohamed, "Hybrid speech recognition with deep bidirectional lstm," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 273-278.
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop On. IEEE , pp. 273-278
- Graves, A.¹ Jaitly, N.² Mohamed, A.-R.³

24
- 85083951076
- Adam: A method for stochastic optimization
- D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, Vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
- (2014) CoRR
- Kingma, D.P.¹ Ba, J.²

25
- 84988375859
- M. Henaff, A. Szlam, and Y. LeCun, "Orthogonal rnns and long-memory tasks," arXiv preprint arXiv:1602.06662, 2016.
- (2016) Orthogonal Rnns and Long-memory Tasks
- Henaff, M.¹ Szlam, A.² LeCun, Y.³

26
- 84979557463
- Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions," arXiv eprints, Vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688
- (2016) Theano: A Python Framework for Fast Computation of Mathematical Expressions

27
- 84973384984
- S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, D. M. de Almeida, B. McFee, H. Weideman, G. Takács, P. de Rivaz, J. Crall, G. Sanders, K. Rasul, C. Liu, G. French, and J. Degrave, "Lasagne: First release." Aug. 2015. [Online]. Available: http://dx.doi.org/10.5281/zenodo.27878
- (2015) Lasagne: First Release
- Dieleman, S.¹ Schlüter, J.² Raffel, C.³ Olson, E.⁴ Sønderby, S.K.⁵ Nouri, D.⁶ Maturana, D.⁷ Thoma, M.⁸ Battenberg, E.⁹ Kelly, J.¹⁰ Fauw, J.D.¹¹ Heilman, M.¹² De Almeida, D.M.¹³ McFee, B.¹⁴ Weideman, H.¹⁵ Takács, G.¹⁶ De Rivaz, P.¹⁷ Crall, J.¹⁸ Sanders, G.¹⁹ Rasul, K.²⁰ more..

28
- 84893696682
- The kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., "The kaldi speech recognition toolkit," in IEEE 2011 workshop on automatic speech recognition and understanding, no. EPFLCONF-192584. IEEE Signal Processing Society, 2011.
- (2011) IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, No. EPFLCONF-192584
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰

29
- 57249084011
- Visualizing high-dimensional data using t-sne
- L. van der Maaten and G. E. Hinton, "Visualizing high-dimensional data using t-sne," Journal of Machine Learning Research, Vol. 9, pp. 2579-2605, 2008.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 2579-2605
- Van Der Maaten, L.¹ Hinton, G.E.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.