SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2015-August, 2015, Pages 4310-4314

Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data

Authors (4): Zhao, Yong (a); Li, Jinyu (a); Xue, Jian (a); Gong, Yifan (a)

(a) Microsoft (United States)

Author keywords

automatic speech recognition; deep neural network; low footprint; speaker adaptation

Indexed keywords

AUDIO SIGNAL PROCESSING; CHEMICAL ACTIVATION; DIGITAL STORAGE; LINEAR TRANSFORMATIONS; MATHEMATICAL TRANSFORMATIONS; SINGULAR VALUE DECOMPOSITION; SPEECH COMMUNICATION; SPEECH RECOGNITION;

ACCURACY IMPROVEMENT; ACTIVATION FUNCTIONS; ADAPTATION ALGORITHMS; ADAPTATION TECHNIQUES; AUTOMATIC SPEECH RECOGNITION; LOW FOOTPRINT; SIGMOID ACTIVATION FUNCTION; SPEAKER ADAPTATION;

DEEP NEURAL NETWORKS;

EID: 84946061232
PISSN: 15206149
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178784
Document Type: Conference Paper

Times cited: 32

References (27)

1
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, 2012
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

2
- 84878539964
- Application of pretrained deep neural networks to large vocabulary speech recognition
- N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, Application of pretrained deep neural networks to large vocabulary speech recognition, in Proc. Interspeech, 2012, pp. 2578-2581
- (2012) Proc. Interspeech , pp. 2578-2581
- Jaitly, N.¹ Nguyen, P.² Senior, A.³ Vanhoucke, V.⁴

3
- 84890491198
- Recent advances in deep learning for speech research at Microsoft
- L. Deng, J. Li, J.-T. Huang, et al., Recent advances in deep learning for speech research at Microsoft, in Proc. ICASSP, 2013, pp. 8604-8608
- (2013) Proc. ICASSP , pp. 8604-8608
- Deng, L.¹ Li, J.² Huang, J.-T.³

4
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, and A. Mohamed, Making deep belief networks effective for large vocabulary continuous speech recognition, in Proc. ASRU, 2011, pp. 30-35
- (2011) Proc. ASRU , pp. 30-35
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.⁶

5
- 84937880519
- Connectionist speaker normalization and adaptation
- V. Abrash, H. Franco, A. Sankar, and M. Cohen, Connectionist speaker normalization and adaptation, in Proc. Eurospeech, 1995, pp. 2183-2186
- (1995) Proc. Eurospeech , pp. 2183-2186
- Abrash, V.¹ Franco, H.² Sankar, A.³ Cohen, M.⁴

6
- 84937854847
- Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system
- J. Neto, L. Almeida, M. Hochberg, C. Martins, Lu. Nunes, S. Renals, and T. Robinson, Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system, in Proc. Eurospeech, 1995, pp. 2171-2174
- (1995) Proc. Eurospeech , pp. 2171-2174
- Neto, J.¹ Almeida, L.² Hochberg, M.³ Martins, C.⁴ Nunes, Lu.⁵ Renals, S.⁶ Robinson, T.⁷

7
- 79959849500
- Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
- B. Li and K. C. Sim, Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems, in Proc. Interspeech, 2010, pp. 526-529
- (2010) Proc. Interspeech , pp. 526-529
- Li, B.¹ Sim, K.C.²

8
- 84865740155
- Improving LVCSR system combination using neural network language model cross adaptation
- X. Liu, M. J. F. Gales, and P. C. Woodland, Improving LVCSR system combination using neural network language model cross adaptation, in Proc. Interspeech, 2011, pp. 2857-2860
- (2011) Proc. Interspeech , pp. 2857-2860
- Liu, X.¹ Gales, M.J.F.² Woodland, P.C.³

9
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, Feature engineering in context-dependent deep neural networks for conversational speech transcription, in Proc. ASRU, 2011, pp. 24-29
- (2011) Proc. ASRU , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

10
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, Jan. 1998
- (1998) Comput. Speech Lang , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

11
- 34548012893
- Linear hidden transformations for adaptation of hybrid ANN/HMM models
- R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. De Mori, Linear hidden transformations for adaptation of hybrid ANN/HMM models, Speech Commun., vol. 49, no. 10, pp. 827-835, 2007
- (2007) Speech Commun , vol.49 , Issue.10 , pp. 827-835
- Gemello, R.¹ Mana, F.² Scanzio, S.³ Laface, P.⁴ De Mori, R.⁵

12
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, Adaptation of context-dependent deep neural networks for automatic speech recognition, in Proc. SLT, 2012, pp. 366-369
- (2012) Proc. SLT , pp. 366-369
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

13
- 84905262902
- Factorized adaptation for deep neural network
- J. Li, J.-T. Huang, and Y. Gong, Factorized adaptation for deep neural network, in Proc. ICASSP, 2014, pp. 5574-5578
- (2014) Proc. ICASSP , pp. 5574-5578
- Li, J.¹ Huang, J.-T.² Gong, Y.³

14
- 84905229915
- Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network
- J. Xue, J. Li, D. Yu, M. Seltzer, and Y. Gong, Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network, in Proc. ICASSP, 2014, pp. 6409-6413
- (2014) Proc. ICASSP , pp. 6409-6413
- Xue, J.¹ Li, J.² Yu, D.³ Seltzer, M.⁴ Gong, Y.⁵

15
- 84878606732
- Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models
- S. M. Siniscalchi, J. Li, and C.-H. Lee, Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models, in Proc. INTERSPEECH, 2012, pp. 2590-2593
- (2012) Proc. INTERSPEECH , pp. 2590-2593
- Siniscalchi, S.M.¹ Li, J.² Lee, C.-H.³

16
- 84881054791
- Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
- S. M. Siniscalchi, J. Li, and C.-H. Lee, Hermitian polynomial for speaker adaptation of connectionist speech recognition systems, IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2152-2161, 2013
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2152-2161
- Siniscalchi, S.M.¹ Li, J.² Lee, C.-H.³

17
- 84983119674
- Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
- P. Swietojanski and S. Renals, Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models, in Proc. SLT, 2014, pp. 171-176
- (2014) Proc. SLT , pp. 171-176
- Swietojanski, P.¹ Renals, S.²

18
- 84906227589
- Restructuring of deep neural network acoustic models with singular value decomposition
- J. Xue, J. Li, and Y. Gong, Restructuring of deep neural network acoustic models with singular value decomposition, in Proc. Interspeech, 2013, pp. 2365-2369
- (2013) Proc. Interspeech , pp. 2365-2369
- Xue, J.¹ Li, J.² Gong, Y.³

19
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Comput. Speech Lang., vol. 9, pp. 171-186, 1995
- (1995) Comput. Speech Lang , vol.9 , pp. 171-186
- Leggetter, C.J.¹ Woodland, P.C.²

20
- 84921817164
- Learning representations by back-propagating errors
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, in Cognitive modeling. MIT Press, Cambridge, MA, 1988
- (1988) Cognitive Modeling
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

21
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition, in Proc. ICASSP, 2013, pp. 7893-7897
- (2013) Proc. ICASSP , pp. 7893-7897
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

22
- 4544253838
- Improving broadcast news transcription by lightly supervised discriminative training
- H. Y. Chan and P. C. Woodland, Improving broadcast news transcription by lightly supervised discriminative training, in Proc. ICASSP, 2004, pp. 737-740
- (2004) Proc. ICASSP , pp. 737-740
- Chan, H.Y.¹ Woodland, P.C.²

23
- 77950063604
- Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion
- D. Yu, B. Varadarajan, L. Deng, and A. Acero, Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion, Comput. Speech Lang., vol. 24, no. 3, pp. 433-444, 2010
- (2010) Comput. Speech Lang , vol.24 , Issue.3 , pp. 433-444
- Yu, D.¹ Varadarajan, B.² Deng, L.³ Acero, A.⁴

24
- 84893650076
- Semi-supervised training of deep neural networks
- K. Vesely, M. Hannemann, and L. Burget, Semi-supervised training of deep neural networks, in Proc. ASRU, 2013, pp. 267-272
- (2013) Proc. ASRU , pp. 267-272
- Vesely, K.¹ Hannemann, M.² Burget, L.³

25
- 84906218045
- Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration
- Y. Huang, D. Yu, Y. Gong, and C. Liu, Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration, in Proc. Interspeech, 2013, pp. 2360-2364
- (2013) Proc. Interspeech , pp. 2360-2364
- Huang, Y.¹ Yu, D.² Gong, Y.³ Liu, C.⁴

26
- 84905251985
- Training data selection based on context-dependent state matching
- O. Siohan, Training data selection based on context-dependent state matching, in Proc. ICASSP, 2014, pp. 3316-3319
- (2014) Proc. ICASSP , pp. 3316-3319
- Siohan, O.¹

27
- 84905259138
- Improving DNN speaker independence with i-vector inputs
- A. Senior and I. Lopez-Moreno, Improving DNN speaker independence with i-vector inputs, in Proc. ICASSP, 2014, pp. 225-229
- (2014) Proc. ICASSP , pp. 225-229
- Senior, A.¹ Lopez-Moreno, I.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.