2014, Pages 2189-2193

Towards speaker adaptive training of deep neural network acoustic models

Author keywords

Automatic speech recognition; Deep neural networks; Speaker adaptive training

Indexed keywords

SPEECH COMMUNICATION; VECTOR SPACES

EID: 84910031119; PISSN: 2308-457X; EISSN: 1990-9772; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper
Times cited: 87

References (32)
  • 1
    • 84055222005
    • Context dependent pre-trained deep neural networks for large vocabulary speech recognition
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
    • (2012) IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 30-42
    • Dahl, G.; Yu, D.; Deng, L.; Acero, A.
  • 2
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, pp. 24-29, 2011.
    • (2011) Proc. ASRU, pp. 24-29
    • Seide, F.; Li, G.; Chen, X.; Yu, D.
  • 3
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, pp. 75-98, 1998.
    • (1998) Computer Speech and Language, vol. 12, pp. 75-98
    • Gales, M.
  • 4
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
    • B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in Proc. Interspeech, pp. 526-529, 2010.
    • (2010) Proc. Interspeech, pp. 526-529
    • Li, B.; Sim, K.C.
  • 6
    • 84878606732
    • Hermitian-based hidden activation functions for adaptation of hybrid HMM/ANN models
    • S. M. Siniscalchi, J. Li, and C.-H. Lee, "Hermitian-based hidden activation functions for adaptation of hybrid HMM/ANN models," in Proc. Interspeech, pp. 526-529, 2012.
    • (2012) Proc. Interspeech, pp. 526-529
    • Siniscalchi, S.M.; Li, J.; Lee, C.-H.
  • 8
    • 84893691530
    • Speaker adaptation of neural network acoustic models using i-vectors
    • G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, pp. 55-59, 2013.
    • (2013) Proc. ASRU, pp. 55-59
    • Saon, G.; Soltau, H.; Nahamoo, D.; Picheny, M.
  • 9
    • 70450180849
    • Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
    • N. Dehak, R. Dehak, P. Kenny, N. Brummer, P. Ouellet, and P. Dumouchel, "Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification," in Proc. Interspeech, pp. 1559-1562, 2009.
    • (2009) Proc. Interspeech, pp. 1559-1562
    • Dehak, N.; Dehak, R.; Kenny, P.; Brummer, N.; Ouellet, P.; Dumouchel, P.
  • 12
    • 0030677475
    • Speaker adaptive training: A maximum likelihood approach to speaker normalization
    • T. Anastasakos, J. McDonough, and J. Makhoul, "Speaker adaptive training: A maximum likelihood approach to speaker normalization," in Proc. ICASSP, pp. 1043-1046, 1997.
    • (1997) Proc. ICASSP, pp. 1043-1046
    • Anastasakos, T.; McDonough, J.; Makhoul, J.
  • 13
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. ICASSP, pp. 7942-7946, 2013.
    • (2013) Proc. ICASSP, pp. 7942-7946
    • Abdel-Hamid, O.; Jiang, H.
  • 14
    • 84906225505
    • Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
    • O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Abdel-Hamid, O.; Jiang, H.
  • 16
    • 84858984756
    • iVector-based discriminative adaptation for automatic speech recognition
    • M. Karafiat, L. Burget, P. Matejka, O. Glembek, and J. Cernocky, "iVector-based discriminative adaptation for automatic speech recognition," in Proc. ASRU, pp. 152-157, 2011.
    • (2011) Proc. ASRU, pp. 152-157
    • Karafiat, M.; Burget, L.; Matejka, P.; Glembek, O.; Cernocky, J.
  • 18
    • 84890483211
    • Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation
    • Y. Miao, F. Metze, and A. Waibel, "Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation," in Proc. ICASSP, pp. 7927-7931, 2013.
    • (2013) Proc. ICASSP, pp. 7927-7931
    • Miao, Y.; Metze, F.; Waibel, A.
  • 19
    • 44949102463
    • Recent progress on the discriminative region-dependent transform for speech feature extraction
    • B. Zhang, S. Matsoukas, and R. Schwartz, "Recent progress on the discriminative region-dependent transform for speech feature extraction," in Proc. Interspeech, 2006.
    • (2006) Proc. Interspeech
    • Zhang, B.; Matsoukas, S.; Schwartz, R.
  • 20
    • 84867605836
    • Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
    • O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. ICASSP, pp. 4277-4280, 2012.
    • (2012) Proc. ICASSP, pp. 4277-4280
    • Abdel-Hamid, O.; Mohamed, A.; Jiang, H.; Penn, G.
  • 22
    • 84858953642
    • The Kaldi speech recognition toolkit
    • D. Povey, A. Ghoshal, et al., "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
    • (2011) Proc. ASRU
    • Povey, D.; Ghoshal, A.
  • 25
    • 79551480483
    • Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
    • P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
    • (2010) Journal of Machine Learning Research, vol. 11, pp. 3371-3408
    • Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.
  • 26
    • 84890482429
    • Extracting deep bottleneck features using stacked auto-encoders
    • J. Gehring, Y. Miao, F. Metze, and A. Waibel, "Extracting deep bottleneck features using stacked auto-encoders," in Proc. ICASSP, 2013.
    • (2013) Proc. ICASSP
    • Gehring, J.; Miao, Y.; Metze, F.; Waibel, A.
  • 28
    • 84893701756
    • Deep maxout networks for low-resource speech recognition
    • Y. Miao, F. Metze, and S. Rawat, "Deep maxout networks for low-resource speech recognition," in Proc. ASRU, pp. 398-403, 2013.
    • (2013) Proc. ASRU, pp. 398-403
    • Miao, Y.; Metze, F.; Rawat, S.
  • 29
    • 84906283232
    • Using conversational word bursts in spoken term detection
    • J. Chiu and A. Rudnicky, "Using conversational word bursts in spoken term detection," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Chiu, J.; Rudnicky, A.
  • 30
    • 84906273501
    • Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training
    • Y. Miao and F. Metze, "Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training," in Proc. Interspeech, pp. 2237-2241, 2013.
    • (2013) Proc. Interspeech, pp. 2237-2241
    • Miao, Y.; Metze, F.
  • 31
    • 84910068044
    • Distributed learning of multilingual DNN feature extractors using GPUs
    • to appear
    • Y. Miao, H. Zhang, and F. Metze, "Distributed learning of multilingual DNN feature extractors using GPUs," to appear in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Miao, Y.; Zhang, H.; Metze, F.
  • 32
    • 84910028405
    • Improving language-universal feature extraction with deep maxout and convolutional neural networks
    • to appear
    • Y. Miao and F. Metze, "Improving language-universal feature extraction with deep maxout and convolutional neural networks," to appear in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Miao, Y.; Metze, F.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.