Volume 23, Issue 7, 2015, Pages 1172-1183

Multitask learning of deep neural networks for low-resource speech recognition

Author keywords

Deep neural network (DNN); low resource speech recognition; multitask learning; universal grapheme set; universal phone set

Indexed keywords

COMPUTATIONAL LINGUISTICS; LEARNING ALGORITHMS; LINEARIZATION; TELEPHONE SETS; LEARNING SYSTEMS

EID: 84929327770     PISSN: 15587916     EISSN: None     Source Type: Journal
DOI: 10.1109/TASLP.2015.2422573     Document Type: Article
Times cited: 105

References (58)
  • 1
    • 0039330830
    • Reversible letter-to-sound sound-to-letter generation based on parsing word morphology
    • S. Hunnicutt, H. M. Meng, S. Seneff, and V. W. Zue, "Reversible letter-to-sound sound-to-letter generation based on parsing word morphology," in Proc. Eurospeech, 1993, pp. 763-766.
    • (1993) Proc. Eurospeech , pp. 763-766
    • Hunnicutt, S.1    Meng, H.M.2    Seneff, S.3    Zue, V.W.4
  • 3
    • 0036295940
    • Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition
    • S. Kanthak and H. Ney, "Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition," in Proc. ICASSP, 2002, vol. 1, pp. 845-848.
    • (2002) Proc. ICASSP , vol.1 , pp. 845-848
    • Kanthak, S.1    Ney, H.2
  • 6
    • 84893697912
    • Eigentrigraphemes for under-resourced languages
    • T. Ko and B. Mak, "Eigentrigraphemes for under-resourced languages," Speech Commun., vol. 56, pp. 132-141, Jan. 2014.
    • (2014) Speech Commun. , vol.56 , pp. 132-141
    • Ko, T.1    Mak, B.2
  • 7
    • 0028996958
    • Four-level tied-structure for efficient representation of acoustic modeling
    • S. Takahashi and S. Sagayama, "Four-level tied-structure for efficient representation of acoustic modeling," in Proc. ICASSP, 1995, vol. 1, pp. 520-523.
    • (1995) Proc. ICASSP , vol.1 , pp. 520-523
    • Takahashi, S.1    Sagayama, S.2
  • 8
    • 0025419316
    • Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition
    • K. F. Lee, "Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 4, pp. 599-609, Apr. 1990.
    • (1990) IEEE Trans. Acoust., Speech, Signal Process , vol.38 , Issue.4 , pp. 599-609
    • Lee, K.F.1
  • 9
    • 85135369802
    • The use of state tying in continuous speech recognition
    • S. J. Young and P. C. Woodland, "The use of state tying in continuous speech recognition," in Proc. Eurospeech, 1993, vol. 3, pp. 2203-2206.
    • (1993) Proc. Eurospeech , vol.3 , pp. 2203-2206
    • Young, S.J.1    Woodland, P.C.2
  • 10
    • 0027683813
    • Shared-distribution hidden Markov model for speech recognition
    • M. Y. Hwang and X. D. Huang, "Shared-distribution hidden Markov model for speech recognition," IEEE Trans. Speech Audio Process., vol. 1, pp. 414-420, Jan. 1993.
    • (1993) IEEE Trans. Speech Audio Process , vol.1 , pp. 414-420
    • Hwang, M.Y.1    Huang, X.D.2
  • 11
    • 0035280044
    • Subspace distribution clustering hidden Markov model
    • E. Bocchieri and B. Mak, "Subspace distribution clustering hidden Markov model," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 264-275, Mar. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.3 , pp. 264-275
    • Bocchieri, E.1    Mak, B.2
  • 12
    • 0025629882
    • Tied mixture continuous parameter modeling for speech recognition
    • J. R. Bellegarda and D. Nahamoo, "Tied mixture continuous parameter modeling for speech recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 12, pp. 2033-2045, Dec. 1990.
    • (1990) IEEE Trans. Acoust., Speech, Signal Process , vol.38 , Issue.12 , pp. 2033-2045
    • Bellegarda, J.R.1    Nahamoo, D.2
  • 13
    • 0000250399
    • Semi-continuous hidden Markov models for speech signals
    • X. Huang and M. A. Jack, "Semi-continuous hidden Markov models for speech signals," Comput. Speech Lang., vol. 3, no. 3, pp. 239-251, Jul. 1989.
    • (1989) Comput. Speech Lang. , vol.3 , Issue.3 , pp. 239-251
    • Huang, X.1    Jack, M.A.2
  • 15
  • 16
    • 79959841827
    • Canonical state models for automatic speech recognition
    • M. J. F. Gales and K. Yu, "Canonical state models for automatic speech recognition," in Proc. Interspeech, 2010, pp. 58-61.
    • (2010) Proc. Interspeech , pp. 58-61
    • Gales, M.J.F.1    Yu, K.2
  • 17
    • 77956031473
    • A survey on transfer learning
    • S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345-1359, 2010.
    • (2010) IEEE Trans. Knowl. Data Eng. , vol.22 , Issue.10 , pp. 1345-1359
    • Pan, S.J.1    Yang, Q.2
  • 19
    • 69249139569
    • Automatic speech recognition for under-resourced languages: Application to Vietnamese language
    • V. Le and L. Besacier, "Automatic speech recognition for under-resourced languages: Application to Vietnamese language," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 8, pp. 1471-1482, Nov. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.8 , pp. 1471-1482
    • Le, V.1    Besacier, L.2
  • 20
    • 0002871277
    • Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds
    • J. Kohler, "Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds," in Proc. ICSLP, 1996.
    • (1996) Proc. ICSLP
    • Kohler, J.1
  • 21
    • 84890461500
    • Multilingual training of deep-neural networks
    • A. Ghoshal, P. Swietojanski, and S. Renals, "Multilingual training of deep-neural networks," in Proc. ICASSP, 2013, pp. 7319-7323.
    • (2013) Proc. ICASSP , pp. 7319-7323
    • Ghoshal, A.1    Swietojanski, P.2    Renals, S.3
  • 22
    • 84890527497
    • Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers
    • J.-T. Huang, J. Li, D. Yu, L. Deng, and Y. Gong, "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers," in Proc. ICASSP, 2013, pp. 7304-7308.
    • (2013) Proc. ICASSP , pp. 7304-7308
    • Huang, J.-T.1    Li, J.2    Yu, D.3    Deng, L.4    Gong, Y.5
  • 23
    • 0031619917
    • Language adaptation of multilingual phone models for vocabulary independent speech recognition tasks
    • J. Kohler, "Language adaptation of multilingual phone models for vocabulary independent speech recognition tasks," in Proc. ICASSP, 1998, vol. 1, pp. 417-420.
    • (1998) Proc. ICASSP , vol.1 , pp. 417-420
    • Kohler, J.1
  • 24
    • 84905248329
    • Adaptation of multilingual stacked bottle-neck neural network structure for new language
    • F. Grezl, M. Karafiat, and K. Vesely, "Adaptation of multilingual stacked bottle-neck neural network structure for new language," in Proc. ICASSP, 2014, pp. 7654-7658.
    • (2014) Proc. ICASSP , pp. 7654-7658
    • Grezl, F.1    Karafiat, M.2    Vesely, K.3
  • 25
    • 84877867684
    • Applying multi-and cross-lingual stochastic phone space transformations to non-native speech recognition
    • D. Imseng, H. Bourlard, J. Dines, P. Garner, and M. Magimai-Doss, "Applying multi-and cross-lingual stochastic phone space transformations to non-native speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 8, pp. 1713-1726, Aug. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.8 , pp. 1713-1726
    • Imseng, D.1    Bourlard, H.2    Dines, J.3    Garner, P.4    Magimai-Doss, M.5
  • 26
    • 84905223329
    • Multilingual deep neural network based acoustic modeling for rapid language adaptation
    • N. T. Vu, D. Imseng, D. Povey, P. Motlicek, T. Schultz, and H. Bourlard, "Multilingual deep neural network based acoustic modeling for rapid language adaptation," in Proc. ICASSP, 2014, pp. 7639-7643.
    • (2014) Proc. ICASSP , pp. 7639-7643
    • Vu, N.T.1    Imseng, D.2    Povey, D.3    Motlicek, P.4    Schultz, T.5    Bourlard, H.6
  • 27
    • 84905247925
    • Data augmentation for deep neural network acoustic modeling
    • X. Cui, V. Goel, and B. Kingsbury, "Data augmentation for deep neural network acoustic modeling," in Proc. ICASSP, 2014, pp. 5582-5586.
    • (2014) Proc. ICASSP , pp. 5582-5586
    • Cui, X.1    Goel, V.2    Kingsbury, B.3
  • 30
    • 70349220094
    • A study on multilingual acoustic modeling for large vocabulary ASR
    • H. Lin, L. Deng, D. Yu, Y.-F. Gong, A. Acero, and C.-H. Lee, "A study on multilingual acoustic modeling for large vocabulary ASR," in Proc. ICASSP, Apr. 2009, pp. 4333-4336.
    • (2009) Proc. ICASSP , pp. 4333-4336
    • Lin, H.1    Deng, L.2    Yu, D.3    Gong, Y.-F.4    Acero, A.5    Lee, C.-H.6
  • 31
    • 0031189914
    • Multitask learning
    • R. Caruana, "Multitask learning," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, USA, 1997.
    • (1997) Multitask Learning
    • Caruana, R.1
  • 33
    • 84905283791
    • Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition
    • D. Chen, B. Mak, C.-C. Leung, and S. Sivadas, "Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition," in Proc. ICASSP, 2014, pp. 5592-5596.
    • (2014) Proc. ICASSP , pp. 5592-5596
    • Chen, D.1    Mak, B.2    Leung, C.-C.3    Sivadas, S.4
  • 35
    • 14344277592
    • A model of inductive bias learning
    • J. Baxter, "A model of inductive bias learning," J. Artif. Intell. Res., vol. 12, pp. 149-198, 2000.
    • (2000) J. Artif. Intell. Res. , vol.12 , pp. 149-198
    • Baxter, J.1
  • 36
    • 9444270330
    • Exploiting task relatedness for multiple task learning
    • S. Ben-David and R. Schuller, "Exploiting task relatedness for multiple task learning," in Proc. COLT, 2003, pp. 567-580.
    • (2003) Proc. COLT , pp. 567-580
    • Ben-David, S.1    Schuller, R.2
  • 37
    • 56449095373
    • A unified architecture for natural language processing: Deep neural networks with multitask learning
    • R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proc. ICML, 2008, pp. 160-167, ACM.
    • (2008) Proc. ICML , pp. 160-167
    • Collobert, R.1    Weston, J.2
  • 38
    • 33947651202
    • Multitask learning for spoken language understanding
    • G. Tur, "Multitask learning for spoken language understanding," in Proc. ICASSP, 2006, pp. 585-588.
    • (2006) Proc. ICASSP , pp. 585-588
    • Tur, G.1
  • 39
    • 84897808098
    • Multi-task deep neural network for multi-label learning
    • Y. Huang, W. Wang, L. Wang, and T. Tan, "Multi-task deep neural network for multi-label learning," in Proc. ICIP, 2013, pp. 2897-2900.
    • (2013) Proc. ICIP , pp. 2897-2900
    • Huang, Y.1    Wang, W.2    Wang, L.3    Tan, T.4
  • 40
    • 85009167968
    • Multitask learning in connectionist ASR using recurrent neural networks
    • S. Parveen and P. D. Green, "Multitask learning in connectionist ASR using recurrent neural networks," in Proc. Eurospeech, 2003, pp. 1813-1816.
    • (2003) Proc. Eurospeech , pp. 1813-1816
    • Parveen, S.1    Green, P.D.2
  • 41
    • 84890545600
    • Multi-task learning in deep neural networks for improved phoneme recognition
    • M. Seltzer and J. Droppo, "Multi-task learning in deep neural networks for improved phoneme recognition," in Proc. ICASSP, 2013, pp. 6965-6968.
    • (2013) Proc. ICASSP , pp. 6965-6968
    • Seltzer, M.1    Droppo, J.2
  • 43
    • 0030638031
    • A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
    • J. G. Fiscus, "A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)," in Proc. IEEE ASRU, 1997, pp. 347-354.
    • (1997) Proc. IEEE ASRU , pp. 347-354
    • Fiscus, J.G.1
  • 45
    • 84929366263
    • Meraka-Institute, Lwazi ASR corpus
    • Meraka-Institute, Lwazi ASR corpus [Online]. Available: http://www.meraka.org.za/lwazi 2009
    • (2009)
  • 46
    • 84929366264
    • "Lwazi phone set," [Online]. Available: ftp://hlt.mirror.ac.za/Phoneset/Lwazi.Phoneset.1.2.pdf 2009
    • (2009) Lwazi Phone Set
  • 49
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
    • (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.3
  • 50
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 51
    • 0025477640
    • Speech database development at MIT: TIMIT and beyond
    • V. Zue, S. Seneff, and J. Glass, "Speech database development at MIT: TIMIT and beyond," Speech Commun., vol. 9, no. 4, pp. 351-356, Aug. 1990.
    • (1990) Speech Commun. , vol.9 , Issue.4 , pp. 351-356
    • Zue, V.1    Seneff, S.2    Glass, J.3
  • 53
    • 84978890015
    • Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training
    • Y. Miao and F. Metze, "Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training," in Proc. ICASSP, 2013, pp. 7304-7308.
    • (2013) Proc. ICASSP , pp. 7304-7308
    • Miao, Y.1    Metze, F.2
  • 54
    • 84867606668
    • Exploiting sparseness in deep neural networks for large vocabulary speech recognition
    • D. Yu, F. Seide, G. Li, and L. Deng, "Exploiting sparseness in deep neural networks for large vocabulary speech recognition," in Proc. ICASSP, 2012, pp. 4409-4412.
    • (2012) Proc. ICASSP , pp. 4409-4412
    • Yu, D.1    Seide, F.2    Li, G.3    Deng, L.4
  • 56
    • 77951179654
    • A deep nonlinear feature mapping for large-margin kNN classification
    • R. Min, Z. Yuan, D. A. Stanley, A. Bonner, and Z. Zhang, "A deep nonlinear feature mapping for large-margin kNN classification," in Proc. ICDM, 2009, pp. 357-366.
    • (2009) Proc. ICDM , pp. 357-366
    • Min, R.1    Yuan, Z.2    Stanley, D.A.3    Bonner, A.4    Zhang, Z.5
  • 57
    • 80053162594
    • A convex formulation for learning task relationships in multi-task learning
    • Y. Zhang and D.-Y. Yeung, "A convex formulation for learning task relationships in multi-task learning," in Proc. 26th Conf. UAI, Jul. 2010.
    • (2010) Proc. 26th Conf. UAI
    • Zhang, Y.1    Yeung, D.-Y.2
  • 58
    • 84867129866
    • Convex multitask learning with flexible task clusters
    • W. Zhong and J. Kwok, "Convex multitask learning with flexible task clusters," in Proc. ICML, 2012.
    • (2012) Proc. ICML
    • Zhong, W.1    Kwok, J.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.