Volume 22, Issue 3, 2014, Pages 711-726

Theoretical analysis of diversity in an ensemble of automatic speech recognition systems

Author keywords

Ambiguity decomposition; Automatic speech recognition; Discriminative training; Diversity; Ensemble methods; ROVER; System combination

Indexed keywords

AMBIGUITY DECOMPOSITION; AUTOMATIC SPEECH RECOGNITION; DISCRIMINATIVE TRAINING; DIVERSITY; ENSEMBLE METHODS; ROVER; SYSTEM COMBINATION;

EID: 84898080333     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2303295     Document Type: Article
Times cited: 17

References (65)
  • 1
    • 84906226217
    • Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems
    • K. Audhkhasi, A. M. Zavou, P. G. Georgiou, and S. S. Narayanan, "Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems," in Proc. Interspeech, 2013.
    • Proc. Interspeech, 2013
    • Audhkhasi, K.1    Zavou, A.M.2    Georgiou, P.G.3    Narayanan, S.S.4
  • 8
    • 0030638031
    • A post processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
    • J. Fiscus, "A post processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)," in Proc. ASRU, 1997, pp. 347-354.
    • Proc. ASRU, 1997 , pp. 347-354
    • Fiscus, J.1
  • 9
    • 0032639912
    • Using boosting to improve a hybrid HMM/neural network speech recognizer
    • H. Schwenk, "Using boosting to improve a hybrid HMM/neural network speech recognizer," in Proc. IEEE ICASSP, 1999, vol. 2, pp. 1009-1012.
    • Proc. IEEE ICASSP, 1999 , vol.2 , pp. 1009-1012
    • Schwenk, H.1
  • 10
    • 0030351194
    • Boosting the performance of connectionist large vocabulary speech recognition
    • G. Cook and T. Robinson, "Boosting the performance of connectionist large vocabulary speech recognition," in Proc. ICSLP, 1996, vol. 3, pp. 1305-1308.
    • Proc. ICSLP, 1996 , vol.3 , pp. 1305-1308
    • Cook, G.1    Robinson, T.2
  • 11
    • 4544236424
    • Boosting HMMs with an application to speech recognition
    • C. Dimitrakakis and S. Bengio, "Boosting HMMs with an application to speech recognition," in Proc. ICASSP, 2004, vol. 5, pp. 618-621.
    • Proc. ICASSP, 2004 , vol.5 , pp. 618-621
    • Dimitrakakis, C.1    Bengio, S.2
  • 12
    • 33646818291
    • Constructing ensembles of ASR systems using randomized decision trees
    • O. Siohan, B. Ramabhadran, and B. Kingsbury, "Constructing ensembles of ASR systems using randomized decision trees," in Proc. ICASSP, 2005, vol. 1, pp. 197-200.
    • Proc. ICASSP, 2005 , vol.1 , pp. 197-200
    • Siohan, O.1    Ramabhadran, B.2    Kingsbury, B.3
  • 13
    • 51549086717
    • Random forests of phonetic decision trees for acoustic modeling in conversational speech recognition
    • Mar.
    • J. Xue and Y. Zhao, "Random forests of phonetic decision trees for acoustic modeling in conversational speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 519-528, Mar. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.3 , pp. 519-528
    • Xue, J.1    Zhao, Y.2
  • 14
    • 80055092534
    • Boosting systems for large vocabulary continuous speech recognition
    • G. Saon and H. Soltau, "Boosting systems for large vocabulary continuous speech recognition," Speech Commun., vol. 54, no. 2, pp. 212-218, 2012.
    • (2012) Speech Commun. , vol.54 , Issue.2 , pp. 212-218
    • Saon, G.1    Soltau, H.2
  • 15
    • 0030211964
    • Bagging predictors
    • L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, 1996.
    • (1996) Mach. Learn. , vol.24 , Issue.2 , pp. 123-140
    • Breiman, L.1
  • 16
    • 84983110889
    • A decision-theoretic generalization of on-line learning and an application to boosting
    • New York, NY, USA: Springer
    • Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in Computational learning theory. New York, NY, USA: Springer, 1995, pp. 23-37.
    • (1995) Computational Learning Theory , pp. 23-37
    • Freund, Y.1    Schapire, R.2
  • 17
    • 0035470889
    • Greedy function approximation: A gradient boosting machine
    • J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Ann. Statist., vol. 29, no. 5, pp. 1189-1232, 2001.
    • (2001) Ann. Statist. , vol.29 , Issue.5 , pp. 1189-1232
    • Friedman, J.H.1
  • 18
    • 0035478854
    • Random forests
    • L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5-32, 2001.
    • (2001) Mach. Learn. , vol.45 , Issue.1 , pp. 5-32
    • Breiman, L.1
  • 20
    • 84860878023
    • Multi-view and multi-objective semi-supervised learning for HMM-based automatic speech recognition
    • Sep.
    • X. Cui, J. Huang, and J.-T. Chien, "Multi-view and multi-objective semi-supervised learning for HMM-based automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 1923-1935, Sep. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.7 , pp. 1923-1935
    • Cui, X.1    Huang, J.2    Chien, J.-T.3
  • 21
    • 84872174281
    • Building acoustic model ensembles by data sampling with enhanced trainings and features
    • Mar.
    • X. Chen and Y. Zhao, "Building acoustic model ensembles by data sampling with enhanced trainings and features," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 3, pp. 498-507, Mar. 2013.
    • (2013) IEEE Trans. Audio, Speech, Language Process. , vol.21 , Issue.3 , pp. 498-507
    • Chen, X.1    Zhao, Y.2
  • 23
    • 80053403826
    • Ensemble methods in machine learning
    • T. Dietterich, "Ensemble methods in machine learning," Multiple Classifier Syst., pp. 1-15, 2000.
    • (2000) Multiple Classifier Syst. , pp. 1-15
    • Dietterich, T.1
  • 24
    • 85054435084
    • Neural network ensembles, cross validation, and active learning
    • A. Krogh and J. Vedelsby, "Neural network ensembles, cross validation, and active learning," Adv. Neural Inf. Process. Syst., pp. 231-238, 1995.
    • (1995) Adv. Neural Inf. Process. Syst. , pp. 231-238
    • Krogh, A.1    Vedelsby, J.2
  • 26
    • 0030085913
    • Analysis of decision boundaries in linearly combined neural classifiers
    • DOI 10.1016/0031-3203(95)00085-2
    • K. Tumer and J. Ghosh, "Analysis of decision boundaries in linearly combined neural classifiers," Pattern Recogn., vol. 29, no. 2, pp. 341-348, 1996. (Pubitemid 126397840)
    • (1996) Pattern Recognition , vol.29 , Issue.2 , pp. 341-348
    • Tumer, K.1    Ghosh, J.2
  • 27
    • 0033485370
    • Ensemble learning via negative correlation
    • Y. Liu and X. Yao, "Ensemble learning via negative correlation," Neural Netw., vol. 12, no. 10, pp. 1399-1404, 1999.
    • (1999) Neural Netw. , vol.12 , Issue.10 , pp. 1399-1404
    • Liu, Y.1    Yao, X.2
  • 28
    • 0001942829
    • Neural networks and the bias/variance dilemma
    • S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Comput., vol. 4, no. 1, pp. 1-58, 1992.
    • (1992) Neural Comput. , vol.4 , Issue.1 , pp. 1-58
    • Geman, S.1    Bienenstock, E.2    Doursat, R.3
  • 31
    • 77949367510
    • Self-supervised discriminative training of statistical language models
    • P. Xu, D. Karakos, and S. Khudanpur, "Self-supervised discriminative training of statistical language models," in Proc. ASRU, 2009, pp. 317-322.
    • Proc. ASRU, 2009 , pp. 317-322
    • Xu, P.1    Karakos, D.2    Khudanpur, S.3
  • 32
    • 0037403516
    • Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy
    • L. I. Kuncheva and C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy," Mach. Learn., vol. 51, no. 2, pp. 181-207, 2003.
    • (2003) Mach. Learn. , vol.51 , Issue.2 , pp. 181-207
    • Kuncheva, L.I.1    Whitaker, C.J.2
  • 33
    • 0036609602
    • Relationships between combination methods and measures of diversity in combining classifiers
    • C. A. Shipp and L. I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers," Inf. Fusion, vol. 3, no. 2, pp. 135-148, 2002.
    • (2002) Inf. Fusion , vol.3 , Issue.2 , pp. 135-148
    • Shipp, C.A.1    Kuncheva, L.I.2
  • 37
    • 0012330750
    • The design for the Wall Street Journal-based CSR corpus
    • Association for Computational Linguistics
    • D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Workshop Speech Natural Lang., 1992, pp. 357-362, Association for Computational Linguistics.
    • Proc. Workshop Speech Natural Lang., 1992 , pp. 357-362
    • Paul, D.B.1    Baker, J.M.2
  • 41
    • 15844411850
    • Confidence measures for speech recognition: A survey
    • DOI 10.1016/j.specom.2004.12.004, PII S0167639305000051
    • H. Jiang, "Confidence measures for speech recognition: A survey," Speech Commun., vol. 45, no. 4, pp. 455-470, 2005. (Pubitemid 40423290)
    • (2005) Speech Communication , vol.45 , Issue.4 , pp. 455-470
    • Jiang, H.1
  • 42
    • 0022594196
    • An introduction to hidden Markov models
    • Jan.
    • L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Mag., vol. 3, no. 1, pp. 4-16, Jan. 1986.
    • (1986) IEEE ASSP Mag. , vol.3 , Issue.1 , pp. 4-16
    • Rabiner, L.1    Juang, B.2
  • 44
    • 0035278951
    • Confidence measures for large vocabulary continuous speech recognition
    • DOI 10.1109/89.906002, PII S1063667601013281
    • F. Wessel, R. Schluter, K. Macherey, and H. Ney, "Confidence measures for large vocabulary continuous speech recognition," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 288-298, Mar. 2001. (Pubitemid 32286598)
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.3 , pp. 288-298
    • Wessel, F.1    Schluter, R.2    Macherey, K.3    Ney, H.4
  • 45
    • 34547538178
    • Maximum entropy confidence estimation for speech recognition
    • C. White, J. Droppo, A. Acero, and J. Odell, "Maximum entropy confidence estimation for speech recognition," in Proc. ICASSP, 2007, vol. 4, pp. 809-812.
    • Proc. ICASSP, 2007 , vol.4 , pp. 809-812
    • White, C.1    Droppo, J.2    Acero, A.3    Odell, J.4
  • 46
    • 84865793084
    • Combining information sources for confidence estimation with CRF models
    • M. S. Seigel and P. C. Woodland, "Combining information sources for confidence estimation with CRF models," in Proc. INTERSPEECH, 2011, pp. 905-908.
    • (2011) Proc. Interspeech , pp. 905-908
    • Seigel, M.S.1    Woodland, P.C.2
  • 49
    • 0142192295
    • Conditional random fields: Probabilistic models for segmenting and labeling sequence data
    • J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML-01, 2001, pp. 282-289.
    • Proc. ICML-01, 2001 , pp. 282-289
    • Lafferty, J.1    McCallum, A.2    Pereira, F.3
  • 50
    • 33646887390
    • On the limited memory BFGS method for large scale optimization
    • D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Math. Program., vol. 45, no. 1-3, pp. 503-528, 1989. (Pubitemid 20660315)
    • (1989) Mathematical Programming, Series B , vol.45 , Issue.3 , pp. 503-528
    • Liu, D.C.1    Nocedal, J.2
  • 51
    • 78650977476
    • Opensmile: The Munich versatile and fast open-source audio feature extractor
    • ACM
    • F. Eyben, M. Wöllmer, and B. Schuller, "Opensmile: The Munich versatile and fast open-source audio feature extractor," in Proc. Int. Conf. Multimedia, 2010, pp. 1459-1462, ACM.
    • Proc. Int. Conf. Multimedia, 2010 , pp. 1459-1462
    • Eyben, F.1    Wöllmer, M.2    Schuller, B.3
  • 53
    • 73649124909
    • Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates
    • S. Goldwater, D. Jurafsky, and C. D. Manning, "Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates," Speech Commun., vol. 52, no. 3, pp. 181-200, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.3 , pp. 181-200
    • Goldwater, S.1    Jurafsky, D.2    Manning, C.D.3
  • 54
    • 2942568545
    • Prosodic and other cues to speech recognition failures
    • J. Hirschberg, D. Litman, and M. Swerts, "Prosodic and other cues to speech recognition failures," Speech Commun., vol. 43, no. 1, pp. 155-175, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.1 , pp. 155-175
    • Hirschberg, J.1    Litman, D.2    Swerts, M.3
  • 55
    • 85009223733
    • Automatic disfluency identification in conversational speech using multiple knowledge sources
    • Y. Liu, E. Shriberg, and A. Stolcke, "Automatic disfluency identification in conversational speech using multiple knowledge sources," in Proc. Eurospeech, Geneva, Switzerland, 2003, vol. 1, pp. 957-960.
    • Proc. Eurospeech, Geneva, Switzerland, 2003 , vol.1 , pp. 957-960
    • Liu, Y.1    Shriberg, E.2    Stolcke, A.3
  • 57
    • 38149013231
    • An improved hierarchical speaker clustering
    • W. Wang, P. Lu, and Y. Yan, "An improved hierarchical speaker clustering," Acta Acoustica, 2006.
    • (2006) Acta Acoustica
    • Wang, W.1    Lu, P.2    Yan, Y.3
  • 58
    • 84891308106
    • SRILM - an extensible language modeling toolkit
    • A. Stolcke, "SRILM - an extensible language modeling toolkit," in Proc. ICSLP, 2002, pp. 901-904.
    • (2002) Proc. ICSLP , pp. 901-904
    • Stolcke, A.1
  • 61
    • 84893695671
    • Discriminative training of acoustic models for system combination
    • Y. Tachioka and S. Watanabe, "Discriminative training of acoustic models for system combination," in Proc. Interspeech, 2013.
    • Proc. Interspeech, 2013
    • Tachioka, Y.1    Watanabe, S.2
  • 62
    • 29444447644
    • Lattice segmentation and minimum Bayes risk discriminative training for large vocabulary continuous speech recognition
    • DOI 10.1016/j.specom.2005.07.002, PII S016763930500172X
    • V. Doumpiotis and W. Byrne, "Lattice segmentation and minimum Bayes risk discriminative training for large vocabulary continuous speech recognition," Speech Commun., vol. 48, no. 2, pp. 142-160, 2006. (Pubitemid 43012028)
    • (2006) Speech Communication , vol.48 , Issue.2 , pp. 142-160
    • Doumpiotis, V.1    Byrne, W.2
  • 63
    • 85032751713
    • Discriminative training for automatic speech recognition: Modeling, criteria, optimization, implementation, and performance
    • Nov.
    • G. Heigold, H. Ney, R. Schluter, and S. Wiesler, "Discriminative training for automatic speech recognition: Modeling, criteria, optimization, implementation, and performance," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 58-69, Nov. 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 58-69
    • Heigold, G.1    Ney, H.2    Schluter, R.3    Wiesler, S.4
  • 64
    • 0141480019
    • Discriminative MAP for acoustic model adaptation
    • D. Povey, P. C. Woodland, and M. J. F. Gales, "Discriminative MAP for acoustic model adaptation," in Proc. ICASSP, 2003, vol. 1, pp. 312-315.
    • (2003) Proc. ICASSP , vol.1 , pp. 312-315
    • Povey, D.1    Woodland, P.C.2    Gales, M.J.F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.