Volume , Issue , 2008, Pages 597-616

Towards Superhuman Speech Recognition

Author keywords

Defense Advanced Research Projects Agency; Language Model; Speech Recognition; Speech Recognition System; Word Error Rate

EID: 77949481364     PISSN: 25228692     EISSN: 25228706     Source Type: Book Series    
DOI: 10.1007/978-3-540-49127-9_30     Document Type: Chapter
Times cited : (3)

References (59)
  • 4
    • 0031187171
    • Speech recognition by machines and humans
    • R.P. Lippmann: Speech recognition by machines and humans, Speech Commun. 22(1), 1–15 (1997)
    • (1997) Speech Commun , vol.22 , Issue.1 , pp. 1-15
    • Lippmann, R.P.1
  • 5
    • 84964176674
    • The intelligibility of excerpts from conversation
    • I. Pollack, J.M. Pickett: The intelligibility of excerpts from conversation, Lang. Speech 6, 165–171 (1963)
    • (1963) Lang. Speech , vol.6 , pp. 165-171
    • Pollack, I.1    Pickett, J.M.2
  • 6
    • 0029770992
    • Improving wordspotting performance with artificially generated data
    • E. Chang, R. Lippmann: Improving wordspotting performance with artificially generated data, Proc. ICASSP, Vol. 1 (1996) pp. 526–529
    • (1996) Proc. ICASSP , vol.1 , pp. 526-529
    • Chang, E.1    Lippmann, R.2
  • 7
    • 0028516073
    • How do humans process and recognize speech?
    • J.B. Allen: How do humans process and recognize speech?, IEEE Trans. Speech Audio Process. 2(4), 567–577 (1994)
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 567-577
    • Allen, J.B.1
  • 8
    • 84944486544
    • Prediction and entropy of printed English
    • C.E. Shannon: Prediction and entropy of printed English, Bell Syst. Tech. J. 30, 50–64 (1950)
    • (1950) Bell Syst. Tech. J. , vol.30 , pp. 50-64
    • Shannon, C.E.1
  • 11
    • 85075914250
    • I. Nastajus: http://en.wikipedia.org/wiki/Naturally Speeking (2007)
    • (2007)
    • Nastajus, I.1
  • 14
    • 33646798740
    • The IBM 2004 conversational telephony system for rich transcription
    • H. Soltau, B. Kingsbury, L. Mangu, D. Povey, G. Saon, G. Zweig: The IBM 2004 conversational telephony system for rich transcription, Proc. ICASSP, Vol. 1 (2005) pp. 205–208
    • (2005) Proc. ICASSP , vol.1 , pp. 205-208
    • Soltau, H.1    Kingsbury, B.2    Mangu, L.3    Povey, D.4    Saon, G.5    Zweig, G.6
  • 16
    • 0033677121
    • Maximum likelihood discriminant feature spaces
    • G. Saon, M. Padmanabhan, R. Gopinath, S. Chen: Maximum likelihood discriminant feature spaces, Proc. ICASSP, Vol. 2 (2000) pp. 1129–1132
    • (2000) Proc. ICASSP , vol.2 , pp. 1129-1132
    • Saon, G.1    Padmanabhan, M.2    Gopinath, R.3    Chen, S.4
  • 17
    • 84892187452
    • Maximum likelihood modeling with Gaussian distributions for classification
    • R.A. Gopinath: Maximum likelihood modeling with Gaussian distributions for classification, Proc. ICASSP, Vol. 2 (1998) pp. 661–664
    • (1998) Proc. ICASSP , vol.2 , pp. 661-664
    • Gopinath, R.A.1
  • 20
    • 85009192356
    • An architecture for rapid decoding of large vocabulary conversational speech
    • G. Saon, G. Zweig, B. Kingsbury, L. Mangu, U. Chaudhari: An architecture for rapid decoding of large vocabulary conversational speech, Proc. Eurospeech, Vol. 3 (2003) pp. 1977–1981
    • (2003) Proc. Eurospeech , vol.3 , pp. 1977-1981
    • Saon, G.1    Zweig, G.2    Kingsbury, B.3    Mangu, L.4    Chaudhari, U.5
  • 21
    • 44949140997
    • Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices
    • S. Axelrod, V. Goel, B. Kingsbury, K. Visweswariah, R. Gopinath: Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices, Proc. Eurospeech, Vol. 3 (2003) pp. 1613–1616
    • (2003) Proc. Eurospeech , vol.3 , pp. 1613-1616
    • Axelrod, S.1    Goel, V.2    Kingsbury, B.3    Visweswariah, K.4    Gopinath, R.5
  • 22
    • 85009289957
    • Modeling with a subspace constraint on inverse covariance matrices
    • S. Axelrod, R.A. Gopinath, P. Olsen: Modeling with a subspace constraint on inverse covariance matrices, Proc. Int. Conf. Spoken Lang. Process., Vol. 2 (2002) pp. 2177–2180
    • (2002) Proc. Int. Conf. Spoken Lang. Process , vol.2 , pp. 2177-2180
    • Axelrod, S.1    Gopinath, R.A.2    Olsen, P.3
  • 23
    • 0029764708
    • Speaker normalization on conversational telephone speech
    • S. Wegmann, D. MacAllaster, J. Orloff, B. Peskin: Speaker normalization on conversational telephone speech, Proc. ICASSP, Vol. 1 (1996) pp. 339–342
    • (1996) Proc. ICASSP , vol.1 , pp. 339-342
    • Wegmann, S.1    Macallaster, D.2    Orloff, J.3    Peskin, B.4
  • 25
    • 84864010278
    • Speaker adaptation of continuous density HMMs using multivariate linear regression
    • C.J. Leggetter, P.C. Woodland: Speaker adaptation of continuous density HMMs using multivariate linear regression, Proc. Int. Conf. Spoken Lang. Process., Vol. I (1994) pp. 451–454
    • (1994) Proc. Int. Conf. Spoken Lang. Process , vol.1 , pp. 451-454
    • Leggetter, C.J.1    Woodland, P.C.2
  • 26
    • 0033329799
    • An empirical study of smoothing techniques for language modeling
    • S.F. Chen, J. Goodman: An empirical study of smoothing techniques for language modeling, Comput. Speech Lang. 13(4), 359–393 (1999)
    • (1999) Comput. Speech Lang , vol.13 , Issue.4 , pp. 359-393
    • Chen, S.F.1    Goodman, J.2
  • 27
    • 85079084846
    • Robust methods for using context-dependent features and models in a continuous speech recognizer
    • L.R. Bahl, P.V. deSouza, P.S. Gopalakrishnan, D. Nahamoo, M. Picheny: Robust methods for using context-dependent features and models in a continuous speech recognizer, Proc. ICASSP, Vol. 1 (1994) pp. 533–536
    • (1994) Proc. ICASSP , vol.1 , pp. 533-536
    • Bahl, L.R.1    Desouza, P.V.2    Gopalakrishnan, P.S.3    Nahamoo, D.4    Picheny, M.5
  • 29
    • 0034296009
    • Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
    • L. Mangu, E. Brill, A. Stolcke: Finding consensus in speech recognition: Word error minimization and other applications of confusion networks, Comput. Speech Lang. 14(4), 373–400 (2000)
    • (2000) Comput. Speech Lang , vol.14 , Issue.4 , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3
  • 30
    • 85009145345
    • Observations on overlap: Findings and implications for automatic processing of multi-party conversation
    • E. Shriberg, A. Stolcke, D. Baron: Observations on overlap: Findings and implications for automatic processing of multi-party conversation, Proc. Eurospeech, Vol. 2 (2001) pp. 1359–1362
    • (2001) Proc. Eurospeech , vol.2 , pp. 1359-1362
    • Shriberg, E.1    Stolcke, A.2    Baron, D.3
  • 31
    • 0036296863
    • Minimum phone error and I-smoothing for improved discriminative training
    • D. Povey, P. Woodland: Minimum phone error and I-smoothing for improved discriminative training, Proc. ICASSP, Vol. 1 (2002) pp. 105–108
    • (2002) Proc. ICASSP , vol.1 , pp. 105-108
    • Povey, D.1    Woodland, P.2
  • 32
    • 33646788786
    • FMPE: Discriminatively trained features for speech recognition
    • D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, G. Zweig: FMPE: Discriminatively trained features for speech recognition, Proc. ICASSP, Vol. 1 (2005) pp. 961–964
    • (2005) Proc. ICASSP , vol.1 , pp. 961-964
    • Povey, D.1    Kingsbury, B.2    Mangu, L.3    Saon, G.4    Soltau, H.5    Zweig, G.6
  • 33
    • 0036534754
    • Large-vocabulary speech recognition algorithms
    • M. Padmanabhan, M. Picheny: Large-vocabulary speech recognition algorithms, IEEE Comput. 35(4), 42–50 (2002)
    • (2002) IEEE Comput , vol.35 , Issue.4 , pp. 42-50
    • Padmanabhan, M.1    Picheny, M.2
  • 34
    • 85075909935
    • Google Desktop Developer Group: http://www.google.com/apis/ (2007)
    • (2007)
  • 35
    • 0032136330
    • Robust speech recognition using the modulation spectrogram
    • B.E.D. Kingsbury, N. Morgan, S. Greenberg: Robust speech recognition using the modulation spectrogram, Speech Commun. 25(1-3), 117–132 (1998)
    • (1998) Speech Commun , vol.25 , Issue.1-3 , pp. 117-132
    • Kingsbury, B.E.D.1    Morgan, N.2    Greenberg, S.3
  • 36
    • 0030245363
    • From HMMs to segment models: A unified view of stochastic modeling for speech recognition
    • M. Ostendorf, V.V. Digalakis, O.A. Kimball: From HMMs to segment models: A unified view of stochastic modeling for speech recognition, IEEE Trans. Speech Audio Process. 4(5), 360–378 (1996)
    • (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.5 , pp. 360-378
    • Ostendorf, M.1    Digalakis, V.2    Kimball, O.A.3
  • 38
    • 2442590895
    • Dependency modeling with Bayesian networks in a voicemail transcription system
    • G. Zweig, M. Padmanabhan: Dependency modeling with Bayesian networks in a voicemail transcription system, Proc. Eurospeech, Vol. 3 (1999) pp. 1335–1338
    • (1999) Proc. Eurospeech , vol.3 , pp. 1335-1338
    • Zweig, G.1    Padmanabhan, M.2
  • 39
    • 0032666052
    • Buried Markov models
    • J. Bilmes: Buried Markov models, Proc. ICASSP, Vol. 2 (1999) pp. 713–716
    • (1999) Proc. ICASSP , vol.2 , pp. 713-716
    • Bilmes, J.1
  • 41
    • 27144520907
    • Multi-rate and variable-rate modeling of speech at phone and syllable time scales
    • Ö. Çetin, M. Ostendorf: Multi-rate and variable-rate modeling of speech at phone and syllable time scales, Proc. Int. Conf. Acoust. Speech Signal Process., Vol. 1 (2005) pp. 665–668
    • (2005) Proc. Int. Conf. Acoust. Speech Signal Process , vol.1 , pp. 665-668
    • Çetin, Ö.1    Ostendorf, M.2
  • 42
    • 0035342414
    • Robust automatic speech recognition with missing and uncertain acoustic data
    • M.P. Cooke, P.D. Green, L.B. Josifovski, A. Vizinho: Robust automatic speech recognition with missing and uncertain acoustic data, Speech Commun. 34, 267–285 (2001)
    • (2001) Speech Commun , vol.34 , pp. 267-285
    • Cooke, M.P.1    Green, P.D.2    Josifovski, L.B.3    Vizinho, A.4
  • 43
    • 85009074785
    • A nonlinear unsupervised adaptation technique for speech recognition
    • S. Dharanipragada, M. Padmanabhan: A nonlinear unsupervised adaptation technique for speech recognition, Proc. Int. Conf. Spoken Lang. Process., Vol. IV (2000) pp. 556–559
    • (2000) Proc. Int. Conf. Spoken Lang. Process , vol.4 , pp. 556-559
    • Dharanipragada, S.1    Padmanabhan, M.2
  • 44
    • 84892157236
    • Non-parametric estimation and correction of non-linear distortion in speech systems
    • R. Balchandran, R. Mammone: Non-parametric estimation and correction of non-linear distortion in speech systems, Proc. ICASSP, Vol. 2 (1998) pp. 749–752
    • (1998) Proc. ICASSP , vol.2 , pp. 749-752
    • Balchandran, R.1    Mammone, R.2
  • 46
    • 0141590649
    • Word level confidence measurement using semantic features
    • R. Sarikaya, Y. Gao, M. Picheny: Word level confidence measurement using semantic features, Proc. ICASSP, Vol. 1 (2003) pp. 604–607
    • (2003) Proc. ICASSP , vol.1 , pp. 604-607
    • Sarikaya, R.1    Gao, Y.2    Picheny, M.3
  • 47
    • 0000274403
    • Exploiting latent semantic information in statistical language modeling
    • J. Bellegarda: Exploiting latent semantic information in statistical language modeling, Proc. IEEE 88(8), 1279–1296 (2000)
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1279-1296
    • Bellegarda, J.1
  • 48
    • 85135271453
    • Putting language into language modeling
    • F. Jelinek, C. Chelba: Putting language into language modeling, Proc. Eurospeech, Vol. 1 (1999) pp. KN–1–KN–4
    • (1999) Proc. Eurospeech , vol.1 , pp. KN-1
    • Jelinek, F.1    Chelba, C.2
  • 49
    • 85075949361
    • Semantic coherence scoring using an ontology
    • I. Gurevych, R. Malaka, R. Porzel, H.P. Zorn: Semantic coherence scoring using an ontology, Proc. HLT-NAACL (2003) pp. 88–95
    • (2003) Proc. HLT-NAACL , pp. 88-95
    • Gurevych, I.1    Malaka, R.2    Porzel, R.3    Zorn, H.P.4
  • 50
    • 0036289830
    • Direct models for phoneme recognition
    • A. Likhodedov, Y. Gao: Direct models for phoneme recognition, Proc. ICASSP, Vol. 1 (2002) pp. 89–92
    • (2002) Proc. ICASSP , vol.1 , pp. 89-92
    • Likhodedov, A.1    Gao, Y.2
  • 53
    • 33645759970
    • Lattice segmentation and support vector machines for large vocabulary continuous speech recognition
    • V. Venkataramani, W. Byrne: Lattice segmentation and support vector machines for large vocabulary continuous speech recognition, Proc. ICASSP, Vol. 1 (2005) pp. 817–820
    • (2005) Proc. ICASSP , vol.1 , pp. 817-820
    • Venkataramani, V.1    Byrne, W.2
  • 55
    • 0030638031
    • A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
    • J.G. Fiscus: A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER), Proc. IEEE Workshop Autom. Speech Recognition Understanding, Santa Barbara (1997) pp. 347–355
    • (1997) Proc. IEEE Workshop Autom. Speech Recognition Understanding, Santa Barbara , pp. 347-355
    • Fiscus, J.G.1
  • 56
    • 0002978642
    • Experiments with a new boosting algorithm
    • Y. Freund, R.E. Schapire: Experiments with a new boosting algorithm, Proc. ICML (1996) pp. 148–156
    • (1996) Proc. ICML , pp. 148-156
    • Freund, Y.1    Schapire, R.E.2
  • 57
    • 33646818291
    • Constructing ensembles of ASR systems using randomized decision trees
    • O. Siohan, B. Ramabhadran, B. Kingsbury: Constructing ensembles of ASR systems using randomized decision trees, Proc. ICASSP, Vol. 1 (2005) pp. 197–200
    • (2005) Proc. ICASSP , vol.1 , pp. 197-200
    • Siohan, O.1    Ramabhadran, B.2    Kingsbury, B.3
  • 58
    • 85075931534
    • IBM Research Communication Dept.: http://www.research.ibm.com/bluegene (2007)
    • (2007)
  • 59
    • 85075936662
    • IBM Research Communication Dept.: http://www.research.ibm.com/cell (2007)
    • (2007)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.