Volume 14, Issue 5, 2006, Pages 1729-1742

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Author keywords

Broadcast news (BN); Conversational telephone speech (CTS); Speech to text (STT)

Indexed keywords

ACOUSTIC FEATURES; BROADCAST NEWS (BN); CONVERSATIONAL TELEPHONE SPEECH (CTS); SPEECH TO TEXT (STT);

EID: 34047270914     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2006.879807     Document Type: Article
Times cited : (75)

References (53)
  • 2
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • Apr
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, pp. 1738-1752, Apr. 1990.
    • (1990) J. Acoust. Soc. Amer , vol.87 , pp. 1738-1752
    • Hermansky, H.1
  • 3
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of HMMs
    • C. Leggetter and P. Woodland, "Maximum likelihood linear regression for speaker adaptation of HMMs," Comput. Speech Lang., vol. 9, pp. 171-186, 1995.
    • (1995) Comput. Speech Lang , vol.9 , pp. 171-186
    • Leggetter, C.1    Woodland, P.2
  • 6
    • 0003871508
    • Investigation of silicon-auditory models and generalization of linear discriminant analysis for improved speech recognition
    • Ph.D. dissertation, Johns Hopkins Univ, Baltimore, MD
    • N. Kumar, "Investigation of silicon-auditory models and generalization of linear discriminant analysis for improved speech recognition," Ph.D. dissertation, Johns Hopkins Univ., Baltimore, MD, 1997.
    • (1997)
    • Kumar, N.1
  • 7
    • 0036475982
    • Maximum likelihood multiple subspace projections for hidden Markov models
    • Feb
    • M. J. Gales, "Maximum likelihood multiple subspace projections for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 10, no. 2, pp. 37-47, Feb. 2002.
    • (2002) IEEE Trans. Speech Audio Process , vol.10 , Issue.2 , pp. 37-47
    • Gales, M.J.1
  • 9
    • 0036296863
    • Minimum phone error and I-smoothing for improved discriminative training
    • Orlando, FL, May
    • D. Povey and P. C. Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc. IEEE Conf. Acoust., Speech, Signal Process., vol. 1, Orlando, FL, May 2002, pp. 105-108.
    • (2002) Proc. IEEE Conf. Acoust., Speech, Signal Process , vol.1 , pp. 105-108
    • Povey, D.1    Woodland, P.C.2
  • 10
    • 44949090835
    • Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures
    • M. Hearst and M. Ostendorf, Eds, Edmonton, AB, Canada, Mar
    • I. Bulyko, M. Ostendorf, and A. Stolcke, "Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures," in Proc. HLT-NAACL, Conf. North Amer. Chap. Assoc. Comput. Ling., vol. 2, M. Hearst and M. Ostendorf, Eds., Edmonton, AB, Canada, Mar. 2003, pp. 7-9.
    • (2003) Proc. HLT-NAACL, Conf. North Amer. Chap. Assoc. Comput. Ling , vol.2 , pp. 7-9
    • Bulyko, I.1    Ostendorf, M.2    Stolcke, A.3
  • 12
    • 0016067897
    • Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
    • B. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification," J. Acoust. Soc. Amer., vol. 55, pp. 1304-1312, 1974.
    • (1974) J. Acoust. Soc. Amer , vol.55 , pp. 1304-1312
    • Atal, B.1
  • 14
  • 15
    • 0023540097
    • Multilayer perceptrons and automatic speech recognition
    • San Diego, CA
    • H. Bourlard and C. Wellekens, "Multilayer perceptrons and automatic speech recognition," in Proc. 1st Int. Conf. Neural Netw., vol. IV, San Diego, CA, 1987, pp. 407-416.
    • (1987) Proc. 1st Int. Conf. Neural Netw , vol.4 , pp. 407-416
    • Bourlard, H.1    Wellekens, C.2
  • 16
    • 0141676589
    • New entropy based combination rules in HMM/ANN multi-stream ASR
    • Hong Kong, Apr
    • H. Misra, H. Bourlard, and V. Tyagi, "New entropy based combination rules in HMM/ANN multi-stream ASR," in Proc. IEEE Conf. Acoust., Speech, Signal Process., vol. 2, Hong Kong, Apr. 2003, pp. 741-744.
    • (2003) Proc. IEEE Conf. Acoust., Speech, Signal Process , vol.2 , pp. 741-744
    • Misra, H.1    Bourlard, H.2    Tyagi, V.3
  • 17
    • 34047245552
    • Learning discriminant narrow-band temporal patterns for automatic recognition of conversational telephone speech
    • Ph.D. dissertation, Univ. California, Berkeley
    • B. Y. Chen, "Learning discriminant narrow-band temporal patterns for automatic recognition of conversational telephone speech," Ph.D. dissertation, Univ. California, Berkeley, 2005.
    • (2005)
    • Chen, B.Y.1
  • 18
    • 33745185321
    • Using MLP features in SRI's conversational speech recognition system
    • Lisbon, Portugal, Sep
    • Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan, "Using MLP features in SRI's conversational speech recognition system," in Proc. 9th Eur. Conf. Speech Commun. Technol., Lisbon, Portugal, Sep. 2005, pp. 2141-2144.
    • (2005) Proc. 9th Eur. Conf. Speech Commun. Technol , pp. 2141-2144
    • Zhu, Q.1    Stolcke, A.2    Chen, B.Y.3    Morgan, N.4
  • 19
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • Apr
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 254-272, Apr. 1981.
    • (1981) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 21
    • 84946807902
    • Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system
    • V. R. R. Gadde, A. Stolcke, D. Vergyri, J. Zheng, K. Sonmez, and A. Venkataraman, "Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system," in Proc. Int. Conf. Spoken Lang. Process., vol. 3, J. H. L. Hansen and B. Pellom, Eds., Denver, CO, Sep. 2002, pp. 1577-1580.
  • 23
    • 0022890536
    • Maximum mutual information estimation of hidden Markov model parameters for speech recognition
    • Tokyo, Japan, Apr
    • L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, Tokyo, Japan, Apr. 1986, pp. 49-52.
    • (1986) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.1 , pp. 49-52
    • Bahl, L.R.1    Brown, P.F.2    de Souza, P.V.3    Mercer, R.L.4
  • 24
    • 0036461035
    • Large scale discriminative training of hidden Markov models for speech recognition
    • P. C. Woodland and D. Povey, "Large scale discriminative training of hidden Markov models for speech recognition," Comput. Speech Lang., vol. 16, pp. 25-47, 2002.
    • (2002) Comput. Speech Lang , vol.16 , pp. 25-47
    • Woodland, P.C.1    Povey, D.2
  • 26
    • 0348198473
    • Finite-state transducers in language and speech processing
    • M. Mohri, "Finite-state transducers in language and speech processing," Comput. Ling., vol. 23, pp. 269-311, 1997.
    • (1997) Comput. Ling , vol.23 , pp. 269-311
    • Mohri, M.1
  • 27
    • 85135253868
    • Efficient general lattice generation and rescoring
    • Budapest, Hungary, Sep
    • A. Ljolje, F. Pereira, and M. Riley, "Efficient general lattice generation and rescoring," in Proc. 6th Eur. Conf. Speech Commun. Technol., vol. 3, Budapest, Hungary, Sep. 1999, pp. 1251-1254.
    • (1999) Proc. 6th Eur. Conf. Speech Commun. Technol , vol.3 , pp. 1251-1254
    • Ljolje, A.1    Pereira, F.2    Riley, M.3
  • 28
    • 0034296009
    • Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
    • L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: Word error minimization and other applications of confusion networks," Comput. Speech Lang., vol. 14, no. 4, pp. 373-400, 2000.
    • (2000) Comput. Speech Lang , vol.14 , Issue.4 , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3
  • 29
    • 0141477960
    • Posterior probability decoding, confidence estimation, and system combination
    • College Park, MD, May
    • G. Evermann and P. Woodland, "Posterior probability decoding, confidence estimation, and system combination," in Proc. NIST Speech Transcription Workshop, College Park, MD, May 2000.
    • (2000) Proc. NIST Speech Transcription Workshop
    • Evermann, G.1    Woodland, P.2
  • 32
    • 34047268272
    • The generation and use of regression class trees for MLLR adaptation
    • M. J. Gales, "The generation and use of regression class trees for MLLR adaptation," Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR263, 1996.
  • 33
    • 4544358964
    • The SuperARV language model: Investigating the effectiveness of tightly integrating multiple knowledge sources
    • W. Wang and M. Harper, "The SuperARV language model: Investigating the effectiveness of tightly integrating multiple knowledge sources," in Proc. Conf. Empirical Methods Natural Language Process., 2002, pp. 238-247.
    • (2002) Proc. Conf. Empirical Methods Natural Language Process , pp. 238-247
    • Wang, W.1    Harper, M.2
  • 34
    • 85149132266
    • Structural disambiguation with constraints propagation
    • Pittsburgh, PA, Jun
    • H. Maruyama, "Structural disambiguation with constraints propagation," in Proc. 28th Annu. Meeting Assoc. Comput. Ling., Pittsburgh, PA, Jun. 1990, pp. 31-38.
    • (1990) Proc. 28th Annu. Meeting Assoc. Comput. Ling , pp. 31-38
    • Maruyama, H.1
  • 35
    • 34047258426
    • Statistical parsing and language modeling based on constraint dependency grammar
    • Ph.D. dissertation, Purdue Univ, West Lafayette, IN
    • W. Wang, "Statistical parsing and language modeling based on constraint dependency grammar," Ph.D. dissertation, Purdue Univ., West Lafayette, IN, 2003.
    • (2003)
    • Wang, W.1
  • 36
    • 0141480038
    • The robustness of an almost-parsing language model given errorful training data
    • Hong Kong, China, Apr
    • W. Wang, M. P. Harper, and A. Stolcke, "The robustness of an almost-parsing language model given errorful training data," in Proc. IEEE Conf. Acoust., Speech, Signal Process., vol. 1, Hong Kong, China, Apr. 2003, pp. 240-243.
    • (2003) Proc. IEEE Conf. Acoust., Speech, Signal Process , vol.1 , pp. 240-243
    • Wang, W.1    Harper, M.P.2    Stolcke, A.3
  • 37
    • 4544383109
    • The use of a linguistically motivated language model in conversational speech recognition
    • Montreal, QC, Canada, May
    • W. Wang, A. Stolcke, and M. P. Harper, "The use of a linguistically motivated language model in conversational speech recognition," in Proc. IEEE Conf. Acoust., Speech, Signal Process., vol. 1, Montreal, QC, Canada, May 2004, pp. 261-264.
    • (2004) Proc. IEEE Conf. Acoust., Speech, Signal Process , vol.1 , pp. 261-264
    • Wang, W.1    Stolcke, A.2    Harper, M.P.3
  • 38
    • 34047245727
    • An empirical study of smoothing techniques for language modeling
    • S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Computer Science Group, Harvard Univ., Cambridge, MA, Tech. Rep. TR-10-98, 1998.
  • 39
    • 85009223249
    • Techniques for effective vocabulary selection
    • Geneva, Switzerland, Sep
    • A. Venkataraman and W. Wang, "Techniques for effective vocabulary selection," in Proc. 8th Eur. Conf. Speech Commun. Technol., Geneva, Switzerland, Sep. 2003, pp. 245-248.
    • (2003) Proc. 8th Eur. Conf. Speech Commun. Technol , pp. 245-248
    • Venkataraman, A.1    Wang, W.2
  • 43
    • 0028996852
    • The 1994 HTK large vocabulary speech recognition system
    • Detroit, MI
    • P. Woodland, C. Leggetter, J. Odell, V. Valtchev, and S. Young, "The 1994 HTK large vocabulary speech recognition system," in Proc. ICASSP, Detroit, MI, 1995, pp. 73-76.
    • (1995) Proc. ICASSP , pp. 73-76
    • Woodland, P.1    Leggetter, C.2    Odell, J.3    Valtchev, V.4    Young, S.5
  • 45
    • 85093280076
    • Factored language models and generalized parallel backoff
    • J. Bilmes and K. Kirchhoff, "Factored language models and generalized parallel backoff," in Proc. HLT/NAACL, 2003, pp. 4-6.
    • (2003) Proc. HLT/NAACL , pp. 4-6
    • Bilmes, J.1    Kirchhoff, K.2
  • 47
    • 85009110467
    • Morphology-based language modeling for Arabic speech recognition
    • D. Vergyri, K. Kirchhoff, K. Duh, and A. Stolcke, "Morphology-based language modeling for Arabic speech recognition," in Proc. ICSLP, 2004, pp. 2245-2248.
    • (2004) Proc. ICSLP , pp. 2245-2248
    • Vergyri, D.1    Kirchhoff, K.2    Duh, K.3    Stolcke, A.4
  • 49
    • 34047258983
    • Porting Decipher from English to Mandarin
    • presented at the NIST RT-04 EARS Fall Workshop; Elect. Eng. Dept, Univ. Washington, Tech. Rep. UWEETR-2006-0013, Seattle, WA
    • M. Hwang, X. Lei, T. Ng, M. Ostendorf, A. Stolcke, W. Wang, J. Zheng, and V. Gadde, "Porting Decipher from English to Mandarin," presented at the NIST RT-04 EARS Fall Workshop 2004. Elect. Eng. Dept., Univ. Washington, Tech. Rep. UWEETR-2006-0013, Seattle, WA.
    • (2004) NIST RT-04 EARS Fall Workshop
    • Hwang, M.1    Lei, X.2    Ng, T.3    Ostendorf, M.4    Stolcke, A.5    Wang, W.6    Zheng, J.7    Gadde, V.8
  • 50
    • 34047258615
    • New Mexico State Univ, Las Cruces, NM, Tech. Rep. MCCS-92-227
    • W. Jin, "Chinese segmentation and its disambiguation," New Mexico State Univ., Las Cruces, NM, Tech. Rep. MCCS-92-227, 1992.
    • (1992) Chinese segmentation and its disambiguation
    • Jin, W.1
  • 52
    • 84905283451
    • New methods in continuous Mandarin speech recognition
    • G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds, Rhodes, Greece, Sep
    • C. J. Chen, R. A. Gopinath, M. D. Monkowski, M. A. Picheny, and K. Shen, "New methods in continuous Mandarin speech recognition," in Proc. 5th Eur. Conf. Speech Commun. Technol., vol. 3, G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds., Rhodes, Greece, Sep. 1997, pp. 1543-1546.
    • (1997) Proc. 5th Eur. Conf. Speech Commun. Technol , vol.3 , pp. 1543-1546
    • Chen, C.J.1    Gopinath, R.A.2    Monkowski, M.D.3    Picheny, M.A.4    Shen, K.5
  • 53
    • 85135139722
    • A lognormal tied mixture model of pitch for prosody-based speaker recognition
    • G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds, Rhodes, Greece, Sep
    • M. K. Sönmez, L. Heck, M. Weintraub, and E. Shriberg, "A lognormal tied mixture model of pitch for prosody-based speaker recognition," in Proc. 5th Eur. Conf. Speech Commun. Technol., G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds., Rhodes, Greece, Sep. 1997, pp. 1391-1394.
    • (1997) Proc. 5th Eur. Conf. Speech Commun. Technol , pp. 1391-1394
    • Sönmez, M.K.1    Heck, L.2    Weintraub, M.3    Shriberg, E.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.