Volume 14, Issue 5, 2006, Pages 1526-1539

Enriching speech recognition with automatic detection of sentence boundaries and disfluencies

Author keywords

Conditional random field; Confusion network; Disfluency; Maximum entropy; Metadata extraction; Prosody; Punctuation; Rich transcription; Sentence boundary

Indexed keywords

CONDITIONAL RANDOM FIELDS; CONFUSION NETWORKS; MAXIMUM ENTROPY; METADATA EXTRACTION; PROSODY; PUNCTUATION; RICH TRANSCRIPTION; SENTENCE BOUNDARY;

EID: 34047266607     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2006.878255     Document Type: Article
Times cited : (255)

References (69)
  • 2
    • 0040958578
    • Speech repairs, intonational phrases and discourse markers: Modeling speakers' utterances in spoken dialogue
    • P. Heeman and J. Allen, "Speech repairs, intonational phrases and discourse markers: Modeling speakers' utterances in spoken dialogue," Comput. Ling., vol. 25, pp. 527-571, 1999.
    • (1999) Comput. Ling , vol.25 , pp. 527-571
    • Heeman, P.1    Allen, J.2
  • 3
    • 84919457977
    • The use of prosody in a combined system for punctuation generation and speech recognition
    • J. Kim and P. C. Woodland, "The use of prosody in a combined system for punctuation generation and speech recognition," in Proc. Eurospeech, 2001, pp. 2757-2760.
    • (2001) Proc. Eurospeech , pp. 2757-2760
    • Kim, J.1    Woodland, P.C.2
  • 6
    • 33646764337
    • A lexically-driven algorithm for disfluency detection
    • M. Snover, B. Dorr, and R. Schwartz, "A lexically-driven algorithm for disfluency detection," in Proc. HLT/NAACL, 2004, pp. 157-160.
    • (2004) Proc. HLT/NAACL , pp. 157-160
    • Snover, M.1    Dorr, B.2    Schwartz, R.3
  • 7
    • 34047256013
    • Automatic detection of sentence boundaries, disfluencies, and conversational fillers in spontaneous speech
    • M.S. thesis, Univ. Washington, Seattle
    • J. Kim, "Automatic detection of sentence boundaries, disfluencies, and conversational fillers in spontaneous speech," M.S. thesis, Univ. Washington, Seattle, 2004.
    • (2004)
    • Kim, J.1
  • 8
    • 57849131781
    • A TAG-based noisy channel model of speech repairs
    • M. Johnson and E. Charniak, "A TAG-based noisy channel model of speech repairs," in Proc. ACL, 2004, pp. 33-39.
    • (2004) Proc. ACL , pp. 33-39
    • Johnson, M.1    Charniak, E.2
  • 9
    • 33646786923
    • Sentence-internal prosody does not help parsing the way punctuation does
    • M. Gregory, M. Johnson, and E. Charniak, "Sentence-internal prosody does not help parsing the way punctuation does," in Proc. HLT/NAACL, 2004, pp. 81-88.
    • (2004) Proc. HLT/NAACL , pp. 81-88
    • Gregory, M.1    Johnson, M.2    Charniak, E.3
  • 10
    • 85083514474
    • Parsing conversational speech using enhanced segmentation
    • J. G. Kahn, M. Ostendorf, and C. Chelba, "Parsing conversational speech using enhanced segmentation," in Proc. HLT/NAACL, 2004, pp. 121-128.
    • (2004) Proc. HLT/NAACL , pp. 121-128
    • Kahn, J.G.1    Ostendorf, M.2    Chelba, C.3
  • 12
    • 34047245387
    • S. Coquoz, Broadcast News Segmentation Using MDE and STT Information to Improve Speech Recognition, Int. Computer Sci. Inst., Berkeley, CA, Tech. Rep., 2004.
    • S. Coquoz, "Broadcast News Segmentation Using MDE and STT Information to Improve Speech Recognition," Int. Computer Sci. Inst., Berkeley, CA, Tech. Rep., 2004.
  • 13
    • 33646786242
    • Simple metadata annotation specification V6.2
    • LDC, [Online]. Available
    • S. Strassel, "Simple metadata annotation specification V6.2," LDC, [Online]. Available: http://www.ldc.upenn.edu/Projects/MDE/Guidelines/SimpleMDE_V6.2.pdf, 2004.
    • (2004)
    • Strassel, S.1
  • 14
    • 34047262465
    • Gaithersburg, MD, [Online]. Available
    • Nat. Inst. Stand. Technol., Gaithersburg, MD. (2004). [Online]. Available: http://www.nist.gov/speech/tests/rt/rt2004/fall/
    • (2004) Nat. Inst. Stand. Technol
  • 15
    • 34047252980
    • [Online]. Available
    • Nat. Inst. Stand. Technol., (2000) Significance Tests for ASR. [Online]. Available: http://www.nist.gov/speech/tests/sigtests/sigtests.htm
    • (2000) Significance Tests for ASR
  • 16
    • 0034275920
    • Prosody-based automatic segmentation of speech into sentences and topics
    • E. Shriberg, A. Stolcke, D. Hakkani-Tur, and G. Tur, "Prosody-based automatic segmentation of speech into sentences and topics," Speech Commun., pp. 127-154, 2000.
    • (2000) Speech Commun , pp. 127-154
    • Shriberg, E.1    Stolcke, A.2    Hakkani-Tur, D.3    Tur, G.4
  • 20
    • 85009145332
    • Prosody-based automatic detection of annoyance and frustration in human-computer dialog
    • J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke, "Prosody-based automatic detection of annoyance and frustration in human-computer dialog," in Proc. ICSLP, 2002, pp. 2037-2040.
    • (2002) Proc. ICSLP , pp. 2037-2040
    • Ang, J.1    Dhillon, R.2    Krupski, A.3    Shriberg, E.4    Stolcke, A.5
  • 21
    • 4544316886
    • A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues
    • D. Wang and S. S. Narayanan, "A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues," in Proc. ICASSP, 2004, pp. 525-528.
    • (2004) Proc. ICASSP , pp. 525-528
    • Wang, D.1    Narayanan, S.S.2
  • 22
    • 85009291541
    • Maximum entropy model for punctuation annotation from speech
    • J. Huang and G. Zweig, "Maximum entropy model for punctuation annotation from speech," in Proc. ICSLP, 2002, pp. 917-920.
    • (2002) Proc. ICSLP , pp. 917-920
    • Huang, J.1    Zweig, G.2
  • 24
    • 34047255011
    • Automatic punctuation annotation in Czech broadcast news speech
    • J. Kolar, J. Svec, and J. Psutka, "Automatic punctuation annotation in Czech broadcast news speech," in Proc. 9th Conf. Speech Comput., 2004, pp. 319-325.
    • (2004) Proc. 9th Conf. Speech Comput , pp. 319-325
    • Kolar, J.1    Svec, J.2    Psutka, J.3
  • 25
    • 85120620835
    • Edit detection and parsing for transcribed speech
    • E. Charniak and M. Johnson, "Edit detection and parsing for transcribed speech," in Proc. NAACL, 2001, pp. 118-126.
    • (2001) Proc. NAACL , pp. 118-126
    • Charniak, E.1    Johnson, M.2
  • 26
    • 33646762857
    • Automatic disfluency removal on recognized spontaneous speech - Rapid adaptation to speaker dependent disfluencies
    • M. Honal and T. Schultz, "Automatic disfluency removal on recognized spontaneous speech - Rapid adaptation to speaker dependent disfluencies," in Proc. ICASSP, 2005, pp. 969-972.
    • (2005) Proc. ICASSP , pp. 969-972
    • Honal, M.1    Schultz, T.2
  • 27
    • 56149102222
    • Corrections of disfluencies in spontaneous speech using a noisy-channel approach
    • M. Honal and T. Schultz, "Corrections of disfluencies in spontaneous speech using a noisy-channel approach," in Proc. Eurospeech, 2003, pp. 2781-2784.
    • (2003) Proc. Eurospeech , pp. 2781-2784
    • Honal, M.1    Schultz, T.2
  • 28
    • 0028215480
    • A corpus-based study of repair cues in spontaneous speech
    • C. Nakatani and J. Hirschberg, "A corpus-based study of repair cues in spontaneous speech," J. Acoust. Soc. Amer., pp. 1603-1616, 1994.
    • (1994) J. Acoust. Soc. Amer , pp. 1603-1616
    • Nakatani, C.1    Hirschberg, J.2
  • 29
    • 0000703860
    • Phonetic consequences of speech disfluency
    • E. Shriberg, "Phonetic consequences of speech disfluency," in Proc. Int. Conf. Phonetics Sci., 1999, pp. 619-622.
    • (1999) Proc. Int. Conf. Phonetics Sci , pp. 619-622
    • Shriberg, E.1
  • 30
    • 0030351630
    • Juncture cues to disfluency
    • R. Lickley, "Juncture cues to disfluency," in Proc. ICSLP, 1996, pp. 2478-2481.
    • (1996) Proc. ICSLP , pp. 2478-2481
    • Lickley, R.1
  • 31
    • 84878523744
    • Prosodic features of four types of disfluencies
    • G. Savova and J. Bachenko, "Prosodic features of four types of disfluencies," in Proc. DiSS, 2003, pp. 91-94.
    • (2003) Proc. DiSS , pp. 91-94
    • Savova, G.1    Bachenko, J.2
  • 32
    • 0010125082
    • A prosody-only decision-tree model for disfluency detection
    • E. Shriberg and A. Stolcke, "A prosody-only decision-tree model for disfluency detection," in Proc. Eurospeech, 1997, pp. 2383-2386.
    • (1997) Proc. Eurospeech , pp. 2383-2386
    • Shriberg, E.1    Stolcke, A.2
  • 33
    • 33646819463
    • Detecting structural meta-data with decision trees and transformation-based learning
    • J. Kim, S. Schwarm, and M. Ostendorf, "Detecting structural meta-data with decision trees and transformation-based learning," in Proc. HLT/NAACL, 2004, pp. 137-144.
    • (2004) Proc. HLT/NAACL , pp. 137-144
    • Kim, J.1    Schwarm, S.2    Ostendorf, M.3
  • 36
    • 0142179461
    • Durational cues to prominence and grouping
    • Lund, Sweden
    • W. N. Campbell, "Durational cues to prominence and grouping," in Proc. ECSA Workshop on Prosody, Lund, Sweden, 1993, pp. 38-41.
    • (1993) Proc. ECSA Workshop on Prosody , pp. 38-41
    • Campbell, W.N.1
  • 37
    • 0028088647
    • On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues
    • Oct
    • J. R. De Pijper and A. A. Sanderman, "On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues," J. Acoust. Soc. Amer., vol. 96, no. 4, pp. 2037-2047, Oct. 1994.
    • (1994) J. Acoust. Soc. Amer , vol.96 , Issue.4 , pp. 2037-2047
    • De Pijper, J.R.1    Sanderman, A.A.2
  • 38
    • 34047275519
    • Peak, boundary and cohesion characteristics of prosodic grouping
    • Lund, Sweden
    • D. Hirst, "Peak, boundary and cohesion characteristics of prosodic grouping," in Proc. ECSA Workshop on Prosody, Lund, Sweden, 1993, pp. 32-37.
    • (1993) Proc. ECSA Workshop on Prosody , pp. 32-37
    • Hirst, D.1
  • 40
    • 0020121734
    • Duration as a cue to the perception of a phrase boundary
    • D. R. Scott, "Duration as a cue to the perception of a phrase boundary," J. Acoust. Soc. Amer., vol. 71, no. 4, pp. 996-1007, 1982.
    • (1982) J. Acoust. Soc. Amer , vol.71 , Issue.4 , pp. 996-1007
    • Scott, D.R.1
  • 41
    • 0031033301
    • Prosodic features at discourse boundaries of different strength
    • Jan
    • M. Swerts, "Prosodic features at discourse boundaries of different strength," J. Acoust. Soc. Amer., vol. 101, no. 1, pp. 514-521, Jan. 1997.
    • (1997) J. Acoust. Soc. Amer , vol.101 , Issue.1 , pp. 514-521
    • Swerts, M.1
  • 42
    • 85128436986
    • Modeling dynamic prosodic variation for speaker verification
    • K. Sonmez, E. Shriberg, L. Heck, and M. Weintraub, "Modeling dynamic prosodic variation for speaker verification," in Proc. ICSLP, 1998, pp. 3189-3192.
    • (1998) Proc. ICSLP , pp. 3189-3192
    • Sonmez, K.1    Shriberg, E.2    Heck, L.3    Weintraub, M.4
  • 44
    • 0002200840
    • Acoustic modeling for the SRI Hub4 partitioned evaluation continuous speech recognition system
    • Chantilly, VA, Feb., [Online]. Available
    • A. Sankar, L. Heck, and A. Stolcke, "Acoustic modeling for the SRI Hub4 partitioned evaluation continuous speech recognition system," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, Feb. 1997, pp. 127-132. [Online]. Available: http://www.nist.gov/speech/proc/darpa97/html/sankar1/sankar1.htm.
    • (1997) Proc. DARPA Speech Recognition Workshop , pp. 127-132
    • Sankar, A.1    Heck, L.2    Stolcke, A.3
  • 45
    • 34047275875
    • W. Buntine and R. Caruana, Introduction to IND Version 2.1 and Recursive Partitioning. Moffett Field, CA: NASA Ames Research Center, 1992.
    • W. Buntine and R. Caruana, Introduction to IND Version 2.1 and Recursive Partitioning. Moffett Field, CA: NASA Ames Research Center, 1992.
  • 47
    • 0030374907
    • Automatic linguistic segmentation of conversational speech
    • A. Stolcke and E. Shriberg, "Automatic linguistic segmentation of conversational speech," in Proc. ICSLP, 1996, pp. 1005-1008.
    • (1996) Proc. ICSLP , pp. 1005-1008
    • Stolcke, A.1    Shriberg, E.2
  • 48
    • 84891308106
    • SRILM - An extensible language modeling toolkit
    • A. Stolcke, "SRILM - An extensible language modeling toolkit," in Proc. ICSLP, 2002, pp. 901-904.
    • (2002) Proc. ICSLP , pp. 901-904
    • Stolcke, A.1
  • 49
    • 34047250739
    • S. F. Chen and J. T. Goodman, An empirical study of smoothing techniques for language modeling, Comput. Sci. Group, Harvard University, Cambridge, MA, Tech. Rep., 1998.
    • S. F. Chen and J. T. Goodman, "An empirical study of smoothing techniques for language modeling," Comput. Sci. Group, Harvard University, Cambridge, MA, Tech. Rep., 1998.
  • 50
    • 0002652285
    • A maximum entropy approach to natural language processing
    • A. L. Berger, S. A. D. Pietra, and V. J. D. Pietra, "A maximum entropy approach to natural language processing," Comput. Ling., vol. 22, pp. 39-72, 1996.
    • (1996) Comput. Ling , vol.22 , pp. 39-72
    • Berger, A.L.1    Pietra, S.A.D.2    Pietra, V.J.D.3
  • 51
    • 85117232702
    • Comparing and combining generative and posterior probability models: Some advances in sentence boundary detection in speech
    • Y. Liu, A. Stolcke, E. Shriberg, and M. Harper, "Comparing and combining generative and posterior probability models: Some advances in sentence boundary detection in speech," in Proc. EMNLP, 2004, pp. 64-71.
    • (2004) Proc. EMNLP , pp. 64-71
    • Liu, Y.1    Stolcke, A.2    Shriberg, E.3    Harper, M.4
  • 52
    • 0000732463
    • A limited memory algorithm for bound constrained optimization
    • R. H. Byrd, P. Lu, and J. Nocedal, "A limited memory algorithm for bound constrained optimization," SIAM J. Sci. Statist. Comput., vol. 16, no. 5, pp. 1190-1208, 1995.
    • (1995) SIAM J. Sci. Statist. Comput , vol.16 , Issue.5 , pp. 1190-1208
    • Byrd, R.H.1    Lu, P.2    Nocedal, J.3
  • 54
    • 0142192295
    • Conditional random fields: Probabilistic models for segmenting and labeling sequence data
    • J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, 2001, pp. 282-289.
    • (2001) Proc. ICML , pp. 282-289
    • Lafferty, J.1    McCallum, A.2    Pereira, F.3
  • 56
    • 85043116988
    • Shallow parsing with conditional random fields
    • F. Sha and F. Pereira, "Shallow parsing with conditional random fields," in Proc. HLT/NAACL, 2003, pp. 134-141.
    • (2003) Proc. HLT/NAACL , pp. 134-141
    • Sha, F.1    Pereira, F.2
  • 57
    • 85121365374
    • Early results for named entity recognition with conditional random fields
    • A. McCallum and W. Li, "Early results for named entity recognition with conditional random fields," in Proc. CoNLL, 2003, pp. 188-191.
    • (2003) Proc. CoNLL , pp. 188-191
    • McCallum, A.1    Li, W.2
  • 58
    • 0003332498
    • TnT - A statistical part-of-speech tagger
    • T. Brants, "TnT - A statistical part-of-speech tagger," in Proc. 6th Applied NLP Conf., 2000, pp. 224-231.
    • (2000) Proc. 6th Applied NLP Conf , pp. 224-231
    • Brants, T.1
  • 60
    • 85120618233
    • Transformation-based learning in the fast lane
    • Jun
    • G. Ngai and R. Florian, "Transformation-based learning in the fast lane," in Proc. NAACL, Jun. 2001, pp. 40-47.
    • (2001) Proc. NAACL , pp. 40-47
    • Ngai, G.1    Florian, R.2
  • 61
    • 33746529930
    • A study in machine learning from imbalanced data for sentence boundary detection in speech
    • to be published
    • Y. Liu, N. Chawla, M. Harper, E. Shriberg, and A. Stolcke, "A study in machine learning from imbalanced data for sentence boundary detection in speech," Comput. Speech Lang., pp. 468-494, 2006, to be published.
    • (2006) Comput. Speech Lang , pp. 468-494
    • Liu, Y.1    Chawla, N.2    Harper, M.3    Shriberg, E.4    Stolcke, A.5
  • 62
    • 0030211964
    • Bagging predictors
    • L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, 1996.
    • (1996) Mach. Learn , vol.24 , Issue.2 , pp. 123-140
    • Breiman, L.1
  • 63
    • 33646809491
    • Structural event detection for rich transcription of speech
    • Ph.D. dissertation, Purdue Univ., West Lafayette, IN
    • Y. Liu, "Structural event detection for rich transcription of speech," Ph.D. dissertation, Purdue Univ., West Lafayette, IN, 2004.
    • (2004)
    • Liu, Y.1
  • 64
    • 85009223733
    • Automatic disfluency identification in conversational speech using multiple knowledge sources
    • Y. Liu, E. Shriberg, and A. Stolcke, "Automatic disfluency identification in conversational speech using multiple knowledge sources," in Proc. Eurospeech, 2003, pp. 957-960.
    • (2003) Proc. Eurospeech , pp. 957-960
    • Liu, Y.1    Shriberg, E.2    Stolcke, A.3
  • 65
    • 0032253898
    • Repeating words in spontaneous speech
    • H. H. Clark and T. Wasow, "Repeating words in spontaneous speech," Cognitive Psych., pp. 201-242, 1998.
    • (1998) Cognitive Psych , pp. 201-242
    • Clark, H.H.1    Wasow, T.2
  • 66
    • 34047272452
    • Word fragment identification using acoustic-prosodic features in conversational speech
    • Y. Liu, "Word fragment identification using acoustic-prosodic features in conversational speech," in Proc. HLT Student Workshop, 2003, pp. 37-42.
    • (2003) Proc. HLT Student Workshop , pp. 37-42
    • Liu, Y.1
  • 67
    • 33646790992
    • Improving automatic sentence boundary detection with confusion networks
    • D. Hillard, M. Ostendorf, A. Stolcke, Y. Liu, and E. Shriberg, "Improving automatic sentence boundary detection with confusion networks," in Proc. HLT/NAACL, 2004, pp. 69-72.
    • (2004) Proc. HLT/NAACL , pp. 69-72
    • Hillard, D.1    Ostendorf, M.2    Stolcke, A.3    Liu, Y.4    Shriberg, E.5
  • 68
    • 0034296009
    • Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
    • L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: word error minimization and other applications of confusion networks," Comput. Speech Lang., pp. 373-400, 2000.
    • (2000) Comput. Speech Lang , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.