



2008, Pages 497-528

Speaker Classification for Next-Generation Voice-Dialog Systems

Author keywords

Acoustic prosodic speech signal analysis; Automatic voice dialog systems; Category specific phoneme bi grams; Classification inherent uncertainty; Emotion aware voice portals; Mixed initiative and directed dialogs; Phoneme recognition based classifiers; Very large vocabulary speaker independent speech recognizer; Voice controlled customer self service applications; Voice controlled value added services

EID: 67349258028     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.1002/9780470727188.ch17     Document Type: Chapter
Times cited: 3

References (46)
  • 1
    • 85009145332
    • Prosody-based Automatic Detection of Annoyance and Frustration in Human-computer Dialog
    • Proceedings of the International Conference on Spoken Language Processing (ICSLP), Denver, CO, USA
    • Ang, J.; Dhillon, R.; Krupski, A.; Shriberg, E.; Stolcke, A. (2002). Prosody-based Automatic Detection of Annoyance and Frustration in Human-computer Dialog, Proceedings of the International Conference on Spoken Language Processing (ICSLP), Denver, CO, USA.
    • (2002)
    • Ang, J.1    Dhillon, R.2    Krupski, A.3    Shriberg, E.4    Stolcke, A.5
  • 2
    • 0003430332
    • Clinical Measurement of Speech and Voice
    • 2nd edn, Singular Publishing Group, San Diego, CA, USA
    • Baken, R.; Orlikoff, R. (2000). Clinical Measurement of Speech and Voice, 2nd edn, Singular Publishing Group, San Diego, CA, USA.
    • (2000)
    • Baken, R.1    Orlikoff, R.2
  • 3
    • 0003432984
    • How to Build a Speech Recognition Application. A Style Guide for Telephony Dialogues
    • Enterprise Integration Group, Inc., San Ramon
    • Balentine, B.; Morgan, D. P. (1999). How to Build a Speech Recognition Application. A Style Guide for Telephony Dialogues, Enterprise Integration Group, Inc., San Ramon.
    • (1999)
    • Balentine, B.1    Morgan, D.P.2
  • 4
    • 84889375807
    • A Taxonomy of Applications that Utilize Emotional Awareness
    • Proceedings of Fifth Slovenian and First International Language Technologies Conference (SLTS), Ljubljana, Slovenia
    • Batliner, A.; Burkhardt, F.; van Ballegooy, M.; Nöth, E. (2006). A Taxonomy of Applications that Utilize Emotional Awareness, Proceedings of Fifth Slovenian and First International Language Technologies Conference (SLTS), Ljubljana, Slovenia.
    • (2006)
    • Batliner, A.1    Burkhardt, F.2    Van Ballegooy, M.3    Nöth, E.4
  • 5
    • 84889359142
    • Study of the Utility and Validity of Voice Stress Analyzers
    • Board for Professional and Occupational Regulation, Technical report, Department of Professional and Occupational Regulation
    • Board for Professional and Occupational Regulation (2003). Study of the Utility and Validity of Voice Stress Analyzers, Technical report, Department of Professional and Occupational Regulation.
    • (2003)
  • 6
    • 33745201308
    • Estimating Speaker Age across Languages
    • Proceedings of International Congress of Phonetic Sciences (ICPhS), San Francisco, CA, USA
    • Braun, A.; Cerrato, L. (1999). Estimating Speaker Age across Languages, Proceedings of International Congress of Phonetic Sciences (ICPhS), San Francisco, CA, USA, vol. 2, pp. 1369-1372.
    • (1999) , vol.2 , pp. 1369-1372
    • Braun, A.1    Cerrato, L.2
  • 7
    • 0003802343
    • Classification and Regression Trees
    • Chapman & Hall, New York, NY, USA
    • Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. (1984). Classification and Regression Trees, Chapman & Hall, New York, NY, USA.
    • (1984)
    • Breiman, L.1    Friedman, J.2    Olshen, R.3    Stone, C.4
  • 8
    • 44949198552
    • Detecting Anger in Automated Voice Portal Dialogs
    • Proceedings of Interspeech, Pittsburgh, PA, USA
    • Burkhardt, F.; Ajmera, J.; Englert, R.; Burleson, W.; Stegmann, J. (2006). Detecting Anger in Automated Voice Portal Dialogs, Proceedings of Interspeech, Pittsburgh, PA, USA.
    • (2006)
    • Burkhardt, F.1    Ajmera, J.2    Englert, R.3    Burleson, W.4    Stegmann, J.5
  • 9
    • 36249021789
    • An Emotion-Aware Voice Portal
    • Proceedings of Conference for Electronic Speech Signal Processing (ESSP), Prague, Czech Republic
    • Burkhardt, F.; van Ballegooy, M.; Englert, R.; Huber, R. (2005a). An Emotion-Aware Voice Portal, Proceedings of Conference for Electronic Speech Signal Processing (ESSP), Prague, Czech Republic.
    • (2005)
    • Burkhardt, F.1    Van Ballegooy, M.2    Englert, R.3    Huber, R.4
  • 10
    • 84874244450
    • A Voiceportal Enhanced by Semantic Processing and Affect Awareness
    • Proceedings of INFORMATIK 2005, Gesellschaft für Informatik, Bonn, Germany
    • Burkhardt, F.; van Ballegooy, M.; Stegmann, J. (2005b). A Voiceportal Enhanced by Semantic Processing and Affect Awareness, Proceedings of INFORMATIK 2005, Gesellschaft für Informatik, Bonn, Germany, vol. 2.
    • (2005) , vol.2
    • Burkhardt, F.1    Van Ballegooy, M.2    Stegmann, J.3
  • 11
    • 0033747189
    • Subjective Age Estimation of Telephonic Voices
    • Cerrato, L.; Falcone, M.; Paoloni, A. (2000). Subjective Age Estimation of Telephonic Voices, Speech Communication, vol. 31, no. 2-3, pp. 107-112.
    • (2000) Speech Communication , vol.31 , Issue.2-3 , pp. 107-112
    • Cerrato, L.1    Falcone, M.2    Paoloni, A.3
  • 12
    • 84889419264
    • Annotation and Detection of Emotion in a Task-oriented Human-Human Dialog Corpus
    • Proceedings of the ISLE Workshop on Dialogue Tagging
    • Devillers, L.; Lamel, L.; Vasilescu, I. (2002). Annotation and Detection of Emotion in a Task-oriented Human-Human Dialog Corpus, Proceedings of the ISLE Workshop on Dialogue Tagging.
    • (2002)
    • Devillers, L.1    Lamel, L.2    Vasilescu, I.3
  • 13
    • 0031269184
    • On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
    • Domingos, P.; Pazzani, M. J. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Machine Learning, vol. 29, no. 2-3, pp. 103-130.
    • (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 103-130
    • Domingos, P.1    Pazzani, M.J.2
  • 14
    • 0003922190
    • Pattern Classification
    • 2nd edn, Wiley Interscience
    • Duda, R. O.; Hart, P. E.; Stork, D. G. (2000). Pattern Classification, 2nd edn, Wiley Interscience.
    • (2000)
    • Duda, R.O.1    Hart, P.E.2    Stork, D.G.3
  • 15
    • 0036985308
    • Harmonics-to-Noise-Ratio: an Index of Vocal Aging
    • Ferrand, C. T. (2002). Harmonics-to-Noise-Ratio: an Index of Vocal Aging, Journal of Voice, vol. 16, no. 4, pp. 480-487.
    • (2002) Journal of Voice , vol.16 , Issue.4 , pp. 480-487
    • Ferrand, C.T.1
  • 16
    • 0003473607
    • Statistical Pattern Recognition
    • 2nd edn, Academic Press, San Diego, CA, USA
    • Fukunaga, K. (1990). Statistical Pattern Recognition, 2nd edn, Academic Press, San Diego, CA, USA.
    • (1990)
    • Fukunaga, K.1
  • 17
    • 84889332827
    • Usability of a Telephone-Based Speech Dialogue System as Experienced by User Groups of Different Age and Background
    • Proceedings of 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany
    • Hempel, T. (2006). Usability of a Telephone-Based Speech Dialogue System as Experienced by User Groups of Different Age and Background, Proceedings of 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany.
    • (2006)
    • Hempel, T.1
  • 18
    • 0004249294
    • SpeechDat Multilingual Speech Databases for Teleservices: Across the Finish Line
    • Proceedings of Eurospeech 1999, ISCA, Budapest, Hungary
    • Höge, H.; Draxler, C.; van den Heuvel, H.; Johansen, F. T.; Sanders, E.; Tropf, H. S. (1999). SpeechDat Multilingual Speech Databases for Teleservices: Across the Finish Line, Proceedings of Eurospeech 1999, ISCA, Budapest, Hungary. http://www.speechdat.org/.
    • (1999)
    • Höge, H.1    Draxler, C.2    Van Den Heuvel, H.3    Johansen, F.T.4    Sanders, E.5    Tropf, H.S.6
  • 19
    • 0003786003
    • Statistical Methods for Speech Recognition
    • MIT Press, Boston
    • Jelinek, F. (1998). Statistical Methods for Speech Recognition, MIT Press, Boston.
    • (1998)
    • Jelinek, F.1
  • 20
    • 0002658916
    • Articulatory Reduction in Emotional Speech
    • Proceedings of Eurospeech 99 Budapest
    • Kienast, M.; Paeschke, A.; Sendlmeier, W. F. (1999). Articulatory Reduction in Emotional Speech, Proceedings of Eurospeech 99 Budapest, pp. 117-120.
    • (1999) , pp. 117-120
    • Kienast, M.1    Paeschke, A.2    Sendlmeier, W.F.3
  • 21
    • 84889349060
    • Ein Mehrkanalverfahren zur Berechnung der Grundfrequenzkontur unter Einsatz der dynamischen Programmierung
    • Master's thesis, Universität Erlangen-Nürnberg
    • Kompe, R. (1989). Ein Mehrkanalverfahren zur Berechnung der Grundfrequenzkontur unter Einsatz der dynamischen Programmierung, Master's thesis, Universität Erlangen-Nürnberg.
    • (1989)
    • Kompe, R.1
  • 23
    • 84889399491
    • Vocal Aging
    • Singular Publishing Group, San Diego, CA, USA
    • Linville, S. E. (2001). Vocal Aging, Singular Publishing Group, San Diego, CA, USA.
    • (2001)
    • Linville, S.E.1
  • 24
    • 84889451757
    • Using Context to Improve Emotion Detection in Spoken Dialog Systems
    • Proceedings of Interspeech, Lisbon, Portugal
    • Liscombe, J.; Riccardi, G.; Hakkani-Tür, D. (2005). Using Context to Improve Emotion Detection in Spoken Dialog Systems, Proceedings of Interspeech, Lisbon, Portugal.
    • (2005)
    • Liscombe, J.1    Riccardi, G.2    Hakkani-Tür, D.3
  • 25
    • 34547542381
    • Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications
    • Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii
    • Metze, F.; Ajmera, J.; Englert, R.; Bub, U.; Burkhardt, F.; Stegmann, J.; Müller, C.; Huber, R.; Andrassy, B.; Bauer, J. G.; Littel, B. (2007). Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii.
    • (2007)
    • Metze, F.1    Ajmera, J.2    Englert, R.3    Bub, U.4    Burkhardt, F.5    Stegmann, J.6    Müller, C.7    Huber, R.8    Andrassy, B.9    Bauer, J.G.10    Littel, B.11
  • 26
    • 0036299156
    • Automatic Estimation of One's Age with His/Her Speech Based Upon Acoustic Modeling Techniques of Speakers
    • Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA
    • Minematsu, N.; Sekiguchi, M.; Hirose, K. (2002). Automatic Estimation of One's Age with His/Her Speech Based Upon Acoustic Modeling Techniques of Speakers, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA.
    • (2002)
    • Minematsu, N.1    Sekiguchi, M.2    Hirose, K.3
  • 27
    • 36249021170
    • Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht
    • PhD thesis, Computer Science Institute; Universität des Saarlandes; Germany
    • Müller, C. (2005). Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht, PhD thesis, Computer Science Institute; Universität des Saarlandes; Germany.
    • (2005)
    • Müller, C.1
  • 28
    • 79958831606
    • Speaker Classification
    • Springer, New York-Berlin
    • Müller, C.; Schötz, S. (eds.) (2007). Speaker Classification, Springer, New York-Berlin.
    • (2007)
    • Müller, C.1    Schötz, S.2
  • 29
    • 84994747210
    • Exploiting Speech for Recognizing Elderly Users to Respond to their Special Needs
    • Proceedings of Interspeech (Eurospeech), ISCA, Geneva, Switzerland
    • Müller, C.; Wittig, F.; Baus, J. (2003). Exploiting Speech for Recognizing Elderly Users to Respond to their Special Needs, Proceedings of Interspeech (Eurospeech), ISCA, Geneva, Switzerland.
    • (2003)
    • Müller, C.1    Wittig, F.2    Baus, J.3
  • 30
    • 0001937343
    • Pitch Duration Characteristics of Older Males
    • Mysak, E. D. (1959). Pitch Duration Characteristics of Older Males, Journal of Speech and Hearing Research, vol. 2, pp. 46-54.
    • (1959) Journal of Speech and Hearing Research , vol.2 , pp. 46-54
    • Mysak, E.D.1
  • 31
    • 0033329296
    • Emotion in Speech: Recognition and Application to Call Centers
    • Conference on Artificial Neural Networks In Engineering (ANNIE), St Louis, USA
    • Petrushin, V. (1999). Emotion in Speech: Recognition and Application to Call Centers, Conference on Artificial Neural Networks In Engineering (ANNIE), St Louis, USA.
    • (1999)
    • Petrushin, V.1
  • 32
    • 0003425258
    • Digital Processing of Speech Signals
    • Prentice-Hall
    • Rabiner, L. R. (1978). Digital Processing of Speech Signals, Prentice-Hall.
    • (1978)
    • Rabiner, L.R.1
  • 33
    • 0024610919
    • A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
    • Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286.
    • (1989) Proceedings of the IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.R.1
  • 35
    • 0036293830
    • An Overview of Automatic Speaker Recognition Technology
    • Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA
    • Reynolds, D. A. (2002). An Overview of Automatic Speaker Recognition Technology, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA, vol. 4, pp. 4072-4075.
    • (2002) , vol.4 , pp. 4072-4075
    • Reynolds, D.A.1
  • 37
    • 0003408420
    • Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning)
    • MIT Press, Cambridge, MA, USA
    • Schölkopf, B.; Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning), MIT Press, Cambridge, MA, USA.
    • (2002)
    • Schölkopf, B.1    Smola, A.2
  • 38
    • 67349139346
    • Automatic Prediction of Speaker Age Using CART
    • Term paper for course in Forensic Phonetics, Göteborg University
    • Schötz, S. (2004). Automatic Prediction of Speaker Age Using CART, http://www.ling.lu.se/persons/Suzi/downloads/RF paper SusanneS2004.pdf. Term paper for course in Forensic Phonetics, Göteborg University.
    • (2004)
    • Schötz, S.1
  • 39
    • 85013700737
    • Multilingual Speech Processing
    • Academic Press
    • Schultz, T.; Kirchhoff, K. (eds.) (2006). Multilingual Speech Processing, Academic Press.
    • (2006)
    • Schultz, T.1    Kirchhoff, K.2
  • 40
    • 84946723550
    • Voice Signatures
    • Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, U.S. Virgin Islands
    • Shafran, I.; Riley, M.; Mohri, M. (2003). Voice Signatures, Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, U.S. Virgin Islands.
    • (2003)
    • Shafran, I.1    Riley, M.2    Mohri, M.3
  • 41
    • 33646774273
    • Of all Things the Measure is Man: Classification of Emotions and Inter-Labeler Consistency
    • Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
    • Steidl, S.; Levit, M.; Batliner, A.; Nöth, E.; Niemann, H. (2005). Of all Things the Measure is Man: Classification of Emotions and Inter-Labeler Consistency, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
    • (2005)
    • Steidl, S.1    Levit, M.2    Batliner, A.3    Nöth, E.4    Niemann, H.5
  • 42
    • 29044438709
    • Forensic Aspects of Speech Patterns: Voice Prints, Speaker Profiling, Lie and Intoxication Detection
    • Lawyers & Judges Publishing Company
    • Tanner, D. C.; Tanner, M. E. (2004). Forensic Aspects of Speech Patterns: Voice Prints, Speaker Profiling, Lie and Intoxication Detection, Lawyers & Judges Publishing Company.
    • (2004)
    • Tanner, D.C.1    Tanner, M.E.2
  • 43
    • 0004217877
    • Information Retrieval
    • Butterworths, London
    • van Rijsbergen, C. J. (1979). Information Retrieval, Butterworths, London.
    • (1979)
    • Van Rijsbergen, C.J.1
  • 45
    • 84947280249
    • Recognition of Emotions in Interactive Voice Response Systems
    • Proceedings of Interspeech (Eurospeech), Geneva, Switzerland
    • Yacoub, S.; Simske, S.; Lin, X.; Burns, J. (2003). Recognition of Emotions in Interactive Voice Response Systems, Proceedings of Interspeech (Eurospeech), Geneva, Switzerland.
    • (2003)
    • Yacoub, S.1    Simske, S.2    Lin, X.3    Burns, J.4
  • 46
    • 0031624532
    • Speech Recognition with Dynamic Bayesian Networks
    • Proceedings of Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, WI
    • Zweig, G.; Russell, S. (1998). Speech Recognition with Dynamic Bayesian Networks, Proceedings of Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, WI.
    • (1998)
    • Zweig, G.1    Russell, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.