SCOPUS Information Retrieval Platform

Volume , Issue , 2008, Pages 497-528

Speaker Classification for Next-Generation Voice-Dialog Systems

Authors (3): Burkhardt, Felix (a); Metze, Florian (b); Stegmann, Joachim (a)

(b) Deutsche Telekom Laboratories (Germany)

Author keywords

Acoustic-prosodic speech signal analysis; Automatic voice dialog systems; Category-specific phoneme bi-grams; Classification-inherent uncertainty; Emotion-aware voice portals; Mixed-initiative and directed dialogs; Phoneme recognition-based classifiers; Very large vocabulary speaker-independent speech recognizer; Voice-controlled customer self-service applications; Voice-controlled value-added services

EID: 67349258028 PISSN: None EISSN: None Source Type: Book
DOI: 10.1002/9780470727188.ch17 Document Type: Chapter

Times cited: 3

References (46)

1
- 85009145332
- Prosody-based Automatic Detection of Annoyance and Frustration in Human-computer Dialog
- Proceedings of International Conference on Spoken Language Processing (ICSLP), Denver, CO, USA
- Ang, J.; Dhillon, R.; Krupski, A.; Shriberg, E.; Stolcke, A. (2002). Prosody-based Automatic Detection of Annoyance and Frustration in Human-computer Dialog, Proceedings of International Conference on Spoken Language Processing (ICSLP), Denver, CO, USA.
- (2002)
- Ang, J.¹ Dhillon, R.² Krupski, A.³ Shriberg, E.⁴ Stolcke, A.⁵

2
- 0003430332
- Clinical Measurement of Speech and Voice
- 2nd edn, Singular Publishing Group, San Diego, CA, USA
- Baken, R.; Orlikoff, R. (2000). Clinical Measurement of Speech and Voice, 2nd edn, Singular Publishing Group, San Diego, CA, USA.
- (2000)
- Baken, R.¹ Orlikoff, R.²

3
- 0003432984
- How to Build a Speech Recognition Application. A Style Guide for Telephony Dialogues
- Enterprise Integration Group, Inc., San Ramon
- Balentine, B.; Morgan, D. P. (1999). How to Build a Speech Recognition Application. A Style Guide for Telephony Dialogues, Enterprise Integration Group, Inc., San Ramon.
- (1999)
- Balentine, B.¹ Morgan, D.P.²

4
- 84889375807
- A Taxonomy of Applications that Utilize Emotional Awareness
- Proceedings of Fifth Slovenian and First International Language Technologies Conference (SLTS), Ljubljana, Slovenia
- Batliner, A.; Burkhardt, F.; van Ballegooy, M.; Nöth, E. (2006). A Taxonomy of Applications that Utilize Emotional Awareness, Proceedings of Fifth Slovenian and First International Language Technologies Conference (SLTS), Ljubljana, Slovenia.
- (2006)
- Batliner, A.¹ Burkhardt, F.² Van Ballegooy, M.³ Nöth, E.⁴

5
- 84889359142
- Study of the Utility and Validity of Voice Stress Analyzers
- Board for Professional and Occupational Regulation, Technical report, Department of Professional and Occupational Regulation
- Board for Professional and Occupational Regulation (2003). Study of the Utility and Validity of Voice Stress Analyzers, Technical report, Department of Professional and Occupational Regulation.
- (2003)

6
- 33745201308
- Estimating Speaker Age across Languages
- Proceedings of International Congress of Phonetic Sciences (ICPhS), San Francisco, CA, USA
- Braun, A.; Cerrato, L. (1999). Estimating Speaker Age across Languages, Proceedings of International Congress of Phonetic Sciences (ICPhS), San Francisco, CA, USA, vol. 2, pp. 1369-1372.
- (1999) , vol.2 , pp. 1369-1372
- Braun, A.¹ Cerrato, L.²

7
- 0003802343
- Classification and Regression Trees
- Chapman & Hall, New York, NY, USA
- Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. (1984). Classification and Regression Trees, Chapman & Hall, New York, NY, USA.
- (1984)
- Breiman, L.¹ Friedman, J.² Olshen, R.³ Stone, C.⁴

8
- 44949198552
- Detecting Anger in Automated Voice Portal Dialogs
- Proceedings of Interspeech, Pittsburgh, PA, USA
- Burkhardt, F.; Ajmera, J.; Englert, R.; Burleson, W.; Stegmann, J. (2006). Detecting Anger in Automated Voice Portal Dialogs, Proceedings of Interspeech, Pittsburgh, PA, USA.
- (2006)
- Burkhardt, F.¹ Ajmera, J.² Englert, R.³ Burleson, W.⁴ Stegmann, J.⁵

9
- 36249021789
- An Emotion-Aware Voice Portal
- Proceedings of Conference for Electronic Speech Signal Processing (ESSP), Prague, Czech Republic
- Burkhardt, F.; van Ballegooy, M.; Englert, R.; Huber, R. (2005a). An Emotion-Aware Voice Portal, Proceedings of Conference for Electronic Speech Signal Processing (ESSP), Prague, Czech Republic.
- (2005)
- Burkhardt, F.¹ Van Ballegooy, M.² Englert, R.³ Huber, R.⁴

10
- 84874244450
- A Voiceportal Enhanced by Semantic Processing and Affect Awareness
- Proceedings of INFORMATIK 2005, Gesellschaft für Informatik, Bonn, Germany
- Burkhardt, F.; van Ballegooy, M.; Stegmann, J. (2005b). A Voiceportal Enhanced by Semantic Processing and Affect Awareness, Proceedings of INFORMATIK 2005, Gesellschaft für Informatik, Bonn, Germany, vol. 2.
- (2005) , vol.2
- Burkhardt, F.¹ Van Ballegooy, M.² Stegmann, J.³

11
- 0033747189
- Subjective Age Estimation of Telephonic Voices
- Cerrato, L.; Falcone, M.; Paoloni, A. (2000). Subjective Age Estimation of Telephonic Voices, Speech Communication, vol. 31, no. 2-3, pp. 107-112.
- (2000) Speech Communication , vol.31 , Issue.2-3 , pp. 107-112
- Cerrato, L.¹ Falcone, M.² Paoloni, A.³

12
- 84889419264
- Annotation and Detection of Emotion in a Task-oriented Human-Human Dialog Corpus
- Proceedings on ISLE Workshop on Dialogue Tagging
- Devillers, L.; Lamel, L.; Vasilescu, I. (2002). Annotation and Detection of Emotion in a Task-oriented Human-Human Dialog Corpus, Proceedings on ISLE Workshop on Dialogue Tagging.
- (2002)
- Devillers, L.¹ Lamel, L.² Vasilescu, I.³

13
- 0031269184
- On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
- Domingos, P.; Pazzani, M. J. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Machine Learning, vol. 29, no. 2-3, pp. 103-130.
- (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 103-130
- Domingos, P.¹ Pazzani, M.J.²

14
- 0003922190
- Pattern Classification
- 2nd edn, Wiley Interscience
- Duda, R. O.; Hart, P. E.; Stork, D. G. (2000). Pattern Classification, 2nd edn, Wiley Interscience.
- (2000)
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

15
- 0036985308
- Harmonics-to-Noise-Ratio: an Index of Vocal Aging
- Ferrand, C. T. (2002). Harmonics-to-Noise-Ratio: an Index of Vocal Aging, Journal of Voice, vol. 16, no. 4, pp. 480-487.
- (2002) Journal of Voice , vol.16 , Issue.4 , pp. 480-487
- Ferrand, C.T.¹

16
- 0003473607
- Statistical Pattern Recognition
- 2nd edn, Academic Press, San Diego, CA, USA
- Fukunaga, K. (1990). Statistical Pattern Recognition, 2nd edn, Academic Press, San Diego, CA, USA.
- (1990)
- Fukunaga, K.¹

17
- 84889332827
- Usability of a Telephone-Based Speech Dialogue System as Experienced by User Groups of Different Age and Background
- Proceedings of 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany
- Hempel, T. (2006). Usability of a Telephone-Based Speech Dialogue System as Experienced by User Groups of Different Age and Background, Proceedings of 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, Berlin, Germany.
- (2006)
- Hempel, T.¹

18
- 0004249294
- SpeechDat Multilingual Speech Databases for Teleservices: Across the Finish Line
- Proceedings of Eurospeech 1999, ISCA, Budapest, Hungary
- Höge, H.; Draxler, C.; van den Heuvel, H.; Johansen, F. T.; Sanders, E.; Tropf, H. S. (1999). SpeechDat Multilingual Speech Databases for Teleservices: Across the Finish Line, Proceedings of Eurospeech 1999, ISCA, Budapest, Hungary. http://www.speechdat.org/.
- (1999)
- Höge, H.¹ Draxler, C.² Van Den Heuvel, H.³ Johansen, F.T.⁴ Sanders, E.⁵ Tropf, H.S.⁶

19
- 0003786003
- Statistical Methods for Speech Recognition
- MIT Press, Boston
- Jelinek, F. (1998). Statistical Methods for Speech Recognition, MIT Press, Boston.
- (1998)
- Jelinek, F.¹

20
- 0002658916
- Articulatory Reduction in Emotional Speech
- Proceedings of Eurospeech 99 Budapest
- Kienast, M.; Paeschke, A.; Sendlmeier, W. F. (1999). Articulatory Reduction in Emotional Speech, Proceedings of Eurospeech 99 Budapest, pp. 117-120.
- (1999) , pp. 117-120
- Kienast, M.¹ Paeschke, A.² Sendlmeier, W.F.³

21
- 84889349060
- Ein Mehrkanalverfahren zur Berechnung der Grundfrequenzkontur unter Einsatz der dynamischen Programmierung
- Master's thesis, Universität Erlangen-Nürnberg
- Kompe, R. (1989). Ein Mehrkanalverfahren zur Berechnung der Grundfrequenzkontur unter Einsatz der dynamischen Programmierung, Master's thesis, Universität Erlangen-Nürnberg.
- (1989)
- Kompe, R.¹

22
- 14644439843
- Toward Detecting Emotions in Spoken Dialogs
- Lee, C. M.; Narayanan, S. S. (2005). Toward Detecting Emotions in Spoken Dialogs, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 2, pp. 293-303.
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.2 , pp. 293-303
- Lee, C.M.¹ Narayanan, S.S.²

23
- 84889399491
- Vocal Aging
- Singular Publishing Group, San Diego, CA, USA
- Linville, S. E. (2001). Vocal Aging, Singular Publishing Group, San Diego, CA, USA.
- (2001)
- Linville, S.E.¹

24
- 84889451757
- Using Context to Improve Emotion Detection in Spoken Dialog Systems
- Proceedings of Interspeech, Lisbon, Portugal
- Liscombe, J.; Riccardi, G.; Hakkani-Tür, D. (2005). Using Context to Improve Emotion Detection in Spoken Dialog Systems, Proceedings of Interspeech, Lisbon, Portugal.
- (2005)
- Liscombe, J.¹ Riccardi, G.² Hakkani-Tür, D.³

25
- 34547542381
- Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications
- Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii
- Metze, F.; Ajmera, J.; Englert, R.; Bub, U.; Burkhardt, F.; Stegmann, J.; Müller, C.; Huber, R.; Andrassy, B.; Bauer, J. G.; Littel, B. (2007). Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawaii.
- (2007)
- Metze, F.¹ Ajmera, J.² Englert, R.³ Bub, U.⁴ Burkhardt, F.⁵ Stegmann, J.⁶ Müller, C.⁷ Huber, R.⁸ Andrassy, B.⁹ Bauer, J.G.¹⁰ Littel, B.¹¹

26
- 0036299156
- Automatic Estimation of One's Age with His/Her Speech Based Upon Acoustic Modeling Techniques of Speakers
- Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA
- Minematsu, N.; Sekiguchi, M.; Hirose, K. (2002). Automatic Estimation of One's Age with His/Her Speech Based Upon Acoustic Modeling Techniques of Speakers, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA.
- (2002)
- Minematsu, N.¹ Sekiguchi, M.² Hirose, K.³

27
- 36249021170
- Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht
- PhD thesis, Computer Science Institute; Universität des Saarlandes; Germany
- Müller, C. (2005). Zweistufige kontextsensitive Sprecherklassifikation am Beispiel von Alter und Geschlecht, PhD thesis, Computer Science Institute; Universität des Saarlandes; Germany.
- (2005)
- Müller, C.¹

28
- 79958831606
- Speaker Classification
- Springer, New York-Berlin
- Müller, C.; Schötz, S. (eds.) (2007). Speaker Classification, Springer, New York-Berlin.
- (2007)
- Müller, C.¹ Schötz, S.²

29
- 84994747210
- Exploiting Speech for Recognizing Elderly Users to Respond to their Special Needs
- Proceedings of Interspeech (Eurospeech), ISCA, Geneva, Switzerland
- Müller, C.; Wittig, F.; Baus, J. (2003). Exploiting Speech for Recognizing Elderly Users to Respond to their Special Needs, Proceedings of Interspeech (Eurospeech), ISCA, Geneva, Switzerland.
- (2003)
- Müller, C.¹ Wittig, F.² Baus, J.³

30
- 0001937343
- Pitch Duration Characteristics of Older Males
- Mysak, E. D. (1959). Pitch Duration Characteristics of Older Males, Journal of Speech and Hearing Research, vol. 2, pp. 46-54.
- (1959) Journal of Speech and Hearing Research , vol.2 , pp. 46-54
- Mysak, E.D.¹

31
- 0033329296
- Emotion in Speech: Recognition and Application to Call Centers
- Conference on Artificial Neural Networks In Engineering (ANNIE), St Louis, USA
- Petrushin, V. (1999). Emotion in Speech: Recognition and Application to Call Centers, Conference on Artificial Neural Networks In Engineering (ANNIE), St Louis, USA.
- (1999)
- Petrushin, V.¹

32
- 0003425258
- Digital Processing of Speech Signals
- Prentice-Hall
- Rabiner, L. R. (1978). Digital Processing of Speech Signals, Prentice-Hall.
- (1978)
- Rabiner, L.R.¹

33
- 0024610919
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286.
- (1989) Proceedings of the IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.R.¹

34
- 0021404166
- Mixture Densities, Maximum Likelihood and the EM Algorithm
- Redner, R. A.; Walker, H. F. (1984). Mixture Densities, Maximum Likelihood and the EM Algorithm, SIAM Review, vol. 26, no. 2, pp. 195-239.
- (1984) SIAM Review , vol.26 , Issue.2 , pp. 195-239
- Redner, R.A.¹ Walker, H.F.²

35
- 0036293830
- An Overview of Automatic Speaker Recognition Technology
- Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA
- Reynolds, D. A. (2002). An Overview of Automatic Speaker Recognition Technology, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA, vol. 4, pp. 4072-4075.
- (2002) , vol.4 , pp. 4072-4075
- Reynolds, D.A.¹

36
- 84889349849
- Beyond Cepstra: Exploiting High-Level Information in Speaker Recognition
- Workshop on Multimodal User Authentication
- Reynolds, D.; Campbell, J.; Campbell, B.; Dunn, B.; Gleason, T.; Jones, D.; Quatieri, T.; Quillen, C.; Sturim, D.; Torres-Carrasquillo, P. (2003). Beyond Cepstra: Exploiting High-Level Information in Speaker Recognition, Workshop on Multimodal User Authentication.
- (2003)
- Reynolds, D.¹ Campbell, J.² Campbell, B.³ Dunn, B.⁴ Gleason, T.⁵ Jones, D.⁶ Quatieri, T.⁷ Quillen, C.⁸ Sturim, D.⁹ Torres-Carrasquillo, P.¹⁰

37
- 0003408420
- Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning)
- MIT Press, Cambridge, MA, USA
- Schölkopf, B.; Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning), MIT Press, Cambridge, MA, USA.
- (2002)
- Schölkopf, B.¹ Smola, A.²

38
- 67349139346
- Automatic Prediction of Speaker Age Using CART
- Term paper for course in Forensic Phonetics, Göteborg University
- Schötz, S. (2004). Automatic Prediction of Speaker Age Using CART, http://www.ling.lu.se/persons/Suzi/downloads/RF paper SusanneS2004.pdf. Term paper for course in Forensic Phonetics, Göteborg University.
- (2004)
- Schötz, S.¹

39
- 85013700737
- Multilingual Speech Processing
- Academic Press
- Schultz, T.; Kirchhoff, K. (eds.) (2006). Multilingual Speech Processing, Academic Press.
- (2006)
- Schultz, T.¹ Kirchhoff, K.²

40
- 84946723550
- Voice Signatures
- Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, U.S. Virgin Islands
- Shafran, I.; Riley, M.; Mohri, M. (2003). Voice Signatures, Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, U.S. Virgin Islands.
- (2003)
- Shafran, I.¹ Riley, M.² Mohri, M.³

41
- 33646774273
- Of all Things the Measure is Man: Classification of Emotions and Inter-Labeler Consistency
- Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Steidl, S.; Levit, M.; Batliner, A.; Nöth, E.; Niemann, H. (2005). Of all Things the Measure is Man: Classification of Emotions and Inter-Labeler Consistency, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
- (2005)
- Steidl, S.¹ Levit, M.² Batliner, A.³ Nöth, E.⁴ Niemann, H.⁵

42
- 29044438709
- Forensic Aspects of Speech Patterns: Voice Prints, Speaker Profiling, Lie and Intoxication Detection
- Lawyers & Judges Publishing Company
- Tanner, D. C.; Tanner, M. E. (2004). Forensic Aspects of Speech Patterns: Voice Prints, Speaker Profiling, Lie and Intoxication Detection, Lawyers & Judges Publishing Company.
- (2004)
- Tanner, D.C.¹ Tanner, M.E.²

43
- 0004217877
- Information Retrieval
- Butterworths, London
- van Rijsbergen, C. J. (1979). Information Retrieval, Butterworths, London.
- (1979)
- Van Rijsbergen, C.J.¹

44
- 30344452293
- Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System
- Walker, M.; Langkilde-Geary, I.; Wright, H.; Wright, J.; Gorin, A. (2002). Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System, Journal of Artificial Intelligence Research, vol. 16, pp. 293-319.
- (2002) Journal of Artificial Intelligence Research , vol.16 , pp. 293-319
- Walker, M.¹ Langkilde-Geary, I.² Wright, H.³ Wright, J.⁴ Gorin, A.⁵

45
- 84947280249
- Recognition of Emotions in Interactive Voice Response Systems
- Proceedings of Interspeech (Eurospeech), Geneva, Switzerland
- Yacoub, S.; Simske, S.; Lin, X.; Burns, J. (2003). Recognition of Emotions in Interactive Voice Response Systems, Proceedings of Interspeech (Eurospeech), Geneva, Switzerland.
- (2003)
- Yacoub, S.¹ Simske, S.² Lin, X.³ Burns, J.⁴

46
- 0031624532
- Speech Recognition with Dynamic Bayesian Networks
- Proceedings of Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, WI
- Zweig, G.; Russell, S. (1998). Speech Recognition with Dynamic Bayesian Networks, Proceedings of Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, WI.
- (1998)
- Zweig, G.¹ Russell, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.