-
2
-
-
85009259855
-
-
ICSLP, Denver, CO, USA
-
Angkititrakul P, Hansen J L 2002 Stochastic trajectory model analysis for accent classification. ICSLP, Denver, CO, USA, pp. 493-496.
-
(2002)
Stochastic Trajectory Model Analysis For Accent Classification
, pp. 493-496
-
-
Angkititrakul, P.1
Hansen, J.L.2
-
4
-
-
84856275800
-
-
MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
-
Anjani A V N S 2000 Autoassociate neural network models for processing degraded speech. MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
-
(2000)
Autoassociate Neural Network Models For Processing Degraded Speech
-
-
Anjani, A.V.N.S.1
-
5
-
-
0030165438
-
Language accent classification in American English
-
Arslan L, Hansen J 1996 Language accent classification in American English. Speech Commun. 18(4): 353-367.
-
(1996)
Speech Commun.
, vol.18
, Issue.4
, pp. 353-367
-
-
Arslan, L.1
Hansen, J.2
-
6
-
-
0030757418
-
A study of temporal features and frequency characteristics in American English foreign accent
-
Arslan L, Hansen J 1997 A study of temporal features and frequency characteristics in American English foreign accent. J. Acoust. Soc. Am. 102: 28-40.
-
(1997)
J. Acoust. Soc. Am.
, vol.102
, pp. 28-40
-
-
Arslan, L.1
Hansen, J.2
-
7
-
-
84856266997
-
-
Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France
-
Barbosa P A, Bailly G 1992 Generating segmental duration by p-centers. Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France, pp. 163-168.
-
(1992)
Generating Segmental Duration By P-centers
, pp. 163-168
-
-
Barbosa, P.A.1
Bailly, G.2
-
8
-
-
0028531866
-
Characterization of rhythmic patterns for text-to-speech synthesis
-
Barbosa P A, Bailly G 1994 Characterization of rhythmic patterns for text-to-speech synthesis. Speech Commun. 15: 127-137.
-
(1994)
Speech Commun.
, vol.15
, pp. 127-137
-
-
Barbosa, P.A.1
Bailly, G.2
-
9
-
-
84994310262
-
-
Proc. Eurospeech, Scandinavia
-
Batliner A, Mobius B, Mohler G, Schweitzer A, Noth E 2001 Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground. Proc. Eurospeech, Scandinavia.
-
(2001)
Prosodic Models, Automatic Speech Understanding, and Speech Synthesis: Towards the Common Ground
-
-
Batliner, A.1
Mobius, B.2
Mohler, G.3
Schweitzer, A.4
Noth, E.5
-
10
-
-
67650565075
-
-
J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Berlin, Germany: Springer Publishers
-
Benesty J, Sondhi M M, Huang Y (eds) 2008 Springer handbook on speech processing, (Berlin, Germany: Springer Publishers).
-
(2008)
Springer Handbook on Speech Processing
-
-
-
11
-
-
0003571407
-
The festival speech synthesis system: System documentation
-
University of Edinburgh, 1.4.0 edition
-
Black A W, Taylor P, Caley R 2000 The festival speech synthesis system: System documentation. The Centre for Speech Technology Research (CSTR), University of Edinburgh, 1.4.0 edition. http://www.cstr.ed.ac.uk/projects/festival/manual/festival_toc.html.
-
(2000)
The Centre For Speech Technology Research (CSTR)
-
-
Black, A.W.1
Taylor, P.2
Caley, R.3
-
15
-
-
85009062747
-
-
Proc. Int. Conf. Spoken Language Processing, Beijing, China
-
Buhmann J, Vereecken H, Fackrell J, Martens J P, Coile B V 2000 Data driven intonation modeling of 6 languages. Proc. Int. Conf. Spoken Language Processing, vol. 3, Beijing, China, pp. 179-183.
-
(2000)
Data Driven Intonation Modeling of 6 Languages
, vol.3
, pp. 179-183
-
-
Buhmann, J.1
Vereecken, H.2
Fackrell, J.3
Martens, J.P.4
Coile, B.V.5
-
16
-
-
14944351245
-
-
In ACM 6th Int. Conf. on Multimodal Interfaces (ICMI 2004), ACM, State College, PA
-
Busso C, Deng Z, Yildirim S, Bulut M, Lee C M, Kazemzadeh A, Lee S, Neumann U, Narayanan S 2004 Analysis of emotion recognition using facial expressions, speech and multimodal information. In ACM 6th Int. Conf. on Multimodal Interfaces (ICMI 2004), ACM, State College, PA.
-
(2004)
Analysis of Emotion Recognition Using Facial Expressions, Speech and Multimodal Information
-
-
Busso, C.1
Deng, Z.2
Yildirim, S.3
Bulut, M.4
Lee, C.M.5
Kazemzadeh, A.6
Lee, S.7
Neumann, U.8
Narayanan, S.9
-
17
-
-
65249116503
-
Analysis of emotionally salient aspects of fundamental frequency for emotion detection
-
Busso C, Lee S, Narayanan S 2009 Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans. Speech Audio Process. 17: 582-596.
-
(2009)
IEEE Trans. Speech Audio Process.
, vol.17
, pp. 582-596
-
-
Busso, C.1
Lee, S.2
Narayanan, S.3
-
18
-
-
0025387541
-
Analog i/o nets for syllable timing
-
Campbell W N 1990 Analog i/o nets for syllable timing. Speech Commun. 9(1): 57-61.
-
(1990)
Speech Commun.
, vol.9
, Issue.1
, pp. 57-61
-
-
Campbell, W.N.1
-
19
-
-
84856255811
-
-
In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam
-
Campbell W N 1992 Syllable based segment duration. In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam, pp. 211-224.
-
(1992)
Syllable Based Segment Duration
, pp. 211-224
-
-
Campbell, W.N.1
-
24
-
-
33750687350
-
Perceptual evaluation of duration models in spoken Korean
-
Chung H 2002b Perceptual evaluation of duration models in spoken Korean. Korean J. Speech Sci. 9: 207-215.
-
(2002)
Korean J. Speech Sci.
, vol.9
, pp. 207-215
-
-
Chung, H.1
-
25
-
-
84856275798
-
-
In Proc. European Conf. Speech Communication and Technology, Budapest, Hungary
-
Cordoba R, Vallejo J A, Montero J M, Gutierrezarriola J, Lopez M A, Pardo J M 1999 Automatic modeling of duration in a Spanish text-to-speech system using neural networks. In Proc. European Conf. Speech Communication and Technology, Budapest, Hungary.
-
Automatic Modeling of Duration In a Spanish Text-to-speech System Using Neural Networks
-
-
Cordoba, R.1
Vallejo, J.A.2
Montero, J.M.3
Gutierrezarriola, J.4
Lopez, M.A.5
Pardo, J.M.6
-
26
-
-
84869597582
-
-
In Proc. Eurospeech 2001, Aalborg, Denmark
-
Cosi P, Tesser F, Gretter R 2001 Festival speaks Italian. In Proc. Eurospeech 2001, Aalborg, Denmark, pp. 509-512.
-
(2001)
Festival Speaks Italian
, pp. 509-512
-
-
Cosi, P.1
Tesser, F.2
Gretter, R.3
-
27
-
-
0019053271
-
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
-
Davis S, Mermelstein P 1980 Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Speech Audio Process. 28(4): 357-366.
-
(1980)
IEEE Trans. Speech Audio Process.
, vol.28
, Issue.4
, pp. 357-366
-
-
Davis, S.1
Mermelstein, P.2
-
28
-
-
0030353343
-
-
In International Conference on Spoken Language Processing (ICSLP) 96, Philadelphia, PA, USA
-
Dellaert F, Polzin T, Waibel A 1996 Recognising emotions in speech. In International Conference on Spoken Language Processing (ICSLP) 96, vol. 3, Philadelphia, PA, USA, pp. 1816-1819.
-
(1996)
Recognising Emotions In Speech
, vol.3
, pp. 1816-1819
-
-
Dellaert, F.1
Polzin, T.2
Waibel, A.3
-
30
-
-
77953707533
-
Spectral mapping using artificial neural networks for voice conversion
-
Desai S, Black A W, Yegnanarayana B, Prahlad K 2010 Spectral mapping using artificial neural networks for voice conversion. IEEE Trans. Audio Speech Lang. Process. 18(5): 954-964.
-
(2010)
IEEE Trans. Audio Speech Lang. Process.
, vol.18
, Issue.5
, pp. 954-964
-
-
Desai, S.1
Black, A.W.2
Yegnanarayana, B.3
Prahlad, K.4
-
33
-
-
85032751766
-
Emotion recognition in human computer interaction
-
Mag., Stockholm, Sweden, 23-25 April 2001
-
Douglas-Cowie R, Tsapatsoulis E, Votsis N, Kollias G, Fellenz S, Fellinge W, Taylor J 2001 Emotion recognition in human computer interaction. IEEE Signal Process. Mag., Stockholm, Sweden, 23-25 April 2001.
-
(2001)
IEEE Signal Process. Mag.
-
-
Douglas-Cowie, R.1
Tsapatsoulis, E.2
Votsis, N.3
Kollias, G.4
Fellenz, S.5
Fellinge, W.6
Taylor, J.7
-
35
-
-
0002944642
-
Dynamic characteristics of voice fundamental frequency in speech and singing
-
P. F. MacNeilage (Ed.), New York, USA: Springer-Verlag
-
Fujisaki H 1983 Dynamic characteristics of voice fundamental frequency in speech and singing. In (ed) P F MacNeilage, The production of speech, New York, USA: Springer-Verlag, pp. 39-55.
-
(1983)
The Production of Speech
, pp. 39-55
-
-
Fujisaki, H.1
-
36
-
-
0001810979
-
-
In (ed) O Fujimura, Vocal physiology: Voice production, mechanisms and functions, New York, USA: Raven Press
-
Fujisaki H 1988 A note on the physiological and physical basis for the phrase and accent components in the voice fundamental frequency contour. In (ed) O Fujimura, Vocal physiology: Voice production, mechanisms and functions, New York, USA: Raven Press, pp. 347-355.
-
(1988)
A Note On the Physiological and Physical Basis For the Phrase and Accent Components In the Voice Fundamental Frequency Contour
, pp. 347-355
-
-
Fujisaki, H.1
-
37
-
-
1942535983
-
-
In Proc. IEEE Int. Conf. Intelligent Sensing and Information Processing, Chennai, India
-
Gangashetty S V, Sekhar C C, Yegnanarayana B 2004 Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances. In Proc. IEEE Int. Conf. Intelligent Sensing and Information Processing, Chennai, India, pp. 159-164.
-
(2004)
Extraction of Fixed Dimension Patterns From Varying Duration Segments of Consonant-vowel Utterances
, pp. 159-164
-
-
Gangashetty, S.V.1
Sekhar, C.C.2
Yegnanarayana, B.3
-
39
-
-
40249100308
-
Bayesian networks for phone duration prediction
-
Goubanova O, King S 2008 Bayesian networks for phone duration prediction. Speech Commun. 50: 301-311.
-
(2008)
Speech Commun.
, vol.50
, pp. 301-311
-
-
Goubanova, O.1
King, S.2
-
40
-
-
85009107944
-
-
Proc. Int. Conf. Spoken Language Processing, Beijing, China
-
Goubanova O, Taylor P 2000 Using bayesian belief networks for modeling duration in text-to-speech systems. In Proc. Int. Conf. Spoken Language Processing, vol. 2, Beijing, China, pp. 427-431.
-
(2000)
Using Bayesian Belief Networks For Modeling Duration In Text-to-speech Systems
, vol.2
, pp. 427-431
-
-
Goubanova, O.1
Taylor, P.2
-
42
-
-
78049394179
-
Automatic, dimensional and continuous emotion recognition
-
Gunes H, Pantic M 2010 Automatic, dimensional and continuous emotion recognition. Int. J. Synthetic Emotions 1(1): 68-99.
-
(2010)
Int. J. Synthetic Emotions
, vol.1
, Issue.1
, pp. 68-99
-
-
Gunes, H.1
Pantic, M.2
-
43
-
-
84856240372
-
-
MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
-
Gupta C S 2003 Significance of source features for speaker recognition. MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
-
(2003)
Significance of Source Features For Speaker Recognition
-
-
Gupta, C.S.1
-
45
-
-
77952314450
-
-
Proc. Int. Conf. Spoken Language Processing, Denver, Colorado, USA
-
Hifny Y, Rashwan M 2002 Duration modeling of Arabic text-to-speech synthesis. In Proc. Int. Conf. Spoken Language Processing, Denver, Colorado, USA, pp. 1773-1776.
-
(2002)
Duration Modeling of Arabic Text-to-speech Synthesis
, pp. 1773-1776
-
-
Hifny, Y.1
Rashwan, M.2
-
47
-
-
0028712434
-
Neural-network-based F0 text-to-speech synthesizer for Mandarin
-
Hwang S H, Chen S H 1994 Neural-network-based F0 text-to-speech synthesizer for Mandarin. IEE Proc. Image Signal Process. 141(6): 384-390.
-
(1994)
IEE Proc. Image Signal Process.
, vol.141
, Issue.6
, pp. 384-390
-
-
Hwang, S.H.1
Chen, S.H.2
-
48
-
-
0028996978
-
A prosodic model for mandarin speech and its application to pitch level generation for text-to-speech
-
Hwang S-H, Chen S-H 1995 A prosodic model for mandarin speech and its application to pitch level generation for text-to-speech. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 616-619.
-
(1995)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
, vol.1
, pp. 616-619
-
-
Hwang, S.-H.1
Chen, S.-H.2
-
50
-
-
77950073346
-
Spoken emotion recognition through optimum-path forest classification using glottal features
-
Iliev A I, Scordilis M S, Papa J P, Falcão A X 2010 Spoken emotion recognition through optimum-path forest classification using glottal features. Comput. Speech Lang. 24(3): 445-460.
-
(2010)
Comput. Speech Lang.
, vol.24
, Issue.3
, pp. 445-460
-
-
Iliev, A.I.1
Scordilis, M.S.2
Papa, J.P.3
Falcão, A.X.4
-
52
-
-
85095823619
-
A method of classification among Japanese dialects
-
Itahashi S, Tanaka K 1993 A method of classification among Japanese dialects. Proc. Eurospeech 1: 639-642.
-
(1993)
Proc. Eurospeech
, vol.1
, pp. 639-642
-
-
Itahashi, S.1
Tanaka, K.2
-
53
-
-
0016467604
-
Minimum prediction residual principle applied to speech recognition
-
Itakura F 1975 Minimum prediction residual principle applied to speech recognition. IEEE Trans. Speech Audio Process. 23(1): 67-72.
-
(1975)
IEEE Trans. Speech Audio Process.
, vol.23
, Issue.1
, pp. 67-72
-
-
Itakura, F.1
-
54
-
-
4444285698
-
-
PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA
-
Kain A 2001 High resolution voice transformation. PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA.
-
(2001)
High Resolution Voice Transformation
-
-
Kain, A.1
-
55
-
-
0034841948
-
-
In Proc. IEEE Int. Conf. Acoust. Speech Signal Process, Salt Lake City, UT, USA
-
Kain A, Macon M W 2001 Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2, Salt Lake City, UT, USA, pp. 813-816.
-
(2001)
Design and Evaluation of a Voice Conversion Algorithm Based On Spectral Envelope Mapping and Residual Prediction
, vol.2
, pp. 813-816
-
-
Kain, A.1
Macon, M.W.2
-
56
-
-
84856275255
-
-
Int. Conf. Natural Language Processing, Mysore, India
-
Khan A N, Gangashetty S V, Yegnanarayana B 2003 Syllabic properties of three Indian languages: Implications for speech recognition and language identification. In Int. Conf. Natural Language Processing, Mysore, India, pp. 125-134.
-
(2003)
Syllabic Properties of Three Indian Languages: Implications For Speech Recognition and Language Identification
, pp. 125-134
-
-
Khan, A.N.1
Gangashetty, S.V.2
Yegnanarayana, B.3
-
57
-
-
84856266801
-
-
In Int. Conf. Natural Language Processing, Mumbai, India
-
Kishore S P, Kumar R, Sangal R 2002 A data-driven synthesis approach for indian languages using syllable as basic unit. In Int. Conf. Natural Language Processing, Mumbai, India, pp. 311-316.
-
(2002)
A Data-driven Synthesis Approach For Indian Languages Using Syllable As Basic Unit
, pp. 311-316
-
-
Kishore, S.P.1
Kumar, R.2
Sangal, R.3
-
60
-
-
84856280329
-
-
Int. Conf. Natural Language Processing, Mumbai, 18-21 December 2002
-
Krishna N S, Murthy H A, Gonsalves T A 2002 Text-to-speech (tts) in Indian languages. In Int. Conf. Natural Language Processing, Mumbai, pp. 317-326, 18-21 December 2002.
-
(2002)
Text-to-speech (tts) In Indian Languages
, pp. 317-326
-
-
Krishna, N.S.1
Murthy, H.A.2
Gonsalves, T.A.3
-
61
-
-
85009064374
-
-
In Proc. Int. Conf. Spoken Language Processing, Denver, USA
-
Krishna N, Tulukdar P, Bali K, Ramakrishnan A 2004 Duration modeling for Hindi text-to-speech synthesis system. In Proc. Int. Conf. Spoken Language Processing, Denver, USA.
-
(2004)
Duration Modeling For Hindi Text-to-speech Synthesis System
-
-
Krishna, N.1
Tulukdar, P.2
Bali, K.3
Ramakrishnan, A.4
-
62
-
-
85009223246
-
-
In Eurospeech, Geneva
-
Kwon O, Chan K, Hao J, Lee T 2003 Emotion recognition by speech signals. In Eurospeech, Geneva, pp. 125-128.
-
(2003)
Emotion Recognition By Speech Signals
, pp. 125-128
-
-
Kwon, O.1
Chan, K.2
Hao, J.3
Lee, T.4
-
63
-
-
14644439843
-
Toward detecting emotions in spoken dialogs
-
Lee C M, Narayanan S 2005a Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2): 293-303.
-
(2005)
IEEE Trans. Speech Audio Process.
, vol.13
, Issue.2
, pp. 293-303
-
-
Lee, C.M.1
Narayanan, S.2
-
65
-
-
38149065136
-
Statistical approach for voice personality transformation
-
Lee K 2007 Statistical approach for voice personality transformation. IEEE Trans. Audio Speech Lang. Process. 15: 641-651.
-
(2007)
IEEE Trans. Audio Speech Lang. Process.
, vol.15
, pp. 641-651
-
-
Lee, K.1
-
67
-
-
84867217083
-
-
Interspeech, vol. 1, Brisbane, Australia, September 2008
-
Leung C-C, Ferras M, Barras C, Gauvain J-L 2008 Comparing prosodic models for speaker recognition. In Interspeech, vol. 1, Brisbane, Australia, pp. 1945-1948, September 2008.
-
(2008)
Comparing Prosodic Models For Speaker Recognition
, pp. 1945-1948
-
-
Leung, C.-C.1
Ferras, M.2
Barras, C.3
Gauvain, J.-L.4
-
68
-
-
51449108623
-
Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters
-
Lugger M, Yang B 2008 Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 4945-4948.
-
(2008)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
, vol.4
, pp. 4945-4948
-
-
Lugger, M.1
Yang, B.2
-
69
-
-
52949094265
-
Extraction and representation of prosodic features for language and speaker recognition
-
Mary L, Yegnanarayana B 2008 Extraction and representation of prosodic features for language and speaker recognition. Speech Commun. 50(10): 782-796.
-
(2008)
Speech Commun.
, vol.50
, Issue.10
, pp. 782-796
-
-
Mary, L.1
Yegnanarayana, B.2
-
70
-
-
84856289467
-
-
In ISCA Workshop on Speech and Emotion, Belfast
-
McGilloway S, Cowie R, Douglas-Cowie E, Gielen S, Westerdijk M, Stroeve S 2000 Approaching automatic recognition of emotion from voice: A rough benchmark. In ISCA Workshop on Speech and Emotion, Belfast.
-
(2000)
Approaching Automatic Recognition of Emotion From Voice: A Rough Benchmark
-
-
McGilloway, S.1
Cowie, R.2
Douglas-Cowie, E.3
Gielen, S.4
Westerdijk, M.5
Stroeve, S.6
-
71
-
-
84874424790
-
An efficient method to compute lsfs from lpc coefficients
-
Mei X, Sun S 2000 An efficient method to compute lsfs from lpc coefficients. In ICSP-2000, pp. 655-658.
-
(2000)
ICSP-2000
, pp. 655-658
-
-
Mei, X.1
Sun, S.2
-
72
-
-
85009154226
-
-
In Proc. European Conf. Speech Communication and Technology, Aalborg, Denmark
-
Mixdorff H, Jokisch O 2001 Building an integrated prosodic model of German. In Proc. European Conf. Speech Communication and Technology, vol. 2, Aalborg, Denmark, pp. 947-950.
-
(2001)
Building An Integrated Prosodic Model of German
, vol.2
, pp. 947-950
-
-
Mixdorff, H.1
Jokisch, O.2
-
73
-
-
0000668614
-
Robustness of group-delay-based method for extraction of significant excitation from speech signals
-
Murthy P S, Yegnanarayana B 1999 Robustness of group-delay-based method for extraction of significant excitation from speech signals. IEEE Trans. Speech Audio Process. 7: 609-619.
-
(1999)
IEEE Trans. Speech Audio Process.
, vol.7
, pp. 609-619
-
-
Murthy, P.S.1
Yegnanarayana, B.2
-
74
-
-
0029254176
-
Transformation of formants for voice conversion using artificial neural networks
-
Narendranadh M, Murthy H A, Rajendran S, Yegnanarayana B 1995 Transformation of formants for voice conversion using artificial neural networks. Speech Commun. 16(2): 206-216.
-
(1995)
Speech Commun.
, vol.16
, Issue.2
, pp. 206-216
-
-
Narendranadh, M.1
Murthy, H.A.2
Rajendran, S.3
Yegnanarayana, B.4
-
76
-
-
0242721417
-
Speech emotion recognition using hidden Markov models
-
Nwe T L, Foo S W, Silva L C D 2003 Speech emotion recognition using hidden Markov models. Speech Commun. 41(4): 603-623.
-
(2003)
Speech Commun.
, vol.41
, Issue.4
, pp. 603-623
-
-
Nwe, T.L.1
Foo, S.W.2
Silva, L.C.D.3
-
79
-
-
57049108119
-
-
(eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education)
-
Pantic M, Bartlett M 2007 Machine analysis of facial expressions. In (eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education) pp. 377-416.
-
(2007)
Machine Analysis of Facial Expressions
, pp. 377-416
-
-
Pantic, M.1
Bartlett, M.2
-
80
-
-
2942590310
-
Toward an affect-sensitive multimodal human-computer interaction
-
Pantic M, Rothkrantz L J M 2003 Toward an affect-sensitive multimodal human-computer interaction. Proc. IEEE 91: 1370-1390.
-
(2003)
Proc. IEEE
, vol.91
, pp. 1370-1390
-
-
Pantic, M.1
Rothkrantz, L.J.M.2
-
83
-
-
33745205178
-
-
PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
-
Prasanna S R M 2004 Event-based analysis of speech. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India.
-
(2004)
Event-based Analysis of Speech
-
-
Prasanna, S.R.M.1
-
86
-
-
85143189780
-
-
Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Orlando, Florida, USA
-
Prasanna S R M, Zachariah J M 2002 Detection of vowel onset point in speech. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 3, Orlando, Florida, USA.
-
(2002)
Detection of Vowel Onset Point In Speech
, vol.3
-
-
Prasanna, S.R.M.1
Zachariah, J.M.2
-
88
-
-
79953168002
-
Application of prosody models for developing speech systems in Indian languages
-
Rao K S 2011 Application of prosody models for developing speech systems in Indian languages. Int. J. Speech Technol. 14: 19-33.
-
(2011)
Int. J. Speech Technol.
, vol.14
, pp. 19-33
-
-
Rao, K.S.1
-
89
-
-
54049142844
-
Voice conversion by mapping the speaker-specific features using pitch synchronous approach
-
Rao K S 2009 Voice conversion by mapping the speaker-specific features using pitch synchronous approach. Comput. Speech Lang. 23(2): 240-256.
-
(2009)
Comput. Speech Lang.
, vol.23
, Issue.2
, pp. 240-256
-
-
Rao, K.S.1
-
91
-
-
84870181682
-
-
In WMSCI-2010, Orlando, Florida, USA
-
Rao K S, Nandy S, Koolagudi S G 2010 Identification of Hindi dialects using speech. In WMSCI-2010, Orlando, Florida, USA.
-
(2010)
Identification of Hindi Dialects Using Speech
-
-
Rao, K.S.1
Nandy, S.2
Koolagudi, S.G.3
-
92
-
-
84856289465
-
-
In Speech Prosody, Chicago, USA
-
Rao K S, Reddy R, Maity S, Koolagudi S G 2010 Characterization of emotions using dynamics of prosodic features. In Speech Prosody, Chicago, USA.
-
(2010)
Characterization of Emotions Using Dynamics of Prosodic Features
-
-
Rao, K.S.1
Reddy, R.2
Maity, S.3
Koolagudi, S.G.4
-
93
-
-
4544252352
-
-
Proc. IEEE Int. Conf. Multimedia and Expo, Baltimore, Maryland, USA
-
Rao K S, Yegnanarayana B 2003 Prosodic manipulation using instants of significant excitation. In Proc. IEEE Int. Conf. Multimedia and Expo, Baltimore, Maryland, USA, pp. 389-392.
-
(2003)
Prosodic Manipulation Using Instants of Significant Excitation
, pp. 389-392
-
-
Rao, K.S.1
Yegnanarayana, B.2
-
96
-
-
33750713338
-
Modeling durations of syllables using neural networks
-
Rao K S, Yegnanarayana B 2007 Modeling durations of syllables using neural networks. Comput. Speech Lang. 21: 282-295.
-
(2007)
Comput. Speech Lang.
, vol.21
, pp. 282-295
-
-
Rao, K.S.1
Yegnanarayana, B.2
-
98
-
-
0002069313
-
-
In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam
-
Riley M 1992 Tree-based modeling of segmental durations. In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam, pp. 265-273.
-
(1992)
Tree-based Modeling of Segmental Durations
, pp. 265-273
-
-
Riley, M.1
-
99
-
-
0028405296
-
Assignment of segment duration in text-to-speech synthesis
-
Santen J P H V 1994 Assignment of segment duration in text-to-speech synthesis. Comput. Speech Lang. 8: 95-128.
-
(1994)
Comput. Speech Lang.
, vol.8
, pp. 95-128
-
-
Santen, J.P.H.V.1
-
103
-
-
85009089741
-
-
In 7th European Conference on Speech Communication and Technology. Eurospeech 2001 Scandinavia, 2nd Interspeech Event, Aalborg, Denmark
-
Schroder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen S 2001 Acoustic correlates of emotion dimensions in view of speech synthesis. In 7th European Conference on Speech Communication and Technology. Eurospeech 2001 Scandinavia, 2nd Interspeech Event, Aalborg, Denmark.
-
(2001)
Acoustic Correlates of Emotion Dimensions In View of Speech Synthesis
-
-
Schroder, M.1
Cowie, R.2
Douglas-Cowie, E.3
Westerdijk, M.4
Gielen, S.5
-
104
-
-
0024876896
-
-
In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Glasgow, Scotland
-
Scordilis M S, Gowdy J N 1989 Neural network based generation of fundamental frequency contours. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1, Glasgow, Scotland, pp. 219-222.
-
(1989)
Neural Network Based Generation of Fundamental Frequency Contours
, vol.1
, pp. 219-222
-
-
Scordilis, M.S.1
Gowdy, J.N.2
-
105
-
-
84856289466
-
-
In Prosody in Speech Recognition and Understanding, ISCA Tutorial and Research Workshop (ITRW), Molly Pitcher Inn, Red Bank, NJ, USA
-
Shriberg E, Stolcke A 2001 Prosody modeling for automatic speech understanding: An overview of recent research at SRI. In Prosody in Speech Recognition and Understanding, ISCA Tutorial and Research Workshop (ITRW), Molly Pitcher Inn, Red Bank, NJ, USA.
-
(2001)
Prosody Modeling For Automatic Speech Understanding: An Overview of Recent Research At SRI
-
-
Shriberg, E.1
Stolcke, A.2
-
107
-
-
0029375490
-
Determination of instants of significant excitation in speech using group delay function
-
Smits R, Yegnanarayana B 1995 Determination of instants of significant excitation in speech using group delay function. IEEE Trans. Speech Audio Process. 3(5): 325-333.
-
(1995)
IEEE Trans. Speech Audio Process.
, vol.3
, Issue.5
, pp. 325-333
-
-
Smits, R.1
Yegnanarayana, B.2
-
108
-
-
0030710662
-
-
Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Munich, Germany
-
Sonntag G P, Portele T, Heuft B 1997 Prosody generation with a neural network: Weighing the importance of input parameters. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Munich, Germany, pp. 931-934.
-
(1997)
Prosody Generation With a Neural Network: Weighing the Importance of Input Parameters
, pp. 931-934
-
-
Sonntag, G.P.1
Portele, T.2
Heuft, B.3
-
109
-
-
0026953356
-
Feedback stabilization using two hidden layer nets
-
Sontag E D 1992 Feedback stabilization using two hidden layer nets. IEEE Trans. Neural Networks 3: 981-990.
-
(1992)
IEEE Trans. Neural Networks
, vol.3
, pp. 981-990
-
-
Sontag, E.D.1
-
110
-
-
37549013057
-
-
Technical report no. 11, Project VOIS, Department of Computer Science and Engineering, Indian Institute of Technology Madras
-
Srikanth S, Kumar S R R, Sundar R, Yegnanarayana B 1989 A text-to-speech conversion system for Indian languages based on waveform concatenation model. Technical report no. 11, Project VOIS, Department of Computer Science and Engineering, Indian Institute of Technology Madras.
-
(1989)
A Text-to-speech Conversion System For Indian Languages Based On Waveform Concatenation Model
-
-
Srikanth, S.1
Kumar, S.R.R.2
Sundar, R.3
Yegnanarayana, B.4
-
111
-
-
85135175982
-
-
In Eurospeech, Madrid, Spain
-
Stylianou Y, Cappe O, Moulines E 1995 Statistical methods for voice quality transformation. In Eurospeech, Madrid, Spain, pp. 447-450.
-
(1995)
Statistical Methods For Voice Quality Transformation
, pp. 447-450
-
-
Stylianou, Y.1
Cappe, O.2
Moulines, E.3
-
116
-
-
84856275794
-
-
In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany
-
Sundermann D, Bonafonte A, Duxans H, Hoege H 2005b Tc-star: Evaluation plan for voice conversion technology. In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany.
-
(2005)
Tc-star: Evaluation Plan For Voice Conversion Technology
-
-
Sundermann, D.1
Bonafonte, A.2
Duxans, H.3
Hoege, H.4
-
117
-
-
0034008810
-
Analysis and synthesis of intonation using the Tilt model
-
Taylor P A 2000 Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107(3): 1697-1714.
-
(2000)
J. Acoust. Soc. Am.
, vol.107
, Issue.3
, pp. 1697-1714
-
-
Taylor, P.A.1
-
119
-
-
84945895231
-
-
In 5th ESCA Speech Synthesis Workshop, Pittsburgh, USA
-
Tesser F, Cosi P, Drioli C, Tisato G 2004 Prosodic data driven modeling of a narrative style in Festival TTS. In 5th ESCA Speech Synthesis Workshop, Pittsburgh, USA, pp. 185-190.
-
(2004)
Prosodic Data Driven Modeling of a Narrative Style In Festival TTS
, pp. 185-190
-
-
Tesser, F.1
Cosi, P.2
Drioli, C.3
Tisato, G.4
-
121
-
-
0034842552
-
Voice conversion algorithm based on gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum
-
Toda T, Saruwatari H, Shikano K 2001 Voice conversion algorithm based on gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2: 841-844.
-
(2001)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
, vol.2
, pp. 841-844
-
-
Toda, T.1
Saruwatari, H.2
Shikano, K.3
-
122
-
-
34250619456
-
-
Proc. Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain
-
Torres-Carrasquillo P A, Gleason T P, Reynolds D A 2004 Dialect identification using gaussian mixture models. In Proc. Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain, pp. 297-300.
-
(2004)
Dialect Identification Using Gaussian Mixture Models
, pp. 297-300
-
-
Torres-Carrasquillo, P.A.1
Gleason, T.P.2
Reynolds, D.A.3
-
123
-
-
84867204678
-
-
Interspeech, Brisbane, Australia, 22-26 September 2008
-
Torres-Carrasquillo P A, Sturim D, Reynolds D A, McCree A 2008 Eigen channel compensation and discriminatively trained gaussian mixture models for dialect and accent recognition. In Interspeech, Brisbane, Australia, 22-26 September 2008.
-
(2008)
Eigen Channel Compensation and Discriminatively Trained Gaussian Mixture Models For Dialect and Accent Recognition
-
-
Torres-Carrasquillo, P.A.1
Sturim, D.2
Reynolds, D.A.3
McCree, A.4
-
124
-
-
84867198266
-
-
Interspeech, Brisbane, Australia
-
Toth A R, Black A W 2008 Incorporating durational modification in voice transformation. In Interspeech, Brisbane, Australia, vol. 2, pp. 1088-1091.
-
(2008)
Incorporating Durational Modification In Voice Transformation
, vol.2
, pp. 1088-1091
-
-
Toth, A.R.1
Black, A.W.2
-
125
-
-
77950029784
-
-
PhD thesis, Institute for Graduate Studies in Science and Engineering, Boğaziçi University, Berlin, Germany
-
Turk O 2007 Cross-lingual voice conversion. PhD thesis, Institute for Graduate Studies in Science and Engineering, Boğaziçi University, Berlin, Germany.
-
(2007)
Cross-lingual Voice Conversion
-
-
Turk, O.1
-
127
-
-
33746653351
-
Robust processing techniques for voice conversion
-
Turk O, Arslan L M 2006 Robust processing techniques for voice conversion. Comput. Speech Lang. 20: 441-467.
-
(2006)
Comput. Speech Lang.
, vol.20
, pp. 441-467
-
-
Turk, O.1
Arslan, L.M.2
-
128
-
-
77953699443
-
Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques
-
Turk O, Schroder M 2010 Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques. IEEE Trans. Speech Audio Process. 18(5): 965-973.
-
(2010)
IEEE Trans. Speech Audio Process.
, vol.18
, Issue.5
, pp. 965-973
-
-
Turk, O.1
Schroder, M.2
-
133
-
-
33746410556
-
Emotional speech recognition: Resources, features, and methods
-
Ververidis D, Kotropoulos C 2006a Emotional speech recognition: Resources, features, and methods. Speech Commun. 48: 1162-1181.
-
(2006)
Speech Commun.
, vol.48
, pp. 1162-1181
-
-
Ververidis, D.1
Kotropoulos, C.2
-
136
-
-
85009266993
-
-
Proc. Int. Conf. Spoken Language Processing, Denver, CO, USA
-
Watanabe T, Murakami T, Namba M, Hoya T, Ishida Y 2002 Transformation of spectral envelope for voice conversion based on radial basis function networks. In Proc. Int. Conf. Spoken Language Processing, Denver, CO, USA, pp. 285-288.
-
(2002)
Transformation of Spectral Envelope For Voice Conversion Based On Radial Basis Function Networks
, pp. 285-288
-
-
Watanabe, T.1
Murakami, T.2
Namba, M.3
Hoya, T.4
Ishida, Y.5
-
137
-
-
0036299157
-
-
Proc. ICASSP, Orlando
-
Weber F, Manganaro L, Peskin B, Shriberg E 2002 Using prosodic and lexical information for speaker identification. Proc. ICASSP, vol. 1, Orlando, pp. 141-144.
-
(2002)
Using Prosodic and Lexical Information For Speaker Identification
, vol.1
, pp. 141-144
-
-
Weber, F.1
Manganaro, L.2
Peskin, B.3
Shriberg, E.4
-
142
-
-
0035989168
-
AANN an alternative to GMM for pattern recognition
-
Yegnanarayana B, Kishore S P 2002 AANN an alternative to GMM for pattern recognition. Neural Networks 15: 459-469.
-
(2002)
Neural Networks
, vol.15
, pp. 459-469
-
-
Yegnanarayana, B.1
Kishore, S.P.2
-
144
-
-
0034856452
-
-
In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, Utah, USA
-
Yegnanarayana B, Reddy K S, Kishore S P 2001b Source and system features for speaker recognition using AANN models. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1, Salt Lake City, Utah, USA, pp. 409-412.
-
(2001)
Source and System Features For Speaker Recognition Using AANN Models
, vol.1
, pp. 409-412
-
-
Yegnanarayana, B.1
Reddy, K.S.2
Kishore, S.P.3
-
145
-
-
0032121729
-
Extraction of vocal-tract system characteristics from speech signals
-
Yegnanarayana B, Veldhuis R N J 1998 Extraction of vocal-tract system characteristics from speech signals. IEEE Trans. Speech Audio Process. 6(4): 313-327.
-
(1998)
IEEE Trans. Speech Audio Process.
, vol.6
, Issue.4
, pp. 313-327
-
-
Yegnanarayana, B.1
Veldhuis, R.N.J.2
-
147
-
-
0029725861
-
Automatic dialect identification of extemporaneous, conversational, latin american spanish speech
-
Atlanta, Georgia, USA
-
Zissman M A, Gleason T P, Rekart D M, Losiewicz B L 1996 Automatic dialect identification of extemporaneous, conversational, latin american spanish speech. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2, Atlanta, Georgia, USA.
-
(1996)
In Proc. IEEE Int. Conf. Acoust. Speech Signal Process
, vol.2
-
-
Zissman, M.A.1
Gleason, T.P.2
Rekart, D.M.3
Losiewicz, B.L.4