SCOPUS information retrieval platform

Sadhana - Academy Proceedings in Engineering Sciences

Volume 36, Issue 5, 2011, Pages 783-836

Role of neural network models for developing speech systems

INDIAN INSTITUTE OF TECHNOLOGY (India)

Author keywords

autoassociative neural network (AANN); Dialect identification; Emotion recognition; Feedforward neural network (FFNN); language identification; prosody models; speaker recognition; voice conversion

Indexed keywords

AUTOASSOCIATIVE NEURAL NETWORKS; DIALECT IDENTIFICATION; EMOTION RECOGNITION; LANGUAGE IDENTIFICATION; PROSODY MODEL; SPEAKER RECOGNITION; VOICE CONVERSION;

CHARACTER RECOGNITION; FEEDFORWARD NEURAL NETWORKS; IDENTIFICATION (CONTROL SYSTEMS); SPEECH PROCESSING; SPEECH SYNTHESIS;

SPEECH RECOGNITION;

EID: 84856289513 PISSN: 02562499 EISSN: 09737677 Source Type: Journal
DOI: 10.1007/s12046-011-0047-z Document Type: Article

Times cited : (34)

References (147)

1
- 0023739214
- Voice conversion through vector quantization
- Abe M, Nakamura S, Shikano K, Kuwabara H 1998 Voice conversion through vector quantization. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 655-658.
- (1998) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

2
- 85009259855
- ICSLP, Denver, CO, USA
- Angkititrakul P, Hansen J L 2002 Stochastic trajectory model analysis for accent classification. ICSLP, Denver, CO, USA, pp. 493-496.
- (2002) Stochastic Trajectory Model Analysis For Accent Classification , pp. 493-496
- Angkititrakul, P.¹ Hansen, J.L.²

3
- 85009231014
- Proc. Eurospeech, Geneva, Switzerland
- Angkititrakul P, Hansen J L 2003 Use of trajectory models for automatic accent classification. Proc. Eurospeech, Geneva, Switzerland, pp. 1353-1356.
- (2003) Use of Trajectory Models For Automatic Accent Classification , pp. 1353-1356
- Angkititrakul, P.¹ Hansen, J.L.²

4
- 84856275800
- Master's thesis, MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
- Anjani A V N S 2000 Autoassociative neural network models for processing degraded speech. Master's thesis, MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
- (2000) Autoassociative Neural Network Models For Processing Degraded Speech
- Anjani, A.V.N.S.¹

5
- 0030165438
- Language accent classification in American English
- Arslan L, Hansen J 1996 Language accent classification in American English. Speech Commun. 18(4): 353-367.
- (1996) Speech Commun. , vol.18 , Issue.4 , pp. 353-367
- Arslan, L.¹ Hansen, J.²

6
- 0030757418
- A study of temporal features and frequency characteristics in American English foreign accent
- Arslan L, Hansen J 1997 A study of temporal features and frequency characteristics in American English foreign accent. J. Acoust. Soc. Am. 102: 28-40.
- (1997) J. Acoust. Soc. Am. , vol.102 , pp. 28-40
- Arslan, L.¹ Hansen, J.²

7
- 84856266997
- Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France
- Barbosa P A, Bailly G 1992 Generating segmental duration by p-centers. Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France, pp. 163-168.
- (1992) Generating Segmental Duration By P-centers , pp. 163-168
- Barbosa, P.A.¹ Bailly, G.²

8
- 0028531866
- Characterization of rhythmic patterns for text-to-speech synthesis
- Barbosa P A, Bailly G 1994 Characterization of rhythmic patterns for text-to-speech synthesis. Speech Commun. 15: 127-137.
- (1994) Speech Commun. , vol.15 , pp. 127-137
- Barbosa, P.A.¹ Bailly, G.²

9
- 84994310262
- Proc. Eurospeech, Scandinavia
- Batliner A, Mobius B, Mohler G, Schweitzer A, Noth E 2001 Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground. Proc. Eurospeech, Scandinavia.
- (2001) Prosodic Models, Automatic Speech Understanding, and Speech Synthesis: Towards the Common Ground
- Batliner, A.¹ Mobius, B.² Mohler, G.³ Schweitzer, A.⁴ Noth, E.⁵

10
- 67650565075
- J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Berlin, Germany: Springer Publishers
- Benesty J, Sondhi M M, Huang Y (eds) 2008 Springer handbook on speech processing, (Berlin, Germany: Springer Publishers).
- (2008) Springer Handbook on Speech Processing

11
- 0003571407
- The festival speech synthesis system: System documentation
- University of Edinburgh, 1.4.0 edition
- Black A W, Taylor P, Caley R 2000 The festival speech synthesis system: System documentation. The Centre for Speech Technology Research (CSTR), University of Edinburgh, 1.4.0 edition. http://www.cstr.ed.ac.uk/projects/festival/manual/festival_toc.html.
- (2000) The Centre For Speech Technology Research (CSTR)
- Black, A.W.¹ Taylor, P.² Caley, R.³

12
- 0011102471
- Proc. Eurospeech
- Blackburn C S, Vonwiller J P, King R W 1993 Automatic accent classification using artificial neural networks. In Proc. Eurospeech, vol. 2, pp. 1241-1244.
- (1993) Automatic Accent Classification Using Artificial Neural Networks , vol.2 , pp. 1241-1244
- Blackburn, C.S.¹ Vonwiller, J.P.² King, R.W.³

13
- 84865709374
- In Proc. LREC, Marrakech (Morocco)
- Blin L, Boeffard O, Barreaud V 2008 Web-based listening test system for speech synthesis and speech conversion evaluation. In Proc. LREC, Marrakech (Morocco).
- (2008) Web-based Listening Test System For Speech Synthesis and Speech Conversion Evaluation
- Blin, L.¹ Boeffard, O.² Barreaud, V.³

14
- 0003802343
- Pacific Grove, CA, USA: Wadsworth and Brooks
- Breiman L, Friedman N, Olshen R 1984 Classification and regression trees (Pacific Grove, CA, USA: Wadsworth and Brooks).
- (1984) Classification and Regression Trees
- Breiman, L.¹ Friedman, N.² Olshen, R.³

15
- 85009062747
- Proc. Int. Conf. Spoken Language Processing, Beijing, China
- Buhmann J, Vereecken H, Fackrell J, Martens J P, Coile B V 2000 Data driven intonation modeling of 6 languages. Proc. Int. Conf. Spoken Language Processing, vol. 3, Beijing, China, pp. 179-183.
- (2000) Data Driven Intonation Modeling of 6 Languages , vol.3 , pp. 179-183
- Buhmann, J.¹ Vereecken, H.² Fackrell, J.³ Martens, J.P.⁴ Coile, B.V.⁵

16
- 14944351245
- In ACM 6th Int. Conf. on Multimodal Interfaces (ICMI 2004), ACM, State College, PA
- Busso C, Deng Z, Yildirim S, Bulut M, Lee C M, Kazemzadeh A, Lee S, Neumann U, Narayanan S 2004 Analysis of emotion recognition using facial expressions, speech and multimodal information. In ACM 6th Int. Conf. on Multimodal Interfaces (ICMI 2004), ACM, State College, PA.
- (2004) Analysis of Emotion Recognition Using Facial Expressions, Speech and Multimodal Information
- Busso, C.¹ Deng, Z.² Yildirim, S.³ Bulut, M.⁴ Lee, C.M.⁵ Kazemzadeh, A.⁶ Lee, S.⁷ Neumann, U.⁸ Narayanan, S.⁹

17
- 65249116503
- Analysis of emotionally salient aspects of fundamental frequency for emotion detection
- Busso C, Lee S, Narayanan S 2009 Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans. Speech Audio Process. 17: 582-596.
- (2009) IEEE Trans. Speech Audio Process. , vol.17 , pp. 582-596
- Busso, C.¹ Lee, S.² Narayanan, S.³

18
- 0025387541
- Analog i/o nets for syllable timing
- Campbell W N 1990 Analog i/o nets for syllable timing. Speech Commun. 9(1): 57-61.
- (1990) Speech Commun. , vol.9 , Issue.1 , pp. 57-61
- Campbell, W.N.¹

19
- 84856255811
- In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam
- Campbell W N 1992 Syllable based segment duration. In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam, pp. 211-224.
- (1992) Syllable Based Segment Duration , pp. 211-224
- Campbell, W.N.¹

20
- 85027104127
- In Proc. European Conf. Speech Communication and Technology, Berlin, Germany
- Campbell W N 1993 Predicting segmental durations for accommodation within a syllable-level timing framework. In Proc. European Conf. Speech Communication and Technology, vol. 2, Berlin, Germany, pp. 1081-1084.
- (1993) Predicting Segmental Durations For Accommodation Within a Syllable-level Timing Framework , vol.2 , pp. 1081-1084
- Campbell, W.N.¹

21
- 84930566044
- Segment durations in a syllable frame
- Campbell W N, Isard S D 1991 Segment durations in a syllable frame. J. Phonetics: Special issue on speech synthesis 19: 37-47.
- (1991) J. Phonetics: Special Issue on Speech Synthesis , vol.19 , pp. 37-47
- Campbell, W.N.¹ Isard, S.D.²

22
- 33745459765
- Chopde A 2001 Itrans Indian language transliteration package version 5.2 source. http://www.aczone.con/itrans/.
- (2001) Itrans Indian Language Transliteration Package Version 5.2 Source
- Chopde, A.¹

23
- 47949104091
- In Proc. Speech Prosody, Aix-en-Provence, France
- Chung H 2002a Duration models and the perceptual evaluation of spoken Korean. In Proc. Speech Prosody, Aix-en-Provence, France, pp. 219-222.
- (2002) Duration Models and The Perceptual Evaluation of Spoken Korean , pp. 219-222
- Chung, H.¹

24
- 33750687350
- Perceptual evaluation of duration models in spoken Korean
- Chung H 2002b Perceptual evaluation of duration models in spoken Korean. Korean J. Speech Sci. 9: 207-215.
- (2002) Korean J. Speech Sci. , vol.9 , pp. 207-215
- Chung, H.¹

25
- 84856275798
- In Proc. European Conf. Speech Communication and Technology, Budapest, Hungary
- Cordoba R, Vallejo J A, Montero J M, Gutierrezarriola J, Lopez M A, Pardo J M 1999 Automatic modeling of duration in a Spanish text-to-speech system using neural networks. In Proc. European Conf. Speech Communication and Technology, Budapest, Hungary.
- (1999) Automatic Modeling of Duration In a Spanish Text-to-speech System Using Neural Networks
- Cordoba, R.¹ Vallejo, J.A.² Montero, J.M.³ Gutierrezarriola, J.⁴ Lopez, M.A.⁵ Pardo, J.M.⁶

26
- 84869597582
- In Proc. Eurospeech 2001, Aalborg, Denmark
- Cosi P, Tesser F, Gretter R 2001 Festival speaks Italian. In Proc. Eurospeech 2001, Aalborg, Denmark, pp. 509-512.
- (2001) Festival Speaks Italian , pp. 509-512
- Cosi, P.¹ Tesser, F.² Gretter, R.³

27
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis S, Mermelstein P 1980 Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Speech Audio Process. 28(4): 357-366.
- (1980) IEEE Trans. Speech Audio Process. , vol.28 , Issue.4 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

28
- 0030353343
- In International Conference on Spoken Language Processing (ICSLP) 96, Philadelphia, PA, USA
- Dellaert F, Polzin T, Waibel A 1996 Recognising emotions in speech. In International Conference on Spoken Language Processing (ICSLP) 96, vol. 3, Philadelphia, PA, USA, pp. 1816-1819.
- (1996) Recognising Emotions In Speech , vol.3 , pp. 1816-1819
- Dellaert, F.¹ Polzin, T.² Waibel, A.³

29
- 0003424145
- New York, USA: Macmillan Publishing
- Deller J R, Proakis J G, Hansen J H L 1993 Discrete-time processing of speech signals (New York, USA: Macmillan Publishing Company).
- (1993) Discrete-Time Processing of Speech Signals
- Deller, J.R.¹ Proakis, J.G.² Hansen, J.H.L.³

30
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- Desai S, Black A W, Yegnanarayana B, Prahlad K 2010 Spectral mapping using artificial neural networks for voice conversion. IEEE Trans. Audio Speech Lang. Process. 18(5): 954-964.
- (2010) IEEE Trans. Audio Speech Lang. Process. , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahlad, K.⁴

31
- 0003549017
- New York: John Wiley and Sons
- Diamantaras K I, Kung S Y 1996 Principal component neural networks: Theory and applications (New York: John Wiley and Sons).
- (1996) Principal Component Neural Networks: Theory and Applications
- Diamantaras, K.I.¹ Kung, S.Y.²

32
- 0037380084
- Emotional speech: Towards a new generation of databases
- Douglas-Cowie E, Campbell N, Cowie R, Roach P 2003 Emotional speech: Towards a new generation of databases. Speech Commun. 40: 33-60.
- (2003) Speech Commun. , vol.40 , pp. 33-60
- Douglas-Cowie, E.¹ Campbell, N.² Cowie, R.³ Roach, P.⁴

33
- 85032751766
- Emotion recognition in human computer interaction
- Mag., Stockholm, Sweden, 23-25 April 2001
- Douglas-Cowie R, Tsapatsoulis E, Votsis N, Kollias G, Fellenz S, Fellinge W, Taylor J 2001 Emotion recognition in human computer interaction. IEEE Signal Process. Mag., Stockholm, Sweden, 23-25 April 2001.
- (2001) IEEE Signal Process. Mag.
- Douglas-Cowie, R.¹ Tsapatsoulis, E.² Votsis, N.³ Kollias, G.⁴ Fellenz, S.⁵ Fellinge, W.⁶ Taylor, J.⁷

34
- 0005318943
- In Proc. Eurospeech, Budapest, Hungary
- Dusterhoff K E, Black A W, Taylor P A 1999 Using decision trees within the Tilt intonation model to predict F0 contour. In Proc. Eurospeech, Budapest, Hungary.
- (1999) Using Decision Trees Within the Tilt Intonation Model to Predict F0 Contour
- Dusterhoff, K.E.¹ Black, A.W.² Taylor, P.A.³

35
- 0002944642
- Dynamic characteristics of voice fundamental frequency in speech and singing
- P. F. MacNeilage (Ed.), New York, USA: Springer-Verlag
- Fujisaki H 1983 Dynamic characteristics of voice fundamental frequency in speech and singing. In (ed) P F MacNeilage, The production of speech, New York, USA: Springer-Verlag, pp. 39-55.
- (1983) The Production of Speech , pp. 39-55
- Fujisaki, H.¹

36
- 0001810979
- In (ed) O Fujimura, Vocal physiology: Voice production, mechanisms and functions, New York, USA: Raven Press
- Fujisaki H 1988 A note on the physiological and physical basis for the phrase and accent components in the voice fundamental frequency contour. In (ed) O Fujimura, Vocal physiology: Voice production, mechanisms and functions, New York, USA: Raven Press, pp. 347-355.
- (1988) A Note On the Physiological and Physical Basis For the Phrase and Accent Components In the Voice Fundamental Frequency Contour , pp. 347-355
- Fujisaki, H.¹

37
- 1942535983
- In Proc. IEEE Int. Conf. Intelligent Sensing and Information Processing, Chennai, India
- Gangashetty S V, Sekhar C C, Yegnanarayana B 2004 Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances. In Proc. IEEE Int. Conf. Intelligent Sensing and Information Processing, Chennai, India, pp. 159-164.
- (2004) Extraction of Fixed Dimension Patterns From Varying Duration Segments of Consonant-vowel Utterances , pp. 159-164
- Gangashetty, S.V.¹ Sekhar, C.C.² Yegnanarayana, B.³

38
- 70349439215
- A dimensional approach to emotion recognition of speech from movies
- Giannakopoulos T, Pikrakis A, Theodoridis S 2009 A dimensional approach to emotion recognition of speech from movies. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 65-68.
- (2009) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 65-68
- Giannakopoulos, T.¹ Pikrakis, A.² Theodoridis, S.³

39
- 40249100308
- Bayesian networks for phone duration prediction
- Goubanova O, King S 2008 Bayesian networks for phone duration prediction. Speech Commun. 50: 301-311.
- (2008) Speech Commun. , vol.50 , pp. 301-311
- Goubanova, O.¹ King, S.²

40
- 85009107944
- Proc. Int. Conf. Spoken Language Processing, Beijing, China
- Goubanova O, Taylor P 2000 Using Bayesian belief networks for modeling duration in text-to-speech systems. In Proc. Int. Conf. Spoken Language Processing, vol. 2, Beijing, China, pp. 427-431.
- (2000) Using Bayesian Belief Networks For Modeling Duration In Text-to-speech Systems , vol.2 , pp. 427-431
- Goubanova, O.¹ Taylor, P.²

41
- 34547518166
- Support vector regression for automatic recognition of spontaneous emotions in speech
- Grimm M, Kroschel K, Narayanan S 2007 Support vector regression for automatic recognition of spontaneous emotions in speech. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 1085-1088.
- (2007) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 1085-1088
- Grimm, M.¹ Kroschel, K.² Narayanan, S.³

42
- 78049394179
- Automatic, dimensional and continuous emotion recognition
- Gunes H, Pantic M 2010 Automatic, dimensional and continuous emotion recognition. Int. J. Synthetic Emotions 1(1): 68-99.
- (2010) Int. J. Synthetic Emotions , vol.1 , Issue.1 , pp. 68-99
- Gunes, H.¹ Pantic, M.²

43
- 84856240372
- Master's thesis, MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
- Gupta C S 2003 Significance of source features for speaker recognition. Master's thesis, MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
- (2003) Significance of Source Features For Speaker Recognition
- Gupta, C.S.¹

44
- 0003413187
- New Delhi: Pearson Education Asia, Inc
- Haykin S 1999 Neural networks: A comprehensive foundation (New Delhi: Pearson Education Asia, Inc.).
- (1999) Neural Networks: A Comprehensive Foundation
- Haykin, S.¹

45
- 77952314450
- Proc. Int. Conf. Spoken Language Processing, Denver, Colorado, USA
- Hifny Y, Rashwan M 2002 Duration modeling of Arabic text-to-speech synthesis. In Proc. Int. Conf. Spoken Language Processing, Denver, Colorado, USA, pp. 1773-1776.
- (2002) Duration Modeling of Arabic Text-to-speech Synthesis , pp. 1773-1776
- Hifny, Y.¹ Rashwan, M.²

46
- 0003962869
- New York: Macmillan Publishing
- Hogg R V, Ledolter J 1987 Engineering statistics (New York: Macmillan Publishing Company).
- (1987) Engineering Statistics
- Hogg, R.V.¹ Ledolter, J.²

47
- 0028712434
- Neural-network-based F0 text-to-speech synthesizer for Mandarin
- Hwang S H, Chen S H 1994 Neural-network-based F0 text-to-speech synthesizer for Mandarin. IEE Proc. Image Signal Process. 141(6): 384-390.
- (1994) IEE Proc. Image Signal Process. , vol.141 , Issue.6 , pp. 384-390
- Hwang, S.H.¹ Chen, S.H.²

48
- 0028996978
- A prosodic model for Mandarin speech and its application to pitch level generation for text-to-speech
- Hwang S-H, Chen S-H 1995 A prosodic model for Mandarin speech and its application to pitch level generation for text-to-speech. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 616-619.
- (1995) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 616-619
- Hwang, S.-H.¹ Chen, S.-H.²

49
- 84856296907
- In Int. Joint Conf. Neural Networks, USA
- Ikbal M S, Misra H, Yegnanarayana B 1999 Analysis of autoassociative mapping neural networks. In Int. Joint Conf. Neural Networks, USA, pp. 854-858.
- (1999) Analysis of Autoassociative Mapping Neural Networks , pp. 854-858
- Ikbal, M.S.¹ Misra, H.² Yegnanarayana, B.³

50
- 77950073346
- Spoken emotion recognition through optimum-path forest classification using glottal features
- Iliev A I, Scordilis M S, Papa J P, Falcão A X 2010 Spoken emotion recognition through optimum-path forest classification using glottal features. Comput. Speech. Lang. 24(3): 445-460.
- (2010) Comput. Speech. Lang. , vol.24 , Issue.3 , pp. 445-460
- Iliev, A.I.¹ Scordilis, M.S.² Papa, J.P.³ Falcão, A.X.⁴

51
- 33947693233
- Master's thesis, St. Edmunds College, University of Cambridge
- Inanoglu Z 2003 Transforming pitch in a voice conversion framework. Master's thesis, St. Edmunds College, University of Cambridge.
- (2003) Transforming Pitch In a Voice Conversion Framework
- Inanoglu, Z.¹

52
- 85095823619
- A method of classification among Japanese dialects
- Itahashi S, Tanaka K 1993 A method of classification among Japanese dialects. Proc. Eurospeech 1: 639-642.
- (1993) Proc. Eurospeech , vol.1 , pp. 639-642
- Itahashi, S.¹ Tanaka, K.²

53
- 0016467604
- Minimum prediction residual principle applied to speech recognition
- Itakura F 1975 Minimum prediction residual principle applied to speech recognition. IEEE Trans. Speech Audio Process. 23(1): 67-72.
- (1975) IEEE Trans. Speech Audio Process. , vol.23 , Issue.1 , pp. 67-72
- Itakura, F.¹

54
- 4444285698
- PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA
- Kain A 2001 High resolution voice transformation. PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA.
- (2001) High Resolution Voice Transformation
- Kain, A.¹

55
- 0034841948
- In Proc. IEEE Int. Conf. Acoust. Speech Signal Process, Salt Lake City, UT, USA
- Kain A, Macon M W 2001 Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2, Salt Lake City, UT, USA, pp. 813-816.
- (2001) Design and Evaluation of a Voice Conversion Algorithm Based On Spectral Envelope Mapping and Residual Prediction , vol.2 , pp. 813-816
- Kain, A.¹ Macon, M.W.²

56
- 84856275255
- Int. Conf. Natural Language Processing, Mysore, India
- Khan A N, Gangashetty S V, Yegnanarayana B 2003 Syllabic properties of three Indian languages: Implications for speech recognition and language identification. In Int. Conf. Natural Language Processing, Mysore, India, pp. 125-134.
- (2003) Syllabic Properties of Three Indian Languages: Implications For Speech Recognition and Language Identification , pp. 125-134
- Khan, A.N.¹ Gangashetty, S.V.² Yegnanarayana, B.³

57
- 84856266801
- In Int. Conf. Natural Language Processing, Mumbai, India
- Kishore S P, Kumar R, Sangal R 2002 A data-driven synthesis approach for Indian languages using syllable as basic unit. In Int. Conf. Natural Language Processing, Mumbai, India, pp. 311-316.
- (2002) A Data-driven Synthesis Approach For Indian Languages Using Syllable As Basic Unit , pp. 311-316
- Kishore, S.P.¹ Kumar, R.² Sangal, R.³

58
- 0034862114
- In Int. Joint Conf. Neural Networks, Washington, DC, USA
- Kishore S P, Yegnanarayana B 2001 Online text-independent speaker verification system using autoassociative neural network models. In Int. Joint Conf. Neural Networks, Washington, DC, USA.
- (2001) Online Text-independent Speaker Verification System Using Autoassociative Neural Network Models
- Kishore, S.P.¹ Yegnanarayana, B.²

59
- 85133465524
- 5th ISCA Speech Synthesis Workshop, Pittsburgh, USA
- Krishna N S, Murthy H A 2004 Duration modeling of Indian languages Hindi and Telugu. In 5th ISCA Speech Synthesis Workshop, Pittsburgh, USA, pp. 197-202.
- (2004) Duration Modeling of Indian Languages Hindi and Telugu , pp. 197-202
- Krishna, N.S.¹ Murthy, H.A.²

60
- 84856280329
- Int. Conf. Natural Language Processing, Mumbai, 18-21 December 2002
- Krishna N S, Murthy H A, Gonsalves T A 2002 Text-to-speech (tts) in Indian languages. In Int. Conf. Natural Language Processing, Mumbai, pp. 317-326, 18-21 December 2002.
- (2002) Text-to-speech (tts) In Indian Languages , pp. 317-326
- Krishna, N.S.¹ Murthy, H.A.² Gonsalves, T.A.³

61
- 85009064374
- In Proc. Int. Conf. Spoken Language Processing, Denver, USA
- Krishna N, Tulukdar P, Bali K, Ramakrishnan A 2004 Duration modeling for Hindi text-to-speech synthesis system. In Proc. Int. Conf. Spoken Language Processing, Denver, USA.
- (2004) Duration Modeling For Hindi Text-to-speech Synthesis System
- Krishna, N.¹ Tulukdar, P.² Bali, K.³ Ramakrishnan, A.⁴

62
- 85009223246
- In Eurospeech, Geneva
- Kwon O, Chan K, Hao J, Lee T 2003 Emotion recognition by speech signals. In Eurospeech, Geneva, pp. 125-128.
- (2003) Emotion Recognition By Speech Signals , pp. 125-128
- Kwon, O.¹ Chan, K.² Hao, J.³ Lee, T.⁴

63
- 14644439843
- Toward detecting emotions in spoken dialogs
- Lee C M, Narayanan S 2005a Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2): 293-303.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.2 , pp. 293-303
- Lee, C.M.¹ Narayanan, S.²

64
- 14644439843
- Toward detecting emotions in spoken dialogs
- Lee C M, Narayanan S S 2005b Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2): 293-303.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.2 , pp. 293-303
- Lee, C.M.¹ Narayanan, S.S.²

65
- 38149065136
- Statistical approach for voice personality transformation
- Lee K 2007 Statistical approach for voice personality transformation. IEEE Trans. Audio Speech Lang. Process. 15: 641-651.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , pp. 641-651
- Lee, K.¹

66
- 33745191649
- An articulatory study of emotional speech production
- Lee S, Yildirim S, Kazemzadeh A, Narayanan S 2007 An articulatory study of emotional speech production. Proc. Interspeech 4: 497-500.
- (2007) Proc. Interspeech , vol.4 , pp. 497-500
- Lee, S.¹ Yildirim, S.² Kazemzadeh, A.³ Narayanan, S.⁴

67
- 84867217083
- Interspeech, vol. 1, Brisbane, Australia, September 2008
- Leung C-C, Ferras M, Barras C, Gauvain J-L 2008 Comparing prosodic models for speaker recognition. In Interspeech, vol. 1, Brisbane, Australia, pp. 1945-1948, September 2008.
- (2008) Comparing Prosodic Models For Speaker Recognition , pp. 1945-1948
- Leung, C.-C.¹ Ferras, M.² Barras, C.³ Gauvain, J.-L.⁴

68
- 51449108623
- Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters
- Lugger M, Yang B 2008 Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 4945-4948.
- (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 4945-4948
- Lugger, M.¹ Yang, B.²

69
- 52949094265
- Extraction and representation of prosodic features for language and speaker recognition
- Mary L, Yegnanarayana B 2008 Extraction and representation of prosodic features for language and speaker recognition. Speech Commun. 50(10): 782-796.
- (2008) Speech Commun. , vol.50 , Issue.10 , pp. 782-796
- Mary, L.¹ Yegnanarayana, B.²

70
- 84856289467
- In ISCA Workshop on Speech and Emotion, Belfast
- McGilloway S, Cowie R, Douglas-Cowie E, Gielen S, Westerdijk M, Stroeve S 2000 Approaching automatic recognition of emotion from voice: A rough benchmark. In ISCA Workshop on Speech and Emotion, Belfast.
- (2000) Approaching Automatic Recognition of Emotion From Voice: A Rough Benchmark
- McGilloway, S.¹ Cowie, R.² Douglas-Cowie, E.³ Gielen, S.⁴ Westerdijk, M.⁵ Stroeve, S.⁶

71
- 84874424790
- An efficient method to compute LSFs from LPC coefficients
- Mei X, Sun S 2000 An efficient method to compute LSFs from LPC coefficients. In ICSP-2000, pp. 655-658.
- (2000) ICSP-2000 , pp. 655-658
- Mei, X.¹ Sun, S.²

72
- 85009154226
- In Proc. European Conf. Speech Communication and Technology, Aalborg, Denmark
- Mixdorff H, Jokisch O 2001 Building an integrated prosodic model of German. In Proc. European Conf. Speech Communication and Technology, vol. 2, Aalborg, Denmark, pp. 947-950.
- (2001) Building An Integrated Prosodic Model of German , vol.2 , pp. 947-950
- Mixdorff, H.¹ Jokisch, O.²

73
- 0000668614
- Robustness of group-delay-based method for extraction of significant excitation from speech signals
- Murthy P S, Yegnanarayana B 1999 Robustness of group-delay-based method for extraction of significant excitation from speech signals. IEEE Trans. Speech Audio Process. 7: 609-619.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 609-619
- Murthy, P.S.¹ Yegnanarayana, B.²

74
- 0029254176
- Transformation of formants for voice conversion using artificial neural networks
- Narendranadh M, Murthy H A, Rajendran S, Yegnanarayana B 1995 Transformation of formants for voice conversion using artificial neural networks. Speech Commun. 16(2): 206-216.
- (1995) Speech Commun. , vol.16 , Issue.2 , pp. 206-216
- Narendranadh, M.¹ Murthy, H.A.² Rajendran, S.³ Yegnanarayana, B.⁴

75
- 85016350179
- In 6th International Conference on Neural Information Processing, ICONIP-99
- Nicholson J, Takahashi K, Nakatsu R 1999 Emotion recognition in speech using neural networks. In 6th International Conference on Neural Information Processing, ICONIP-99, pp. 495-501.
- (1999) Emotion Recognition In Speech Using Neural Networks , pp. 495-501
- Nicholson, J.¹ Takahashi, K.² Nakatsu, R.³

76
- 0242721417
- Speech emotion recognition using hidden Markov models
- Nwe T L, Foo S W, Silva L C D 2003 Speech emotion recognition using hidden Markov models. Speech Commun. 41(4): 603-623.
- (2003) Speech Commun. , vol.41 , Issue.4 , pp. 603-623
- Nwe, T.L.¹ Foo, S.W.² Silva, L.C.D.³

77
- 0003513556
- NJ: Prentice-Hall
- Oppenheim A V, Schafer R W, Buck J R 1999 Discrete-time signal processing (NJ: Prentice-Hall).
- (1999) Discrete-Time Signal Processing
- Oppenheim, A.V.¹ Schafer, R.W.² Buck, J.R.³

78
- 84856289468
- In Symposium on Prosody and Speech, Tokyo, Japan
- Ostendorf M, Shafran I, Bates R 2003 Prosody models for conversational speech recognition. In Symposium on Prosody and Speech, Tokyo, Japan.
- (2003) Prosody Models For Conversational Speech Recognition
- Ostendorf, M.¹ Shafran, I.² Bates, R.³

79
- 57049108119
- (eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education)
- Pantic M, Bartlett M 2007 Machine analysis of facial expressions. In (eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education) pp. 377-416.
- (2007) Machine Analysis of Facial Expressions , pp. 377-416
- Pantic, M.¹ Bartlett, M.²

80
- 2942590310
- Toward an affect-sensitive multimodal human-computer interaction
- Pantic M, Rothkrantz L J M 2003 Toward an affect-sensitive multimodal human-computer interaction. Proc. IEEE 91: 1370-1390.
- (2003) Proc. IEEE , vol.91 , pp. 1370-1390
- Pantic, M.¹ Rothkrantz, L.J.M.²

81
- 51449094434
- Voice conversion with linear prediction residual estimation
- Percybrooks W S, Moore-II E 2008 Voice conversion with linear prediction residual estimation. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 4673-4676.
- (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 4673-4676
- Percybrooks, W.S.¹ Moore-II, E.²

82
- 0003754220
- PhD thesis, MIT, MA, USA
- Pierrehumbert J B 1980 The phonology and phonetics of English intonation. PhD thesis, MIT, MA, USA.
- (1980) The Phonology and Phonetics of English Intonation
- Pierrehumbert, J.B.¹

83
- 33745205178
- PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
- Prasanna S R M 2004 Event-based analysis of speech. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India.
- (2004) Event-based Analysis of Speech
- Prasanna, S.R.M.¹

84
- 4544369752
- In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Montreal, Canada
- Prasanna S R M, Yegnanarayana B 2004a Extraction of pitch in adverse conditions. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1, Montreal, Canada.
- (2004) Extraction of Pitch In Adverse Conditions , vol.1
- Prasanna, S.R.M.¹ Yegnanarayana, B.²

85
- 4544369752
- IEEE Int. Conf. Acoust. Speech Audio Process., Montreal, Canada
- Prasanna S R M, Yegnanarayana B 2004b Extraction of pitch in adverse conditions. In IEEE Int. Conf. Acoust. Speech Audio Process., vol. 1, Montreal, Canada.
- (2004) Extraction of Pitch In Adverse Conditions , vol.1
- Prasanna, S.R.M.¹ Yegnanarayana, B.²

86
- 85143189780
- Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Orlando, Florida, USA
- Prasanna S R M, Zachariah J M 2002 Detection of vowel onset point in speech. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 3, Orlando, Florida, USA.
- (2002) Detection of Vowel Onset Point In Speech , vol.3
- Prasanna, S.R.M.¹ Zachariah, J.M.²

87
- 0004244302
- Englewood Cliffs, New Jersey: Prentice-Hall
- Rabiner L R, Juang B H 1993 Fundamentals of speech recognition (Englewood Cliffs, New Jersey: Prentice-Hall).
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.R.¹ Juang, B.H.²

88
- 79953168002
- Application of prosody models for developing speech systems in Indian languages
- Rao K S 2011 Application of prosody models for developing speech systems in Indian languages. Int. J. Speech Technol. 14: 19-33.
- (2011) Int. J. Speech Technol. , vol.14 , pp. 19-33
- Rao, K.S.¹

89
- 54049142844
- Voice conversion by mapping the speaker-specific features using pitch synchronous approach
- Rao K S 2009 Voice conversion by mapping the speaker-specific features using pitch synchronous approach. Comput. Speech Lang. 23(2): 240-256.
- (2009) Comput. Speech Lang. , vol.23 , Issue.2 , pp. 240-256
- Rao, K.S.¹

90
- 84856280321
- 2nd International Conference on Pattern Recognition and Machine Intelligence (Premi-2007), Kolkata, India
- Rao K S, Laskar R H, Koolagudi S G 2007 Voice transformation by mapping the features at syllable level. In 2nd International Conference on Pattern Recognition and Machine Intelligence (Premi-2007), Kolkata, India.
- (2007) Voice Transformation By Mapping the Features At Syllable Level
- Rao, K.S.¹ Laskar, R.H.² Koolagudi, S.G.³

91
- 84870181682
- In WMSCI-2010, Orlando, Florida, USA
- Rao K S, Nandy S, Koolagudi S G 2010 Identification of Hindi dialects using speech. In WMSCI-2010, Orlando, Florida, USA.
- (2010) Identification of Hindi Dialects Using Speech
- Rao, K.S.¹ Nandy, S.² Koolagudi, S.G.³

92
- 84856289465
- In Speech Prosody, Chicago, USA
- Rao K S, Reddy R, Maity S, Koolagudi S G 2010 Characterization of emotions using dynamics of prosodic features. In Speech Prosody, Chicago, USA.
- (2010) Characterization of Emotions Using Dynamics of Prosodic Features
- Rao, K.S.¹ Reddy, R.² Maity, S.³ Koolagudi, S.G.⁴

93
- 4544252352
- Proc. IEEE Int. Conf. Multimedia and Expo, Baltimore, Maryland, USA
- Rao K S, Yegnanarayana B 2003 Prosodic manipulation using instants of significant excitation. In Proc. IEEE Int. Conf. Multimedia and Expo, Baltimore, Maryland, USA, pp. 389-392.
- (2003) Prosodic Manipulation Using Instants of Significant Excitation , pp. 389-392
- Rao, K.S.¹ Yegnanarayana, B.²

94
- 34047248058
- Prosody modification using instants of significant excitation
- Rao K S, Yegnanarayana B 2006a Prosody modification using instants of significant excitation. IEEE Trans. Audio Speech Lang. Process. 14(3): 972-980.
- (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.3 , pp. 972-980
- Rao, K.S.¹ Yegnanarayana, B.²

95
- 36048979629
- In 9th Int. Conf. Information Technology, Bhubaneswar, Orissa, India
- Rao K S, Yegnanarayana B 2006b Voice conversion by prosody and vocal tract modification. In 9th Int. Conf. Information Technology, Bhubaneswar, Orissa, India.
- (2006) Voice Conversion By Prosody and Vocal Tract Modification
- Rao, K.S.¹ Yegnanarayana, B.²

96
- 33750713338
- Modeling durations of syllables using neural networks
- Rao K S, Yegnanarayana B 2007 Modeling durations of syllables using neural networks. Comput. Speech Lang. 21: 282-295.
- (2007) Comput. Speech Lang. , vol.21 , pp. 282-295
- Rao, K.S.¹ Yegnanarayana, B.²

97
- 54049142844
- Intonation modeling for Indian languages
- Rao K S, Yegnanarayana B 2009 Intonation modeling for Indian languages. Comput. Speech Lang. 23: 240-256.
- (2009) Comput. Speech Lang. , vol.23 , pp. 240-256
- Rao, K.S.¹ Yegnanarayana, B.²

98
- 0002069313
- In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam
- Riley M 1992 Tree-based modeling of segmental durations. In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam, pp. 265-273.
- (1992) Tree-based Modeling of Segmental Durations , pp. 265-273
- Riley, M.¹

99
- 0028405296
- Assignment of segment duration in text-to-speech synthesis
- Santen J P H V 1994 Assignment of segment duration in text-to-speech synthesis. Comput. Speech Lang. 8: 95-128.
- (1994) Comput. Speech Lang. , vol.8 , pp. 95-128
- Santen, J.P.H.V.¹

100
- 0010058685
- In Eurospeech, Aalborg, Denmark
- Schroder M 2001 Emotional speech synthesis: A review. In Eurospeech, Aalborg, Denmark.
- (2001) Emotional Speech Synthesis: A Review
- Schroder, M.¹

101
- 0001876893
- Phonus 4, Research report of the Institute of Phonetics, University of Saarland
- Schroder M 1996 Can emotions be synthesized without controlling voice quality? Phonus 4, Research report of the Institute of Phonetics, University of Saarland.
- (1996) Can Emotions Be Synthesized Without Controlling Voice Quality?
- Schroder, M.¹

102
- 84856280326
- Workshop on Emotion and Computing, Bremen, Germany
- Schroder M, Cowie R 2006 Issues in emotion-oriented computing towards a shared understanding. Workshop on Emotion and Computing, Bremen, Germany.
- (2006) Issues In Emotion-oriented Computing Towards a Shared Understanding
- Schroder, M.¹ Cowie, R.²

103
- 85009089741
- In 7th European Conference on Speech Communication and Technology. Eurospeech 2001 Scandinavia, 2nd Interspeech Event, Aalborg, Denmark
- Schroder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen S 2001 Acoustic correlates of emotion dimensions in view of speech synthesis. In 7th European Conference on Speech Communication and Technology. Eurospeech 2001 Scandinavia, 2nd Interspeech Event, Aalborg, Denmark.
- (2001) Acoustic Correlates of Emotion Dimensions In View of Speech Synthesis
- Schroder, M.¹ Cowie, R.² Douglas-Cowie, E.³ Westerdijk, M.⁴ Gielen, S.⁵

104
- 0024876896
- In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Glasgow, Scotland
- Scordilis M S, Gowdy J N 1989 Neural network based generation of fundamental frequency contours. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1, Glasgow, Scotland, pp. 219-222.
- (1989) Neural Network Based Generation of Fundamental Frequency Contours , vol.1 , pp. 219-222
- Scordilis, M.S.¹ Gowdy, J.N.²

105
- 84856289466
- In Prosody in Speech Recognition and Understanding, ISCA Tutorial and Research Workshop (ITRW), Molly Pitcher Inn, Red Bank, NJ, USA
- Shriberg E, Stolcke A 2001 Prosody modeling for automatic speech understanding: An overview of recent research at SRI. In Prosody in Speech Recognition and Understanding, ISCA Tutorial and Research Workshop (ITRW), Molly Pitcher Inn, Red Bank, NJ, USA.
- (2001) Prosody Modeling For Automatic Speech Understanding: An Overview of Recent Research At SRI
- Shriberg, E.¹ Stolcke, A.²

106
- 79953165800
- Philadelphia, PA, USA: Springer
- Shriberg E, Stolcke A 2004 Mathematical foundations of speech and language processing (Philadelphia, PA, USA: Springer).
- (2004) Mathematical Foundations of Speech and Language Processing
- Shriberg, E.¹ Stolcke, A.²

107
- 0029375490
- Determination of instants of significant excitation in speech using group delay function
- Smits R, Yegnanarayana B 1995 Determination of instants of significant excitation in speech using group delay function. IEEE Trans. Speech Audio Process. 3(5): 325-333.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 325-333
- Smits, R.¹ Yegnanarayana, B.²

108
- 0030710662
- Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Munich, Germany
- Sonntag G P, Portele T, Heuft B 1997 Prosody generation with a neural network: Weighing the importance of input parameters. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Munich, Germany, pp. 931-934.
- (1997) Prosody Generation With a Neural Network: Weighing the Importance of Input Parameters , pp. 931-934
- Sonntag, G.P.¹ Portele, T.² Heuft, B.³

109
- 0026953356
- Feedback stabilization using two hidden layer nets
- Sontag E D 1992 Feedback stabilization using two hidden layer nets. IEEE Trans. Neural Networks 3: 981-990.
- (1992) IEEE Trans. Neural Networks , vol.3 , pp. 981-990
- Sontag, E.D.¹

110
- 37549013057
- Technical report no. 11, Project VOIS, Department of Computer Science and Engineering, Indian Institute of Technology Madras
- Srikanth S, Kumar S R R, Sundar R, Yegnanarayana B 1989 A text-to-speech conversion system for Indian languages based on waveform concatenation model. Technical report no. 11, Project VOIS, Department of Computer Science and Engineering, Indian Institute of Technology Madras.
- (1989) A Text-to-speech Conversion System For Indian Languages Based On Waveform Concatenation Model
- Srikanth, S.¹ Kumar, S.R.R.² Sundar, R.³ Yegnanarayana, B.⁴

111
- 85135175982
- In Eurospeech, Madrid, Spain
- Stylianou Y, Cappe O, Moulines E 1995 Statistical methods for voice quality transformation. In Eurospeech, Madrid, Spain, pp. 447-450.
- (1995) Statistical Methods For Voice Quality Transformation , pp. 447-450
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

112
- 0032026483
- Continuous probabilistic transform for voice conversion
- Stylianou Y, Cappe O, Moulines E 1998 Continuous probabilistic transform for voice conversion. IEEE Trans. Speech Audio Process. 6: 131-142.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

113
- 70349199919
- Sun R, Moore E, Torres J F 2009 Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 4509-4512.
- (2009) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 4509-4512
- Sun, R.¹ Moore, E.² Torres, J.F.³

114
- 84856280320
- In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany
- Sundermann D 2005 Voice conversion: State-of-the-art and future work. In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany.
- (2005) Voice Conversion: State-of-the-art and Future Work
- Sundermann, D.¹

115
- 33646767751
- A study on residual prediction techniques for voice conversion
- Sundermann D, Bonafonte A, Ney H 2005a A study on residual prediction techniques for voice conversion. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 13-16.
- (2005) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 13-16
- Sundermann, D.¹ Bonafonte, A.² Ney, H.³

116
- 84856275794
- In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany
- Sundermann D, Bonafonte A, Duxans H, Hoege H 2005b TC-STAR: Evaluation plan for voice conversion technology. In Proc. DAGA: 31st German Annual Conf. on Acoustics, Munich, Germany.
- (2005) TC-STAR: Evaluation Plan For Voice Conversion Technology
- Sundermann, D.¹ Bonafonte, A.² Duxans, H.³ Hoege, H.⁴

117
- 0034008810
- Analysis and synthesis of intonation using the Tilt model
- Taylor P A 2000 Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107(3): 1697-1714.
- (2000) J. Acoust. Soc. Am. , vol.107 , Issue.3 , pp. 1697-1714
- Taylor, P.A.¹

118
- 85009231337
- Segmental durations predicted with a neural network
- Geneva, Switzerland
- Teixeira J P, Freitas D 2003 Segmental durations predicted with a neural network. In Proc. European Conf. Speech Communication and Technology, Geneva, Switzerland, pp. 169-172.
- (2003) Proc. European Conf. Speech Communication and Technology , pp. 169-172
- Teixeira, J.P.¹ Freitas, D.²

119
- 84945895231
- In 5th ESCA Speech Synthesis Workshop, Pittsburgh, USA
- Tesser F, Cosi P, Drioli C, Tisato G 2004 Prosodic data driven modeling of a narrative style in Festival TTS. In 5th ESCA Speech Synthesis Workshop, Pittsburgh, USA, pp. 185-190.
- (2004) Prosodic Data Driven Modeling of a Narrative Style In Festival TTS , pp. 185-190
- Tesser, F.¹ Cosi, P.² Drioli, C.³ Tisato, G.⁴

120
- 0003788784
- Cambridge: Cambridge University Press
- 't Hart J, Collier R, Cohen A 1990 A perceptual study of intonation (Cambridge: Cambridge University Press).
- (1990) A Perceptual Study of Intonation
- 't Hart, J.¹ Collier, R.² Cohen, A.³

121
- 0034842552
- Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum
- Toda T, Saruwatari H, Shikano K 2001 Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2: 841-844.
- (2001) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 841-844
- Toda, T.¹ Saruwatari, H.² Shikano, K.³

122
- 34250619456
- Proc. Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain
- Torres-Carrasquillo P A, Gleason T P, Reynolds D A 2004 Dialect identification using Gaussian mixture models. In Proc. Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain, pp. 297-300.
- (2004) Dialect Identification Using Gaussian Mixture Models , pp. 297-300
- Torres-Carrasquillo, P.A.¹ Gleason, T.P.² Reynolds, D.A.³

123
- 84867204678
- Interspeech, Brisbane, Australia, 22-26 September 2008
- Torres-Carrasquillo P A, Sturim D, Reynolds D A, McCree A 2008 Eigen channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition. In Interspeech, Brisbane, Australia, 22-26 September 2008.
- (2008) Eigen Channel Compensation and Discriminatively Trained Gaussian Mixture Models For Dialect and Accent Recognition
- Torres-Carrasquillo, P.A.¹ Sturim, D.² Reynolds, D.A.³ McCree, A.⁴

124
- 84867198266
- Interspeech, Brisbane, Australia
- Toth A R, Black A W 2008 Incorporating durational modification in voice transformation. In Interspeech, Brisbane, Australia, vol. 2, pp. 1088-1091.
- (2008) Incorporating Durational Modification In Voice Transformation , vol.2 , pp. 1088-1091
- Toth, A.R.¹ Black, A.W.²

125
- 77950029784
- PhD thesis, Institute for Graduate Studies in Science and Engineering, Bogazici University, Berlin, Germany
- Turk O 2007 Cross-lingual voice conversion. PhD thesis, Institute for Graduate Studies in Science and Engineering, Bogazici University, Berlin, Germany.
- (2007) Cross-lingual Voice Conversion
- Turk, O.¹

126
- 84863647359
- In Proc. EUSIPCO, Antalya, Turkey
- Turk O, Arslan L M 2005 Donor selection for voice conversion. In Proc. EUSIPCO, Antalya, Turkey.
- (2005) Donor Selection For Voice Conversion
- Turk, O.¹ Arslan, L.M.²

127
- 33746653351
- Robust processing techniques for voice conversion
- Turk O, Arslan L M 2006 Robust processing techniques for voice conversion. Comput. Speech Lang. 20: 441-467.
- (2006) Comput. Speech Lang. , vol.20 , pp. 441-467
- Turk, O.¹ Arslan, L.M.²

128
- 77953699443
- Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques
- Turk O, Schroder M 2010 Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques. IEEE Trans. Audio Speech Lang. Process. 18(5): 965-973.
- (2010) IEEE Trans. Audio Speech Lang. Process. , vol.18 , Issue.5 , pp. 965-973
- Turk, O.¹ Schroder, M.²

129
- 33646681559
- PhD thesis, Dept. of Phonetics, University of Helsinki, Finland
- Vainio M 2001 Artificial neural network based prosody models for Finnish text-to-speech synthesis. PhD thesis, Dept. of Phonetics, University of Helsinki, Finland.
- (2001) Artificial Neural Network Based Prosody Models For Finnish Text-to-speech Synthesis
- Vainio, M.¹

130
- 84856275795
- In Proc. Int. Conf. Spoken Language Processing, Sydney, Australia
- Vainio M, Altosaar T 1998 Modeling the microprosody of pitch and loudness for speech synthesis with neural networks. In Proc. Int. Conf. Spoken Language Processing, Sydney, Australia.
- (1998) Modeling the Microprosody of Pitch and Loudness For Speech Synthesis With Neural Networks
- Vainio, M.¹ Altosaar, T.²

131
- 0026880275
- Voice transformation using PSOLA techniques
- Valbret H, Moulines E, Tubach J P 1992 Voice transformation using PSOLA techniques. Speech Commun. 11: 175-187.
- (1992) Speech Commun. , vol.11 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.P.³

132
- 37549053769
- Master's thesis, Dept. of Linguistics, University of Edinburgh
- Vegnaduzzo M 2003 Modeling intonation for the Italian Festival TTS using linear regression. Master's thesis, Dept. of Linguistics, University of Edinburgh.
- (2003) Modeling Intonation For the Italian Festival TTS Using Linear Regression
- Vegnaduzzo, M.¹

133
- 33746410556
- Emotional speech recognition: Resources, features, and methods
- Ververidis D, Kotropoulos C 2006a Emotional speech recognition: Resources, features, and methods. Speech Commun. 48: 1162-1181.
- (2006) Speech Commun. , vol.48 , pp. 1162-1181
- Ververidis, D.¹ Kotropoulos, C.²

134
- 84856280323
- In Eleventh Australasian International Conference on Speech Science and Technology, Auckland, New Zealand
- Ververidis D, Kotropoulos C 2006b A state of the art review on emotional speech databases. In Eleventh Australasian International Conference on Speech Science and Technology, Auckland, New Zealand.
- (2006) A State of the Art Review On Emotional Speech Databases
- Ververidis, D.¹ Kotropoulos, C.²

135
- 13344275792
- An investigation of speech-based human emotion recognition
- Wang Y, Guan L 2004 An investigation of speech-based human emotion recognition. In IEEE 6th Workshop on Multimedia Signal Processing, pp. 15-18.
- (2004) In IEEE 6th Workshop On Multimedia Signal Processing , pp. 15-18
- Wang, Y.¹ Guan, L.²

136
- 85009266993
- Proc. Int. Conf. Spoken Language Processing, Denver, CO, USA
- Watanabe T, Murakami T, Namba M, Hoya T, Ishida Y 2002 Transformation of spectral envelope for voice conversion based on radial basis function networks. In Proc. Int. Conf. Spoken Language Processing, Denver, CO, USA, pp. 285-288.
- (2002) Transformation of Spectral Envelope For Voice Conversion Based On Radial Basis Function Networks , pp. 285-288
- Watanabe, T.¹ Murakami, T.² Namba, M.³ Hoya, T.⁴ Ishida, Y.⁵

137
- 0036299157
- Proc. ICASSP, Orlando
- Weber F, Manganaro L, Peskin B, Shriberg E 2002 Using prosodic and lexical information for speaker identification. Proc. ICASSP, vol. 1, Orlando, pp. 141-144.
- (2002) Using Prosodic and Lexical Information For Speaker Identification , vol.1 , pp. 141-144
- Weber, F.¹ Manganaro, L.² Peskin, B.³ Shriberg, E.⁴

138
- 79953659944
- doi: 10.1016/j.specom.2010.08.013
- Wu S, Falk T H, Chan W-Y 2010 Automatic speech emotion recognition using modulation spectral features. Speech Commun. doi: 10.1016/j.specom.2010.08.013.
- (2010) Automatic Speech Emotion Recognition Using Modulation Spectral Features. Speech Commun.
- Wu, S.¹ Falk, T.H.² Chan, W.-Y.³

139
- 79958007348
- Berlin, Germany: Springer
- Yoon W, Park K-S 2007 Modelling decisions for artificial intelligence (Berlin, Germany: Springer) pp. 455-462.
- (2007) Modelling Decisions for Artificial Intelligence , pp. 455-462
- Yoon, W.¹ Park, K.-S.²

140
- 4544284652
- High quality voice morphing
- Ye H, Young S 2004 High quality voice morphing. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 9-12.
- (2004) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 9-12
- Ye, H.¹ Young, S.²

141
- 0004312284
- New Delhi: Prentice-Hall
- Yegnanarayana B 1999 Artificial neural networks (New Delhi: Prentice-Hall).
- (1999) Artificial Neural Networks
- Yegnanarayana, B.¹

142
- 0035989168
- AANN an alternative to GMM for pattern recognition
- Yegnanarayana B, Kishore S P 2002 AANN an alternative to GMM for pattern recognition. Neural Networks 15: 459-469.
- (2002) Neural Networks , vol.15 , pp. 459-469
- Yegnanarayana, B.¹ Kishore, S.P.²

143
- 0034856452
- In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, UT
- Yegnanarayana B, Reddy K S, Kishore S P 2001a Source and system features for speaker recognition using AANN models. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, UT.
- (2001) Source and System Features For Speaker Recognition Using AANN Models
- Yegnanarayana, B.¹ Reddy, K.S.² Kishore, S.P.³

144
- 0034856452
- In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, Utah, USA
- Yegnanarayana B, Reddy K S, Kishore S P 2001b Source and system features for speaker recognition using AANN models. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1, Salt Lake City, Utah, USA, pp. 409-412.
- (2001) Source and System Features For Speaker Recognition Using AANN Models , vol.1 , pp. 409-412
- Yegnanarayana, B.¹ Reddy, K.S.² Kishore, S.P.³

145
- 0032121729
- Extraction of vocal-tract system characteristics from speech signals
- Yegnanarayana B, Veldhuis R N J 1998 Extraction of vocal-tract system characteristics from speech signals. IEEE Trans. Speech Audio Process. 6(4): 313-327.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.4 , pp. 313-327
- Yegnanarayana, B.¹ Veldhuis, R.N.J.²

146
- 34147129605
- Yin B, Ambikairajah E, Chen F 2006 Combining cepstral and prosodic features in language identification. In 18th Int. Conf. on Pattern Recognition (ICPR'06), Hong Kong, China, 20-24 August 2006.
- (2006) Combining Cepstral and Prosodic Features In Language Identification
- Yin, B.¹ Ambikairajah, E.² Chen, F.³

147
- 0029725861
- Automatic dialect identification of extemporaneous, conversational, Latin American Spanish speech
- Atlanta, Georgia, USA
- Zissman M A, Gleason T P, Rekart D M, Losiewicz B L 1996 Automatic dialect identification of extemporaneous, conversational, Latin American Spanish speech. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2, Atlanta, Georgia, USA.
- (1996) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2
- Zissman, M.A.¹ Gleason, T.P.² Rekart, D.M.³ Losiewicz, B.L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.