Volume 36, Issue 5, 2011, Pages 783-836

Role of neural network models for developing speech systems

Author keywords

Autoassociative neural network (AANN); Dialect identification; Emotion recognition; Feedforward neural network (FFNN); Language identification; Prosody models; Speaker recognition; Voice conversion

Indexed keywords

AUTOASSOCIATIVE NEURAL NETWORKS; DIALECT IDENTIFICATION; EMOTION RECOGNITION; LANGUAGE IDENTIFICATION; PROSODY MODEL; SPEAKER RECOGNITION; VOICE CONVERSION;

EID: 84856289513     PISSN: 02562499     EISSN: 09737677     Source Type: Journal    
DOI: 10.1007/s12046-011-0047-z     Document Type: Article
Times cited: 34

References (147)
  • 4
    • 84856275800
    • MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
    • Anjani A V N S 2000 Autoassociate neural network models for processing degraded speech. MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
    • (2000) Autoassociate Neural Network Models For Processing Degraded Speech
    • Anjani, A.V.N.S.1
  • 5
    • 0030165438
    • Language accent classification in American English
    • Arslan L, Hansen J 1996 Language accent classification in American English. Speech Commun. 18(4): 353-367.
    • (1996) Speech Commun. , vol.18 , Issue.4 , pp. 353-367
    • Arslan, L.1    Hansen, J.2
  • 6
    • 0030757418
    • A study of temporal features and frequency characteristics in American English foreign accent
    • Arslan L, Hansen J 1997 A study of temporal features and frequency characteristics in American English foreign accent. J. Acoust. Soc. Am. 102: 28-40.
    • (1997) J. Acoust. Soc. Am. , vol.102 , pp. 28-40
    • Arslan, L.1    Hansen, J.2
  • 7
    • 84856266997
    • Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France
    • Barbosa P A, Bailly G 1992 Generating segmental duration by p-centers. Proc. of the Fourth Workshop on Rhythm Perception and Production, Bourges, France, pp. 163-168.
    • (1992) Generating Segmental Duration By P-centers , pp. 163-168
    • Barbosa, P.A.1    Bailly, G.2
  • 8
    • 0028531866
    • Characterization of rhythmic patterns for text-to-speech synthesis
    • Barbosa P A, Bailly G 1994 Characterization of rhythmic patterns for text-to-speech synthesis. Speech Commun. 15: 127-137.
    • (1994) Speech Commun. , vol.15 , pp. 127-137
    • Barbosa, P.A.1    Bailly, G.2
  • 10
    • 67650565075
    • J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Berlin, Germany: Springer Publishers
    • Benesty J, Sondhi M M, Huang Y (eds) 2008 Springer handbook on speech processing, (Berlin, Germany: Springer Publishers).
    • (2008) Springer Handbook on Speech Processing
  • 11
    • 0003571407
    • The festival speech synthesis system: System documentation
    • University of Edinburgh, 1.4.0 edition
    • Black A W, Taylor P, Caley R 2000 The festival speech synthesis system: System documentation. The Centre for Speech Technology Research (CSTR), University of Edinburgh, 1.4.0 edition. http://www.cstr.ed.ac.uk/projects/festival/manual/festival_toc.html.
    • (2000) The Centre For Speech Technology Research (CSTR)
    • Black, A.W.1    Taylor, P.2    Caley, R.3
  • 17
    • 65249116503
    • Analysis of emotionally salient aspects of fundamental frequency for emotion detection
    • Busso C, Lee S, Narayanan S 2009 Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans. Speech Audio Process. 17: 582-596.
    • (2009) IEEE Trans. Speech Audio Process. , vol.17 , pp. 582-596
    • Busso, C.1    Lee, S.2    Narayanan, S.3
  • 18
    • 0025387541
    • Analog i/o nets for syllable timing
    • Campbell W N 1990 Analog i/o nets for syllable timing. Speech Commun. 9(1): 57-61.
    • (1990) Speech Commun. , vol.9 , Issue.1 , pp. 57-61
    • Campbell, W.N.1
  • 19
    • 84856255811
    • In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam
    • Campbell W N 1992 Syllable based segment duration. In (eds) G Bailly, C Benoit, T R Sawallis, Talking machines: Theories, models and designs, Elsevier, Amsterdam, pp. 211-224.
    • (1992) Syllable Based Segment Duration , pp. 211-224
    • Campbell, W.N.1
  • 24
    • 33750687350
    • Perceptual evaluation of duration models in spoken Korean
    • Chung H 2002b Perceptual evaluation of duration models in spoken Korean. Korean J. Speech Sci. 9: 207-215.
    • (2002) Korean J. Speech Sci. , vol.9 , pp. 207-215
    • Chung, H.1
  • 27
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • Davis S, Mermelstein P 1980 Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Speech Audio Process. 28(4): 357-366.
    • (1980) IEEE Trans. Speech Audio Process. , vol.28 , Issue.4 , pp. 357-366
    • Davis, S.1    Mermelstein, P.2
  • 28
    • 0030353343
    • In International Conference on Spoken Language Processing (ICSLP) 96, Philadelphia, PA, USA
    • Dellaert F, Polzin T, Waibel A 1996 Recognising emotions in speech. In International Conference on Spoken Language Processing (ICSLP) 96, vol. 3, Philadelphia, PA, USA, pp. 1816-1819.
    • (1996) Recognising Emotions In Speech , vol.3 , pp. 1816-1819
    • Dellaert, F.1    Polzin, T.2    Waibel, A.3
  • 35
    • 0002944642
    • Dynamic characteristics of voice fundamental frequency in speech and singing
    • P. F. MacNeilage (Ed.), New York, USA: Springer-Verlag
    • Fujisaki H 1983 Dynamic characteristics of voice fundamental frequency in speech and singing. In (ed) P F MacNeilage, The production of speech, New York, USA: Springer-Verlag, pp. 39-55.
    • (1983) The Production of Speech , pp. 39-55
    • Fujisaki, H.1
  • 39
    • 40249100308
    • Bayesian networks for phone duration prediction
    • Goubanova O, King S 2008 Bayesian networks for phone duration prediction. Speech Commun. 50: 301-311.
    • (2008) Speech Commun. , vol.50 , pp. 301-311
    • Goubanova, O.1    King, S.2
  • 42
    • 78049394179
    • Automatic, dimensional and continuous emotion recognition
    • Gunes H, Pantic M 2010 Automatic, dimensional and continuous emotion recognition. Int. J. Synthetic Emotions 1(1): 68-99.
    • (2010) Int. J. Synthetic Emotions , vol.1 , Issue.1 , pp. 68-99
    • Gunes, H.1    Pantic, M.2
  • 43
    • 84856240372
    • MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
    • Gupta C S 2003 Significance of source features for speaker recognition. MS thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India.
    • (2003) Significance of Source Features For Speaker Recognition
    • Gupta, C.S.1
  • 47
    • 0028712434
    • Neural-network-based F0 text-to-speech synthesizer for Mandarin
    • Hwang S H, Chen S H 1994 Neural-network-based F0 text-to-speech synthesizer for Mandarin. IEE Proc. Image Signal Process. 141(6): 384-390.
    • (1994) IEE Proc. Image Signal Process. , vol.141 , Issue.6 , pp. 384-390
    • Hwang, S.H.1    Chen, S.H.2
  • 48
    • 0028996978
    • A prosodic model for mandarin speech and its application to pitch level generation for text-to-speech
    • Hwang S-H, Chen S-H 1995 A prosodic model for mandarin speech and its application to pitch level generation for text-to-speech. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 1: 616-619.
    • (1995) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 616-619
    • Hwang, S.-H.1    Chen, S.-H.2
  • 50
    • 77950073346
    • Spoken emotion recognition through optimum-path forest classification using glottal features
    • Iliev A I, Scordilis M S, Papa J P, Falcão A X 2010 Spoken emotion recognition through optimum-path forest classification using glottal features. Comput. Speech Lang. 24(3): 445-460.
    • (2010) Comput. Speech Lang. , vol.24 , Issue.3 , pp. 445-460
    • Iliev, A.I.1    Scordilis, M.S.2    Papa, J.P.3    Falcão, A.X.4
  • 52
    • 85095823619
    • A method of classification among Japanese dialects
    • Itahashi S, Tanaka K 1993 A method of classification among Japanese dialects. Proc. Eurospeech 1: 639-642.
    • (1993) Proc. Eurospeech , vol.1 , pp. 639-642
    • Itahashi, S.1    Tanaka, K.2
  • 53
    • 0016467604
    • Minimum prediction residual principle applied to speech recognition
    • Itakura F 1975 Minimum prediction residual principle applied to speech recognition. IEEE Trans. Speech Audio Process. 23(1): 67-72.
    • (1975) IEEE Trans. Speech Audio Process. , vol.23 , Issue.1 , pp. 67-72
    • Itakura, F.1
  • 54
    • 4444285698
    • PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA
    • Kain A 2001 High resolution voice transformation. PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, USA.
    • (2001) High Resolution Voice Transformation
    • Kain, A.1
  • 63
    • 14644439843
    • Toward detecting emotions in spoken dialogs
    • Lee C M, Narayanan S 2005a Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2): 293-303.
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.2 , pp. 293-303
    • Lee, C.M.1    Narayanan, S.2
  • 65
    • 38149065136
    • Statistical approach for voice personality transformation
    • Lee K 2007 Statistical approach for voice personality transformation. IEEE Trans. Audio Speech Lang. Process. 15: 641-651.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , pp. 641-651
    • Lee, K.1
  • 68
    • 51449108623
    • Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters
    • Lugger M, Yang B 2008 Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 4: 4945-4948.
    • (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 4945-4948
    • Lugger, M.1    Yang, B.2
  • 69
    • 52949094265
    • Extraction and representation of prosodic features for language and speaker recognition
    • Mary L, Yegnanarayana B 2008 Extraction and representation of prosodic features for language and speaker recognition. Speech Commun. 50(10): 782-796.
    • (2008) Speech Commun. , vol.50 , Issue.10 , pp. 782-796
    • Mary, L.1    Yegnanarayana, B.2
  • 71
    • 84874424790
    • An efficient method to compute lsfs from lpc coefficients
    • Mei X, Sun S 2000 An efficient method to compute lsfs from lpc coefficients. In ICSP-2000, pp. 655-658.
    • (2000) ICSP-2000 , pp. 655-658
    • Mei, X.1    Sun, S.2
  • 72
    • 85009154226
    • In Proc. European Conf. Speech Communication and Technology, Aalborg, Denmark
    • Mixdorff H, Jokisch O 2001 Building an integrated prosodic model of German. In Proc. European Conf. Speech Communication and Technology, vol. 2, Aalborg, Denmark, pp. 947-950.
    • (2001) Building An Integrated Prosodic Model of German , vol.2 , pp. 947-950
    • Mixdorff, H.1    Jokisch, O.2
  • 73
    • 0000668614
    • Robustness of group-delay-based method for extraction of significant excitation from speech signals
    • Murthy P S, Yegnanarayana B 1999 Robustness of group-delay-based method for extraction of significant excitation from speech signals. IEEE Trans. Speech Audio Process. 7: 609-619.
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 609-619
    • Murthy, P.S.1    Yegnanarayana, B.2
  • 74
    • 0029254176
    • Transformation of formants for voice conversion using artificial neural networks
    • Narendranadh M, Murthy H A, Rajendran S, Yegnanarayana B 1995 Transformation of formants for voice conversion using artificial neural networks. Speech Commun. 16(2): 206-216.
    • (1995) Speech Commun. , vol.16 , Issue.2 , pp. 206-216
    • Narendranadh, M.1    Murthy, H.A.2    Rajendran, S.3    Yegnanarayana, B.4
  • 76
    • 0242721417
    • Speech emotion recognition using hidden Markov models
    • Nwe T L, Foo S W, Silva L C D 2003 Speech emotion recognition using hidden Markov models. Speech Commun. 41(4): 603-623.
    • (2003) Speech Commun. , vol.41 , Issue.4 , pp. 603-623
    • Nwe, T.L.1    Foo, S.W.2    Silva, L.C.D.3
  • 79
    • 57049108119
    • (eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education)
    • Pantic M, Bartlett M 2007 Machine analysis of facial expressions. In (eds) K Delac, M Grgic, Face recognition (Vienna: I-Tech Education) pp. 377-416.
    • (2007) Machine Analysis of Facial Expressions , pp. 377-416
    • Pantic, M.1    Bartlett, M.2
  • 80
    • 2942590310
    • Toward an affect-sensitive multimodal human-computer interaction
    • Pantic M, Rothkrantz L J M 2003 Toward an affect-sensitive multimodal human-computer interaction. Proc. IEEE 91: 1370-1390.
    • (2003) Proc. IEEE , vol.91 , pp. 1370-1390
    • Pantic, M.1    Rothkrantz, L.J.M.2
  • 83
    • 33745205178
    • PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
    • Prasanna S R M 2004 Event-based analysis of speech. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India.
    • (2004) Event-based Analysis of Speech
    • Prasanna, S.R.M.1
  • 88
    • 79953168002
    • Application of prosody models for developing speech systems in Indian languages
    • Rao K S 2011 Application of prosody models for developing speech systems in Indian languages. Int. J. Speech Technol. 14: 19-33.
    • (2011) Int. J. Speech Technol. , vol.14 , pp. 19-33
    • Rao, K.S.1
  • 89
    • 54049142844
    • Voice conversion by mapping the speaker-specific features using pitch synchronous approach
    • Rao K S 2009 Voice conversion by mapping the speaker-specific features using pitch synchronous approach. Comput. Speech Lang. 23(2): 240-256.
    • (2009) Comput. Speech Lang. , vol.23 , Issue.2 , pp. 240-256
    • Rao, K.S.1
  • 94
    • 34047248058
    • Prosody modification using instants of significant excitation
    • Rao K S, Yegnanarayana B 2006a Prosody modification using instants of significant excitation. IEEE Trans. Audio Speech Lang. Process. 14(3): 972-980.
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.3 , pp. 972-980
    • Rao, K.S.1    Yegnanarayana, B.2
  • 96
    • 33750713338
    • Modeling durations of syllables using neural networks
    • Rao K S, Yegnanarayana B 2007 Modeling durations of syllables using neural networks. Comput. Speech Lang. 21: 282-295.
    • (2007) Comput. Speech Lang. , vol.21 , pp. 282-295
    • Rao, K.S.1    Yegnanarayana, B.2
  • 97
    • 54049142844
    • Intonation modeling for Indian languages
    • Rao K S, Yegnanarayana B 2009 Intonation modeling for Indian languages. Comput. Speech Lang. 23: 240-256.
    • (2009) Comput. Speech Lang. , vol.23 , pp. 240-256
    • Rao, K.S.1    Yegnanarayana, B.2
  • 98
    • 0002069313
    • In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam
    • Riley M 1992 Tree-based modeling of segmental durations. In Talking machines: Theories, models and designs, Elsevier Publishers, Amsterdam, pp. 265-273.
    • (1992) Tree-based Modeling of Segmental Durations , pp. 265-273
    • Riley, M.1
  • 99
    • 0028405296
    • Assignment of segment duration in text-to-speech synthesis
    • Santen J P H V 1994 Assignment of segment duration in text-to-speech synthesis. Comput. Speech Lang. 8: 95-128.
    • (1994) Comput. Speech Lang. , vol.8 , pp. 95-128
    • Santen, J.P.H.V.1
  • 107
    • 0029375490
    • Determination of instants of significant excitation in speech using group delay function
    • Smits R, Yegnanarayana B 1995 Determination of instants of significant excitation in speech using group delay function. IEEE Trans. Speech Audio Process. 3(5): 325-333.
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 325-333
    • Smits, R.1    Yegnanarayana, B.2
  • 109
    • 0026953356
    • Feedback stabilization using two hidden layer nets
    • Sontag E D 1992 Feedback stabilization using two hidden layer nets. IEEE Trans. Neural Networks 3: 981-990.
    • (1992) IEEE Trans. Neural Networks , vol.3 , pp. 981-990
    • Sontag, E.D.1
  • 117
    • 0034008810
    • Analysis and synthesis of intonation using the Tilt model
    • Taylor P A 2000 Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107(3): 1697-1714.
    • (2000) J. Acoust. Soc. Am. , vol.107 , Issue.3 , pp. 1697-1714
    • Taylor, P.A.1
  • 121
    • 0034842552
    • Voice conversion algorithm based on gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum
    • Toda T, Saruwatari H, Shikano K 2001 Voice conversion algorithm based on gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. Proc. IEEE Int. Conf. Acoust. Speech Signal Process. 2: 841-844.
    • (2001) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 841-844
    • Toda, T.1    Saruwatari, H.2    Shikano, K.3
  • 125
    • 77950029784
    • PhD thesis, Institute for Graduate Studies in Science and Engineering, Bogazici University, Berlin, Germany
    • Turk O 2007 Cross-lingual voice conversion. PhD thesis, Institute for Graduate Studies in Science and Engineering, Bogazici University, Berlin, Germany.
    • (2007) Cross-lingual Voice Conversion
    • Turk, O.1
  • 127
    • 33746653351
    • Robust processing techniques for voice conversion
    • Turk O, Arslan L M 2006 Robust processing techniques for voice conversion. Comput. Speech Lang. 20: 441-467.
    • (2006) Comput. Speech Lang. , vol.20 , pp. 441-467
    • Turk, O.1    Arslan, L.M.2
  • 128
    • 77953699443
    • Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques
    • Turk O, Schroder M 2010 Evaluation of expressive speech synthesis with voice conversion and copy re-synthesis techniques. IEEE Trans. Speech Audio Process. 18(5): 965-973.
    • (2010) IEEE Trans. Speech Audio Process. , vol.18 , Issue.5 , pp. 965-973
    • Turk, O.1    Schroder, M.2
  • 131
  • 133
    • 33746410556
    • Emotional speech recognition: Resources, features, and methods
    • Ververidis D, Kotropoulos C 2006a Emotional speech recognition: Resources, features, and methods. Speech Commun. 48: 1162-1181.
    • (2006) Speech Commun. , vol.48 , pp. 1162-1181
    • Ververidis, D.1    Kotropoulos, C.2
  • 142
    • 0035989168
    • AANN an alternative to GMM for pattern recognition
    • Yegnanarayana B, Kishore S P 2002 AANN an alternative to GMM for pattern recognition. Neural Networks 15: 459-469.
    • (2002) Neural Networks , vol.15 , pp. 459-469
    • Yegnanarayana, B.1    Kishore, S.P.2
  • 145
    • 0032121729
    • Extraction of vocal-tract system characteristics from speech signals
    • Yegnanarayana B, Veldhuis R N J 1998 Extraction of vocal-tract system characteristics from speech signals. IEEE Trans. Speech Audio Process. 6(4): 313-327.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.4 , pp. 313-327
    • Yegnanarayana, B.1    Veldhuis, R.N.J.2


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.