Volume 18, Issue 5, 2010, Pages 965-973

Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques

Author keywords

Expressive speech synthesis; Prosody; Voice conversion; Voice quality transformation

Indexed keywords

APPROXIMATE MODEL; COMBINED MODELING; EXPRESSIVE SPEECH; EXPRESSIVE SPEECH SYNTHESIS; FACTORIAL DESIGN; LISTENING TESTS; OPEN SOURCES; RELATIVE CONTRIBUTION; SIGNAL MANIPULATION; SYNTHETIC SPEECH; TEXT TO SPEECH; TRANSFORMATION ALGORITHM; UNIT SELECTION; VOCAL-TRACTS; VOICE CONVERSION; VOICE QUALITY;

EID: 77953699443     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2010.2041113     Document Type: Article
Times cited : (59)

References (47)
  • 3
    • 84966398940
    • Optimising selection of units from speech databases for concatenative synthesis
    • Madrid, Spain
    • A. W. Black and N. Campbell, "Optimising selection of units from speech databases for concatenative synthesis," in Proc. Eurospeech, Madrid, Spain, 1995, pp. 581-584.
    • (1995) Proc. Eurospeech , pp. 581-584
    • Black, A.W.1    Campbell, N.2
  • 5
    • 0142153901
    • Speech database design for a concatenative text-to-speech synthesis system for individuals with communication disorders
    • A. Iida and N. Campbell, "Speech database design for a concatenative text-to-speech synthesis system for individuals with communication disorders," Int. J. Speech Technol., vol.6, pp. 379-392, 2003.
    • (2003) Int. J. Speech Technol. , vol.6 , pp. 379-392
    • Iida, A.1    Campbell, N.2
  • 6
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • Budapest, Hungary
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, Budapest, Hungary, 1999.
    • (1999) Proc. Eurospeech
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 8
    • 34547529978
    • Model adaptation approach to speech synthesis with diverse voices and styles
    • Honolulu, Hawaii
    • J. Yamagishi, T. Kobayashi, M. Tachibana, K. Ogata, and Y. Nakano, "Model adaptation approach to speech synthesis with diverse voices and styles," in Proc. ICASSP, Honolulu, Hawaii, 2007, pp. 1233-1236.
    • (2007) Proc. ICASSP , pp. 1233-1236
    • Yamagishi, J.1    Kobayashi, T.2    Tachibana, M.3    Ogata, K.4    Nakano, Y.5
  • 9
    • 51449098017
    • Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis
    • Las Vegas, NV
    • M. Tachibana, S. Izawa, T. Nose, and T. Kobayashi, "Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis," in Proc. ICASSP, Las Vegas, NV, pp. 4633-4636.
    • Proc. ICASSP , pp. 4633-4636
    • Tachibana, M.1    Izawa, S.2    Nose, T.3    Kobayashi, T.4
  • 10
    • 85009069226
    • A style control technique for HMM-based speech synthesis
    • Jeju, Korea
    • K. Miyanaga, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based speech synthesis," in Proc. ICSLP, Jeju, Korea, 2004.
    • (2004) Proc. ICSLP
    • Miyanaga, K.1    Masuko, T.2    Kobayashi, T.3
  • 11
    • 34547529063
    • A style control technique for speech synthesis using multiple regression HSMM
    • Pittsburgh, PA, USA
    • T. Nose, J. Yamagishi, and T. Kobayashi, "A style control technique for speech synthesis using multiple regression HSMM," in Proc. INTERSPEECH 2006, Pittsburgh, PA, USA.
    • Proc. INTERSPEECH 2006
    • Nose, T.1    Yamagishi, J.2    Kobayashi, T.3
  • 14
    • 0033154052
    • Speaker transformation algorithm using segmental codebooks
    • L. M. Arslan, "Speaker transformation algorithm using segmental codebooks," Speech Commun., vol.28, pp. 211-226, 1999.
    • (1999) Speech Commun. , vol.28 , pp. 211-226
    • Arslan, L.M.1
  • 15
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Mar.
    • Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol.6, no.2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappe, O.2    Moulines, E.3
  • 16
    • 4444285698
    • Ph.D. dissertation, OGI School of Sci. and Eng., Oregon Health and Sci. Univ., Beaverton
    • A. B. Kain, "High resolution voice transformation," Ph.D. dissertation, OGI School of Sci. and Eng., Oregon Health and Sci. Univ., Beaverton, 2001.
    • (2001) High Resolution Voice Transformation
    • Kain, A.B.1
  • 17
    • 77950029784
    • Ph.D. dissertation, Boğaziçi Univ., Istanbul, Turkey
    • O. Türk, "Cross-lingual voice conversion," Ph.D. dissertation, Boğaziçi Univ., Istanbul, Turkey, 2007.
    • (2007) Cross-lingual Voice Conversion
    • Türk, O.1
  • 19
    • 85135141647
    • Hidden Markov model based voice conversion using dynamic characteristics of speaker
    • E.-K. Kim, S. Lee, and Y.-H. Oh, "Hidden Markov model based voice conversion using dynamic characteristics of speaker," in Proc. Eurospeech, 1997, pp. 2519-2522.
    • (1997) Proc. Eurospeech , pp. 2519-2522
    • Kim, E.-K.1    Lee, S.2    Oh, Y.-H.3
  • 20
    • 85009250849
    • Subband based voice conversion
    • Denver, CO, Sep.
    • O. Türk and L. M. Arslan, "Subband based voice conversion," in Proc. ICSLP, Denver, CO, Sep. 2002, vol.1, pp. 289-292.
    • (2002) Proc. ICSLP, Denver , vol.1 , pp. 289-292
    • Türk, O.1    Arslan, L.M.2
  • 21
    • 70349207267
    • Application of voice conversion for cross-language rap singing transformation
    • Taipei, Taiwan, Apr.
    • O. Türk, O. Büyük, A. Haznedaroglu, and L. M. Arslan, "Application of voice conversion for cross-language rap singing transformation," in Proc. IEEE ICASSP, Taipei, Taiwan, Apr. 2009.
    • (2009) Proc. IEEE ICASSP
    • Türk, O.1    Büyük, O.2    Haznedaroglu, A.3    Arslan, L.M.4
  • 22
    • 84938935270
    • A system for transforming the emotion in speech: Combining data-driven conversion techniques for prosody and voice quality
    • Antwerp, Belgium, Aug. 27-31
    • Z. Inanoglu and S. J. Young, "A system for transforming the emotion in speech: Combining data-driven conversion techniques for prosody and voice quality," in Proc. Interspeech, Antwerp, Belgium, Aug. 27-31, 2007.
    • (2007) Proc. Interspeech
    • Inanoglu, Z.1    Young, S.J.2
  • 23
    • 70349200844
    • Voice conversion for various types of body transmitted speech
    • Taipei, Taiwan, Apr.
    • T. Toda, K. Nakamura, H. Sekimoto, and K. Shikano, "Voice conversion for various types of body transmitted speech," in Proc. IEEE ICASSP, Taipei, Taiwan, Apr. 2009.
    • (2009) Proc. IEEE ICASSP
    • Toda, T.1    Nakamura, K.2    Sekimoto, H.3    Shikano, K.4
  • 24
    • 84869508926
    • A voice conversion method based on joint pitch and spectral envelope transformation
    • Jeju, Korea
    • T. En-Najjary, O. Rosec, and T. Chonavel, "A voice conversion method based on joint pitch and spectral envelope transformation," in Proc. 8th Int. Conf. Spoken Lang. Process., Jeju, Korea, 2004.
    • (2004) Proc. 8th Int. Conf. Spoken Lang. Process.
    • En-Najjary, T.1    Rosec, O.2    Chonavel, T.3
  • 25
    • 84867219635
    • A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
    • Brisbane, Australia
    • O. Türk and M. Schröder, "A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis," in Proc. Interspeech, Brisbane, Australia, 2008, pp. 2282-2285.
    • (2008) Proc. Interspeech , pp. 2282-2285
    • Türk, O.1    Schröder, M.2
  • 26
    • 0032629673
    • Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis
    • Phoenix, AZ
    • Y. Stylianou, "Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis," in Proc. IEEE ICASSP, Phoenix, AZ, 1999.
    • (1999) Proc. IEEE ICASSP
    • Stylianou, Y.1
  • 27
    • 33646769932
    • Polyglot synthesis using a mixture of monolingual corpora
    • J. Latorre, K. Iwano, and S. Furui, "Polyglot synthesis using a mixture of monolingual corpora," in Proc. IEEE ICASSP, 2005, vol.1, pp. 1-4.
    • (2005) Proc. IEEE ICASSP , vol.1 , pp. 1-4
    • Latorre, J.1    Iwano, K.2    Furui, S.3
  • 29
    • 0027839344
    • Text-to-speech synthesis based on a MBE re-synthesis of segments database
    • T. Dutoit and H. Leich, "Text-to-speech synthesis based on a MBE re-synthesis of segments database," Speech Commun., vol.13, pp. 435-440.
    • Speech Commun. , vol.13 , pp. 435-440
    • Dutoit, T.1    Leich, H.2
  • 30
    • 85009141811
    • Improvement in corpus-based generation of f0 contours using generation process model for emotional speech synthesis
    • K. Hirose, "Improvement in corpus-based generation of f0 contours using generation process model for emotional speech synthesis," in Proc. Interspeech, 2004, pp. 1349-1352.
    • (2004) Proc. Interspeech , pp. 1349-1352
    • Hirose, K.1
  • 32
    • 39649107657
    • Content-based transformation of the expressivity in speech
    • Saarbrücken, Germany, Aug.
    • G. Beller and X. Rodet, "Content-based transformation of the expressivity in speech," in Proc. 16th Int. Congr. Phonetic Sci., Saarbrücken, Germany, Aug. 2007, pp. 2157-2160.
    • (2007) Proc. 16th Int. Congr. Phonetic Sci. , pp. 2157-2160
    • Beller, G.1    Rodet, X.2
  • 33
    • 33646791479
    • Prosody analysis and modeling for emotional speech synthesis
    • Mar.
    • D. Jiang, W. Zhang, L. Shen, and L. Cai, "Prosody analysis and modeling for emotional speech synthesis," in Proc. IEEE ICASSP, Mar. 2005, vol.1, pp. 281-284.
    • (2005) Proc. IEEE ICASSP , vol.1 , pp. 281-284
    • Jiang, D.1    Zhang, W.2    Shen, L.3    Cai, L.4
  • 35
    • 34547519038
    • A statistical approach for modeling prosody features using POS tags for emotional speech synthesis
    • Honolulu, HI, Apr.
    • M. Bulut, S. Lee, and S. Narayanan, "A statistical approach for modeling prosody features using POS tags for emotional speech synthesis," in Proc. IEEE ICASSP, Honolulu, HI, Apr. 2007, vol.4, pp. 1237-1240.
    • (2007) Proc. IEEE ICASSP , vol.4 , pp. 1237-1240
    • Bulut, M.1    Lee, S.2    Narayanan, S.3
  • 36
    • 33746653351
    • Robust processing techniques for voice conversion
    • O. Türk and L. M. Arslan, "Robust processing techniques for voice conversion," Comput. Speech Lang., vol.20, pp. 441-467, 2006.
    • (2006) Comput. Speech Lang. , vol.20 , pp. 441-467
    • Türk, O.1    Arslan, L.M.2
  • 37
    • 0009151070
    • Time-domain and frequency-domain techniques for prosodic modification of speech
    • Kleijn and Paliwal, Eds. Amsterdam, The Netherlands: Elsevier
    • E. Moulines and W. Verhelst, "Time-domain and frequency-domain techniques for prosodic modification of speech," in Speech Coding and Synthesis, Kleijn and Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995, pp. 519-555.
    • (1995) Speech Coding and Synthesis , pp. 519-555
    • Moulines, E.1    Verhelst, W.2
  • 39
    • 58149203393
    • Data-driven emotion conversion in spoken English
    • Mar.
    • Z. Inanoglu and S. Young, "Data-driven emotion conversion in spoken English," Speech Commun., vol.51, no.3, pp. 268-283, Mar. 2009.
    • (2009) Speech Commun. , vol.51 , Issue.3 , pp. 268-283
    • Inanoglu, Z.1    Young, S.2
  • 42
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. IEEE ICASSP, 1998, vol.1, pp. 285-288.
    • (1998) Proc. IEEE ICASSP , vol.1 , pp. 285-288
    • Kain, A.1    Macon, M.2
  • 43
    • 0026400231
    • Robust and efficient quantization of speech LSP parameters using structured vector quantizers
    • R. Laroia, N. Phamdo, and N. Farvardin, "Robust and efficient quantization of speech LSP parameters using structured vector quantizers," in Proc. IEEE ICASSP, 1991, pp. 641-644.
    • (1991) Proc. IEEE ICASSP , pp. 641-644
    • Laroia, R.1    Phamdo, N.2    Farvardin, N.3
  • 45
    • 0001884644
    • Individual comparisons by ranking methods
    • F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bull. 1, pp. 80-83, 1945.
    • (1945) Biometrics Bull. , vol.1 , pp. 80-83
    • Wilcoxon, F.1
  • 47
    • 0031624617
    • TDPSOLA versus harmonic plus noise model in diphone based speech synthesis
    • Seattle, WA
    • A. Syrdal, Y. Stylianou, L. Garrison, A. Conkie, and J. Schroeter, "TDPSOLA versus harmonic plus noise model in diphone based speech synthesis," in Proc. IEEE ICASSP, Seattle, WA, 1998, pp. 273-276.
    • (1998) Proc. IEEE ICASSP , pp. 273-276
    • Syrdal, A.1    Stylianou, Y.2    Garrison, L.3    Conkie, A.4    Schroeter, J.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.