Volume 17, Issue 6, 2009, Pages 1208-1230

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

Author keywords

Average voice; HMM Speech Synthesis System; HMM based speech synthesis; HTS; speaker adaptation; speech synthesis; voice conversion


EID: 85008006694     PISSN: 15587916     EISSN: 15587924     Source Type: Journal    
DOI: 10.1109/TASL.2009.2016394     Document Type: Article
Times cited: 161

References (77)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • Sep.
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. EUROSPEECH-99, Sep. 1999, pp. 2347–2350.
    • (1999) Proc. EUROSPEECH-99 , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 2
    • 7044242284
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • in Japanese, Nov.
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J83-D-II, no. 11, pp. 2099–2107, Nov. 2000.
    • (2000) IEICE Trans. , vol.J83-D-II , Issue.11 , pp. 2099-2107
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 3
    • 34547526960
    • Statistical parametric speech synthesis
    • Apr.
    • A. Black, H. Zen, and K. Tokuda, “Statistical parametric speech synthesis,” in Proc. ICASSP 2007, Apr. 2007, pp. 1229–1232.
    • (2007) Proc. ICASSP 2007 , pp. 1229-1232
    • Black, A.1    Zen, H.2    Tokuda, K.3
  • 5
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • May
    • K. Tokuda, T. Kobayashi, and S. Imai, “Speech parameter generation from HMM using dynamic features,” in Proc. ICASSP-95, May 1995, pp. 660–663.
    • (1995) Proc. ICASSP-95 , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 6
    • 0038582234
    • An algorithm for speech parameter generation from HMM using dynamic features
    • in Japanese, Mar.
    • K. Tokuda, T. Masuko, T. Kobayashi, and S. Imai “An algorithm for speech parameter generation from HMM using dynamic features,” (in Japanese) J. Acoust. Soc. Japan, vol. 53, no. 3, pp. 192–200, Mar. 1997.
    • (1997) J. Acoust. Soc. Japan , vol.53 , Issue.3 , pp. 192-200
    • Tokuda, K.1    Masuko, T.2    Kobayashi, T.3    Imai, S.4
  • 7
    • 0029725605
    • Speech synthesis using HMMs with dynamic features
    • May
    • T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, “Speech synthesis using HMMs with dynamic features,” in Proc. ICASSP-96, May 1996, pp. 389–392.
    • (1996) Proc. ICASSP-96 , pp. 389-392
    • Masuko, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 8
    • 0002025578
    • HMM-based speech synthesis using dynamic features
    • in Japanese, Dec.
    • T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai “HMM-based speech synthesis using dynamic features,” (in Japanese) IEICE Trans., vol. J79-D-II, no. 12, pp. 2184–2190, Dec. 1996.
    • (1996) IEICE Trans. , vol.J79-D-II , Issue.12 , pp. 2184-2190
    • Masuko, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 9
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • Jun.
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. ICASSP 2000, Jun. 2000, pp. 1315–1318.
    • (2000) Proc. ICASSP 2000 , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 10
    • 0036522887
    • Multi-space probability distribution HMM
    • Mar.
    • K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Multi-space probability distribution HMM,” IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455–464, Mar. 2002.
    • (2002) IEICE Trans. Inf. Syst. , vol.E85-D , Issue.3 , pp. 455-464
    • Tokuda, K.1    Masuko, T.2    Miyazaki, N.3    Kobayashi, T.4
  • 11
    • 44449177634
    • A hidden semi-Markov model-based speech synthesis system
    • May
    • H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “A hidden semi-Markov model-based speech synthesis system,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825–834, May 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
    • Zen, H.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 13
    • 0022234383
    • Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition
    • Mar.
    • M. Russell and R. Moore, “Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition,” in Proc. ICASSP-85, Mar. 1985, pp. 5–8.
    • (1985) Proc. ICASSP-85 , pp. 5-8
    • Russell, M.1    Moore, R.2
  • 14
    • 0022685753
    • Continuously variable duration hidden Markov models for automatic speech recognition
    • S. Levinson, “Continuously variable duration hidden Markov models for automatic speech recognition,” Comput. Speech Lang., vol. 1, no. 1, pp. 29–45, 1986.
    • (1986) Comput. Speech Lang. , vol.1 , Issue.1 , pp. 29-45
    • Levinson, S.1
  • 15
    • 0029341719
    • A mixed excitation LPC vocoder model for low bit rate speech coding
    • Jul.
    • A. McCree and T. Barnwell, III “A mixed excitation LPC vocoder model for low bit rate speech coding,” IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 242–250, Jul. 1995.
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.4 , pp. 242-250
    • McCree, A.1    Barnwell, T.2
  • 16
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • Sep.
    • H. Kawahara, J. Estill, and O. Fujimura, “Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,” in Proc. 2nd MAVEBA, Sep. 2001, pp. 13–15.
    • (2001) Proc. 2nd MAVEBA , pp. 13-15
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 19
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • in Japanese, Aug.
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis,” (in Japanese) IEICE Trans., vol. J87-D-II, no. 8, pp. 1565–1571, Aug. 2004.
    • (2004) IEICE Trans. , vol.J87-D-II , Issue.8 , pp. 1565-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 20
    • 33846405723
    • Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • Jan.
    • H. Zen, T. Toda, M. Nakamura, and K. Tokuda, “Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005,” IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325–333, Jan. 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, K.4
  • 21
    • 34547542349
    • Improving Arabic HMM based speech synthesis quality
    • Sep.
    • A.-H. Ossama, A. S. Mahdy, and R. Mohsen, “Improving Arabic HMM based speech synthesis quality,” in Proc. Interspeech 2006, Sep. 2006, pp. 1332–1335.
    • (2006) Proc. Interspeech 2006 , pp. 1332-1335
    • Ossama, A.-H.1    Mahdy, A.S.2    Mohsen, R.3
  • 22
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • May
    • T. Toda and K. Tokuda, “A speech parameter generation algorithm considering global variance for HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816–824, May 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 23
    • 68249104241
    • The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006
    • Jun.
    • H. Zen, T. Toda, and K. Tokuda, “The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006,” IEICE Trans. Inf. Syst., vol. E91-D, no. 6, pp. 1764–1773, Jun. 2008.
    • (2008) IEICE Trans. Inf. Syst. , vol.E91-D , Issue.6 , pp. 1764-1773
    • Zen, H.1    Toda, T.2    Tokuda, K.3
  • 25
    • 77953693469
    • Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007
    • Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 003
    • J. Yamagishi, H. Zen, T. Toda, and K. Tokuda, “Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 003.
    • (2007) Proc. BLZ3-2007 (in Proc. SSW6)
    • Yamagishi, J.1    Zen, H.2    Toda, T.3    Tokuda, K.4
  • 26
    • 33745216749
    • The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets
    • Sep.
    • A. Black and K. Tokuda, “The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets,” in Proc. Eurospeech 2005, Sep. 2005, pp. 77–80.
    • (2005) Proc. Eurospeech 2005 , pp. 77-80
    • Black, A.1    Tokuda, K.2
  • 27
    • 68249083782
    • The blizzard challenge 2006
    • Sep., [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
    • C. Bennett and A. Black, “The blizzard challenge 2006,” in Proc. Blizzard Challenge 2006, Sep. 2006 [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
    • (2006) Proc. Blizzard Challenge 2006
    • Bennett, C.1    Black, A.2
  • 28
    • 79952269421
    • The Blizzard Challenge 2007
    • Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001
    • M. Fraser and S. King, “The Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001.
    • (2007) Proc. BLZ3-2007 (in Proc. SSW6)
    • Fraser, M.1    King, S.2
  • 29
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun., vol. 27, pp. 187–207, 1999.
    • (1999) Speech Commun. , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    Cheveigne, A.3
  • 30
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • Mar.
    • M. Gales “Semi-tied covariance matrices for hidden Markov models,” IEEE Trans. Speech Audio Process., vol. 7, pp. 272–281, Mar. 1999.
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 272-281
    • Gales, M.1
  • 31
    • 84892187452
    • Maximum likelihood modeling with Gaussian distributions for classification
    • May
    • R. Gopinath, “Maximum likelihood modeling with Gaussian distributions for classification,” in Proc. ICASSP-98, May 1998, pp. 661–664.
    • (1998) Proc. ICASSP-98 , pp. 661-664
    • Gopinath, R.1
  • 32
    • 33847129573
    • Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
    • Feb.
    • J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,” IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533–543, Feb. 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
    • Yamagishi, J.1    Kobayashi, T.2
  • 33
    • 67650854725
    • Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
    • Jan. 2009
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, “Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66–83, Jan. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
    • Yamagishi, J.1    Kobayashi, T.2    Nakano, Y.3    Ogata, K.4    Isogai, J.5
  • 35
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. Leggetter and P. Woodland “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Comput. Speech Lang., vol. 9, no. 2, pp. 171–185, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.1    Woodland, P.2
  • 36
    • 0034842740
    • Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    • May
    • M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR,” in Proc. ICASSP-01, May 2001, pp. 805–808.
    • (2001) Proc. ICASSP-01 , pp. 805-808
    • Tamura, M.1    Masuko, T.2    Tokuda, K.3    Kobayashi, T.4
  • 37
    • 85008066911
    • Speaker adaptation of pitch and spectrum for HMM-based speech synthesis
    • in Japanese, Apr.
    • M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Speaker adaptation of pitch and spectrum for HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J85-D-II, no. 4, pp. 545–553, Apr. 2002.
    • (2002) IEICE Trans. , vol.J85-D-II , Issue.4 , pp. 545-553
    • Tamura, M.1    Masuko, T.2    Tokuda, K.3    Kobayashi, T.4
  • 39
    • 0142007308
    • A training method of average voice model for HMM-based speech synthesis
    • Aug.
    • J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “A training method of average voice model for HMM-based speech synthesis,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 1956–1963, Aug. 2003.
    • (2003) IEICE Trans. Fundamentals , vol.E86-A , Issue.8 , pp. 1956-1963
    • Yamagishi, J.1    Tamura, M.2    Masuko, T.3    Tokuda, K.4    Kobayashi, T.5
  • 40
    • 33645768204
    • A style adaptation technique for speech synthesis using HSMM and suprasegmental features
    • Mar.
    • M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, “A style adaptation technique for speech synthesis using HSMM and suprasegmental features,” IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1092–1099, Mar. 2006.
    • (2006) IEICE Trans. Inf. Syst. , vol.E89-D , Issue.3 , pp. 1092-1099
    • Tachibana, M.1    Yamagishi, J.2    Masuko, T.3    Kobayashi, T.4
  • 42
    • 33748468338
    • New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer
    • J. Latorre, K. Iwano, and S. Furui, “New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer,” Speech Commun., vol. 48, no. 10, pp. 1227–1242, 2006.
    • (2006) Speech Commun. , vol.48 , Issue.10 , pp. 1227-1242
    • Latorre, J.1    Iwano, K.2    Furui, S.3
  • 43
    • 0030189744
    • Speaker adaptation using combined transformation and Bayesian methods
    • Jul.
    • V. Digalakis and L. Neumeyer “Speaker adaptation using combined transformation and Bayesian methods,” IEEE Trans. Speech Audio Process., vol. 4, pp. 294–300, Jul. 1996.
    • (1996) IEEE Trans. Speech Audio Process. , vol.4 , pp. 294-300
    • Digalakis, V.1    Neumeyer, L.2
  • 44
    • 0035279111
    • A structural Bayes approach to speaker adaptation
    • Mar.
    • K. Shinoda and C. Lee, “A structural Bayes approach to speaker adaptation,” IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 276–287, Mar. 2001.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.3 , pp. 276-287
    • Shinoda, K.1    Lee, C.2
  • 45
    • 11144317887
    • Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
    • Dec.
    • D. Arifianto, T. Tanaka, T. Masuko, and T. Kobayashi, “Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency,” IEICE Trans. Inf. Syst., vol. E87-D, no. 12, pp. 2812–2820, Dec. 2004.
    • (2004) IEICE Trans. Inf. Syst. , vol.E87-D , Issue.12 , pp. 2812-2820
    • Arifianto, D.1    Tanaka, T.2    Masuko, T.3    Kobayashi, T.4
  • 46
    • 84928118106
    • Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
    • Sep.
    • H. Kawahara, H. Katayose, A. Cheveigne, and R. Patterson, “Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity,” in Proc. Eurospeech 1999, Sep. 1999, pp. 2781–2784.
    • (1999) Proc. Eurospeech 1999 , pp. 2781-2784
    • Kawahara, H.1    Katayose, H.2    Cheveigne, A.3    Patterson, R.4
  • 47
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • W. Kleijn and K. Paliwal, Eds. New York: Elsevier
    • D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. New York: Elsevier, 1995, pp. 495–518.
    • (1995) Speech Coding and Synthesis , pp. 495-518
    • Talkin, D.1
  • 49
    • 84966348891
    • An HMM-based speech synthesis system applied to English
    • Sep.
    • K. Tokuda, H. Zen, and A. Black, “An HMM-based speech synthesis system applied to English,” in Proc. IEEE Speech Synth. Workshop, Sep. 2002, pp. 227–230.
    • (2002) Proc. IEEE Speech Synth. Workshop , pp. 227-230
    • Tokuda, K.1    Zen, H.2    Black, A.3
  • 50
    • 0002985991
    • Mora and syllable
    • N. Tsujimura, Ed. Chichester, U.K.: Blackwell
    • H. Kubozono, “Mora and syllable,” in The handbook of Japanese Linguistics, N. Tsujimura, Ed. Chichester, U.K.: Blackwell, 1995, pp. 31–61.
    • (1995) The handbook of Japanese Linguistics , pp. 31-61
    • Kubozono, H.1
  • 51
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. Gales “Maximum likelihood linear transformations for HMM-based speech recognition,” Comput. Speech Lang., vol. 12, no. 2, pp. 75–98, 1998.
    • (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.1
  • 52
    • 0029375590
    • Speaker adaptation using constrained reestimation of Gaussian mixtures
    • Sep.
    • V. Digalakis, D. Rtischev, and L. Neumeyer “Speaker adaptation using constrained reestimation of Gaussian mixtures,” IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357–366, Sep. 1995.
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 357-366
    • Digalakis, V.1    Rtischev, D.2    Neumeyer, L.3
  • 53
    • 85008042245
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. Dempster, N. Laird, and D. Rubin “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977.
    • (1977) J. R. Statist. Soc., Series B , vol.39 , Issue.1 , pp. 1-38
    • Dempster, A.1    Laird, N.2    Rubin, D.3
  • 54
    • 24144497811
    • Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis
    • Mar.
    • J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, “Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 503–509, Mar. 2005.
    • (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 503-509
    • Yamagishi, J.1    Onishi, K.2    Masuko, T.3    Kobayashi, T.4
  • 55
    • 0033906251
    • MDL-based context-dependent subword modeling for speech recognition
    • Mar.
    • K. Shinoda and T. Watanabe, “MDL-based context-dependent subword modeling for speech recognition,” J. Acoust. Soc. Japan (E), vol. 21, pp. 79–86, Mar. 2000.
    • (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
    • Shinoda, K.1    Watanabe, T.2
  • 56
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines and F. Charpentier “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Commun., vol. 9, no. 5–6, pp. 453–468, 1990.
    • (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-468
    • Moulines, E.1    Charpentier, F.2
  • 57
    • 44949143155
    • Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
    • Sep.
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, “Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation,” in Proc. Interspeech 2006, Sep. 2006, pp. 2266–2269.
    • (2006) Proc. Interspeech 2006 , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 59
    • 34547529978
    • Model adaptation approach to speech synthesis with diverse voices and styles
    • Apr.
    • J. Yamagishi, T. Kobayashi, M. Tachibana, K. Ogata, and Y. Nakano, “Model adaptation approach to speech synthesis with diverse voices and styles,” in Proc. ICASSP-07, Apr. 2007, pp. 1233–1236.
    • (2007) Proc. ICASSP-07 , pp. 1233-1236
    • Yamagishi, J.1    Kobayashi, T.2    Tachibana, M.3    Ogata, K.4    Nakano, Y.5
  • 62
    • 0037278070
    • An efficient forward-backward algorithm for an explicit-duration hidden Markov model
    • Jan.
    • S.-Z. Yu and H. Kobayashi, “An efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Signal Process. Lett., vol. 10, no. 1, pp. 11–14, Jan. 2003.
    • (2003) IEEE Signal Process. Lett. , vol.10 , Issue.1 , pp. 11-14
    • Yu, S.-Z.1    Kobayashi, H.2
  • 66
    • 67650832556
    • Statistical analysis of the Blizzard Challenge 2007 listening test results
    • Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003
    • R. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, “Statistical analysis of the Blizzard Challenge 2007 listening test results,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003.
    • (2007) Proc. BLZ3-2007 (in Proc. SSW6)
    • Clark, R.1    Podsiadlo, M.2    Fraser, M.3    Mayo, C.4    King, S.5
  • 67
    • 85008031526
    • The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007
    • Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017
    • Z.-H. Ling, L. Qin, H. Lu, Y. Gao, L.-R. Dai, R.-H. Wang, Y. Jiang, Z.-W. Zhao, J.-H.Y.J. Chen, and G.-P. Hu, “The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017.
    • (2007) Proc. BLZ3-2007 (in Proc. SSW6)
    • Ling, Z.-H.1    Qin, L.2    Lu, H.3    Gao, Y.4    Dai, L.-R.5    Wang, R.-H.6    Jiang, Y.7    Zhao, Z.-W.8    Chen, J.-H.Y.J.9    Hu, G.-P.10
  • 68
    • 51449101140
    • Festival Multisyn voices for the 2007 Blizzard Challenge
    • Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006
    • K. Richmond, V. Strom, R. Clark, J. Yamagishi, and S. Fitt, “Festival Multisyn voices for the 2007 Blizzard Challenge,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006.
    • (2007) Proc. BLZ3-2007 (in Proc. SSW6)
    • Richmond, K.1    Strom, V.2    Clark, R.3    Yamagishi, J.4    Fitt, S.5
  • 69
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • May
    • A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. ICASSP-96, May 1996, pp. 373–376.
    • (1996) Proc. ICASSP-96 , pp. 373-376
    • Hunt, A.1    Black, A.2
  • 70
    • 34547503417
    • HMM-based unit selection using frame sized speech segments
    • Sep.
    • Z.-H. Ling and R.-H. Wang, “HMM-based unit selection using frame sized speech segments,” in Proc. Interspeech 2006, Sep. 2006, pp. 2034–2037.
    • (2006) Proc. Interspeech 2006 , pp. 2034-2037
    • Ling, Z.-H.1    Wang, R.-H.2
  • 71
    • 34547612590
    • HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion
    • Apr.
    • Z.-H. Ling and R.-H. Wang, “HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion,” in Proc. ICASSP-07, Apr. 2007, pp. 1245–1248.
    • (2007) Proc. ICASSP-07 , pp. 1245-1248
    • Ling, Z.-H.1    Wang, R.-H.2
  • 72
    • 34047123652
    • Multisyn: Open-domain unit selection for the Festival speech synthesis system
    • R. A. J. Clark, K. Richmond, and S. King, “Multisyn: Open-domain unit selection for the Festival speech synthesis system,” Speech Commun., vol. 49, no. 4, pp. 317–330, 2007.
    • (2007) Speech Commun. , vol.49 , Issue.4 , pp. 317-330
    • Clark, R.A.J.1    Richmond, K.2    King, S.3
  • 73
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • May, [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
    • Y. Wu and R.-H. Wang, “Minimum generation error training for HMM-based speech synthesis,” in Proc. ICASSP-06, May 2006, pp. 89–92 [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
    • (2006) Proc. ICASSP-06 , pp. 89-92
    • Wu, Y.1    Wang, R.-H.2
  • 74
    • 0030166343
    • The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
    • C. Benoit, M. Grice, and V. Hazan, “The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences,” Speech Commun., vol. 18, no. 4, pp. 381–392, 1996.
    • (1996) Speech Commun. , vol.18 , Issue.4 , pp. 381-392
    • Benoit, C.1    Grice, M.2    Hazan, V.3
  • 75
    • 85030493378
    • Synthesis of regional English using a keyword lexicon
    • Sep.
    • S. Fitt and S. Isard, “Synthesis of regional English using a keyword lexicon,” in Proc. Eurospeech 1999, Sep. 1999, vol. 2, pp. 823–826.
    • (1999) Proc. Eurospeech 1999 , vol.2 , pp. 823-826
    • Fitt, S.1    Isard, S.2
  • 76
    • 70449126171
    • The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
    • Sep.
    • J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, “The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge,” in Proc. Blizzard Challenge 2008, Sep. 2008.
    • (2008) Proc. Blizzard Challenge 2008
    • Yamagishi, J.1    Zen, H.2    Wu, Y.-J.3    Toda, T.4    Tokuda, K.5
  • 77
    • 67650803663
    • Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning
    • Feb., [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
    • M. Aylett and J. Yamagishi, “Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning,” in Proc. LangTech 2008, Feb. 2008 [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
    • (2008) Proc. LangTech 2008
    • Aylett, M.1    Yamagishi, J.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.