



Volume 21, Issue 1, 2013, Pages 51-62

Personalized spectral and prosody conversion using frame-based codeword distribution and adaptive CRF

Author keywords

Conditional random field; frame alignment; principal component analysis; prosodic boundary; voice conversion

Indexed keywords

ALIGNMENT; FUNCTION EVALUATION; IMAGE SEGMENTATION; PRINCIPAL COMPONENT ANALYSIS; RANDOM PROCESSES; SPEECH COMMUNICATION; TELEPHONE SETS

EID: 84867950508     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2213247     Document Type: Article
Times cited : (15)

References (35)
  • 1
    • 85133674021
    • Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV
    • Aug
    • J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, and K. Tokuda, "Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV," in Proc. ISCA SSW6, Aug. 2007.
    • (2007) Proc. ISCA SSW6
    • Yamagishi, J.1    Kobayashi, T.2    Renals, S.3    King, S.4    Zen, H.5    Toda, T.6    Tokuda, K.7
  • 2
    • 67650854725
    • Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
    • Jan
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.1 , pp. 66-83
    • Yamagishi, J.1    Kobayashi, T.2    Nakano, Y.3    Ogata, K.4    Isogai, J.5
  • 4
    • 77953723062
    • Synthesis of child speech with HMM adaptation and voice conversion
    • Aug
    • O. Watts, J. Yamagishi, S. King, and K. Berkling, "Synthesis of child speech with HMM adaptation and voice conversion," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1005-1016, Aug. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.6 , pp. 1005-1016
    • Watts, O.1    Yamagishi, J.2    King, S.3    Berkling, K.4
  • 5
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • May 12-15 vol. 1
    • A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP'98, May 12-15, 1998, vol. 1, pp. 285-288.
    • (1998) Proc. ICASSP'98 , vol.1 , pp. 285-288
    • Kain, A.1    Macon, M.W.2
  • 6
    • 84994241109
    • Including dynamic and phonetic information in voice conversion systems
    • Jeju Island, South Korea
    • H. Duxans, A. Bonafonte, A. Kain, and J. van Santen, "Including dynamic and phonetic information in voice conversion systems," in Proc. ICSLP '04, Jeju Island, South Korea, 2004, pp. 5-8.
    • (2004) Proc. ICSLP '04 , pp. 5-8
    • Duxans, H.1    Bonafonte, A.2    Kain, A.3    Van Santen, J.4
  • 7
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Nov
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 8
    • 77952978184
    • Adaptive training for voice conversion based on eigenvoices
    • Jun
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Adaptive training for voice conversion based on eigenvoices," IEICE Trans. Inf. Syst., vol. E93-D, no. 6, pp. 1589-1598, Jun. 2010.
    • (2010) IEICE Trans. Inf. Syst , vol.E93-D , Issue.6 , pp. 1589-1598
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 9
    • 34548216761
    • Conversion function clustering and selection using linguistic and spectral information for emotional voice conversion
    • Sep
    • C.-C. Hsia, C.-H. Wu, and J.-Q. Wu, "Conversion function clustering and selection using linguistic and spectral information for emotional voice conversion," IEEE Trans. Comput., vol. 56, no. 9, pp. 1225-1233, Sep. 2007.
    • (2007) IEEE Trans. Comput , vol.56 , Issue.9 , pp. 1225-1233
    • Hsia, C.-C.1    Wu, C.-H.2    Wu, J.-Q.3
  • 10
    • 34047247202
    • Voice conversion using duration-embedded Bi-HMMs for expressive speech synthesis
    • DOI 10.1109/TASL.2006.876112
    • C.-H. Wu, C.-C. Hsia, T.-H. Liu, and J.-F. Wang, "Voice conversion using duration-embedded Bi-HMMs for expressive speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1109-1116, Jul. 2006. (Pubitemid 46547608)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.4 , pp. 1109-1116
    • Wu, C.-H.1    Hsia, C.-C.2    Liu, T.-H.3    Wang, J.-F.4
  • 11
    • 0026394044
    • Speaker adaptation and voice conversion by codebook mapping
    • Jun. 11-14
    • K. Shikano, S. Nakamura, and M. Abe, "Speaker adaptation and voice conversion by codebook mapping," in Proc. IEEE Int. Symp. Circuits Syst., Jun. 11-14, 1991, vol. 1, pp. 594-597.
    • (1991) Proc. IEEE Int. Symp. Circuits Syst , vol.1 , pp. 594-597
    • Shikano, K.1    Nakamura, S.2    Abe, M.3
  • 13
    • 84946753271
    • VTLN-based cross-language voice conversion
    • 30 Nov.-3 Dec
    • D. Sundermann, H. Ney, and H. Hoge, "VTLN-based cross-language voice conversion," in Proc. IEEE Workshop on ASRU'03, 30 Nov.-3 Dec. 2003, pp. 676-681.
    • (2003) Proc. IEEE Workshop on ASRU'03 , pp. 676-681
    • Sundermann, D.1    Ney, H.2    Hoge, H.3
  • 14
    • 85128407266
    • Phonetic Alignment: Speech Synthesis vs. Hybrid HMM/ANN
    • Sydney, Australia Dec
    • F. Malfrere, O. Deroo, and T. Dutoit, "Phonetic Alignment: Speech Synthesis vs. Hybrid HMM/ANN," in Proc. ICSLP'98, Sydney, Australia, Dec. 1998, vol. 4, p. 1571.
    • (1998) Proc. ICSLP'98 , vol.4 , p. 1571
    • Malfrere, F.1    Deroo, O.2    Dutoit, T.3
  • 15
    • 0030366724
    • Autolabelling Japanese ToBI
    • Philadelphia, PA Oct
    • N. Campbell, "Autolabelling Japanese ToBI," in Proc. ICSLP'96, Philadelphia, PA, Oct. 1996.
    • (1996) Proc. ICSLP'96
    • Campbell, N.1
  • 16
    • 77955722263
    • Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis
    • Aug
    • C.-H. Wu, C.-C. Hsia, C.-H. Lee, and M.-C. Lin, "Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1394-1405, Aug. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.6 , pp. 1394-1405
    • Wu, C.-H.1    Hsia, C.-C.2    Lee, C.-H.3    Lin, M.-C.4
  • 17
    • 77956285048
    • Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis
    • Nov
    • C.-C. Hsia, C.-H. Wu, and J.-Y. Wu, "Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1994-2003, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.8 , pp. 1994-2003
    • Hsia, C.-C.1    Wu, C.-H.2    Wu, J.-Y.3
  • 18
    • 21844474040
    • Fluent speech prosody: Framework and modeling
    • DOI 10.1016/j.specom.2005.03.015, PII S0167639305000919, Quantitative Prosody Modelling for Natural Speech Description and Generation
    • C.-Y. Tseng, S.-H. Pin, Y.-L. Lee, H. M. Wang, and Y. C. Chen, "Fluent Speech Prosody: Framework and Modeling," Speech Commun., Spec. Iss. Quantitative Prosody Modeling for Natural Speech Description and Generation, vol. 46, no. 3-4, pp. 284-309, 2005. (Pubitemid 40952517)
    • (2005) Speech Communication , vol.46 , Issue.3-4 , pp. 284-309
    • Tseng, C.-Y.1    Pin, S.-H.2    Lee, Y.-L.3    Wang, H.-M.4    Chen, Y.-C.5
  • 19
    • 13544257213
    • A statistics-based pitch contour model for Mandarin speech
    • DOI 10.1121/1.1841572
    • S.-H. Chen, W.-H. Lai, and Y.-R. Wang, "A statistics-based pitch contour model for Mandarin speech," J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 908-925, 2005. (Pubitemid 40223449)
    • (2005) Journal of the Acoustical Society of America , vol.117 , Issue.2 , pp. 908-925
    • Chen, S.-H.1    Lai, W.-H.2    Wang, Y.-R.3
  • 20
    • 4544354696
    • Segmental tonal modeling for phone set design in Mandarin LVCSR
    • C. Huang, Y. Shi, J. L. Zhou, M. Chu, T. Wang, and E. Chang, "Segmental tonal modeling for phone set design in Mandarin LVCSR," in Proc. ICASSP'04, 2004, pp. 901-904.
    • (2004) Proc. ICASSP'04 , pp. 901-904
    • Huang, C.1    Shi, Y.2    Zhou, J.L.3    Chu, M.4    Wang, T.5    Chang, E.6
  • 21
    • 0030677481
    • Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited
    • Munich, Germany
    • H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," in Proc. ICASSP'97, Munich, Germany, 1997, pp. 1303-1306.
    • (1997) Proc. ICASSP'97 , pp. 1303-1306
    • Kawahara, H.1
  • 22
    • 21444431930
    • Locating boundaries for prosodic constituents in unrestricted Mandarin texts
    • M. Chu and Y. Qian, "Locating boundaries for prosodic constituents in unrestricted Mandarin texts," Comput. Linguist. Chinese Lang. Process., vol. 6, no. 1, pp. 61-82, 2001.
    • (2001) Comput. Linguist. Chinese Lang. Process , vol.6 , Issue.1 , pp. 61-82
    • Chu, M.1    Qian, Y.2
  • 24
    • 84856036312
    • A corpus-based Mandarin text-to-speech synthesizer
    • A. Benijamin, S. Chilin, and S. Richard, "A corpus-based Mandarin text-to-speech synthesizer," in Proc. ICSLP, 1994, vol. S29, no. 8. 1-8. 4, pp. 1771-1774.
    • (1994) Proc. ICSLP , vol.S29 , Issue.81-84 , pp. 1771-1774
    • Benijamin, A.1    Chilin, S.2    Richard, S.3
  • 25
    • 70450171823
    • Analysis and recognition of accentual patterns
    • A. Wagner, "Analysis and recognition of accentual patterns," in Proc. Interspeech'09, 2009, pp. 2427-2430.
    • (2009) Proc. Interspeech'09 , vol.2009 , pp. 2427-2430
    • Wagner, A.1
  • 26
    • 0022796218
    • Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models
    • Oct
    • A. Ljolje and F. Fallside, "Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1074-1080, Oct. 1986.
    • (1986) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-34 , Issue.5 , pp. 1074-1080
    • Ljolje, A.1    Fallside, F.2
  • 28
    • 85009282418
    • Pitch Contour Model for Chinese text-to-speech using CART and statistical model
    • M. Dong and K.-T. Lua, "Pitch Contour Model for Chinese text-to-speech using CART and statistical model," in Proc. ICSLP, 2002, pp. 2405-2408.
    • (2002) Proc. ICSLP , pp. 2405-2408
    • Dong, M.1    Lua, K.-T.2
  • 29
    • 0142192295
    • Conditional random fields: Probabilistic models for segmenting and labeling sequence data
    • J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. Int. Conf. Mach. Learn., 2001.
    • (2001) Proc. Int. Conf. Mach. Learn
    • Lafferty, J.1    McCallum, A.2    Pereira, F.3
  • 31
    • 33646887390
    • On the limited memory BFGS method for large scale optimization
    • D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Math. Programming, ser. B, vol. 45, no. 3, pp. 503-528, 1989. (Pubitemid 20660315)
    • (1989) Mathematical Programming, Series B , vol.45 , Issue.3 , pp. 503-528
    • Liu, D.C.1    Nocedal, J.2
  • 32
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. B, vol. 39, pp. 1-38, 1977.
    • (1977) J. R. Statist. Soc. B , vol.39 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 33
    • 84867197177
    • Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
    • Sep
    • Z. H. Ling, K. Richmond, J. Yamagishi, and R. H. Wang, "Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008, pp. 573-576.
    • (2008) Proc. Interspeech'08, Brisbane, Australia , pp. 573-576
    • Ling, Z.H.1    Richmond, K.2    Yamagishi, J.3    Wang, R.H.4
  • 34
    • 44449179384
    • TH-CoSS, a Mandarin speech corpus for TTS
    • Mar
    • L. H. Cai, D. D. Cui, and R. Cai, "TH-CoSS, a Mandarin speech corpus for TTS," J. Chinese Inf. Process., vol. 21, no. 2, pp. 94-99, Mar. 2007.
    • (2007) J. Chinese Inf. Process , vol.21 , Issue.2 , pp. 94-99
    • Cai, L.H.1    Cui, D.D.2    Cai, R.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.