SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 1, 2013, Pages 51-62

Personalized spectral and prosody conversion using frame-based codeword distribution and adaptive CRF

(3) Huang, Yi Chin (a); Wu, Chung Hsien (a); Chao, Yu Ting (a)

(a) NATIONAL CHENG KUNG UNIVERSITY (Taiwan)

Author keywords

Conditional random field; frame alignment; principal component analysis; prosodic boundary; voice conversion

Indexed keywords

ALIGNMENT; FUNCTION EVALUATION; IMAGE SEGMENTATION; PRINCIPAL COMPONENT ANALYSIS; RANDOM PROCESSES; SPEECH COMMUNICATION; TELEPHONE SETS;

CONDITIONAL RANDOM FIELD; CONVERSION FUNCTION; DISTANCE ESTIMATION; FRAME ALIGNMENTS; PROSODIC BOUNDARY; SPECTRAL PROPERTIES; VOICE CHARACTERISTICS; VOICE CONVERSION;

ARTIFICIAL INTELLIGENCE;

EID: 84867950508 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2213247 Document Type: Article

Times cited: 15

References (35)

1
- 85133674021
- Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV
- Aug
- J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, and K. Tokuda, "Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV," in Proc. ISCA SSW6, Aug. 2007.
- (2007) Proc. ISCA SSW6
- Yamagishi, J.¹ Kobayashi, T.² Renals, S.³ King, S.⁴ Zen, H.⁵ Toda, T.⁶ Tokuda, K.⁷

2
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- Jan
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

3
- 0032026483
- Continuous probabilistic transform for voice conversion
- PII S1063667698017386
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998. (Pubitemid 128720639)
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

4
- 77953723062
- Synthesis of child speech with HMM adaptation and voice conversion
- Aug
- O. Watts, J. Yamagishi, S. King, and K. Berkling, "Synthesis of child speech with HMM adaptation and voice conversion," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1005-1016, Aug. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.6 , pp. 1005-1016
- Watts, O.¹ Yamagishi, J.² King, S.³ Berkling, K.⁴

5
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- May 12-15 vol. 1
- A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP'98, May 12-15, 1998, vol. 1, pp. 285-288.
- (1998) Proc. ICASSP'98 , vol.1 , pp. 285-288
- Kain, A.¹ Macon, M.W.²

6
- 84994241109
- Including dynamic and phonetic information in voice conversion systems
- Jeju Island, South Korea
- H. Duxans, A. Bonafonte, A. Kain, and J. van Santen, "Including dynamic and phonetic information in voice conversion systems," in Proc. ICSLP '04, Jeju Island, South Korea, 2004, pp. 5-8.
- (2004) Proc. ICSLP '04 , pp. 5-8
- Duxans, H.¹ Bonafonte, A.² Kain, A.³ Van Santen, J.⁴

7
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Nov
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

8
- 77952978184
- Adaptive training for voice conversion based on eigenvoices
- Jun
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Adaptive training for voice conversion based on eigenvoices," IEICE Trans. Inf. Syst., vol. E93-D, no. 6, pp. 1589-1598, Jun. 2010.
- (2010) IEICE Trans. Inf. Syst , vol.E93-D , Issue.6 , pp. 1589-1598
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

9
- 34548216761
- Conversion function clustering and selection using linguistic and spectral information for emotional voice conversion
- Sep
- C.-C. Hsia, C.-H. Wu, and J.-Q. Wu, "Conversion function clustering and selection using linguistic and spectral information for emotional voice conversion," IEEE Trans. Comput., vol. 56, no. 9, pp. 1225-1233, Sep. 2007.
- (2007) IEEE Trans. Comput , vol.56 , Issue.9 , pp. 1225-1233
- Hsia, C.-C.¹ Wu, C.-H.² Wu, J.-Q.³

10
- 34047247202
- Voice conversion using duration-embedded Bi-HMMs for expressive speech synthesis
- DOI 10.1109/TASL.2006.876112
- C.-H. Wu, C.-C. Hsia, T.-H. Liu, and J.-F. Wang, "Voice conversion using duration-embedded Bi-HMMs for expressive speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1109-1116, Jul. 2006. (Pubitemid 46547608)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.4 , pp. 1109-1116
- Wu, C.-H.¹ Hsia, C.-C.² Liu, T.-H.³ Wang, J.-F.⁴

11
- 0026394044
- Speaker adaptation and voice conversion by codebook mapping
- Jun. 11-14
- K. Shikano, S. Nakamura, and M. Abe, "Speaker adaptation and voice conversion by codebook mapping," in Proc. IEEE Int. Symp. Circuits Syst., Jun. 11-14, 1991, vol. 1, pp. 594-597.
- (1991) Proc. IEEE Int. Symp. Circuits Syst , vol.1 , pp. 594-597
- Shikano, K.¹ Nakamura, S.² Abe, M.³

12
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- Jul
- S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad, "Spectral mapping using artificial neural networks for voice conversion," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 954-964, Jul. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahallad, K.⁴

13
- 84946753271
- VTLN-based cross-language voice conversion
- 30 Nov.-3 Dec
- D. Sundermann, H. Ney, and H. Hoge, "VTLN-based cross-language voice conversion," in Proc. IEEE Workshop on ASRU'03, 30 Nov.-3 Dec. 2003, pp. 676-681.
- (2003) Proc. IEEE Workshop on ASRU'03 , pp. 676-681
- Sundermann, D.¹ Ney, H.² Hoge, H.³

14
- 85128407266
- Phonetic Alignment: Speech Synthesis vs. Hybrid HMM/ANN
- Sydney, Australia Dec
- F. Malfrere, O. Deroo, and T. Dutoit, "Phonetic Alignment: Speech Synthesis vs. Hybrid HMM/ANN," in Proc. ICSLP'98, Sydney, Australia, Dec. 1998, vol. 4, p. 1571.
- (1998) Proc. ICSLP'98 , vol.4 , p. 1571
- Malfrere, F.¹ Deroo, O.² Dutoit, T.³

15
- 0030366724
- Autolabelling Japanese ToBI
- Philadelphia, PA Oct
- N. Campbell, "Autolabelling Japanese ToBI," in Proc. ICSLP'96, Philadelphia, PA, Oct. 1996.
- (1996) Proc. ICSLP'96
- Campbell, N.¹

16
- 77955722263
- Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis
- Aug
- C.-H. Wu, C.-C. Hsia, C.-H. Lee, and M.-C. Lin, "Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1394-1405, Aug. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.6 , pp. 1394-1405
- Wu, C.-H.¹ Hsia, C.-C.² Lee, C.-H.³ Lin, M.-C.⁴

17
- 77956285048
- Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis
- Nov
- C.-C. Hsia, C.-H. Wu, and J.-Y. Wu, "Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1994-2003, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.8 , pp. 1994-2003
- Hsia, C.-C.¹ Wu, C.-H.² Wu, J.-Y.³

18
- 21844474040
- Fluent speech prosody: Framework and modeling
- DOI 10.1016/j.specom.2005.03.015, PII S0167639305000919, Quantitative Prosody Modelling for Natural Speech Description and Generation
- C.-Y. Tseng, S.-H. Pin, Y.-L. Lee, H. M. Wang, and Y. C. Chen, "Fluent Speech Prosody: Framework and Modeling," Speech Commun., Spec. Iss. Quantitative Prosody Modeling for Natural Speech Description and Generation, vol. 46, no. 3-4, pp. 284-309, 2005. (Pubitemid 40952517)
- (2005) Speech Communication , vol.46 , Issue.3-4 , pp. 284-309
- Tseng, C.-Y.¹ Pin, S.-H.² Lee, Y.³ Wang, H.-M.⁴ Chen, Y.-C.⁵

19
- 13544257213
- A statistics-based pitch contour model for Mandarin speech
- DOI 10.1121/1.1841572
- S.-H. Chen, W.-H. Lai, and Y.-R. Wang, "A statistics-based pitch contour model for Mandarin speech," J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 908-925, 2005. (Pubitemid 40223449)
- (2005) Journal of the Acoustical Society of America , vol.117 , Issue.2 , pp. 908-925
- Chen, S.-H.¹ Lai, W.-H.² Wang, Y.-R.³

20
- 4544354696
- Segmental tonal modeling for phone set design in Mandarin LVCSR
- C. Huang, Y. Shi, J. L. Zhou, M. Chu, T. Wang, and E. Chang, "Segmental tonal modeling for phone set design in Mandarin LVCSR," in Proc. ICASSP'04, 2004, pp. 901-904.
- (2004) Proc. ICASSP'04 , pp. 901-904
- Huang, C.¹ Shi, Y.² Zhou, J.L.³ Chu, M.⁴ Wang, T.⁵ Chang, E.⁶

21
- 0030677481
- Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited
- Munich, Germany
- H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," in Proc. ICASSP'97, Munich, Germany, 1997, pp. 1303-1306.
- (1997) Proc. ICASSP'97 , pp. 1303-1306
- Kawahara, H.¹

22
- 21444431930
- Locating boundaries for prosodic constituents in unrestricted Mandarin texts
- M. Chu and Y. Qian, "Locating boundaries for prosodic constituents in unrestricted Mandarin texts," Comput. Linguist. Chinese Lang. Process., vol. 6, no. 1, pp. 61-82, 2001.
- (2001) Comput. Linguist. Chinese Lang. Process , vol.6 , Issue.1 , pp. 61-82
- Chu, M.¹ Qian, Y.²

23
- 0024736612
- The synthesis rules in a Chinese text-to-speech system
- Sep
- L.-S. Lee, C.-Y. Tseng, and M. Ouh-young, "The synthesis rules in a Chinese text-to-speech system," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 9, pp. 1309-1319, Sep. 1989.
- (1989) IEEE Trans. Acoust., Speech, Signal Process , vol.37 , Issue.9 , pp. 1309-1319
- Lee, L.-S.¹ Tseng, C.-Y.² Ouh-Young, M.³

24
- 84856036312
- A corpus-based Mandarin text-to-speech synthesizer
- A. Benijamin, S. Chilin, and S. Richard, "A corpus-based Mandarin text-to-speech synthesizer," in Proc. ICSLP, 1994, vol. S29, no. 8.1-8.4, pp. 1771-1774.
- (1994) Proc. ICSLP , vol.S29 , Issue.81-84 , pp. 1771-1774
- Benijamin, A.¹ Chilin, S.² Richard, S.³

25
- 70450171823
- Analysis and recognition of accentual patterns
- Wagner and Agnieszka, "Analysis and recognition of accentual patterns," in Proc. Interspeech'09, 2009, pp. 2427-2430.
- (2009) Proc. Interspeech'09 , vol.2009 , pp. 2427-2430
- Wagner¹ Agnieszka²

26
- 0022796218
- Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models
- Oct
- L. Andrej and F. Frank, "Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1074-1080, Oct. 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-34 , Issue.5 , pp. 1074-1080
- Andrej, L.¹ Frank, F.²

27
- 0034509204
- Prosody model in a Mandarin text-to-speech system based on a hierarchical approach
- N.-H. Pan, W.-T. Jen, S.-S. Yu, S.-Y. Huang, and M.-J. Wu, "Prosody model in a Mandarin text-to-speech system based on a hierarchical approach," in Proc. IEEE Int. Conf. Multimedia and Expo, 2000, vol. 1, pp. 448-451. (Pubitemid 33058980)
- (2000) IEEE International Conference on Multi-Media and Expo , Issue.IMONDAY , pp. 448-451
- Pan, N.-H.¹ Jen, W.-T.² Yu, S.-S.³ Yu, M.-S.⁴ Huang, S.-Y.⁵ Wu, M.-J.⁶

28
- 85009282418
- Pitch Contour Model for Chinese text-to-speech using CART and statistical model
- M. Dong and K.-T. Lua, "Pitch Contour Model for Chinese text-to-speech using CART and statistical model," in Proc. ICSLP, 2002, pp. 2405-2408.
- (2002) Proc. ICSLP , pp. 2405-2408
- Dong, M.¹ Lua, K.-T.²

29
- 0142192295
- Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. Int. Conf. Mach. Learn., 2001.
- (2001) Proc. Int. Conf. Mach. Learn
- Lafferty, J.¹ McCallum, A.² Pereira, F.³

30
- 84867923069
- Domain adaptation for conditional random fields
- New York: Springer
- Q. Zhang, X. Qiu, X. Huang, and L. Wu, "Domain Adaptation for Conditional Random Fields," in Information Retrieval Technology. New York: Springer, 2008.
- (2008) Information Retrieval Technology
- Zhang, Q.¹ Qiu, X.² Huang, X.³ Wu, L.⁴

31
- 33646887390
- On the limited memory BFGS method for large scale optimization
- D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization," Math. Programming, ser. B, vol. 45, no. 3, pp. 503-528, 1989. (Pubitemid 20660315)
- (1989) Mathematical Programming, Series B , vol.45 , Issue.3 , pp. 503-528
- Liu, D.C.¹ Nocedal, J.²

32
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. B, vol. 39, pp. 1-38, 1977.
- (1977) J. R. Statist. Soc. B , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

33
- 84867197177
- Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
- Sep
- Z. H. Ling, K. Richmond, J. Yamagishi, and R. H. Wang, "Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008, pp. 573-576.
- (2008) Proc. Interspeech'08, Brisbane, Australia , pp. 573-576
- Ling, Z.H.¹ Richmond, K.² Yamagishi, J.³ Wang, R.H.⁴

34
- 44449179384
- TH-CoSS, a Mandarin speech corpus for TTS
- Mar
- L. H. Cai, D. D. Cui, and R. Cai, "TH-CoSS, a Mandarin speech corpus for TTS," J. Chinese Inf. Process., vol. 21, no. 2, pp. 94-99, Mar. 2007.
- (2007) J. Chinese Inf. Process , vol.21 , Issue.2 , pp. 94-99
- Cai, L.H.¹ Cui, D.D.² Cai, R.³

35
- 70350498327
- [Online]
- H. Zen, T. Nose, J. Yamagishi, S. Sako, and K. Tokuda, The HMM-based Speech Synthesis System (HTS) Version 2.0, 2007 [Online]. Available: http://hts.sp.nitech.ac.jp/
- (2007) The HMM-based Speech Synthesis System (HTS) Version 2.0
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Tokuda, K.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.