SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 6, 2012, Pages 1784-1794

Statistical voice conversion based on noisy channel model

(4) Saito, Daisuke a Watanabe, Shinji b,c Nakamura, Atsushi b Minematsu, Nobuaki a

a UNIVERSITY OF TOKYO (Japan)

b Nippon Telegraph and Telephone Corporation (Japan)

c MITSUBISHI ELECTRIC RESEARCH LABORATORIES (United States)

Author keywords

Joint density model; noisy channel model; probabilistic integration; speaker model; voice conversion (VC)

Indexed keywords

JOINT DENSITIES; NOISY CHANNEL; PROBABILISTIC INTEGRATION; SPEAKER MODEL; VOICE CONVERSION; SPEECH RECOGNITION; SPEECH PROCESSING

EID: 84859768504 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2188628 Document Type: Article

Times cited : (28)

References (37)

1
- 34047254509
- Quality-enhanced voice morphing using maximum likelihood transformations
- DOI 10.1109/TSA.2005.860839
- H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1301-1312, Jul. 2006. (Pubitemid 46547625)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.4 , pp. 1301-1312
- Ye, H.¹ Young, S.²

2
- 0032026483
- Continuous probabilistic transform for voice conversion
- PII S1063667698017386
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998. (Pubitemid 128720639)
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

3
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, vol. 1, pp. 285-288.
- (1998) Proc. ICASSP , vol.1 , pp. 285-288
- Kain, A.¹ Macon, M.W.²

4
- 0034855352
- High-performance robust speech recognition using stereo training data
- L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data," in Proc. ICASSP, 2001, pp. 301-304. (Pubitemid 32839247)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 301-304
- Deng, L.¹ Acero, A.² Jiang, L.³ Droppo, J.⁴ Huang, X.⁵

5
- 0036291376
- Uncertainty decoding with SPLICE for noise robust speech recognition
- J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition," in Proc. ICASSP, 2002, pp. 57-60.
- (2002) Proc. ICASSP , pp. 57-60
- Droppo, J.¹ Acero, A.² Deng, L.³

6
- 0033692729
- Narrowband to wideband conversion of speech using GMM based transformation
- K. Y. Park and H. S. Kim, "Narrowband to wideband conversion of speech using GMM based transformation," in Proc. ICASSP, 2000, pp. 1847-1850.
- (2000) Proc. ICASSP , pp. 1847-1850
- Park, K.Y.¹ Kim, H.S.²

7
- 44949265538
- Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech," in Proc. Interspeech, 2006, pp. 1395-1398.
- (2006) Proc. Interspeech , pp. 1395-1398
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

8
- 70450192197
- Speech generation from hand gestures based on space mapping
- A. Kunikoshi, Y. Qiao, N. Minematsu, and K. Hirose, "Speech generation from hand gestures based on space mapping," in Proc. Interspeech, 2009, pp. 308-311.
- (2009) Proc. Interspeech , pp. 308-311
- Kunikoshi, A.¹ Qiao, Y.² Minematsu, N.³ Hirose, K.⁴

9
- 84984984625
- Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling
- A. Mousa, "Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling," J. Elect. Eng., vol. 61, no. 1, pp. 57-61, 2010.
- (2010) J. Elect. Eng. , vol.61 , Issue.1 , pp. 57-61
- Mousa, A.¹

10
- 0029254176
- Transformation of formants for voice conversion using artificial neural networks
- M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of formants for voice conversion using artificial neural networks," Speech Commun., vol. 16, no. 2, pp. 207-216, 1995.
- (1995) Speech Commun. , vol.16 , Issue.2 , pp. 207-216
- Narendranath, M.¹ Murthy, H.A.² Rajendran, S.³ Yegnanarayana, B.⁴

11
- 70349197691
- Voice conversion using artificial neural networks
- S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," in Proc. ICASSP, 2009, pp. 3893-3896.
- (2009) Proc. ICASSP , pp. 3893-3896
- Desai, S.¹ Raghavendra, E.V.² Yegnanarayana, B.³ Black, A.W.⁴ Prahallad, K.⁵

12
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. ICASSP, 1988, pp. 655-658. (Pubitemid 18666100)
- (1988) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

13
- 34047245444
- Nonparallel training for voice conversion based on a parameter adaptation approach
- DOI 10.1109/TSA.2005.857790
- A. Mouchtaris, J. Van der Spiegel, and P. Mueller, "Nonparallel training for voice conversion based on a parameter adaptation approach," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 3, pp. 952-963, Mar. 2006. (Pubitemid 46547656)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 952-963
- Mouchtaris, A.¹ Van Der Spiegel, J.² Mueller, P.³

14
- 44949210554
- Map-based adaptation for speech conversion using adaptation data selection and non-parallel training
- C. H. Lee and C. H. Wu, "Map-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Proc. Interspeech, 2006, pp. 2254-2257.
- (2006) Proc. Interspeech , pp. 2254-2257
- Lee, C.H.¹ Wu, C.H.²

15
- 34547512822
- Eigenvoice conversion based on Gaussian mixture model
- T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Proc. Interspeech, 2006, pp. 2446-2449.
- (2006) Proc. Interspeech , pp. 2446-2449
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

16
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
- (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

17
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, pp. 75-98, 1998. (Pubitemid 128383747)
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

18
- 0026142334
- A study on speaker adaptation of the parameters of continuous density hidden Markov models
- C.-H. Lee, C.-H. Lin, and B.-H. Juang, "A study on speaker adaptation of the parameters of continuous density hidden Markov models," IEEE Trans. Signal Process., vol. 39, no. 4, pp. 806-814, Apr. 1991.
- (1991) IEEE Trans. Signal Process. , vol.39 , Issue.4 , pp. 806-814
- Lee, C.-H.¹ Lin, C.-H.² Juang, B.-H.³

19
- 0034320005
- Rapid speaker adaptation in eigenvoice space
- DOI 10.1109/89.876308
- R. Kuhn, J.-C. Junqua, P. Nguyen, and N. Niedzielski, "Rapid speaker adaptation in eigenvoice space," IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 695-707, Nov. 2000. (Pubitemid 32025317)
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.6 , pp. 695-707
- Kuhn, R.¹ Junqua, J.-C.² Nguyen, P.³ Niedzielski, N.⁴

20
- 0016939124
- Continuous speech recognition by statistical methods
- F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE, vol. 64, no. 4, pp. 532-556, Apr. 1976. (Pubitemid 8019230)
- (1976) Proceedings of the IEEE , vol.64 , Issue.4 , pp. 532-556
- Jelinek, F.¹

21
- 84936823635
- A statistical approach to machine translation
- P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin, "A statistical approach to machine translation," Comput. Linguist., vol. 16, no. 2, pp. 79-85, 1990.
- (1990) Comput. Linguist. , vol.16 , Issue.2 , pp. 79-85
- Brown, P.F.¹ Cocke, J.² Della Pietra, S.A.³ Della Pietra, V.J.⁴ Jelinek, F.⁵ Lafferty, J.D.⁶ Mercer, R.L.⁷ Roossin, P.S.⁸

22
- 0033884858
- Speaker verification using adapted Gaussian mixture models
- DOI 10.1006/dspr.1999.0361
- D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, no. 1-3, pp. 19-41, 2000. (Pubitemid 30592166)
- (2000) Digital Signal Processing: A Review Journal , vol.10 , Issue.1 , pp. 19-41
- Reynolds, D.A.¹ Quatieri, T.F.² Dunn, R.B.³

23
- 58349106697
- A study of interspeaker variability in speaker verification
- P. Kenny, P. Ouellet, N. Dehak, V. Gupta, and P. Dumouchel, "A study of interspeaker variability in speaker verification," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 980-988, Jul. 2008.
- (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.5 , pp. 980-988
- Kenny, P.¹ Ouellet, P.² Dehak, N.³ Gupta, V.⁴ Dumouchel, P.⁵

24
- 79959834571
- Probabilistic integration of joint density model and speaker model for voice conversion
- D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion," in Proc. Interspeech, 2010, pp. 1728-1731.
- (2010) Proc. Interspeech , pp. 1728-1731
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

25
- 0022667694
- Speaker-independent isolated word recognition using dynamic features of speech spectrum
- S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 1, pp. 52-59, Feb. 1986. (Pubitemid 16575387)
- (1986) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-34 , Issue.1 , pp. 52-59
- Furui, S.¹

26
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

27
- 0028996993
- Speech parameter generation from HMM using dynamic features
- K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features," in Proc. ICASSP, 1995, pp. 660-663.
- (1995) Proc. ICASSP , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

28
- 84905560807
- Voice conversion with smoothed GMM and MAP adaptation
- Y. Chen, M. Chu, E. Chang, J. Jiu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation," in Proc. Eurospeech, 2003, pp. 2413-2416.
- (2003) Proc. Eurospeech , pp. 2413-2416
- Chen, Y.¹ Chu, M.² Chang, E.³ Jiu, J.⁴ Liu, R.⁵

29
- 33646773080
- CMU ARCTIC databases for speech synthesis
- J. Kominek and A. W. Black, "CMU ARCTIC databases for speech synthesis," Lang. Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, 2003. [Online]. Available: http://festvox.org/cmu-arctic/index.html
- (2003) Lang. Technol. Inst.
- Kominek, J.¹ Black, A.W.²

30
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, pp. 357-363, 1990.
- (1990) Speech Commun. , vol.9 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

31
- 85133439657
- An introduction of trajectory model into HMM-based speech synthesis
- H. Zen, K. Tokuda, and T. Kitamura, "An introduction of trajectory model into HMM-based speech synthesis," in Proc. 5th ISCA Speech Synth. Workshop, 2004, pp. 191-196.
- (2004) Proc. 5th ISCA Speech Synth. Workshop , pp. 191-196
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

32
- 3042741069
- Variational Bayesian estimation and clustering for speech recognition
- S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, "Variational Bayesian estimation and clustering for speech recognition," IEEE Trans. Speech Audio Process., vol. 12, no. 4, pp. 365-381, Jul. 2004.
- (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.4 , pp. 365-381
- Watanabe, S.¹ Minami, Y.² Nakamura, A.³ Ueda, N.⁴

33
- 80051615070
- High accurate model-integration-based voice conversion using dynamic features and model structure optimization
- D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "High accurate model-integration-based voice conversion using dynamic features and model structure optimization," in Proc. ICASSP, 2011, pp. 4576-4579.
- (2011) Proc. ICASSP , pp. 4576-4579
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

34
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

35
- 84994241109
- Including dynamic and phonetic information in voice conversion systems
- H. Duxans, A. Bonafonte, A. Kain, and J. Van Santen, "Including dynamic and phonetic information in voice conversion systems," in Proc. ICSLP, 2004, pp. 1193-1196.
- (2004) Proc. ICSLP , pp. 1193-1196
- Duxans, H.¹ Bonafonte, A.² Kain, A.³ Van Santen, J.⁴

36
- 34047247202
- Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
- C. H. Wu, C. C. Hsia, T. H. Liu, and J. F. Wang, "Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1109-1116, Jul. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1109-1116
- Wu, C.H.¹ Hsia, C.C.² Liu, T.H.³ Wang, J.F.⁴

37
- 77956285048
- Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis
- C.-C. Hsia, C.-H. Wu, and J.-Y. Wu, "Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1994-2003, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 1994-2003
- Hsia, C.-C.¹ Wu, C.-H.² Wu, J.-Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.