SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2015-August, 2015, Pages 4859-4863

Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion

(4) Takamichi, Shinnosuke a,b; Toda, Tomoki a; Black, Alan W. b; Nakamura, Satoshi a

a NARA INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

b Carnegie Mellon University ^* (United States)

Author keywords

GMM based voice conversion; modulation spectrum; over smoothing; training algorithm

Indexed keywords

COMPUTATIONAL EFFICIENCY; GAUSSIAN DISTRIBUTION; MODULATION; SPEECH COMMUNICATION; SPEECH PROCESSING;

COMPUTATIONALLY EFFICIENT; CONSTRAINED TRAJECTORIES; GAUSSIAN MIXTURE MODEL; MODULATION SPECTRUM; OPTIMIZATION CRITERIA; OVER-SMOOTHING; TRAINING ALGORITHMS; VOICE CONVERSION;

AUDIO SIGNAL PROCESSING;

EID: 84946033919 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178894 Document Type: Conference Paper

Times cited : (33)

References (27)

1
- 84905252904
- An evaluation of excitation feature prediction in a hybrid approach to electro laryngeal speech enhancement
- Florence, Italy, May
- K. Tanaka, T. Toda, G. Neubig, S. Sakti, and S. Nakamura. An evaluation of excitation feature prediction in a hybrid approach to electro laryngeal speech enhancement. In Proc. ICASSP, pp. 4521-4525, Florence, Italy, May 2014
- (2014) Proc. ICASSP , pp. 4521-4525
- Tanaka, K.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

2
- 84905223321
- Regression approaches to perceptual age control in singing voice conversion
- Florence, Italy, May
- K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, and S. Nakamura. Regression approaches to perceptual age control in singing voice conversion. In Proc. ICASSP, pp. 7954-7958, Florence, Italy, May 2014
- (2014) Proc. ICASSP , pp. 7954-7958
- Kobayashi, K.¹ Toda, T.² Nakano, T.³ Goto, M.⁴ Neubig, G.⁵ Sakti, S.⁶ Nakamura, S.⁷

3
- 84865743435
- Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
- Florence, Italy, Aug
- N. Hattori, T. Toda, H. Kawai, H. Saruwatari, and K. Shikano. Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation. In Proc. INTERSPEECH, pp. 2769-2772, Florence, Italy, Aug. 2011
- (2011) Proc. INTERSPEECH , pp. 2769-2772
- Hattori, N.¹ Toda, T.² Kawai, H.³ Saruwatari, H.⁴ Shikano, K.⁵

4
- 84905281157
- Can voice conversion be used to reduce non-native accents?
- Florence, Italy, May
- S. Aryal and R. Gutierrez-Osuna. Can voice conversion be used to reduce non-native accents? In Proc. ICASSP, Florence, Italy, May 2014
- (2014) Proc. ICASSP
- Aryal, S.¹ Gutierrez-Osuna, R.²

5
- 84905252390
- Voice conversion in time-invariant speaker independent space
- Florence, Italy, May
- T. Nakashika, T. Takiguchi, and Y. Ariki. Voice conversion in time-invariant speaker independent space. In Proc. ICASSP, pp. 7939-7943, Florence, Italy, May 2014
- (2014) Proc. ICASSP , pp. 7939-7943
- Nakashika, T.¹ Takiguchi, T.² Ariki, Y.³

6
- 84901803470
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Catalunya, Spain, Aug
- Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li. Exemplar-based voice conversion using non-negative spectrogram deconvolution. In Proc. 8th ISCA SSW, Catalunya, Spain, Aug. 2013
- (2013) Proc. 8th ISCA SSW
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

7
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- Mar
- E. Helander, H. Silen, T. Virtanen, and M. Gabbouj. Voice conversion using dynamic kernel partial least squares regression. IEEE Trans., Vol. 20, No.3, pp. 806-817, Mar. 2012
- (2012) IEEE Trans , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silen, H.² Virtanen, T.³ Gabbouj, M.⁴

8
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar
- Y. Stylianou, O. Cappe, and E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Trans. Speech and Audio Processing, Vol. 6, No. 2, pp. 131-142, Mar. 1998
- (1998) IEEE Trans. Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

9
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, 2007
- (2007) IEEE Transactions on Audio, Speech and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

10
- 0028996993
- Speech parameter generation from HMM using dynamic features
- Detroit, U.S.A, May
- K. Tokuda, T. Kobayashi, and S. Imai. Speech parameter generation from HMM using dynamic features. In Proc. ICASSP, pp. 660-663, Detroit, U.S.A, May 1995
- (1995) Proc. ICASSP , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

11
- 84878390910
- Implementation of computationally efficient real-time voice conversion
- Portland, Oregon, U.S., Sept
- T. Toda, T. Muramatsu, and H. Banno. Implementation of computationally efficient real-time voice conversion. In Proc. INTERSPEECH, Portland, Oregon, U.S., Sept. 2012
- (2012) Proc. INTERSPEECH
- Toda, T.¹ Muramatsu, T.² Banno, H.³

12
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- Jan
- H. Zen, K. Tokuda, and T. Kitamura. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language, Vol. 21, No. 1, pp. 153-173, Jan. 2007
- (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

13
- 84876687945
- Speech synthesis based on hidden Markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, Vol. 101, No. 5, pp. 1234-1252, 2013
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

14
- 67650826181
- Trajectory training considering global variance for HMM-based speech synthesis
- Taipei, Taiwan, Apr
- T. Toda and S. Young. Trajectory training considering global variance for HMM-based speech synthesis. In Proc. ICASSP, pp. 4025-4028, Taipei, Taiwan, Apr. 2009
- (2009) Proc. ICASSP , pp. 4025-4028
- Toda, T.¹ Young, S.²

15
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Istanbul, Turkey, June
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. In Proc. ICASSP, pp. 1315-1318, Istanbul, Turkey, June 2000
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

16
- 84893234191
- Incorporating global variance in the training phase of GMM-based voice conversion
- Kaohsiung, Taiwan, Oct
- H. Hwang, Y. Tsao, H. Wang, Y. Wang, and S. Chen. Incorporating global variance in the training phase of GMM-based voice conversion. In Proc. APSIPA, pp. 1-6, Kaohsiung, Taiwan, Oct. 2013
- (2013) Proc. APSIPA , pp. 1-6
- Hwang, H.¹ Tsao, Y.² Wang, H.³ Wang, Y.⁴ Chen, S.⁵

17
- 84905234422
- A postfilter to modify modulation spectrum in HMM-based speech synthesis
- Florence, Italy, May
- S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura. A postfilter to modify modulation spectrum in HMM-based speech synthesis. In Proc. ICASSP, pp. 290-294, Florence, Italy, May 2014
- (2014) Proc. ICASSP , pp. 290-294
- Takamichi, S.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

18
- 84867211725
- Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Brisbane, Australia, Sep
- T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. In Proc. INTERSPEECH, pp. 1076-1079, Brisbane, Australia, Sep. 2008
- (2008) Proc. INTERSPEECH , pp. 1076-1079
- Muramatsu, T.¹ Ohtani, Y.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

19
- 78149260085
- Continuous stochastic feature mapping based on trajectory HMMs
- Jan
- H. Zen, Y. Nankaku, and K. Tokuda. Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans., Vol. 19, pp. 417-430, Jan. 2011
- (2011) IEEE Trans , vol.19 , pp. 417-430
- Zen, H.¹ Nankaku, Y.² Tokuda, K.³

20
- 0028287770
- Effect of reducing slow temporal modulations on speech reception
- R. Drullman, J.M. Festen, and R. Plomp. Effect of reducing slow temporal modulations on speech reception. J Acoust. Soc. of America, Vol. 95, pp. 2670-2680, 1994
- (1994) J Acoust. Soc. of America , vol.95 , pp. 2670-2680
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

21
- 70349212558
- Phoneme recognition using spectral envelope and modulation frequency features
- Taipei, Taiwan, April
- S. Thomas, S. Ganapathy, and H. Hermansky. Phoneme recognition using spectral envelope and modulation frequency features. In Proc. ICASSP, pp. 4453-4456, Taipei, Taiwan, April 2009
- (2009) Proc. ICASSP , pp. 4453-4456
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

22
- 33646773080
- Tech. Rep. CMU-LTI-03-177, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, U.S.A.
- J. Kominek and A. W. Black. The CMU ARCTIC speech databases for speech synthesis research. Tech. Rep. CMU-LTI-03-177, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, U.S.A., 2003
- (2003) The CMU ARCTIC Speech Databases for Speech Synthesis Research
- Kominek, J.¹ Black, A.W.²

23
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Firenze, Italy, Sept
- H. Kawahara, J. Estill, and O. Fujimura. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT. In MAVEBA 2001, pp. 1-6, Firenze, Italy, Sept. 2001
- (2001) MAVEBA 2001 , pp. 1-6
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

24
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Pittsburgh, U.S.A., Sept
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. In Proc. INTERSPEECH, pp. 2266-2269, Pittsburgh, U.S.A., Sept. 2006
- (2006) Proc. INTERSPEECH , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

25
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Commun., Vol. 27, No. 3-4, pp. 187-207, 1999
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

26
- 34547496175
- One-to-many and many-to-one voice conversion based on eigenvoices
- Hawaii, U.S.A., Apr
- T. Toda, Y. Ohtani, and K. Shikano. One-to-many and many-to-one voice conversion based on eigenvoices. In Proc. ICASSP, pp. 1249-1252, Hawaii, U.S.A., Apr. 2007
- (2007) Proc. ICASSP , pp. 1249-1252
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

27
- 70450194389
- Many-to-many eigenvoice conversion with reference voice
- Brighton, U.K., Sep
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Many-to-many eigenvoice conversion with reference voice. In Proc. INTERSPEECH, pp. 1623-1626, Brighton, U.K., Sep. 2009
- (2009) Proc. INTERSPEECH , pp. 1623-1626
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.