Volume 2015-August, 2015, Pages 4859-4863

Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion

Author keywords

GMM-based voice conversion; modulation spectrum; over-smoothing; training algorithm

Indexed keywords

COMPUTATIONAL EFFICIENCY; GAUSSIAN DISTRIBUTION; MODULATION; SPEECH COMMUNICATION; SPEECH PROCESSING;

EID: 84946033919     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2015.7178894     Document Type: Conference Paper
Times cited : (33)

References (27)
  • 1
    • 84905252904
    • An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement
    • Florence, Italy, May
    • K. Tanaka, T. Toda, G. Neubig, S. Sakti, and S. Nakamura. An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement. In Proc. ICASSP, pp. 4521-4525, Florence, Italy, May 2014
    • (2014) Proc. ICASSP , pp. 4521-4525
    • Tanaka, K.1    Toda, T.2    Neubig, G.3    Sakti, S.4    Nakamura, S.5
  • 2
    • 84905223321
    • Regression approaches to perceptual age control in singing voice conversion
    • Florence, Italy, May
    • K. Kobayashi, T. Toda, T. Nakano, M. Goto, G. Neubig, S. Sakti, and S. Nakamura. Regression approaches to perceptual age control in singing voice conversion. In Proc. ICASSP, pp. 7954-7958, Florence, Italy, May 2014
    • (2014) Proc. ICASSP , pp. 7954-7958
    • Kobayashi, K.1    Toda, T.2    Nakano, T.3    Goto, M.4    Neubig, G.5    Sakti, S.6    Nakamura, S.7
  • 3
    • 84865743435
    • Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
    • Florence, Italy, Aug
    • N. Hattori, T. Toda, H. Kawai, H. Saruwatari, and K. Shikano. Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation. In Proc. INTERSPEECH, pp. 2769-2772, Florence, Italy, Aug. 2011
    • (2011) Proc. INTERSPEECH , pp. 2769-2772
    • Hattori, N.1    Toda, T.2    Kawai, H.3    Saruwatari, H.4    Shikano, K.5
  • 4
    • 84905281157
    • Can voice conversion be used to reduce non-native accents?
    • Florence, Italy, May
    • S. Aryal and R. Gutierrez-Osuna. Can voice conversion be used to reduce non-native accents? In Proc. ICASSP, Florence, Italy, May 2014
    • (2014) Proc. ICASSP
    • Aryal, S.1    Gutierrez-Osuna, R.2
  • 5
    • 84905252390
    • Voice conversion in time-invariant speaker independent space
    • Florence, Italy, May
    • T. Nakashika, T. Takiguchi, and Y. Ariki. Voice conversion in time-invariant speaker independent space. In Proc. ICASSP, pp. 7939-7943, Florence, Italy, May 2014
    • (2014) Proc. ICASSP , pp. 7939-7943
    • Nakashika, T.1    Takiguchi, T.2    Ariki, Y.3
  • 6
    • 84901803470
    • Exemplar-based voice conversion using non-negative spectrogram deconvolution
    • Catalunya, Spain, Aug
    • Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li. Exemplar-based voice conversion using non-negative spectrogram deconvolution. In Proc. 8th ISCA SSW, Catalunya, Spain, Aug. 2013
    • (2013) Proc. 8th ISCA SSW
    • Wu, Z.1    Virtanen, T.2    Kinnunen, T.3    Chng, E.S.4    Li, H.5
  • 7
    • 84856141218
    • Voice conversion using dynamic kernel partial least squares regression
    • Mar
    • E. Helander, H. Silen, T. Virtanen, and M. Gabbouj. Voice conversion using dynamic kernel partial least squares regression. IEEE Trans., Vol. 20, No.3, pp. 806-817, Mar. 2012
    • (2012) IEEE Trans , vol.20 , Issue.3 , pp. 806-817
    • Helander, E.1    Silen, H.2    Virtanen, T.3    Gabbouj, M.4
  • 9
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • T. Toda, A. W. Black, and K. Tokuda. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222-2235, 2007
    • (2007) IEEE Transactions on Audio, Speech and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 10
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • Detroit, U.S.A, May
    • K. Tokuda, T. Kobayashi, and S. Imai. Speech parameter generation from HMM using dynamic features. In Proc. ICASSP, pp. 660-663, Detroit, U.S.A, May 1995
    • (1995) Proc. ICASSP , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 11
    • 84878390910
    • Implementation of computationally efficient real-time voice conversion
    • Portland, Oregon, U.S., Sept
    • T. Toda, T. Muramatsu, and H. Banno. Implementation of computationally efficient real-time voice conversion. In Proc. INTERSPEECH, Portland, Oregon, U.S., Sept. 2012
    • (2012) Proc. INTERSPEECH
    • Toda, T.1    Muramatsu, T.2    Banno, H.3
  • 12
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • Jan
    • H. Zen, K. Tokuda, and T. Kitamura. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language, Vol. 21, No. 1, pp. 153-173, Jan. 2007
    • (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 14
    • 67650826181
    • Trajectory training considering global variance for HMM-based speech synthesis
    • Taipei, Taiwan, Aug
    • T. Toda and S. Young. Trajectory training considering global variance for HMM-based speech synthesis. In Proc. ICASSP, pp. 4025-4028, Taipei, Taiwan, Aug. 2009
    • (2009) Proc. ICASSP , pp. 4025-4028
    • Toda, T.1    Young, S.2
  • 15
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • Istanbul, Turkey, June
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. In Proc. ICASSP, pp. 1315-1318, Istanbul, Turkey, June 2000
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 16
    • 84893234191
    • Incorporating global variance in the training phase of GMM-based voice conversion
    • Kaohsiung, Taiwan, Oct
    • H. Hwang, Y. Tsao, H. Wang, Y. Wang, and S. Chen. Incorporating global variance in the training phase of GMM-based voice conversion. In Proc. APSIPA, pp. 1-6, Kaohsiung, Taiwan, Oct. 2013
    • (2013) Proc. APSIPA , pp. 1-6
    • Hwang, H.1    Tsao, Y.2    Wang, H.3    Wang, Y.4    Chen, S.5
  • 17
    • 84905234422
    • A postfilter to modify modulation spectrum in HMM-based speech synthesis
    • Florence, Italy, May
    • S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura. A postfilter to modify modulation spectrum in HMM-based speech synthesis. In Proc. ICASSP, pp. 290-294, Florence, Italy, May 2014
    • (2014) Proc. ICASSP , pp. 290-294
    • Takamichi, S.1    Toda, T.2    Neubig, G.3    Sakti, S.4    Nakamura, S.5
  • 18
    • 84867211725
    • Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Brisbane, Australia, Sep
    • T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. In Proc. INTERSPEECH, pp. 1076-1079, Brisbane, Australia, Sep. 2008
    • (2008) Proc. INTERSPEECH , pp. 1076-1079
    • Muramatsu, T.1    Ohtani, Y.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 19
    • 78149260085
    • Continuous stochastic feature mapping based on trajectory HMMs
    • Jan
    • H. Zen, Y. Nankaku, and K. Tokuda. Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans., Vol. 19, pp. 417-430, Jan. 2011
    • (2011) IEEE Trans , vol.19 , pp. 417-430
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3
  • 20
    • 0028287770
    • Effect of reducing slow temporal modulations on speech reception
    • R. Drullman, J. M. Festen, and R. Plomp. Effect of reducing slow temporal modulations on speech reception. J. Acoust. Soc. of America, Vol. 95, pp. 2670-2680, 1994
    • (1994) J. Acoust. Soc. of America , vol.95 , pp. 2670-2680
    • Drullman, R.1    Festen, J.M.2    Plomp, R.3
  • 21
    • 70349212558
    • Phoneme recognition using spectral envelope and modulation frequency features
    • Taipei, Taiwan, April
    • S. Thomas, S. Ganapathy, and H. Hermansky. Phoneme recognition using spectral envelope and modulation frequency features. In Proc. ICASSP, pp. 4453-4456, Taipei, Taiwan, April 2009
    • (2009) Proc. ICASSP , pp. 4453-4456
    • Thomas, S.1    Ganapathy, S.2    Hermansky, H.3
  • 23
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • Firenze, Italy, Sept
    • H. Kawahara, J. Estill, and O. Fujimura. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT. In MAVEBA 2001, pp. 1-6, Firenze, Italy, Sept. 2001
    • (2001) MAVEBA 2001 , pp. 1-6
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 24
    • 44949143155
    • Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
    • Pittsburgh, U.S.A., Sept
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. In Proc. INTERSPEECH, pp. 2266-2269, Pittsburgh, U.S.A., Sept. 2006
    • (2006) Proc. INTERSPEECH , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 25
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Commun., Vol. 27, No. 3-4, pp. 187-207, 1999
    • (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    de Cheveigné, A.3
  • 26
    • 34547496175
    • One-to-many and many-to-one voice conversion based on eigenvoices
    • Hawaii, U.S.A., Apr
    • T. Toda, Y. Ohtani, and K. Shikano. One-to-many and many-to-one voice conversion based on eigenvoices. In Proc. ICASSP, pp. 1249-1252, Hawaii, U.S.A., Apr. 2007
    • (2007) Proc. ICASSP , pp. 1249-1252
    • Toda, T.1    Ohtani, Y.2    Shikano, K.3
  • 27
    • 70450194389
    • Many-to-many eigenvoice conversion with reference voice
    • Brighton, U.K., Sep
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano. Many-to-many eigenvoice conversion with reference voice. In Proc. INTERSPEECH, pp. 1623-1626, Brighton, U.K., Sep. 2009
    • (2009) Proc. INTERSPEECH , pp. 1623-1626
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.