



Volume 20, Issue 9, 2012, Pages 2505-2517

Statistical voice conversion techniques for body-conducted unvoiced speech enhancement

Author keywords

body-conducted unvoiced speech; nonaudible murmur; silent speech; voice conversion; whispered voice

Indexed keywords

ACOUSTIC FEATURES; CONVERSION MODEL; EXPERIMENTAL EVALUATION; FUNDAMENTAL FREQUENCY CONTOUR; GAUSSIAN MIXTURE MODELS; NON-AUDIBLE MURMUR; SPEECH SOUNDS; STATISTICAL APPROACH; UNVOICED SPEECH; VOICE CONVERSION; VOICE CONVERSION TECHNIQUES

EID: 84865698185     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2205241     Document Type: Article
Times cited: 188

References (32)
  • 2
    • 84890487256
    • Adaptation for soft whisper recognition using a throat microphone
    • Jeju Island, Korea Sep.
    • S.-C. Jou, T. Schultz, and A. Waibel, "Adaptation for soft whisper recognition using a throat microphone," in Proc. INTERSPEECH, Jeju Island, Korea, Sep. 2004, pp. 1493-1496.
    • (2004) Proc. INTERSPEECH , pp. 1493-1496
    • Jou, S.-C.1    Schultz, T.2    Waibel, A.3
  • 3
    • 76849099234
    • Modeling coarticulation in EMG-based continuous speech recognition
    • T. Schultz and M. Wand, "Modeling coarticulation in EMG-based continuous speech recognition," Speech Commun., vol. 52, no. 4, pp. 341-353, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.4 , pp. 341-353
    • Schultz, T.1    Wand, M.2
  • 4
    • 76849104115
    • Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips
    • T. Hueber, E.-L. Benaroya, G. Chollet, B. Denby, G. Dreyfus, and M. Stone, "Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips," Speech Commun., vol. 52, no. 4, pp. 288-300, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.4 , pp. 288-300
    • Hueber, T.1    Benaroya, E.-L.2    Chollet, G.3    Denby, B.4    Dreyfus, G.5    Stone, M.6
  • 5
    • 38649090114
    • Multisensory processing for speech enhancement and magnitude-normalized spectra for speech modeling
    • DOI 10.1016/j.specom.2007.09.002, PII S0167639307001501
    • A. Subramanya, Z. Zhang, Z. Liu, and A. Acero, "Multisensory processing for speech enhancement and magnitude-normalized spectra for speech modeling," Speech Commun., vol. 50, no. 3, pp. 228-243, 2008. (Pubitemid 351172472)
    • (2008) Speech Communication , vol.50 , Issue.3 , pp. 228-243
    • Subramanya, A.1    Zhang, Z.2    Liu, Z.3    Acero, A.4
  • 8
    • 70450162011
    • Technologies for processing body-conducted speech detected with non-audible murmur microphone
    • Brighton, U.K. Sep.
    • T. Toda, K. Nakamura, T. Nagai, T. Kaino, Y. Nakajima, and K. Shikano, "Technologies for processing body-conducted speech detected with non-audible murmur microphone," in Proc. INTERSPEECH, Brighton, U.K., Sep. 2009, pp. 632-635.
    • (2009) Proc. INTERSPEECH , pp. 632-635
    • Toda, T.1    Nakamura, K.2    Nagai, T.3    Kaino, T.4    Nakajima, Y.5    Shikano, K.6
  • 9
    • 77955720028
    • Analysis and recognition of NAM speech using HMM distances and visual information
    • Aug
    • P. Heracleous, V.-A. Tran, T. Nagai, and K. Shikano, "Analysis and recognition of NAM speech using HMM distances and visual information," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1528-1538, Aug. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.6 , pp. 1528-1538
    • Heracleous, P.1    Tran, V.-A.2    Nagai, T.3    Shikano, K.4
  • 11
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Nov
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 12
    • 33745214435
    • NAM-to-speech conversion with Gaussian mixture models
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • T. Toda and K. Shikano, "NAM-to-speech conversion with Gaussian mixture models," in Proc. INTERSPEECH, Lisbon, Portugal, Sep. 2005, pp. 1957-1960. (Pubitemid 43908472)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 1957-1960
    • Toda, T.1    Shikano, K.2
  • 13
    • 44949187612
    • Improving body transmitted unvoiced speech with statistical voice conversion
    • Pittsburgh, PA Sep.
    • M. Nakagiri, T. Toda, H. Saruwatari, and K. Shikano, "Improving body transmitted unvoiced speech with statistical voice conversion," in Proc. INTERSPEECH, Pittsburgh, PA, Sep. 2006, pp. 2270-2273.
    • (2006) Proc. INTERSPEECH , pp. 2270-2273
    • Nakagiri, M.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 14
    • 70349200844
    • Voice conversion for various types of body transmitted speech
    • Taipei, Taiwan Apr.
    • T. Toda, K. Nakamura, H. Sekimoto, and K. Shikano, "Voice conversion for various types of body transmitted speech," in Proc. ICASSP, Taipei, Taiwan, Apr. 2009, pp. 3601-3604.
    • (2009) Proc. ICASSP , pp. 3601-3604
    • Toda, T.1    Nakamura, K.2    Sekimoto, H.3    Shikano, K.4
  • 15
    • 76849105528
    • Improvement to a NAM-captured whisper-to-speech system
    • V.-A. Tran, G. Bailly, H. Loevenbruck, and T. Toda, "Improvement to a NAM-captured whisper-to-speech system," Speech Commun., vol. 52, no. 4, pp. 314-326, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.4 , pp. 314-326
    • Tran, V.-A.1    Bailly, G.2    Loevenbruck, H.3    Toda, T.4
  • 17
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • Seattle, WA May
    • A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, Seattle, WA, May 1998, pp. 285-288.
    • (1998) Proc. ICASSP , pp. 285-288
    • Kain, A.1    Macon, M.W.2
  • 18
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • Istanbul, Turkey Jun.
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, Istanbul, Turkey, Jun. 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 19
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 20
    • 38649140222
    • Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
    • DOI 10.1016/j.specom.2007.09.001, PII S0167639307001495
    • T. Toda, A. W. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, no. 3, pp. 215-227, 2008. (Pubitemid 351172471)
    • (2008) Speech Communication , vol.50 , Issue.3 , pp. 215-227
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 21
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • Firenze, Italy Sep.
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA, Firenze, Italy, Sep. 2001.
    • (2001) Proc. MAVEBA
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 22
    • 44949143155
    • Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
    • Pittsburgh, PA Sep.
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. INTERSPEECH, Pittsburgh, PA, Sep. 2006, pp. 2266-2269.
    • (2006) Proc. INTERSPEECH , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 24
    • 84865690183
    • JNAS: Japanese Newspaper Article Sentences [Online]. Available:
    • JNAS: Japanese Newspaper Article Sentences, [Online]. Available: http://www.milab.is.tsukuba.ac.jp/jnas/instruct.html
  • 25
    • 85131821539
    • Mel-generalized cepstral analysis: a unified approach to speech spectral estimation
    • Yokohama, Japan Sep.
    • K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis: A unified approach to speech spectral estimation," in Proc. ICSLP, Yokohama, Japan, Sep. 1994, pp. 1043-1045.
    • (1994) Proc. ICSLP , pp. 1043-1045
    • Tokuda, K.1    Kobayashi, T.2    Masuko, T.3    Imai, S.4
  • 26
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 27
    • 84928118106
    • Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
    • Budapest, Hungary Sep.
    • H. Kawahara, H. Katayose, A. de Cheveigné, and R. D. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity," in Proc. EUROSPEECH, Budapest, Hungary, Sep. 1999, pp. 2781-2784.
    • (1999) Proc. EUROSPEECH , pp. 2781-2784
    • Kawahara, H.1    Katayose, H.2    De Cheveigné, A.3    Patterson, R.D.4
  • 28
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • San Francisco, CA Mar.
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP, San Francisco, CA, Mar. 1992, vol. 1, pp. 137-140.
    • (1992) Proc. ICASSP , vol.1 , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 29
    • 84867211725
    • Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Brisbane, Australia Sep.
    • T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory," in Proc. INTERSPEECH, Brisbane, Australia, Sep. 2008, pp. 1076-1079.
    • (2008) Proc. INTERSPEECH , pp. 1076-1079
    • Muramatsu, T.1    Ohtani, Y.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 30
    • 70349223901
    • Acoustic compensation methods for body transmitted speech conversion
    • Taipei, Taiwan Apr.
    • D. Miyamoto, K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Acoustic compensation methods for body transmitted speech conversion," in Proc. ICASSP, Taipei, Taiwan, Apr. 2009, pp. 3901-3904.
    • (2009) Proc. ICASSP , pp. 3901-3904
    • Miyamoto, D.1    Nakamura, K.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 31
    • 34547496175
    • One-to-many and many-to-one voice conversion based on eigenvoices
    • Apr.
    • T. Toda, Y. Ohtani, and K. Shikano, "One-to-many and many-to-one voice conversion based on eigenvoices," in Proc. ICASSP, Apr. 2007, pp. 1249-1252.
    • (2007) Proc. ICASSP , pp. 1249-1252
    • Toda, T.1    Ohtani, Y.2    Shikano, K.3
  • 32
    • 70450194389
    • Many-to-many eigenvoice conversion with reference voice
    • Brighton, U.K. Sep.
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Many-to-many eigenvoice conversion with reference voice," in Proc. INTERSPEECH, Brighton, U.K., Sep. 2009, pp. 1623-1626.
    • (2009) Proc. INTERSPEECH , pp. 1623-1626
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.