



Volume , Issue , 2014, Pages 1159-1163

Multimodal exemplar-based voice conversion using lip features in noisy environments

Author keywords

Image features; Multimodal; Noise robustness; Non negative matrix factorization; Voice conversion

Indexed keywords

ACOUSTIC NOISE; COST FUNCTIONS; FACE RECOGNITION; FACTORIZATION; SIGNAL PROCESSING; SPEECH COMMUNICATION

EID: 84910091291     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 6

References (28)
  • 2
    • 50249152311
    • Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria
    • T. Virtanen, "Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 3, pp. 1066-1074, 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.3 , pp. 1066-1074
    • Virtanen, T.1
  • 3
    • 44949110218
    • Single-channel speech separation using sparse non-negative matrix factorization
    • M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," in Interspeech, 2006.
    • (2006) Interspeech
    • Schmidt, M.N.1    Olsson, R.K.2
  • 4
    • 79960657803
    • Exemplar-based sparse representations for noise robust automatic speech recognition
    • J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio, Speech and Language Processing, vol. 19, no. 7, pp. 2067-2080, 2011.
    • (2011) IEEE Trans. Audio, Speech and Language Processing , vol.19 , Issue.7 , pp. 2067-2080
    • Gemmeke, J.F.1    Virtanen, T.2    Hurmalainen, A.3
  • 5
    • 84874248255
    • Exemplar-based voice conversion in noisy environment
    • R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment," in SLT, pp. 313-317, 2012.
    • (2012) SLT , pp. 313-317
    • Takashima, R.1    Takiguchi, T.2    Ariki, Y.3
  • 7
    • 0031624666
    • Discriminative training of HMM stream exponents for audio-visual speech recognition
    • G. Potamianos and H. P. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition," in ICASSP, pp. 3733-3736, 1998.
    • (1998) ICASSP , pp. 3733-3736
    • Potamianos, G.1    Graf, H.P.2
  • 8
    • 0042954451
    • Late integration in audio-visual continuous speech recognition
    • A. Verma, T. Faruquie, C. Neti, S. Basu, and A. Senior, "Late integration in audio-visual continuous speech recognition," in ASRU, 1999.
    • (1999) ASRU
    • Verma, A.1    Faruquie, T.2    Neti, C.3    Basu, S.4    Senior, A.5
  • 9
    • 0029747053
    • Integrating audio and visual information to provide highly robust speech recognition
    • M. J. Tomlinson, M. J. Russell, and N. M. Brooke, "Integrating audio and visual information to provide highly robust speech recognition," in ICASSP, pp. 821-824, 1996.
    • (1996) ICASSP , pp. 821-824
    • Tomlinson, M.J.1    Russell, M.J.2    Brooke, N.M.3
  • 10
    • 84871395683
    • Robust AAM-based audio-visual speech recognition against face direction changes
    • Y. Komai, N. Yang, T. Takiguchi, and Y. Ariki, "Robust AAM-based audio-visual speech recognition against face direction changes," ACM Multimedia, pp. 1161-1164, 2012.
    • (2012) ACM Multimedia , pp. 1161-1164
    • Komai, Y.1    Yang, N.2    Takiguchi, T.3    Ariki, Y.4
  • 12
    • 84865747520
    • Intonation conversion from neutral to expressive speech
    • C. Veaux and X. Rodet, "Intonation conversion from neutral to expressive speech," in Interspeech, pp. 2765-2768, 2011.
    • (2011) Interspeech , pp. 2765-2768
    • Veaux, C.1    Rodet, X.2
  • 14
    • 80052698826
    • Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
    • K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Communication, vol. 54, no. 1, pp. 134-146, 2012.
    • (2012) Speech Communication , vol.54 , Issue.1 , pp. 134-146
    • Nakamura, K.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 15
    • 84890519936
    • Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization
    • R. Aihara, R. Takashima, T. Takiguchi, and Y. Ariki, "Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization," in ICASSP, pp. 8037-8040, 2013.
    • (2013) ICASSP , pp. 8037-8040
    • Aihara, R.1    Takashima, R.2    Takiguchi, T.3    Ariki, Y.4
  • 16
    • 84905224579
    • Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
    • K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech," in Interspeech, pp. 148-151, 2006.
    • (2006) Interspeech , pp. 148-151
    • Nakamura, K.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 17
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in ICASSP, vol. 1, pp. 285-288, 1998.
    • (1998) ICASSP , vol.1 , pp. 285-288
    • Kain, A.1    Macon, M.W.2
  • 18
    • 0023739214
    • Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models
    • M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models," in ICASSP, pp. 655-658, 1988.
    • (1988) ICASSP , pp. 655-658
    • Abe, M.1    Nakamura, S.2    Shikano, K.3    Kuwabara, H.4
  • 19
    • 0026880275
    • Voice transformation using PSOLA technique
    • H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, no. 2-3, pp. 175-187, 1992.
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 175-187
    • Valbret, H.1    Moulines, E.2    Tubach, J.P.3
  • 20
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 22
    • 44949210554
    • MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
    • C. H. Lee and C. H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Interspeech, pp. 2254-2257, 2006.
    • (2006) Interspeech , pp. 2254-2257
    • Lee, C.H.1    Wu, C.H.2
  • 23
    • 34547512822
    • Eigenvoice conversion based on Gaussian mixture model
    • T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Interspeech, pp. 2446-2449, 2006.
    • (2006) Interspeech , pp. 2446-2449
    • Toda, T.1    Ohtani, Y.2    Shikano, K.3
  • 24
    • 84865798483
    • One-to-many voice conversion based on tensor representation of speaker space
    • D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space," in Interspeech, pp. 653-656, 2011.
    • (2011) Interspeech , pp. 653-656
    • Saito, D.1    Yamamoto, K.2    Minematsu, N.3    Hirose, K.4
  • 25
    • 84905269973
    • Multimodal voice conversion using non-negative matrix factorization in noisy environments
    • K. Masaka, R. Aihara, T. Takiguchi, and Y. Ariki, "Multimodal voice conversion using non-negative matrix factorization in noisy environments," in ICASSP 2014, 2014.
    • (2014) ICASSP 2014
    • Masaka, K.1    Aihara, R.2    Takiguchi, T.3    Ariki, Y.4
  • 27
    • 85009089413
    • HMM-based text-to-audio-visual speech synthesis - image-based approach
    • S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "HMM-based text-to-audio-visual speech synthesis - image-based approach," ICSLP, vol. III, pp. 25-28, 2000.
    • (2000) ICSLP , vol.3 , pp. 25-28
    • Sako, S.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.