Volume 32, Issue 7, 2004, Pages 1165-1172

Voice conversion technology and its development

Author keywords

Artificial neural network; Codebook mapping; Gaussian mixture model; Glottal excitation; Hidden Markov model; Pitch contour; Speech spectrum; Voice conversion

Indexed keywords

ALGORITHMS; MARKOV PROCESSES; MATHEMATICAL MODELS; NEURAL NETWORKS; SPEECH CODING;

EID: 5444260705     PISSN: 03722112     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: (7)

References (56)
  • 1
    • 58149209073
    • Voice conversion: State of the art and perspectives
    • E Moulines and Y Sagisaka. Voice conversion: state of the art and perspectives [J]. Speech Communication. 1995, 16(2): 125-126.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 125-126
    • Moulines, E.1    Sagisaka, Y.2
  • 2
    • 0034842740
    • Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    • Salt Lake City, USA: IEEE
    • M Tamura, et al. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR [A]. Proc ICASSP [C]. Salt Lake City, USA: IEEE, May 2001. 805-808.
    • (2001) Proc ICASSP , pp. 805-808
    • Tamura, M.1
  • 3
    • 84905560807
    • Voice conversion with smoothed GMM and MAP adaptation
    • Geneva, Switzerland: ISCA
    • Y Chen, M Chu, et al. Voice Conversion with Smoothed GMM and MAP Adaptation [A]. Proc Eurospeech [C]. Geneva, Switzerland: ISCA, Sept. 2003: 2413-2416.
    • (2003) Proc Eurospeech , pp. 2413-2416
    • Chen, Y.1    Chu, M.2
  • 4
    • 1842369784
    • Glottal waveform parameters for different speaker types
    • Edinburgh, Scotland: IOA
    • I Karlsson. Glottal waveform parameters for different speaker types [A]. Proc Speech'88, 7th FASE Symp [C]. Edinburgh, Scotland: IOA, 1988. 225-231.
    • (1988) Proc Speech'88, 7th FASE Symp , pp. 225-231
    • Karlsson, I.1
  • 5
    • 0016939145
    • Automatic recognition of speakers from their voices
    • B S Atal. Automatic recognition of speakers from their voices [J]. Proceedings of the IEEE, April 1976, 64(4): 460-475.
    • (1976) Proceedings of the IEEE , vol.64 , Issue.4 , pp. 460-475
    • Atal, B.S.1
  • 6
    • 0015112070
    • Speech analysis and synthesis by linear prediction of the speech wave
    • B S Atal, S L Hanauer. Speech analysis and synthesis by linear prediction of the speech wave [J]. J Acoust Soc Am, 1971, 50(2): 637-655.
    • (1971) J Acoust Soc Am , vol.50 , Issue.2 , pp. 637-655
    • Atal, B.S.1    Hanauer, S.L.2
  • 7
    • 0020166649
    • System to independently modify excitation and/or spectrum of speech waveform without explicit pitch extraction
    • S Seneff. System to independently modify excitation and/or spectrum of speech waveform without explicit pitch extraction [J]. IEEE Trans Acoust Speech Sig, 1982, 30(4): 566-578.
    • (1982) IEEE Trans Acoust Speech Sig , vol.30 , Issue.4 , pp. 566-578
    • Seneff, S.1
  • 8
    • 0023246352
    • Factors in voice quality: Acoustic features related to gender
    • New York, USA: IEEE
    • D G Childers, et al. Factors in voice quality: Acoustic features related to gender [A]. Proc ICASSP [C]. New York, USA: IEEE, 1987. 293-296.
    • (1987) Proc ICASSP , pp. 293-296
    • Childers, D.G.1
  • 9
    • 0024680919
    • Voice conversion
    • D G Childers, et al. Voice Conversion [J]. Speech Communication, 1989, 8(2): 147-158.
    • (1989) Speech Communication , vol.8 , Issue.2 , pp. 147-158
    • Childers, D.G.1
  • 10
    • 0023739214
    • Voice conversion through vector quantization
    • New York, USA: IEEE
    • M Abe, et al. Voice Conversion through Vector Quantization [A]. Proc ICASSP [C]. New York, USA: IEEE, 1988(1). 655-658.
    • (1988) Proc ICASSP , Issue.1 , pp. 655-658
    • Abe, M.1
  • 11
    • 0029251946
    • Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks
    • N Iwahashi, et al. Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks [J]. Speech Communication. 1995, 16(2): 139-151.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 139-151
    • Iwahashi, N.1
  • 12
    • 0030359624
    • Voice conversion based on topological feature maps and time-variant filtering
    • Philadelphia, USA: ESCA
    • A Rinscheid. Voice conversion based on topological feature maps and time-variant filtering [A]. Proc ICSLP [C]. Philadelphia, USA: ESCA, Oct. 1996. 1445-1448.
    • (1996) Proc ICSLP , pp. 1445-1448
    • Rinscheid, A.1
  • 13
    • 0026880275
    • Voice transformation using PSOLA technique
    • H Valbret, et al. Voice transformation using PSOLA technique [J]. Speech Communication, 1992, 11(2-3): 175-187.
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 175-187
    • Valbret, H.1
  • 14
    • 0029254176
    • Transformation of formants for voice conversion using artificial neural networks
    • M Narendranath, et al. Transformation of formants for voice conversion using artificial neural networks [J]. Speech Communication, 1995, 16(2): 207-216.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 207-216
    • Narendranath, M.1
  • 15
    • 85009266993
    • Transformation of spectral envelope for voice conversion based on radial basis function networks
    • Denver, USA: ISCA
    • T Watanabe, et al. Transformation of spectral envelope for voice conversion based on radial basis function networks [A]. Proc ICSLP'2002 [C]. Denver, USA: ISCA, Sept. 2002. 285-288.
    • (2002) Proc ICSLP'2002 , pp. 285-288
    • Watanabe, T.1
  • 16
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Y Stylianou, et al. Continuous probabilistic transform for voice conversion [J]. IEEE Transactions on Speech and Audio Processing. March 1998, 6(2): 131-142.
    • (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1
  • 17
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • Seattle, USA: IEEE
    • A Kain, M Macon. Spectral voice conversion for text-to-speech synthesis [A]. Proc ICASSP [C]. Seattle, USA: IEEE, May 1998 (1). 285-288.
    • (1998) Proc ICASSP , Issue.1 , pp. 285-288
    • Kain, A.1    Macon, M.2
  • 18
    • 85009069262
    • STRAIGHT-based voice conversion algorithm based on Gaussian mixture model
    • Beijing, China: ESCA
    • T Toda, et al. STRAIGHT-based voice conversion algorithm based on Gaussian mixture model [A]. Proc ICSLP [C]. Beijing, China: ESCA, Oct. 2000. 279-282.
    • (2000) Proc ICSLP , pp. 279-282
    • Toda, T.1
  • 19
    • 85135141647
    • Hidden Markov model based voice conversion using dynamic characteristics of speaker
    • Rhodes, Greece: ESCA
    • E K Kim, et al. Hidden Markov model based voice conversion using dynamic characteristics of speaker [A]. Proc Eurospeech [C]. Rhodes, Greece: ESCA, 1997. 2519-2522.
    • (1997) Proc Eurospeech , pp. 2519-2522
    • Kim, E.K.1
  • 20
    • 5444231240
    • Voice conversion based on acoustic feature transformation
    • Shenzhen, China
    • W Zhang, et al. Voice conversion based on acoustic feature transformation [A]. Proc NCMMSC [C]. Shenzhen, China, 2001. 189-192.
    • (2001) Proc NCMMSC , pp. 189-192
    • Zhang, W.1
  • 21
    • 0026372714
    • Experiments with voice modeling in speech synthesis
    • R Carlson, et al. Experiments with voice modeling in speech synthesis [J]. Speech Communication, 1991, 10(5-6): 481-489.
    • (1991) Speech Communication , vol.10 , Issue.5-6 , pp. 481-489
    • Carlson, R.1
  • 22
    • 0026369941
    • A segment-based approach to voice conversion
    • Toronto, Canada: IEEE
    • M Abe. A segment-based approach to voice conversion [A]. Proc ICASSP [C]. Toronto, Canada: IEEE, May 1991. 765-768.
    • (1991) Proc ICASSP , pp. 765-768
    • Abe, M.1
  • 23
    • 0029764985
    • Speaker recognizability testing for voice coders
    • Atlanta, USA: IEEE
    • A Schmidt-Nielsen, D P Brock. Speaker recognizability testing for voice coders [A]. Proc ICASSP [C]. Atlanta, USA: IEEE, May 1996. 1149-1152.
    • (1996) Proc ICASSP , pp. 1149-1152
    • Schmidt-Nielsen, A.1    Brock, D.P.2
  • 24
    • 0029256373
    • Acoustic characteristics of speaker individuality: Control and conversion
    • H Kuwabara and Y Sagisaka. Acoustic characteristics of speaker individuality: control and conversion [J]. Speech Communication. 1995, 16(2): 165-173.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 165-173
    • Kuwabara, H.1    Sagisaka, Y.2
  • 25
    • 0025321354
    • Analysis, synthesis, and perception of voice quality variations among female and male talkers
    • D Klatt and L C Klatt. Analysis, synthesis, and perception of voice quality variations among female and male talkers [J]. J Acoust Soc Am, 1990, 87(2): 820-857.
    • (1990) J Acoust Soc Am , vol.87 , Issue.2 , pp. 820-857
    • Klatt, D.1    Klatt, L.C.2
  • 26
    • 0027409390
    • Voice source model for continuous control of pitch period
    • P H Milenkovic. Voice source model for continuous control of pitch period [J]. J Acoust Soc Am, 1993, 93(2): 1087-1096.
    • (1993) J Acoust Soc Am , vol.93 , Issue.2 , pp. 1087-1096
    • Milenkovic, P.H.1
  • 27
    • 0015677419
    • Multidimensional representation of personal quality of vowels and its acoustical correlates
    • H Matsumoto, et al. Multidimensional representation of personal quality of vowels and its acoustical correlates [J]. IEEE Trans Audio and Electroacoustics, 1973, 21(5): 428-436.
    • (1973) IEEE Trans Audio and Electroacoustics , vol.21 , Issue.5 , pp. 428-436
    • Matsumoto, H.1
  • 28
    • 0001174086
    • Research on individuality features in speech waves and automatic speaker recognition techniques
    • S Furui. Research on individuality features in speech waves and automatic speaker recognition techniques [J]. Speech Communication, 1986, 5(2): 183-197.
    • (1986) Speech Communication , vol.5 , Issue.2 , pp. 183-197
    • Furui, S.1
  • 29
    • 0030365550
    • A new voice transformation based on both linear and nonlinear prediction
    • Philadelphia, USA: ESCA
    • K S Lee, et al. A new voice transformation based on both linear and nonlinear prediction [A]. Proc ICSLP [C]. Philadelphia, USA: ESCA, 1996. 1401-1404.
    • (1996) Proc ICSLP , pp. 1401-1404
    • Lee, K.S.1
  • 30
    • 0033154052
    • Speaker transformation algorithm using segmental code-books (STASC)
    • L M Arslan. Speaker transformation algorithm using segmental code-books (STASC) [J]. Speech Communication, 1999, 28(3): 211-226.
    • (1999) Speech Communication , vol.28 , Issue.3 , pp. 211-226
    • Arslan, L.M.1
  • 31
    • 0029256372
    • Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt
    • H Mizuno and M Abe. Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt [J]. Speech Communication. 1995, 16(2): 165-173.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 165-173
    • Mizuno, H.1    Abe, M.2
  • 32
    • 85009133605
    • A new multi-speaker formant synthesizer that applies voice conversion techniques
    • Aalborg, Denmark: ESCA
    • J Gutiérrez, et al. A new multi-speaker formant synthesizer that applies voice conversion techniques [A]. Proc Eurospeech [C]. Aalborg, Denmark: ESCA, 2001: 357-360.
    • (2001) Proc Eurospeech , pp. 357-360
    • Gutiérrez, J.1
  • 33
    • 85135145847
    • Speaker interpolation in HMM-based speech synthesis system
    • Rhodes, Greece: ESCA
    • T Yoshimura, et al. Speaker interpolation in HMM-based speech synthesis system [A]. Proc. Eurospeech [C]. Rhodes, Greece: ESCA, 1997. 2523-2526.
    • (1997) Proc. Eurospeech , pp. 2523-2526
    • Yoshimura, T.1
  • 34
    • 0029253818
    • Glottal source modeling for voice conversion
    • D G Childers. Glottal source modeling for voice conversion [J]. Speech Communication. 1995, 16(2): 127-138.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 127-138
    • Childers, D.G.1
  • 35
    • 0034841948
    • Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction
    • Salt Lake City, USA: IEEE
    • A Kain, M Macon. Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction [A]. Proc ICASSP [C]. Salt Lake City, USA: IEEE, June 2001. 813-816.
    • (2001) Proc ICASSP , pp. 813-816
    • Kain, A.1    Macon, M.2
  • 36
    • 4444285698
    • High resolution voice transformation
    • Portland, USA: Oregon Health and Science University
    • A Kain. High resolution voice transformation [D]. Portland, USA: Oregon Health and Science University, Oct. 2001.
    • (2001)
    • Kain, A.1
  • 37
    • 0344557332
    • Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum
    • Rhodes, Greece: ESCA
    • L M Arslan, D Talkin. Voice Conversion by Codebook Mapping of Line Spectral Frequencies and Excitation Spectrum [A]. Proc Eurospeech [C]. Rhodes, Greece: ESCA, 1997(3). 481-489.
    • (1997) Proc Eurospeech , Issue.3 , pp. 481-489
    • Arslan, L.M.1    Talkin, D.2
  • 38
    • 0022906142
    • Speaker adaptation through vector quantization
    • Tokyo, Japan: IEEE
    • K Shikano, et al. Speaker adaptation through vector quantization [A]. Proc ICASSP [C]. Tokyo, Japan: IEEE, 1986. 2643-2646.
    • (1986) Proc ICASSP , pp. 2643-2646
    • Shikano, K.1
  • 39
    • 5444268579
    • Spectrogram normalization using fuzzy vector quantization
    • S Nakamura, K Shikano. Spectrogram normalization using fuzzy vector quantization [J]. J Acoust Soc Japan, 1989, 45(2): 107-114.
    • (1989) J Acoust Soc Japan , vol.45 , Issue.2 , pp. 107-114
    • Nakamura, S.1    Shikano, K.2
  • 41
    • 85010456787
    • Spectral mapping for voice conversion using speaker selection and vector field smoothing
    • Madrid, Spain: ESCA
    • M Hashimoto, N Higuchi. Spectral mapping for voice conversion using speaker selection and vector field smoothing [A]. Proc Eurospeech [C]. Madrid, Spain: ESCA, 1995: 431-434.
    • (1995) Proc Eurospeech , pp. 431-434
    • Hashimoto, M.1    Higuchi, N.2
  • 42
    • 0003795363
    • Local models and Gaussian mixture models for statistical data processing
    • Portland, USA: Oregon Graduate Institute of Science and Technology
    • N Kambhatla. Local Models and Gaussian Mixture Models for Statistical Data Processing [D]. Portland, USA: Oregon Graduate Institute of Science and Technology, 1996.
    • (1996)
    • Kambhatla, N.1
  • 43
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C J Leggetter, P C Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models [J]. Computer Speech and Language, 1995, 9(2): 171-185.
    • (1995) Computer Speech and Language , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 44
    • 0026142334
    • A study on speaker adaptation of the parameters of continuous density hidden Markov models
    • Lee Chin-Hui, et al. A study on speaker adaptation of the parameters of continuous density hidden Markov models [J]. IEEE Trans on Signal Processing, 1991, 39(4): 806-814.
    • (1991) IEEE Trans on Signal Processing , vol.39 , Issue.4 , pp. 806-814
    • Lee, C.-H.1
  • 45
    • 85135109228
    • Speaker adaptation based on transfer vector field smoothing with continuous mixture density HMMs
    • Banff, Canada: ESCA
    • K Ohkura, et al. Speaker adaptation based on transfer vector field smoothing with continuous mixture density HMMs [A]. Proc ICSLP [C]. Banff, Canada: ESCA, Oct. 1992. 369-372.
    • (1992) Proc ICSLP , pp. 369-372
    • Ohkura, K.1
  • 46
    • 5444247189
    • Voice personality transformation using an orthogonal vector space conversion
    • Madrid, Spain: ESCA
    • K S Lee, et al. Voice personality transformation using an orthogonal vector space conversion [A]. Proc Eurospeech [C]. Madrid, Spain: ESCA, 1995 (1). 427-430.
    • (1995) Proc Eurospeech , Issue.1 , pp. 427-430
    • Lee, K.S.1
  • 47
    • 85135175982
    • Statistical methods for voice quality transformation
    • Madrid, Spain: ESCA
    • Y Stylianou, et al. Statistical methods for voice quality transformation [A]. Proc Eurospeech [C]. Madrid, Spain: ESCA, 1995. 447-450.
    • (1995) Proc Eurospeech , pp. 447-450
    • Stylianou, Y.1
  • 48
    • 5444259197
    • On the construction of a pitch conversion system
    • Toulouse, France: EUSIP
    • T Ceyssens, et al. On the Construction of a Pitch Conversion System [A]. Proc EUSIPCO [C]. Toulouse, France: EUSIP, 2002. 1301-1304.
    • (2002) Proc EUSIPCO , pp. 1301-1304
    • Ceyssens, T.1
  • 49
    • 84866347413
    • Transforming voice quality
    • Geneva, Switzerland: ISCA
    • B Gillett, S King. Transforming voice quality [A]. Proc Eurospeech [C]. Geneva, Switzerland: ISCA, 2003. 1713-1716.
    • (2003) Proc Eurospeech , pp. 1713-1716
    • Gillett, B.1    King, S.2
  • 50
    • 5444243681
    • Speaker-specific pitch contour modeling and modification
    • Seattle, USA: IEEE
    • D Chappell, J Hansen. Speaker-specific pitch contour modeling and modification [A]. Proc ICASSP [C]. Seattle, USA: IEEE, May 1998. 885-888.
    • (1998) Proc ICASSP , pp. 885-888
    • Chappell, D.1    Hansen, J.2
  • 51
    • 5444225334
    • New methods for voice conversion
    • Istanbul, Turkey: Boğaziçi University
    • O Turk. New Methods for Voice Conversion [D]. Istanbul, Turkey: Boğaziçi University, 2003.
    • (2003)
    • Turk, O.1
  • 52
    • 0033693289
    • Stochastic modeling of spectral adjustment for high quality pitch modification
    • Istanbul, Turkey: IEEE
    • A Kain, Y Stylianou. Stochastic modeling of spectral adjustment for high quality pitch modification [A]. Proc ICASSP [C]. Istanbul, Turkey: IEEE, June 2000. 949-952.
    • (2000) Proc ICASSP , pp. 949-952
    • Kain, A.1    Stylianou, Y.2
  • 53
    • 0031643805
    • Speaker transformation using sentence HMM based alignments and detailed prosody modification
    • Seattle, USA: IEEE
    • L M Arslan, et al. Speaker transformation using sentence HMM based alignments and detailed prosody modification [A]. Proc ICASSP [C]. Seattle, USA: IEEE, May 1998. 289-292.
    • (1998) Proc ICASSP , pp. 289-292
    • Arslan, L.M.1
  • 54
    • 0023407575
    • Review of text-to-speech conversion for English
    • D Klatt. Review of text-to-speech conversion for English [J]. J Acoust Soc Am. 1987, 82(3): 737-793.
    • (1987) J Acoust Soc Am. , vol.82 , Issue.3 , pp. 737-793
    • Klatt, D.1
  • 55
    • 5444223461
    • A global framework for the assessment of synthetic speech without subjects
    • Berlin, Germany: ESCA
    • A Mariniak. A global framework for the assessment of synthetic speech without subjects [A]. Proc Eurospeech [C]. Berlin, Germany: ESCA, 1993(3): 1683-1686.
    • (1993) Proc Eurospeech , Issue.3 , pp. 1683-1686
    • Mariniak, A.1
  • 56
    • 84971539709
    • Emotional speech synthesis - A review
    • Aalborg, Denmark: ISCA
    • M Schröder. Emotional speech synthesis - A review [A]. Proc Eurospeech [C]. Aalborg, Denmark: ISCA, 2001(1): 561-564.
    • (2001) Proc Eurospeech , Issue.1 , pp. 561-564
    • Schröder, M.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.