SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 18, Issue 5, 2010, Pages 932-943

Supervisory data alignment for text-independent voice conversion

Authors (5): Tao, Jianhua (a); Zhang, Meng (a); Nurminen, Jani (b); Tian, Jilei (c); Wang, Xia (c)

a INSTITUTE OF AUTOMATION (China)

b MICROSOFT (United States)

c NOKIA RESEARCH CENTER (United States)

Author keywords

Data alignment; Self organized learning; Supervisory phonetic restriction; Text independent voice conversion

Indexed keywords

ALIGNMENT ACCURACY; CROSS-LINGUAL; DATA ALIGNMENTS; EVALUATION RESULTS; LINEAR ALIGNMENTS; NON-LINEAR METHODS; PARALLEL TRAINING; PARAMETER SPACES; PHONETIC INFORMATION; SELF ORGANIZED LEARNING; SELF-ORGANIZING LEARNING; SOURCE DATA; SOURCE PARAMETERS; TARGET SPACE; TARGET SPEAKER; TOPOLOGICAL STRUCTURE; TOPOLOGY PRESERVATION; VOICE CONVERSION;

LEARNING ALGORITHMS; LINGUISTICS; SPEECH PROCESSING; TARGETS; THREE TERM CONTROL SYSTEMS; TOPOLOGY; VECTOR SPACES;

ALIGNMENT;

EID: 77953724495
PISSN: 15587916
EISSN: None
Source Type: Journal
DOI: 10.1109/TASL.2010.2041688
Document Type: Article

Times cited: 24

References (41)

1
- 85004448479
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," J. Acoust. Soc. Jpn.(E), vol.11, no.2, pp. 71-76, 1990.
- (1990) J. Acoust. Soc. Jpn.(E) , vol.11 , Issue.2 , pp. 71-76
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

2
- 0024874920
- Speaker adaptation applied to HMM and neural networks
- Glasgow, U.K., May
- S. Nakamura and K. Shikano, "Speaker adaptation applied to HMM and neural networks," in Proc. ICASSP, Glasgow, U.K., May 1989, pp. 89-92.
- (1989) Proc. ICASSP , pp. 89-92
- Nakamura, S.¹ Shikano, K.²

3
- 84863268465
- Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum
- Rhodes, Greece
- L. M. Arslan and D. Talkin, "Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum," in Proc. Eurospeech'97, Rhodes, Greece, 1997.
- (1997) Proc. Eurospeech'97
- Arslan, L.M.¹ Talkin, D.²

4
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar.
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol.6, no.2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

5
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- Seattle, WA, May
- A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, Seattle, WA, May 1998, pp. 285-288.
- (1998) Proc. ICASSP , pp. 285-288
- Kain, A.¹ Macon, M.W.²

6
- 0034842552
- Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum
- T. Toda, H. Saruwatari, and K. Shikano, "Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum," in Proc. ICASSP, 2001, pp. 841-844.
- (2001) Proc. ICASSP , pp. 841-844
- Toda, T.¹ Saruwatari, H.² Shikano, K.³

7
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Nov.
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

8
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Commun., vol.11, no.2-3, pp. 175-187, 1992.
- (1992) Speech Commun. , vol.11 , Issue.2-3 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.P.³

9
- 0029254176
- Transformation of formants for voice conversion using artificial neural networks
- M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of formants for voice conversion using artificial neural networks," Speech Commun., vol.16, no.2, pp. 207-216, 1995.
- (1995) Speech Commun. , vol.16 , Issue.2 , pp. 207-216
- Narendranath, M.¹ Murthy, H.A.² Rajendran, S.³ Yegnanarayana, B.⁴

10
- 0029251946
- Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks
- N. Iwahashi and Y. Sagisaka, "Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks," Speech Commun., vol.16, no.2, pp. 139-151, 1995.
- (1995) Speech Commun. , vol.16 , Issue.2 , pp. 139-151
- Iwahashi, N.¹ Sagisaka, Y.²

11
- 21544463108
- VTLN-Based cross-language voice conversion
- Virgin Islands
- D. Suendermann, H. Ney, and H. Hoege, "VTLN-Based cross-language voice conversion," in Proc. ASRU'03, Virgin Islands, 2003.
- (2003) Proc. ASRU'03
- Suendermann, D.¹ Ney, H.² Hoege, H.³

12
- 33947623206
- Text-independent voice conversion based on unit selection
- D. Suendermann, H. Hoege, A. Bonafonte, H. Ney, A. Black, and S. Narayanan, "Text-independent voice conversion based on unit selection," in Proc. ICASSP'06, 2006.
- (2006) Proc. ICASSP'06
- Suendermann, D.¹ Hoege, H.² Bonafonte, A.³ Ney, H.⁴ Black, A.⁵ Narayanan, S.⁶

13
- 85009102954
- Voice conversion for unknown speakers
- H. Ye and S. J. Young, "Voice conversion for unknown speakers," in Proc. ICSLP'04.
- Proc. ICSLP'04
- Ye, H.¹ Young, S.J.²

14
- 0033154052
- Speaker transformation algorithm using segmental codebooks
- L. M. Arslan, "Speaker transformation algorithm using segmental codebooks," Speech Commun., vol.28, pp. 211-226, 1999.
- (1999) Speech Commun. , vol.28 , pp. 211-226
- Arslan, L.M.¹

15
- 44949241666
- Text-independent cross-language voice conversion
- D. Sündermann, H. Höge, A. Bonafonte, H. Ney, and J. Hirschberg, "Text-independent cross-language voice conversion," in Proc. ICSLP, 2006.
- (2006) Proc. ICSLP
- Sündermann, D.¹ Höge, H.² Bonafonte, A.³ Ney, H.⁴ Hirschberg, J.⁵

16
- 51449121435
- Text-independent voice conversion based on state mapped codebook
- M. Zhang, J. Tao, J. Tian, and X. Wang, "Text-independent voice conversion based on state mapped codebook," in Proc. ICASSP'08, 2008.
- (2008) Proc. ICASSP'08
- Zhang, M.¹ Tao, J.² Tian, J.³ Wang, X.⁴

17
- 0004251043
- Chicago, IL: Univ. of Chicago Press
- P. Ladefoged, Preliminaries to Linguistic Phonetics. Chicago, IL: Univ. of Chicago Press, 1971.
- (1971) Preliminaries to Linguistic Phonetics
- Ladefoged, P.¹

18
- 0003711977
- Oxford, U.K.: Blackwell
- W. Labov, Principles of Linguistic Change: Vol.II: Social Factors. Oxford, U.K.: Blackwell, 2001.
- (2001) Principles of Linguistic Change: Vol.II: Social Factors
- Labov, W.¹

19
- 33744674028
- Recherche sur la structure des voyelles orales
- C. Essner, "Recherche sur la structure des voyelles orales," Archives Néerlandaises de Phonétique Expérimentale, vol.20, pp. 40-77, 1947.
- (1947) Archives Néerlandaises de Phonétique Expérimentale , vol.20 , pp. 40-77
- Essner, C.¹

20
- 0003009750
- Acoustic phonetics
- M. Joos, "Acoustic phonetics," Language, vol.24, pp. 1-136, 1948.
- (1948) Language , vol.24 , pp. 1-136
- Joos, M.¹

21
- 0003417482
- Cambridge, U.K.
- Handbook of the International Phonetic Association. Cambridge, U.K.
- Handbook of the International Phonetic Association

22
- 0026400231
- Robust and efficient quantization of speech LSF parameters using structured vector quantizers
- R. Laroia, N. Phamdo, and N. Farvardin, "Robust and efficient quantization of speech LSF parameters using structured vector quantizers," in Proc. ICASSP'91, 1991.
- (1991) Proc. ICASSP'91
- Laroia, R.¹ Phamdo, N.² Farvardin, N.³

23
- 0028997003
- Vector-field-smoothed Bayesian learning for incremental speaker adaptation
- J. Takahashi and S. Sagayama, "Vector-field-smoothed Bayesian learning for incremental speaker adaptation," in Proc. ICASSP'95, 1995, vol.1, pp. 696-699.
- (1995) Proc. ICASSP'95 , vol.1 , pp. 696-699
- Takahashi, J.¹ Sagayama, S.²

24
- 0028997003
- Vector-field-smoothed Bayesian learning for incremental speaker adaptation
- May
- J.-I. Takahashi and S. Sagayama, "Vector-field-smoothed Bayesian learning for incremental speaker adaptation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Conf., May 1995, vol.1, pp. 696-699.
- (1995) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Conf. , vol.1 , pp. 696-699
- Takahashi, J.-I.¹ Sagayama, S.²

25
- 17744361925
- New York: Springer, Graduate Texts in Mathematics
- J. M. Lee, Introduction to Topological Manifolds. New York: Springer, 2000, vol.202, Graduate Texts in Mathematics.
- (2000) Introduction to Topological Manifolds , vol.202
- Lee, J.M.¹

26
- 4544326792
- Manifold learning using euclidean K-nearest neighbor graphs
- J. A. Costa and A. O. Hero, "Manifold learning using euclidean K-nearest neighbor graphs," in Proc. ICASSP, 2004.
- (2004) Proc. ICASSP
- Costa, J.A.¹ Hero, A.O.²

27
- 0003410791
- Berlin/Heidelberg, Germany: Springer
- T. Kohonen, Self-Organizing Maps. Berlin/Heidelberg, Germany: Springer, 1995, vol.30.
- (1995) Self-Organizing Maps , vol.30
- Kohonen, T.¹

28
- 0001798623
- Convergence in distribution of the one-dimensional Kohonen algorithms when the stimuli are not uniform
- C. Bouton and G. Pagès, "Convergence in distribution of the one-dimensional Kohonen algorithms when the stimuli are not uniform," Adv. Appl. Probab., vol.26, no.1, pp. 80-103, 1994.
- (1994) Adv. Appl. Probab. , vol.26 , Issue.1 , pp. 80-103
- Bouton, C.¹ Pagès, G.²

29
- 21344436353
- On the a.s. convergence of the Kohonen algorithm with a general neighborhood function
- J. C. Fort and G. Pagès, "On the a.s. convergence of the Kohonen algorithm with a general neighborhood function," Ann. Appl. Probab., vol.5, no.4, pp. 1177-1216, 1995.
- (1995) Ann. Appl. Probab. , vol.5 , Issue.4 , pp. 1177-1216
- Fort, J.C.¹ Pagès, G.²

30
- 77953694567
- Speaker normalization and adaptation based on feature-map projection
- L. Knohl and A. Rinscheid, "Speaker normalization and adaptation based on feature-map projection," in Proc. Eurospeech'93, 3rd Eur. Conf. Speech, Commun. Technol., 1993, pp. 367-370.
- (1993) Proc. Eurospeech'93, 3rd Eur. Conf. Speech, Commun. Technol. , pp. 367-370
- Knohl, L.¹ Rinscheid, A.²

31
- 0027814751
- Speaker normalization with self-organizing feature maps
- L. Knohl and A. Rinscheid, "Speaker normalization with self-organizing feature maps," in Proc. IJCNN'93-Nagoya, Int. Joint Conf. Neural Netw., 1993, pp. 243-246.
- (1993) Proc. IJCNN'93-Nagoya, Int. Joint Conf. Neural Netw. , pp. 243-246
- Knohl, L.¹ Rinscheid, A.²

32
- 0030359624
- Voice conversion based on topological feature maps and time-variant filtering
- A. Rinscheid, "Voice conversion based on topological feature maps and time-variant filtering," in Proc. ICSLP'96, pp. 1445-1448.
- Proc. ICSLP'96 , pp. 1445-1448
- Rinscheid, A.¹

33
- 34547806096
- A self-organizing map with twin units capable of describing a nonlinear input-output relation applied to speech code vector mapping
- Nov.
- E. Uchino, K. Yano, and T. Azetsu, "A self-organizing map with twin units capable of describing a nonlinear input-output relation applied to speech code vector mapping," Inf. Sci.: Int. J., vol.177, no.21, pp. 4634-4644, Nov. 2007.
- (2007) Inf. Sci.: Int. J. , vol.177 , Issue.21 , pp. 4634-4644
- Uchino, E.¹ Yano, K.² Azetsu, T.³

34
- 77953701779
- Embedding new data points for manifold learning via coordinate propagation
- Long version: Knowledge and Information Systems Journal
- S. Xiang, F. Nie, Y. Song, C. Zhang, and C. Zhang, "Embedding new data points for manifold learning via coordinate propagation," in Proc. PAKDD'07, 2007, Long version: Knowledge and Information Systems Journal.
- (2007) Proc. PAKDD'07
- Xiang, S.¹ Nie, F.² Song, Y.³ Zhang, C.⁴ Zhang, C.⁵

35
- 33947233031
- Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering
- Cambridge, MA: MIT Press
- Y. Bengio, J. Paiement, and P. Vincent, "Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2004, vol.16.
- (2004) Advances in Neural Information Processing Systems , vol.16
- Bengio, Y.¹ Paiement, J.² Vincent, P.³

36
- 31544473466
- Incremental nonlinear dimensionality reduction by manifold learning
- Mar.
- M. Law and A. K. Jain, "Incremental nonlinear dimensionality reduction by manifold learning," IEEE Trans. Pattern Anal. Mach. Intell., vol.28, no.3, pp. 377-391, Mar. 2006.
- (2006) IEEE Trans. Pattern Anal. Mach. Intell. , vol.28 , Issue.3 , pp. 377-391
- Law, M.¹ Jain, A.K.²

37
- 22844435049
- Incremental locally linear embedding
- O. Kouropteva, O. Okun, and M. Pietikäinen, "Incremental locally linear embedding," Pattern Recognition, vol.38, no.10, pp. 1764-1767, 2005.
- (2005) Pattern Recognition , vol.38 , Issue.10 , pp. 1764-1767
- Kouropteva, O.¹ Okun, O.² Pietikäinen, M.³

38
- 67249142662
- Phonetic Anchor based state mapping for text-independent voice conversion
- M. Zhang, J. Tao, J. Nurminen, J. Tian, and X. Wang, "Phonetic Anchor based state mapping for text-independent voice conversion," in Proc. ICSP'08, 2008.
- (2008) Proc. ICSP'08
- Zhang, M.¹ Tao, J.² Nurminen, J.³ Tian, J.⁴ Wang, X.⁵

39
- 0032673049
- Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol.27, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

40
- 0029727599
- Topology preservation in self-organizing maps
- K. Kiviluoto, "Topology preservation in self-organizing maps," in Proc. Int. Conf. Neural Netw. (ICNN'96), 1996, pp. 294-299.
- (1996) Proc. Int. Conf. Neural Netw. (ICNN'96) , pp. 294-299
- Kiviluoto, K.¹

41
- 0542366491
- Efficient vector quantization of LPC parameters at 24 bits/frame
- Jan.
- K. Paliwal and B. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Trans. Speech Audio Process., vol.1, no.1, pp. 3-14, Jan. 1993.
- (1993) IEEE Trans. Speech Audio Process. , vol.1 , Issue.1 , pp. 3-14
- Paliwal, K.¹ Atal, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.