[1] A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1998, pp. 285-288.
[2] C. Veaux and X. Rodet, "Intonation conversion from neutral to expressive speech," in Proc. Interspeech, 2011, pp. 2765-2768.
[3] K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Commun., vol. 54, no. 1, pp. 134-146, 2012.
[4] L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2001, pp. 301-304.
[5] A. Kunikoshi, Y. Qiao, N. Minematsu, and K. Hirose, "Speech generation from hand gestures based on space mapping," in Proc. Interspeech, 2009, pp. 308-311.
[6] M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1988, pp. 655-658.
[7] H. Valbret, E. Moulines, and J.-P. Tubach, "Voice transformation using PSOLA technique," Speech Commun., vol. 11, no. 2, pp. 175-187, 1992.
[8] Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
[9] C.-H. Lee and C.-H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Proc. Interspeech, 2006, pp. 2254-2257.
[10] T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Proc. Interspeech, 2006, pp. 2446-2449.
[11] D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space," in Proc. Interspeech, 2011, pp. 653-656.
[12] D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion," in Proc. Interspeech, 2010, pp. 1728-1731.
[13] S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp. 3893-3896.
[14] Y.-J. Wu, H. Kawai, J. Ni, and R.-H. Wang, "Minimum segmentation error based discriminative training for speech synthesis application," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2004, pp. 629-632.
[15] E. McDermott, T. J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, "Discriminative training for large-vocabulary speech recognition using minimum classification error," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 203-223, Jan. 2007.
[16] T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
[17] Z.-H. Ling and L.-R. Dai, "Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 5, pp. 1492-1502, Jul. 2012.
[18] Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop, 2006.
[19] Z. Jian and Z. Yang, "Voice conversion using canonical correlation analysis based on Gaussian mixture model," in Proc. 8th ACIS Int. Conf. Softw. Eng., Artif. Intell., Netw., Parallel/Distrib. Comput. (SNPD), 2007, pp. 210-215.
[20] R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment," in Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012, pp. 313-317.
[21] Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution," in Proc. 8th ISCA Speech Synth. Workshop, 2013.
[22] T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Proc. Interspeech, 2013, pp. 369-372.
[23] P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," Parallel Distrib. Process., vol. 1, 1986.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
[25] Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
[26] A.-r. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
[27] V. Nair and G. Hinton, "3-D object recognition with deep belief nets," Adv. Neural Inf. Process. Syst., vol. 22, pp. 1339-1347, 2009.
[28] T. Deselaers, S. Hasan, O. Bender, and H. Ney, "A deep learning approach to machine transliteration," in Proc. 4th Workshop Statist. Mach. Translat., Assoc. Comput. Linguist., 2009, pp. 233-241.
[29] I. Sutskever, G. Hinton, and G. Taylor, "The recurrent temporal restricted Boltzmann machine," Adv. Neural Inf. Process. Syst., vol. 19, pp. 1601-1608, 2008.
[30] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription," in Proc. Int. Conf. Mach. Learn., 2012.
[31] Z. Wu, E. S. Chng, and H. Li, "Conditional restricted Boltzmann machine for voice conversion," in Proc. IEEE China Summit and Int. Conf. Signal Inf. Process. (ChinaSIP), 2013, pp. 104-108.
[32] L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
[33] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," Adv. Neural Inf. Process. Syst., pp. 1345-1352, 2006.
[35] K. Cho, A. Ilin, and T. Raiko, "Improved learning of Gaussian-Bernoulli restricted Boltzmann machines," Artif. Neural Netw. Mach. Learn., pp. 10-17, 2011.
[37] T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
[38] A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, no. 4, pp. 357-363, 1990.
[39] H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, and H. Banno, "TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008, pp. 3933-3936.
[40] B. Milner and X. Shao, "Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model," in Proc. Interspeech, 2002, pp. 2421-2424.
[41] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Comput. Sci. Dept., Univ. of Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[42] T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion in time-invariant speaker-independent space," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 7939-7943.