-
1
-
-
67651002140
-
Statistical parametric speech synthesis
-
H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Communication, Vol. 51, no. 11, pp. 1039-1064, 2009.
-
(2009)
Speech Communication
, vol.51
, Issue.11
, pp. 1039-1064
-
-
Zen, H.1
Tokuda, K.2
Black, A.3
-
2
-
-
84876687945
-
Speech synthesis based on hidden Markov models
-
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, Vol. 101, no. 5, pp. 1234-1252, 2013.
-
(2013)
Proceedings of the IEEE
, vol.101
, Issue.5
, pp. 1234-1252
-
-
Tokuda, K.1
Nankaku, Y.2
Toda, T.3
Zen, H.4
Yamagishi, J.5
Oura, K.6
-
3
-
-
85032750981
-
Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
-
Z. H. Ling, S. Y. Kang, H. Zen, A. Senior, M. Schuster, X. J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, Vol. 32, no. 3, pp. 35-52, 2015.
-
(2015)
IEEE Signal Processing Magazine
, vol.32
, Issue.3
, pp. 35-52
-
-
Ling, Z.H.1
Kang, S.Y.2
Zen, H.3
Senior, A.4
Schuster, M.5
Qian, X.J.6
Meng, H.7
Deng, L.8
-
4
-
-
33846429403
-
Minimum generation error training for HMM-based speech synthesis
-
Toulouse, France, May
-
Y. J. Wu and R. H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, Toulouse, France, May 2006, pp. 89-92.
-
(2006)
Proc. ICASSP
, pp. 89-92
-
-
Wu, Y.J.1
Wang, R.H.2
-
5
-
-
84978086501
-
Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training
-
Z. Wu and S. King, "Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 7, pp. 1255-1265, 2016.
-
(2016)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.24
, Issue.7
, pp. 1255-1265
-
-
Wu, Z.1
King, S.2
-
6
-
-
84994361374
-
The voice conversion challenge 2016
-
California, U.S.A., Sep.
-
T. Toda, L. H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, and J. Yamagishi, "The Voice Conversion Challenge 2016," in Proc. INTERSPEECH, California, U.S.A., Sep. 2016, pp. 1632-1636.
-
(2016)
Proc. INTERSPEECH
, pp. 1632-1636
-
-
Toda, T.1
Chen, L.H.2
Saito, D.3
Villavicencio, F.4
Wester, M.5
Wu, Z.6
Yamagishi, J.7
-
7
-
-
84994234512
-
Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis
-
California, U.S.A., Sep.
-
Y. Ijima, T. Asami, and H. Mizuno, "Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis," in Proc. INTERSPEECH, California, U.S.A., Sep. 2016, pp. 337-341.
-
(2016)
Proc. INTERSPEECH
, pp. 337-341
-
-
Ijima, Y.1
Asami, T.2
Mizuno, H.3
-
8
-
-
57749193836
-
Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
-
T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, no. 8, pp. 2222-2235, 2007.
-
(2007)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.15
, Issue.8
, pp. 2222-2235
-
-
Toda, T.1
Black, A.W.2
Tokuda, K.3
-
9
-
-
84878387899
-
Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
-
Portland, U.S.A., Sep.
-
Y. Ohtani, M. Tamura, M. Morita, T. Kagoshima, and M. Akamine, "Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP," in Proc. INTERSPEECH, Portland, U.S.A., Sep. 2012.
-
(2012)
Proc. INTERSPEECH
-
-
Ohtani, Y.1
Tamura, M.2
Morita, M.3
Kagoshima, T.4
Akamine, M.5
-
10
-
-
84962834006
-
Postfilters to modify the modulation spectrum for statistical parametric speech synthesis
-
S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, and S. Nakamura, "Postfilters to modify the modulation spectrum for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 4, pp. 755-767, 2016.
-
(2016)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.24
, Issue.4
, pp. 755-767
-
-
Takamichi, S.1
Toda, T.2
Black, A.W.3
Neubig, G.4
Sakti, S.5
Nakamura, S.6
-
11
-
-
84946033919
-
Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion
-
Brisbane, Australia, Apr.
-
S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, "Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion," in Proc. ICASSP, Brisbane, Australia, Apr. 2015, pp. 4859-4863.
-
(2015)
Proc. ICASSP
, pp. 4859-4863
-
-
Takamichi, S.1
Toda, T.2
Black, A.W.3
Nakamura, S.4
-
12
-
-
84973375140
-
Trajectory training considering global variance for speech synthesis based on neural networks
-
Shanghai, China, Mar.
-
K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks," in Proc. ICASSP, Shanghai, China, Mar. 2016, pp. 5600-5604.
-
(2016)
Proc. ICASSP
, pp. 5600-5604
-
-
Hashimoto, K.1
Oura, K.2
Nankaku, Y.3
Tokuda, K.4
-
13
-
-
84910088495
-
Analysis of spectral enhancement using global variance in HMM-based speech synthesis
-
MAX Atria, Singapore, May
-
T. Nose and A. Ito, "Analysis of spectral enhancement using global variance in HMM-based speech synthesis," in Proc. INTERSPEECH, MAX Atria, Singapore, May 2014, pp. 2917-2921.
-
(2014)
Proc. INTERSPEECH
, pp. 2917-2921
-
-
Nose, T.1
Ito, A.2
-
14
-
-
84890490547
-
Statistical parametric speech synthesis using deep neural networks
-
Vancouver, Canada, May
-
H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, Vancouver, Canada, May 2013, pp. 7962-7966.
-
(2013)
Proc. ICASSP
, pp. 7962-7966
-
-
Zen, H.1
Senior, A.2
Schuster, M.3
-
15
-
-
84962901047
-
Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance
-
Z. Wu, P. L. D. Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, and J. Yamagishi, "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 4, pp. 768-783, 2016.
-
(2016)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.24
, Issue.4
, pp. 768-783
-
-
Wu, Z.1
Leon, P.L.D.2
Demiroglu, C.3
Khodabakhsh, A.4
King, S.5
Ling, Z.6
Saito, D.7
Stewart, B.8
Toda, T.9
Wester, M.10
Yamagishi, J.11
-
16
-
-
84959178048
-
Robust deep feature for spoofing detection - The SJTU system for ASVspoof 2015 challenge
-
Dresden, Germany, Sep.
-
N. Chen, Y. Qian, H. Dinkel, B. Chen, and K. Yu, "Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 Challenge," in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2097-2101.
-
(2015)
Proc. INTERSPEECH
, pp. 2097-2101
-
-
Chen, N.1
Qian, Y.2
Dinkel, H.3
Chen, B.4
Yu, K.5
-
17
-
-
84937849144
-
Generative adversarial nets
-
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems 27, 2014, pp. 2672-2680.
-
(2014)
Advances in Neural Information Processing Systems 27
, pp. 2672-2680
-
-
Goodfellow, I.1
Pouget-Abadie, J.2
Mirza, M.3
Xu, B.4
Warde-Farley, D.5
Ozair, S.6
Courville, A.7
Bengio, Y.8
-
18
-
-
33746600649
-
Reducing the dimensionality of data with neural networks
-
G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, Vol. 313, no. 5786, pp. 504-507, 2006.
-
(2006)
Science
, vol.313
, Issue.5786
, pp. 504-507
-
-
Hinton, G.E.1
Salakhutdinov, R.R.2
-
19
-
-
84946045510
-
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
-
Brisbane, Australia, Apr.
-
H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, Brisbane, Australia, Apr. 2015, pp. 4470-4474.
-
(2015)
Proc. ICASSP
, pp. 4470-4474
-
-
Zen, H.1
Sak, H.2
-
20
-
-
84959090360
-
Multitask learning deep neural networks for speech feature denoising
-
Dresden, Germany, Sep.
-
B. Huang, D. Ke, H. Zheng, B. Xu, Y. Xu, and K. Su, "Multitask learning deep neural networks for speech feature denoising," in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2464-2468.
-
(2015)
Proc. INTERSPEECH
, pp. 2464-2468
-
-
Huang, B.1
Ke, D.2
Zheng, H.3
Xu, B.4
Xu, Y.5
Su, K.6
-
21
-
-
84998636515
-
Generative adversarial text-to-image synthesis
-
S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text-to-image synthesis," in Proc. ICML, 2016, pp. 1060-1069.
-
(2016)
Proc. ICML
, pp. 1060-1069
-
-
Reed, S.1
Akata, Z.2
Yan, X.3
Logeswaran, L.4
Schiele, B.5
Lee, H.6
-
22
-
-
84984985889
-
"Why should I trust you?": Explaining the predictions of any classifier
-
San Francisco, U.S.A., Aug.
-
M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proc. KDD, San Francisco, U.S.A., Aug. 2016, pp. 1135-1144.
-
(2016)
Proc. KDD
, pp. 1135-1144
-
-
Ribeiro, M.T.1
Singh, S.2
Guestrin, C.3
-
23
-
-
83755163018
-
-
D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti, "Detecting novel associations in large data sets," Science, Vol. 334, no. 6062, pp. 1518-1524, 2011.
-
(2011)
Science
, vol.334
, Issue.6062
, pp. 1518-1524
-
-
Reshef, D.N.1
Reshef, Y.A.2
Finucane, H.K.3
Grossman, S.R.4
McVean, G.5
Turnbaugh, P.J.6
Lander, E.S.7
Mitzenmacher, M.8
Sabeti, P.C.9
-
24
-
-
0142210563
-
-
no. TR-I-0166M
-
M. Abe, Y. Sagisaka, T. Umeda, and H. Kuwabara, "ATR technical report," no. TR-I-0166M, 1990.
-
(1990)
ATR Technical Report
-
-
Abe, M.1
Sagisaka, Y.2
Umeda, T.3
Kuwabara, H.4
-
25
-
-
84874199000
-
Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
-
Firenze, Italy, Sep.
-
H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA 2001, Firenze, Italy, Sep. 2001, pp. 1-6.
-
(2001)
MAVEBA 2001
, pp. 1-6
-
-
Kawahara, H.1
Estill, J.2
Fujimura, O.3
-
26
-
-
44949143155
-
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
-
Pittsburgh, U.S.A., Sep.
-
Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. INTERSPEECH, Pittsburgh, U.S.A., Sep. 2006, pp. 2266-2269.
-
(2006)
Proc. INTERSPEECH
, pp. 2266-2269
-
-
Ohtani, Y.1
Toda, T.2
Saruwatari, H.3
Shikano, K.4
-
27
-
-
0032673049
-
Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
-
H. Kawahara, I. Masuda-Katsuse, and A. D. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, Vol. 27, no. 3-4, pp. 187-207, 1999.
-
(1999)
Speech Communication
, vol.27
, Issue.3-4
, pp. 187-207
-
-
Kawahara, H.1
Masuda-Katsuse, I.2
Cheveigne, A.D.3
-
28
-
-
84994252904
-
The NAIST text-to-speech system for the blizzard challenge 2015
-
Berlin, Germany, Sep.
-
S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, and S. Nakamura, "The NAIST text-to-speech system for the Blizzard Challenge 2015," in Proc. Blizzard Challenge workshop, Berlin, Germany, Sep. 2015.
-
(2015)
Proc. Blizzard Challenge Workshop
-
-
Takamichi, S.1
Kobayashi, K.2
Tanaka, K.3
Toda, T.4
Nakamura, S.5
-
29
-
-
84862294866
-
Deep sparse rectifier neural networks
-
Fort Lauderdale, U.S.A., Apr.
-
X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. AISTATS, Fort Lauderdale, U.S.A., Apr. 2011, pp. 315-323.
-
(2011)
Proc. AISTATS
, pp. 315-323
-
-
Glorot, X.1
Bordes, A.2
Bengio, Y.3
-
30
-
-
80052250414
-
Adaptive subgradient methods for online learning and stochastic optimization
-
J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, Vol. 12, pp. 2121-2159, 2011.
-
(2011)
Journal of Machine Learning Research
, vol.12
, pp. 2121-2159
-
-
Duchi, J.1
Hazan, E.2
Singer, Y.3
-
31
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, Vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2