-
1
-
-
84876687945
-
Speech synthesis based on hidden Markov models
-
May
-
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
-
(2013)
Proceedings of the IEEE
, vol.101
, Issue.5
, pp. 1234-1252
-
-
Tokuda, K.1
Nankaku, Y.2
Toda, T.3
Zen, H.4
Yamagishi, J.5
Oura, K.6
-
2
-
-
67651002140
-
Statistical parametric speech synthesis
-
H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
-
(2009)
Speech Communication
, vol.51
, Issue.11
, pp. 1039-1064
-
-
Zen, H.1
Tokuda, K.2
Black, A.W.3
-
3
-
-
84890490547
-
Statistical parametric speech synthesis using deep neural networks
-
May
-
H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
-
(2013)
Proc. of ICASSP
, pp. 7962-7966
-
-
Zen, H.1
Senior, A.2
Schuster, M.3
-
4
-
-
84910047819
-
TTS synthesis with bidirectional LSTM based recurrent neural networks
-
Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1968.
-
(2014)
Interspeech
, pp. 1964-1968
-
-
Fan, Y.1
Qian, Y.2
Xie, F.-L.3
Soong, F.K.4
-
5
-
-
84946045510
-
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
-
H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. of ICASSP. IEEE, 2015, pp. 4470-4474.
-
(2015)
Proc. of ICASSP. IEEE
, pp. 4470-4474
-
-
Zen, H.1
Sak, H.2
-
6
-
-
84978086501
-
Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training
-
Z. Wu and S. King, "Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 7, pp. 1255-1265, 2016.
-
(2016)
IEEE/ACM Transactions on Audio, Speech, and Language Processing
, vol.24
, Issue.7
, pp. 1255-1265
-
-
Wu, Z.1
King, S.2
-
7
-
-
85023745327
-
An autoregressive recurrent mixture density network for parametric speech synthesis
-
X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis," in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
-
(2017)
Proc. of ICASSP. IEEE
, pp. 4895-4899
-
-
Wang, X.1
Takaki, S.2
Yamagishi, J.3
-
8
-
-
0032673049
-
Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
-
H. Kawahara, I. Masuda-Katsuse, and A. De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
-
(1999)
Speech Communication
, vol.27
, Issue.3
, pp. 187-207
-
-
Kawahara, H.1
Masuda-Katsuse, I.2
De Cheveigne, A.3
-
9
-
-
84874199000
-
Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
-
H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
-
(2001)
MAVEBA
-
-
Kawahara, H.1
Estill, J.2
Fujimura, O.3
-
10
-
-
77957744515
-
HMM-based speech synthesis utilizing glottal inverse filtering
-
January
-
T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, January 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.1
, pp. 153-165
-
-
Raitio, T.1
Suni, A.2
Yamagishi, J.3
Pulakka, H.4
Nurminen, J.5
Vainio, M.6
Alku, P.7
-
11
-
-
84994338062
-
GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis
-
M. Airaksinen, B. Bollepalli, L. Juvela, Z. Wu, S. King, and P. Alku, "GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis," in Proc. of Interspeech, 2016.
-
(2016)
Proc. of Interspeech
-
-
Airaksinen, M.1
Bollepalli, B.2
Juvela, L.3
Wu, Z.4
King, S.5
Alku, P.6
-
12
-
-
0035127703
-
Applying the harmonic plus noise model in concatenative speech synthesis
-
Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 1, pp. 21-29, 2001.
-
(2001)
IEEE Transactions on Speech and Audio Processing
, vol.9
, Issue.1
, pp. 21-29
-
-
Stylianou, Y.1
-
13
-
-
84897865577
-
Harmonics plus noise model based vocoder for statistical parametric speech synthesis
-
April
-
D. Erro, I. Sainz, E. Navas, and I. Hernaez, "Harmonics plus noise model based vocoder for statistical parametric speech synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 184-194, April 2014.
-
(2014)
IEEE Journal of Selected Topics in Signal Processing
, vol.8
, Issue.2
, pp. 184-194
-
-
Erro, D.1
Sainz, I.2
Navas, E.3
Hernaez, I.4
-
14
-
-
0026881384
-
Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
-
Eurospeech '91
-
P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992, Eurospeech '91.
-
(1992)
Speech Communication
, vol.11
, Issue.2-3
, pp. 109-118
-
-
Alku, P.1
-
15
-
-
84973293681
-
High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
-
Mar
-
L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
-
(2016)
Proc. of ICASSP
, pp. 5120-5124
-
-
Juvela, L.1
Bollepalli, B.2
Airaksinen, M.3
Alku, P.4
-
16
-
-
85017575778
-
Glottal vocoding with frequency-warped time-weighted linear prediction
-
April
-
M. Airaksinen, B. Bollepalli, J. Pohjalainen, and P. Alku, "Glottal vocoding with frequency-warped time-weighted linear prediction," IEEE Signal Processing Letters, vol. 24, no. 4, pp. 446-450, April 2017.
-
(2017)
IEEE Signal Processing Letters
, vol.24
, Issue.4
, pp. 446-450
-
-
Airaksinen, M.1
Bollepalli, B.2
Pohjalainen, J.3
Alku, P.4
-
17
-
-
84910068090
-
Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
-
September
-
T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," in Proc. of Interspeech, Singapore, September 2014, pp. 1969-1973.
-
(2014)
Proc. of Interspeech, Singapore
, pp. 1969-1973
-
-
Raitio, T.1
Suni, A.2
Juvela, L.3
Vainio, M.4
Alku, P.5
-
18
-
-
84937849144
-
Generative adversarial nets
-
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 2672-2680
-
-
Goodfellow, I.1
Pouget-Abadie, J.2
Mirza, M.3
Xu, B.4
Warde-Farley, D.5
Ozair, S.6
Courville, A.7
Bengio, Y.8
-
19
-
-
85017259342
-
WaveNet: A generative model for raw audio
-
A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," Pre-print, 2016, https://arxiv.org/pdf/1609.03499.pdf.
-
(2016)
Pre-print
-
-
Van den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
20
-
-
85039156048
-
Deep voice: Real-time neural text-to-speech
-
submission
-
S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep voice: Real-time neural text-to-speech," in ICML 2017 (submission), 2017, https://arxiv.org/pdf/1702.07825.pdf.
-
(2017)
ICML 2017
-
-
Arik, S.O.1
Chrzanowski, M.2
Coates, A.3
Diamos, G.4
Gibiansky, A.5
Kang, Y.6
Li, X.7
Miller, J.8
Ng, A.9
Raiman, J.10
Sengupta, S.11
Shoeybi, M.12
-
22
-
-
84902548165
-
Glottal source processing: From analysis to applications
-
T. Drugman, P. Alku, A. Alwan, and B. Yegnanarayana, "Glottal source processing: From analysis to applications," Computer Speech & Language, vol. 28, no. 5, pp. 1117-1138, 2014.
-
(2014)
Computer Speech & Language
, vol.28
, Issue.5
, pp. 1117-1138
-
-
Drugman, T.1
Alku, P.2
Alwan, A.3
Yegnanarayana, B.4
-
23
-
-
84856294347
-
Glottal inverse filtering analysis of human voice production-A review of estimation and parameterization methods of the glottal excitation and their applications. (invited article)
-
P. Alku, "Glottal inverse filtering analysis of human voice production-A review of estimation and parameterization methods of the glottal excitation and their applications (invited article)," Sadhana-Academy Proceedings in Engineering Sciences, vol. 36, no. 5, pp. 623-650, 2011.
-
(2011)
Sadhana-Academy Proceedings in Engineering Sciences
, vol.36
, Issue.5
, pp. 623-650
-
-
Alku, P.1
-
24
-
-
85023752230
-
Generative adversarial network-based postfilter for statistical parametric speech synthesis
-
March
-
T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. of ICASSP, March 2017, pp. 4910-4914.
-
(2017)
Proc. of ICASSP
, pp. 4910-4914
-
-
Kaneko, T.1
Kameoka, H.2
Hojo, N.3
Ijima, Y.4
Hiramatsu, K.5
Kashino, K.6
-
25
-
-
85023772724
-
Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis
-
New Orleans, USA
-
Y. Saito, S. Takamichi, and H. Saruwatari, "Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis," in ICASSP, New Orleans, USA, 2017, pp. 4900-4904.
-
(2017)
ICASSP
, pp. 4900-4904
-
-
Saito, Y.1
Takamichi, S.2
Saruwatari, H.3
-
28
-
-
85039158168
-
-
S.-W. Fu, Y. Tsao, X. Lu, and H. Kawai, "Raw waveform-based speech enhancement by fully convolutional networks," arXiv preprint arXiv:1703.02205, 2017.
-
(2017)
Raw Waveform-based Speech Enhancement by Fully Convolutional Networks
-
-
Fu, S.-W.1
Tsao, Y.2
Lu, X.3
Kawai, H.4
-
29
-
-
85039149742
-
-
X. Mao, Q. Li, H. Xie, R. Y. Lau, and Z. Wang, "Least squares generative adversarial networks," arXiv preprint arXiv:1611.04076v2, 2017.
-
(2017)
Least Squares Generative Adversarial Networks
-
-
Mao, X.1
Li, Q.2
Xie, H.3
Lau, R.Y.4
Wang, Z.5
-
31
-
-
70450180978
-
Robust LTS rules with the Combilex speech technology lexicon
-
September
-
K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon," in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
-
(2009)
Proc. of Interspeech, Brighton
, pp. 1295-1298
-
-
Richmond, K.1
Clark, R.A.2
Fitt, S.3
-
33
-
-
85039167642
-
Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system
-
L. Juvela, B. Bollepalli, J. Yamagishi, and P. Alku, "Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system," submitted to Interspeech, 2017.
-
(2017)
Submitted to Interspeech
-
-
Juvela, L.1
Bollepalli, B.2
Yamagishi, J.3
Alku, P.4
-
35
-
-
0029254163
-
Non-parametric techniques for pitch-scale and time-scale modification of speech
-
E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, no. 2, pp. 175-205, 1995.
-
(1995)
Speech Communication
, vol.16
, Issue.2
, pp. 175-205
-
-
Moulines, E.1
Laroche, J.2
-
36
-
-
0003450846
-
ITU-T Recommendation P.800: Methods for subjective determination of transmission quality
-
ITU-T Recommendation P.800, "Methods for subjective determination of transmission quality," International Telecommunication Union, 1996.
-
(1996)
International Telecommunication Union
-
-
ITU-T1
-
37
-
-
85039147615
-
-
CrowdFlower Inc
-
CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
-
(2017)
Crowd-sourcing Platform
-
-