-
2
-
-
85050528588
-
Listening while speaking: Speech chain by deep learning
-
Dec
-
A. Tjandra, S. Sakti, and S. Nakamura, “Listening while speaking: Speech chain by deep learning,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2017, pp. 301-308.
-
(2017)
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
, pp. 301-308
-
-
Tjandra, A.1
Sakti, S.2
Nakamura, S.3
-
3
-
-
84973293705
-
End-to-end attention-based large vocabulary speech recognition
-
IEEE
-
D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4945-4949.
-
(2016)
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on
, pp. 4945-4949
-
-
Bahdanau, D.1
Chorowski, J.2
Serdyuk, D.3
Brakel, P.4
Bengio, Y.5
-
4
-
-
84973351869
-
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
-
IEEE
-
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4960-4964.
-
(2016)
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on
, pp. 4960-4964
-
-
Chan, W.1
Jaitly, N.2
Le, Q.3
Vinyals, O.4
-
5
-
-
85050561560
-
-
arXiv preprint
-
Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio et al., “Tacotron: A fully end-to-end text-to-speech synthesis model,” arXiv preprint arXiv:1703.10135, 2017.
-
(2017)
Tacotron: A Fully End-to-End Text-to-Speech Synthesis Model
-
-
Wang, Y.1
Skerry-Ryan, R.2
Stanton, D.3
Wu, Y.4
Weiss, R.J.5
Jaitly, N.6
Yang, Z.7
Xiao, Y.8
Chen, Z.9
Bengio, S.10
-
6
-
-
84928547704
-
Sequence-to-Sequence learning with neural networks
-
I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence-to-Sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104-3112.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 3104-3112
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
7
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
8
-
-
84939821078
-
-
arXiv preprint
-
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
-
(2014)
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
-
-
Chung, J.1
Gulcehre, C.2
Cho, K.3
Bengio, Y.4
-
10
-
-
85047019895
-
-
arXiv preprint
-
C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu, “Deep speaker: an end-to-end neural speaker embedding system,” arXiv preprint arXiv:1705.02304, 2017.
-
(2017)
Deep Speaker: An End-to-End Neural Speaker Embedding System
-
-
Li, C.1
Ma, X.2
Jiang, B.3
Li, X.4
Zhang, X.5
Liu, X.6
Cao, Y.7
Kannan, A.8
Zhu, Z.9
-
11
-
-
84960920723
-
-
arXiv preprint
-
B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015.
-
(2015)
Empirical Evaluation of Rectified Activations in Convolutional Network
-
-
Xu, B.1
Wang, N.2
Chen, T.3
Li, M.4
-
12
-
-
84990062230
-
Autoencoding beyond pixels using a learned similarity metric
-
A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding beyond pixels using a learned similarity metric,” in International Conference on Machine Learning, 2016, pp. 1558-1566.
-
(2016)
International Conference on Machine Learning
, pp. 1558-1566
-
-
Larsen, A.B.L.1
Sønderby, S.K.2
Larochelle, H.3
Winther, O.4
-
13
-
-
84990854047
-
Perceptual losses for real-time style transfer and super-resolution
-
Springer
-
J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision. Springer, 2016, pp. 694-711.
-
(2016)
European Conference on Computer Vision
, pp. 694-711
-
-
Johnson, J.1
Alahi, A.2
Fei-Fei, L.3
-
14
-
-
0012330750
-
The design for the Wall Street Journal-based CSR corpus
-
Association for Computational Linguistics
-
D. B. Paul and J. M. Baker, “The design for the Wall Street Journal-based CSR corpus,” in Proceedings of the workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 357-362.
-
(1992)
Proceedings of The Workshop on Speech and Natural Language
, pp. 357-362
-
-
Paul, D.B.1
Baker, J.M.2
-
15
-
-
84858953642
-
The Kaldi speech recognition toolkit
-
IEEE Signal Processing Society, Dec. IEEE Catalog CFP11SRW-USB
-
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011, IEEE Catalog No.: CFP11SRW-USB.
-
(2011)
IEEE 2011 Workshop on Automatic Speech Recognition and Understanding
-
-
Povey, D.1
Ghoshal, A.2
Boulianne, G.3
Burget, L.4
Glembek, O.5
Goel, N.6
Hannemann, M.7
Motlicek, P.8
Qian, Y.9
Schwarz, P.10
Silovsky, J.11
Stemmer, G.12
Vesely, K.13
-
16
-
-
85030231537
-
-
Feb
-
B. McFee, M. McVicar, O. Nieto, S. Balke, C. Thome, D. Liang, E. Battenberg, J. Moore, R. Bittner, R. Yamamoto, et al., “librosa 0.5.0,” Feb 2017.
-
(2017)
librosa 0.5.0
-
-
McFee, B.1
McVicar, M.2
Nieto, O.3
Balke, S.4
Thome, C.5
Liang, D.6
Battenberg, E.7
Moore, J.8
Bittner, R.9
Yamamoto, R.10
-
18
-
-
85047343776
-
-
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
-
(2017)
Automatic Differentiation in PyTorch
-
-
Paszke, A.1
Gross, S.2
Chintala, S.3
Chanan, G.4
Yang, E.5
DeVito, Z.6
Lin, Z.7
Desmaison, A.8
Antiga, L.9
Lerer, A.10
-
19
-
-
85023752928
-
Joint CTC-attention based end-to-end speech recognition using multi-task learning
-
IEEE
-
S. Kim, T. Hori, and S. Watanabe, “Joint CTC-attention based end-to-end speech recognition using multi-task learning,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 4835-4839.
-
(2017)
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on
, pp. 4835-4839
-
-
Kim, S.1
Hori, T.2
Watanabe, S.3
-
20
-
-
85050532161
-
Attention-based wav2text with feature transfer learning
-
Dec
-
A. Tjandra, S. Sakti, and S. Nakamura, “Attention-based wav2text with feature transfer learning,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2017, pp. 309-315.
-
(2017)
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
, pp. 309-315
-
-
Tjandra, A.1
Sakti, S.2
Nakamura, S.3
-
21
-
-
26444592207
-
Learning from labeled and unlabeled data with label propagation
-
X. Zhu and Z. Ghahramani, “Learning from labeled and unlabeled data with label propagation,” Tech. Rep., 2002.
-
(2002)
Tech. Rep.
-
-
Zhu, X.1
Ghahramani, Z.2
-
22
-
-
85011070895
-
-
arXiv preprint
-
A. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
-
(2016)
Wavenet: A Generative Model for Raw Audio
-
-
Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
23
-
-
84983119674
-
Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
-
IEEE
-
P. Swietojanski and S. Renals, “Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models,” in Spoken Language Technology Workshop (SLT), 2014 IEEE. IEEE, 2014, pp. 171-176.
-
(2014)
Spoken Language Technology Workshop (SLT), 2014 IEEE
, pp. 171-176
-
-
Swietojanski, P.1
Renals, S.2
-
24
-
-
84959112868
-
A study of speaker adaptation for DNN-based speech synthesis
-
Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King, “A study of speaker adaptation for DNN-based speech synthesis,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
-
(2015)
Sixteenth Annual Conference of The International Speech Communication Association
-
-
Wu, Z.1
Swietojanski, P.2
Veaux, C.3
Renals, S.4
King, S.5
-
25
-
-
79951609039
-
Front-end factor analysis for speaker verification
-
N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.4
, pp. 788-798
-
-
Dehak, N.1
Kenny, P.J.2
Dehak, R.3
Dumouchel, P.4
Ouellet, P.5
-
26
-
-
57749193836
-
Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
-
T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
-
(2007)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.15
, Issue.8
, pp. 2222-2235
-
-
Toda, T.1
Black, A.W.2
Tokuda, K.3