-
1
-
-
84873901811
-
Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models
-
May
-
R. Astudillo and R. Orglmeister, "Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1023-1034, May 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.5
, pp. 1023-1034
-
-
Astudillo, R.1
Orglmeister, R.2
-
3
-
-
42549139762
-
MVA processing of speech features
-
Jan.
-
C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 257-270, Jan. 2007.
-
(2007)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.15
, Issue.1
, pp. 257-270
-
-
Chen, C.-P.1
Bilmes, J.A.2
-
4
-
-
84890527827
-
Improving deep neural networks for LVCSR using rectified linear units and dropout
-
G. E. Dahl, T. N. Sainath, and G. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," in Proc. IEEE ICASSP, 2013, pp. 8609-8613.
-
Proc. IEEE ICASSP, 2013
, pp. 8609-8613
-
-
Dahl, G.E.1
Sainath, T.N.2
Hinton, G.3
-
5
-
-
84055222005
-
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
-
Jan.
-
G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 30-42
-
-
Dahl, G.1
Yu, D.2
Deng, L.3
Acero, A.4
-
6
-
-
84906222220
-
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
-
M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?," in Proc. Interspeech, 2013, pp. 2992-2996.
-
Proc. Interspeech, 2013
, pp. 2992-2996
-
-
Delcroix, M.1
Kubo, Y.2
Nakatani, T.3
Nakamura, A.4
-
7
-
-
0034855352
-
High-performance robust speech recognition using stereo training data
-
L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data," in Proc. IEEE ICASSP, 2001, pp. 301-304.
-
Proc. IEEE ICASSP, 2001
, pp. 301-304
-
-
Deng, L.1
Acero, A.2
Jiang, L.3
Droppo, J.4
Huang, X.5
-
8
-
-
18744401086
-
Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
-
May
-
L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 412-421, May 2005.
-
(2005)
IEEE Trans. Speech Audio Process.
, vol.13
, Issue.3
, pp. 412-421
-
-
Deng, L.1
Droppo, J.2
Acero, A.3
-
9
-
-
84886120743
-
Feature compensation
-
T. Virtanen, B. Raj, and R. Singh, Eds. West Sussex, U.K.: Wiley, ch. 9
-
J. Droppo, "Feature compensation," in Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, Eds. West Sussex, U.K.: Wiley, 2012, ch. 9, pp. 229-250.
-
(2012)
Techniques for Noise Robustness in Automatic Speech Recognition
, pp. 229-250
-
-
Droppo, J.1
-
10
-
-
78049390326
-
HMM-based pseudo-clean speech synthesis for SPLICE algorithm
-
J. Du, Y. Hu, L.-R. Dai, and R.-H. Wang, "HMM-based pseudo-clean speech synthesis for SPLICE algorithm," in Proc. IEEE ICASSP, 2010, pp. 4570-4573.
-
Proc. IEEE ICASSP, 2010
, pp. 4570-4573
-
-
Du, J.1
Hu, Y.2
Dai, L.-R.3
Wang, R.-H.4
-
11
-
-
80052250414
-
Adaptive subgradient methods for online learning and stochastic optimization
-
J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121-2159, 2011.
-
(2011)
J. Mach. Learn. Res.
, vol.12
, pp. 2121-2159
-
-
Duchi, J.1
Hazan, E.2
Singer, Y.3
-
14
-
-
77949378972
-
Discriminative adaptive training with VTS and JUD
-
F. Flego and M. J. F. Gales, "Discriminative adaptive training with VTS and JUD," in Proc. IEEE ASRU, 2009, pp. 170-175.
-
Proc. IEEE ASRU, 2009
, pp. 170-175
-
-
Flego, F.1
Gales, M.J.F.2
-
15
-
-
0032050110
-
Maximum likelihood linear transformations for HMM-based speech recognition
-
M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
-
(1998)
Comput. Speech Lang.
, vol.12
, Issue.2
, pp. 75-98
-
-
Gales, M.J.F.1
-
16
-
-
84893710272
-
Maxout networks
-
I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," J. Mach. Learn. Res. Workshop Conf. Proc., vol. 28, no. 3, pp. 1319-1327, 2013.
-
(2013)
J. Mach. Learn. Res. Workshop Conf. Proc.
, vol.28
, Issue.3
, pp. 1319-1327
-
-
Goodfellow, I.J.1
Warde-Farley, D.2
Mirza, M.3
Courville, A.4
Bengio, Y.5
-
17
-
-
84869105129
-
A classification based approach to speech segregation
-
K. Han and D. L. Wang, "A classification based approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, no. 5, pp. 3475-3483, 2012.
-
(2012)
J. Acoust. Soc. Amer.
, vol.132
, Issue.5
, pp. 3475-3483
-
-
Han, K.1
Wang, D.L.2
-
18
-
-
84881088302
-
A direct masking approach to robust ASR
-
Oct.
-
W. Hartmann, A. Narayanan, E. Fosler-Lussier, and D. L. Wang, "A direct masking approach to robust ASR," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 1993-2005, Oct. 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.10
, pp. 1993-2005
-
-
Hartmann, W.1
Narayanan, A.2
Fosler-Lussier, E.3
Wang, D.L.4
-
19
-
-
0028517164
-
RASTA processing of speech
-
Oct.
-
H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 578-589, Oct. 1994.
-
(1994)
IEEE Trans. Speech Audio Process.
, vol.2
, Issue.4
, pp. 578-589
-
-
Hermansky, H.1
Morgan, N.2
-
20
-
-
84867720412
-
-
arXiv preprint arXiv:1207.0580
-
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
-
(2012)
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
-
-
Hinton, G.E.1
Srivastava, N.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
21
-
-
33745805403
-
A fast learning algorithm for deep belief nets
-
G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
-
(2006)
Neural Comput.
, vol.18
, Issue.7
, pp. 1527-1554
-
-
Hinton, G.1
Osindero, S.2
Teh, Y.3
-
22
-
-
70349093614
-
An algorithm that improves speech intelligibility in noise for normal-hearing listeners
-
G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, no. 3, pp. 1486-1494, 2009.
-
(2009)
J. Acoust. Soc. Amer.
, vol.126
, Issue.3
, pp. 1486-1494
-
-
Kim, G.1
Lu, Y.2
Hu, Y.3
Loizou, P.4
-
23
-
-
78649325568
-
Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
-
W. Kim and R. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise," Speech Commun., vol. 53, pp. 1-11, 2011.
-
(2011)
Speech Commun.
, vol.53
, pp. 1-11
-
-
Kim, W.1
Stern, R.2
-
24
-
-
84878919540
-
ImageNet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst., vol. 25, pp. 1106-1114, 2012.
-
(2012)
Adv. Neural Inf. Process. Syst.
, vol.25
, pp. 1106-1114
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
25
-
-
84878409063
-
Recurrent neural networks for noise reduction in robust ASR
-
A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. Interspeech, 2012.
-
Proc. Interspeech, 2012
-
-
Maas, A.L.1
Le, Q.V.2
O'Neil, T.M.3
Vinyals, O.4
Nguyen, P.5
Ng, A.Y.6
-
26
-
-
84055211743
-
Acoustic modeling using deep belief networks
-
Jan.
-
A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 14-22
-
-
Mohamed, A.1
Dahl, G.2
Hinton, G.3
-
27
-
-
77956509090
-
Rectified linear units improve restricted Boltzmann machines
-
V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML 27, 2010, pp. 807-814.
-
Proc. ICML 27, 2010
, pp. 807-814
-
-
Nair, V.1
Hinton, G.E.2
-
29
-
-
84890493989
-
Ideal ratio mask estimation using deep neural networks for robust speech recognition
-
A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7092-7096.
-
Proc. IEEE ICASSP, 2013
, pp. 7092-7096
-
-
Narayanan, A.1
Wang, D.2
-
30
-
-
85009227702
-
Analysis of the Aurora large vocabulary evaluations
-
N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proc. Eurospeech, 2003, pp. 337-340.
-
Proc. Eurospeech, 2003
, pp. 337-340
-
-
Parihar, N.1
Picone, J.2
-
31
-
-
84890448307
-
An evaluation of posterior modeling techniques for phonetic recognition
-
R. Prabhavalkar, T. N. Sainath, D. Nahamoo, B. Ramabhadran, and D. Kanevsky, "An evaluation of posterior modeling techniques for phonetic recognition," in Proc. IEEE ICASSP, 2013, pp. 7165-7169.
-
Proc. IEEE ICASSP, 2013
, pp. 7165-7169
-
-
Prabhavalkar, R.1
Sainath, T.N.2
Nahamoo, D.3
Ramabhadran, B.4
Kanevsky, D.5
-
32
-
-
85032752225
-
Missing-feature approaches in speech recognition
-
B. Raj and R. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 101-116, 2005.
-
(2005)
IEEE Signal Process. Mag.
, vol.22
, Issue.5
, pp. 101-116
-
-
Raj, B.1
Stern, R.2
-
33
-
-
0142026377
-
Speech segregation based on sound localization
-
N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003.
-
(2003)
J. Acoust. Soc. Amer.
, vol.114
, Issue.4
, pp. 2236-2252
-
-
Roman, N.1
Wang, D.L.2
Brown, G.J.3
-
34
-
-
82255167374
-
Intelligibility of reverberant noisy speech with ideal binary masking
-
N. Roman and J. Woodruff, "Intelligibility of reverberant noisy speech with ideal binary masking," J. Acoust. Soc. Amer., vol. 130, no. 4, pp. 2153-2161, 2011.
-
(2011)
J. Acoust. Soc. Amer.
, vol.130
, Issue.4
, pp. 2153-2161
-
-
Roman, N.1
Woodruff, J.2
-
35
-
-
84858976070
-
Feature engineering in context-dependent deep neural networks for conversational speech transcription
-
F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011, pp. 24-29.
-
Proc. IEEE ASRU, 2011
, pp. 24-29
-
-
Seide, F.1
Li, G.2
Chen, X.3
Yu, D.4
-
36
-
-
4644317224
-
A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
-
M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
-
(2004)
Speech Commun.
, vol.43
, Issue.4
, pp. 379-393
-
-
Seltzer, M.L.1
Raj, B.2
Stern, R.M.3
-
37
-
-
84890492030
-
An investigation of deep neural networks for noise robust speech recognition
-
M. L. Seltzer, D. Yu, and Y.-Q. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7398-7402.
-
Proc. IEEE ICASSP, 2013
, pp. 7398-7402
-
-
Seltzer, M.L.1
Yu, D.2
Wang, Y.-Q.3
-
38
-
-
33750311718
-
Binary and ratio time-frequency masks for robust speech recognition
-
S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006.
-
(2006)
Speech Commun.
, vol.48
, pp. 1486-1501
-
-
Srinivasan, S.1
Roman, N.2
Wang, D.L.3
-
40
-
-
84892233308
-
On ideal binary masks as the computational goal of auditory scene analysis
-
P. Divenyi, Ed. Boston, MA, USA: Kluwer
-
D. L. Wang, "On ideal binary masks as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Boston, MA, USA: Kluwer, 2005, pp. 181-197.
-
(2005)
Speech Separation by Humans and Machines
, pp. 181-197
-
-
Wang, D.L.1
-
41
-
-
64649103540
-
Speech intelligibility in background noise with ideal binary time-frequency masking
-
D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
-
(2009)
J. Acoust. Soc. Amer.
, vol.125
, pp. 2336-2347
-
-
Wang, D.L.1
Kjems, U.2
Pedersen, M.S.3
Boldt, J.B.4
Lunner, T.5
-
42
-
-
84870477511
-
Exploring monaural features for classification-based speech segregation
-
Y. Wang, K. Han, and D. L. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, pp. 270-279, 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, pp. 270-279
-
-
Wang, Y.1
Han, K.2
Wang, D.L.3
-
43
-
-
84875678689
-
Towards scaling up classification-based speech separation
-
Jul.
-
Y. Wang and D. L. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.7
, pp. 1381-1390
-
-
Wang, Y.1
Wang, D.L.2
-
44
-
-
84890523904
-
Feature denoising for speech separation in unknown noisy environments
-
Y. Wang and D. L. Wang, "Feature denoising for speech separation in unknown noisy environments," in Proc. IEEE ICASSP, 2013, pp. 7472-7476.
-
Proc. IEEE ICASSP, 2013
, pp. 7472-7476
-
-
Wang, Y.1
Wang, D.L.2
-
45
-
-
84862293102
-
Speaker and noise factorization for robust speech recognition
-
Y.-Q. Wang and M. J. F. Gales, "Speaker and noise factorization for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 2149-2158, 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.7
, pp. 2149-2158
-
-
Wang, Y.-Q.1
Gales, M.J.F.2
-
46
-
-
84900537286
-
The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks
-
F. Weninger, J. Geiger, M. Wöllmer, B. Schuller, and G. Rigoll, "The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks," in Proc. 2nd CHiME Mach. Listen. Multisource Environ. Workshop, 2013, pp. 86-90.
-
Proc. 2nd CHiME Mach. Listen. Multisource Environ. Workshop, 2013
, pp. 86-90
-
-
Weninger, F.1
Geiger, J.2
Wöllmer, M.3
Schuller, B.4
Rigoll, G.5
-
47
-
-
0003822743
-
-
Cambridge, U.K.: Cambridge Univ. Press, [Online]. Available
-
S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, U.K.: Cambridge Univ. Press, 2002 [Online]. Available: http://htk.eng.cam.ac.uk.
-
(2002)
The HTK Book
-
-
Young, S.1
Evermann, G.2
Hain, T.3
Kershaw, D.4
Moore, G.5
Odell, J.6
Ollason, D.7
Povey, D.8
Valtchev, V.9
Woodland, P.10
-
48
-
-
85083953021
-
Feature learning in deep neural networks - studies on speech recognition tasks
-
D. Yu, M. L. Seltzer, J. Li, J.-T. Huang, and F. Seide, "Feature learning in deep neural networks - studies on speech recognition tasks," in Proc. ICLR, 2013.
-
Proc. ICLR, 2013
-
-
Yu, D.1
Seltzer, M.L.2
Li, J.3
Huang, J.-T.4
Seide, F.5