-
1
-
-
84891583985
-
-
New York, NY, USA: Wiley
-
T. Virtanen, R. Singh, and B. Raj, Techniques for Noise Robustness in Automatic Speech Recognition. New York, NY, USA: Wiley, 2012.
-
(2012)
Techniques for Noise Robustness in Automatic Speech Recognition
-
-
Virtanen, T.1
Singh, R.2
Raj, B.3
-
2
-
-
0035396555
-
Noise power spectral density estimation based on optimal smoothing and minimum statistics
-
Jul.
-
R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001.
-
(2001)
IEEE Trans. Speech Audio Process.
, vol.9
, Issue.5
, pp. 504-512
-
-
Martin, R.1
-
3
-
-
51449100115
-
Efficient model-based speech separation and denoising using non-negative subspace analysis
-
S. J. Rennie, J. R. Hershey, and P. A. Olsen, "Efficient model-based speech separation and denoising using non-negative subspace analysis," in Proc. ICASSP, Las Vegas, NV, USA, 2008, pp. 1833-1836.
-
Proc. ICASSP, Las Vegas, NV, USA, 2008
, pp. 1833-1836
-
-
Rennie, S.J.1
Hershey, J.R.2
Olsen, P.A.3
-
4
-
-
38049021850
-
Convolutive speech bases and their application to supervised speech separation
-
Jan.
-
P. Smaragdis, "Convolutive speech bases and their application to supervised speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 1-14, Jan. 2007.
-
(2007)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.15
, Issue.1
, pp. 1-14
-
-
Smaragdis, P.1
-
5
-
-
85016663198
-
RASTA-PLP speech analysis technique
-
H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "RASTA-PLP speech analysis technique," in Proc. ICASSP, San Francisco, CA, USA, 1992, vol. 1, pp. 121-124.
-
Proc. ICASSP, San Francisco, CA, USA, 1992
, vol.1
, pp. 121-124
-
-
Hermansky, H.1
Morgan, N.2
Bayya, A.3
Kohn, P.4
-
6
-
-
77955673019
-
Model-based feature enhancement for reverberant speech recognition
-
Sep.
-
A. Krueger and R. Haeb-Umbach, "Model-based feature enhancement for reverberant speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1692-1707, Sep. 2010.
-
(2010)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.18
, Issue.7
, pp. 1692-1707
-
-
Krueger, A.1
Haeb-Umbach, R.2
-
7
-
-
85017287487
-
Linear discriminant analysis for improved large vocabulary continuous speech recognition
-
R. Haeb-Umbach and H. Ney, "Linear discriminant analysis for improved large vocabulary continuous speech recognition," in Proc. ICASSP, San Francisco, CA, USA, 1992, pp. 13-16.
-
Proc. ICASSP, San Francisco, CA, USA, 1992
, pp. 13-16
-
-
Haeb-Umbach, R.1
Ney, H.2
-
8
-
-
51449120120
-
BoostedMMI for model and feature-space discriminative training
-
D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah, "BoostedMMI for model and feature-space discriminative training," in Proc. ICASSP, Las Vegas, NV, USA, 2008, pp. 4057-4060.
-
Proc. ICASSP, Las Vegas, NV, USA, 2008
, pp. 4057-4060
-
-
Povey, D.1
Kanevsky, D.2
Kingsbury, B.3
Ramabhadran, B.4
Saon, G.5
Visweswariah, K.6
-
9
-
-
0032048385
-
Speech recognition in noisy environments using first-order vector Taylor series
-
D. Y. Kim, C. Kwan Un, and N. S. Kim, "Speech recognition in noisy environments using first-order vector Taylor series," Speech Commun., vol. 24, no. 1, pp. 39-49, 1998.
-
(1998)
Speech Commun.
, vol.24
, Issue.1
, pp. 39-49
-
-
Kim, D.Y.1
Kwan Un, C.2
Kim, N.S.3
-
10
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
-
Nov.
-
G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
-
(2012)
IEEE Signal Process. Mag.
, vol.29
, Issue.6
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.E.4
Mohamed, A.-R.5
Jaitly, N.6
Senior, A.7
Vanhoucke, V.8
Nguyen, P.9
Sainath, T.N.10
Kingsbury, B.11
-
11
-
-
84890492030
-
An investigation of deep neural networks for noise robust speech recognition
-
M. Seltzer, D. Yu, and Y. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, Vancouver, BC, Canada, 2013, pp. 7398-7402.
-
Proc. ICASSP, Vancouver, BC, Canada, 2013
, pp. 7398-7402
-
-
Seltzer, M.1
Yu, D.2
Wang, Y.3
-
12
-
-
84867626068
-
Revisiting recurrent neural networks for robust ASR
-
O. Vinyals, S. V. Ravuri, and D. Povey, "Revisiting recurrent neural networks for robust ASR," in Proc. ICASSP, Kyoto, Japan, 2012, pp. 4085-4088.
-
Proc. ICASSP, Kyoto, Japan, 2012
, pp. 4085-4088
-
-
Vinyals, O.1
Ravuri, S.V.2
Povey, D.3
-
13
-
-
84890543083
-
Speech recognition with deep recurrent neural networks
-
A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. ICASSP, 2013, pp. 6645-6649.
-
Proc. ICASSP, 2013
, pp. 6645-6649
-
-
Graves, A.1
Mohamed, A.-R.2
Hinton, G.3
-
14
-
-
0141741840
-
Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
-
S. C. Kremer and J. F. Kolen, Eds. Piscataway, NJ, USA: IEEE Press
-
S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in Field Guide to Dynamical Recurrent Networks, S. C. Kremer and J. F. Kolen, Eds. Piscataway, NJ, USA: IEEE Press, 2001.
-
(2001)
Field Guide to Dynamical Recurrent Networks
-
-
Hochreiter, S.1
Bengio, Y.2
Frasconi, P.3
Schmidhuber, J.4
-
15
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Comput.
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
16
-
-
70450180507
-
Robust in-car spelling recognition - a tandem BLSTM-HMM approach
-
M. Wöllmer, F. Eyben, B. Schuller, Y. Sun, T. Moosmayr, and N. Nguyen-Thien, "Robust in-car spelling recognition - a tandem BLSTM-HMM approach," in Proc. Interspeech, Brighton, U.K., 2009, pp. 2507-2510.
-
Proc. Interspeech, Brighton, U.K., 2009
, pp. 2507-2510
-
-
Wöllmer, M.1
Eyben, F.2
Schuller, B.3
Sun, Y.4
Moosmayr, T.5
Nguyen-Thien, N.6
-
17
-
-
80051637579
-
A multi-stream ASR framework for BLSTM modeling of conversational speech
-
M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "A multi-stream ASR framework for BLSTM modeling of conversational speech," in Proc. ICASSP, Prague, Czech Republic, 2011, pp. 4860-4863.
-
(2011)
Proc. ICASSP, Prague, Czech Republic
, pp. 4860-4863
-
-
Wöllmer, M.1
Eyben, F.2
Schuller, B.3
Rigoll, G.4
-
18
-
-
85032752364
-
Graphical model architectures for speech recognition
-
Sep.
-
J. A. Bilmes and C. Bartels, "Graphical model architectures for speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 89-100, Sep. 2005.
-
(2005)
IEEE Signal Process. Mag.
, vol.22
, Issue.5
, pp. 89-100
-
-
Bilmes, J.A.1
Bartels, C.2
-
19
-
-
9644308136
-
Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR
-
A. Hagen and A. Morris, "Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR," Comput. Speech Lang., vol. 19, no. 1, pp. 3-30, 2005.
-
(2005)
Comput. Speech Lang.
, vol.19
, Issue.1
, pp. 3-30
-
-
Hagen, A.1
Morris, A.2
-
20
-
-
79959825120
-
Using a DBN to integrate Sparse Classification and GMM-based ASR
-
Y. Sun, J. F. Gemmeke, B. Cranen, L. ten Bosch, and L. Boves, "Using a DBN to integrate Sparse Classification and GMM-based ASR," in Proc. Interspeech, Makuhari, Japan, 2010, pp. 2098-2101.
-
Proc. Interspeech, Makuhari, Japan, 2010
, pp. 2098-2101
-
-
Sun, Y.1
Gemmeke, J.F.2
Cranen, B.3
Ten Bosch, L.4
Boves, L.5
-
21
-
-
84878543263
-
The PASCAL CHiME speech separation and recognition challenge
-
J. P. Barker, E. Vincent, N. Ma, H. Christensen, and P. D. Green, "The PASCAL CHiME speech separation and recognition challenge," Comput. Speech Lang., vol. 27, no. 3, pp. 621-633, 2013.
-
(2013)
Comput. Speech Lang.
, vol.27
, Issue.3
, pp. 621-633
-
-
Barker, J.P.1
Vincent, E.2
Ma, N.3
Christensen, H.4
Green, P.D.5
-
22
-
-
84890541701
-
The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines
-
E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni, "The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines," in Proc. ICASSP, Vancouver, BC, Canada, 2013, pp. 126-130.
-
Proc. ICASSP, Vancouver, BC, Canada, 2013
, pp. 126-130
-
-
Vincent, E.1
Barker, J.2
Watanabe, S.3
Le Roux, J.4
Nesta, F.5
Matassoni, M.6
-
23
-
-
84883396653
-
Noise Robust ASR in Reverberated Multisource Environments Applying Convolutive NMF and Long Short-Term Memory
-
M. Wöllmer, F. Weninger, J. Geiger, B. Schuller, and G. Rigoll, "Noise Robust ASR in Reverberated Multisource Environments Applying Convolutive NMF and Long Short-Term Memory," Comput. Speech Lang., Special Issue Speech Separat. Recogn. Multisource Environ., vol. 27, pp. 780-797, 2013.
-
(2013)
Comput. Speech Lang., Special Issue Speech Separat. Recogn. Multisource Environ.
, vol.27
, pp. 780-797
-
-
Wöllmer, M.1
Weninger, F.2
Geiger, J.3
Schuller, B.4
Rigoll, G.5
-
24
-
-
84893675434
-
The TUM+TUT+KUL Approach to the 2nd CHiME Challenge: Multi-Stream ASR Exploiting BLSTM Networks and Sparse NMF
-
J. T. Geiger, F. Weninger, A. Hurmalainen, J. F. Gemmeke, M. Wöllmer, B. Schuller, G. Rigoll, and T. Virtanen, "The TUM+TUT+KUL Approach to the 2nd CHiME Challenge: Multi-Stream ASR Exploiting BLSTM Networks and Sparse NMF," in Proc. CHiME Workshop, Vancouver, BC, Canada, 2013, pp. 25-30.
-
Proc. CHiME Workshop, Vancouver, BC, Canada, 2013
, pp. 25-30
-
-
Geiger, J.T.1
Weninger, F.2
Hurmalainen, A.3
Gemmeke, J.F.4
Wöllmer, M.5
Schuller, B.6
Rigoll, G.7
Virtanen, T.8
-
25
-
-
79960657803
-
Exemplar-based sparse representations for noise robust automatic speech recognition
-
Sep.
-
J. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2067-2080, Sep. 2011.
-
(2011)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.19
, Issue.7
, pp. 2067-2080
-
-
Gemmeke, J.1
Virtanen, T.2
Hurmalainen, A.3
-
26
-
-
84906222220
-
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
-
M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?," in Proc. Interspeech, Lyon, France, 2013, pp. 2992-2996.
-
Proc. Interspeech, Lyon, France, 2013
, pp. 2992-2996
-
-
Delcroix, M.1
Kubo, Y.2
Nakatani, T.3
Nakamura, A.4
-
27
-
-
84890503970
-
Effectiveness of discriminative training and feature transformation for reverberated and noisy speech
-
Y. Tachioka, S. Watanabe, and J. R. Hershey, "Effectiveness of discriminative training and feature transformation for reverberated and noisy speech," in Proc. ICASSP, Vancouver, BC, Canada, 2013, pp. 6935-6939.
-
Proc. ICASSP, Vancouver, BC, Canada, 2013
, pp. 6935-6939
-
-
Tachioka, Y.1
Watanabe, S.2
Hershey, J.R.3
-
28
-
-
84911377545
-
The Kaldi speech recognition toolkit
-
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlícek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. ASRU, Honolulu, HI, USA, 2011.
-
Proc. ASRU, Honolulu, HI, USA, 2011
-
-
Povey, D.1
Ghoshal, A.2
Boulianne, G.3
Burget, L.4
Glembek, O.5
Goel, N.6
Hannemann, M.7
Motlícek, P.8
Qian, Y.9
Schwarz, P.10
Silovsky, J.11
Stemmer, G.12
Vesely, K.13
-
29
-
-
0033677121
-
Maximum likelihood discriminant feature spaces
-
G. Saon, M. Padmanabhan, R. Gopinath, and S. Chen, "Maximum likelihood discriminant feature spaces," in Proc. ICASSP, Istanbul, Turkey, 2000, pp. 1129-1132.
-
Proc. ICASSP, Istanbul, Turkey, 2000
, pp. 1129-1132
-
-
Saon, G.1
Padmanabhan, M.2
Gopinath, R.3
Chen, S.4
-
30
-
-
0032050110
-
Maximum likelihood linear transformations for HMM-based speech recognition
-
M. J. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
-
(1998)
Comput. Speech Lang.
, vol.12
, Issue.2
, pp. 75-98
-
-
Gales, M.J.1
-
31
-
-
84865766789
-
Uncertainty measures for improving exemplar-based source separation
-
H. Kallasjoki, U. Remes, J. F. Gemmeke, T. Virtanen, and K. J. Palomäki, "Uncertainty measures for improving exemplar-based source separation," in Proc. Interspeech, Florence, Italy, 2011, pp. 469-472.
-
Proc. Interspeech, Florence, Italy, 2011
, pp. 469-472
-
-
Kallasjoki, H.1
Remes, U.2
Gemmeke, J.F.3
Virtanen, T.4
Palomäki, K.J.5
-
32
-
-
85032752215
-
Exemplar-based processing for speech recognition: An overview
-
Nov.
-
T. Sainath, B. Ramabhadran, D. Nahamoo, D. Kanevsky, D. Van Compernolle, K. Demuynck, J. Gemmeke, J. Bellegarda, and S. Sundaram, "Exemplar-based processing for speech recognition: An overview," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 98-113, Nov. 2012.
-
(2012)
IEEE Signal Process. Mag.
, vol.29
, Issue.6
, pp. 98-113
-
-
Sainath, T.1
Ramabhadran, B.2
Nahamoo, D.3
Kanevsky, D.4
Van Compernolle, D.5
Demuynck, K.6
Gemmeke, J.7
Bellegarda, J.8
Sundaram, S.9
-
33
-
-
84893652593
-
Compact long context spectral factorisation models for noise robust recognition of medium vocabulary speech
-
A. Hurmalainen, J. F. Gemmeke, and T. Virtanen, "Compact long context spectral factorisation models for noise robust recognition of medium vocabulary speech," in Proc. CHiME Workshop, Vancouver, BC, Canada, 2013, pp. 13-18.
-
Proc. CHiME Workshop, Vancouver, BC, Canada, 2013
, pp. 13-18
-
-
Hurmalainen, A.1
Gemmeke, J.F.2
Virtanen, T.3
-
34
-
-
84878390904
-
Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise
-
F. Weninger, M. Wöllmer, and B. Schuller, "Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise," in Proc. Interspeech, Portland, OR, USA, 2012, pp. 302-305.
-
Proc. Interspeech, Portland, OR, USA, 2012
, pp. 302-305
-
-
Weninger, F.1
Wöllmer, M.2
Schuller, B.3
-
35
-
-
0031268931
-
Bidirectional recurrent neural networks
-
Nov.
-
M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
-
(1997)
IEEE Trans. Signal Process.
, vol.45
, Issue.11
, pp. 2673-2681
-
-
Schuster, M.1
Paliwal, K.K.2
-
36
-
-
27744588611
-
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
-
A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Netw., vol. 18, no. 5-6, pp. 602-610, 2005.
-
(2005)
Neural Netw.
, vol.18
, Issue.5-6
, pp. 602-610
-
-
Graves, A.1
Schmidhuber, J.2
-
37
-
-
70349284484
-
-
Ph.D. dissertation, Technische Univ. München, Munich, Germany
-
A. Graves, "Supervised sequence labelling with recurrent neural networks," Ph.D. dissertation, Technische Univ. München, Munich, Germany, 2008.
-
(2008)
Supervised Sequence Labelling with Recurrent Neural Networks
-
-
Graves, A.1
-
38
-
-
84055211743
-
Acoustic modeling using deep belief networks
-
Jan.
-
A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
-
(2012)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.20
, Issue.1
, pp. 14-22
-
-
Mohamed, A.1
Dahl, G.2
Hinton, G.3
-
39
-
-
84893671946
-
Discriminative methods for noise robust speech recognition: A CHiME challenge benchmark
-
Y. Tachioka, S. Watanabe, J. Le Roux, and J. R. Hershey, "Discriminative methods for noise robust speech recognition: A CHiME challenge benchmark," in Proc. CHiME Workshop, Vancouver, BC, Canada, 2013, pp. 19-24.
-
Proc. CHiME Workshop, Vancouver, BC, Canada, 2013
, pp. 19-24
-
-
Tachioka, Y.1
Watanabe, S.2
Le Roux, J.3
Hershey, J.R.4
-
40
-
-
84867614588
-
Analyzing the memory of BLSTM neural networks for enhanced emotion classification in dyadic spoken interactions
-
M. Wöllmer, A. Metallinou, N. Katsamanis, B. Schuller, and S. Narayanan, "Analyzing the memory of BLSTM neural networks for enhanced emotion classification in dyadic spoken interactions," in Proc. ICASSP, Kyoto, Japan, 2012, pp. 4157-4160.
-
Proc. ICASSP, Kyoto, Japan, 2012
, pp. 4157-4160
-
-
Wöllmer, M.1
Metallinou, A.2
Katsamanis, N.3
Schuller, B.4
Narayanan, S.5
-
41
-
-
84863740422
-
Toward a practical implementation of exemplar-based noise robust ASR
-
J. F. Gemmeke, A. Hurmalainen, T. Virtanen, and Y. Sun, "Toward a practical implementation of exemplar-based noise robust ASR," in Proc. EUSIPCO, Barcelona, Spain, 2011, pp. 1490-1494.
-
(2011)
Proc. EUSIPCO, Barcelona, Spain
, pp. 1490-1494
-
-
Gemmeke, J.F.1
Hurmalainen, A.2
Virtanen, T.3
Sun, Y.4
-
42
-
-
84886818613
-
Active-set Newton algorithm for overcomplete non-negative representations of audio
-
Nov.
-
T. Virtanen, J. Gemmeke, and B. Raj, "Active-set Newton algorithm for overcomplete non-negative representations of audio," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2277-2289, Nov. 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.11
, pp. 2277-2289
-
-
Virtanen, T.1
Gemmeke, J.2
Raj, B.3
-
43
-
-
84893685019
-
A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
-
F. Nesta, M. Matassoni, and R. F. Astudillo, "A flexible spatial blind source extraction framework for robust speech recognition in noisy environments," in Proc. CHiME Workshop, Vancouver, BC, Canada, 2013, pp. 33-38.
-
Proc. CHiME Workshop, Vancouver, BC, Canada, 2013
, pp. 33-38
-
-
Nesta, F.1
Matassoni, M.2
Astudillo, R.F.3
-
44
-
-
84905240834
-
Recurrent deep neural networks for robust speech recognition
-
to be published
-
C. Weng, D. Yu, S. Watanabe, and B.-H. Juang, "Recurrent deep neural networks for robust speech recognition," in Proc. ICASSP, Florence, Italy, 2014, to be published.
-
Proc. ICASSP, Florence, Italy, 2014
-
-
Weng, C.1
Yu, D.2
Watanabe, S.3
Juang, B.-H.4