-
1
-
-
34547941599
-
Automatic speech recognition and speech variability: A review
-
Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., and Wellekens, C. (2007). "Automatic speech recognition and speech variability: A review," Speech Commun. 49, 763-786. 10.1016/j.specom.2007.02.006
-
(2007)
Speech Commun.
, vol.49
, pp. 763-786
-
-
Benzeghiba, M.1
De Mori, R.2
Deroo, O.3
Dupont, S.4
Erbes, T.5
Jouvet, D.6
Fissore, L.7
Laface, P.8
Mertins, A.9
Ris, C.10
Rose, R.11
Tyagi, V.12
Wellekens, C.13
-
2
-
-
51449089975
-
Localized spectro-temporal cepstral analysis of speech
-
Bouvrie, J., Ezzat, T., and Poggio, T. (2008). "Localized spectro-temporal cepstral analysis of speech," in Proceedings of ICASSP 2008, pp. 4733-4736.
-
(2008)
Proceedings of ICASSP 2008
, pp. 4733-4736
-
-
Bouvrie, J.1
Ezzat, T.2
Poggio, T.3
-
3
-
-
23744508888
-
Multiresolution spectrotemporal analysis of complex sounds
-
Chi, T., Ru, P., and Shamma, S. (2005). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 118, 887. 10.1121/1.1945807
-
(2005)
J. Acoust. Soc. Am.
, vol.118
, pp. 887
-
-
Chi, T.1
Ru, P.2
Shamma, S.3
-
4
-
-
0000747781
-
New telephone speech corpora at CSLU
-
Cole, R. A., Noel, M., Lander, T., and Durham, T. (1995). "New telephone speech corpora at CSLU," in Proceedings of Eurospeech 1995, p. 95.
-
(1995)
Proceedings of Eurospeech 1995
, pp. 95
-
-
Cole, R.A.1
Noel, M.2
Lander, T.3
Durham, T.4
-
6
-
-
0019053271
-
Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences
-
Davis, S., and Mermelstein, P. (1980). "Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust. Speech Signal Process. 28, 357-366. 10.1109/TASSP.1980.1163420
-
(1980)
IEEE Trans. Acoust. Speech Signal Process.
, vol.28
, pp. 357-366
-
-
Davis, S.1
Mermelstein, P.2
-
7
-
-
51449087857
-
Hierarchical spectro-temporal features for robust speech recognition
-
Domont, X., Heckmann, M., Joublin, F., and Goerick, C. (2008). "Hierarchical spectro-temporal features for robust speech recognition," in Proceedings of ICASSP 2008, pp. 4417-4420.
-
(2008)
Proceedings of ICASSP 2008
, pp. 4417-4420
-
-
Domont, X.1
Heckmann, M.2
Joublin, F.3
Goerick, C.4
-
9
-
-
84987770945
-
-
ETSI Standard 201 108 v1.1.3. It is available at the ETSI website
-
ETSI Standard 201 108 v1.1.3 (2003). It is available at the ETSI website: http://www.etsi.org/WebSite/Technologies/DistributedSpeechRecognition.aspx.
-
(2003)
-
-
-
10
-
-
34547552785
-
AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis
-
Ezzat, T., Bouvrie, J., and Poggio, T. (2007a). "AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis," in Proceedings of ICASSP 2007, Vol. 4, pp. 1061-1064.
-
(2007)
Proceedings of ICASSP 2007
, vol.4
, pp. 1061-1064
-
-
Ezzat, T.1
Bouvrie, J.2
Poggio, T.3
-
11
-
-
67651044226
-
Spectro-temporal analysis of speech using 2-D Gabor filters
-
Ezzat, T., Bouvrie, J., and Poggio, T. (2007b). "Spectro-temporal analysis of speech using 2-D Gabor filters," in Proceedings of Interspeech 2007, pp. 506-509.
-
(2007)
Proceedings of Interspeech 2007
, pp. 506-509
-
-
Ezzat, T.1
Bouvrie, J.2
Poggio, T.3
-
12
-
-
0024909979
-
Some statistical issues in the comparison of speech recognition algorithms
-
Gillick, L., and Cox, S. (1989). "Some statistical issues in the comparison of speech recognition algorithms," in Proceedings of ICASSP 1989, Vol. 1, pp. 532-535.
-
(1989)
Proceedings of ICASSP 1989
, vol.1
, pp. 532-535
-
-
Gillick, L.1
Cox, S.2
-
14
-
-
84867227177
-
A closer look on hierarchical spectro-temporal features (HIST)
-
Heckmann, M., Domont, X., Joublin, F., and Goerick, C. (2008). "A closer look on hierarchical spectro-temporal features (HIST)," in Proceedings of Interspeech 2008, pp. 894-897.
-
(2008)
Proceedings of Interspeech 2008
, pp. 894-897
-
-
Heckmann, M.1
Domont, X.2
Joublin, F.3
Goerick, C.4
-
15
-
-
0025041264
-
Perceptual linear predictive (PLP) analysis of speech
-
Hermansky, H. (1990). "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am. 87, 1738-1752. 10.1121/1.399423
-
(1990)
J. Acoust. Soc. Am.
, vol.87
, pp. 1738-1752
-
-
Hermansky, H.1
-
16
-
-
0033709098
-
Tandem connectionist feature extraction for conventional HMM systems
-
Hermansky, H., Ellis, D. P. W., and Sharma, S. (2000). "Tandem connectionist feature extraction for conventional HMM systems," in Proceedings of ICASSP 2000, Vol. 3, pp. 1635-1638.
-
(2000)
Proceedings of ICASSP 2000
, vol.3
, pp. 1635-1638
-
-
Hermansky, H.1
Ellis, D.P.W.2
Sharma, S.3
-
17
-
-
0028517164
-
RASTA processing of speech
-
Hermansky, H., and Morgan, N. (1994). "RASTA processing of speech," IEEE Trans. Speech, Audio Process. 2, 578-589. 10.1109/89.326616
-
(1994)
IEEE Trans. Speech, Audio Process.
, vol.2
, pp. 578-589
-
-
Hermansky, H.1
Morgan, N.2
-
18
-
-
0032658253
-
Temporal patterns (TRAPS) in ASR of noisy speech
-
Hermansky, H., and Sharma, S. (1999). "Temporal patterns (TRAPS) in ASR of noisy speech," in Proceedings of ICASSP 1999, Vol. 1, pp. 289-292.
-
(1999)
Proceedings of ICASSP 1999
, vol.1
, pp. 289-292
-
-
Hermansky, H.1
Sharma, S.2
-
19
-
-
0032676337
-
On the relative importance of various components of the modulation spectrum for automatic speech recognition
-
Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). "On the relative importance of various components of the modulation spectrum for automatic speech recognition," Speech Commun. 28, 43-55. 10.1016/S0167-6393(99)00002-3
-
(1999)
Speech Commun.
, vol.28
, pp. 43-55
-
-
Kanedera, N.1
Arai, T.2
Hermansky, H.3
Pavel, M.4
-
20
-
-
85009227802
-
Localized spectro-temporal features for automatic speech recognition
-
Kleinschmidt, M. (2003). "Localized spectro-temporal features for automatic speech recognition," in Proceedings of Eurospeech 2003, pp. 2573-2576.
-
(2003)
Proceedings of Eurospeech 2003
, pp. 2573-2576
-
-
Kleinschmidt, M.1
-
22
-
-
0031187171
-
Speech recognition by machines and humans
-
Lippmann, R. (1997). "Speech recognition by machines and humans," Speech Commun. 22, 1-15. 10.1016/S0167-6393(97)00021-6
-
(1997)
Speech Commun.
, vol.22
, pp. 1-15
-
-
Lippmann, R.1
-
23
-
-
38849119808
-
Phoneme representation and classification in primary auditory cortex
-
Mesgarani, N., David, S., Fritz, J., and Shamma, S. (2008). "Phoneme representation and classification in primary auditory cortex," J. Acoust. Soc. Am. 123, 899-909. 10.1121/1.2816572
-
(2008)
J. Acoust. Soc. Am.
, vol.123
, pp. 899-909
-
-
Mesgarani, N.1
David, S.2
Fritz, J.3
Shamma, S.4
-
24
-
-
34047272330
-
Discrimination of speech from non-speech based on multiscale spectro-temporal modulations
-
Mesgarani, N., Slaney, M., and Shamma, S. (2006). "Discrimination of speech from non-speech based on multiscale spectro-temporal modulations," IEEE Trans. Audio Speech Lang. Proc. 14, 920-930. 10.1109/TSA.2005.858055
-
(2006)
IEEE Trans. Audio Speech Lang. Proc.
, vol.14
, pp. 920-930
-
-
Mesgarani, N.1
Slaney, M.2
Shamma, S.3
-
25
-
-
79959816304
-
A multistream multiresolution framework for phoneme recognition
-
Mesgarani, N., Thomas, S., and Hermansky, H. (2010). "A multistream multiresolution framework for phoneme recognition," in Proceedings of Interspeech 2010, pp. 318-321.
-
(2010)
Proceedings of Interspeech 2010
, pp. 318-321
-
-
Mesgarani, N.1
Thomas, S.2
Hermansky, H.3
-
26
-
-
79551679242
-
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
-
Meyer, B. T., Brand, T., and Kollmeier, B. (2011b). "Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes," J. Acoust. Soc. Am. 129, 388-403. 10.1121/1.3514525
-
(2011)
J. Acoust. Soc. Am.
, vol.129
, pp. 388-403
-
-
Meyer, B.T.1
Brand, T.2
Kollmeier, B.3
-
27
-
-
79953659090
-
Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
-
Meyer, B., and Kollmeier, B. (2011a). "Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition," Speech Commun. 53, 753-767. 10.1016/j.specom.2010.07.002
-
(2011)
Speech Commun.
, vol.53
, pp. 753-767
-
-
Meyer, B.1
Kollmeier, B.2
-
28
-
-
84987754323
-
Multiresolution spectrotemporal analysis of complex sounds
-
Nemala, S. K., and Elhilali, M. (2010). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 127, 1817. 10.1121/1.3384192
-
(2010)
J. Acoust. Soc. Am.
, vol.127
, pp. 1817
-
-
Nemala, S.K.1
Elhilali, M.2
-
29
-
-
84987702417
-
The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
-
Pearce, D., and Hirsch, H. (2000). "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proceedings of ICSLP 2000, Vol. 4, pp. 29-32.
-
(2000)
Proceedings of ICSLP 2000
, vol.4
, pp. 29-32
-
-
Pearce, D.1
Hirsch, H.2
-
30
-
-
0037824480
-
Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition
-
Qiu, A., Schreiner, C., and Escabi, M. (2003). "Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition," J. Neurophysiol. 90, 456-476. 10.1152/jn.00851.2002
-
(2003)
J. Neurophysiol.
, vol.90
, pp. 456-476
-
-
Qiu, A.1
Schreiner, C.2
Escabi, M.3
-
32
-
-
84912114311
-
Comparative experiments on large vocabulary speech recognition
-
Schwartz, R., Anastasakos, T., Kubala, F., Makhoul, J., Nguyen, L., and Zavaliagkos, G. (1993). "Comparative experiments on large vocabulary speech recognition," in Proceedings of the Workshop on Human Language Technology, pp. 75-80.
-
(1993)
Proceedings of the Workshop on Human Language Technology
, pp. 75-80
-
-
Schwartz, R.1
Anastasakos, T.2
Kubala, F.3
Makhoul, J.4
Nguyen, L.5
Zavaliagkos, G.6
-
33
-
-
0027623210
-
Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
-
Varga, A., and Steeneken, H. J. M. (1993). "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun. 12, 247-251. 10.1016/0167-6393(93)90095-3
-
(1993)
Speech Commun.
, vol.12
, pp. 247-251
-
-
Varga, A.1
Steeneken, H.J.M.2
-
34
-
-
33745183789
-
Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines
-
Wesker, T., Meyer, B., Wagener, K., Anemüller, J., Mertins, A., and Kollmeier, B. (2005). "Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines," in Proceedings of Eurospeech/Interspeech 2005, pp. 1273-1276.
-
(2005)
Proceedings of Eurospeech/Interspeech 2005
, pp. 1273-1276
-
-
Wesker, T.1
Meyer, B.2
Wagener, K.3
Anemüller, J.4
Mertins, A.5
Kollmeier, B.6
-
35
-
-
4544219816
-
-
(Cambridge University Engineering Department, Cambridge, UK)
-
Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., and Woodland, P. (2001). The HTK Book, version 3.1 (Cambridge University Engineering Department, Cambridge, UK), pp. 1-271.
-
(2001)
The HTK Book, Version 3.1
, pp. 1-271
-
-
Young, S.1
Kershaw, D.2
Odell, J.3
Ollason, D.4
Valtchev, V.5
Woodland, P.6
-
36
-
-
84867220821
-
Multi-stream spectro-temporal features for robust speech recognition
-
Zhao, S., and Morgan, N. (2008). "Multi-stream spectro-temporal features for robust speech recognition," in Proceedings of Interspeech 2008, pp. 898-901.
-
(2008)
Proceedings of Interspeech 2008
, pp. 898-901
-
-
Zhao, S.1
Morgan, N.2
-
37
-
-
70450216114
-
Multi-stream to many-stream: Using spectro-temporal features for ASR
-
Zhao, S., Ravuri, S., and Morgan, N. (2009). "Multi-stream to many-stream: Using spectro-temporal features for ASR," in Proceedings of Interspeech 2009, pp. 2951-2954.
-
(2009)
Proceedings of Interspeech 2009
, pp. 2951-2954
-
-
Zhao, S.1
Ravuri, S.2
Morgan, N.3