SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2014, Pages 2435-2439

Should deep neural nets have ears? The role of auditory features in deep learning approaches

(3) Martinez, Angel Mario Castro a; Moritz, Niko a,b; Meyer, Bernd T. a

a UNIVERSITY OF OLDENBURG (Germany)

b FRAUNHOFER IDMT (Germany)

Author keywords

Amplitude modulation filter bank; Deep learning; Deep neural network; Gabor features; Speech recognition

Indexed keywords

AMPLITUDE MODULATION; ARTIFICIAL INTELLIGENCE; FILTER BANKS; GABOR FILTERS; LEARNING SYSTEMS; SIGNAL PROCESSING; SPEECH COMMUNICATION;

ACOUSTIC CONDITIONS; AUDITORY SIGNAL PROCESSING; AUTOMATIC SPEECH RECOGNITION; CONVENTIONAL FILTERS; DEEP LEARNING; DEEP NEURAL NETWORKS; GABOR FEATURE; MEL-FREQUENCY CEPSTRAL COEFFICIENTS;

SPEECH RECOGNITION;

EID: 84910029373
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: (17)

References (30)

1
- 0031187171
- Speech recognition by machines and humans
- R. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, no. 1, pp. 1-15, 1997.
- (1997) Speech Commun , vol.22 , Issue.1 , pp. 1-15
- Lippmann, R.¹

2
- 34247580087
- Reaching over the gap: A review of efforts to link human and automatic speech recognition research
- O. Scharenborg, "Reaching over the gap: A review of efforts to link human and automatic speech recognition research," Speech Commun., pp. 336-347, 2007.
- (2007) Speech Commun , pp. 336-347
- Scharenborg, O.¹

3
- 79953659090
- Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
- B. Meyer and B. Kollmeier, "Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition," Speech Commun., vol. 53, pp. 753-767, 2011.
- (2011) Speech Commun , vol.53 , pp. 753-767
- Meyer, B.¹ Kollmeier, B.²

4
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- A. R. Mohamed, G. Hinton and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proc. ICASSP, 2012.
- (2012) Proc. ICASSP
- Mohamed, A.R.¹ Hinton, G.² Penn, G.³

5
- 84055222005
- Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

6
- 80051654263
- Deep belief networks using discriminative features for phone recognition
- A. R. Mohamed, T. N. Sainath, G. Dahl, B. Ramabhadran, G. E. Hinton and M. A. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. ICASSP, 2011.
- (2011) Proc. ICASSP
- Mohamed, A.R.¹ Sainath, T.N.² Dahl, G.³ Ramabhadran, B.⁴ Hinton, G.E.⁵ Picheny, M.A.⁶

7
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak and A. R. Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in IEEE Workshop ASRU, 2011.
- (2011) IEEE Workshop ASRU
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.R.⁶

8
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. Ellis and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. Interspeech, 2000.
- (2000) Proc. Interspeech
- Hermansky, H.¹ Ellis, D.² Sharma, S.³

9
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. Ellis and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. Interspeech, 2000.
- (2000) Proc. Interspeech
- Hermansky, H.¹ Ellis, D.² Sharma, S.³

10
- 70450205161
- Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction
- C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction," in Proc. Interspeech, 2009.
- (2009) Proc. Interspeech
- Kim, C.¹ Stern, R.M.²

11
- 0000460671
- Complex sounds and auditory images
- R. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang and M. Allerhand, "Complex sounds and auditory images," in Auditory physiology and perception, Proc. 9th International Symposium on Hearing, 1992.
- (1992) Auditory Physiology and Perception, Proc. 9th International Symposium on Hearing
- Patterson, R.¹ Robinson, K.² Holdsworth, J.³ McKeown, D.⁴ Zhang, C.⁵ Allerhand, M.⁶

12
- 0037824480
- Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition
- A. Qiu, C. Schreiner and M. Escabi, "Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition," Journal of Neurophysiology, vol. 90, pp. 456-476, 2003.
- (2003) Journal of Neurophysiology , vol.90 , pp. 456-476
- Qiu, A.¹ Schreiner, C.² Escabi, M.³

13
- 34547509128
- Representation of phonemes in primary auditory cortex: How the brain analyzes speech
- N. Mesgarani, D. Stephen and S. Shamma, "Representation of phonemes in primary auditory cortex: How the brain analyzes speech," in Proc. ICASSP, 2007.
- (2007) Proc. ICASSP
- Mesgarani, N.¹ Stephen, D.² Shamma, S.³

14
- 27144544136
- Improving word accuracy with Gabor feature extraction
- M. Kleinschmidt and D. Gelbart, "Improving word accuracy with Gabor feature extraction," in Proc. Interspeech, 2002.
- (2002) Proc. Interspeech
- Kleinschmidt, M.¹ Gelbart, D.²

15
- 84863799482
- Spectrotemporal modulation subspace-spanning filter bank features for robust automatic speech recognition
- M. R. Schädler, B. Kollmeier and B. T. Meyer, "Spectrotemporal modulation subspace-spanning filter bank features for robust automatic speech recognition," J. Acoust. Soc. Am., pp. 4134-4151, 2011.
- (2011) J. Acoust. Soc. Am. , pp. 4134-4151
- Schädler, M.R.¹ Kollmeier, B.² Meyer, B.T.³

16
- 84878415523
- Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition
- B. Meyer, C. Spille, B. Kollmeier and N. Morgan, "Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Meyer, B.¹ Spille, C.² Kollmeier, B.³ Morgan, N.⁴

17
- 84867619222
- Spectro-temporal Gabor features for speaker recognition
- H. Lei, B. Meyer and N. Mirghafori, "Spectro-temporal Gabor features for speaker recognition," in Proc. ICASSP, 2012.
- (2012) Proc. ICASSP
- Lei, H.¹ Meyer, B.² Mirghafori, N.³

18
- 0024241221
- Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms
- G. Langner and C. Schreiner, "Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms," J. of Neurophysiology, vol. 60, pp. 1799-1822, 1988.
- (1988) J. of Neurophysiology , vol.60 , pp. 1799-1822
- Langner, G.¹ Schreiner, C.²

19
- 0028297185
- Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction
- B. Kollmeier and R. Koch, "Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction," J. Acoust. Soc. Am., vol. 95, no. 3, pp. 1593-1602, 1994.
- (1994) J. Acoust. Soc. Am , vol.95 , Issue.3 , pp. 1593-1602
- Kollmeier, B.¹ Koch, R.²

20
- 0030691985
- Modeling auditory processing of amplitude modulation I. Detection and masking with narrow-band carriers
- T. Dau, B. Kollmeier and A. Kohlrausch, "Modeling Auditory Processing of Amplitude Modulation. I. Detection and Masking with Narrow-Band Carriers," J. Acoust. Soc. Am., vol. 102, no. 5, pp. 2892-2905, 1997.
- (1997) J. Acoust. Soc. Am. , vol.102 , Issue.5 , pp. 2892-2905
- Dau, T.¹ Kollmeier, B.² Kohlrausch, A.³

21
- 84955462883
- Robust ASR in reverberant environments using temporal cepstrum smoothing for speech enhancement and an amplitude modulation filterbank for feature extraction
- F. Xiong, N. Moritz, R. Rehr, J. Anemüller, B. Meyer, T. Gerkmann, S. Doclo and S. Goetze, "Robust ASR in Reverberant Environments Using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction," in Proc. REVERB Workshop, 2014.
- (2014) Proc. REVERB Workshop
- Xiong, F.¹ Moritz, N.² Rehr, R.³ Anemüller, J.⁴ Meyer, B.⁵ Gerkmann, T.⁶ Doclo, S.⁷ Goetze, S.⁸

22
- 79551679242
- Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
- B. Meyer, T. Brand and B. Kollmeier, "Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes," J. Acoust. Soc. Am., vol. 129, pp. 388-403, 2011.
- (2011) J. Acoust. Soc. Am , vol.129 , pp. 388-403
- Meyer, B.¹ Brand, T.² Kollmeier, B.³

23
- 80051627812
- Amplitude Modulation Spectrogram based Features for Robust Speech Recognition in Noisy and Reverberant Environments
- N. Moritz, J. Anemüller and B. Kollmeier, "Amplitude Modulation Spectrogram based Features for Robust Speech Recognition in Noisy and Reverberant Environments," in Proc. ICASSP, 2011.
- (2011) Proc. ICASSP
- Moritz, N.¹ Anemüller, J.² Kollmeier, B.³

24
- 84878415009
- Amplitude modulation filters as feature sets for robust ASR: Constant absolute or relative bandwidth?
- N. Moritz, J. Anemüller and B. Kollmeier, "Amplitude Modulation Filters as Feature Sets for Robust ASR: Constant Absolute or Relative Bandwidth?," in Proc. Interspeech, Portland, USA, 2012.
- (2012) Proc. Interspeech
- Moritz, N.¹ Anemüller, J.² Kollmeier, B.³

25
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
- (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.28 , Issue.4 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

26
- 84906241049
- Improved feature processing for deep neural networks
- S. Rath, D. Povey, K. Veselý and J. Cernocký, "Improved feature processing for deep neural networks," in Interspeech, 2013.
- (2013) Interspeech
- Rath, S.¹ Povey, D.² Veselý, K.³ Cernocký, J.⁴

27
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Veselý, A. Ghoshal, L. Burget and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Veselý, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

28
- 84861125212
- A practical guide to training restricted Boltzmann machines
- G. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, p. 926, 2010.
- (2010) Momentum , vol.9 , Issue.1
- Hinton, G.¹

29
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel and K. Veselý, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Veselý, K.⁷

30
- 56149125973
- Aurora working group: DSR front end LVCSR evaluation AU/384/02, Inst. For Signal and Information Process
- N. Parihar and J. Picone, Aurora working group: DSR front end LVCSR evaluation AU/384/02, Inst. for Signal and Information Process, Mississippi State University, Technical Report, 2002.
- (2002) Mississippi State University, Technical Report
- Parihar, N.¹ Picone, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.