SCOPUS Information Search Platform

Journal of the Acoustical Society of America

Volume 128, Issue 6, 2010, Pages 3769-3780

Temporal envelope compensation for robust phoneme recognition using modulation spectrum

(3) Ganapathy, Sriram (a); Thomas, Samuel (a); Hermansky, Hynek (a)

a Johns Hopkins University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE LOOPS; ANALYSIS TECHNIQUES; CHANNEL NOISE; CONVERSATIONAL TELEPHONE SPEECH; FEATURE EXTRACTION TECHNIQUES; FREQUENCY DOMAINS; LINEAR PREDICTION; MODULATION FREQUENCIES; MODULATION SPECTRUM; NOISE COMPENSATION; NOISY SPEECH; PHONEME RECOGNITION; PROCESSING STAGE; ROBUST SPEECH; SPEECH SIGNALS; SUB-BANDS; TEMPORAL ENVELOPES; TEST DATA;

ADDITIVE NOISE; EXPERIMENTS; FEATURE EXTRACTION; MODULATION; SPEECH COMMUNICATION; TELEPHONE; TELEPHONE SETS;

SPEECH RECOGNITION;

ARTICLE; AUTOMATED PATTERN RECOGNITION; HUMAN; NOISE; PHONETICS; SIGNAL PROCESSING; SOUND DETECTION; SPEECH; STATISTICAL MODEL; TIME;

HUMANS; LINEAR MODELS; NOISE; PATTERN RECOGNITION, AUTOMATED; PHONETICS; SIGNAL PROCESSING, COMPUTER-ASSISTED; SOUND SPECTROGRAPHY; SPEECH ACOUSTICS; TIME FACTORS;

EID: 79952171347 PISSN: 00014966 EISSN: None Source Type: Journal
DOI: 10.1121/1.3504658 Document Type: Article

Times cited : (29)

References (40)

1
- 36248966385
- Autoregressive modelling of temporal envelopes
- Athineos, M., and Ellis, D. P. W. (2007). "Autoregressive modelling of temporal envelopes," IEEE Trans. Signal Process. 55(11), 5237-5245.
- (2007) IEEE Trans. Signal Process. , vol.55 , Issue.11 , pp. 5237-5245
- Athineos, M.¹ Ellis, D.P.W.²

2
- 33750241712
- LP-TRAPS: Linear predictive temporal patterns
- Athineos, M., Hermansky, H., and Ellis, D. P. W. (2004). "LP-TRAPS: Linear predictive temporal patterns," in Proceedings of Interspeech, pp. 1154-1157.
- (2004) Proceedings of Interspeech , pp. 1154-1157
- Athineos, M.¹ Hermansky, H.² Ellis, D.P.W.³

3
- 0031192532
- On the effects of short-term spectrum smoothing in channel normalization
- PII S1063667697048591
- Avendano, C., and Hermansky, H. (1997). "On the effects of short-term spectrum smoothing in channel normalization," IEEE Trans. Speech Audio Process. 5(4), 372-374. (Pubitemid 127746010)
- (1997) IEEE Transactions on Speech and Audio Processing , vol.5 , Issue.4 , pp. 372-374
- Avendano, C.¹ Hermansky, H.²

4
- 0003573244
- Kluwer Academic Publishers, Boston
- Bourlard, H., and Morgan, N. (1994). Connectionist Speech Recognition - A Hybrid Approach (Kluwer Academic Publishers, Boston), pp. 59-115.
- (1994) Connectionist Speech Recognition - A Hybrid Approach , pp. 59-115
- Bourlard, H.¹ Morgan, N.²

5
- 42549139762
- MVA Processing of Speech Features
- Chen, C., and Bilmes, J. A. (2007). "MVA Processing of Speech Features," IEEE Trans. Audio, Speech, Lang. Process. 15(1), 257-270.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 257-270
- Chen, C.¹ Bilmes, J.A.²

6
- 0029952425
- A quantitative model of the 'effective' signal processing in the auditory system: I. Model structure
- Dau, T., Püschel, D., and Kohlrausch, A. (1996). "A quantitative model of the 'effective' signal processing in the auditory system: I. Model structure," J. Acoust. Soc. Am. 99(6), 3615-3622.
- (1996) J. Acoust. Soc. Am. , vol.99 , Issue.6 , pp. 3615-3622
- Dau, T.¹ Püschel, D.² Kohlrausch, A.³

7
- 79952169792
- Last viewed August 18, 2009
- Dhillon, R., Bhagat, S., Carvey, H., and Shriberg, E. (2002) "The ICSI Meeting Recorder Project," http://www.icsi.berkeley.edu/Speech/mr (Last viewed August 18, 2009).
- (2002) The ICSI Meeting Recorder Project
- Dhillon, R.¹ Bhagat, S.² Carvey, H.³ Shriberg, E.⁴

8
- 0027957839
- Effect of temporal envelope smearing on speech reception
- Drullman, R., Festen, J. M., and Plomp, R. (1994). "Effect of temporal envelope smearing on speech reception," J. Acoust. Soc. Am. 95(2), 1053-1064. (Pubitemid 24056370)
- (1994) Journal of the Acoustical Society of America , vol.95 , Issue.2 , pp. 1053-1064
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

9
- 0242662969
- ETSI ES 202 050 v1.1.1 STQ
- ETSI (2002). ETSI ES 202 050 v1.1.1 STQ; Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms. http://www.etsi.org/deliver/etsi-es/202000-202099/202050/01.01.05-60/es-202050v010105p.pdf
- (2002) Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithms

10
- 70450185608
- Noise suppression based on extending a speech-dominated modulation band
- Falk, T. H., Stadler, S., Kleijn, W. B., and Chan, W. Y. (2007). "Noise suppression based on extending a speech-dominated modulation band," in Proceedings of Interspeech, pp. 970-973.
- (2007) Proceedings of Interspeech , pp. 970-973
- Falk, T.H.¹ Stadler, S.² Kleijn, W.B.³ Chan, W.Y.⁴

11
- 58649102246
- Modulation frequency features for phoneme recognition in noisy speech
- Ganapathy, S., Thomas, S., and Hermansky, H. (2009). "Modulation frequency features for phoneme recognition in noisy speech," J. Acoust. Soc. Am., Express Lett. 125(1), EL8-EL12.
- (2009) J. Acoust. Soc. Am., Express Lett. , vol.125 , Issue.1
- Ganapathy, S.¹ Thomas, S.² Hermansky, H.³

12
- 70450144093
- Ph.D. thesis, University of California, Berkeley
- Gelbart, D. (2008). "Ensemble feature selection for multi-stream automatic speech recognition," Ph.D. thesis, University of California, Berkeley.
- (2008) Ensemble Feature Selection for Multi-stream Automatic Speech Recognition
- Gelbart, D.¹

13
- 84962920708
- Evaluating long-term spectral subtraction for reverberant ASR
- Gelbart, D., and Morgan, N. (2001). "Evaluating long-term spectral subtraction for reverberant ASR," in Proceedings of IEEE Automatic Speech Recognition and Understanding, pp. 190-193.
- (2001) Proceedings of IEEE Automatic Speech Recognition and Understanding , pp. 190-193
- Gelbart, D.¹ Morgan, N.²

14
- 85009252959
- Double the trouble: Handling noise and reverberation in far-field automatic speech recognition
- Gelbart, D., and Morgan, N. (2002). "Double the trouble: Handling noise and reverberation in far-field automatic speech recognition," in Proceedings of Interspeech, pp. 2185-2188.
- (2002) Proceedings of Interspeech , pp. 2185-2188
- Gelbart, D.¹ Morgan, N.²

15
- 33745533621
- The development of AMI system for transcription of speech in meetings
- Hain, T., Burget, L., Dines, J., McCowan, I., Karafiat, M., Lincoln, M., Moore, D., Garau, G., Wan, V., Ordelman, R., and Renals, S. (2005). "The development of AMI system for transcription of speech in meetings," in Proceedings of Machine Learning for Multimodal Interaction, pp. 344-356.
- (2005) Proceedings of Machine Learning for Multimodal Interaction , pp. 344-356
- Hain, T.¹ Burget, L.² Dines, J.³ McCowan, I.⁴ Karafiat, M.⁵ Lincoln, M.⁶ Moore, D.⁷ Garau, G.⁸ Wan, V.⁹ Ordelman, R.¹⁰ Renals, S.¹¹

16
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky, H. (1990). "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am. 87(4), 1738-1752.
- (1990) J. Acoust. Soc. Am. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

17
- 33745213373
- Multi-resolution RASTA filtering for TANDEM-based ASR
- Hermansky, H., and Fousek, P. (2005). "Multi-resolution RASTA filtering for TANDEM-based ASR," in Proceedings of Interspeech, pp. 361-364.
- (2005) Proceedings of Interspeech , pp. 361-364
- Hermansky, H.¹ Fousek, P.²

18
- 0028517164
- RASTA processing of speech
- Hermansky, H., and Morgan, N. (1994). "RASTA processing of speech," IEEE Trans. Speech Audio Process. 2, 578-589.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

19
- 0003235731
- TRAPS - Classifiers of Temporal Patterns
- Hermansky, H., and Sharma, S. (1998). "TRAPS - Classifiers of Temporal Patterns," in Proceedings of Interspeech, pp. 1817-1820.
- (1998) Proceedings of Interspeech , pp. 1817-1820
- Hermansky, H.¹ Sharma, S.²

20
- 79551573428
- Last viewed September 18, 2009
- Hirsch, H. G. (2001) "FaNT: Filtering and noise adding tool," http://dnt.kr.hsnr.de/download.html (Last viewed September 18, 2009).
- (2001) FaNT: Filtering and Noise Adding Tool
- Hirsch, H.G.¹

21
- 33745206705
- The simulation of realistic acoustic input scenarios for speech recognition systems
- Hirsch, H. G., and Finster, H. (2005). "The simulation of realistic acoustic input scenarios for speech recognition systems," in Proceedings of Interspeech, pp. 2697-3000.
- (2005) Proceedings of Interspeech , pp. 2697-3000
- Hirsch, H.G.¹ Finster, H.²

22
- 0019060580
- Predicting speech intelligibility in rooms from the modulation transfer function, I. General room acoustics
- Houtgast, T., Steeneken, H. J. M., and Plomp, R. (1980). "Predicting speech intelligibility in rooms from the modulation transfer function, I. General room acoustics," Acustica 46, 60-72. (Pubitemid 11477041)
- (1980) Acustica , vol.46 , Issue.1 , pp. 60-72
- Houtgast, T.¹ Steeneken, H.J.M.² Plomp, R.³

23
- 0032136330
- Robust speech recognition using the modulation spectrogram
- PII S0167639398000326
- Kingsbury, B. E. D., Morgan, N., and Greenberg, S. (1998). "Robust speech recognition using the modulation spectrogram," Speech Commun. 25(1-3), 117-132. (Pubitemid 128413637)
- (1998) Speech Communication , vol.25 , Issue.1-3 , pp. 117-132
- Kingsbury, B.E.D.¹ Morgan, N.² Greenberg, S.³

24
- 0033004349
- Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications
- Kumaresan, R., and Rao, A. (1999). "Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications," J. Acoust. Soc. Am. 105(3), 1912-1924.
- (1999) J. Acoust. Soc. Am. , vol.105 , Issue.3 , pp. 1912-1924
- Kumaresan, R.¹ Rao, A.²

25
- 0024768209
- Speaker independent phone recognition using hidden Markov models
- Lee, K. F. (1989). "Speaker independent phone recognition using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process. 37(11), 1641-1648.
- (1989) IEEE Trans. Acoust., Speech, Signal Process. , vol.37 , Issue.11 , pp. 1641-1648
- Lee, K.F.¹

26
- 0016495091
- Linear prediction: A tutorial review
- Makhoul, J. (1975). "Linear prediction: A tutorial review," Proc. IEEE 63(4), 561-580.
- (1975) Proc. IEEE , vol.63 , Issue.4 , pp. 561-580
- Makhoul, J.¹

27
- 0032634932
- Computing the discrete-time analytic signal via FFT
- Marple, L. S. (1999). "Computing the discrete-time analytic signal via FFT," IEEE Trans. Signal Process. 47(9), 2600-2603.
- (1999) IEEE Trans. Signal Process. , vol.47 , Issue.9 , pp. 2600-2603
- Marple, L.S.¹

28
- 0026384344
- Continuous speech recognition using PLP analysis with multi-layer perceptrons
- Morgan, N., Hermansky, H., Bourlard, H., Kohn, P., and Wooters, C. (1992). "Continuous speech recognition using PLP analysis with multi-layer perceptrons," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, pp. 49-52.
- (1992) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , vol.1 , pp. 49-52
- Morgan, N.¹ Hermansky, H.² Bourlard, H.³ Kohn, P.⁴ Wooters, C.⁵

29
- 0020499643
- Modelling and enhancement of reverberant speech using an envelope convolution method
- Mourjopoulos, J., and Hammond, J. K. (1983). "Modelling and enhancement of reverberant speech using an envelope convolution method," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1144-1147.
- (1983) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 1144-1147
- Mourjopoulos, J.¹ Hammond, J.K.²

30
- 0003200767
- The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions
- Pearce, D., and Hirsch, H. G. (2000). "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions," in ISCA Tutorial and Research Workshop ASR2000, pp. 29-32.
- (2000) ISCA Tutorial and Research Workshop ASR2000 , pp. 29-32
- Pearce, D.¹ Hirsch, H.G.²

31
- 51449100229
- Exploiting contextual information for improved phoneme recognition
- Pinto, J., Yegnanarayana, B., Hermansky, H., and Doss, M. M. (2008). "Exploiting contextual information for improved phoneme recognition," in Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing, pp. 4449-4452.
- (2008) Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing , pp. 4449-4452
- Pinto, J.¹ Yegnanarayana, B.² Hermansky, H.³ Doss, M.M.⁴

32
- 0030682302
- HTIMIT and LLHDB: Speech corpora for the study of hand set transducer effects
- Reynolds, D. A. (1997). "HTIMIT and LLHDB: Speech corpora for the study of hand set transducer effects," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1535-1538.
- (1997) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 1535-1538
- Reynolds, D.A.¹

33
- 0001613977
- Differential sensitivity of the ear for pure tones
- Riesz, R. R. (1928). "Differential sensitivity of the ear for pure tones," Phys. Rev. 31, 867-875.
- (1928) Phys. Rev. , vol.31 , pp. 867-875
- Riesz, R.R.¹

34
- 77949394249
- Ph.D. thesis, BUT, Brno, CZ
- Schwarz, P. (2008). "Phoneme recognition based on long temporal context," Ph.D. thesis, BUT, Brno, CZ.
- (2008) Phoneme Recognition Based on Long Temporal Context
- Schwarz, P.¹

35
- 0028823541
- Speech recognition with primarily temporal cues
- Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). "Speech recognition with primarily temporal cues," Science 270(5234), 303-304.
- (1995) Science , vol.270 , Issue.5234 , pp. 303-304
- Shannon, R.V.¹ Zeng, F.G.² Kamath, V.³ Wygonski, J.⁴ Ekelid, M.⁵

36
- 70349223037
- An auditory-based feature for robust speech recognition
- Shao, Y., Jin, Z., Wang, D. L., and Srinivasan, S. (2009). "An auditory-based feature for robust speech recognition," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4625-4628.
- (2009) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 4625-4628
- Shao, Y.¹ Jin, Z.² Wang, D.L.³ Srinivasan, S.⁴

37
- 0032828464
- A model of auditory perception as front end for automatic speech recognition
- Tchorz, J., and Kollmeier, B. (1999). "A model of auditory perception as front end for automatic speech recognition," J. Acoust. Soc. Am. 106(4), 2040-2050.
- (1999) J. Acoust. Soc. Am. , vol.106 , Issue.4 , pp. 2040-2050
- Tchorz, J.¹ Kollmeier, B.²

38
- 67650107416
- Recognition of reverberant speech using frequency domain linear prediction
- Thomas, S., Ganapathy, S., and Hermansky, H. (2008). "Recognition of reverberant speech using frequency domain linear prediction," IEEE Signal Process. Lett. 15, 681-684.
- (2008) IEEE Signal Process. Lett. , vol.15 , pp. 681-684
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

39
- 0034842487
- Scalable and progressive audio codec
- Vinton, M. S., and Atlas, L. E. (2001). "Scalable and progressive audio codec," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3277-3280.
- (2001) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 3277-3280
- Vinton, M.S.¹ Atlas, L.E.²

40
- 35248862134
- Spectral and temporal cues for phoneme recognition in noise
- DOI 10.1121/1.2767000
- Xu, L., and Zheng, Y. (2007). "Spectral and temporal cues for phoneme recognition in noise," J. Acoust. Soc. Am. 122(3), 1758-1764. (Pubitemid 47560537)
- (2007) Journal of the Acoustical Society of America , vol.122 , Issue.3 , pp. 1758-1764
- Xu, L.¹ Zheng, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.