SCOPUS Information Retrieval Platform

Volume 24, Issue 1, 2010, Pages 77-93

A computational auditory scene analysis system for speech segregation and robust speech recognition

Authors (4): Shao, Yang (a); Srinivasan, Soundararajan (b); Jin, Zhaozhang (a); Wang, DeLiang (a, b)

(a) The Ohio State University (United States)

Author keywords

Binary time frequency mask; Computational Auditory Scene Analysis; Robust speech recognition; Speech segregation; Uncertainty decoding

Indexed keywords

BINARY TIME-FREQUENCY MASK; COMPUTATIONAL AUDITORY SCENE ANALYSIS; ROBUST SPEECH RECOGNITION; SPEECH SEGREGATION; UNCERTAINTY DECODING; DECODING; PATIENT REHABILITATION; SEGREGATION (METALLOGRAPHY); SIGNAL PROCESSING; SPEECH COMMUNICATION; UNCERTAINTY ANALYSIS; SPEECH RECOGNITION

EID: 69249159165; PISSN: 08852308; EISSN: 10958363; Source Type: Journal
DOI: 10.1016/j.csl.2008.03.004; Document Type: Article

Times cited: 103

References (38)

1
- 38849083727
- Morgan & Claypool, San Rafael, CA
- Allen J.B. Articulation and Intelligibility (2005), Morgan & Claypool, San Rafael, CA
- (2005) Articulation and Intelligibility
- Allen, J.B.¹

2
- 0003684441
- The MIT Press, Cambridge, MA
- Bregman A.S. Auditory Scene Analysis (1990), The MIT Press, Cambridge, MA
- (1990) Auditory Scene Analysis
- Bregman, A.S.¹

3
- 34547539772
- Available from
- Cooke, M., Lee, T., 2006. Speech separation and recognition competition. Available from: .
- (2006) Speech separation and recognition competition
- Cooke, M.¹ Lee, T.²

4
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- Cooke M., Green P., Josifovski L., and Vizinho A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun. 34 (2001) 267-285
- (2001) Speech Commun. , vol.34 , pp. 267-285
- Cooke, M.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

5
- 18744401086
- Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
- Deng L., Droppo J., and Acero A. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion. IEEE Trans. Speech Audio Process. 13 (2005) 412-421
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , pp. 412-421
- Deng, L.¹ Droppo, J.² Acero, A.³

6
- 4544369701
- A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel
- Deoras, A.N., Hasegawa-Johnson, M., 2004. A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel. In: Proceedings of ICASSP'04, vol. 1. pp. 861-864.
- (2004) Proceedings of ICASSP'04 , vol.1 , pp. 861-864
- Deoras, A.N.¹ Hasegawa-Johnson, M.²

7
- 0026843273
- A Bayesian estimation approach for speech enhancement using hidden Markov models
- Ephraim Y. A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Trans. Signal Process. 40 4 (1992) 725-735
- (1992) IEEE Trans. Signal Process. , vol.40 , Issue.4 , pp. 725-735
- Ephraim, Y.¹

8
- 0004072715
- Marcel Dekker, New York
- Furui S. Digital Speech Processing, Synthesis, and Recognition (2001), Marcel Dekker, New York
- (2001) Digital Speech Processing, Synthesis, and Recognition
- Furui, S.¹

9
- 0030245128
- Robust continuous speech recognition using parallel model combination
- Gales M.J.F., and Young S.J. Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process. 4 (1996) 352-359
- (1996) IEEE Trans. Speech Audio Process. , vol.4 , pp. 352-359
- Gales, M.J.F.¹ Young, S.J.²

10
- 85045165251
- Ph.D. Thesis, Biophysics Program, The Ohio State University
- Hu, G., 2006. Monaural speech organization and segregation. Ph.D. Thesis, Biophysics Program, The Ohio State University.
- (2006) Monaural speech organization and segregation
- Hu, G.¹

11
- 4644265990
- Monaural speech segregation based on pitch tracking and amplitude modulation
- Hu G., and Wang D.L. Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Trans. Neural Networks 15 (2004) 1135-1150
- (2004) IEEE Trans. Neural Networks , vol.15 , pp. 1135-1150
- Hu, G.¹ Wang, D.L.²

12
- 46049084696
- An auditory scene analysis approach to monaural speech segregation
- Hansler E., and Schmidt G. (Eds), Springer, Heidelberg
- Hu G., and Wang D.L. An auditory scene analysis approach to monaural speech segregation. In: Hansler E., and Schmidt G. (Eds). Topics in Acoustic Echo and Noise Control (2006), Springer, Heidelberg 485-515
- (2006) Topics in Acoustic Echo and Noise Control , pp. 485-515
- Hu, G.¹ Wang, D.L.²

13
- 38849102154
- Auditory segmentation based on onset and offset analysis
- Hu G., and Wang D.L. Auditory segmentation based on onset and offset analysis. IEEE Trans. Audio Speech Language Process. 15 (2007) 396-405
- (2007) IEEE Trans. Audio Speech Language Process. , vol.15 , pp. 396-405
- Hu, G.¹ Wang, D.L.²

14
- 0004056285
- Prentice Hall PTR, Upper Saddle River, NJ
- Huang X., Acero A., and Hon H. Spoken Language Processing (2001), Prentice Hall PTR, Upper Saddle River, NJ
- (2001) Spoken Language Processing
- Huang, X.¹ Acero, A.² Hon, H.³

15
- 84899014722
- A probabilistic approach to single channel blind signal separation
- Becker S., Thrun S., and Obermayer K. (Eds), MIT Press, Cambridge, MA
- Jang G., and Lee T. A probabilistic approach to single channel blind signal separation. In: Becker S., Thrun S., and Obermayer K. (Eds). Advances in Neural Information Processing Systems, vol. 15 (2003), MIT Press, Cambridge, MA 1173-1180
- (2003) Advances in Neural Information Processing Systems, vol. 15 , pp. 1173-1180
- Jang, G.¹ Lee, T.²

16
- 4644257621
- Single microphone source separation using high resolution signal reconstruction
- Kristjansson, T., Attias, H., Hershey, J., 2004. Single microphone source separation using high resolution signal reconstruction. In: Proceedings of ICASSP'04, vol. 2. pp. 817-820.
- (2004) Proceedings of ICASSP'04 , vol.2 , pp. 817-820
- Kristjansson, T.¹ Attias, H.² Hershey, J.³

17
- 0034852834
- Developing usable speech criteria for speaker identification technology
- Lovekin, J.M., Yantorno, R.E., Krishnamachari, K.R., Benincasa, D.S., Wenndt, S.J., 2001. Developing usable speech criteria for speaker identification technology. In: Proceedings of ICASSP'01, pp. 421-424.
- (2001) Proceedings of ICASSP'01 , pp. 421-424
- Lovekin, J.M.¹ Yantorno, R.E.² Krishnamachari, K.R.³ Benincasa, D.S.⁴ Wenndt, S.J.⁵

18
- 0023944462
- Simulation of auditory neural transduction: further studies
- Meddis R. Simulation of auditory neural transduction: further studies. The Journal of the Acoustical Society of America 83 (1988) 1056-1063
- (1988) The Journal of the Acoustical Society of America , vol.83 , pp. 1056-1063
- Meddis, R.¹

19
- 0003789815
- Academic Press, San Diego, CA
- Moore B.C.J. An Introduction to the Psychology of Hearing. fifth ed. (2003), Academic Press, San Diego, CA
- (2003) An Introduction to the Psychology of Hearing. fifth ed.
- Moore, B.C.J.¹

20
- 0003513556
- Prentice-Hall, Inc., Upper Saddle River, NJ
- Oppenheim A.V., Schafer R.W., and Buck J.R. Discrete-time Signal Processing. second ed. (1999), Prentice-Hall, Inc., Upper Saddle River, NJ
- (1999) Discrete-time Signal Processing. second ed.
- Oppenheim, A.V.¹ Schafer, R.W.² Buck, J.R.³

21
- 0009804718
- Auditory models as preprocessors for speech recognition
- Schouten M.E.H. (Ed), Mouton de Gruyter, Berlin, Germany (Chapter 1)
- Patterson R.D., Holdsworth J., and Allerhand M. Auditory models as preprocessors for speech recognition. In: Schouten M.E.H. (Ed). The Auditory Processing of Speech: From Sounds to Words (1992), Mouton de Gruyter, Berlin, Germany 67-83 (Chapter 1)
- (1992) The Auditory Processing of Speech: From Sounds to Words , pp. 67-83
- Patterson, R.D.¹ Holdsworth, J.² Allerhand, M.³

22
- 4644336054
- Reconstruction of missing features for robust speech recognition
- Raj B., Seltzer M.L., and Stern R.M. Reconstruction of missing features for robust speech recognition. Speech Commun. 43 (2004) 275-296
- (2004) Speech Commun. , vol.43 , pp. 275-296
- Raj, B.¹ Seltzer, M.L.² Stern, R.M.³

23
- 33745190244
- Recognizing speech from simultaneous speakers
- Raj, B., Singh, R., Smaragdis, P., 2005. Recognizing speech from simultaneous speakers. In: Proceedings of Interspeech'05, pp. 3317-3320.
- (2005) Proceedings of Interspeech'05 , pp. 3317-3320
- Raj, B.¹ Singh, R.² Smaragdis, P.³

24
- 0142026377
- Speech segregation based on sound localization
- Roman N., Wang D.L., and Brown G.J. Speech segregation based on sound localization. J. Acoust. Soc. Am. 114 (2003) 2236-2252
- (2003) J. Acoust. Soc. Am. , vol.114 , pp. 2236-2252
- Roman, N.¹ Wang, D.L.² Brown, G.J.³

25
- 84892289719
- Automatic speech processing by inference in generative models
- Divenyi P. (Ed), Kluwer Academic, Norwell, MA
- Roweis S.T. Automatic speech processing by inference in generative models. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Kluwer Academic, Norwell, MA 97-134
- (2005) Speech Separation by Humans and Machines , pp. 97-134
- Roweis, S.T.¹

26
- 46049084086
- Ph.D. Thesis, Computer Science and Engineering, The Ohio State University
- Shao, Y., 2007. Sequential organization in computational auditory scene analysis. Ph.D. Thesis, Computer Science and Engineering, The Ohio State University.
- (2007) Sequential organization in computational auditory scene analysis
- Shao, Y.¹

27
- 33744996003
- Model-based sequential organization in cochannel speech
- Shao Y., and Wang D.L. Model-based sequential organization in cochannel speech. IEEE Trans. Audio Speech Language Process. 14 (2006) 289-298
- (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , pp. 289-298
- Shao, Y.¹ Wang, D.L.²

28
- 34547499683
- Incorporating auditory feature uncertainties in robust speaker identification
- Shao, Y., Srinivasan, S., Wang, D.L., 2007. Incorporating auditory feature uncertainties in robust speaker identification. In: Proceedings of ICASSP'07, vol. IV, pp. 277-280.
- (2007) Proceedings of ICASSP'07 , vol.4 , pp. 277-280
- Shao, Y.¹ Srinivasan, S.² Wang, D.L.³

29
- 56249136428
- Transforming binary uncertainties for robust speech recognition
- Srinivasan S., and Wang D.L. Transforming binary uncertainties for robust speech recognition. IEEE Trans. Audio, Speech Lang. Process. 15 7 (2007) 2130-2140
- (2007) IEEE Trans. Audio, Speech Lang. Process. , vol.15 , Issue.7 , pp. 2130-2140
- Srinivasan, S.¹ Wang, D.L.²

30
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- Srinivasan S., Roman N., and Wang D.L. Binary and ratio time-frequency masks for robust speech recognition. Speech Commun. 48 (2006) 1486-1501
- (2006) Speech Commun. , vol.48 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.L.³

31
- 0025681008
- Hidden Markov model decomposition of speech and noise
- Varga, A.P., Moore, R.K., 1990. Hidden Markov model decomposition of speech and noise. In: Proceedings of ICASSP'90, pp. 845-848.
- (1990) Proceedings of ICASSP'90 , pp. 845-848
- Varga, A.P.¹ Moore, R.K.²

32
- 0026172104
- Watersheds in digital spaces: an efficient algorithm based on immersion simulations
- Vincent L., and Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13 6 (1991) 583-598
- (1991) IEEE Trans. Pattern Anal. Mach. Intell. , vol.13 , Issue.6 , pp. 583-598
- Vincent, L.¹ Soille, P.²

33
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- Divenyi P. (Ed), Norwell, MA
- Wang D.L. On ideal binary mask as the computational goal of auditory scene analysis. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Norwell, MA 181-197
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

34
- 85011300842
- Feature-based speech segregation
- Wang D.L., and Brown G.J. (Eds), Wiley-IEEE Press, Hoboken, NJ
- Wang D.L. Feature-based speech segregation. In: Wang D.L., and Brown G.J. (Eds). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications (2006), Wiley-IEEE Press, Hoboken, NJ 81-114
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , pp. 81-114
- Wang, D.L.¹

35
- 0032682770
- Separation of speech from interfering sounds based on oscillatory correlation
- Wang D.L., and Brown G.J. Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans. Neural Networks 10 3 (1999) 684-697
- (1999) IEEE Trans. Neural Networks , vol.10 , Issue.3 , pp. 684-697
- Wang, D.L.¹ Brown, G.J.²

36
- 82255178542
- Wang D.L., and Brown G.J. (Eds), Wiley-IEEE Press, Hoboken, NJ
- In: Wang D.L., and Brown G.J. (Eds). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications (2006), Wiley-IEEE Press, Hoboken, NJ
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications

37
- 84957681902
- Weickert, J., 1997. A review of nonlinear diffusion filtering. In: Romeny, B.H., Florack, L., Koenderink, J., Viergever, M. (Eds.), Scale-space Theory in Computer Vision. Springer, Berlin, pp. 3-28.

38
- 69249132867
- Young, S., Kershaw, D., Odell, J., Valtchev, V., Woodland, P., 2000. The HTK Book (for HTK Version 3.0). Microsoft Corporation.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.