SCOPUS Information Search Platform

Volume 5, Issue 2, 2004, Pages 103-117

The fusion of visual lip movements and mixed speech signals for robust speech separation

Author keywords

Audiovisual information fusion; Audiovisual signal separation; Blind speech separation; Independent component analysis; Robust speech recognition

Indexed keywords

ACOUSTIC SIGNAL PROCESSING; ALGORITHMS; CLASSIFICATION (OF INFORMATION); COMPUTER SIMULATION; CORRELATION METHODS; DATABASE SYSTEMS; ESTIMATION; FEATURE EXTRACTION; GAUSSIAN NOISE (ELECTRONIC); INDEPENDENT COMPONENT ANALYSIS; SIGNAL TO NOISE RATIO; VECTORS; VIDEO SIGNAL PROCESSING;

AUDIOVISUAL INFORMATION FUSION; BLIND SPEECH SEPARATION; ROBUST SPEECH RECOGNITION;

SPEECH PROCESSING;

EID: 1842854565 PISSN: 15662535 EISSN: None Source Type: Journal
DOI: 10.1016/j.inffus.2003.10.006 Document Type: Article

Times cited: (8)

References (32)

1
- 0035458007
- Multi-modal sound localization using audiovisual information Fusion
- Aarabi P., Zaky S. Multi-modal sound localization using audiovisual information Fusion. Information Fusion. 3(2):2001;209-223.
- (2001) Information Fusion , vol.3 , Issue.2 , pp. 209-223
- Aarabi, P.¹ Zaky, S.²

2
- 0003456875
- M.A.Sc. Thesis, Department of Electrical and Computer Engineering, University of Toronto, June
- P. Aarabi, Multi-sense artificial awareness, M.A.Sc. Thesis, Department of Electrical and Computer Engineering, University of Toronto, June 1999.
- (1999) Multi-sense Artificial Awareness
- Aarabi, P.¹

3
- 84863153827
- Integrated vision and sound localization
- Paris, France, July
- P. Aarabi, S. Zaky, Integrated vision and sound localization, in: Proceedings of the 3rd International Conference on Information Fusion, Paris, France, July 2000.
- (2000) Proceedings of the 3rd International Conference on Information Fusion
- Aarabi, P.¹ Zaky, S.²

4
- 0022021789
- The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal-hearing subjects
- Grant K.W., Ardell L.H., Kuhl P.K., Sparks D.W. The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal-hearing subjects. Journal of the Acoustical Society of America. 77(2):1985;671-677.
- (1985) Journal of the Acoustical Society of America , vol.77 , Issue.2 , pp. 671-677
- Grant, K.W.¹ Ardell, L.H.² Kuhl, P.K.³ Sparks, D.W.⁴

5
- 0031069562
- Speechreading using probabilistic models
- Luettin J., Thacker N.A. Speechreading using probabilistic models. Computer Vision and Image Understanding. 65(2):1997;163-178.
- (1997) Computer Vision and Image Understanding , vol.65 , Issue.2 , pp. 163-178
- Luettin, J.¹ Thacker, N.A.²

6
- 0029765665
- Visual speech recognition using active shape models and hidden Markov models
- Atlanta, GA, May
- J. Luettin, N.A. Thacker, S.W. Beet, Visual speech recognition using active shape models and hidden Markov models, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, May 1996.
- (1996) Proceedings of the International Conference on Acoustics, Speech, and Signal Processing
- Luettin, J.¹ Thacker, N.A.² Beet, S.W.³

7
- 0010630633
- Continuous audio-visual speech recognition
- J. Luettin, S. Dupont, Continuous audio-visual speech recognition, in: Proceedings of the 5th European Conference on Computer Vision, 1998.
- (1998) Proceedings of the 5th European Conference on Computer Vision
- Luettin, J.¹ Dupont, S.²

8
- 0033708494
- Audio-visual intent-to-speak detection for human-computer interaction
- Istanbul, Turkey, 5-9 June
- C. Neti, P. deCuetos, A. Senior, Audio-visual intent-to-speak detection for human-computer interaction, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey, 5-9 June 2000.
- (2000) Proceedings of the International Conference on Acoustics, Speech, and Signal Processing
- Neti, C.¹ DeCuetos, P.² Senior, A.³

9
- 0002358797
- Discriminative learning of visual data for audiovisual speech recognition
- Rogozan A. Discriminative learning of visual data for audiovisual speech recognition. International Journal on Artificial Intelligence Tools. 8(1):1999;43-52.
- (1999) International Journal on Artificial Intelligence Tools , vol.8 , Issue.1 , pp. 43-52
- Rogozan, A.¹

10
- 0032180188
- Adaptive fusion of acoustic and visual sources for automatic speech recognition
- Rogozan A., Deléglise P. Adaptive fusion of acoustic and visual sources for automatic speech recognition. Speech Communication. 26(1-2):1998;149-161.
- (1998) Speech Communication , vol.26 , Issue.1-2 , pp. 149-161
- Rogozan, A.¹ Deléglise, P.²

11
- 0000886386
- Visual speech recognition with stochastic networks
- G. Tesauro, D. Touretzky, & T. Leen. Cambridge: MIT Press
- Movellan J.R. Visual speech recognition with stochastic networks. Tesauro G., Touretzky D., Leen T. Advances in Neural Information Processing Systems. vol. 7:1995;MIT Press, Cambridge.
- (1995) Advances in Neural Information Processing Systems , vol.7
- Movellan, J.R.¹

12
- 0006464281
- Automatic computer lip-reading using fuzzy set theory
- Santa Cruz, CA
- J. Baldwin, T. Martin, M. Saeed, Automatic computer lip-reading using fuzzy set theory, in: Proceedings of AVSP 99, Santa Cruz, CA, 1999.
- (1999) Proceedings of AVSP 99
- Baldwin, J.¹ Martin, T.² Saeed, M.³

13
- 0032309170
- 3D modeling and tracking of human lip motion
- S. Basu, N. Oliver, A. Pentland, 3D modeling and tracking of human lip motion, in: IEEE International Conference on Computer Vision, 1998.
- (1998) IEEE International Conference on Computer Vision
- Basu, S.¹ Oliver, N.² Pentland, A.³

14
- 0000134331
- 2D deformable models for visual speech analysis
- D.G. Stork, M.E. Hennecke (Eds.). Speechreading by Humans and Machines, Springer Verlag, Berlin
- Coianiz T., Torresani L., Caprile B. 2D deformable models for visual speech analysis. Stork D.G., Hennecke M.E. Speechreading by Humans and Machines. NATO ASI Series, Series F: Computer and Systems Sciences. vol. 150:1996;391-398 Springer Verlag, Berlin.
- (1996) NATO ASI Series, Series F: Computer and Systems Sciences , vol.150 , pp. 391-398
- Coianiz, T.¹ Torresani, L.² Caprile, B.³

15
- 84947902135
- Statistical chromaticity models for lip tracking with B-splines
- Lecture Notes in Computer Science, Springer Verlag
- Ramos Sanchez M.U., Matas J., Kittler J. Statistical chromaticity models for lip tracking with B-splines. Proceedings of the First International Conference on Audio- and Video-based Biometric Person Authentication. Lecture Notes in Computer Science. 1997;69-76 Springer Verlag.
- (1997) Proceedings of the First International Conference on Audio- and Video-based Biometric Person Authentication , pp. 69-76
- Ramos Sanchez, M.U.¹ Matas, J.² Kittler, J.³

16
- 0018074107
- Voice-mouth synthesis of tactual/visual perception of /pa, ba, ma/
- Erber N.P., De Filippo C.L. Voice-mouth synthesis of tactual/visual perception of /pa, ba, ma/. Journal of the Acoustical Society of America. 64:1978;1015-1019.
- (1978) Journal of the Acoustical Society of America , vol.64 , pp. 1015-1019
- Erber, N.P.¹ De Filippo, C.L.²

17
- 0022411853
- On the role of visual rate information in phonetic perception
- Green K.P., Miller J.L. On the role of visual rate information in phonetic perception. Perception and Psychophysics. 38(3):1985;269-276.
- (1985) Perception and Psychophysics , vol.38 , Issue.3 , pp. 269-276
- Green, K.P.¹ Miller, J.L.²

18
- 0022228262
- Automatic lipreading to enhance speech recognition
- E.D. Petajan, Automatic lipreading to enhance speech recognition, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1985, pp. 40-47.
- (1985) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition , pp. 40-47
- Petajan, E.D.¹

19
- 0030247984
- Computer lipreading for improved accuracy in automatic speech recognition
- Silsbee P.L., Bovik A.C. Computer lipreading for improved accuracy in automatic speech recognition. IEEE Transactions on Speech and Audio Processing. 4(5):1996;337-351.
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 337-351
- Silsbee, P.L.¹ Bovik, A.C.²

20
- 0029234004
- Nonlinear manifold learning for visual speech recognition
- C. Bregler, S.M. Omohundro, Nonlinear manifold learning for visual speech recognition, in: IEEE International Conference on Computer Vision, 1995, pp. 494-499.
- (1995) IEEE International Conference on Computer Vision , pp. 494-499
- Bregler, C.¹ Omohundro, S.M.²

21
- 0029747053
- Integrating audio and visual information to provide highly robust speech recognition
- M.J. Tomlinson, M.J. Russell, N.M. Brooke, Integrating audio and visual information to provide highly robust speech recognition, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 1996, pp. 821-824.
- (1996) Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing , vol.2 , pp. 821-824
- Tomlinson, M.J.¹ Russell, M.J.² Brooke, N.M.³

22
- 0025503485
- Neural network models of sensory integration for improved vowel recognition
- Yuhas B.P., Goldstein M.H., Sejnowski T.J., Jenkins R.E. Neural network models of sensory integration for improved vowel recognition. Proceedings of the IEEE. 78(10):1990;1658-1668.
- (1990) Proceedings of the IEEE , vol.78 , Issue.10 , pp. 1658-1668
- Yuhas, B.P.¹ Goldstein, M.H.² Sejnowski, T.J.³ Jenkins, R.E.⁴

23
- 0025750892
- Automatic lipreading by optical flow analysis
- Mase K., Pentland A. Automatic lipreading by optical flow analysis. Systems and Computers in Japan. 22(6):1991.
- (1991) Systems and Computers in Japan , vol.22 , Issue.6
- Mase, K.¹ Pentland, A.²

24
- 0003250216
- A robust method for speech signal time-delay estimation in reverberant rooms
- Atlanta, Georgia, May
- M.S. Brandstein, H. Silverman, A robust method for speech signal time-delay estimation in reverberant rooms, in: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, Atlanta, Georgia, May 1996.
- (1996) Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing
- Brandstein, M.S.¹ Silverman, H.²

25
- 0029411030
- An information-maximization approach to blind separation and blind deconvolution
- Bell A., Sejnowski T. An information-maximization approach to blind separation and blind deconvolution. Neural Computation. 7(7):1995;1129-1159.
- (1995) Neural Computation , vol.7 , Issue.7 , pp. 1129-1159
- Bell, A.¹ Sejnowski, T.²

26
- 0029725825
- Blind separation of delayed sources based on information maximization
- May
- K. Torkkola, Blind separation of delayed sources based on information maximization, ICASSP, May 1996.
- (1996) ICASSP
- Torkkola, K.¹

27
- 0000717513
- A new learning algorithm for blind signal separation
- S. Amari, A. Cichocki, H. Yang, A new learning algorithm for blind signal separation, Advances in Neural Information Processing Systems, vol. 8, 1996.
- (1996) Advances in Neural Information Processing Systems , vol.8
- Amari, S.¹ Cichocki, A.² Yang, H.³

28
- 0030658281
- Blind source separation of real world signals
- Lee T.W., Bell A.J., Orglmeister R. Blind source separation of real world signals. Proc. ICNN. 1997.
- (1997) Proc. ICNN
- Lee, T.W.¹ Bell, A.J.² Orglmeister, R.³

29
- 0036383219
- Robust speech separation using visually constructed speech signals
- Orlando, FL, April
- P. Aarabi, N.H. Khameneh, Robust speech separation using visually constructed speech signals, in: Proceedings of Sensor Fusion: Architectures, Algorithms, and Applications VI (AeroSense'01), Orlando, FL, April 2002.
- (2002) Proceedings of Sensor Fusion: Architectures, Algorithms, and Applications VI (AeroSense'01)
- Aarabi, P.¹ Khameneh, N.H.²

30
- 1842830672
- Audio-visual segmentation and the cocktail party effect
- Beijing, October
- T. Darrell, J. Fisher, P. Viola, B. Freeman, Audio-visual segmentation and the cocktail party effect, in: Proceedings of the International Conference on Multimodal Interfaces, Beijing, October 2000.
- (2000) Proceedings of the International Conference on Multimodal Interfaces
- Darrell, T.¹ Fisher, J.² Viola, P.³ Freeman, B.⁴

31
- 1842613015
- Genetic sensor selection enhanced independent component analysis and its applications to robust speech recognition
- Baltimore, MD, June
- P. Aarabi, Genetic sensor selection enhanced independent component analysis and its applications to robust speech recognition, in: Proceedings of the 5th IEEE Workshop on Nonlinear Signal and Image Processing (NSIP '01), Baltimore, MD, June 2001.
- (2001) Proceedings of the 5th IEEE Workshop on Nonlinear Signal and Image Processing (NSIP '01)
- Aarabi, P.¹

32
- 0031629543
- Direct blind separation of independent non-Gaussian signals with dynamic channels
- London, England
- R.-W. Liu, Hui Luo, Direct blind separation of independent non-Gaussian signals with dynamic channels, in: Proceedings of 5th IEEE International Workshop on Cellular Neural Networks and their Applications, London, England, 1998, pp. 34-38.
- (1998) Proceedings of 5th IEEE International Workshop on Cellular Neural Networks and their Applications , pp. 34-38
- Liu, R.-W.¹ Luo, H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.