



Volume 15, Issue 8, 2007, Pages 2257-2269

Speech enhancement and recognition in meetings with an audio-visual sensor array

Author keywords

Audio–visual fusion; Microphone array processing; Multiobject tracking; Speech enhancement; Speech recognition

Indexed keywords

AUDIO VISUALS; AUDIO-VISUAL SENSORS; BEAM-FORMING TECHNIQUES; INTEGRATED APPROACHES; MICROPHONE ARRAY PROCESSING; MICROPHONE ARRAYS; MULTIOBJECT TRACKING; POST-FILTERING; RECOGNITION PERFORMANCE; SPATIAL FILTERING; SPEAKER TRACKING; SPEECH ACQUISITIONS; SPEECH RECOGNITION SYSTEMS; SPEECH SIGNALS; TABLE TOPS;

EID: 40249089621     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2007.906197     Document Type: Article
Times cited : (52)

References (59)
  • 1
    • 26844474912
    • Living laboratories: The future computing environments group at the Georgia Institute of Technology
    • Hague, Apr
    • G. Abowd et al., "Living laboratories: The future computing environments group at the Georgia Institute of Technology," in Proc. Conf. Human Factors in Comput. Syst. (CHI), Hague, Apr. 2000, pp. 215-216.
    • (2000) Proc. Conf. Human Factors in Comput. Syst. (CHI) , pp. 215-216
    • Abowd, G.1
  • 2
    • 10244242647
    • Detection and separation of speech event using audio and video information fusion
    • F. Asano et al., "Detection and separation of speech event using audio and video information fusion," J. Appl. Signal Process., vol. 11, pp. 1727-1738, 2004.
    • (2004) J. Appl. Signal Process , vol.11 , pp. 1727-1738
    • Asano, F.1
  • 4
  • 5
    • 0032665455
    • Theoretical noise reduction limits of the generalized sidelobe canceller (GSC) for speech enhancement
    • J. Bitzer, K. U. Simmer, and K. Kammeyer, "Theoretical noise reduction limits of the generalized sidelobe canceller (GSC) for speech enhancement," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1999, vol. 5, pp. 2965-2968.
    • (1999) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , vol.5 , pp. 2965-2968
    • Bitzer, J.1    Simmer, K.U.2    Kammeyer, K.3
  • 6
    • 38049107298
    • A generative approach to audio-visual person tracking
    • Southampton, U.K, Apr
    • R. Brunelli et al., "A generative approach to audio-visual person tracking," in Proc. CLEAR Evaluation Workshop, Southampton, U.K., Apr. 2006, pp. 55-68.
    • (2006) Proc. CLEAR Evaluation Workshop , pp. 55-68
    • Brunelli, R.1
  • 8
    • 0029304865
    • Human and machine recognition of faces: A survey
    • May
    • R. Chellappa, C. Wilson, and A. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705-740, May 1995.
    • (1995) Proc. IEEE , vol.83 , Issue.5 , pp. 705-740
    • Chellappa, R.1    Wilson, C.2    Sirohey, A.3
  • 9
    • 21244492850
    • Real-time speaker tracking using particle filter sensor fusion
    • Mar
    • Y. Chen and Y. Rui, "Real-time speaker tracking using particle filter sensor fusion," Proc. IEEE, vol. 92, no. 3, pp. 485-494, Mar. 2004.
    • (2004) Proc. IEEE , vol.92 , Issue.3 , pp. 485-494
    • Chen, Y.1    Rui, Y.2
  • 14
    • 0030715160
    • Multi-modal tracking of faces for video communications
    • San Juan, Puerto Rico, Jun
    • J. Crowley and P. Berard, "Multi-modal tracking of faces for video communications," in Proc. Conf. Comput. Vision Pattern Recognition (CVPR), San Juan, Puerto Rico, Jun. 1997, pp. 640-645.
    • (1997) Proc. Conf. Comput. Vision Pattern Recognition (CVPR) , pp. 640-645
    • Crowley, J.1    Berard, P.2
  • 15
    • 4344692646
    • A high-accuracy, low-latency technique for talker localization in reverberant environments
    • Ph.D. dissertation, Brown Univ, Providence, RI
    • J. DiBiase, "A high-accuracy, low-latency technique for talker localization in reverberant environments," Ph.D. dissertation, Brown Univ., Providence, RI, 2000.
    • (2000)
    • DiBiase, J.1
  • 16
    • 0003343412
    • Robust localization in reverberant rooms
    • New York: Springer
    • J. DiBiase, H. Silverman, and M. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays. New York: Springer, 2001, vol. 8, pp. 157-180.
    • (2001) Microphone Arrays , vol.8 , pp. 157-180
    • DiBiase, J.1    Silverman, H.2    Brandstein, M.3
  • 17
    • 0003363117
    • Superdirectional microphone arrays
    • S. Gay and J. Benesty, Eds. Norwell, MA: Kluwer, ch. 10, pp
    • G. W. Elko, "Superdirectional microphone arrays," in Acoustic Signal Processing for Telecommunication, S. Gay and J. Benesty, Eds. Norwell, MA: Kluwer, 2000, ch. 10, pp. 181-237.
    • (2000) Acoustic Signal Processing for Telecommunication , pp. 181-237
    • Elko, G.W.1
  • 19
    • 0009622481
    • Learning joint statistical models for audio-visual fusion and segregation
    • Denver, CO, Dec
    • J. Fisher, T. Darrell, W. T. Freeman, and P. Viola, "Learning joint statistical models for audio-visual fusion and segregation," in Proc. Neural Inf. Process. Syst. (NIPS), Denver, CO, Dec. 2000, pp. 772-778.
    • (2000) Proc. Neural Inf. Process. Syst. (NIPS) , pp. 772-778
    • Fisher, J.1    Darrell, T.2    Freeman, W.T.3    Viola, P.4
  • 23
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Apr
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-298, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.-L.1    Lee, C.-H.2
  • 27
    • 0032136153
    • CONDENSATION: Conditional density propagation for visual tracking
    • M. Isard and A. Blake, "CONDENSATION: Conditional density propagation for visual tracking," Int. J. Comput. Vision, vol. 29, no. 1, pp. 5-28, 1998.
    • (1998) Int. J. Comput. Vision , vol.29 , Issue.1 , pp. 5-28
    • Isard, M.1    Blake, A.2
  • 28
    • 0037774471
    • Audio-visual localization of multiple speakers in a video teleconferencing setting
    • B. Kapralos, M. Jenkin, and E. Milios, "Audio-visual localization of multiple speakers in a video teleconferencing setting," Int. J. Imaging Syst. Technol., vol. 13, pp. 95-105, 2003.
    • (2003) Int. J. Imaging Syst. Technol , vol.13 , pp. 95-105
    • Kapralos, B.1    Jenkin, M.2    Milios, E.3
  • 29
    • 64149097954
    • 3D audiovisual person tracking using Kalman filtering and information theory
    • Southampton, U.K, Apr
    • N. Katsarakis et al., "3D audiovisual person tracking using Kalman filtering and information theory," in Proc. CLEAR Evaluation Workshop, Southampton, U.K., Apr. 2006, pp. 45-54.
    • (2006) Proc. CLEAR Evaluation Workshop , pp. 45-54
    • Katsarakis, N.1
  • 30
    • 35048868406
    • An MCMC-based particle filter for tracking multiple interacting targets
    • Prague, May
    • Z. Khan, T. Balch, and F. Dellaert, "An MCMC-based particle filter for tracking multiple interacting targets," in Proc. Eur. Conf. Comput. Vision (ECCV), Prague, May 2004, pp. 279-290.
    • (2004) Proc. Eur. Conf. Comput. Vision (ECCV) , pp. 279-290
    • Khan, Z.1    Balch, T.2    Dellaert, F.3
  • 31
    • 0033707896
    • HMM adaptation and microphone array processing for distant speech recognition
    • Istanbul, Turkey, Jun
    • J. Kleban and Y. Gong, "HMM adaptation and microphone array processing for distant speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Istanbul, Turkey, Jun. 2000, pp. 1411-1414.
    • (2000) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1411-1414
    • Kleban, J.1    Gong, Y.2
  • 32
    • 0016990291
    • The generalized correlation method for estimation of time delay
    • Aug
    • C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 4, pp. 320-327, Aug. 1976.
    • (1976) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-24 , Issue.4 , pp. 320-327
    • Knapp, C.1    Carter, G.2
  • 33
    • 0030193445
    • Two decades of array signal processing research: The parametric approach
    • Jul
    • H. Krim and M. Viberg, "Two decades of array signal processing research: The parametric approach," IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67-94, Jul. 1996.
    • (1996) IEEE Signal Process. Mag , vol.13 , Issue.4 , pp. 67-94
    • Krim, H.1    Viberg, M.2
  • 35
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, no. 2, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 37
    • 33846217002
    • The multichannel Wall Street Journal audio-visual corpus (MC-WSJ-AV): Specification and initial experiments
    • San Juan, Puerto Rico, Dec
    • M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, "The multichannel Wall Street Journal audio-visual corpus (MC-WSJ-AV): Specification and initial experiments," in IEEE Autom. Speech Recognition Understanding Workshop (ASRU), San Juan, Puerto Rico, Dec. 2005, pp. 357-362.
    • (2005) IEEE Autom. Speech Recognition Understanding Workshop (ASRU) , pp. 357-362
    • Lincoln, M.1    McCowan, I.2    Vepa, J.3    Maganti, H.K.4
  • 38
    • 0009653561
    • Post-filtering techniques
    • New York: Springer
    • K. U. Simmer, J. Bitzer, and C. Marro, "Post-filtering techniques," in Microphone Arrays. New York: Springer, 2001, vol. 3, pp. 36-60.
    • (2001) Microphone Arrays , vol.3 , pp. 36-60
    • Simmer, K.U.1    Bitzer, J.2    Marro, C.3
  • 40
    • 0032072917
    • Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering
    • May
    • C. Marro, Y. Mahieux, and K. U. Simmer, "Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering," IEEE Trans. Speech Audio Process., vol. 6, no. 3, pp. 240-259, May 1998.
    • (1998) IEEE Trans. Speech Audio Process , vol.6 , Issue.3 , pp. 240-259
    • Marro, C.1    Mahieux, Y.2    Simmer, K.U.3
  • 41
    • 0346707504
    • Microphone array post-filter based on noise field coherence
    • Nov
    • I. McCowan and H. Bourlard, "Microphone array post-filter based on noise field coherence," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 709-716, Nov. 2003.
    • (2003) IEEE Trans. Speech Audio Process , vol.11 , Issue.6 , pp. 709-716
    • McCowan, I.1    Bourlard, H.2
  • 43
    • 0030677479
    • Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction
    • Munich, Germany, Apr
    • J. Meyer and K. U. Simmer, "Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Munich, Germany, Apr. 1997, pp. 1167-1170.
    • (1997) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1167-1170
    • Meyer, J.1    Simmer, K.U.2
  • 45
    • 0141631692
    • Microphone array speech recognition: Experiments on overlapping speech in meetings
    • Hong Kong, Apr
    • D. Moore and I. McCowan, "Microphone array speech recognition: Experiments on overlapping speech in meetings," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hong Kong, Apr. 2003, pp. V-497-V-500.
    • (2003) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP)
    • Moore, D.1    McCowan, I.2
  • 46
    • 34547165111
    • An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset
    • Southampton, U.K, Apr
    • K. Nickel, T. Gehrig, H. K. Ekenel, J. McDonough, and R. Stiefelhagen, "An audio-visual particle filter for speaker tracking on the CLEAR'06 evaluation dataset," in Proc. CLEAR Evaluation Workshop, Southampton, U.K., Apr. 2006, pp. 69-80.
    • (2006) Proc. CLEAR Evaluation Workshop , pp. 69-80
    • Nickel, K.1    Gehrig, T.2    Ekenel, H.K.3    McDonough, J.4    Stiefelhagen, R.5
  • 49
    • 0028996854
    • WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition
    • Detroit, MI, Apr
    • T. Robinson et al., "WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Detroit, MI, Apr. 1995, pp. 81-84.
    • (1995) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 81-84
    • Robinson, T.1
  • 50
    • 85009230793
    • Factorial models and refiltering for speech separation and denoising
    • Geneva, Switzerland, Sep
    • S. Roweis, "Factorial models and refiltering for speech separation and denoising," in Proc. Eurospeech Conf. Speech Commun. Technol. (Eurospeech-2003), Geneva, Switzerland, Sep. 2003, pp. 1009-1012.
    • (2003) Proc. Eurospeech Conf. Speech Commun. Technol. (Eurospeech-2003) , pp. 1009-1012
    • Roweis, S.1
  • 52
    • 0023985457
    • Beamforming: A versatile approach to spatial filtering
    • Apr
    • B. D. Van Veen and K. M. Buckley, "Beamforming: A versatile approach to spatial filtering," IEEE Acoust., Speech, Signal Process. Mag., vol. 5, no. 2, pp. 4-24, Apr. 1988.
    • (1988) IEEE Acoust., Speech, Signal Process. Mag , vol.5 , Issue.2 , pp. 4-24
    • Van Veen, B.D.1    Buckley, K.M.2
  • 53
    • 0034844366
    • Sequential Monte Carlo fusion of sound and vision for speaker tracking
    • Vancouver, BC, Canada, Jul
    • J. Vermaak, M. Gangnet, A. Blake, and P. Perez, "Sequential Monte Carlo fusion of sound and vision for speaker tracking," in Proc. Int. Conf. Comput. Vision (ICCV), Vancouver, BC, Canada, Jul. 2001, pp. 741-746.
    • (2001) Proc. Int. Conf. Comput. Vision (ICCV) , pp. 741-746
    • Vermaak, J.1    Gangnet, M.2    Blake, A.3    Perez, P.4
  • 54
    • 85143190952
    • Smart: The smart meeting room task at ISL
    • A. Waibel, T. Schultz, M. Bett, R. Malkin, I. Rogina, R. Stiefelhagen, and J. Yang, "Smart: The smart meeting room task at ISL," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hong Kong, Apr. 2003, pp. IV-752-IV-754.
  • 55
    • 0036298833
    • Particle filter beamforming for acoustic source localization in a reverberant environment
    • Orlando, FL, May
    • D. Ward and R. Williamson, "Particle filter beamforming for acoustic source localization in a reverberant environment," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Orlando, FL, May 2002, pp. 1777-1780.
    • (2002) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1777-1780
    • Ward, D.1    Williamson, R.2
  • 56
    • 0030718943
    • Multilingual large vocabulary speech recognition: The European SQUALE project
    • S. J. Young et al., "Multilingual large vocabulary speech recognition: The European SQUALE project," Comput. Speech Lang., vol. 11, no. 1, pp. 73-89, 1997.
    • (1997) Comput. Speech Lang , vol.11 , Issue.1 , pp. 73-89
    • Young, S.J.1
  • 57
    • 0023773764
    • A microphone array with adaptive post-filtering for noise reduction in reverberant rooms
    • New York, Apr
    • R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New York, Apr. 1988, pp. 2578-2581.
    • (1988) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 2578-2581
    • Zelinski, R.1
  • 58
    • 0033284445
    • Flexible camera calibration by viewing a plane from unknown orientations
    • Kerkyra, Greece, Sep
    • Z. Zhang, "Flexible camera calibration by viewing a plane from unknown orientations," in Proc. Int. Conf. Computer Vision (ICCV), Kerkyra, Greece, Sep. 1999, pp. 666-673.
    • (1999) Proc. Int. Conf. Computer Vision (ICCV) , pp. 666-673
    • Zhang, Z.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.