



Volume 15, Issue 1, 2007, Pages 96-108

Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures

Author keywords

Audiovisual coherence; Blind source separation; Convolutive mixture; Speech enhancement; Statistical modeling

Indexed keywords

ADDITIVE MIXTURES; AUDIO AND VISUAL INFORMATIONS; AUDIO-VISUAL SPEECH PROCESSING; AUDIOVISUAL COHERENCE; BLIND SEPARATIONS; COMPLEX MIXTURES; CONVOLUTIVE MIXTURE; EXTRACTION SYSTEMS; FREQUENCY CHANNELS; FREQUENCY SEPARATIONS; NOISY ENVIRONMENTS; NOVEL ALGORITHMS; SCALE FACTOR AMBIGUITIES; SPEECH SIGNALS; STANDARD SOURCES; STATISTICAL MODELING; STATISTICAL MODELS; STATISTICAL TOOLS; VISUAL SIGNALS;

EID: 34447100075     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2006.872619     Document Type: Article
Times cited: 72

References (50)
  • 1
    • 0030362791
    • For speech perception by humans or machines, three senses are better than one
    • L. E. Bernstein and C. Benoît, "For speech perception by humans or machines, three senses are better than one," in Proc. Int. Conf. Spoken Lang. Process. (ICSLP), 1996, pp. 1477-1480.
    • (1996) Proc. Int. Conf. Spoken Lang. Process. (ICSLP) , pp. 1477-1480
    • Bernstein, L.E.1    Benoît, C.2
  • 2
    • 0001048664
    • Visual contribution to speech intelligibility in noise
    • W. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
    • (1954) J. Acoust. Soc. Amer , vol.26 , pp. 212-215
    • Sumby, W.1    Pollack, I.2
  • 3
    • 0002028032
    • Some preliminaries to a comprehensive account of audio-visual speech perception
    • B. Dodd and R. Campbell, Eds. Mahwah, NJ: Lawrence Erlbaum
    • Q. Summerfield, "Some preliminaries to a comprehensive account of audio-visual speech perception," in Hearing by Eye: The Psychology of Lipreading, B. Dodd and R. Campbell, Eds. Mahwah, NJ: Lawrence Erlbaum, 1987, pp. 3-51.
    • (1987) Hearing by Eye: The Psychology of Lipreading , pp. 3-51
    • Summerfield, Q.1
  • 4
    • 0031747741
    • Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise
    • J. Robert-Ribes, J.-L. Schwartz, T. Lallouache, and P. Escudier, "Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise," J. Acoust. Soc. Amer., vol. 103, no. 6, pp. 3677-3689, 1998.
    • (1998) J. Acoust. Soc. Amer , vol.103 , Issue.6 , pp. 3677-3689
    • Robert-Ribes, J.1    Schwartz, J.-L.2    Lallouache, T.3    Escudier, P.4
  • 5
    • 0003699540
    • Automatic lipreading to enhance speech recognition
    • Ph.D. dissertation, Univ. Illinois, Urbana
    • E. D. Petajan, "Automatic lipreading to enhance speech recognition," Ph.D. dissertation, Univ. Illinois, Urbana, 1984.
    • (1984)
    • Petajan, E.D.1
  • 6
    • 4544290191
    • Recent advances in the automatic recognition of audio-visual speech
    • Sep
    • G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audio-visual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sep. 2003.
    • (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
    • Potamianos, G.1    Neti, C.2    Gravier, G.3    Garg, A.4    Senior, A.W.5
  • 7
    • 0033822769
    • The use of visible speech cues for improving auditory detection of spoken sentences
    • K. Grant and P. Seitz, "The use of visible speech cues for improving auditory detection of spoken sentences," J. Acoust. Soc. Amer., vol. 108, pp. 1197-1208, 2000.
    • (2000) J. Acoust. Soc. Amer , vol.108 , pp. 1197-1208
    • Grant, K.1    Seitz, P.2
  • 8
    • 10444258058
    • Investigating the audio-visual speech detection advantage
    • J. Kim and C. Davis, "Investigating the audio-visual speech detection advantage," Speech Commun., vol. 44, no. 1-4, pp. 19-30, 2004.
    • (2004) Speech Commun , vol.44 , Issue.1-4 , pp. 19-30
    • Kim, J.1    Davis, C.2
  • 9
    • 10444276578
    • Auditory speech detection in noise enhanced by lipreading
    • L. E. Bernstein, E. T. J. Auer, and S. Takayanagi, "Auditory speech detection in noise enhanced by lipreading," Speech Commun., vol. 44, no. 1-4, pp. 5-18, 2004.
    • (2004) Speech Commun , vol.44 , Issue.1-4 , pp. 5-18
    • Bernstein, L.E.1    Auer, E.T.J.2    Takayanagi, S.3
  • 10
    • 85009257811
    • Audio-visual scene analysis; evidence for a "very-early" integration process in audio-visual speech perception
    • J.-L. Schwartz, F. Berthommier, and C. Savariaux, "Audio-visual scene analysis; evidence for a "very-early" integration process in audio-visual speech perception," in Proc. Int. Conf. Spoken Lang. Process. (ICSLP), 2002, pp. 1937-1940.
    • (2002) Proc. Int. Conf. Spoken Lang. Process. (ICSLP) , pp. 1937-1940
    • Schwartz, J.-L.1    Berthommier, F.2    Savariaux, C.3
  • 11
    • 0034974093
    • Audio-visual enhancement of speech in noise
    • Jun
    • L. Girin, J.-L. Schwartz, and G. Feng, "Audio-visual enhancement of speech in noise," J. Acoust. Soc. Amer., vol. 109, no. 6, pp. 3007-3020, Jun. 2001.
    • (2001) J. Acoust. Soc. Amer , vol.109 , Issue.6 , pp. 3007-3020
    • Girin, L.1    Schwartz, J.-L.2    Feng, G.3
  • 12
    • 85009232030
    • Audio-visual speech enhancement with AVCDCN (AudioVisual Codebook Dependent Cepstral Normalization)
    • S. Deligne, G. Potamianos, and C. Neti, "Audio-visual speech enhancement with AVCDCN (AudioVisual Codebook Dependent Cepstral Normalization)," in Proc. Int. Conf. Spoken Lang. Process. (ICSLP), 2002, pp. 1449-1452.
    • (2002) Proc. Int. Conf. Spoken Lang. Process. (ICSLP) , pp. 1449-1452
    • Deligne, S.1    Potamianos, G.2    Neti, C.3
  • 14
  • 15
    • 0036874541
    • Separation of audio-visual speech sources: A new approach exploiting the audiovisual coherence of speech stimuli
    • D. Sodoyer, J.-L. Schwartz, L. Girin, J. Klinkisch, and C. Jutten, "Separation of audio-visual speech sources: a new approach exploiting the audiovisual coherence of speech stimuli," EURASIP J. Appl. Signal Process., vol. 2002, no. 11, pp. 1165-1173, 2002.
    • (2002) EURASIP J. Appl. Signal Process , vol.2002 , Issue.11 , pp. 1165-1173
    • Sodoyer, D.1    Schwartz, J.-L.2    Girin, L.3    Klinkisch, J.4    Jutten, C.5
  • 16
    • 10444247388
    • Developing an audio-visual speech source separation algorithm
    • Oct
    • D. Sodoyer, L. Girin, C. Jutten, and J.-L. Schwartz, "Developing an audio-visual speech source separation algorithm," Speech Commun., vol. 44, no. 1-4, pp. 113-125, Oct. 2004.
    • (2004) Speech Commun , vol.44 , Issue.1-4 , pp. 113-125
    • Sodoyer, D.1    Girin, L.2    Jutten, C.3    Schwartz, J.-L.4
  • 17
    • 0032187518
    • Blind signal separation: Statistical principles
    • Oct
    • J.-F. Cardoso, "Blind signal separation: statistical principles," Proc. IEEE, vol. 86, no. 10, pp. 2009-2025, Oct. 1998.
    • (1998) Proc. IEEE , vol.86 , Issue.10 , pp. 2009-2025
    • Cardoso, J.-F.1
  • 20
    • 0028416938
    • Independent component analysis, a new concept?
    • Apr
    • P. Comon, "Independent component analysis, a new concept?," Signal Process., vol. 36, no. 3, pp. 287-314, Apr. 1994.
    • (1994) Signal Process , vol.36 , Issue.3 , pp. 287-314
    • Comon, P.1
  • 21
    • 0001877182
    • Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétrique en apprentissage non supervisé
    • Nice, France, May
    • J. Hérault, C. Jutten, and B. Ans, "Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétrique en apprentissage non supervisé," in Proc. GRETSI, Nice, France, May 1985, vol. 2, pp. 1017-1020.
    • (1985) Proc. GRETSI , vol.2 , pp. 1017-1020
    • Hérault, J.1    Jutten, C.2    Ans, B.3
  • 22
    • 0026191274
    • Blind separation of sources. Part I: An adaptive algorithm based on a neuromimetic architecture
    • Jul
    • C. Jutten and J. Hérault, "Blind separation of sources. Part I: An adaptive algorithm based on a neuromimetic architecture," Signal Process., vol. 24, no. 1, pp. 1-10, Jul. 1991.
    • (1991) Signal Process , vol.24 , Issue.1 , pp. 1-10
    • Jutten, C.1    Hérault, J.2
  • 23
    • 0027812550
    • Blind beamforming for non Gaussian signals
    • Dec
    • J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," Proc. Inst. Elect. Eng. F, vol. 140, no. 6, pp. 362-370, Dec. 1993.
    • (1993) Proc. Inst. Elect. Eng. F , vol.140 , Issue.6 , pp. 362-370
    • Cardoso, J.-F.1    Souloumiac, A.2
  • 24
    • 0032629347
    • Fast and robust fixed-point algorithms for independent component analysis
    • May
    • A. Hyvarinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626-634, May 1999.
    • (1999) IEEE Trans. Neural Netw , vol.10 , Issue.3 , pp. 626-634
    • Hyvarinen, A.1
  • 25
    • 0029411030
    • An information-maximization approach to blind source separation and blind deconvolution
    • A. Bell and T. Sejnowski, "An information-maximization approach to blind source separation and blind deconvolution," Neural Comput., vol. 7, pp. 1129-1159, 1995.
    • (1995) Neural Comput , vol.7 , pp. 1129-1159
    • Bell, A.1    Sejnowski, T.2
  • 26
    • 0005993647
    • Fetal electrocardiogram extraction by source subspace separation
    • Girona, Spain, Jun. 12-14
    • L. De Lathauwer, D. Callaerts, B. De Moor, and J. Vandewalle, "Fetal electrocardiogram extraction by source subspace separation," in Proc. IEEE Workshop HOS, Girona, Spain, Jun. 12-14, 1995, pp. 134-138.
    • (1995) Proc. IEEE Workshop HOS , pp. 134-138
    • De Lathauwer, L.1    Callaerts, D.2    De Moor, B.3    Vandewalle, J.4
  • 27
    • 0035113616
    • Noninvasive fetal electrocardiogram extraction: Blind source separation versus adaptive noise cancellation
    • Jan
    • V. Zarzoso and A. K. Nandi, "Noninvasive fetal electrocardiogram extraction: Blind source separation versus adaptive noise cancellation," IEEE Trans. Biomed. Eng., vol. 48, no. 1, pp. 12-18, Jan. 2001.
    • (2001) IEEE Trans. Biomed. Eng , vol.48 , Issue.1 , pp. 12-18
    • Zarzoso, V.1    Nandi, A.K.2
  • 28
  • 31
    • 0000914334
    • Convolutive blind separation of non stationary sources
    • May
    • L. Parra and C. Spence, "Convolutive blind separation of non stationary sources," IEEE Trans. Speech Audio Process., vol. 8, no. 3, pp. 320-327, May 2000.
    • (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.3 , pp. 320-327
    • Parra, L.1    Spence, C.2
  • 34
    • 13344276516
    • Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures
    • Siena, Italy, Oct
    • B. Rivet, L. Girin, C. Jutten, and J.-L. Schwartz, "Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures," in IEEE Int. Workshop Multimedia Signal Process. (MMSP), Siena, Italy, Oct. 2004, pp. 47-50.
    • (2004) IEEE Int. Workshop Multimedia Signal Process. (MMSP) , pp. 47-50
    • Rivet, B.1    Girin, L.2    Jutten, C.3    Schwartz, J.-L.4
  • 35
    • 33646773973
    • Solving the indeterminations of blind source separation of convolutive speech mixtures
    • Philadelphia, PA, Mar
    • B. Rivet, L. Girin, and C. Jutten, "Solving the indeterminations of blind source separation of convolutive speech mixtures," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Philadelphia, PA, Mar. 2005, pp. 533-536.
    • (2005) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 533-536
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 36
    • 64149122186
    • Log-Rayleigh distribution: A simple and efficient statistical representation of log-spectral coefficients
    • submitted for publication
    • -, "Log-Rayleigh distribution: A simple and efficient statistical representation of log-spectral coefficients," IEEE Trans. Audio, Speech, Lang. Process., 2006, submitted for publication.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 37
    • 0029354468
    • Blind source separation for convolutive mixtures
    • H.-L. Nguyen-Thi and C. Jutten, "Blind source separation for convolutive mixtures," Signal Process., vol. 45, pp. 209-229, 1995.
    • (1995) Signal Process , vol.45 , pp. 209-229
    • Nguyen-Thi, H.-L.1    Jutten, C.2
  • 38
    • 0035537406
    • Joint approximate diagonalization of positive definite matrices
    • D.-T. Pham, "Joint approximate diagonalization of positive definite matrices," SIAM J. Matrix Anal. Appl., vol. 22, no. 4, pp. 1136-1152, 2001.
    • (2001) SIAM J. Matrix Anal. Appl , vol.22 , Issue.4 , pp. 1136-1152
    • Pham, D.-T.1
  • 39
    • 0035659640
    • An approach to blind source separation based on temporal structure of speech signals
    • Oct
    • N. Murata, S. Ikeda, and A. Ziehe, "An approach to blind source separation based on temporal structure of speech signals," Neurocomput., vol. 41, no. 1-4, pp. 1-24, Oct. 2001.
    • (2001) Neurocomput , vol.41 , Issue.1-4 , pp. 1-24
    • Murata, N.1    Ikeda, S.2    Ziehe, A.3
  • 40
    • 0347153389
    • Read my lips... and my jaw! How intelligible are the components of a speaker's face?
    • Madrid, Spain
    • B. Le Goff, T. Guiard-Marigny, and C. Benoît, "Read my lips... and my jaw! How intelligible are the components of a speaker's face?," in Proc. Euro. Conf. Speech Communication Technology, Madrid, Spain, 1995, pp. 291-294.
    • (1995) Proc. Euro. Conf. Speech Communication Technology , pp. 291-294
    • Le Goff, B.1    Guiard-Marigny, T.2    Benoît, C.3
  • 41
    • 64149123499
    • Analysis-synthesis and intelligibility of a talking face
    • -, "Analysis-synthesis and intelligibility of a talking face," in Progress in Speech Synthesis, J. Van Santen, R. Sproat, J. Olive, and J. Hirschberg, Eds. New York: Springer-Verlag, 1996, pp. 235-244.
  • 43
    • 0002605227
    • Un poste visage-parole. Acquisition et traitement des contours labiaux
    • in French, Montréal, QC, Canada
    • T. Lallouache, "Un poste visage-parole. Acquisition et traitement des contours labiaux," in Proc. Journées d'Etude sur la Parole (JEP) (in French), Montréal, QC, Canada, 1990, pp. 282-286.
    • (1990) Proc. Journées d'Etude sur la Parole (JEP) , pp. 282-286
    • Lallouache, T.1
  • 44
    • 0030270377
    • Second-order complex random vectors and normal distributions
    • Oct
    • B. Picinbono, "Second-order complex random vectors and normal distributions," IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2637-2640, Oct. 1996.
    • (1996) IEEE Trans. Signal Process , vol.44 , Issue.10 , pp. 2637-2640
    • Picinbono, B.1
  • 45
    • 0027634633
    • Proper complex random processes with applications to information theory
    • Jul
    • F. D. Neeser and J. L. Massey, "Proper complex random processes with applications to information theory," IEEE Trans. Inf. Theory, vol. 39, no. 4, pp. 1293-1302, Jul. 1993.
    • (1993) IEEE Trans. Inf. Theory , vol.39 , Issue.4 , pp. 1293-1302
    • Neeser, F.D.1    Massey, J.L.2
  • 46
    • 64149128058
    • Séparation de plusieurs sources sonores avec un seul microphone
    • L. Benaroya, "Séparation de plusieurs sources sonores avec un seul microphone," Ph.D. dissertation, Traitement du signal, Univ. Rennes 1, Rennes, France, Jun. 2003.
  • 47
    • 0032178592
    • Quantitative association of vocal-tract and facial behavior
    • H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, no. 1, pp. 23-43, 1998.
    • (1998) Speech Commun , vol.26 , Issue.1 , pp. 23-43
    • Yehia, H.1    Rubin, P.2    Vatikiotis-Bateson, E.3
  • 48
    • 0032122727
    • Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates
    • Jul
    • D. Ormoneit and V. Tresp, "Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates," IEEE Trans. Neural Netw., vol. 9, no. 4, pp. 639-650, Jul. 1998.
    • (1998) IEEE Trans. Neural Netw , vol.9 , Issue.4 , pp. 639-650
    • Ormoneit, D.1    Tresp, V.2
  • 50
    • 0002629270
    • Maximum-likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum-likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. Ser. B., vol. 39, pp. 1-38, 1977.
    • (1977) J. R. Statist. Soc. Ser. B , vol.39 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.