Volume 20, Issue 12, 2009, Pages 1898-1910

Learning bimodal structure in audiovisual data

Author keywords

Audio visual source localization; Dictionary learning; Matching pursuit (MP); Multimodal data processing; Sparse representation

Indexed keywords

AUDIO-VISUAL; DICTIONARY LEARNING; MATCHING PURSUIT; MULTI-MODAL DATA; SOURCE LOCALIZATION; SPARSE REPRESENTATION;

EID: 72149118713     PISSN: 10459227     EISSN: None     Source Type: Journal    
DOI: 10.1109/TNN.2009.2032182     Document Type: Article
Times cited: 35

References (58)
  • 1
    • 37549045258
    • Multisensory interplay reveals crossmodal influences on 'sensory-specific' brain regions, neural responses, and judgements
    • J. Driver and T. Noesselt, "Multisensory interplay reveals crossmodal influences on 'sensory-specific' brain regions, neural responses, and judgements," Neuron, vol.57, no.1, pp. 11-23, 2008.
    • (2008) Neuron , vol.57 , Issue.1 , pp. 11-23
    • Driver, J.1    Noesselt, T.2
  • 2
    • 33746482755
    • Seeing sounds: Visual and auditory interactions in the brain
    • D. A. Bulkin and J. M. Groh, "Seeing sounds: Visual and auditory interactions in the brain," Current Opinion Neurobiol., vol.16, no.4, pp. 415-419, 2006.
    • (2006) Current Opinion Neurobiol. , vol.16 , Issue.4 , pp. 415-419
    • Bulkin, D.A.1    Groh, J.M.2
  • 3
    • 23044480166
    • Multisensory contributions to low-level, 'unisensory' processing
    • C. E. Schroeder and J. J. Foxe, "Multisensory contributions to low-level, 'unisensory' processing," Current Opinion Neurobiol., vol.15, no.4, pp. 454-458, 2005.
    • (2005) Current Opinion Neurobiol. , vol.15 , Issue.4 , pp. 454-458
    • Schroeder, C.E.1    Foxe, J.J.2
  • 4
    • 0035423980
    • Sensory modalities are not separate modalities: Plasticity and interactions
    • S. Shimojo and L. Shams, "Sensory modalities are not separate modalities: Plasticity and interactions," Current Opinion Neurobiol., vol.11, no.4, pp. 505-509, 2001.
    • (2001) Current Opinion Neurobiol , vol.11 , Issue.4 , pp. 505-509
    • Shimojo, S.1    Shams, L.2
  • 5
    • 0031020146
    • Sound alters visual motion perception
    • R. Sekuler, A. Sekuler, and R. Lau, "Sound alters visual motion perception," Nature, vol.385, no.6614, pp. 308-308, 1997.
    • (1997) Nature , vol.385 , Issue.6614 , pp. 308-308
    • Sekuler, R.1    Sekuler, A.2    Lau, R.3
  • 6
    • 0017199877
    • Hearing lips and seeing voices
    • H. McGurk and J. W. MacDonald, "Hearing lips and seeing voices," Nature, vol.264, no.5588, pp. 746-748, 1976.
    • (1976) Nature , vol.264 , Issue.5588 , pp. 746-748
    • McGurk, H.1    MacDonald, J.W.2
  • 7
    • 0029935458
    • Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading
    • J. Driver, "Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading," Nature, vol.381, pp. 66-68, 1996.
    • (1996) Nature , vol.381 , pp. 66-68
    • Driver, J.1
  • 8
    • 33646707471
    • Vision and touch are automatically integrated for the perception of sequences of events
    • J.-P. Bresciani, F. Dammeier, and M. Ernst, "Vision and touch are automatically integrated for the perception of sequences of events," J. Vis., vol.6, no.5, pp. 554-564, 2006.
    • (2006) J. Vis. , vol.6 , Issue.5 , pp. 554-564
    • Bresciani, J.-P.1    Dammeier, F.2    Ernst, M.3
  • 9
    • 4544290191
    • Recent advances in the automatic recognition of audiovisual speech
    • Sep.
    • G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audiovisual speech," Proc. IEEE, vol.91, no.9, pp. 1306-1326, Sep. 2003.
    • (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
    • Potamianos, G.1    Neti, C.2    Gravier, G.3    Garg, A.4    Senior, A.W.5
  • 11
    • 85008046156
    • Extraction of audio features specific to speech production for multimodal speaker detection
    • Jan.
    • P. Besson, V. Popovici, J.-M. Vesin, J.-P. Thiran, and M. Kunt, "Extraction of audio features specific to speech production for multimodal speaker detection," IEEE Trans. Multimedia, vol.10, no.1, pp. 63-73, Jan. 2008.
    • (2008) IEEE Trans. Multimedia , vol.10 , Issue.1 , pp. 63-73
    • Besson, P.1    Popovici, V.2    Vesin, J.-M.3    Thiran, J.-P.4    Kunt, M.5
  • 14
    • 34447100075
    • Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures
    • Jan.
    • B. Rivet, L. Girin, and C. Jutten, "Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures," IEEE Trans. Audio Speech Lang. Process., vol.15, no.1, pp. 96-108, Jan. 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.1 , pp. 96-108
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 15
    • 34447095008
    • Visual voice activity detection as a help for speech source separation from convolutive mixtures
    • B. Rivet, L. Girin, and C. Jutten, "Visual voice activity detection as a help for speech source separation from convolutive mixtures," Speech Commun., vol.49, no.7, pp. 667-677, 2007.
    • (2007) Speech Commun , vol.49 , Issue.7 , pp. 667-677
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 17
    • 84899028297
    • Audio-vision: Using audio-visual synchrony to locate sounds
    • J. Hershey and J. Movellan, "Audio-vision: Using audio-visual synchrony to locate sounds," in Proc. Neural Inf. Process. Syst., 1999, vol.12, pp. 813-819.
    • (1999) Proc. Neural Inf. Process. Syst. , vol.12 , pp. 813-819
    • Hershey, J.1    Movellan, J.2
  • 18
    • 2642557514
    • FaceSync: A linear operator for measuring synchronization of video facial images and audio tracks
    • M. Slaney and M. Covell, "FaceSync: A linear operator for measuring synchronization of video facial images and audio tracks," in Proc. Neural Inf. Process. Syst., 2000, vol.13, pp. 814-820.
    • (2000) Proc. Neural Inf. Process. Syst. , vol.13 , pp. 814-820
    • Slaney, M.1    Covell, M.2
  • 20
    • 2642562769
    • Speaker association with signal-level audiovisual fusion
    • Jun.
    • J. W. Fisher, III and T. Darrell, "Speaker association with signal-level audiovisual fusion," IEEE Trans. Multimedia, vol.6, no.3, pp. 406-413, Jun. 2004.
    • (2004) IEEE Trans. Multimedia , vol.6 , Issue.3 , pp. 406-413
    • Fisher III, J.W.1    Darrell, T.2
  • 21
    • 34147167538
    • Cross-modal localization via sparsity
    • Apr.
    • E. Kidron, Y. Schechner, and M. Elad, "Cross-modal localization via sparsity," IEEE Trans. Signal Process., vol.55, no.4, pp. 1390-1404, Apr. 2007.
    • (2007) IEEE Trans. Signal Process. , vol.55 , Issue.4 , pp. 1390-1404
    • Kidron, E.1    Schechner, Y.2    Elad, M.3
  • 23
    • 72149099803
    • Dynamic dependency tests: Analysis and applications to multi-modal data association
    • M. R. Siracusa and J. W. Fisher, "Dynamic dependency tests: Analysis and applications to multi-modal data association," in Proc. Int. Conf. Artif. Intell. Statist., 2007.
    • (2007) Proc. Int. Conf. Artif. Intell. Statist.
    • Siracusa, M.R.1    Fisher, J.W.2
  • 24
    • 33749427593
    • Analysis of multimodal sequences using geometric video representations
    • G. Monaci, Ò. D. Escoda, and P. Vandergheynst, "Analysis of multimodal sequences using geometric video representations," Signal Process., vol.86, no.12, pp. 3534-3548, 2006.
    • (2006) Signal Process , vol.86 , Issue.12 , pp. 3534-3548
    • Monaci, G.1    Escoda, O.D.2    Vandergheynst, P.3
  • 27
    • 0027842081
    • Matching pursuits with time-frequency dictionaries
    • Dec.
    • S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol.41, no.12, pp. 3397-3415, Dec. 1993.
    • (1993) IEEE Trans. Signal Process. , vol.41 , Issue.12 , pp. 3397-3415
    • Mallat, S.1    Zhang, Z.2
  • 29
    • 0032131292
    • Atomic decomposition by basis pursuit
    • S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol.20, no.1, pp. 33-61, 1998.
    • (1998) SIAM J. Sci. Comput. , vol.20 , Issue.1 , pp. 33-61
    • Chen, S.S.1    Donoho, D.L.2    Saunders, M.A.3
  • 30
  • 31
    • 0037418225
    • Optimal sparse representation in general (nonorthogonal) dictionaries via l1 minimization
    • D. L. Donoho and M. Elad, "Optimal sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proc. Nat. Acad. Sci., vol.100, pp. 2197-2202, 2003.
    • (2003) Proc. Nat. Acad. Sci. , vol.100 , pp. 2197-2202
    • Donoho, D.L.1    Elad, M.2
  • 32
    • 0003306887
    • Curvelets-A surprisingly effective nonadaptive representation for objects with edges
    • A. Cohen, C. Rabut, and L. Schmaker, Eds. Nashville, TN: Vanderbilt Univ. Press
    • E. J. Candès and D. L. Donoho, "Curvelets-A surprisingly effective nonadaptive representation for objects with edges," in Curve and Surface Fitting, A. Cohen, C. Rabut, and L. Schmaker, Eds. Nashville, TN: Vanderbilt Univ. Press, 1999.
    • (1999) Curve and Surface Fitting
    • Candès, E.J.1    Donoho, D.L.2
  • 33
    • 0002303753
    • The coding of sensory messages
    • W. H. Thorpe and O. L. Zangwill, Eds. Cambridge, U.K.: Cambridge Univ. Press
    • H. B. Barlow, "The coding of sensory messages," in Current Problems in Animal Behaviour, W. H. Thorpe and O. L. Zangwill, Eds. Cambridge, U.K.: Cambridge Univ. Press, 1961.
    • (1961) Current Problems in Animal Behaviour
    • Barlow, H.B.1
  • 34
    • 0030779611
    • Sparse coding with an overcomplete basis set: A strategy employed by V1?
    • B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?," Vis. Res., vol.37, pp. 3311-3327, 1997.
    • (1997) Vis. Res. , vol.37 , pp. 3311-3327
    • Olshausen, B.A.1    Field, D.J.2
  • 35
    • 33847100046
    • A network that uses few active neurons to code visual input predicts the diverse shapes of cortical receptive fields
    • M. Rehn and F. T. Sommer, "A network that uses few active neurons to code visual input predicts the diverse shapes of cortical receptive fields," J. Comput. Neurosci., vol.22, no.2, pp. 135-146, 2007.
    • (2007) J. Comput. Neurosci. , vol.22 , Issue.2 , pp. 135-146
    • Rehn, M.1    Sommer, F.T.2
  • 36
    • 0034133184
    • Learning overcomplete representations
    • M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Comput., vol.12, no.2, pp. 337-365, 2000.
    • (2000) Neural Comput , vol.12 , Issue.2 , pp. 337-365
    • Lewicki, M.S.1    Sejnowski, T.J.2
  • 37
    • 33644513420
    • Efficient auditory coding
    • E. C. Smith and M. S. Lewicki, "Efficient auditory coding," Nature, vol.439, no.7079, pp. 978-982, 2006.
    • (2006) Nature , vol.439 , Issue.7079 , pp. 978-982
    • Smith, E.C.1    Lewicki, M.S.2
  • 38
    • 33744987389
    • Sparse and shift-invariant representations of music
    • Jan.
    • T. Blumensath and M. Davies, "Sparse and shift-invariant representations of music," IEEE Trans. Audio Speech Lang. Process., vol.14, no.1, pp. 50-57, Jan. 2006.
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.1 , pp. 50-57
    • Blumensath, T.1    Davies, M.2
  • 43
    • 33751379736
    • Image denoising via sparse and redundant representations over learned dictionaries
    • DOI 10.1109/TIP.2006.881969
    • M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol.15, no.12, pp. 3736-3745, Dec. 2006. (Pubitemid 44811686)
    • (2006) IEEE Transactions on Image Processing , vol.15 , Issue.12 , pp. 3736-3745
    • Elad, M.1    Aharon, M.2
  • 45
    • 33750383209
    • The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation
    • Nov.
    • M. Elad, M. Aharon, and A. M. Bruckstein, "The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol.54, no.11, pp. 4311-4322, Nov. 2006.
    • (2006) IEEE Trans. Signal Process. , vol.54 , Issue.11 , pp. 4311-4322
    • Elad, M.1    Aharon, M.2    Bruckstein, A.M.3
  • 47
    • 0031102203
    • Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm
    • Mar.
    • I. F. Gorodnitsky and B. D. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol.45, no.3, pp. 600-616, Mar. 1997.
    • (1997) IEEE Trans. Signal Process. , vol.45 , Issue.3 , pp. 600-616
    • Gorodnitsky, I.F.1    Rao, B.D.2
  • 48
    • 30844445842
    • Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit
    • J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Process., vol.86, no.3, pp. 572-588, 2006.
    • (2006) Signal Process , vol.86 , Issue.3 , pp. 572-588
    • Tropp, J.A.1    Gilbert, A.C.2    Strauss, M.J.3
  • 49
    • 0000797290
    • What is the computational goal of the neocortex?
    • C. Koch and J. L. Davis, Eds. Cambridge, MA: MIT Press
    • H. B. Barlow, "What is the computational goal of the neocortex?," in Large-Scale Neuronal Theories of the Brain, C. Koch and J. L. Davis, Eds. Cambridge, MA: MIT Press, 1994.
    • (1994) Large-Scale Neuronal Theories of the Brain
    • Barlow, H.B.1
  • 50
    • 0022019614
    • Intermodal timing relations and audiovisual speech recognition by normal-hearing adults
    • M. McGrath and Q. Summerfield, "Intermodal timing relations and audiovisual speech recognition by normal-hearing adults," J. Acoust. Soc. Amer., vol.77, no.2, pp. 678-685, 1985.
    • (1985) J. Acoust. Soc. Amer. , vol.77 , Issue.2 , pp. 678-685
    • McGrath, M.1    Summerfield, Q.2
  • 51
    • 21344450921
    • Perceptual fusion and stimulus coincidence in the cross-modal integration of speech
    • L. M. Miller and M. D'Esposito, "Perceptual fusion and stimulus coincidence in the cross-modal integration of speech," J. Neurosci., vol.25, no.25, pp. 5884-5893, 2005.
    • (2005) J. Neurosci. , vol.25 , Issue.25 , pp. 5884-5893
    • Miller, L.M.1    D'Esposito, M.2
  • 52
    • 33947142837
    • Theoretical results on sparse representations of multiple-measurement vectors
    • Dec.
    • J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol.54, no.12, pp. 4634-4643, Dec. 2006.
    • (2006) IEEE Trans. Signal Process. , vol.54 , Issue.12 , pp. 4634-4643
    • Chen, J.1    Huo, X.2
  • 53
    • 23844477225
    • Sparse solutions to linear inverse problems with multiple measurement vectors
    • Jul.
    • S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. Signal Process., vol.53, no.7, pp. 2477-2488, Jul. 2005.
    • (2005) IEEE Trans. Signal Process. , vol.53 , Issue.7 , pp. 2477-2488
    • Cotter, S.F.1    Rao, B.D.2    Engan, K.3    Kreutz-Delgado, K.4
  • 54
    • 33745640090
    • Simultaneous approximation by greedy algorithms
    • D. Leviatan and V. Temlyakov, "Simultaneous approximation by greedy algorithms," Adv. Comput. Math., vol.25, no.1-3, pp. 73-90, 2006.
    • (2006) Adv. Comput. Math. , vol.25 , Issue.1-3 , pp. 73-90
    • Leviatan, D.1    Temlyakov, V.2
  • 55
    • 0036874756
    • Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus
    • E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus," EURASIP J. Appl. Signal Process., vol.2002, no.11, pp. 1189-1201, 2002.
    • (2002) EURASIP J. Appl. Signal Process. , vol.2002 , Issue.11 , pp. 1189-1201
    • Patterson, E.K.1    Gurbuz, S.2    Tufekci, Z.3    Gowdy, J.N.4
  • 56
    • 0033316361
    • Hierarchical models of object recognition in cortex
    • M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neurosci., vol.2, pp. 1019-1025, 1999.
    • (1999) Nature Neurosci , vol.2 , pp. 1019-1025
    • Riesenhuber, M.1    Poggio, T.2
  • 57
    • 33847275584
    • Unsupervised learning of visual features through spike timing dependent plasticity
    • T. Masquelier and S. J. Thorpe, "Unsupervised learning of visual features through spike timing dependent plasticity," PLoS Comput. Biol., vol.3, no.2, p. e31, 2007.
    • (2007) PLoS Comput. Biol. , vol.3 , Issue.2
    • Masquelier, T.1    Thorpe, S.J.2
  • 58
    • 85162045775
    • Modeling natural sounds with modulation cascade processes
    • R. E. Turner and M. Sahani, "Modeling natural sounds with modulation cascade processes," in Proc. Neural Inf. Process. Syst., 2008, vol.20, pp. 1545-1552.
    • (2008) Proc. Neural Inf. Process. Syst. , vol.20 , pp. 1545-1552
    • Turner, R.E.1    Sahani, M.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.