SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 2405-2413

Visually Indicated Sounds

(6) Owens, Andrew a; Isola, Phillip a,b; McDermott, Josh a; Torralba, Antonio a; Adelson, Edward H a; Freeman, William T a,c

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c GOOGLE INC (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; RECURRENT NEURAL NETWORKS;

EXAMPLE-BASED SYNTHESIS; PHYSICAL INTERACTIONS; PSYCHOPHYSICAL EXPERIMENTS; VISUAL SCENE; WAVE FORMS;

PATTERN RECOGNITION;

EID: 84986249782 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.264 Document Type: Conference Paper

Times cited : (404)

References (46)

1
- 84973926501
- Learning to see by moving
- P. Agrawal, J. Carreira, and J. Malik. Learning to see by moving. In ICCV, 2015.
- (2015) ICCV
- Agrawal, P.¹ Carreira, J.² Malik, J.³

2
- 85006343094
- Joint object-material category segmentation from audio-visual cues
- A. Arnab, M. Sapienza, S. Golodetz, J. Valentin, O. Miksik, S. Izadi, and P. H. S. Torr. Joint object-material category segmentation from audio-visual cues. In BMVC, 2015.
- (2015) BMVC
- Arnab, A.¹ Sapienza, M.² Golodetz, S.³ Valentin, J.⁴ Miksik, O.⁵ Izadi, S.⁶ Torr, P.H.S.⁷

3
- 0037854202
- The acquisition of physical knowledge in infancy: A summary in eight lessons
- R. Baillargeon. The acquisition of physical knowledge in infancy: A summary in eight lessons. Blackwell handbook of childhood cognitive development, 1:46-83, 2002.
- (2002) Blackwell Handbook of Childhood Cognitive Development , vol.1 , pp. 46-83
- Baillargeon, R.¹

4
- 84962478162
- Material recognition in the wild with the materials in context database
- S. Bell, P. Upchurch, N. Snavely, and K. Bala. Material recognition in the wild with the materials in context database. CoRR, abs/1412.0623, 2014.
- (2014) CoRR, abs/1412.0623
- Bell, S.¹ Upchurch, P.² Snavely, N.³ Bala, K.⁴

5
- 27644583688
- A tutorial on onset detection in music signals
- J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler. A tutorial on onset detection in music signals. Speech and Audio Processing, IEEE Transactions on, 13(5):1035-1047, 2005.
- (2005) Speech and Audio Processing, IEEE Transactions on , vol.13 , Issue.5 , pp. 1035-1047
- Bello, J.P.¹ Daudet, L.² Abdallah, S.³ Duxbury, C.⁴ Davies, M.⁵ Sandler, M.B.⁶

6
- 84986273500
- Were those coconuts or horse hoofs? Visual context effects on identification and perceived veracity of everyday sounds
- T. Bonebright. Were those coconuts or horse hoofs? visual context effects on identification and perceived veracity of everyday sounds. In International Conference on Auditory Display, 2012.
- (2012) International Conference on Auditory Display
- Bonebright, T.¹

7
- 34249866182
- Statistical modeling of intrinsic structures in impacts sounds
- S. Cavaco and M. S. Lewicki. Statistical modeling of intrinsic structures in impacts sounds. The Journal of the Acoustical Society of America, 121(6):3558-3568, 2007.
- (2007) The Journal of the Acoustical Society of America , vol.121 , Issue.6 , pp. 3558-3568
- Cavaco, S.¹ Lewicki, M.S.²

8
- 84959239475
- Visual vibrometry: Estimating material properties from small motion in video
- A. Davis, K. L. Bouman, M. Rubinstein, F. Durand, and W. T. Freeman. Visual vibrometry: Estimating material properties from small motion in video. In CVPR, 2015.
- (2015) CVPR
- Davis, A.¹ Bouman, K.L.² Rubinstein, M.³ Durand, F.⁴ Freeman, W.T.⁵

9
- 84905749657
- The visual microphone: Passive recovery of sound from video
- A. Davis, M. Rubinstein, N. Wadhwa, G. J. Mysore, F. Durand, and W. T. Freeman. The visual microphone: passive recovery of sound from video. ACM Transactions on Graphics (TOG), 2014.
- (2014) ACM Transactions on Graphics (TOG)
- Davis, A.¹ Rubinstein, M.² Wadhwa, N.³ Mysore, G.J.⁴ Durand, F.⁵ Freeman, W.T.⁶

10
- 85198028989
- Imagenet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

11
- 84973916088
- Unsupervised visual representation learning by context prediction
- C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. ICCV, 2015.
- (2015) ICCV
- Doersch, C.¹ Gupta, A.² Efros, A.A.³

12
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

13
- 0016421071
- The estimation of the gradient of a density function, with applications in pattern recognition
- K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. Information Theory, IEEE Transactions on, 21(1):32-40, 1975.
- (1975) Information Theory, IEEE Transactions on , vol.21 , Issue.1 , pp. 32-40
- Fukunaga, K.¹ Hostetler, L.D.²

14
- 84959948834
- What in the world do we hear?: An ecological approach to auditory event perception
- W.W. Gaver. What in the world do we hear?: An ecological approach to auditory event perception. Ecological psychology, 1993.
- (1993) Ecological Psychology
- Gaver, W.W.¹

15
- 84911468319
- Learning haptic representation for manipulating deformable food objects
- M. Gemici and A. Saxena. Learning haptic representation for manipulating deformable food objects. In IROS, 2014.
- (2014) IROS
- Gemici, M.¹ Saxena, A.²

16
- 0025110885
- Derivation of auditory filter shapes from notched-noise data
- B. R. Glasberg and B. C. Moore. Derivation of auditory filter shapes from notched-noise data. Hearing research, 47(1):103-138, 1990.
- (1990) Hearing Research , vol.47 , Issue.1 , pp. 103-138
- Glasberg, B.R.¹ Moore, B.C.²

17
- 85070926206
- Unsupervised feature learning from temporal data
- R. Goroshin, J. Bruna, J. Tompson, D. Eigen, and Y. LeCun. Unsupervised feature learning from temporal data. arXiv preprint arXiv:1504.02518, 2015.
- (2015) arXiv preprint arXiv:1504.02518
- Goroshin, R.¹ Bruna, J.² Tompson, J.³ Eigen, D.⁴ LeCun, Y.⁵

18
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

19
- 84862572299
- Spatial pattern of bold fmri activation reveals cross-modal information in auditory cortex
- P.-J. Hsieh, J. T. Colas, and N. Kanwisher. Spatial pattern of bold fmri activation reveals cross-modal information in auditory cortex. Journal of neurophysiology, 2012.
- (2012) Journal of Neurophysiology
- Hsieh, P.-J.¹ Colas, J.T.² Kanwisher, N.³

20
- 0742307391
- Speech enhancement based on wavelet thresholding the multitaper spectrum
- Y. Hu and P. C. Loizou. Speech enhancement based on wavelet thresholding the multitaper spectrum. Speech and Audio Processing, IEEE Transactions on, 12(1):59-67, 2004.
- (2004) Speech and Audio Processing, IEEE Transactions on , vol.12 , Issue.1 , pp. 59-67
- Hu, Y.¹ Loizou, P.C.²

21
- 84990069535
- Learning visual groups from co-occurrences in space and time
- P. Isola, D. Zoran, D. Krishnan, and E. H. Adelson. Learning visual groups from co-occurrences in space and time. arXiv preprint arXiv:1511.06811, 2015.
- (2015) arXiv preprint arXiv:1511.06811
- Isola, P.¹ Zoran, D.² Krishnan, D.³ Adelson, E.H.⁴

22
- 84973897623
- Learning image representations tied to ego-motion
- D. Jayaraman and K. Grauman. Learning image representations tied to ego-motion. In ICCV, December 2015.
- (2015) ICCV, December
- Jayaraman, D.¹ Grauman, K.²

23
- 84973872595
- 3d convolutional neural networks for human action recognition
- S. Ji, W. Xu, M. Yang, and K. Yu. 3d convolutional neural networks for human action recognition. IEEE TPAMI, 2013.
- (2013) IEEE TPAMI
- Ji, S.¹ Xu, W.² Yang, M.³ Yu, K.⁴

24
- 84913580146
- Caffe: Convolutional architecture for fast feature embedding
- ACM
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675-678. ACM, 2014.
- (2014) Proceedings of the ACM International Conference on Multimedia , pp. 675-678
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

25
- 0002697827
- Can one hear the shape of a drum?
- M. Kac. Can one hear the shape of a drum? The american mathematical monthly, 1966.
- (1966) The American Mathematical Monthly
- Kac, M.¹

26
- 84911364368
- Large-scale video classification with convolutional neural networks
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

27
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097-1105, 2012.
- (2012) Advances in Neural Information Processing Systems , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

28
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. M. Meng, and L. Deng. Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends. IEEE Signal Processing Magazine, 2015.
- (2015) IEEE Signal Processing Magazine
- Ling, Z.-H.¹ Kang, S.-Y.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.-J.⁶ Meng, H.M.⁷ Deng, L.⁸

29
- 68149175636
- Human sound source identification
- Springer
- R. A. Lutfi. Human sound source identification. In Auditory perception of sound sources, pages 13-42. Springer, 2008.
- (2008) Auditory Perception of Sound Sources , pp. 13-42
- Lutfi, R.A.¹

30
- 80052406394
- Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis
- J. H. McDermott and E. P. Simoncelli. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron, 71(5):926-940, 2011.
- (2011) Neuron , vol.71 , Issue.5 , pp. 926-940
- McDermott, J.H.¹ Simoncelli, E.P.²

31
- 71149084945
- Deep learning from temporal coherence in video
- H. Mobahi, R. Collobert, and J. Weston. Deep learning from temporal coherence in video. In ICML, 2009.
- (2009) ICML
- Mobahi, H.¹ Collobert, R.² Weston, J.³

32
- 80053437179
- Multimodal deep learning
- J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In ICML, 2011.
- (2011) ICML
- Ngiam, J.¹ Khosla, A.² Kim, M.³ Nam, J.⁴ Lee, H.⁵ Ng, A.Y.⁶

33
- 84986281265
- Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours
- L. Pinto and A. Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. arXiv preprint arXiv:1509.06825, 2015.
- (2015) arXiv preprint arXiv:1509.06825
- Pinto, L.¹ Gupta, A.²

34
- 84862867519
- The origins of inquiry: Inductive inference and exploration in early childhood
- L. Schulz. The origins of inquiry: Inductive inference and exploration in early childhood. Trends in cognitive sciences, 16(7):382-389, 2012.
- (2012) Trends in Cognitive Sciences , vol.16 , Issue.7 , pp. 382-389
- Schulz, L.¹

35
- 0004168716
- Theory of vibration: An introduction
- Springer Science & Business Media
- A. A. Shabana. Theory of vibration: an introduction. Springer Science & Business Media, 1995.
- (1995) Theory of Vibration: An Introduction
- Shabana, A.A.¹

36
- 84878726069
- Recognizing materials using perceptually inspired features
- L. Sharan, C. Liu, R. Rosenholtz, and E. H. Adelson. Recognizing materials using perceptually inspired features. International journal of computer vision, 103(3):348-371, 2013.
- (2013) International Journal of Computer Vision , vol.103 , Issue.3 , pp. 348-371
- Sharan, L.¹ Liu, C.² Rosenholtz, R.³ Adelson, E.H.⁴

37
- 85113410945
- Black boxes: Hypothesis testing via indirect perceptual evidence
- M. H. Siegel, R. Magid, J. B. Tenenbaum, and L. E. Schulz. Black boxes: Hypothesis testing via indirect perceptual evidence. Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.
- (2014) Proceedings of the 36th Annual Conference of the Cognitive Science Society
- Siegel, M.H.¹ Magid, R.² Tenenbaum, J.B.³ Schulz, L.E.⁴

38
- 84937862424
- Two-stream convolutional networks for action recognition in videos
- K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, 2014.
- (2014) Advances in Neural Information Processing Systems
- Simonyan, K.¹ Zisserman, A.²

39
- 70350383283
- Interactive learning of the acoustic properties of household objects
- J. Sinapov, M. Wiemer, and A. Stoytchev. Interactive learning of the acoustic properties of household objects. In ICRA, 2009.
- (2009) ICRA
- Sinapov, J.¹ Wiemer, M.² Stoytchev, A.³

40
- 85153941343
- Pattern playback in the 90s
- M. Slaney. Pattern playback in the 90s. In NIPS, pages 827-834, 1994.
- (1994) NIPS , pp. 827-834
- Slaney, M.¹

41
- 15444371960
- The development of embodied cognition: Six lessons from babies
- L. Smith and M. Gasser. The development of embodied cognition: Six lessons from babies. Artificial life, 11(1-2):13-29, 2005.
- (2005) Artificial Life , vol.11 , Issue.1-2 , pp. 13-29
- Smith, L.¹ Gasser, M.²

42
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929-1958, 2014.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

43
- 0035148928
- Foleyautomatic: Physically-based sound effects for interactive simulation and animation
- ACM
- K. Van Den Doel, P. G. Kry, and D. K. Pai. Foleyautomatic: physically-based sound effects for interactive simulation and animation. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 537-544. ACM, 2001.
- (2001) Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques , pp. 537-544
- Van Den Doel, K.¹ Kry, P.G.² Pai, D.K.³

44
- 84990057054
- Anticipating the future by watching unlabeled video
- C. Vondrick, H. Pirsiavash, and A. Torralba. Anticipating the future by watching unlabeled video. arXiv preprint arXiv:1504.08023, 2015.
- (2015) arXiv preprint arXiv:1504.08023
- Vondrick, C.¹ Pirsiavash, H.² Torralba, A.³

45
- 84973889989
- Unsupervised learning of visual representations using videos
- X. Wang and A. Gupta. Unsupervised learning of visual representations using videos. In ICCV, 2015.
- (2015) ICCV
- Wang, X.¹ Gupta, A.²

46
- 84937964578
- Learning deep features for scene recognition using places database
- B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using places database. In NIPS, 2014.
- (2014) NIPS
- Zhou, B.¹ Lapedriza, A.² Xiao, J.³ Torralba, A.⁴ Oliva, A.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.