SCOPUS Information Retrieval Platform

Proceedings - IEEE International Conference on Multimedia and Expo

Volume 2014-September, Issue September, 2014, Pages

Improved audio features for large-scale multimedia event detection

(3) Metze, Florian a; Rawat, Shourabh a; Wang, Yipei a

a Carnegie Mellon University (United States)

Author keywords

acoustic event detection; computational acoustic scene analysis; multimedia retrieval

Indexed keywords

IMAGE RETRIEVAL; NEURAL NETWORKS; SEMANTICS;

ACOUSTIC EVENT DETECTIONS; ALGORITHMIC APPROACH; COMPUTATIONAL ACOUSTICS; MULTIMEDIA EVENT DETECTIONS; MULTIMEDIA RETRIEVAL; RELATIVE CONTRIBUTION; SEMANTIC DESCRIPTIONS; TEMPORAL RESOLUTION;

FEATURE EXTRACTION;

EID: 84937509499 PISSN: 19457871 EISSN: 1945788X Source Type: Conference Proceeding
DOI: 10.1109/ICME.2014.6890234 Document Type: Conference Paper

Times cited : (18)

References (28)

1
- 85085788280
- TRECVID 2013-An introduction to the goals, tasks, data, evaluation mechanisms, and metrics
- Gaithersburg, MD; U.S.A., Nov., National Institute of Standards and Technology
- Paul Over, Jon Fiscus, and Greg Sanders, "TRECVID 2013-An introduction to the goals, tasks, data, evaluation mechanisms, and metrics," in Proc. TRECVID, Gaithersburg, MD; U.S.A., Nov. 2013, National Institute of Standards and Technology, http://www-nlpir.nist.gov/projects/tv2013/.
- (2013) Proc. TRECVID
- Over, P.¹ Fiscus, J.² Sanders, G.³

2
- 84937454179
- Creating HAVIC: Heterogeneous audio visual internet collection
- Istanbul, Turkey, May 2012, European Language Resources Association (ELRA)
- Stephanie Strassel, Amanda Morris, Jonathan Fiscus, Christopher Caruso, Haejoong Lee, Paul Over, James Fiumara, Barbara Shaw, Brian Antonishek, and Martial Michel, "Creating HAVIC: Heterogeneous audio visual internet collection," in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May 2012, European Language Resources Association (ELRA).
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Strassel, S.¹ Morris, A.² Fiscus, J.³ Caruso, C.⁴ Lee, H.⁵ Over, P.⁶ Fiumara, J.⁷ Shaw, B.⁸ Antonishek, B.⁹ Michel, M.¹⁰

3
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.E.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

4
- 84905262359
- Tech. Rep. CMU-LTI-12-07, Carnegie Mellon University, Pittsburgh, PA; U.S.A
- Susanne Burger, Qin Jin, Peter F. Schulam, and Florian Metze, "Noisemes: Manual annotation of environmental noise in audio streams," Tech. Rep. CMU-LTI-12-07, Carnegie Mellon University, Pittsburgh, PA; U.S.A., 2012.
- (2012) Noisemes: Manual Annotation of Environmental Noise in Audio Streams
- Burger, S.¹ Jin, Q.² Schulam, P.F.³ Metze, F.⁴

5
- 84878551166
- Event-based video retrieval using audio
- Qin Jin, Peter F. Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, and Florian Metze, "Event-based video retrieval using audio," In Proc. INTERSPEECH
- Proc. INTERSPEECH
- Jin, Q.¹ Schulam, P.F.² Rawat, S.³ Burger, S.⁴ Ding, D.⁵ Metze, F.⁶

6
- 85041458689
- Audio concept ranking for video event detection on user-generated content
- Marseille, France, Aug., ISCA
- Benjamin Elizalde, Mirco Ravanelli, and Gerald Friedland, "Audio concept ranking for video event detection on user-generated content," in Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM), Marseille, France, Aug. 2013, ISCA.
- (2013) Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM)
- Elizalde, B.¹ Ravanelli, M.² Friedland, G.³

7
- 84870415507
- Supervised acoustic concept extraction for multimedia event detection
- Nara; Japan, Oct. ACM
- Stephanie Pancoast, Murat Akbacak, and Michelle Sanchez, "Supervised acoustic concept extraction for multimedia event detection," in ACM Multimedia Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis (AMVA), Nara; Japan, Oct. 2012, ACM.
- (2012) ACM Multimedia Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis (AMVA)
- Pancoast, S.¹ Akbacak, M.² Sanchez, M.³

8
- 84905270442
- IBM Research and Columbia University TRECVID-2011 multimedia event detection (MED) system
- Gaithersburg, MD; U.S.A., Nov., National Institute of Standards and Technology
- Liangliang Cao, Shih-Fu Chang, Noel Codella, Courtenay Cotton, Dan Ellis, Leiguang Gong, Matthew Hill, Gang Hua, John Kender, Michele Merler, Yadong Mu, Apostol Natseve, and John R. Smith, "IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System," in Proc. TRECVID, Gaithersburg, MD; U.S.A., Nov. 2011, National Institute of Standards and Technology, http://www-nlpir.nist.gov/projects/tv2011/.
- (2011) Proc. TRECVID
- Cao, L.¹ Chang, S.-F.² Codella, N.³ Cotton, C.⁴ Ellis, D.⁵ Gong, L.⁶ Hill, M.⁷ Hua, G.⁸ Kender, J.⁹ Merler, M.¹⁰ Mu, Y.¹¹ Natseve, A.¹² Smith, J.R.¹³

9
- 84906214187
- Robust audio codebooks for large scale event detection in consumer videos
- Lyon; France, Aug., ISCA
- Shourabh Rawat, Peter Schulam, Susanne Burger, Duo Ding, Yipei Wang, and Florian Metze, "Robust audio codebooks for large scale event detection in consumer videos," in Proc. INTERSPEECH, Lyon; France, Aug. 2013, ISCA.
- (2013) Proc. INTERSPEECH
- Rawat, S.¹ Schulam, P.² Burger, S.³ Ding, D.⁴ Wang, Y.⁵ Metze, F.⁶

10
- 84878606595
- Bag-of-Audiowords approach for multimedia event classification
- Stephanie Pancoast and Murat Akbacak, "Bag-of-Audiowords approach for multimedia event classification," In Proc. INTERSPEECH [27].
- Proc. INTERSPEECH [27]
- Pancoast, S.¹ Akbacak, M.²

11
- 84878580398
- Compact audio representation for event detection in consumer media
- Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, and Prem Natarajan, "Compact audio representation for event detection in consumer media," In Proc. INTERSPEECH [27].
- Proc. INTERSPEECH [27]
- Zhuang, X.¹ Tsakalidis, S.² Wu, S.³ Natarajan, P.⁴ Prasad, R.⁵ Natarajan, P.⁶

12
- 84878587807
- Robust event detection from spoken content in consumer domain videos
- Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, and Prem Natarajan, "Robust event detection from spoken content in consumer domain videos," In Proc. INTERSPEECH [27].
- Proc. INTERSPEECH [27]
- Tsakalidis, S.¹ Zhuang, X.² Hsiao, R.³ Wu, S.⁴ Natarajan, P.⁵ Prasad, R.⁶ Natarajan, P.⁷

13
- 84455207538
- Audio-visual fusion using Bayesian model combination for web video retrieval
- New York, NY, USA, MM '11, ACM
- Vasant Manohar, Stavros Tsakalidis, Pradeep Natarajan, Rohit Prasad, and Prem Natarajan, "Audio-visual fusion using Bayesian model combination for web video retrieval," in Proceedings of the 19th ACM International Conference on Multimedia, New York, NY, USA, 2011, MM '11, pp. 1537-1540, ACM.
- (2011) Proceedings of the 19th ACM International Conference on Multimedia , pp. 1537-1540
- Manohar, V.¹ Tsakalidis, S.² Natarajan, P.³ Prasad, R.⁴ Natarajan, P.⁵

14
- 84937415065
- National Institute of Standards and Technology Aug. 2013, Last accessed: April 15
- National Institute of Standards and Technology, "2013 TRECVID Multimedia Event Detection Track," http://www.nist.gov/itl/iad/mig/med13.cfm, Aug. 2013, Last accessed: April 15, 2014.
- (2014) 2013 TRECVID Multimedia Event Detection Track

15
- 84971360473
- Sept. 2013
- FFmpeg, "A complete, cross-platform solution to record, convert and stream audio and video," http://www.ffmpeg.org/, Sept. 2013.
- FFmpeg A Complete, Cross-platform Solution to Record, Convert and Stream Audio and Video

16
- 84962868641
- A one-pass decoder based on polymorphic linguistic context assignment
- Madonna di Campiglio, Italy Dec IEEE
- Hagen Soltau, Florian Metze, Christian Fügen, and Alex Waibel, "A One-pass Decoder based on Polymorphic Linguistic Context Assignment," in Proc. Automatic Speech Recognition and Understanding (ASRU), Madonna di Campiglio, Italy, Dec. 2001, IEEE.
- (2001) Proc. Automatic Speech Recognition and Understanding (ASRU)
- Soltau, H.¹ Metze, F.² Fügen, C.³ Waibel, A.⁴

17
- 84953744816
- A statistical interpretation of term specificity and its application in retrieval
- Karen Sparck Jones, "A statistical interpretation of term specificity and its application in retrieval," Journal of Documentation, 1972.
- (1972) Journal of Documentation
- Jones, K.S.¹

18
- 84937454189
- Extracting deep bottleneck features using stacked auto-encoders
- Jonas Gehring, Yajie Miao, Florian Metze, and Alex Waibel, "Extracting deep bottleneck features using stacked auto-encoders," In Proc. ICASSP [28].
- Proc. ICASSP [28]
- Gehring, J.¹ Miao, Y.² Metze, F.³ Waibel, A.⁴

19
- 84890499569
- Unsupervised hierarchical structure induction for deeper semantic analysis of audio
- Sourish Chaudhuri and Bhiksha Raj, "Unsupervised hierarchical structure induction for deeper semantic analysis of audio," In Proc. ICASSP [28], pp. 833-837.
- Proc. ICASSP [28], pp. 833-837
- Chaudhuri, S.¹ Raj, B.²

20
- 51449103447
- Optimizing bottleneck features for LVCSR
- Las Vegas, NV; U.S.A. Apr. IEEE
- Frantisek Grézl and Petr Fousek, "Optimizing bottleneck features for LVCSR," in Proc. ICASSP, Las Vegas, NV; U.S.A., Apr. 2008, IEEE.
- (2008) Proc. ICASSP
- Grézl, F.¹ Fousek, P.²

21
- 84955035459
- A scale for the measurement of the psychological magnitude pitch
- Stanley S. Stevens, John Volkman, and Edwin B. Newman, "A scale for the measurement of the psychological magnitude pitch," The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185-190, 1937.
- (1937) The Journal of the Acoustical Society of America , vol.8 , Issue.3 , pp. 185-190
- Stevens, S.S.¹ Volkman, J.² Newman, E.B.³

22
- 84937454189
- Extracting deep bottleneck features using stacked auto-encoders
- Jonas Gehring, Yajie Miao, Florian Metze, and Alex Waibel, "Extracting Deep Bottleneck Features Using Stacked Auto-Encoders," In Proc. ICASSP [28].
- Proc. ICASSP [28]
- Gehring, J.¹ Miao, Y.² Metze, F.³ Waibel, A.⁴

23
- 84857819132
- Theano: A CPU and GPU math expression compiler
- Oral Presentation
- James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio, "Theano: a CPU and GPU math expression compiler," in Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010, Oral Presentation.
- (2010) Proceedings of the Python for Scientific Computing Conference (SciPy), June
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

24
- 33644626634
- Tech. Rep., IRCAM
- Geoffroy Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," Tech. Rep., IRCAM, 2004, http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf.
- (2004) A Large Set of Audio Features for Sound Description (Similarity and Classification) in the CUIDADO Project
- Peeters, G.¹

25
- 78650977476
- Opensmile: The munich versatile and fast open-source audio feature extractor
- New York, NY; USA, MM '10 ACM
- Florian Eyben, Martin Wöllmer, and Björn Schuller, "Opensmile: the Munich versatile and fast open-source audio feature extractor," in Proceedings of the International Conference on Multimedia, New York, NY; USA, 2010, MM '10, pp. 1459-1462, ACM.
- (2010) Proceedings of the International Conference on Multimedia , pp. 1459-1462
- Eyben, F.¹ Wöllmer, M.² Schuller, B.³

26
- 84890530296
- Subband autocorrelation features for video soundtrack classification
- Courtenay V. Cotton and Dan P.W. Ellis, "Subband autocorrelation features for video soundtrack classification," In Proc. ICASSP [28], pp. 8663-8666.
- Proc. ICASSP [28], pp. 8663-8666
- Cotton, C.V.¹ Ellis, D.P.W.²

27
- 84937454192
- INTERSPEECH 2012, Portland, OR; U.S.A., Sept. ISCA
- INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, OR; U.S.A., Sept. 2012. ISCA.
- (2012) 13th Annual Conference of the International Speech Communication Association

28
- 84937454193
- Vancouver, BC; Canada, May. IEEE
- IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC; Canada, May 2013. IEEE.
- (2013) IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.