Volume , Issue , 2013, Pages 728-731

Speech activity detection on YouTube using deep neural networks

Author keywords

Deep neural networks; Segmentation; Speech activity detection; Voice activity detection

Indexed keywords

Image segmentation; Speech recognition

EID: 84906228076     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 119

References (16)
  • 3
    • 0036293830
    • An overview of automatic speaker recognition technology
    • D. A. Reynolds, "An overview of automatic speaker recognition technology," in Proceedings of ICASSP, vol. 4, 2002, pp. 4072-4075.
    • (2002) Proceedings of ICASSP , vol.4 , pp. 4072-4075
    • Reynolds, D.A.1
  • 4
    • 84873315510
    • Unsupervised speech activity detection using voicing measures and perceptual spectral flux
    • IEEE
    • S. Sadjadi and J. Hansen, "Unsupervised speech activity detection using voicing measures and perceptual spectral flux," Signal Processing Letters, IEEE, vol. 20, pp. 197-200, 2013.
    • (2013) Signal Processing Letters , vol.20 , pp. 197-200
    • Sadjadi, S.1    Hansen, J.2
  • 7
    • 79959838316
    • Voice activity detection based on conditional random fields using multiple features
    • A. Saito, Y. Nankaku, A. Lee, and K. Tokuda, "Voice activity detection based on conditional random fields using multiple features," in Proceedings of InterSpeech, 2010, pp. 2086-2089.
    • (2010) Proceedings of InterSpeech , pp. 2086-2089
    • Saito, A.1    Nankaku, Y.2    Lee, A.3    Tokuda, K.4
  • 8
    • 80051623447
    • Speaker diarization of heterogeneous web video files: A preliminary study
    • P. Clement, T. Bazillon, and C. Fredouille, "Speaker diarization of heterogeneous web video files: A preliminary study," in Proceedings of ICASSP, 2011, pp. 4432-4435.
    • (2011) Proceedings of ICASSP , pp. 4432-4435
    • Clement, P.1    Bazillon, T.2    Fredouille, C.3
  • 9
    • 84878610785
    • Speech/nonspeech segmentation in web videos
    • A. Misra, "Speech/nonspeech segmentation in web videos," in Proceedings of InterSpeech, 2012.
    • (2012) Proceedings of InterSpeech
    • Misra, A.1
  • 10
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
    • (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 14
    • 23344452899
    • Statistical voice activity detection using a multiple observation likelihood ratio test
    • IEEE
    • J. Ramirez, J. C. Segura, C. Benitez, L. Garcia, and A. Rubio, "Statistical voice activity detection using a multiple observation likelihood ratio test," Signal Processing Letters, IEEE, vol. 12, no. 10, pp. 689-692, 2005.
    • (2005) Signal Processing Letters , vol.12 , Issue.10 , pp. 689-692
    • Ramirez, J.1    Segura, J.C.2    Benitez, C.3    Garcia, L.4    Rubio, A.5
  • 16
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30-42, 2012.
    • (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.