2013, Pages 2992-2996

Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?

Author keywords

Deep neural network; Multi condition training; Robust speech recognition; Speech enhancement

Indexed keywords

HUMAN COMPUTER INTERACTION; SPEECH ENHANCEMENT

EID: 84906222220     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 47

References (33)
  • 9
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • M. J. F. Gales and S. Young, "Robust continuous speech recognition using parallel model combination," IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359, 1996.
    • (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 352-359
    • Gales, M.J.F.1    Young, S.2
  • 10
    • 0032048385
    • Speech recognition in noisy environments using first-order vector Taylor series
    • D. Y. Kim, C. K. Un, and N. S. Kim, "Speech recognition in noisy environments using first-order vector Taylor series," Speech Communication, pp. 39-49, 1998.
    • (1998) Speech Communication , pp. 39-49
    • Kim, D.Y.1    Un, C.K.2    Kim, N.S.3
  • 12
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, 2001.
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 14
    • 79959854950
    • Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model
    • T. Nakatani, S. Araki, T. Yoshioka, and M. Fujimoto, "Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model," in Proc. Interspeech, 2010, pp. 2766-2769.
    • (2010) Proc. Interspeech , pp. 2766-2769
    • Nakatani, T.1    Araki, S.2    Yoshioka, T.3    Fujimoto, M.4
  • 15
    • 84865754161
    • Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR
    • T. Nakatani, S. Araki, M. Delcroix, T. Yoshioka, and M. Fujimoto, "Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR," in Proc. Interspeech, 2011, pp. 1785-1788.
    • (2011) Proc. Interspeech , pp. 1785-1788
    • Nakatani, T.1    Araki, S.2    Delcroix, M.3    Yoshioka, T.4    Fujimoto, M.5
  • 20
    • 45749110924
    • Representational power of restricted Boltzmann machines and deep belief networks
    • N. L. Roux and Y. Bengio, "Representational power of restricted Boltzmann machines and deep belief networks," Neural Computation, vol. 20, no. 6, pp. 1631-1649, 2008.
    • (2008) Neural Computation , vol.20 , Issue.6 , pp. 1631-1649
    • Roux, N.L.1    Bengio, Y.2
  • 24
    • 78650016939
    • Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment
    • H. Sawada, S. Araki, and S. Makino, "Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 516-527, 2010.
    • (2010) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.3 , pp. 516-527
    • Sawada, H.1    Araki, S.2    Makino, S.3
  • 25
    • 84878543263
    • The PASCAL CHiME speech separation and recognition challenge
    • J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Computer Speech & Language, vol. 27, no. 3, pp. 621-633, 2013.
    • (2013) Computer Speech & Language , vol.27 , Issue.3 , pp. 621-633
    • Barker, J.1    Vincent, E.2    Ma, N.3    Christensen, H.4    Green, P.5
  • 27
    • 45849093239
    • Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
    • T. Hori, C. Hori, Y. Minami, and A. Nakamura, "Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1352-1365, 2006.
    • (2006) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.4 , pp. 1352-1365
    • Hori, T.1    Hori, C.2    Minami, Y.3    Nakamura, A.4
  • 28
    • 70450194926
    • Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training
    • E. McDermott, S. Watanabe, and A. Nakamura, "Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training," in Proc. Interspeech, 2009, pp. 224-227.
    • (2009) Proc. Interspeech , pp. 224-227
    • McDermott, E.1    Watanabe, S.2    Nakamura, A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.