Volume 62, Issue 16, 2014, Pages 4114-4128

Deep scattering spectrum

Author keywords

Audio classification; deep neural networks; MFCC; modulation spectrum; wavelets

Indexed keywords

MODULATION;

EID: 84896734479     PISSN: 1053587X     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSP.2014.2326991     Document Type: Article
Times cited : (490)

References (50)
  • 4
    • 80052406394
    • Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis
    • J. McDermott and E. Simoncelli, "Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis," Neuron, vol. 71, no. 5, pp. 926-940, 2011.
    • (2011) Neuron , vol.71 , Issue.5 , pp. 926-940
    • McDermott, J.1    Simoncelli, E.2
  • 5
    • 80051605384
    • Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection
    • M. Ramona and G. Peeters, "Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection," in Proc. IEEE ICASSP, 2011, pp. 477-480.
    • (2011) Proc. IEEE ICASSP , pp. 477-480
    • Ramona, M.1    Peeters, G.2
  • 7
    • 0034227088
    • Auditory images: How complex sounds are represented in the auditory system
    • R. D. Patterson, "Auditory images: How complex sounds are represented in the auditory system," J. Acoust. Soc. Japan (E), vol. 21, no. 4, pp. 183-190, 2000.
    • (2000) J. Acoust. Soc. Japan (E) , vol.21 , Issue.4 , pp. 183-190
    • Patterson, R.D.1
  • 8
    • 67349176070
    • Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features
    • C. Lee, J. Shih, K. Yu, and H. Lin, "Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features," IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, Jun. 2009.
    • (2009) IEEE Trans. Multimedia , vol.11 , Issue.4 , pp. 670-682
    • Lee, C.1    Shih, J.2    Yu, K.3    Lin, H.4
  • 9
    • 80051659405
    • Classifying soundtracks with audio texture features
    • D. Ellis, X. Zeng, and J. McDermott, "Classifying soundtracks with audio texture features," in Proc. IEEE ICASSP, Prague, Czech Republic, May 22-27, 2011, pp. 5880-5883.
    • (2011) Proc. IEEE ICASSP , pp. 5880-5883
    • Ellis, D.1    Zeng, X.2    McDermott, J.3
  • 11
    • 84864324516
    • Group invariant scattering
    • S. Mallat, "Group invariant scattering," Commun. Pure Appl. Math., vol. 65, no. 10, pp. 1331-1398, 2012.
    • (2012) Commun. Pure Appl. Math , vol.65 , Issue.10 , pp. 1331-1398
    • Mallat, S.1
  • 12
    • 77955998889
    • Convolutional networks and applications in vision
    • Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," presented at the IEEE ISCAS, 2010.
    • (2010) The IEEE ISCAS
    • LeCun, Y.1    Kavukcuoglu, K.2    Farabet, C.3
  • 13
    • 84863380535
    • Unsupervised feature learning for audio classification using convolutional deep belief networks
    • H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," presented at the NIPS, 2009.
    • (2009) The NIPS
    • Lee, H.1    Pham, P.2    Largman, Y.3    Ng, A.4
  • 14
    • 85032751458
    • Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
    • G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Dec. 2012.
    • (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
    • Hinton, G.1
  • 15
    • 84890545163
    • A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
    • L. Deng, O. Abdel-Hamid, and D. Yu, "A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion," presented at the ICASSP, 2013.
    • (2013) The ICASSP
    • Deng, L.1    Abdel-Hamid, O.2    Yu, D.3
  • 16
    • 84890543083
    • Speech recognition with deep recurrent neural networks
    • A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," presented at the ICASSP, 2013.
    • (2013) The ICASSP
    • Graves, A.1    Mohamed, A.-R.2    Hinton, G.3
  • 17
    • 84867602860
    • Learning a robust tonnetz-space transform for automatic chord recognition
    • E. J. Humphrey, T. Cho, and J. P. Bello, "Learning a robust tonnetz-space transform for automatic chord recognition," in Proc. IEEE ICASSP, 2012, pp. 453-456.
    • (2012) Proc. IEEE ICASSP , pp. 453-456
    • Humphrey, E.J.1    Cho, T.2    Bello, J.P.3
  • 18
    • 84873584268
    • Learning features from music audio with deep belief networks
    • P. Hamel and D. Eck, "Learning features from music audio with deep belief networks," presented at the ISMIR, 2010.
    • (2010) The ISMIR
    • Hamel, P.1    Eck, D.2
  • 19
    • 84873426072
    • Analyzing drum patterns using conditional deep belief networks
    • E. Battenberg and D. Wessel, "Analyzing drum patterns using conditional deep belief networks," presented at the ISMIR, 2012.
    • (2012) The ISMIR
    • Battenberg, E.1    Wessel, D.2
  • 20
    • 0030691985
    • Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers
    • DOI 10.1121/1.420344
    • T. Dau, B. Kollmeier, and A. Kohlrausch, "Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers," J. Acoust. Soc. Amer., vol. 102, no. 5, pp. 2892-2905, 1997. (Pubitemid 27486267)
    • (1997) Journal of the Acoustical Society of America , vol.102 , Issue.5 , pp. 2892-2905
    • Dau, T.1    Kollmeier, B.2    Kohlrausch, A.3
  • 21
    • 23744508888
    • Multiresolution spectrotemporal analysis of complex sounds
    • DOI 10.1121/1.1945807
    • T. Chi, P. Ru, and S. Shamma, "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887-906, 2005. (Pubitemid 41129224)
    • (2005) Journal of the Acoustical Society of America , vol.118 , Issue.2 , pp. 887-906
    • Chi, T.1    Ru, P.2    Shamma, S.A.3
  • 22
    • 34047272330
    • Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
    • DOI 10.1109/TSA.2005.858055
    • N. Mesgarani, M. Slaney, and S. A. Shamma, "Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 3, pp. 920-930, 2006. (Pubitemid 46547653)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 920-930
    • Mesgarani, N.1    Slaney, M.2    Shamma, S.A.3
  • 23
    • 84879877798
    • Invariant scattering convolution networks
    • J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872-1886, Aug. 2013.
    • (2013) IEEE Trans. Pattern Anal. Mach. Intell , vol.35 , Issue.8 , pp. 1872-1886
    • Bruna, J.1    Mallat, S.2
  • 24
    • 84887372433
    • Rotation, scaling and deformation invariant scattering for texture discrimination
    • L. Sifre and S. Mallat, "Rotation, scaling and deformation invariant scattering for texture discrimination," presented at the CVPR, 2013.
    • (2013) The CVPR
    • Sifre, L.1    Mallat, S.2
  • 26
    • 30644477788
    • Coherent envelope detection for modulation filtering of speech
    • S. Schimmel and L. Atlas, "Coherent envelope detection for modulation filtering of speech," in Proc. ICASSP, 2005, vol. 1, pp. 221-224.
    • (2005) Proc. ICASSP , vol.1 , pp. 221-224
    • Schimmel, S.1    Atlas, L.2
  • 27
    • 85162342944
    • Probabilistic amplitude and frequency demodulation
    • R. Turner and M. Sahani, "Probabilistic amplitude and frequency demodulation," Adv. Neural Inf. Process. Syst., pp. 981-989, 2011.
    • (2011) Adv. Neural Inf. Process. Syst , pp. 981-989
    • Turner, R.1    Sahani, M.2
  • 28
    • 77956538750
    • Solving demodulation as an optimization problem
    • G. Sell and M. Slaney, "Solving demodulation as an optimization problem," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 8, pp. 2051-2066, Aug. 2010.
    • (2010) IEEE Trans. Audio, Speech, Language Process , vol.18 , Issue.8 , pp. 2051-2066
    • Sell, G.1    Slaney, M.2
  • 29
    • 84904989784
    • Phase Retrieval for the Cauchy wavelet transform
    • I. Waldspurger and S. Mallat, "Phase Retrieval for the Cauchy wavelet transform," J. Fourier Anal. Appl., submitted for publication. [Online]. Available: http://arxiv.org/abs/1404.1183
    • J. Fourier Anal. Appl
    • Waldspurger, I.1    Mallat, S.2
  • 30
    • 84864122549
    • Unsupervised learning of sparse features for scalable audio classification
    • M. Henaff, K. Jarrett, K. Kavukcuoglu, and Y. LeCun, "Unsupervised learning of sparse features for scalable audio classification," presented at the ISMIR, 2011.
    • (2011) The ISMIR
    • Henaff, M.1    Jarrett, K.2    Kavukcuoglu, K.3    Lecun, Y.4
  • 31
    • 84873444848
    • Learning sparse feature representations for music annotation and retrieval
    • J. Nam, J. Herrera, M. Slaney, and J. Smith, "Learning sparse feature representations for music annotation and retrieval," presented at the ISMIR, 2012.
    • (2012) The ISMIR
    • Nam, J.1    Herrera, J.2    Slaney, M.3    Smith, J.4
  • 32
    • 33644513420
    • Efficient auditory coding
    • DOI 10.1038/nature04485, PII N04485
    • E. C. Smith and M. S. Lewicki, "Efficient auditory coding," Nature, vol. 439, no. 7079, pp. 978-982, 2006. (Pubitemid 43292416)
    • (2006) Nature , vol.439 , Issue.7079 , pp. 978-982
    • Smith, E.C.1    Lewicki, M.S.2
  • 34
    • 0001050714
    • An iterative technique for the rectification of observed distributions
    • L. Lucy, "An iterative technique for the rectification of observed distributions," Astron. J., vol. 79, p. 745, 1974.
    • (1974) Astron. J , vol.79 , pp. 745
    • Lucy, L.1
  • 36
    • 84874997021
    • Phase recovery, maxcut and complex semidefinite programming
    • I. Waldspurger, A. d'Aspremont, and S. Mallat, "Phase Recovery, Maxcut and Complex Semidefinite Programming," Math. Programm., pp. 1-35, 2013.
    • (2013) Math. Programm , pp. 1-35
    • Waldspurger, I.1    D'Aspremont, A.2    Mallat, S.3
  • 37
    • 0021407831
    • Signal estimation from modified short-time Fourier transform
    • D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, Feb. 1984.
    • (1984) IEEE Trans. Acoust., Speech, Signal Process , vol.32 , Issue.2 , pp. 236-243
    • Griffin, D.W.1    Lim, J.S.2
  • 38
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 4, pp. 357-366, Apr. 1980. (Pubitemid 11464930)
    • (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis Steven, B.1    Mermelstein Paul2
  • 39
    • 79955702502
    • LIBSVM: A library for support vector machines
    • C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, pp. 27:1-27:27, 2011. [Online]. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
    • (2011) ACM Trans. Intell. Syst. Technol , vol.2 , pp. 27:1-27:27
    • Chang, C.-C.1    Lin, C.-J.2
  • 40
    • 0036648502
    • Musical genre classification of audio signals
    • DOI 10.1109/TSA.2002.800560, PII 1011092002800560
    • G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 293-302, Jul. 2002. (Pubitemid 34950067)
    • (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.5 , pp. 293-302
    • Tzanetakis, G.1    Cook, P.2
  • 41
    • 84873597627
    • Multiscale scattering for audio classification
    • J. Andén and S. Mallat, "Multiscale scattering for audio classification," in Proc. ISMIR, Miami, FL, USA, Oct. 24-28, 2011, pp. 657-662.
    • (2011) Proc. ISMIR , pp. 657-662
    • Andén, J.1    Mallat, S.2
  • 42
    • 84890520466
    • Representing environmental sounds using the separable scattering transform
    • C. Baugé, M. Lagrange, J. Andén, and S. Mallat, "Representing environmental sounds using the separable scattering transform," presented at the IEEE ICASSP, 2013.
    • (2013) The IEEE ICASSP
    • Baugé, C.1    Lagrange, M.2    Andén, J.3    Mallat, S.4
  • 43
    • 44849141781
    • Hierarchical large-margin Gaussian mixture models for phonetic classification
    • H.-A. Chang and J. R. Glass, "Hierarchical large-margin Gaussian mixture models for phonetic classification," in Proc. IEEE ASRU, 2007, pp. 272-277.
    • (2007) Proc. IEEE ASRU , pp. 272-277
    • Chang, H.-A.1    Glass, J.R.2
  • 44
    • 84881526300
    • Music genre classification using multiscale scattering and sparse representations
    • X. Chen and P. J. Ramadge, "Music genre classification using multiscale scattering and sparse representations," presented at the CISS, 2013.
    • (2013) The CISS
    • Chen, X.1    Ramadge, P.J.2
  • 46
    • 0024768209
    • Speaker-independent phone recognition using hidden Markov models
    • K.-F. Lee and H.-W. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
    • (1989) IEEE Trans. Acoust., Speech, Signal Process , vol.37 , Issue.11 , pp. 1641-1648
    • Lee, K.-F.1    Hon, H.-W.2
  • 47
    • 0032639886
    • On the use of support vector machines for phonetic classification
    • P. Clarkson and P. J. Moreno, "On the use of support vector machines for phonetic classification," IEEE Trans. Acoust., Speech, Signal Process., vol. 2, pp. 585-588, Feb. 1999.
    • (1999) IEEE Trans. Acoust., Speech, Signal Process , vol.2 , pp. 585-588
    • Clarkson, P.1    Moreno, P.J.2
  • 49
    • 84858975144
    • A convex hull approach to sparse representations for exemplar-based speech recognition
    • T. N. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran, and P. Shah, "A convex hull approach to sparse representations for exemplar-based speech recognition," in Proc. IEEE ASRU, 2011, pp. 59-64.
    • (2011) Proc. IEEE ASRU , pp. 59-64
    • Sainath, T.N.1    Nahamoo, D.2    Kanevsky, D.3    Ramabhadran, B.4    Shah, P.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.