SCOPUS Information Search Platform

IEEE Transactions on Signal Processing

Volume 62, Issue 16, 2014, Pages 4114-4128

Deep scattering spectrum

(2) Andén, Joakim a,b Mallat, Stéphane c

a CENTRE DE MATHÉMATIQUES APPLIQUÉES (France)

b PRINCETON UNIVERSITY (United States)

c ECOLE NORMALE SUPÉRIEURE (France)

Author keywords

Audio classification; deep neural networks; MFCC; modulation spectrum; wavelets

Indexed keywords

MODULATION;

AUDIO CLASSIFICATION; DEEP NEURAL NETWORKS; MFCC; MODULATION SPECTRUM; WAVELETS;

AUDIO ACOUSTICS;

EID: 84896734479 PISSN: 1053587X EISSN: None Source Type: Journal
DOI: 10.1109/TSP.2014.2326991 Document Type: Article

Times cited : (490)

References (50)

1
- 84886483484
- Scattering transform for intrapartum fetal heart rate characterization and acidosis detection
- presented at
- V. Chudáček, J. Andén, S. Mallat, P. Abry, and M. Doret, "Scattering transform for intrapartum fetal heart rate characterization and acidosis detection," presented at the IEEE Int. Conf. Eng. Med. Biol. Soc., 2013.
- (2013) The IEEE Int. Conf. Eng. Med. Biol. Soc
- Chudáček, V.¹ Andén, J.² Mallat, S.³ Abry, P.⁴ Doret, M.⁵

2
- 0030638046
- The modulation spectrum in the automatic recognition of speech
- H. Hermansky, "The modulation spectrum in the automatic recognition of speech," in Proc. IEEE Autom. Speech Recognit. Understanding Workshop, 1997, pp. 140-147.
- (1997) Proc. IEEE Autom. Speech Recognit. Understanding Workshop , pp. 140-147
- Hermansky, H.¹

3
- 0034842487
- Scalable and progressive audio codec
- M. S. Vinton and L. E. Atlas, "Scalable and progressive audio codec," in Proc. 2001 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'01), 2001, vol. 5, pp. 3277-3280.
- (2001) Proc. 2001 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'01) , vol.5 , pp. 3277-3280
- Vinton, M.S.¹ Atlas, L.E.²

4
- 80052406394
- Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis
- J. McDermott and E. Simoncelli, "Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis," Neuron, vol. 71, no. 5, pp. 926-940, 2011.
- (2011) Neuron , vol.71 , Issue.5 , pp. 926-940
- McDermott, J.¹ Simoncelli, E.²

5
- 80051605384
- Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection
- M. Ramona and G. Peeters, "Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection," in Proc. IEEE ICASSP, 2011, pp. 477-480.
- (2011) Proc. IEEE ICASSP , pp. 477-480
- Ramona, M.¹ Peeters, G.²

6
- 0009634522
- Hoboken, NJ, USA: Wiley
- M. Slaney and R. Lyon, M. Cooke, S. Beet, and M. Crawford, Eds., Visual Representations of Speech Signals. Hoboken, NJ, USA: Wiley, 1993, pp. 95-116.
- (1993) Visual Representations of Speech Signals , pp. 95-116
- Slaney, M.¹ Lyon, R.² Cooke, M.³ Beet, S.⁴ Crawford, M.⁵

7
- 0034227088
- Auditory images: How complex sounds are represented in the auditory system
- R. D. Patterson, "Auditory images: How complex sounds are represented in the auditory system," J. Acoust. Soc. Japan (E), vol. 21, no. 4, pp. 183-190, 2000.
- (2000) J. Acoust. Soc. Japan (E) , vol.21 , Issue.4 , pp. 183-190
- Patterson, R.D.¹

8
- 67349176070
- Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features
- Jun
- C. Lee, J. Shih, K. Yu, and H. Lin, "Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features," IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, Jun. 2009.
- (2009) IEEE Trans. Multimedia , vol.11 , Issue.4 , pp. 670-682
- Lee, C.¹ Shih, J.² Yu, K.³ Lin, H.⁴

9
- 80051659405
- Classifying soundtracks with audio texture features
- Prague, Czech Republic, May 22-27
- D. Ellis, X. Zeng, and J. McDermott, "Classifying soundtracks with audio texture features," in Proc. IEEE ICASSP, Prague, Czech Republic, May 22-27, 2011, pp. 5880-5883.
- (2011) Proc. IEEE ICASSP , pp. 5880-5883
- Ellis, D.¹ Zeng, X.² McDermott, J.³

10
- 0141520589
- A non-uniform modulation transform for audio coding with increased time resolution
- J. K. Thompson and L. E. Atlas, "A non-uniform modulation transform for audio coding with increased time resolution," in Proc. 2003 IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP'03), 2003, vol. 5, pp. V-397.
- (2003) Proc. 2003 IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP'03) , vol.5
- Thompson, J.K.¹ Atlas, L.E.²

11
- 84864324516
- Group invariant scattering
- S. Mallat, "Group invariant scattering," Commun. Pure Appl. Math., vol. 65, no. 10, pp. 1331-1398, 2012.
- (2012) Commun. Pure Appl. Math , vol.65 , Issue.10 , pp. 1331-1398
- Mallat, S.¹

12
- 77955998889
- Convolutional networks and applications in vision
- presented at
- Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," presented at the IEEE ISCAS, 2010.
- (2010) The IEEE ISCAS
- Lecun, Y.¹ Kavukcuoglu, K.² Farabet, C.³

13
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks
- presented at
- H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," presented at the NIPS, 2009.
- (2009) The NIPS
- Lee, H.¹ Pham, P.² Largman, Y.³ Ng, A.⁴

14
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Dec
- G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Dec. 2012.
- (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹

15
- 84890545163
- A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
- presented at
- L. Deng, O. Abdel-Hamid, and D. Yu, "A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion," presented at the ICASSP, 2013.
- (2013) The ICASSP
- Deng, L.¹ Abdel-Hamid, O.² Yu, D.³

16
- 84890543083
- Speech recognition with deep recurrent neural networks
- presented at
- A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," presented at the ICASSP, 2013.
- (2013) The ICASSP
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

17
- 84867602860
- Learning a robust tonnetz-space transform for automatic chord recognition
- E. J. Humphrey, T. Cho, and J. P. Bello, "Learning a robust tonnetz-space transform for automatic chord recognition," in Proc. IEEE ICASSP, 2012, pp. 453-456.
- (2012) Proc. IEEE ICASSP , pp. 453-456
- Humphrey, E.J.¹ Cho, T.² Bello, J.P.³

18
- 84873584268
- Learning features from music audio with deep belief networks
- presented at
- P. Hamel and D. Eck, "Learning features from music audio with deep belief networks," presented at the ISMIR, 2010.
- (2010) The ISMIR
- Hamel, P.¹ Eck, D.²

19
- 84873426072
- Analyzing drum patterns using conditional deep belief networks
- presented at
- E. Battenberg and D. Wessel, "Analyzing drum patterns using conditional deep belief networks," presented at the ISMIR, 2012.
- (2012) The ISMIR
- Battenberg, E.¹ Wessel, D.²

20
- 0030691985
- Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers
- DOI 10.1121/1.420344
- T. Dau, B. Kollmeier, and A. Kohlrausch, "Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers," J. Acoust. Soc. Amer., vol. 102, no. 5, pp. 2892-2905, 1997. (Pubitemid 27486267)
- (1997) Journal of the Acoustical Society of America , vol.102 , Issue.5 , pp. 2892-2905
- Dau, T.¹ Kollmeier, B.² Kohlrausch, A.³

21
- 23744508888
- Multiresolution spectrotemporal analysis of complex sounds
- DOI 10.1121/1.1945807
- T. Chi, P. Ru, and S. Shamma, "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887-906, 2005. (Pubitemid 41129224)
- (2005) Journal of the Acoustical Society of America , vol.118 , Issue.2 , pp. 887-906
- Chi, T.¹ Ru, P.² Shamma, S.A.³

22
- 34047272330
- Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
- DOI 10.1109/TSA.2005.858055
- N. Mesgarani, M. Slaney, and S. A. Shamma, "Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 3, pp. 920-930, 2006. (Pubitemid 46547653)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 920-930
- Mesgarani, N.¹ Slaney, M.² Shamma, S.A.³

23
- 84879877798
- Invariant scattering convolution networks
- Aug
- J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872-1886, Aug. 2013.
- (2013) IEEE Trans. Pattern Anal. Mach. Intell , vol.35 , Issue.8 , pp. 1872-1886
- Bruna, J.¹ Mallat, S.²

24
- 84887372433
- Rotation, scaling and deformation invariant scattering for texture discrimination
- presented at
- L. Sifre and S. Mallat, "Rotation, scaling and deformation invariant scattering for texture discrimination," presented at the CVPR, 2013.
- (2013) The CVPR
- Sifre, L.¹ Mallat, S.²

25
- 0003456805
- New York, NY, USA: Academic Press
- S. Mallat, A Wavelet Tour of Signal Processing. New York, NY, USA: Academic Press, 1999.
- (1999) A Wavelet Tour of Signal Processing
- Mallat, S.¹

26
- 30644477788
- Coherent envelope detection for modulation filtering of speech
- S. Schimmel and L. Atlas, "Coherent envelope detection for modulation filtering of speech," in Proc. ICASSP, 2005, vol. 1, pp. 221-224.
- (2005) Proc. ICASSP , vol.1 , pp. 221-224
- Schimmel, S.¹ Atlas, L.²

27
- 85162342944
- Probabilistic amplitude and frequency demodulation
- R. Turner and M. Sahani, "Probabilistic amplitude and frequency demodulation," Adv. Neural Inf. Process. Syst., pp. 981-989, 2011.
- (2011) Adv. Neural Inf. Process. Syst , pp. 981-989
- Turner, R.¹ Sahani, M.²

28
- 77956538750
- Solving demodulation as an optimization problem
- Aug
- G. Sell and M. Slaney, "Solving demodulation as an optimization problem," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 8, pp. 2051-2066, Aug. 2010.
- (2010) IEEE Trans. Audio, Speech, Language Process , vol.18 , Issue.8 , pp. 2051-2066
- Sell, G.¹ Slaney, M.²

29
- 84904989784
- Phase Retrieval for the Cauchy wavelet transform
- [Online], submitted for publication
- I. Waldspurger and S. Mallat, "Phase Retrieval for the Cauchy wavelet transform," J. Fourier Anal. Appl. [Online]. Available: http://arxiv.org/abs/1404.1183, submitted for publication
- J. Fourier Anal. Appl
- Waldspurger, I.¹ Mallat, S.²

30
- 84864122549
- Unsupervised learning of sparse features for scalable audio classification
- presented at
- M. Henaff, K. Jarrett, K. Kavukcuoglu, and Y. LeCun, "Unsupervised learning of sparse features for scalable audio classification," presented at the ISMIR, 2011.
- (2011) The ISMIR
- Henaff, M.¹ Jarrett, K.² Kavukcuoglu, K.³ Lecun, Y.⁴

31
- 84873444848
- Learning sparse feature representations for music annotation and retrieval
- presented at
- J. Nam, J. Herrera, M. Slaney, and J. Smith, "Learning sparse feature representations for music annotation and retrieval," presented at the ISMIR, 2012.
- (2012) The ISMIR
- Nam, J.¹ Herrera, J.² Slaney, M.³ Smith, J.⁴

32
- 33644513420
- Efficient auditory coding
- DOI 10.1038/nature04485, PII N04485
- E. C. Smith and M. S. Lewicki, "Efficient auditory coding," Nature, vol. 439, no. 7079, pp. 978-982, 2006. (Pubitemid 43292416)
- (2006) Nature , vol.439 , Issue.7079 , pp. 978-982
- Smith, E.C.¹ Lewicki, M.S.²

33
- 0002538142
- The DARPA speech recognition research database: Specifications and status
- W. Fisher, G. Doddington, and K. Goudie-Marshall, "The DARPA speech recognition research database: Specifications and status," in Proc. DARPA Workshop Speech Recognit., 1986, pp. 93-99.
- (1986) Proc. DARPA Workshop Speech Recognit , pp. 93-99
- Fisher, W.¹ Doddington, G.² Goudie-Marshall, K.³

34
- 0001050714
- An iterative technique for the rectification of observed distributions
- L. Lucy, "An iterative technique for the rectification of observed distributions," Astron. J., vol. 79, p. 745, 1974.
- (1974) Astron. J , vol.79 , pp. 745
- Lucy, L.¹

35
- 84875874870
- Phase retrieval via matrix completion
- E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, "Phase retrieval via matrix completion," SIAM J. Imaging Sci., vol. 6, no. 1, pp. 199-225, 2013.
- (2013) SIAM J. Imaging Sci , vol.6 , Issue.1 , pp. 199-225
- Candès, E.J.¹ Eldar, Y.C.² Strohmer, T.³ Voroninski, V.⁴

36
- 84874997021
- Phase recovery, maxcut and complex semidefinite programming
- I. Waldspurger, A. d'Aspremont, and S. Mallat, "Phase Recovery, Maxcut and Complex Semidefinite Programming," Math. Programm., pp. 1-35, 2013.
- (2013) Math. Programm , pp. 1-35
- Waldspurger, I.¹ D'Aspremont, A.² Mallat, S.³

37
- 0021407831
- Signal estimation from modified short-time Fourier transform
- Feb
- D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, Feb. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process , vol.32 , Issue.2 , pp. 236-243
- Griffin, D.W.¹ Lim, J.S.²

38
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 4, pp. 357-366, Apr. 1980. (Pubitemid 11464930)
- (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

39
- 79955702502
- LIBSVM: A library for support vector machines
- [Online]
- C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol. vol. 2, pp. 27:1-27:27, 2011 [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm, software available at
- (2011) ACM Trans. Intell. Syst. Technol , vol.2 , pp. 27:1-27:27
- Chang, C.-C.¹ Lin, C.-J.²

40
- 0036648502
- Musical genre classification of audio signals
- DOI 10.1109/TSA.2002.800560, PII 1011092002800560
- G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 293-302, Jul. 2002. (Pubitemid 34950067)
- (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.5 , pp. 293-302
- Tzanetakis, G.¹ Cook, P.²

41
- 84873597627
- Multiscale scattering for audio classification
- Miami, FL, USA, Oct. 24-28
- J. Andén and S. Mallat, "Multiscale scattering for audio classification," in Proc. ISMIR, Miami, FL, USA, Oct. 24-28, 2011, pp. 657-662.
- (2011) Proc. ISMIR , pp. 657-662
- Andén, J.¹ Mallat, S.²

42
- 84890520466
- Representing environmental sounds using the separable scattering transform
- presented at
- C. Baugé, M. Lagrange, J. Andén, and S. Mallat, "Representing environmental sounds using the separable scattering transform," presented at the IEEE ICASSP, 2013.
- (2013) The IEEE ICASSP
- Baugé, C.¹ Lagrange, M.² Andén, J.³ Mallat, S.⁴

43
- 44849141781
- Hierarchical large-margin Gaussian mixture models for phonetic classification
- H.-A. Chang and J.R. Glass, "Hierarchical large-margin Gaussian mixture models for phonetic classification," in Proc. IEEE ASRU, 2007, pp. 272-277.
- (2007) Proc. IEEE ASRU , pp. 272-277
- Chang, H.-A.¹ Glass, J.R.²

44
- 84881526300
- Music genre classification using multiscale scattering and sparse representations
- presented at
- X. Chen and P. J. Ramadge, "Music genre classification using multiscale scattering and sparse representations," presented at the CISS, 2013.
- (2013) The CISS
- Chen, X.¹ Ramadge, P.J.²

45
- 84870497334
- An analysis of the GTZAN music genre dataset
- B. L. Sturm, "An analysis of the GTZAN music genre dataset," in Proc. 2nd Int. ACM Workshop Music Inf. Retrieval With User-Centered Multimodal Strategies, 2012, pp. 7-12.
- (2012) Proc. 2nd Int. ACM Workshop Music Inf. Retrieval with User-Centered Multimodal Strategies , pp. 7-12
- Sturm, B.L.¹

46
- 0024768209
- Speaker-independent phone recognition using hidden Markov models
- Nov
- K.-F. Lee and H.-W. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
- (1989) IEEE Trans. Acoust., Speech, Signal Process , vol.37 , Issue.11 , pp. 1641-1648
- Lee, K.-F.¹ Hon, H.-W.²

47
- 0032639886
- On the use of support vector machines for phonetic classification
- Feb
- P. Clarkson and P. J. Moreno, "On the use of support vector machines for phonetic classification," IEEE Trans. Acoust., Speech, Signal Process., vol. 2, pp. 585-588, Feb. 1999.
- (1999) IEEE Trans. Acoust., Speech, Signal Process , vol.2 , pp. 585-588
- Clarkson, P.¹ Moreno, P.J.²

48
- 0003877861
- Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, USA
- A. K. Halberstadt, "Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, USA, 1998.
- (1998) Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition
- Halberstadt, A.K.¹

49
- 84858975144
- A convex hull approach to sparse representations for exemplar-based speech recognition
- T. N. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran, and P. Shah, "A convex hull approach to sparse representations for exemplar-based speech recognition," in Proc. IEEE ASRU, 2011, pp. 59-64.
- (2011) Proc. IEEE ASRU , pp. 59-64
- Sainath, T.N.¹ Nahamoo, D.² Kanevsky, D.³ Ramabhadran, B.⁴ Shah, P.⁵

50
- 84904989782
- Ph.D. dissertation, Ecole Polytechnique, Palaiseau, France
- J. Andén, "Time and Frequency Scattering for Audio Classification," Ph.D. dissertation, Ecole Polytechnique, Palaiseau, France, 2014.
- (2014) Time and Frequency Scattering for Audio Classification
- Andén, J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.