SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 31-35

Deep clustering: Discriminative embeddings for segmentation and separation

(4) Hershey, John R. (a); Chen, Zhuo (b); Le Roux, Jonathan (a); Watanabe, Shinji (a)

a MITSUBISHI ELECTRIC RESEARCH LABORATORIES (United States)

b Columbia University (United States)

Author keywords

clustering; deep learning; embedding; speech separation

Indexed keywords

EID: 84973320590 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7471631 Document Type: Conference Paper

Times cited : (1623)

References (33)

1
- 0003684441
- MIT press
- A. S. Bregman, Auditory scene analysis: The perceptual organization of sound. MIT press, 1990
- (1990) Auditory Scene Analysis: The Perceptual Organization of Sound
- Bregman, A.S.¹

2
- 84905284062
- Single-channel speech separation with memory-enhanced recurrent neural networks
- IEEE
- F. Weninger, F. Eyben, and B. Schuller, "Single-channel speech separation with memory-enhanced recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
- Weninger, F.¹ Eyben, F.² Schuller, B.³

3
- 84875678689
- Towards scaling up classification-based speech separation
- Y. Wang and D. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 7, 2013
- (2013) IEEE Trans. Audio, Speech, Language Process , vol.21 , Issue.7
- Wang, Y.¹ Wang, D.²

4
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 2, 2013
- (2013) IEEE Trans. Audio, Speech, Language Process , vol.21 , Issue.2
- Wang, Y.¹ Han, K.² Wang, D.³

5
- 69249222720
- Super-human multi-talker speech recognition: A graphical modeling approach
- J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson, "Super-human multi-talker speech recognition: A graphical modeling approach," Comput. Speech Lang., vol. 24, no. 1, 2010
- (2010) Comput. Speech Lang. , vol.24 , Issue.1
- Hershey, J.R.¹ Rennie, S.J.² Olsen, P.A.³ Kristjansson, T.T.⁴

6
- 84973327529
- Joint optimization of masks and deep recurrent neural networks for monaural source separation
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Joint optimization of masks and deep recurrent neural networks for monaural source separation," arXiv preprint arXiv:1502.04149, 2015
- (2015) ArXiv Preprint arXiv:1502.04149
- Huang, P.-S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

7
- 44849140301
- Speech recognition using factorial hidden Markov models for separation in the feature space
- Pittsburgh
- T. Virtanen, "Speech recognition using factorial hidden Markov models for separation in the feature space," in Proc. Interspeech 2006, Pittsburgh, 2006
- (2006) Proc. Interspeech 2006
- Virtanen, T.¹

8
- 85032751986
- Single-channel multitalker speech recognition
- S. J. Rennie, J. R. Hershey, and P. A. Olsen, "Single-channel multitalker speech recognition," IEEE Signal Process. Mag., vol. 27, no. 6, 2010
- (2010) IEEE Signal Process. Mag. , vol.27 , Issue.6
- Rennie, S.J.¹ Hershey, J.R.² Olsen, P.A.³

9
- 69249202377
- Monaural speech separation and recognition challenge
- M. Cooke, J. R. Hershey, and S. J. Rennie, "Monaural speech separation and recognition challenge," Computer Speech & Language, vol. 24, no. 1, 2010
- (2010) Computer Speech & Language , vol.24 , Issue.1
- Cooke, M.¹ Hershey, J.R.² Rennie, S.J.³

10
- 79953672082
- Ph.D. dissertation, Columbia University
- R. J. Weiss, "Underdetermined source separation using speaker subspace models," Ph.D. dissertation, Columbia University, 2009
- (2009) Underdetermined Source Separation Using Speaker Subspace Models
- Weiss, R.J.¹

11
- 84973327523
- Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR
- Springer
- F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux, J. R. Hershey, and B. Schuller, "Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR," in Latent Variable Analysis and Signal Separation. Springer, 2015
- (2015) Latent Variable Analysis and Signal Separation
- Weninger, F.¹ Erdogan, H.² Watanabe, S.³ Vincent, E.⁴ Le Roux, J.⁵ Hershey, J.R.⁶ Schuller, B.⁷

12
- 84921740463
- On training targets for supervised speech separation
- Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 12, 2014
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.12
- Wang, Y.¹ Narayanan, A.² Wang, D.³

13
- 84889257121
- An experimental study on speech enhancement based on deep neural networks
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks," Signal Processing Letters, IEEE, vol. 21, no. 1, 2014
- (2014) Signal Processing Letters, IEEE , vol.21 , Issue.1
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

14
- 0003479143
- Ph.D. dissertation, Univ. of Sheffield
- M. P. Cooke, "Modelling auditory processing and organisation," Ph.D. dissertation, Univ. of Sheffield, 1991
- (1991) Modelling Auditory Processing and Organisation
- Cooke, M.P.¹

15
- 0003794341
- Ph.D. dissertation, MIT
- D. P. W. Ellis, "Prediction-driven computational auditory scene analysis," Ph.D. dissertation, MIT, 1996
- (1996) Prediction-driven Computational Auditory Scene Analysis
- Ellis, D.P.W.¹

16
- 33749317042
- Learning spectral clustering, with application to speech separation
- F. R. Bach and M. I. Jordan, "Learning spectral clustering, with application to speech separation," JMLR, vol. 7, 2006
- (2006) JMLR , vol.7
- Bach, F.R.¹ Jordan, M.I.²

17
- 0003131192
- Laws of organization in perceptual forms
- W. A. Ellis, Ed. Routledge and Kegan Paul
- M. Wertheimer, "Laws of organization in perceptual forms," in A Source book of Gestalt psychology, W. A. Ellis, Ed. Routledge and Kegan Paul, 1938
- (1938) A Source Book of Gestalt Psychology
- Wertheimer, M.¹

18
- 84867946385
- An unsupervised approach to cochannel speech separation
- K. Hu and D. Wang, "An unsupervised approach to cochannel speech separation," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 1, 2013
- (2013) Audio, Speech, and Language Processing, IEEE Transactions on , vol.21 , Issue.1
- Hu, K.¹ Wang, D.²

19
- 0034244751
- Normalized cuts and image segmentation
- J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. PAMI, vol. 22, no. 8, 2000
- (2000) IEEE Trans. PAMI , vol.22 , Issue.8
- Shi, J.¹ Malik, J.²

20
- 84973282227
- Learning deep representations for graph clustering
- F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu, "Learning deep representations for graph clustering," in Proc. AAAI, 2014
- (2014) Proc. AAAI
- Tian, F.¹ Gao, B.² Cui, Q.³ Chen, E.⁴ Liu, T.-Y.⁵

21
- 84919933352
- Deep embedding network for clustering
- P. Huang, Y. Huang, W. Wang, and L. Wang, "Deep embedding network for clustering," in Proc. ICPR, 2014
- (2014) Proc. ICPR
- Huang, P.¹ Huang, Y.² Wang, W.³ Wang, L.⁴

22
- 0742286179
- Spectral grouping using the Nyström method
- C. Fowlkes, S. Belongie, F. Chung, and J. Malik, "Spectral grouping using the Nyström method," IEEE Trans. PAMI, vol. 26, no. 2, 2004
- (2004) IEEE Trans. PAMI , vol.26 , Issue.2
- Fowlkes, C.¹ Belongie, S.² Chung, F.³ Malik, J.⁴

23
- 84868302138
- Local equivalences of distances between clusterings: a geometric perspective
- M. Meila, "Local equivalences of distances between clusterings: a geometric perspective," Machine Learning, vol. 86, no. 3, 2012
- (2012) Machine Learning , vol.86 , Issue.3
- Meila, M.¹

24
- 0000008146
- Comparing partitions
- L. Hubert and P. Arabie, "Comparing partitions," Journal of classification, vol. 2, no. 1, 1985
- (1985) Journal of Classification , vol.2 , Issue.1
- Hubert, L.¹ Arabie, P.²

25
- 84973353119
- University of Washington Department of Statistics, Technical Report
- M. Meila, "The stability of a good clustering," University of Washington Department of Statistics, vol. Technical Report 624, 2014. [Online]. Available: http://www.stat.washington.edu/research/reports/2014/tr624.pdf
- (2014) The Stability of A Good Clustering , vol.624
- Meila, M.¹

26
- 84973394876
- Deep clustering: Discriminative embeddings for segmentation and separation
- J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, "Deep clustering: Discriminative embeddings for segmentation and separation," Sep. 2015, arXiv:1508.04306. [Online]. Available: http://arxiv.org/abs/1508.04306
- (2015) Sep. ArXiv:1508.04306
- Hershey, J.R.¹ Chen, Z.² Le Roux, J.³ Watanabe, S.⁴

27
- 38049021850
- Convolutive speech bases and their application to supervised speech separation
- P. Smaragdis, "Convolutive speech bases and their application to supervised speech separation," IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 1, 2007
- (2007) IEEE Trans. Audio, Speech, Language Process. , vol.15 , Issue.1
- Smaragdis, P.¹

28
- 84942683436
- Sparse NMF - half-baked or well done?
- TR2015-023, Mar. Cambridge, MA, USA, Tech. Rep.
- J. Le Roux, F. J. Weninger, and J. R. Hershey, "Sparse NMF - half-baked or well done?" MERL, Cambridge, MA, USA, Tech. Rep. TR2015-023, Mar. 2015
- (2015) MERL
- Le Roux, J.¹ Weninger, F.J.² Hershey, J.R.³

29
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- Springer
- D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech separation by humans and machines. Springer, 2005
- (2005) Speech Separation by Humans and Machines
- Wang, D.¹

30
- 33744975847
- Performance measurement in blind audio source separation
- E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 4, 2006
- (2006) IEEE Trans. Audio, Speech, Language Process , vol.14 , Issue.4
- Vincent, E.¹ Gribonval, R.² Févotte, C.³

31
- 84887031960
- An iterative model-based approach to cochannel speech separation
- K. Hu and D. Wang, "An iterative model-based approach to cochannel speech separation," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2013, no. 1, 2013
- (2013) EURASIP Journal on Audio, Speech, and Music Processing , vol.2013 , Issue.1
- Hu, K.¹ Wang, D.²

32
- 84876258641
- Learning hierarchical features for scene labeling
- C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Trans. PAMI, vol. 35, no. 8, 2013
- (2013) IEEE Trans. PAMI , vol.35 , Issue.8
- Farabet, C.¹ Couprie, C.² Najman, L.³ LeCun, Y.⁴

33
- 84937933229
- Recursive context propagation network for semantic scene labeling
- A. Sharma, O. Tuzel, and M.-Y. Liu, "Recursive context propagation network for semantic scene labeling," in Proc. NIPS, 2014.
- (2014) Proc. NIPS
- Sharma, A.¹ Tuzel, O.² Liu, M.-Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.