SCOPUS Information Search Platform

Volume 22, Issue 11, 2011, Pages 1744-1756

Learning speaker-specific characteristics with a deep neural architecture

(a) University of Manchester, United Kingdom

Author keywords

Deep neural architecture; hybrid learning strategy; overcomplete representation; speaker comparison; speaker segmentation; speaker verification; speaker-specific characteristics

Indexed keywords

HYBRID LEARNING; NEURAL ARCHITECTURES; OVERCOMPLETE REPRESENTATIONS; SPEAKER COMPARISON; SPEAKER SEGMENTATIONS; SPEAKER VERIFICATION; SPEAKER-SPECIFIC CHARACTERISTICS;

ARCHITECTURE; LINGUISTICS; NETWORK ARCHITECTURE; NEURAL NETWORKS; SPEECH PROCESSING;

SPEECH RECOGNITION;

ALGORITHM; ARTICLE; ARTIFICIAL INTELLIGENCE; ARTIFICIAL NEURAL NETWORK; AUTOMATIC SPEECH RECOGNITION; BIOLOGICAL MODEL; COMPUTER SYSTEM; DISCRIMINATION LEARNING; HUMAN; LANGUAGE; NORMAL DISTRIBUTION;

ALGORITHMS; ARTIFICIAL INTELLIGENCE; COMPUTER SYSTEMS; DISCRIMINATION LEARNING; HUMANS; LANGUAGE; MODELS, NEUROLOGICAL; NEURAL NETWORKS (COMPUTER); NORMAL DISTRIBUTION; SPEECH RECOGNITION SOFTWARE;

EID: 80455143732 PISSN: 10459227 EISSN: None Source Type: Journal
DOI: 10.1109/TNN.2011.2167240 Document Type: Article

Times cited: 108

References (41)

1. J. Campbell, "Speaker recognition: A tutorial," IEEE Proc., vol. 85, no. 8, pp. 1437-1462, Sep. 1997.

2. J. P. Campbell, W. Shen, W. M. Campbell, R. Schwartz, J. F. Bonastre, and D. Matrouf, "Forensic speaker recognition," IEEE Signal Proc. Mag., vol. 26, no. 2, pp. 95-103, Mar. 2009.

3. X. Huang, A. Acero, and H. Hon, Spoken Language Processing. Englewood Cliffs, NJ: Prentice Hall, 2001.

4. Q. Jin, "Robust speaker recognition," Ph.D. thesis, School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, 2007.

5. R. Mammone, X. Zhang, and R. Ramachandran, "Robust speaker recognition: A feature-based approach," IEEE Signal Process. Mag., vol. 13, no. 5, pp. 1-58, Sep. 1996.

6. D. A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," Speech Commun., vol. 17, nos. 1-2, pp. 91-108, Aug. 1995.

7. D. A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," MIT Lincoln Lab. J., vol. 8, pp. 173-191, Jan. 1995.

8. S. Kajarekar, "Analysis of variability in speech with applications to speech and speaker recognition," Ph.D. thesis, School Sci. Eng., Oregon Graduate Institute Sci. Eng., Portland, OR, 2002.

9. V. Laparra, G. Camps-Valls, and J. Malo, "Iterative gaussianization: From ICA to random rotations," IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 537-549, Apr. 2011.

10. Y. Bengio and Y. LeCun, "Scaling learning algorithms toward AI," in Large-Scale Kernel Machines. Cambridge, MA: MIT Press, 2007, pp. 321-360.

11. Y. Bengio, "Learning deep architectures for AI," Foundations Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.

12. G. E. Hinton, "Learning multiple layers of representation," Trends Cogn. Sci., vol. 11, no. 10, pp. 428-434, Oct. 2007.

13. G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, Jul. 2006.

14. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," IEEE Proc., vol. 86, no. 11, pp. 2278-2324, Nov. 1998.

15. G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, Jul. 2006.

16. Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Proc. Adv. Neural Inform. Process. Syst., vol. 19, 2007, pp. 1-8.

17. K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?" in Proc. IEEE 12th Int. Conf. Comput. Vision, Kyoto, Japan, Sep.-Oct. 2009, pp. 2146-2153.

18. H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," J. Mach. Learn. Res., vol. 10, pp. 1-40, Jan. 2009.

19. D. Erhan, Y. Bengio, A. Courville, P. Manzagol, and P. Vincent, "Why does unsupervised pre-training help deep learning?" J. Mach. Learn. Res., vol. 11, pp. 625-660, Feb. 2010.

20. R. Salakhutdinov and G. Hinton, "Learning a non-linear embedding by preserving class neighbourhood structure," in Proc. Art. Intell. Statist., vol. 2, 2007, pp. 412-419.

21. M. Osadchy, Y. LeCun, and M. Miller, "Synergistic face detection and pose estimation with energy-based models," J. Mach. Learn. Res., vol. 8, pp. 1197-1215, May 2007.

22. S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., vol. 1, Jun. 2005, pp. 539-546.

23. Y. LeCun, F. J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., vol. 2, Jul. 2004, pp. 97-104.

24. H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 473-480.

25. H. Lee, Y. Largman, P. Pham, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Proc. Adv. Neural Inform. Process. Syst., vol. 22, 2010, pp. 1-9.

26. Linguistic Data Consortium (LDC), Philadelphia, PA [Online]. Available: http://www.ldc.upenn.edu.com

27. Russian Speech Corpus [Online]. Available: http://www.repository.voxforge1.org/

28. L. Wang, "Chinese speech corpus for speaker recognition," Shenzhen Inst. Advanced Technology, Chinese Academy Science, Shenzhen, China, Tech. Rep., pp. 1-66, 2008.

29. Y. LeCun and F. J. Huang, "Loss functions for discriminative training of energy-based models," in Proc. Art. Intell. Statist., 2005, pp. 1-8.

30. J. Bromley, I. Guyon, Y. LeCun, E. Sackinger, and R. Shah, "Signature verification using a siamese time delay neural network," in Proc. Adv. Neural Inform. Process. Syst., vol. 5, 1993, pp. 1-3.

31. K. Chen, "Toward better making a decision in speaker verification," Pattern Recog., vol. 36, no. 2, pp. 329-346, Feb. 2003.

32. L. Wang, K. Chen, and H. Chi, "Capture inter-speaker information with a neural network for speaker identification," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 436-445, Mar. 2002.

33. P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. Art. Intell. Statist., 2007, pp. 1-8.

34. W. Campbell, D. E. Sturim, and Z. Karam, "Speaker comparison with inner product discriminant functions," in Proc. Adv. Neural Inform. Process. Syst., vol. 22, 2010, pp. 1-9.

35. D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, no. 1, pp. 72-83, Jan. 1995.

36. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki, "The DET curve in assessment of detection task performance," in Proc. Eurospeech, vol. 4, 1997, pp. 1899-1903.

37. S. Zhang and M. Mak, "Optimized discriminative kernel for SVM scoring and its application to speaker verification," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 173-185, Feb. 2011.

38. M. Kotti, V. Moschou, and C. Kotropoulos, "Speaker segmentation and clustering," Signal Process., vol. 88, no. 5, pp. 1091-1124, May 2008.

39. P. Delacourt and C. Wellekens, "DISTBIC: A speaker-based segmentation for audio data indexing," Speech Commun., vol. 32, nos. 1-2, pp. 111-126, Sep. 2000.

40. S. S. Chen and P. S. Gopalakrishnan, "Speaker environment and channel change detection and clustering via Bayesian information criterion," in Proc. Defense Adv. Res. Projects Agency Speech Recognit. Workshop, 1998, pp. 127-132.

41. J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, "Neighbourhood component analysis," in Proc. Adv. Neural Inform. Process. Syst., vol. 17, 2005, pp. 1-8.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.