SCOPUS Information Search Platform

Neural Networks

Volume 64, 2015, Pages 49-58

Frame-by-frame language identification in short utterances using deep neural networks

Authors (4): Gonzalez-Dominguez, Javier (a, b); Lopez-Moreno, Ignacio (a); Moreno, Pedro J. (a); Gonzalez-Rodriguez, Joaquin (b)

a GOOGLE INC (United States)

b UNIVERSIDAD AUTÓNOMA DE MADRID (Spain)

Author keywords

DNNs; i-vectors; Real-time LID

Indexed keywords

DEEP NEURAL NETWORKS; MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS; SPEECH RECOGNITION;

AUTOMATIC LANGUAGE IDENTIFICATION; CONTEXTUAL INFORMATION; DNNS; I VECTORS; LANGUAGE IDENTIFICATION; LANGUAGE RECOGNITION; REAL TIME; REAL-TIME APPLICATION;

NEURAL NETWORKS;

ANALYTIC METHOD; ARTICLE; ARTIFICIAL NEURAL NETWORK; AUTOMATIC LANGUAGE IDENTIFICATION; AUTOMATIC SPEECH RECOGNITION; DATA BASE; DEEP NEURAL NETWORK; FRAME BY FRAME LANGUAGE IDENTIFICATION; INFORMATION PROCESSING; LANGUAGE; LEARNING ALGORITHM; MATHEMATICAL COMPUTING; MATHEMATICAL VARIABLE; PERFORMANCE; PROBABILITY; SPEECH; SYSTEM ANALYSIS; BIOMETRY; PROCEDURES;

BIOMETRIC IDENTIFICATION; LANGUAGE; NEURAL NETWORKS (COMPUTER); SPEECH; SPEECH RECOGNITION SOFTWARE;

EID: 84922385124 PISSN: 08936080 EISSN: 18792782 Source Type: Journal
DOI: 10.1016/j.neunet.2014.08.006 Document Type: Article

Times cited: 66

References (39)

1
- 79958844455
- Language identification: A tutorial
- Ambikairajah E., Li H., Wang L., Yin B., Sethu V. Language identification: A tutorial. IEEE Circuits and Systems Magazine 2011, 11(2):82-108. 10.1109/MCAS.2011.941081.
- (2011) IEEE Circuits and Systems Magazine , vol.11 , Issue.2 , pp. 82-108
- Ambikairajah, E.¹ Li, H.² Wang, L.³ Yin, B.⁴ Sethu, V.⁵

2
- 66749166374
- Dialect identification: The effects of region of origin and amount of experience
- Baker W., Eddington D., Nay L. Dialect identification: The effects of region of origin and amount of experience. American Speech 2009, 84(1):48-71.
- (2009) American Speech , vol.84 , Issue.1 , pp. 48-71
- Baker, W.¹ Eddington, D.² Nay, L.³

3
- 84858969967
- Automatic dialect and accent recognition and its application to speech recognition
- Ph.D. thesis, Columbia University
- Biadsy F. Automatic dialect and accent recognition and its application to speech recognition 2011, (Ph.D. thesis), Columbia University.
- (2011) Automatic dialect and accent recognition and its application to speech recognition
- Biadsy, F.¹

4
- 78650028960
- Pattern recognition and machine learning
- Springer
- Bishop C. Pattern recognition and machine learning. Information science and statistics 2007, Springer. 1st ed.
- (2007) Information science and statistics
- Bishop, C.¹

5
- 85073192959
- Description and analysis of the Brno276 system for LRE2011
- International Speech Communication Association
- Brummer N., Cumani S., Glembek O., Karafiát M., Matejka P., Pesan J., et al. Description and analysis of the Brno276 system for LRE2011. Proceedings of Odyssey 2012: The speaker and language recognition workshop 2012, 216-223. International Speech Communication Association.
- (2012) Proceedings of Odyssey 2012: The speaker and language recognition workshop , pp. 216-223
- Brummer, N.¹ Cumani, S.² Glembek, O.³ Karafiát, M.⁴ Matejka, P.⁵ Pesan, J.⁶

6
- 80052728797
- Deep big simple neural nets excel on handwritten digit recognition
- CoRR abs/1003.0358
- Ciresan, D., Meier, U., Gambardella, L., Schmidhuber, J. (2010). Deep big simple neural nets excel on handwritten digit recognition, CoRR abs/1003.0358.
- (2010) Deep big simple neural nets excel on handwritten digit recognition
- Ciresan, D.¹ Meier, U.² Gambardella, L.³ Schmidhuber, J.⁴

7
- 0024902957
- Language identification with neural networks: A feasibility study
- Cole, R., Inouye, J., Muthusamy, Y., Gopalakrishnan, M. (1989). Language identification with neural networks: A feasibility study. In IEEE Pacific Rim conference on communications, computers and signal processing, 1989. Conference proceeding (pp. 525-529). http://dx.doi.org/10.1109/PACRIM.1989.48417.
- (1989) IEEE Pacific Rim conference on communications, computers and signal processing, 1989. Conference proceeding , pp. 525-529
- Cole, R.¹ Inouye, J.² Muthusamy, Y.³ Gopalakrishnan, M.⁴

8
- 84877760312
- Large scale distributed deep networks
- P. Bartlett, F. Pereira, C. Burges, L. Bottou, K. Weinberger (Eds.)
- Dean J., Corrado G., Monga R., Chen K., Devin M., Le Q., et al. Large scale distributed deep networks. Advances in neural information processing systems 25 2012, 1232-1240. P. Bartlett, F. Pereira, C. Burges, L. Bottou, K. Weinberger (Eds.).
- (2012) Advances in neural information processing systems 25 , pp. 1232-1240
- Dean, J.¹ Corrado, G.² Monga, R.³ Chen, K.⁴ Devin, M.⁵ Le, Q.⁶

9
- 79951609039
- Front-end factor analysis for speaker verification
- Dehak N., Kenny P., Dehak R., Dumouchel P., Ouellet P. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech and Language Processing 2011, 19(4):788-798.
- (2011) IEEE Transactions on Audio, Speech and Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

10
- 84865750857
- Language recognition via i-vectors and dimensionality reduction
- ISCA
- Dehak N., Torres-Carrasquillo P.A., Reynolds D.A., Dehak R. Language recognition via i-vectors and dimensionality reduction. INTERSPEECH 2011, 857-860. ISCA.
- (2011) INTERSPEECH , pp. 857-860
- Dehak, N.¹ Torres-Carrasquillo, P.A.² Reynolds, D.A.³ Dehak, R.⁴

11
- 78049390307
- A comparison of approaches for modeling prosodic features in speaker recognition
- Ferrer, L., Scheffer, N., Shriberg, E. (2010). A comparison of approaches for modeling prosodic features in speaker recognition. In International conference on acoustics, speech, and signal processing (pp. 4414-4417). http://dx.doi.org/10.1109/ICASSP.2010.5495632.
- (2010) International conference on acoustics, speech, and signal processing , pp. 4414-4417
- Ferrer, L.¹ Scheffer, N.² Shriberg, E.³

12
- 78649305128
- Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009
- Gonzalez-Dominguez J., Lopez-Moreno I., Franco-Pedroso J., Ramos D., Toledano D., Gonzalez-Rodriguez J. Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009. IEEE Journal of Selected Topics in Signal Processing 2010, 4(6):1084-1093. 10.1109/JSTSP.2010.2076071.
- (2010) IEEE Journal of Selected Topics in Signal Processing , vol.4 , Issue.6 , pp. 1084-1093
- Gonzalez-Dominguez, J.¹ Lopez-Moreno, I.² Franco-Pedroso, J.³ Ramos, D.⁴ Toledano, D.⁵ Gonzalez-Rodriguez, J.⁶

13
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky H. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 1990, 87(4):1738-1752.
- (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

14
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Hinton G., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 2012, 29(6):82-97. 10.1109/MSP.2012.2205597.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶

15
- 58349106697
- A study of interspeaker variability in speaker verification
- Kenny P., Ouellet P., Dehak N., Gupta V., Dumouchel P. A study of interspeaker variability in speaker verification. IEEE Transactions on Audio, Speech and Language Processing 2008, 16(5):980-988.
- (2008) IEEE Transactions on Audio, Speech and Language Processing , vol.16 , Issue.5 , pp. 980-988
- Kenny, P.¹ Ouellet, P.² Dehak, N.³ Gupta, V.⁴ Dumouchel, P.⁵

16
- 33750147566
- Neural network classifiers for language identification using phonotactic and prosodic features
- Leena, M., Srinivasa Rao, K., Yegnanarayana, B. (2005). Neural network classifiers for language identification using phonotactic and prosodic features. In Proceedings of 2005 international conference on intelligent sensing and information processing, 2005 (pp. 404-408). http://dx.doi.org/10.1109/ICISIP.2005.1529486.
- (2005) Proceedings of 2005 international conference on intelligent sensing and information processing, 2005 , pp. 404-408
- Leena, M.¹ Srinivasa Rao, K.² Yegnanarayana, B.³

17
- 84876676725
- Spoken language recognition: From fundamentals to practice
- Li H., Ma B., Lee K.A. Spoken language recognition: From fundamentals to practice. Proceedings of the IEEE 2013, 101(5):1136-1159. 10.1109/JPROC.2012.2237151.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1136-1159
- Li, H.¹ Ma, B.² Lee, K.A.³

18
- 84900522099
- Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification
- Li, M., Narayanan, S. (2014). Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification. Computer Speech and Language.
- (2014) Computer Speech and Language
- Li, M.¹ Narayanan, S.²

19
- 84863809213
- Dialect identification: Impact of difference between read versus spontaneous speech
- Liu, G., Lei, Y., Hansen, J. H. (2010). Dialect identification: Impact of difference between read versus spontaneous speech. In EUSIPCO-2010 (pp. 2003-2006).
- (2010) EUSIPCO-2010 , pp. 2003-2006
- Liu, G.¹ Lei, Y.² Hansen, J.H.³

20
- 84886586505
- A linguistic data acquisition front-end for language recognition evaluation
- Liu, G., Zhang, C., Hansen, J. H. L. (2012). A linguistic data acquisition front-end for language recognition evaluation. In Proc. Odyssey, Singapore.
- (2012) Proc. Odyssey, Singapore
- Liu, G.¹ Zhang, C.² Hansen, J.H.L.³

21
- 84905247598
- Automatic language identification using deep neural networks
- Lopez-Moreno, I., Gonzalez-Dominguez, J., Plchot, O., Martinez, D., Gonzalez-Rodriguez, J., Moreno, P. (2014). Automatic language identification using deep neural networks. In IEEE International conference on acoustics, speech, and signal processing.
- (2014) IEEE International conference on acoustics, speech, and signal processing
- Lopez-Moreno, I.¹ Gonzalez-Dominguez, J.² Plchot, O.³ Martinez, D.⁴ Gonzalez-Rodriguez, J.⁵ Moreno, P.⁶

22
- 84890467091
- Prosodic features and formant modeling for an ivector-based language recognition system
- Martinez, D., Lleida, E., Ortega, A., Miguel, A. (2013). Prosodic features and formant modeling for an ivector-based language recognition system. In 2013 IEEE International conference on acoustics, speech and signal processing, ICASSP (pp. 6847-6851). http://dx.doi.org/10.1109/ICASSP.2013.6638988.
- (2013) 2013 IEEE International conference on acoustics, speech and signal processing, ICASSP , pp. 6847-6851
- Martinez, D.¹ Lleida, E.² Ortega, A.³ Miguel, A.⁴

23
- 84865769863
- Language recognition in ivectors space
- ISCA
- Martinez D., Plchot O., Burget L., Glembek O., Matejka P. Language recognition in ivectors space. INTERSPEECH 2011, 861-864. ISCA.
- (2011) INTERSPEECH , pp. 861-864
- Martinez, D.¹ Plchot, O.² Burget, L.³ Glembek, O.⁴ Matejka, P.⁵

24
- 0141676589
- New entropy-based combination rules in HMM/ANN multi-stream ASR
- Misra, H., Bourlard, H., Tyagi, V. (2003). New entropy-based combination rules in HMM/ANN multi-stream ASR. In 2003 IEEE International conference on acoustics, speech, and signal processing, 2003. Proceedings, ICASSP'03, Vol. 2 (pp. II-741-744). http://dx.doi.org/10.1109/ICASSP.2003.1202473.
- (2003) 2003 IEEE International conference on acoustics, speech, and signal processing, 2003. Proceedings, ICASSP'03 , vol.2 , pp. II-741-744
- Misra, H.¹ Bourlard, H.² Tyagi, V.³

25
- 84055211743
- Acoustic modeling using deep belief networks
- Mohamed A., Dahl G., Hinton G. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech and Language Processing 2012, 20(1):14-22. 10.1109/TASL.2011.2109382.
- (2012) IEEE Transactions on Audio, Speech and Language Processing , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

26
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- IEEE
- Mohamed A.-R., Hinton G.E., Penn G. Understanding how deep belief networks perform acoustic modelling. ICASSP 2012, 4273-4276. IEEE.
- (2012) ICASSP , pp. 4273-4276
- Mohamed, A.¹ Hinton, G.E.² Penn, G.³

27
- 79951804013
- Deep learning for spoken language identification
- Montavon, G. (2009). Deep learning for spoken language identification. In NIPS Workshop on deep learning for speech recognition and related applications.
- (2009) NIPS Workshop on deep learning for speech recognition and related applications
- Montavon, G.¹

28
- 0028516964
- Reviewing automatic language identification
- Muthusamy Y., Barnard E., Cole R. Reviewing automatic language identification. IEEE Signal Processing Magazine 1994, 11(4):33-41. 10.1109/79.317925.
- (1994) IEEE Signal Processing Magazine , vol.11 , Issue.4 , pp. 33-41
- Muthusamy, Y.¹ Barnard, E.² Cole, R.³

29
- 84905252473
- NIST (2009). The 2009 NIST SLR Evaluation Plan. http://www.itl.nist.gov/iad/mig/tests/lre/2009/LRE09_EvalPlan_v6.pdf.
- (2009) The 2009 NIST SLR Evaluation Plan

30
- 0029355999
- Speaker identification and verification using Gaussian mixture speaker models
- Reynolds D. Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 1995, 17(1-2):91-108.
- (1995) Speech Communication , vol.17 , Issue.1-2 , pp. 91-108
- Reynolds, D.¹

31
- 0141744710
- The SuperSID project: Exploiting high-level information for high-accuracy speaker recognition
- Reynolds, D., Andrews, W., Campbell, J., Navratil, J., Peskin, B., Adami, A. et al. (2003). The SuperSID project: Exploiting high-level information for high-accuracy speaker recognition. In IEEE International conference on acoustics, speech, and signal processing, Vol. 4 (pp. 784-787).
- (2003) IEEE International conference on acoustics, speech, and signal processing , vol.4 , pp. 784-787
- Reynolds, D.¹ Andrews, W.² Campbell, J.³ Navratil, J.⁴ Peskin, B.⁵ Adami, A.⁶

32
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- Saon, G., Soltau, H., Nahamoo, D., Picheny, M. (2013). Speaker adaptation of neural network acoustic models using i-vectors. In 2013 IEEE Workshop on automatic speech recognition and understanding, ASRU (pp. 55-59). http://dx.doi.org/10.1109/ASRU.2013.6707705.
- (2013) 2013 IEEE Workshop on automatic speech recognition and understanding, ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

33
- 80051616158
- The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition
- Sturim, D., Campbell, W., Dehak, N., Karam, Z., McCree, A., Reynolds, D. et al. (2011). The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition. In 2011 IEEE International conference on acoustics, speech and signal processing, ICASSP (pp. 5272-5275). http://dx.doi.org/10.1109/ICASSP.2011.5947547.
- (2011) 2011 IEEE International conference on acoustics, speech and signal processing, ICASSP , pp. 5272-5275
- Sturim, D.¹ Campbell, W.² Dehak, N.³ Karam, Z.⁴ McCree, A.⁵ Reynolds, D.⁶

34
- 78049357114
- The MITLL NIST LRE 2009 language recognition system
- Torres-Carrasquillo, P., Singer, E., Gleason, T., McCree, A., Reynolds, D., Richardson, F. et al. (2010). The MITLL NIST LRE 2009 language recognition system. In 2010 IEEE International conference on acoustics speech and signal processing, ICASSP (pp. 4994-4997). http://dx.doi.org/10.1109/ICASSP.2010.5495080.
- (2010) 2010 IEEE International conference on acoustics speech and signal processing, ICASSP , pp. 4994-4997
- Torres-Carrasquillo, P.¹ Singer, E.² Gleason, T.³ McCree, A.⁴ Reynolds, D.⁵ Richardson, F.⁶

35
- 85009275225
- Approaches to language identification using Gaussian mixture models and shifted delta cepstral features
- Torres-Carrasquillo, P. A., Singer, E., Kohler, M. A., Deller, J. R. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In ICSLP, Vol. 1 (pp. 89-92).
- (2002) ICSLP , vol.1 , pp. 89-92
- Torres-Carrasquillo, P.A.¹ Singer, E.² Kohler, M.A.³ Deller, J.R.⁴

36
- 84867204678
- Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition
- Torres-Carrasquillo, P. A., Sturim, D. E., Reynolds, D. A., McCree, A. (2008). Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition. In INTERSPEECH (pp. 723-726).
- (2008) INTERSPEECH , pp. 723-726
- Torres-Carrasquillo, P.A.¹ Sturim, D.E.² Reynolds, D.A.³ McCree, A.⁴

37
- 85032782045
- Deep learning and its applications to signal and information processing [exploratory DSP]
- Yu D., Deng L. Deep learning and its applications to signal and information processing [exploratory DSP]. IEEE Signal Processing Magazine 2011, 28(1):145-154. 10.1109/MSP.2010.939038.
- (2011) IEEE Signal Processing Magazine , vol.28 , Issue.1 , pp. 145-154
- Yu, D.¹ Deng, L.²

38
- 0029733178
- Comparison of four approaches to automatic language identification of telephone speech
- Zissman M. Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing 1996, 4(1):31-44.
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.1 , pp. 31-44
- Zissman, M.¹

39
- 0035427178
- Automatic language identification
- Zissman M.A., Berkling K. Automatic language identification. Speech Communication 2001, 35(1-2):115-124.
- (2001) Speech Communication , vol.35 , Issue.1-2 , pp. 115-124
- Zissman, M.A.¹ Berkling, K.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.