



Volume 24, Issue 4, 2016, Pages 768-783

Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance

Author keywords

Anti spoofing; Countermeasure; Security; Speaker verification; Speech synthesis; Spoofing attack; Voice conversion

Indexed keywords

BENCHMARKING; SPEECH PROCESSING; SPEECH SYNTHESIS

EID: 84962901047     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2016.2526653     Document Type: Article
Times cited : (99)

References (78)
  • 2
    • 84959177524
    • Human vs machine spoofing detection on wideband and narrowband data
    • M. Wester, Z. Wu, and J. Yamagishi, "Human vs machine spoofing detection on wideband and narrowband data," in Proc. Interspeech, 2015.
    • (2015) Proc. Interspeech
    • Wester, M.1    Wu, Z.2    Yamagishi, J.3
  • 3
    • 84862007811
    • Voice biometrics-The Asia Pacific experience
    • P. Golden, "Voice biometrics-The Asia Pacific experience," Biom. Technol. Today, vol. 2012, no. 4, pp. 10-11, 2012.
    • (2012) Biom. Technol. Today , vol.2012 , Issue.4 , pp. 10-11
    • Golden, P.1
  • 4
    • 84875163582
    • Talking passwords: Voice biometrics for data access and security
    • M. Khitrov, "Talking passwords: Voice biometrics for data access and security," Biom. Technol. Today, vol. 2013, no. 2, pp. 9-11, 2013.
    • (2013) Biom. Technol. Today , vol.2013 , Issue.2 , pp. 9-11
    • Khitrov, M.1
  • 5
    • 84880875127
    • Voice biometrics: Success stories, success factors and what's next
    • B. Beranek, "Voice biometrics: Success stories, success factors and what's next," Biom. Technol. Today, vol. 2013, no. 7, pp. 9-11, 2013.
    • (2013) Biom. Technol. Today , vol.2013 , Issue.7 , pp. 9-11
    • Beranek, B.1
  • 7
    • 84929796028
    • Surveying the development of biometric user authentication on mobile phones
    • W. Meng, D. Wong, S. Furnell, and J. Zhou, "Surveying the development of biometric user authentication on mobile phones," IEEE Commun. Surv. Tuts., vol. 17, no. 3, pp. 1268-1293, 3rd Quart. 2015.
    • (2015) IEEE Commun. Surv. Tuts. , vol.17 , Issue.3 , pp. 1268-1293
    • Meng, W.1    Wong, D.2    Furnell, S.3    Zhou, J.4
  • 8
    • 84919922238
    • Spoofing and countermeasures for speaker verification: A survey
    • Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, and H. Li, "Spoofing and countermeasures for speaker verification: A survey," Speech Commun., vol. 66, pp. 130-153, 2015.
    • (2015) Speech Commun. , vol.66 , pp. 130-153
    • Wu, Z.1    Evans, N.2    Kinnunen, T.3    Yamagishi, J.4    Alegre, F.5    Li, H.6
  • 11
    • 84929611508
    • Automatic versus human speaker verification: The case of voice mimicry
    • R. G. Hautamäki, T. Kinnunen, V. Hautamäki, and A.-M. Laukkanen, "Automatic versus human speaker verification: The case of voice mimicry," Speech Commun., vol. 72, pp. 13-31, 2015.
    • (2015) Speech Commun. , vol.72 , pp. 13-31
    • Hautamäki, R.G.1    Kinnunen, T.2    Hautamäki, V.3    Laukkanen, A.-M.4
  • 14
    • 84949494025
    • On the study of replay and voice conversion attacks to text-dependent speaker verification
    • Z. Wu and H. Li, "On the study of replay and voice conversion attacks to text-dependent speaker verification," Multimedia Tools Appl., 2015, doi:10.1007/s11042-015-3080-9.
    • (2015) Multimedia Tools Appl.
    • Wu, Z.1    Li, H.2
  • 16
    • 84865369980
    • Evaluation of speaker verification security and detection of HMM-based synthetic speech
    • P. L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, and I. Saratxaga, "Evaluation of speaker verification security and detection of HMM-based synthetic speech," IEEE Trans. Audio Speech Lang. Process., vol. 20, no. 8, pp. 2280-2290, Oct. 2012.
    • (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.8 , pp. 2280-2290
    • De Leon, P.L.1    Pucher, M.2    Yamagishi, J.3    Hernaez, I.4    Saratxaga, I.5
  • 17
    • 65349113532
    • Artificial impostor voice transformation effects on false acceptance rates
    • J.-F. Bonastre, D. Matrouf, and C. Fredouille, "Artificial impostor voice transformation effects on false acceptance rates," in Proc. Interspeech, 2007.
    • (2007) Proc. Interspeech
    • Bonastre, J.-F.1    Matrouf, D.2    Fredouille, C.3
  • 20
    • 84906234851
    • Voice transformation-based spoofing of text dependent speaker verification systems
    • Z. Kons and H. Aronowitz, "Voice transformation-based spoofing of text dependent speaker verification systems," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Kons, Z.1    Aronowitz, H.2
  • 21
    • 84956723787
    • Voice conversion versus speaker verification: An overview
    • Z. Wu and H. Li, "Voice conversion versus speaker verification: An overview," APSIPA Trans. Signal Inf. Process., vol. 3, p. e17, 2014.
    • (2014) APSIPA Trans. Signal Inf. Process , vol.3 , pp. e17
    • Wu, Z.1    Li, H.2
  • 23
    • 1942512336
    • Imposture using synthetic speech against speaker verification based on spectrum and pitch
    • T. Masuko, K. Tokuda, and T. Kobayashi, "Imposture using synthetic speech against speaker verification based on spectrum and pitch," in Proc. Interspeech, 2000.
    • (2000) Proc. Interspeech
    • Masuko, T.1    Tokuda, K.2    Kobayashi, T.3
  • 24
    • 0012330750
    • The design for the Wall Street Journal-based CSR corpus
    • D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Workshop Speech Nat. Lang., 1992, pp. 357-362.
    • (1992) Proc. Workshop Speech Nat. Lang. , pp. 357-362
    • Paul, D.B.1    Baker, J.M.2
  • 29
    • 84878402831
    • Synthetic speech discrimination using pitch pattern statistics derived from image analysis
    • P. L. De Leon, B. Stewart, and J. Yamagishi, "Synthetic speech discrimination using pitch pattern statistics derived from image analysis," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • De Leon, P.L.1    Stewart, B.2    Yamagishi, J.3
  • 32
    • 33947167478
    • Face description with local binary patterns: Application to face recognition
    • T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037-2041, Dec. 2006.
    • (2006) IEEE Trans. Pattern Anal. Mach. Intell. , vol.28 , Issue.12 , pp. 2037-2041
    • Ahonen, T.1    Hadid, A.2    Pietikainen, M.3
  • 33
    • 84906244272
    • A new speaker verification spoofing countermeasure based on local binary patterns
    • F. Alegre, R. Vipperla, A. Amehraye, and N. Evans, "A new speaker verification spoofing countermeasure based on local binary patterns," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Alegre, F.1    Vipperla, R.2    Amehraye, A.3    Evans, N.4
  • 34
    • 84893797780
    • A one-class classification approach to generalised speaker verification spoofing countermeasures using local binary patterns
    • F. Alegre, A. Amehraye, and N. Evans, "A one-class classification approach to generalised speaker verification spoofing countermeasures using local binary patterns," in Proc. Int. Conf. Biom.: Theory Appl. Syst. (BTAS), 2013.
    • (2013) Proc. Int. Conf. Biom.: Theory Appl. Syst. (BTAS)
    • Alegre, F.1    Amehraye, A.2    Evans, N.3
  • 35
    • 84878410960
    • Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition
    • Z. Wu, E. S. Chng, and H. Li, "Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Wu, Z.1    Chng, E.S.2    Li, H.3
  • 36
    • 84910072494
    • A cross-vocoder study of speaker independent synthetic speech detection using phase information
    • J. Sanchez, I. Saratxaga, I. Hernaez, E. Navas, and D. Erro, "A cross-vocoder study of speaker independent synthetic speech detection using phase information," in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Sanchez, J.1    Saratxaga, I.2    Hernaez, I.3    Navas, E.4    Erro, D.5
  • 40
    • 84959103968
    • Joint speaker verification and antispoofing in the i-vector space
    • A. Sizov, E. Khoury, T. Kinnunen, Z. Wu, and S. Marcel, "Joint speaker verification and antispoofing in the i-vector space," IEEE Trans. Inf. Forensics Secur., vol. 10, no. 4, pp. 821-832, Apr. 2015.
    • (2015) IEEE Trans. Inf. Forensics Secur. , vol.10 , Issue.4 , pp. 821-832
    • Sizov, A.1    Khoury, E.2    Kinnunen, T.3    Wu, Z.4    Marcel, S.5
  • 42
    • 84959130948
    • ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge
    • Z. Wu et al., "ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge," in Proc. Interspeech, 2015.
    • (2015) Proc. Interspeech
    • Wu, Z.1
  • 44
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 45
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    Cheveigné, A.3
  • 46
    • 84865777002
    • The CSTR/EMIME HTS system for Blizzard Challenge 2010
    • J. Yamagishi and O. Watts, "The CSTR/EMIME HTS system for Blizzard Challenge 2010," in Proc. Blizzard Challenge, Kyoto, Japan, Sep. 2010.
    • (2010) Proc. Blizzard Challenge
    • Yamagishi, J.1    Watts, O.2
  • 47
    • 84897393748
    • Structural Bayesian linear regression for hidden Markov models
    • S. Watanabe, A. Nakamura, and B.-H. Juang, "Structural Bayesian linear regression for hidden Markov models," J. Signal Process. Syst., vol. 74, no. 3, pp. 341-358, 2014.
    • (2014) J. Signal Process. Syst. , vol.74 , Issue.3 , pp. 341-358
    • Watanabe, S.1    Nakamura, A.2    Juang, B.-H.3
  • 48
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, May 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 49
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol. 9, no. 5-6, pp. 453-468, 1990.
    • (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-468
    • Moulines, E.1    Charpentier, F.2
  • 52
    • 0142247093
    • The German text-to-speech synthesis system MARY: A tool for research, development and teaching
    • M. Schröder and J. Trouvain, "The German text-to-speech synthesis system MARY: A tool for research, development and teaching," Int. J. Speech Technol., vol. 6, no. 4, pp. 365-377, 2003.
    • (2003) Int. J. Speech Technol. , vol.6 , Issue.4 , pp. 365-377
    • Schröder, M.1    Trouvain, J.2
  • 56
    • 84878378722
    • Effects of speaker adaptive training on tensor-based arbitrary speaker conversion
    • D. Saito, N. Minematsu, and K. Hirose, "Effects of speaker adaptive training on tensor-based arbitrary speaker conversion," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Saito, D.1    Minematsu, N.2    Hirose, K.3
  • 57
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 58
    • 84906276055
    • Exemplar-based unit selection for voice conversion utilizing temporal information
    • Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li, "Exemplar-based unit selection for voice conversion utilizing temporal information," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Wu, Z.1    Virtanen, T.2    Kinnunen, T.3    Chng, E.S.4    Li, H.5
  • 59
    • 84878390910
    • Implementation of computationally efficient real-time voice conversion
    • T. Toda, T. Muramatsu, and H. Banno, "Implementation of computationally efficient real-time voice conversion," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Toda, T.1    Muramatsu, T.2    Banno, H.3
  • 61
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digit. Signal Process., vol. 10, no. 1, pp. 19-41, 2000.
    • (2000) Digit. Signal Process , vol.10 , Issue.1 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 64
    • 84864277561
    • AudioSeg: Audio segmentation toolkit, release 1.2
    • G. Gravier, M. Betser, and M. Ben, "AudioSeg: Audio segmentation toolkit, release 1.2," IRISA, France, Jan. 2010.
    • (2010) IRISA
    • Gravier, G.1    Betser, M.2    Ben, M.3
  • 65
    • 84910024698
    • MSR identity toolbox v1.0: A MATLAB toolbox for speaker recognition research
    • S. O. Sadjadi, M. Slaney, and L. Heck, "MSR identity toolbox v1.0: A MATLAB toolbox for speaker recognition research," in Proc. IEEE Signal Process. Soc. Speech Lang. Tech. Committee Newsl., November 2013.
    • (2013) Proc. IEEE Signal Process. Soc. Speech Lang. Tech. Committee Newsl.
    • Sadjadi, S.O.1    Slaney, M.2    Heck, L.3
  • 66
    • 84901846660
    • From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification
    • P. Rajan, A. Afanasyev, V. Hautamäki, and T. Kinnunen, "From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification," Digit. Signal Process., vol. 31, pp. 93-101, 2014.
    • (2014) Digit. Signal Process , vol.31 , pp. 93-101
    • Rajan, P.1    Afanasyev, A.2    Hautamäki, V.3    Kinnunen, T.4
  • 67
    • 4544290141
    • Usefulness of phase spectrum in human speech perception
    • K. K. Paliwal and L. D. Alsteris, "Usefulness of phase spectrum in human speech perception," in Proc. Interspeech, 2003.
    • (2003) Proc. Interspeech
    • Paliwal, K.K.1    Alsteris, L.D.2
  • 68
    • 51849100937
    • Significance of the modified group delay feature in speech recognition
    • R. M. Hegde et al., "Significance of the modified group delay feature in speech recognition," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 1, pp. 190-202, Jan. 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.1 , pp. 190-202
    • Hegde, R.M.1
  • 71
    • 84925160976
    • Text-to-Speech Synthesis
    • P. Taylor, Text-to-Speech Synthesis. Cambridge, U.K.: Cambridge Univ. Press, 2009.
    • (2009) Text-to-Speech Synthesis
    • Taylor, P.1
  • 72
    • 27544482501
    • Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification
    • A. Ogihara, H. Unno, and A. Shiozakai, "Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification," IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. 88, no. 1, pp. 280-286, Jan. 2005.
    • (2005) IEICE Trans. Fundam. Electron. Commun. Comput. Sci. , vol.88 , Issue.1 , pp. 280-286
    • Ogihara, A.1    Unno, H.2    Shiozakai, A.3
  • 73
    • 51449086024
    • Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006
    • N. Brummer et al., "Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 7, pp. 2072-2084, Sep. 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.7 , pp. 2072-2084
    • Brummer, N.1
  • 75
    • 0033889739
    • Speaker verification by human listeners: Experiments comparing human and machine performance using the NIST 1998 speaker evaluation data
    • A. Schmidt-Nielsen and T. H. Crystal, "Speaker verification by human listeners: Experiments comparing human and machine performance using the NIST 1998 speaker evaluation data," Digit. Signal Process., vol. 10, no. 1, pp. 249-266, 2000.
    • (2000) Digit. Signal Process , vol.10 , Issue.1 , pp. 249-266
    • Schmidt-Nielsen, A.1    Crystal, T.H.2
  • 77
    • 77956899791
    • Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis
    • Y. Pantazis, Y. Stylianou, and E. Klabbers, "Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis," in Proc. Interspeech, 2005.
    • (2005) Proc. Interspeech
    • Pantazis, Y.1    Stylianou, Y.2    Klabbers, E.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.