Volume 3, Issue , 2014, Pages

Voice conversion versus speaker verification: An overview

Author keywords

Anti spoofing; Countermeasure; Security; Speaker verification; Spoofing attack; Voice conversion

Indexed keywords

AUTHENTICATION; SPEECH PROCESSING;

EID: 84956723787     PISSN: None     EISSN: 20487703     Source Type: Journal    
DOI: 10.1017/ATSIP.2014.17     Document Type: Review
Times cited : (50)

References (114)
  • 2
    • 33751542948
    • Speaker verification security improvement by means of speech watermarking
    • Faundez-Zanuy, M.; Hagmüller, M.; Kubin, G.: Speaker verification security improvement by means of speech watermarking. Speech Commun., 48 (12) (2006), 1608-1619.
    • (2006) Speech Commun. , vol.48 , Issue.12 , pp. 1608-1619
    • Faundez-Zanuy, M.1    Hagmüller, M.2    Kubin, G.3
  • 4
    • 84867605072
    • Speaker verification performance degradation against spoofing and tampering attacks
    • Villalba, J.; Lleida, E.: Speaker verification performance degradation against spoofing and tampering attacks, in FALA 10 Workshop, 2010, 131-134.
    • (2010) FALA 10 Workshop , pp. 131-134
    • Villalba, J.1    Lleida, E.2
  • 8
    • 84906213805
    • I-vectors meet imitators: On vulnerability of speaker verification systems against voice mimicry
    • Hautamäki, R.G.; Kinnunen, T.; Hautamäki, V.; Leino, T.; Laukkanen, A.-M.: I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry, in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Hautamäki, R.G.1    Kinnunen, T.2    Hautamäki, V.3    Leino, T.4    Laukkanen, A.-M.5
  • 10
    • 67651002140
    • Statistical parametric speech synthesis
    • Zen, H.; Tokuda, K.; Black, A.W.: Statistical parametric speech synthesis. Speech Commun., 51 (11) (2009), 1039-1064.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 11
    • 84871382567
    • A unified trajectory tiling approach to high quality speech rendering
    • Qian, Y.; Soong, F.K.; Yan, Z.-J.: A unified trajectory tiling approach to high quality speech rendering. IEEE Trans. Audio, Speech, Lang. Process., 21 (1-2) (2013), 280-290.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.1-2 , pp. 280-290
    • Qian, Y.1    Soong, F.K.2    Yan, Z.-J.3
  • 16
    • 67650854725
    • Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
    • Yamagishi, J.; Kobayashi, T.; Nakano, Y.; Ogata, K.; Isogai, J.: Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans. Audio Speech Lang. Process., 17 (1) (2009), 66-83.
    • (2009) IEEE Trans. Audio Speech Lang. Process. , vol.17 , Issue.1 , pp. 66-83
    • Yamagishi, J.1    Kobayashi, T.2    Nakano, Y.3    Ogata, K.4    Isogai, J.5
  • 17
    • 70350125882
    • An overview of text-independent speaker recognition: From features to supervectors
    • Kinnunen, T.; Li, H.: An overview of text-independent speaker recognition: from features to supervectors. Speech Commun., 52 (1) (2010), 12-40.
    • (2010) Speech Commun. , vol.52 , Issue.1 , pp. 12-40
    • Kinnunen, T.1    Li, H.2
  • 18
    • 84919922238
    • Spoofing and countermeasures for speaker verification: A survey
    • Wu, Z.; Evans, N.; Kinnunen, T.; Yamagishi, J.; Alegre, F.; Li, H.: Spoofing and countermeasures for speaker verification: a survey. Speech Commun., 66 (2015), 130-153.
    • (2015) Speech Commun. , vol.66 , pp. 130-153
    • Wu, Z.1    Evans, N.2    Kinnunen, T.3    Yamagishi, J.4    Alegre, F.5    Li, H.6
  • 20
    • 77953725318
    • INCA algorithm for training voice conversion systems from nonparallel corpora
    • Erro, D.; Moreno, A.; Bonafonte, A.: INCA algorithm for training voice conversion systems from nonparallel corpora. IEEE Trans. Audio, Speech Lang. Process., 18 (5) (2010), 944-953.
    • (2010) IEEE Trans. Audio, Speech Lang. Process. , vol.18 , Issue.5 , pp. 944-953
    • Erro, D.1    Moreno, A.2    Bonafonte, A.3
  • 24
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • Toda, T.; Black, A.W.; Tokuda, K.: Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio Speech Lang. Process., 15 (8) (2007), 2222-2235.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 26
    • 78149260085
    • Continuous stochastic feature mapping based on trajectory HMMs
    • Zen, H.; Nankaku, Y.; Tokuda, K.: Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans. Audio Speech Lang. Process., 19 (2) (2011), 417-430.
    • (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.2 , pp. 417-430
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3
  • 27
    • 0029254176
    • Transformation of formants for voice conversion using artificial neural networks
    • Narendranath, M.; Murthy, H.A.; Rajendran, S.; Yegnanarayana, B.: Transformation of formants for voice conversion using artificial neural networks. Speech Commun., 16 (2) (1995), 207-216.
    • (1995) Speech Commun. , vol.16 , Issue.2 , pp. 207-216
    • Narendranath, M.1    Murthy, H.A.2    Rajendran, S.3    Yegnanarayana, B.4
  • 30
    • 84910087395
    • Sequence error (SE) minimization training of neural network for voice conversion
    • Xie, F.-L.; Qian, Y.; Fan, Y.; Soong, F.K.; Li, H.: Sequence error (SE) minimization training of neural network for voice conversion, in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Xie, F.-L.1    Qian, Y.2    Fan, Y.3    Soong, F.K.4    Li, H.5
  • 31
    • 84921735339
    • Voice conversion using deep neural networks with layer-wise generative training
    • Chen, L.-H.; Ling, Z.-H.; Liu, L.-J.; Dai, L.-R.: Voice conversion using deep neural networks with layer-wise generative training. IEEE Trans. Audio Speech Lang. Process., 22 (12) (2014), 1859-1872.
    • (2014) IEEE Trans. Audio Speech Lang. Process. , vol.22 , Issue.12 , pp. 1859-1872
    • Chen, L.-H.1    Ling, Z.-H.2    Liu, L.-J.3    Dai, L.-R.4
  • 32
    • 80053068819
    • Voice conversion using support vector regression
    • Song, P.; Bao, Y.Q.; Zhao, L.; Zou, C.R.: Voice conversion using support vector regression. Electron. Lett., 47 (18) (2011), 1045-1046.
    • (2011) Electron. Lett. , vol.47 , Issue.18 , pp. 1045-1046
    • Song, P.1    Bao, Y.Q.2    Zhao, L.3    Zou, C.R.4
  • 36
    • 0029256372
    • Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt
    • Mizuno, H.; Abe, M.: Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt. Speech Commun., 16 (2) (1995), 153-164.
    • (1995) Speech Commun. , vol.16 , Issue.2 , pp. 153-164
    • Mizuno, H.1    Abe, M.2
  • 39
    • 84857498745
    • Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora
    • Godoy, E.; Rosec, O.; Chonavel, T.: Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora. IEEE Trans. Audio Speech Lang. Process., 20 (4) (2012), 1313-1323.
    • (2012) IEEE Trans. Audio Speech Lang. Process. , vol.20 , Issue.4 , pp. 1313-1323
    • Godoy, E.1    Rosec, O.2    Chonavel, T.3
  • 40
    • 84872177757
    • Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
    • Erro, D.; Navas, E.; Hernaez, I.: Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans. Audio Speech Lang. Process., 21 (3) (2013), 556-566.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.3 , pp. 556-566
    • Erro, D.1    Navas, E.2    Hernaez, I.3
  • 44
    • 84906276055
    • Exemplar-based unit selection for voice conversion utilizing temporal information
    • Wu, Z.; Virtanen, T.; Kinnunen, T.; Chng, E.S.; Li, H.: Exemplar-based unit selection for voice conversion utilizing temporal information, in Proc. INTERSPEECH, 2013.
    • (2013) Proc. INTERSPEECH
    • Wu, Z.1    Virtanen, T.2    Kinnunen, T.3    Chng, E.S.4    Li, H.5
  • 48
    • 79959842826
    • Text-independent F0 transformation with non-parallel data for voice conversion
    • Wu, Z.-Z.; Kinnunen, T.; Chng, E.S.; Li, H.: Text-independent F0 transformation with non-parallel data for voice conversion, in Proc. INTERSPEECH, 2010.
    • (2010) Proc. INTERSPEECH
    • Wu, Z.-Z.1    Kinnunen, T.2    Chng, E.S.3    Li, H.4
  • 49
    • 77953726259
    • Pitch and duration transformation with non-parallel data
    • Lolive, D.; Barbot, N.; Boeffard, O.: Pitch and duration transformation with non-parallel data, in Speech Prosody 2008, Campinas, Brazil, May 2008, 111-114.
    • (2008) Speech Prosody 2008 , pp. 111-114
    • Lolive, D.1    Barbot, N.2    Boeffard, O.3
  • 51
    • 34047247202
    • Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
    • Wu, C.-H.; Hsia, C.-C.; Liu, T.-H.; Wang, J.-F.: Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans. Audio Speech Lang. Process., 14 (4) (2006), 1109-1116.
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.4 , pp. 1109-1116
    • Wu, C.-H.1    Hsia, C.-C.2    Liu, T.-H.3    Wang, J.-F.4
  • 53
    • 77953726259
    • Pitch and duration transformation with non-parallel data
    • Lolive, D.; Barbot, N.; Boeffard, O.: Pitch and duration transformation with non-parallel data, Proc. Speech Prosody, 2008, 111-114.
    • (2008) Proc. Speech Prosody , pp. 111-114
    • Lolive, D.1    Barbot, N.2    Boeffard, O.3
  • 54
    • 84867199771
    • Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching
    • Yutani, K.; Uto, Y.; Nankaku, Y.; Toda, T.; Tokuda, K.: Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching, in Proc. Interspeech, 2008.
    • (2008) Proc. Interspeech
    • Yutani, K.1    Uto, Y.2    Nankaku, Y.3    Toda, T.4    Tokuda, K.5
  • 55
    • 0031233424
    • Speaker recognition: A tutorial
    • Campbell, J.P. Jr: Speaker recognition: a tutorial. Proc. IEEE, 85 (9) (1997), 1437-1462.
    • (1997) Proc. IEEE , vol.85 , Issue.9 , pp. 1437-1462
    • Campbell, J.P.1
  • 56
    • 2942594475
    • A tutorial on text-independent speaker verification
    • Bimbot, F. et al.: A tutorial on text-independent speaker verification. EURASIP J. Appl. Signal Process., 2004 (2004), 430-451.
    • (2004) EURASIP J. Appl. Signal Process. , vol.2004 , pp. 430-451
    • Bimbot, F.1
  • 58
    • 79958818321
    • An overview of speaker identification: Accuracy and robustness issues
    • Togneri, R.; Pullella, D.: An overview of speaker identification: accuracy and robustness issues. IEEE Circuits Syst. Mag., 11 (2) (2011), 23-61.
    • (2011) IEEE Circuits Syst. Mag. , vol.11 , Issue.2 , pp. 23-61
    • Togneri, R.1    Pullella, D.2
  • 59
    • 84876676725
    • Spoken language recognition: From fundamentals to practice
    • Li, H.; Ma, B.; Lee, K.A.: Spoken language recognition: From fundamentals to practice. Proc. IEEE, 101 (5) (2013), 1136-1159.
    • (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1136-1159
    • Li, H.1    Ma, B.2    Lee, K.A.3
  • 60
    • 84897385841
    • Text-dependent speaker verification: Classifiers, databases and RSR2015
    • Larcher, A.; Lee, K.A.; Ma, B.; Li, H.: Text-dependent speaker verification: Classifiers, databases and RSR2015. Speech Commun., 60 (2014), 56-77.
    • (2014) Speech Commun. , vol.60 , pp. 56-77
    • Larcher, A.1    Lee, K.A.2    Ma, B.3    Li, H.4
  • 66
    • 64249101047
    • Modeling prosodic features with joint factor analysis for speaker verification
    • Dehak, N.; Dumouchel, P.; Kenny, P.: Modeling prosodic features with joint factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process., 15 (7) (2007), 2095-2103.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.7 , pp. 2095-2103
    • Dehak, N.1    Dumouchel, P.2    Kenny, P.3
  • 67
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • Reynolds, D.A.; Quatieri, T.F.; Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digital Signal Process., 10 (1) (2000), 19-41.
    • (2000) Digital Signal Process. , vol.10 , Issue.1 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 72
    • 33645887246
    • Support vector machines using GMM supervectors for speaker verification
    • Campbell, W.M.; Sturim, D.E.; Reynolds, D.A.: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett., 13 (5) (2006), 308-311.
    • (2006) IEEE Signal Process. Lett. , vol.13 , Issue.5 , pp. 308-311
    • Campbell, W.M.1    Sturim, D.E.2    Reynolds, D.A.3
  • 74
    • 14644412368
    • Speaker verification using sequence discriminant support vector machines
    • Wan, V.; Renals, S.: Speaker verification using sequence discriminant support vector machines. IEEE Trans. Speech Audio Process., 13 (2) (2005), 203-210.
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.2 , pp. 203-210
    • Wan, V.1    Renals, S.2
  • 76
  • 78
    • 0028204659
    • Speaker recognition using neural networks and conventional classifiers
    • Farrell, K.R.; Mammone, R.J.; Assaleh, K.T.: Speaker recognition using neural networks and conventional classifiers. IEEE Trans. Speech Audio Process., 2 (1) (1994), 194-205.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.1 , pp. 194-205
    • Farrell, K.R.1    Mammone, R.J.2    Assaleh, K.T.3
  • 79
    • 0041360472
    • Efficient text-independent speaker verification with structural Gaussian mixture models and neural network
    • Xiang, B.; Berger, T.: Efficient text-independent speaker verification with structural Gaussian mixture models and neural network. IEEE Trans. Speech Audio Process., 11 (5) (2003), 447-456.
    • (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.5 , pp. 447-456
    • Xiang, B.1    Berger, T.2
  • 82
    • 84910072392
    • A deep neural network speaker verification system targeting microphone speech
    • Lei, Y.; Ferrer, L.; McLaren, M.; Scheffer, N.: A deep neural network speaker verification system targeting microphone speech, in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Lei, Y.1    Ferrer, L.2    McLaren, M.3    Scheffer, N.4
  • 83
    • 84910028428
    • Application of convolutional neural networks to speaker recognition in noisy conditions
    • McLaren, M.; Lei, Y.; Scheffer, N.; Ferrer, L.: Application of convolutional neural networks to speaker recognition in noisy conditions, in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • McLaren, M.1    Lei, Y.2    Scheffer, N.3    Ferrer, L.4
  • 89
    • 84906234851
    • Voice transformation-based spoofing of text-dependent speaker verification systems
    • Kons, Z.; Aronowitz, H.: Voice transformation-based spoofing of text-dependent speaker verification systems, in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Kons, Z.1    Aronowitz, H.2
  • 91
    • 65349113532
    • Artificial impostor voice transformation effects on false acceptance rates
    • Bonastre, J.-F.; Matrouf, D.; Fredouille, C.: Artificial impostor voice transformation effects on false acceptance rates, in Proc. Interspeech, 2007.
    • (2007) Proc. Interspeech
    • Bonastre, J.-F.1    Matrouf, D.2    Fredouille, C.3
  • 98
    • 84878412793
    • Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals
    • Alegre, F.; Vipperla, R.; Evans, N.: Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals, in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Alegre, F.1    Vipperla, R.2    Evans, N.3
  • 99
    • 84872177757
    • Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
    • Erro, D.; Navas, E.; Hernaez, I.: Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans. Audio Speech Lang. Process. 21 (3) (2013), 556-566.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.3 , pp. 556-566
    • Erro, D.1    Navas, E.2    Hernaez, I.3
  • 100
    • 84878410960
    • Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition
    • Wu, Z.; Chng, E.S.; Li, H.: Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition, in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Wu, Z.1    Chng, E.S.2    Li, H.3
  • 103
    • 84906244272
    • A new speaker verification spoofing countermeasure based on local binary patterns
    • Alegre, F.; Vipperla, R.; Amehraye, A.; Evans, N.: A new speaker verification spoofing countermeasure based on local binary patterns, in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Alegre, F.1    Vipperla, R.2    Amehraye, A.3    Evans, N.4
  • 104
    • 33947167478
    • Face description with local binary patterns: Application to face recognition
    • Ahonen, T.; Hadid, A.; Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 28 (12) (2006), 2037-2041.
    • (2006) IEEE Trans. Pattern Anal. Mach. Intell. , vol.28 , Issue.12 , pp. 2037-2041
    • Ahonen, T.1    Hadid, A.2    Pietikainen, M.3
  • 108
    • 84896111913
    • Alize 3.0-open source toolkit for state-of-the-art speaker recognition
    • Larcher, A. et al.: Alize 3.0-open source toolkit for state-of-the-art speaker recognition, in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Larcher, A.1
  • 113
    • 84878465724
    • RSR2015: Database for text-dependent speaker verification using multiple pass-phrases
    • Larcher, A.; Lee, K.-A.; Ma, B.; Li, H.: RSR2015: database for text-dependent speaker verification using multiple pass-phrases, in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Larcher, A.1    Lee, K.-A.2    Ma, B.3    Li, H.4
  • 114
    • 84906275384
    • Vulnerability evaluation of speaker verification under voice conversion spoofing: The effect of text constraints
    • Wu, Z.; Larcher, A.; Lee, K.A.; Chng, E.S.; Kinnunen, T.; Li, H.: Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints, in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Wu, Z.1    Larcher, A.2    Lee, K.A.3    Chng, E.S.4    Kinnunen, T.5    Li, H.6


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.