Volume 53, Issue 4, 2011, Pages 552-566

On the detection of pitch marks using a robust multi-phase algorithm

Author keywords

Fundamental frequency; Glottal closure instant; Pitch mark; Speech signal polarity

Indexed keywords

CONFIDENCE SCORE; DETECTION ACCURACY; DETECTION ALGORITHM; FUNDAMENTAL FREQUENCIES; GLOTTAL CLOSURE INSTANT; NUMBER OF METHODS; PITCH MARK; SPEECH SIGNAL POLARITY; SPEECH SIGNALS; SPEECH WAVEFORMS; VOICED SPEECH;

EID: 79952362108     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2011.01.008     Document Type: Article
Times cited : (23)

References (40)
  • 1
    • 79952363141
    • Concatenative text-to-speech synthesis based on sinusoidal modelling
    • John Wiley and Sons Chichester
    • E.R. Banga, C.G. Mateo, and X.F. Salgado Concatenative text-to-speech synthesis based on sinusoidal modelling Improvements in Speech Synthesis 2002 John Wiley and Sons Chichester 52 63
    • (2002) Improvements in Speech Synthesis , pp. 52-63
    • Banga, E.R.1    Mateo, C.G.2    Salgado, X.F.3
  • 2
    • 85032421249
    • A novel discontinuity metric for unit selection text-to-speech synthesis
    • Workshop, Pittsburgh, PA, June 2004
    • Bellegarda, J.R., 2004. A novel discontinuity metric for unit selection text-to-speech synthesis. In: Proc. Fifth ISCA Speech Synth. Workshop, Pittsburgh, PA, June 2004, pp. 133-138.
    • (2004) Proc. Fifth ISCA Speech Synth. Workshop , pp. 133-138
    • Bellegarda, J.R.1
  • 4
    • 33947141366
    • A quantitative assessment of group delay methods for identifying glottal closures in voiced speech
    • M. Brooks, P.A. Naylor, and J. Gudnason A quantitative assessment of group delay methods for identifying glottal closures in voiced speech IEEE Trans. Audio Speech Lang. Process. 14 2 2006 456 466
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.2 , pp. 456-466
    • Brooks, M.1    Naylor, P.A.2    Gudnason, J.3
  • 6
    • 44949138061
    • Pitch marking based on an adaptable filter and a peak-valley estimation method
    • J.-H. Chen, and Y.-A. Kao Pitch marking based on an adaptable filter and a peak-valley estimation method Comput. Linguist. Chinese Lang. Process. 6 2 2001 1 12
    • (2001) Comput. Linguist. Chinese Lang. Process. , vol.6 , Issue.2 , pp. 1-12
    • Chen, J.-H.1    Kao, Y.-A.2
  • 7
    • 85009090884
    • A two-phase pitch marking method for TD-PSOLA synthesis
    • Jeju, Korea, 2004
    • Cheng-Yuan, L., Jyh-Shing, R.J., 2004. A two-phase pitch marking method for TD-PSOLA synthesis. In: Proc. INTERSPEECH, Jeju, Korea, 2004, pp. 1189-1192.
    • (2004) Proc. INTERSPEECH , pp. 1189-1192
    • Cheng-Yuan, L.1    Jyh-Shing, R.J.2
  • 8
    • 0036214787
    • YIN, a fundamental frequency estimator for speech and music
    • DOI 10.1121/1.1458024
    • A. de Cheveigné, and H. Kawahara YIN, a fundamental frequency estimator for speech and music J. Acoust. Soc. Amer. 111 4 2002 1917 1930 (Pubitemid 34297247)
    • (2002) Journal of the Acoustical Society of America , vol.111 , Issue.4 , pp. 1917-1930
    • De Cheveigné, A.1    Kawahara, H.2
  • 9
    • 68949165450
    • Corpus-based speech synthesis
    • T. Dutoit Corpus-based speech synthesis J. Benesty, M.M. Sondhi, Y. Huang, Springer Handbook of Speech Processing 2008 Springer Berlin, Heidelberg 437 455 (Chapter 21)
    • (2008) Springer Handbook of Speech Processing , pp. 437-455
    • Dutoit, T.1
  • 10
    • 0030205398
    • On the use of a hybrid harmonic/stochastic model for TTS synthesis-by-concatenation
    • DOI 10.1016/0167-6393(96)00029-5, PII S0167639396000295
    • T. Dutoit, and B. Gosselin On the use of a hybrid harmonic/stochastic model for TTS synthesis-by-concatenation Speech Comm. 19 2 1996 119 143 (Pubitemid 126363821)
    • (1996) Speech Communication , vol.19 , Issue.2 , pp. 119-143
    • Dutoit, T.1    Gosselin, B.2
  • 11
    • 33751399667
    • Poincaré pitch marks
    • M. Hagmüller, and G. Kubin Poincaré pitch marks Speech Comm. 48 12 2006 1650 1665
    • (2006) Speech Comm. , vol.48 , Issue.12 , pp. 1650-1665
    • Hagmüller, M.1    Kubin, G.2
  • 12
    • 0024906968
    • A diphone synthesis system based on time-domain prosodic modifications of speech
    • Glasgow, UK, May 1989
    • Hamon, C., Moulines, E., Charpentier, F., 1989. A diphone synthesis system based on time-domain prosodic modifications of speech. In: Proc. ICASSP, Glasgow, UK, May 1989, pp. 238-241.
    • (1989) Proc. ICASSP , pp. 238-241
    • Hamon, C.1    Moulines, E.2    Charpentier, F.3
  • 13
    • 56149097756
    • F0 transformation within the voice conversion framework
    • Antwerp, Belgium, 2007
    • Hanzlíček, Z., Matoušek, J., 2007. F0 transformation within the voice conversion framework. In: Proc. INTERSPEECH, Antwerp, Belgium, 2007, pp. 1961-1964.
    • (2007) Proc. INTERSPEECH , pp. 1961-1964
    • Hanzlíček, Z.1    Matoušek, J.2
  • 15
    • 77956738121
    • Hybrid electroglottograph and speech signal based algorithm for pitch marking
    • Antwerp, Belgium, 2007
    • Hussein, H., Jokisch, O., 2007. Hybrid electroglottograph and speech signal based algorithm for pitch marking. In: Proc. INTERSPEECH, Antwerp, Belgium, 2007, pp. 1653-1656.
    • (2007) Proc. INTERSPEECH , pp. 1653-1656
    • Hussein, H.1    Jokisch, O.2
  • 16
    • 84867209708
    • A hybrid speech signal based algorithm for pitch marking using finite state machines
    • Brisbane, Australia, 2008
    • Hussein, H., Wolff, M., Jokisch, O., Duckhorn, F., Strecha, G., Hoffmann, R., 2008. A hybrid speech signal based algorithm for pitch marking using finite state machines. In: Proc. INTERSPEECH, Brisbane, Australia, 2008, pp. 135-138.
    • (2008) Proc. INTERSPEECH , pp. 135-138
    • Hussein, H.1    Wolff, M.2    Jokisch, O.3    Duckhorn, F.4    Strecha, G.5    Hoffmann, R.6
  • 17
    • 84961779105
    • Enhancement of coded speech by constrained optimization
    • Tsukuba, Ibaraki, Japan, 2002
    • Kleijn, W.B., 2002. Enhancement of coded speech by constrained optimization. In: Proc. IEEE Workshop on Speech Coding, Tsukuba, Ibaraki, Japan, 2002, pp. 163-165.
    • (2002) Proc. IEEE Workshop on Speech Coding , pp. 163-165
    • Kleijn, W.B.1
  • 19
    • 56149118994
    • A robust multi-phase pitch-mark detection algorithm
    • Antwerp, Belgium, 2007
    • Legát, M., Matoušek, J., Tihelka, D., 2007. A robust multi-phase pitch-mark detection algorithm. In: Proc. INTERSPEECH, Antwerp, Belgium, 2007, pp. 1641-1644.
    • (2007) Proc. INTERSPEECH , pp. 1641-1644
    • Legát, M.1    Matoušek, J.2    Tihelka, D.3
  • 20
    • 0028417076
    • A Frobenius norm approach to glottal closure detection from the speech signal
    • C. Ma, Y. Kamp, and L.F. Willems A Frobenius norm approach to glottal closure detection from the speech signal IEEE Trans. Speech Audio Process. 2 2 1994 258 265
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.2 , pp. 258-265
    • Ma, C.1    Kamp, Y.2    Willems, L.F.3
  • 21
    • 84867215793
    • Automatic pitch-synchronous phonetic segmentation
    • Brisbane, Australia, 2008
    • Matoušek, J., Romportl, J., 2008. Automatic pitch-synchronous phonetic segmentation. In: Proc. INTERSPEECH, Brisbane, Australia, 2008, pp. 1626-1629.
    • (2008) Proc. INTERSPEECH , pp. 1626-1629
    • Matoušek, J.1    Romportl, J.2
  • 22
    • 85009132058
    • Design of speech corpus for text-to-speech synthesis
    • Aalborg, Denmark, 2001
    • Matoušek, J., Psutka, J., Krta, J., 2001. Design of speech corpus for text-to-speech synthesis. In: Proc. EUROSPEECH, Aalborg, Denmark, 2001, pp. 2047-2050.
    • (2001) Proc. EUROSPEECH , pp. 2047-2050
    • Matoušek, J.1    Psutka, J.2    Krta, J.3
  • 23
    • 85009071398
    • Recent improvements on ARTIC: Czech text-to-speech system
    • Jeju, Korea, 2004
    • Matoušek, J., Romportl, J., Tihelka, D., Tychtl, Z., 2004. Recent improvements on ARTIC: Czech text-to-speech system. In: Proc. INTERSPEECH, Jeju, Korea, 2004, pp. 1933-1936.
    • (2004) Proc. INTERSPEECH , pp. 1933-1936
    • Matoušek, J.1    Romportl, J.2    Tihelka, D.3    Tychtl, Z.4
  • 25
    • 44949156935
    • Automatic glottal closed-phase location and analysis by Kalman filtering
    • Perthshire, Scotland, August 2001, paper 142
    • McKenna, J.G., 2001. Automatic glottal closed-phase location and analysis by Kalman filtering. In: Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, Scotland, August 2001, paper 142.
    • (2001) Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis
    • McKenna, J.G.1
  • 26
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines, and F. Charpentier Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones Speech Comm. 9 5-6 1990 453 467
    • (1990) Speech Comm. , vol.9 , Issue.5-6 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 27
    • 0000668614
    • Robustness of group-delay-based method for extraction of significant instants of excitation from speech signals
    • P.S. Murthy, and B. Yegnanarayana Robustness of group-delay-based method for extraction of significant instants of excitation from speech signals IEEE Trans. Speech Audio Process. 7 1999 609 619
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 609-619
    • Murthy, P.S.1    Yegnanarayana, B.2
  • 29
    • 0026545826
    • A multichannel electroglottograph
    • M. Rothenberg A multichannel electroglottograph J. Voice 6 1 1992 36 43
    • (1992) J. Voice , vol.6 , Issue.1 , pp. 36-43
    • Rothenberg, M.1
  • 30
    • 0023764487
    • Monitoring vocal fold abduction through vocal fold contact area
    • M. Rothenberg, and J.J. Mahshie Monitoring vocal fold abduction through vocal fold contact area J. Speech Hearing Res. 31 1988 338 351
    • (1988) J. Speech Hearing Res. , vol.31 , pp. 338-351
    • Rothenberg, M.1    Mahshie, J.J.2
  • 31
    • 85009096905
    • An automatic pitch-marking method using wavelet transform
    • Beijing, China, 2000
    • Sakamoto, M., Saito, T., 2000. An automatic pitch-marking method using wavelet transform. In: Proc. Internat. Conf. Spoken Language Processing, Vol. 3, Beijing, China, 2000, pp. 650-653.
    • (2000) Proc. Internat. Conf. Spoken Language Processing , vol.3 , pp. 650-653
    • Sakamoto, M.1    Saito, T.2
  • 32
    • 0038317931
    • Decomposition of vocal cycle length perturbations into vocal jitter and vocal microtremor, and comparison of their size in normophonic speakers
    • DOI 10.1016/S0892-1997(03)00014-6
    • J. Schoentgen Decomposition of vocal cycle length perturbations into vocal jitter and vocal microtremor, and comparison of their size in normophonic speakers J. Voice 17 2 2003 114 125 (Pubitemid 36666379)
    • (2003) Journal of Voice , vol.17 , Issue.2 , pp. 114-125
    • Schoentgen, J.1
  • 33
    • 0029375490
    • Determination of instants of significant excitation in speech using group delay function
    • R. Smits, and B. Yegnanarayana Determination of instants of significant excitation in speech using group delay function IEEE Trans. Speech Audio Process. 3 5 1995 325 333
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 325-333
    • Smits, R.1    Yegnanarayana, B.2
  • 34
    • 0016129045
    • Determination of the instant of glottal closure from the speech wave
    • H. Strube Determination of the instant of glottal closure from the speech wave J. Acoust. Soc. Amer. 56 5 1974 1625 1629
    • (1974) J. Acoust. Soc. Amer. , vol.56 , Issue.5 , pp. 1625-1629
    • Strube, H.1
  • 35
    • 0035127703
    • Applying the harmonic plus noise model in concatenative speech synthesis
    • DOI 10.1109/89.890068
    • Y. Stylianou Applying the harmonic plus noise model in concatenative speech synthesis IEEE Trans. Speech Audio Process. 9 1 2001 21 29 (Pubitemid 32130684)
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.1 , pp. 21-29
    • Stylianou, Y.1
  • 36
    • 85135272025
    • Robust glottal closure detection using the wavelet transform
    • Budapest, Hungary, 1999
    • Tuan, V.N., d'Alessandro, C., 1999. Robust glottal closure detection using the wavelet transform. In: Proc. EUROSPEECH, Budapest, Hungary, 1999, pp. 2805-2808.
    • (1999) Proc. EUROSPEECH , pp. 2805-2808
    • Tuan, V.N.1    D'Alessandro, C.2
  • 37
    • 85010815133
    • Voice transformation using PSOLA technique
    • San Francisco, CA, 1992
    • Valbret, H., Moulines, E., Tubach, J.P., 1992. Voice transformation using PSOLA technique. In: Proc. ICASSP, San Francisco, CA, 1992, Vol. 1, pp. 145-149.
    • (1992) Proc. ICASSP , vol.1 , pp. 145-149
    • Valbret, H.1    Moulines, E.2    Tubach, J.P.3
  • 38
    • 0027252181
    • An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech
    • Minneapolis, MN, 1993
    • Verhelst, W., Roelands, M., 1993. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In: Proc. ICASSP, Minneapolis, MN, 1993, Vol. 2, pp. 554-557.
    • (1993) Proc. ICASSP , vol.2 , pp. 554-557
    • Verhelst, W.1    Roelands, M.2
  • 39
    • 84863415660
    • Determining polarity of speech signals based on gradient of spurious glottal waveforms
    • Seattle, WA, 1998
    • Ding, W., Campbell, N., 1998. Determining polarity of speech signals based on gradient of spurious glottal waveforms. In: Proc. ICASSP, Seattle, WA, 1998, Vol. 2, pp. 857-860.
    • (1998) Proc. ICASSP , vol.2 , pp. 857-860
    • Ding, W.1    Campbell, N.2
  • 40
    • 0028997020
    • A robust method for determining instants of major excitations in voiced speech
    • Detroit, MI, 1995
    • Yegnanarayana, B., Smits, R., 1995. A robust method for determining instants of major excitations in voiced speech. In: Proc. ICASSP, Detroit, MI, 1995, pp. 776-779.
    • (1995) Proc. ICASSP , pp. 776-779
    • Yegnanarayana, B.1    Smits, R.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.