SCOPUS Information Retrieval Platform

Speech Communication

Volume 19, Issue 2, 1996, Pages 161-176

Modelling of phone duration (using the TIMIT database) and its potential benefit for ASR

(3) Pols, Louis C W (a); Wang, Xue (a); Ten Bosch, Louis F M (b)

a UNIVERSITY OF AMSTERDAM (Netherlands)

b NUANCE COMMUNICATIONS (United States)

Author keywords

[No Author keywords available]

Indexed keywords

DATABASE SYSTEMS; MATHEMATICAL MODELS; PROBABILITY; SIGNAL PROCESSING;

ANALYSIS OF VARIANCE; PROBABILISTIC ACOUSTIC; RECOGNIZER; TIMIT DATABASE;

SPEECH RECOGNITION;

EID: 0030205397 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/0167-6393(96)00033-7 Document Type: Article

Times cited : (24)

References (42)

1
- 0030142722
- Towards increasing speech recognition error rates
- H. Bourlard, H. Hermansky and N. Morgan (1996), "Towards increasing speech recognition error rates", Speech Communication, Vol. 18, No. 3, pp. 205-231.
- (1996) Speech Communication , vol.18 , Issue.3 , pp. 205-231
- Bourlard, H.¹ Hermansky, H.² Morgan, N.³

2
- 0027646354
- Automatic segmentation and labeling of speech based on Hidden Markov Models
- F. Brugnara, D. Falavigna and M. Omologo (1993), "Automatic segmentation and labeling of speech based on Hidden Markov Models", Speech Communication, Vol. 12, No. 4, pp. 357-370.
- (1993) Speech Communication , vol.12 , Issue.4 , pp. 357-370
- Brugnara, F.¹ Falavigna, D.² Omologo, M.³

3
- 1542364607
- Sex, dialects, and reduction
- Banff
- D. Byrd (1992), "Sex, dialects, and reduction", Proc. ICSLP'92, Banff, Vol. 1, pp. 827-830.
- (1992) Proc. ICSLP'92 , vol.1 , pp. 827-830
- Byrd, D.¹

4
- 0000067521
- Vowel length variation as a function of the voicing of the consonant environment
- M. Chen (1970), "Vowel length variation as a function of the voicing of the consonant environment", Phonetica, Vol. 22, pp. 129-159.
- (1970) Phonetica , vol.22 , pp. 129-159
- Chen, M.¹

5
- 0020180117
- Segmental durations in connected-speech signals: Preliminary results
- T.H. Crystal and A.S. House (1982), "Segmental durations in connected-speech signals: Preliminary results", J. Acoust. Soc. Amer., Vol. 72, pp. 705-716.
- (1982) J. Acoust. Soc. Amer. , vol.72 , pp. 705-716
- Crystal, T.H.¹ House, A.S.²

6
- 0023921973
- Segmental durations in connected-speech signals: Current results
- T.H. Crystal and A.S. House (1988a), "Segmental durations in connected-speech signals: Current results", J. Acoust. Soc. Amer., Vol. 83, pp. 1553-1573.
- (1988) J. Acoust. Soc. Amer. , vol.83 , pp. 1553-1573
- Crystal, T.H.¹ House, A.S.²

7
- 0023950375
- Segmental durations in connected-speech signals: Syllabic stress
- T.H. Crystal and A.S. House (1988b), "Segmental durations in connected-speech signals: Syllabic stress", J. Acoust. Soc. Amer., Vol. 83, pp. 1574-1585.
- (1988) J. Acoust. Soc. Amer. , vol.83 , pp. 1574-1585
- Crystal, T.H.¹ House, A.S.²

8
- 0025368070
- Articulation rate and the duration of syllables and stress groups in connected speech
- T.H. Crystal and A.S. House (1990), "Articulation rate and the duration of syllables and stress groups in connected speech", J. Acoust. Soc. Amer., Vol. 88, pp. 101-112.
- (1990) J. Acoust. Soc. Amer. , vol.88 , pp. 101-112
- Crystal, T.H.¹ House, A.S.²

9
- 0012217979
- Ph.D. Thesis, University of Utrecht
- W. Eefting (1991), Timing in talking. Tempo variation in production and perception, Ph.D. Thesis, University of Utrecht.
- (1991) Timing in Talking. Tempo Variation in Production and Perception
- Eefting, W.¹

10
- 0003548585
- NTIS order number PB01-100354, now available from LDC
- J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett and N.L. Dahlgren (1993), The DARPA TIMIT acoustic-phonetic continuous speech corpus CDROM, NTIS order number PB01-100354, now available from LDC.
- (1993) The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵ Dahlgren, N.L.⁶

11
- 0028529872
- Speaker-independent continuous speech dictation
- J.L. Gauvain, L.F. Lamel, G. Adda and M. Adda-Decker (1994), "Speaker-independent continuous speech dictation", Speech Communication, Vol. 15, Nos. 1-2, pp. 21-37.
- (1994) Speech Communication , vol.15 , Issue.1-2 , pp. 21-37
- Gauvain, J.L.¹ Lamel, L.F.² Adda, G.³ Adda-Decker, M.⁴

12
- 0011450090
- Constraining the duration variance in HMM-based connected-speech recognition
- Berlin
- M.M. Hochberg and H.F. Silverman (1993), "Constraining the duration variance in HMM-based connected-speech recognition", Proc. Eurospeech'93, Berlin, Vol. 1, pp. 323-326.
- (1993) Proc. Eurospeech'93 , vol.1 , pp. 323-326
- Hochberg, M.M.¹ Silverman, H.F.²

13
- 0346315521
- Using relative duration in large vocabulary speech recognition
- Berlin
- M. Jones and P.C. Woodland (1993), "Using relative duration in large vocabulary speech recognition", Proc. Eurospeech'93, Berlin, Vol. 1, pp. 311-314.
- (1993) Proc. Eurospeech'93 , vol.1 , pp. 311-314
- Jones, M.¹ Woodland, P.C.²

14
- 0028413177
- Phonetic analyses of word and segment variation using the TIMIT corpus of American English
- P.A. Keating, D. Byrd, E. Flemming and Y. Todaka (1994), "Phonetic analyses of word and segment variation using the TIMIT corpus of American English", Speech Communication, Vol. 14, No. 2, pp. 131-142.
- (1994) Speech Communication , vol.14 , Issue.2 , pp. 131-142
- Keating, P.A.¹ Byrd, D.² Flemming, E.³ Todaka, Y.⁴

15
- 0023407575
- Review of text-to-speech conversion for English
- D.H. Klatt (1987), "Review of text-to-speech conversion for English", J. Acoust. Soc. Amer., Vol. 82, pp. 737-793.
- (1987) J. Acoust. Soc. Amer. , vol.82 , pp. 737-793
- Klatt, D.H.¹

16
- 85135374294
- Identifying non-linguistic speech features
- Berlin
- L.F. Lamel and J.L. Gauvain (1993a), "Identifying non-linguistic speech features", Proc. Eurospeech'93, Berlin, Vol. 1, pp. 23-30.
- (1993) Proc. Eurospeech'93 , vol.1 , pp. 23-30
- Lamel, L.F.¹ Gauvain, J.L.²

17
- 85135371588
- High performance speaker-independent phone recognition using CDHMM
- Berlin
- L.F. Lamel and J.L. Gauvain (1993b), "High performance speaker-independent phone recognition using CDHMM", Proc. Eurospeech'93, Berlin, Vol. 1, pp. 121-124.
- (1993) Proc. Eurospeech'93 , vol.1 , pp. 121-124
- Lamel, L.F.¹ Gauvain, J.L.²

18
- 0002583871
- Speech database development: Design and analysis of the acoustic-phonetic corpus
- L. Lamel, R. Kassel and S. Seneff (1986), "Speech database development: Design and analysis of the acoustic-phonetic corpus", Proc. DARPA Speech Recognition Workshop, pp. 100-109.
- (1986) Proc. DARPA Speech Recognition Workshop , pp. 100-109
- Lamel, L.¹ Kassel, R.² Seneff, S.³

19
- 0024768209
- Speaker-independent phone recognition using Hidden Markov Models
- K.-F. Lee and H.-W. Hon (1989), "Speaker-independent phone recognition using Hidden Markov Models", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-37, pp. 1641-1648.
- (1989) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-37 , pp. 1641-1648
- Lee, K.-F.¹ Hon, H.-W.²

20
- 0022685753
- Continuously variable duration hidden Markov models for automatic speech recognition
- S.E. Levinson (1986), "Continuously variable duration hidden Markov models for automatic speech recognition", Computer Speech and Language, Vol. 1, pp. 29-45.
- (1986) Computer Speech and Language , vol.1 , pp. 29-45
- Levinson, S.E.¹

21
- 0028404665
- High accuracy phone recognition using context clustering and quasi-triphone models
- A. Ljolje (1994), "High accuracy phone recognition using context clustering and quasi-triphone models", Computer Speech and Language, Vol. 8, pp. 129-151.
- (1994) Computer Speech and Language , vol.8 , pp. 129-151
- Ljolje, A.¹

22
- 0026392350
- Automatic segmentation and labeling of speech
- Toronto
- A. Ljolje and M.D. Riley (1991), "Automatic segmentation and labeling of speech", Proc. Internat. Conf. Acoust. Speech Signal Process.-91, Toronto, pp. 473-476.
- (1991) Proc. Internat. Conf. Acoust. Speech Signal Process.-91 , pp. 473-476
- Ljolje, A.¹ Riley, M.D.²

23
- 0003423254
- Ph.D. Thesis, University of Utrecht
- S.G. Nooteboom (1970), Production and perception of vowel duration, Ph.D. Thesis, University of Utrecht.
- (1970) Production and Perception of Vowel Duration
- Nooteboom, S.G.¹

24
- 0015716065
- The effect of position in utterance on speech segment duration in English
- O.K. Oller (1973), "The effect of position in utterance on speech segment duration in English", J. Acoust. Soc. Amer., Vol. 54, pp. 1235-1247.
- (1973) J. Acoust. Soc. Amer. , vol.54 , pp. 1235-1247
- Oller, O.K.¹

25
- 30244562943
- Speech corpora and performance assessment in the DARPA SLS program
- Kobe
- D. Pallett (1990), "Speech corpora and performance assessment in the DARPA SLS program", Proc. ICSLP'90, Kobe, pp. 24.3.1-24.3.4.
- (1990) Proc. ICSLP'90 , pp. 24.3.1-24.3.4
- Pallett, D.¹

26
- 0012316245
- 1994 Benchmark tests for the ARPA Spoken Language Program
- Austin, TX
- D.S. Pallett, J.G. Fiscus, W.M. Fisher, J.S. Garofolo, B.A. Lund, A. Martin and M.A. Przybocki (1995), "1994 Benchmark tests for the ARPA Spoken Language Program", Proc. ARPA Spoken Language Systems Technology Workshop, Austin, TX, pp. 5-36.
- (1995) Proc. ARPA Spoken Language Systems Technology Workshop , pp. 5-36
- Pallett, D.S.¹ Fiscus, J.G.² Fisher, W.M.³ Garofolo, J.S.⁴ Lund, B.A.⁵ Martin, A.⁶ Przybocki, M.A.⁷

27
- 84953657625
- Duration of syllable nuclei in English
- G.E. Peterson and I. Lehiste (1960), "Duration of syllable nuclei in English", J. Acoust. Soc. Amer., Vol. 32, pp. 693-703.
- (1960) J. Acoust. Soc. Amer. , vol.32 , pp. 693-703
- Peterson, G.E.¹ Lehiste, I.²

28
- 0040813706
- Several improvements to a recurrent error propagation network phone recognition system
- Cambridge Univ., CUED/F-INFENG/TR.82
- T. Robinson (1991), Several improvements to a recurrent error propagation network phone recognition system, Technical report, Cambridge Univ., CUED/F-INFENG/TR.82.
- (1991) Technical Report
- Robinson, T.¹

29
- 0028996878
- Analysis of acoustic-phonetic variations in fluent speech using TIMIT
- Detroit
- D.X. Sun and L. Deng (1995), "Analysis of acoustic-phonetic variations in fluent speech using TIMIT", Proc. Internat. Conf. Acoust. Speech Signal Process.-95, Detroit, pp. 201-204.
- (1995) Proc. Internat. Conf. Acoust. Speech Signal Process.-95 , pp. 201-204
- Sun, D.X.¹ Deng, L.²

30
- 0016542927
- Vowel duration in American English
- N. Umeda (1975), "Vowel duration in American English", J. Acoust. Soc. Amer., Vol. 58, pp. 434-445.
- (1975) J. Acoust. Soc. Amer. , vol.58 , pp. 434-445
- Umeda, N.¹

31
- 84937305188
- Mouton/De Gruyter, Berlin
- V.J. Van Heuven and L.C.W. Pols, Eds. (1993), Analysis and Synthesis of Speech. Strategic Research towards high-quality Text-to-speech Generation (Mouton/De Gruyter, Berlin).
- (1993) Analysis and Synthesis of Speech. Strategic Research Towards High-Quality Text-to-Speech Generation
- Van Heuven, V.J.¹ Pols, L.C.W.²

32
- 0026963758
- Contextual effects on vowel duration
- J.P.H. Van Santen (1992), "Contextual effects on vowel duration", Speech Communication, Vol. 11, No. 6, pp. 513-546.
- (1992) Speech Communication , vol.11 , Issue.6 , pp. 513-546
- Van Santen, J.P.H.¹

33
- 0343389412
- Spectro-temporal features of vowel segments
- Ph.D. thesis, University of Amsterdam
- R.J.J.H. Van Son (1993), Spectro-temporal features of vowel segments, Studies in Language and Language Use 3, Ph.D. thesis, University of Amsterdam.
- (1993) Studies in Language and Language Use , vol.3
- Van Son, R.J.J.H.¹

34
- 0347387926
- Fast automatic segmentation and labeling: Results on TIMIT and EUROM0
- Madrid
- A. Vorstermans, J.P. Martens and B. Van Coile (1995), "Fast automatic segmentation and labeling: Results on TIMIT and EUROM0", Proc. Eurospeech'95, Madrid, Vol. 2, pp. 1397-1400.
- (1995) Proc. Eurospeech'95 , vol.2 , pp. 1397-1400
- Vorstermans, A.¹ Martens, J.P.² Van Coile, B.³

35
- 30244529221
- Durationally constrained training of HMM without explicit state durational pdf
- X. Wang (1994), "Durationally constrained training of HMM without explicit state durational pdf", Proc. Inst. Phonetic Sciences Univ. of Amsterdam, Vol. 18, pp. 111-130.
- (1994) Proc. Inst. Phonetic Sciences Univ. of Amsterdam , vol.18 , pp. 111-130
- Wang, X.¹

36
- 30244569166
- Durational modelling in HMM-based speech recognition: Towards a justified measure
- A.J. Rubio and J.M. López, Eds., (Springer, Berlin)
- X. Wang (1995), "Durational modelling in HMM-based speech recognition: Towards a justified measure", in: A.J. Rubio and J.M. López, Eds., Speech Recognition and Coding. New Advances and Trends (Springer, Berlin), pp. 128-131.
- (1995) Speech Recognition and Coding. New Advances and Trends , pp. 128-131
- Wang, X.¹

37
- 30244528040
- Ph.D. thesis, University of Amsterdam
- X. Wang (1996), Duration modelling in HMM-based speech recognition, Ph.D. thesis, University of Amsterdam.
- (1996) Duration Modelling in HMM-based Speech Recognition
- Wang, X.¹

38
- 0003078259
- The HTK tied-state continuous speech recogniser
- Berlin
- P.C. Woodland and S.J. Young (1993), "The HTK tied-state continuous speech recogniser", Proc. Eurospeech'93, Berlin, Vol. 3, pp. 2207-2210.
- (1993) Proc. Eurospeech'93 , vol.3 , pp. 2207-2210
- Woodland, P.C.¹ Young, S.J.²

39
- 0005415741
- Cambridge University Engineering Department, Speech Group, September 1992
- S.J. Young (1992), HTK Version 1.4: User, reference and programmer manual, Cambridge University Engineering Department, Speech Group, September 1992.
- (1992) HTK Version 1.4: User, Reference and Programmer Manual
- Young, S.J.¹

40
- 85135369802
- The use of state tying in continuous speech recognition
- Berlin
- S.J. Young and P.C. Woodland (1993), "The use of state tying in continuous speech recognition", Proc. Eurospeech'93, Berlin, Vol. 3, pp. 2203-2206.
- (1993) Proc. Eurospeech'93 , vol.3 , pp. 2203-2206
- Young, S.J.¹ Woodland, P.C.²

41
- 0028530231
- State clustering in hidden Markov model-based continuous speech recognition
- S.J. Young and P.C. Woodland (1994), "State clustering in hidden Markov model-based continuous speech recognition", Computer Speech and Language, Vol. 8, pp. 369-383.
- (1994) Computer Speech and Language , vol.8 , pp. 369-383
- Young, S.J.¹ Woodland, P.C.²

42
- 0025477640
- Speech database development at MIT: TIMIT and beyond
- V. Zue, S. Seneff and J. Glass (1990), "Speech database development at MIT: TIMIT and beyond", Speech Communication, Vol. 9, No. 4, pp. 351-356.
- (1990) Speech Communication , vol.9 , Issue.4 , pp. 351-356
- Zue, V.¹ Seneff, S.² Glass, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.