SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2014, Pages 1405-1409

Audio-to-text alignment for speech recognition with very limited resources

(3) Anguera, Xavier (a); Luque, Jordi (a); Gracia, Ciro (a, b)

(a) TELEFONICA RESEARCH (Spain)

(b) UNIVERSITAT POMPEU FABRA (Spain)

Author keywords

ASR; Low resources; Phonetic alignment; Text to speech alignment

Indexed keywords

AUDIO RECORDINGS; CHARACTER RECOGNITION; DYNAMIC PROGRAMMING; EXPERIMENTS; LINGUISTICS; SPEECH; SPEECH COMMUNICATION; ASR; LOW-RESOURCES; NORMALIZED TEXTS; PHONETIC ALIGNMENT; SPEECH RECOGNIZER; TARGET LANGUAGE; TEXT TO SPEECH; TRAINING DATABASE; SPEECH RECOGNITION

EID: 84910072484
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (32)

References (25)

1
- 80155162494
- Spoken book alignment using WFSTs
- D. Caseiro, H. Meinedo, A. Serralheiro, I. Trancoso, and J. A. Neto, "Spoken Book Alignment using WFSTs," Proc. of the second international conference on Human Language Technology Research, pp. 3-5, 2002.
- (2002) Proc. of the Second International Conference on Human Language Technology Research , pp. 3-5
- Caseiro, D.¹ Meinedo, H.² Serralheiro, A.³ Trancoso, I.⁴ Neto, J.A.⁵

2
- 80155146584
- Automatic synchronization of electronic and audio books via TTS alignment and silence filtering
- X. Anguera, N. Perez, A. Urruela, and N. Oliver, "Automatic Synchronization of Electronic and Audio Books via TTS Alignment and Silence Filtering," in Proc. ICME, 2011.
- (2011) Proc. ICME
- Anguera, X.¹ Perez, N.² Urruela, A.³ Oliver, N.⁴

3
- 79956282392
- Segmentation of monologues in audio books for building synthetic voices
- K. Prahallad and A. W. Black, "Segmentation of Monologues in Audio Books for Building Synthetic Voices," Trans. Audio, Speech and Language Processing, vol. 19, no. 5, pp. 1444-1449, 2011.
- (2011) Trans. Audio, Speech and Language Processing , vol.19 , Issue.5 , pp. 1444-1449
- Prahallad, K.¹ Black, A.W.²

4
- 34547521678
- Automatic alignment and error correction of human generated transcripts for long speech recordings
- T. J. Hazen, "Automatic Alignment and Error Correction of Human Generated Transcripts for Long Speech Recordings," in Proc. Interspeech, 2006, pp. 1606-1609.
- (2006) Proc. Interspeech , pp. 1606-1609
- Hazen, T.J.¹

5
- 0343950213
- Improving acoustic models by watching television
- Carnegie Mellon University, Tech. Rep
- M. J. Witbrock and A. G. Hauptmann, "Improving Acoustic Models by Watching Television," Carnegie Mellon University, Tech. Rep. CMU-CS-98-110, 1998.
- (1998) Technical Report CMU-CS-98-110
- Witbrock, M.J.¹ Hauptmann, A.G.²

6
- 53149086681
- Using prompts to produce quality corpus for training automatic speech recognition systems
- B. Lecouteux and G. Linarès, "Using prompts to produce quality corpus for training automatic speech recognition systems," in Proc. The 14th IEEE Mediterranean Electrotechnical Conference, 2008, pp. 841-846.
- (2008) Proc. the 14th IEEE Mediterranean Electrotechnical Conference , pp. 841-846
- Lecouteux, B.¹ Linarès, G.²

7
- 46449097482
- Alignment of speech to highly imperfect text transcriptions
- A. Haubold and J. R. Kender, "Alignment of Speech to Highly Imperfect Text Transcriptions," in Proc. ICME, 2007.
- (2007) Proc. ICME
- Haubold, A.¹ Kender, J.R.²

8
- 84893599214
- Text spotting in large speech databases for under-resourced languages
- A. Buzo, H. Cucu, and C. Burileanu, "Text Spotting in Large Speech Databases for Under-Resourced Languages," in Proc. Speech Technology and Human-Computer Dialogue (SpeD) Conference, no. 1, 2013.
- (2013) Proc. Speech Technology and Human-Computer Dialogue (SpeD) Conference , Issue.1
- Buzo, A.¹ Cucu, H.² Burileanu, C.³

9
- 84893644777
- Processing spoken lectures in resource-scarce environments
- C. J. van Heerden, P. de Villiers, E. Barnard, and M. H. Davel, "Processing Spoken Lectures in Resource-Scarce Environments," in Proc. 22nd Annual Symposium of the Pattern Recognition Association of South Africa, 2011, pp. 138-143.
- (2011) Proc. 22nd Annual Symposium of the Pattern Recognition Association of South Africa , pp. 138-143
- Van Heerden, C.J.¹ De Villiers, P.² Barnard, E.³ Davel, M.H.⁴

10
- 84865744412
- Efficient harvesting of Internet audio for resource-scarce ASR
- M. H. Davel, C. V. Heerden, N. Kleynhans, and E. Barnard, "Efficient harvesting of Internet audio for resource-scarce ASR," August 2011, pp. 3153-3156.
- (2011) Efficient Harvesting of Internet Audio for Resource-scarce ASR , pp. 3153-3156
- Davel, M.H.¹ Heerden, C.V.² Kleynhans, N.³ Barnard, E.⁴

11
- 84910039499
- Automatic generation of hyperlinks between audio and transcript
- J. Robert-Ribes and R. Mukhtar, "Automatic Generation of Hyperlinks Between Audio and Transcript," in Proc. Eurospeech, September 1997, pp. 903-906.
- (1997) Proc. Eurospeech , pp. 903-906
- Robert-Ribes, J.¹ Mukhtar, R.²

12
- 84885726863
- A recursive algorithm for the forced alignment of very long audio segments
- P. J. Moreno, C. Joerg, J.-M. Van Thong, and O. Glickman, "A Recursive Algorithm for the Forced Alignment of Very Long Audio Segments," in Proc. ICSLP, 1998.
- (1998) Proc. ICSLP
- Moreno, P.J.¹ Joerg, C.² Van Thong, J.-M.³ Glickman, O.⁴

13
- 84906260292
- Text-to-speech alignment of long recordings using universal phone models
- S. Hoffmann and B. Pfister, "Text-to-Speech Alignment of Long Recordings Using Universal Phone Models," in Proc. Interspeech, August 2013, pp. 1520-1524.
- (2013) Proc. Interspeech , pp. 1520-1524
- Hoffmann, S.¹ Pfister, B.²

14
- 84906264108
- Technique for automatic sentence level alignment of long speech and transcripts
- I. Ahmed and S. K. Kopparapu, "Technique for Automatic Sentence Level Alignment of Long Speech and Transcripts," in Proc. Interspeech, August 2013, pp. 1516-1519.
- (2013) Proc. Interspeech , pp. 1516-1519
- Ahmed, I.¹ Kopparapu, S.K.²

15
- 84865764419
- Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training
- N. T. Vu, F. Kraus, and T. Schultz, "Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training," in Proc. Interspeech, August 2011, pp. 3145-3148.
- (2011) Proc. Interspeech , pp. 3145-3148
- Vu, N.T.¹ Kraus, F.² Schultz, T.³

16
- 0036460908
- Lightly supervised recognition for automatic alignment of large coherent speech recordings
- N. Braunschweiler, M. J. F. Gales, and S. Buchholz, "Lightly supervised recognition for automatic alignment of large coherent speech recordings," Trans. Computer Speech and Language, vol. 16, no. 1, pp. 115-129, 2002.
- (2002) Trans. Computer Speech and Language , vol.16 , Issue.1 , pp. 115-129
- Braunschweiler, N.¹ Gales, M.J.F.² Buchholz, S.³

17
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlícek, T. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi Speech Recognition Toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Burget, L.³ Glembek, O.⁴ Goel, N.⁵ Hannemann, M.⁶ Motlícek, P.⁷ Qian, T.⁸ Schwarz, P.⁹ Silovsky, J.¹⁰ Stemmer, G.¹¹ Vesely, K.¹²

18
- 51849124558
- [Online]
- "SoundTouch Audio Processing Library." [Online]. Available: http://www.surina.net/soundtouch.
- SoundTouch Audio Processing Library

19
- 84906274473
- An open-source state-of-the-art toolbox for broadcast news Diarization
- [Online]
- M. Rouvier, G. Dupuy, P. Gay, E. Khoury, T. Merlin, and S. Meignier, "An Open-source State-of-the-art Toolbox for Broadcast News Diarization," in Proc. Interspeech, 2013. [Online]. Available: http://www-lium.univ-lemans.fr/diarization.
- (2013) Proc. Interspeech
- Rouvier, M.¹ Dupuy, G.² Gay, P.³ Khoury, E.⁴ Merlin, T.⁵ Meignier, S.⁶

20
- 77949394249
- Ph.D. dissertation, [Online]
- P. Schwarz, "Phoneme Recognition Based on Long Temporal Context," Ph.D. dissertation, 2008. [Online]. Available: http://speech.fit.vutbr.cz/software/phoneme-recognizerbased-long-temporal-context.
- (2008) Phoneme Recognition Based on Long Temporal Context
- Schwarz, P.¹

21
- 85009230817
- Grapheme based speech recognition
- M. Killer, S. Stüker, and T. Schultz, "Grapheme Based Speech Recognition," in Eurospeech, 2003, pp. 3141-3144.
- (2003) Eurospeech , pp. 3141-3144
- Killer, M.¹ Stüker, S.² Schultz, T.³

22
- 78049527800
- The CereVoice characterful speech synthesiser SDK
- Newcastle
- M. P. Aylett and C. J. Pidcock, "The CereVoice Characterful Speech Synthesiser SDK," in Proc. AISB, Newcastle, 2007, pp. 174-178.
- (2007) Proc. AISB , pp. 174-178
- Aylett, M.P.¹ Pidcock, C.J.²

23
- 84976375912
- CEUDEX: A data base oriented to context-dependent units training in Spanish for continuous speech recognition
- C. de la Torre, L. Hernández-Gómez, and D. Tapias, "CEUDEX: A Data Base oriented to Context-Dependent Units Training in Spanish for Continuous Speech Recognition," in Proc. Eurospeech, September 1995, pp. 845-848.
- (1995) Proc. Eurospeech , pp. 845-848
- De La Torre, C.¹ Hernández-Gómez, L.² Tapias, D.³

24
- 84910096663
- [Online]
- "EL QUIJOTE - IV Centenario Audiobook." [Online]. Available: http://www.quijote.es/IVCentenarioAudioLibro.php.
- EL QUIJOTE - IV Centenario Audiobook

25
- 84910054632
- [Online]
- "Canal Parlament - Parlament de Catalunya." [Online]. Available: http://www.parlament.cat/web/actualitat/canal-parlament.
- Canal Parlament - Parlament de Catalunya

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.