



Volume 8403 LNCS, Issue PART 1, 2014, Pages 102-112

Improved text extraction from PDF documents for large-scale natural language processing

Author keywords

noisy text processing; parallel corpora; text normalization

Indexed keywords

COMPUTATIONAL LINGUISTICS; COMPUTER GRAPHICS; TEXT PROCESSING; TOOLS

EID: 84958521669     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-54906-9_9     Document Type: Conference Paper
Times cited: 8

References (10)
  • 1
    • 70450248509
    • GraphWrap: A system for interactive wrapping of PDF documents using graph matching techniques
    • Hassan, T.: GraphWrap: A system for interactive wrapping of PDF documents using graph matching techniques. In: ACM Symposium on Document Engineering, pp. 247-248 (2009)
    • (2009) ACM Symposium on Document Engineering , pp. 247-248
    • Hassan, T.1
  • 2
  • 3
    • 84982842007
    • KenLM: Faster and smaller language model queries
    • Heafield, K.: KenLM: Faster and smaller language model queries. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 187-197. Association for Computational Linguistics, Edinburgh (July 2011), http://www.aclweb.org/anthology/W11-2123
    • (2011) Proceedings of the Sixth Workshop on Statistical Machine Translation , pp. 187-197
    • Heafield, K.1
  • 7
  • 8
    • 0345376175
    • The Web as a parallel corpus
    • Resnik, P., Smith, N.A.: The Web as a parallel corpus. Computational Linguistics 29(3), 349-380 (2003); Special Issue on the Web as Corpus
    • (2003) Computational Linguistics , vol.29 , Issue.3 , pp. 349-380
    • Resnik, P.1    Smith, N.A.2
  • 9
    • 84876815126
    • Efficient discrimination between closely related languages
    • Tiedemann, J., Ljubešič, N.: Efficient discrimination between closely related languages. In: Proceedings of COLING 2012, pp. 2619-2634. The COLING 2012 Organizing Committee, Mumbai, India (2012), http://www.aclweb.org/anthology/C12-1160
    • (2012) Proceedings of COLING 2012 , pp. 2619-2634
    • Tiedemann, J.1    Ljubešič, N.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.