|
Volume 8403 LNCS, Issue PART 1, 2014, Pages 102-112
|
Improved text extraction from PDF documents for large-scale natural language processing
|
Author keywords
noisy text processing; parallel corpora; text normalization
|
Indexed keywords
COMPUTATIONAL LINGUISTICS;
COMPUTER GRAPHICS;
TEXT PROCESSING;
TOOLS;
LINGUISTIC PROCESSING;
MULTILINGUAL DATABASE;
NATURAL LANGUAGE PROCESSING;
NOISY TEXT PROCESSING;
PARALLEL CORPORA;
POST-PROCESSING PROCEDURE;
STATISTICAL MACHINE TRANSLATION;
TEXT NORMALIZATIONS;
NATURAL LANGUAGE PROCESSING SYSTEMS;
|
EID: 84958521669
PISSN: 03029743
EISSN: 16113349
Source Type: Book Series
DOI: 10.1007/978-3-642-54906-9_9
Document Type: Conference Paper
|
Times cited: 8
|