Volume 19, Issue 1, 2013, Pages 61-93

A fast and flexible architecture for very large word n-gram datasets

Author keywords

[No Author keywords available]

Indexed keywords

BINARY FORMAT; DATA SETS; EXPERIMENTAL EVALUATION; FLEXIBLE ARCHITECTURES; LANGUAGE MODEL; LOSSLESS COMPRESSION; MEMORY REQUIREMENTS; MEMORY USE; MEMORY UTILIZATION; N-GRAM MODELS; PARTIAL DATA; RAPID ENCODING; REGULAR EXPRESSIONS; STREAMING TEXTS

EID: 84870484286     PISSN: 13513249     EISSN: 14698110     Source Type: Journal    
DOI: 10.1017/S1351324911000349     Document Type: Article
Times cited: 7

References (45)
  • 1
    • 0030781644
    • Fast algorithms for sorting and searching strings
    • New Orleans, LA, USA Philadelphia PA USA: Society for Industrial and Applied Mathematics
    • Bentley, J. L., and Sedgewick, R. 1997. Fast algorithms for sorting and searching strings. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'97), New Orleans, LA, USA, pp. 360-9. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
    • (1997) Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'97) , pp. 360-369
    • Bentley, J.L.1    Sedgewick, R.2
  • 2
    • 0002534190
    • Ternary search trees
    • April 01 Accessed 8 April 2011
    • Bentley, J. L., and Sedgewick, R. 1998. Ternary search trees. Dr Dobb's Journal, April 01, http://drdobbs.com/windows/184410528. Accessed 8 April 2011.
    • (1998) Dr Dobb's Journal
    • Bentley, J.L.1    Sedgewick, R.2
  • 4
    • 0014814325
    • Space/time trade-offs in hash coding with allowable errors
    • Bloom, B. H. 1970. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13(7): 422-6.
    • (1970) Communications of the ACM , vol.13 , Issue.7 , pp. 422-426
    • Bloom, B.H.1
  • 5
    • 47749103248
    • Philadelphia, PA, USA: Linguistic Data Consortium
    • Brants, T., and Franz, A. 2006. Web 1T 5-gram Version 1. Philadelphia, PA, USA: Linguistic Data Consortium. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
    • (2006) Web 1T 5-gram Version 1
    • Brants, T.1    Franz, A.2
  • 9
    • 1842473700
    • The Bloomier filter: An efficient data structure for static support lookup tables
    • New Orleans, LA, USA Philadelphia PA: Society for Industrial and Applied Mathematics
    • Chazelle, B., Kilian, J., Rubinfeld, R., and Tal, A. 2004. The Bloomier filter: an efficient data structure for static support lookup tables. Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2004), New Orleans, LA, USA, pp. 30-9. Philadelphia, PA: Society for Industrial and Applied Mathematics.
    • (2004) Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2004) , pp. 30-39
    • Chazelle, B.1    Kilian, J.2    Rubinfeld, R.3    Tal, A.4
  • 12
    • 85115691961
    • Google Web 1T 5-grams made easy (but not for the computer)
    • Los Angeles, CA, USA Stroudsburg PA: Association for Computational Linguistics
    • Evert, S. 2010. Google Web 1T 5-grams made easy (but not for the computer). In Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop (WAC-6), Los Angeles, CA, USA, pp. 32-40. Stroudsburg, PA: Association for Computational Linguistics.
    • (2010) Proceedings of the NAACL HLT 2010 Sixth Web As Corpus Workshop (WAC-6) , pp. 32-40
    • Evert, S.1
  • 15
    • 78651323072
    • The effects of learner errors on the development of a collocation detection tool
    • Toronto, Canada New York NY: Association for Computing Machinery
    • Futagi, Y. 2010. The effects of learner errors on the development of a collocation detection tool. In Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data (AND '10), Toronto, Canada, pp. 27-34. New York, NY: Association for Computing Machinery.
    • (2010) Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data (AND '10) , pp. 27-34
    • Futagi, Y.1
  • 18
    • 19944407315
    • Philadelphia, PA, USA: Linguistic Data Consortium
    • Graff, D., and Cieri, C. 2003. English Gigaword. Philadelphia, PA, USA: Linguistic Data Consortium. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T05
    • (2003) English Gigaword
    • Graff, D.1    Cieri, C.2
  • 19
    • 84870565442
    • Minimal perfect hash rank: Compact storage of large n-gram language models
    • New York NY: Association for Computing Machinery
    • Guthrie, D., and Hepple, M. 2010a. Minimal perfect hash rank: compact storage of large n-gram language models. In Proceedings of SIGIR 2010 Web N-gram Workshop, Geneva, Switzerland, pp. 21-9. New York, NY: Association for Computing Machinery.
    • (2010) Proceedings of SIGIR 2010 Web N-gram Workshop, Geneva, Switzerland , pp. 21-29
    • Guthrie, D.1    Hepple, M.2
  • 20
    • 80053289945
    • Storing the Web in memory: Space efficient language models with constant time retrieval
    • Boston, MA, USA Stroudsburg PA: Association for Computational Linguistics
    • Guthrie, D., and Hepple, M. 2010b. Storing the Web in memory: space efficient language models with constant time retrieval. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), Boston, MA, USA, pp. 262-72. Stroudsburg, PA: Association for Computational Linguistics.
    • (2010) Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010) , pp. 262-272
    • Guthrie, D.1    Hepple, M.2
  • 23
    • 84982842007
    • KenLM: Faster and smaller language model queries
    • Edinburgh, Scotland, UK. Stroudsburg, PA: Association for Computational Linguistics
    • Heafield, K. 2011. KenLM: faster and smaller language model queries. In Proceedings of the 6th Workshop on Statistical Machine Translation, pp. 187-97, Edinburgh, Scotland, UK. Stroudsburg, PA: Association for Computational Linguistics.
    • (2011) Proceedings of the 6th Workshop on Statistical Machine Translation , pp. 187-197
    • Heafield, K.1
  • 31
    • 0141703229
    • Lossless compression of language model structure and word identifiers
    • Hong Kong. USA Piscataway NJ: Institute of Electrical and Electronics Engineers
    • Raj, B., and Whittaker, E. W. D. 2003. Lossless compression of language model structure and word identifiers. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'03), Hong Kong. USA, vol. 1, pp. 388-99. Piscataway, NJ: Institute of Electrical and Electronics Engineers.
    • (2003) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'03) , vol.1 , pp. 388-399
    • Raj, B.1    Whittaker, E.W.D.2
  • 32
    • 0004021226
    • PhD thesis Technical Report. CMU-CS-96-143, Carnegie Mellon University, Pittsburgh, PA, USA
    • Ravishankar, M. 1996. Efficient Algorithms for Speech Recognition. PhD thesis, Technical Report. CMU-CS-96-143, Carnegie Mellon University, Pittsburgh, PA, USA.
    • (1996) Efficient Algorithms for Speech Recognition
    • Ravishankar, M.1
  • 33
    • 80053394312
    • Linguistic knowledge discovery tool: Very large n-gram database search with arbitrary wildcards
    • Manchester, UK Stroudsburg PA: Association for Computational Linguistics
    • Sekine, S. 2008. Linguistic knowledge discovery tool: very large n-gram database search with arbitrary wildcards. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08), Manchester, UK, pp. 181-4. Stroudsburg, PA: Association for Computational Linguistics.
    • (2008) Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08) , pp. 181-184
    • Sekine, S.1
  • 34
    • 84870544057
    • N-gram search engine with patterns combining token, POS, chunk and NE information
    • Valletta, Malta Paris France: European Language Resources Association
    • Sekine, S., and Dalwani, K. 2010. N-gram search engine with patterns combining token, POS, chunk and NE information. In Proceedings of Language Resource and Evaluation Conference (LREC-2010), Valletta, Malta, pp. 2682-6. Paris, France: European Language Resources Association.
    • (2010) Proceedings of Language Resource and Evaluation Conference (LREC-2010) , pp. 2682-2686
    • Sekine, S.1    Dalwani, K.2
  • 35
    • 84891308106
    • SRILM-An extensible language modeling toolkit
    • Denver, CO, USA International Speech Communication Association ISCA archive
    • Stolcke, A. 2002. SRILM-An extensible language modeling toolkit. In Proceedings of 7th International Conference on Spoken Language Processing (ICSLP2002-INTERSPEECH 2002), Denver, CO, USA, pp. 901-4. International Speech Communication Association, ISCA archive, http://www.isca-speech.org/archive/icslp02
    • (2002) Proceedings of 7th International Conference on Spoken Language Processing (ICSLP2002-INTERSPEECH 2002) , pp. 901-904
    • Stolcke, A.1
  • 38
    • 84860511413
    • Randomised language modelling for statistical machine translation
    • Prague, Czech Republic Stroudsburg PA: Association for Computational Linguistics
    • Talbot, D., and Osborne, M. 2007a. Randomised language modelling for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL 2007), Prague, Czech Republic, pp. 512-19. Stroudsburg, PA: Association for Computational Linguistics.
    • (2007) Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL 2007) , pp. 512-519
    • Talbot, D.1    Osborne, M.2
  • 41
    • 70349227599
    • Efficacy of a constantly adaptive language modeling technique for web-scale applications
    • Taipei, Taiwan Piscataway NJ: Institute of Electrical and Electronics Engineers
    • Wang, K., and Li, X. 2009. Efficacy of a constantly adaptive language modeling technique for web-scale applications. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'2009), Taipei, Taiwan, pp. 4733-6. Piscataway, NJ: Institute of Electrical and Electronics Engineers.
    • (2009) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'2009) , pp. 4733-4736
    • Wang, K.1    Li, X.2
  • 43
    • 85009115852
    • Quantization-based language model compression
    • Aalborg, Denmark International Speech Communication Association ISCA archive
    • Whittaker, E. W. D., and Raj, B. 2001. Quantization-based language model compression. In Proceedings of 7th European Conference on Speech Communication and Technology (EUROSPEECH'01), Aalborg, Denmark, pp. 33-6. International Speech Communication Association, ISCA archive, http://www.isca-speech.org/archive/eurospeech_2001
    • (2001) Proceedings of 7th European Conference on Speech Communication and Technology (EUROSPEECH'01) , pp. 33-36
    • Whittaker, E.W.D.1    Raj, B.2
  • 45
    • 84858386160
    • Efficient phrase-table representation for machine translation with applications to online MT and speech translation
    • Rochester, NY, USA Stroudsburg PA: Association for Computational Linguistics
    • Zens, R., and Ney, H. 2007. Efficient phrase-table representation for machine translation with applications to online MT and speech translation. In Proceedings of The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-2007), Rochester, NY, USA, pp. 492-9. Stroudsburg, PA: Association for Computational Linguistics.
    • (2007) Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-2007) , pp. 492-499
    • Zens, R.1    Ney, H.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.