2010, Pages 391-400

On compressing the textual web

Author keywords

Burrows-Wheeler transform; Compressed (self-)indexes; Lossless data compression; Text compression

Indexed keywords

BURROWS WHEELER TRANSFORM; COMPRESSION ALGORITHMS; DATA SETS; EXPERIMENTAL COMPARISON; FAST SCANNING; HUFFMAN; INVERTED LIST; KNOW-HOW; LARGE PARTS; LOSSLESS DATA COMPRESSION; MOVE-TO-FRONT; NEW TECHNOLOGIES; SOFTWARE DEVELOPER; STORAGE SOLUTIONS; TEXT COMPRESSIONS; WEB APPLICATION; WEB COLLECTIONS; WEB PAGE; WEB STRUCTURES

EID: 77950946921     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1718487.1718536     Document Type: Conference Paper
Times cited: 43

References (59)
  • 1
    • 77950934630
    • Apache Foundation. Hadoop. http://hadoop.apache.org/.
  • 4
    • 0037619265
    • Web search for a planet: The Google cluster architecture
    • L. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2), 2003.
    • (2003) IEEE Micro , vol.23 , Issue.2
    • Barroso, L.1    Dean, J.2    Hölzle, U.3
  • 5
    • 35348849154
    • Scaling up all pairs similarity search
    • R. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. WWW, 131-140, 2007.
    • (2007) WWW , pp. 131-140
    • Bayardo, R.1    Ma, Y.2    Srikant, R.3
  • 6
    • 0035367637
    • Data compression with long repeated strings
    • J. Bentley and M. McIlroy. Data compression with long repeated strings. Information Sciences, 135(1-2):1-11, 2001.
    • (2001) Information Sciences , vol.135 , Issue.1-2 , pp. 1-11
    • Bentley, J.1    McIlroy, M.2
  • 7
    • 20744453675
    • Dictionaries using variable-length keys and data, with applications
    • D. Blandford and G. Blelloch. Dictionaries using variable-length keys and data, with applications. ACM-SIAM SODA, 1-10, 2005.
    • (2005) ACM-SIAM SODA , pp. 1-10
    • Blandford, D.1    Blelloch, G.2
  • 9
    • 61649107228
    • Permuting Web Graphs
    • P. Boldi, M. Santini, and S. Vigna. Permuting Web Graphs. WAW, 116-126, 2009.
    • (2009) WAW , pp. 116-126
    • Boldi, P.1    Santini, M.2    Vigna, S.3
  • 10
    • 19944376183
    • The WebGraph framework I: Compression techniques
    • P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. WWW, 595-602, 2004.
    • (2004) WWW , pp. 595-602
    • Boldi, P.1    Vigna, S.2
  • 11
    • 77950926507
    • P. Boldi and S. Vigna. MG4J at TREC 2005. http://mg4j.dsi.unimi.it/.
    • (2005)
    • Boldi, P.1    Vigna, S.2
  • 14
    • 79956075292
    • Identifying and filtering near-duplicate documents
    • A. Broder. Identifying and filtering near-duplicate documents. CPM, LNCS 1848, 2000.
    • (2000) CPM, LNCS 1848
    • Broder, A.1
  • 16
    • 42549139837
    • A scalable pattern mining approach to web graph compression with communities
    • G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. WSDM, 95-106, 2008.
    • (2008) WSDM , pp. 95-106
    • Buehrer, G.1    Chellapilla, K.2
  • 17
    • 79960793459
    • Efficiency vs. Effectiveness in terabyte-scale information retrieval
    • S. Büttcher and C. Clarke. Efficiency vs. Effectiveness in terabyte-scale information retrieval. TREC, 2005.
    • (2005) TREC
    • Büttcher, S.1    Clarke, C.2
  • 18
    • 63449115477
    • Index compression is good, especially for random access
    • S. Büttcher and C. Clarke. Index compression is good, especially for random access. CIKM, 761-770, 2007.
    • (2007) CIKM , pp. 761-770
    • Büttcher, S.1    Clarke, C.2
  • 19
    • 0032631733
    • Cache-based compaction: A new technique for optimizing web-transfer
    • M. Chan and T. Woo. Cache-based compaction: A new technique for optimizing web-transfer. INFOCOM, 1999.
    • (1999) INFOCOM
    • Chan, M.1    Woo, T.2
  • 22
    • 38049015483
    • A fast and compact web graph representation
    • F. Claude and G. Navarro. A fast and compact web graph representation. SPIRE, LNCS 4726, 118-129, 2007.
    • (2007) SPIRE, LNCS 4726 , pp. 118-129
    • Claude, F.1    Navarro, G.2
  • 23
    • 77950936440
    • D. Cutting. Apache Lucene. http://lucene.apache.org/.
    • Cutting, D.1
  • 25
    • 4544259509
    • Locality-sensitive hashing scheme based on p-stable distributions
    • M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. ACM SoCG, 253-262, 2004.
    • (2004) ACM SoCG , pp. 253-262
    • Datar, M.1    Immorlica, N.2    Indyk, P.3    Mirrokni, V.4
  • 26
    • 85030321143
    • MapReduce: Simplified data processing on large clusters
    • J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
    • (2004) OSDI
    • Dean, J.1    Ghemawat, S.2
  • 27
    • 4544256811
    • Application-specific delta-encoding via resemblance detection
    • F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. USENIX, 113-126, 2003.
    • (2003) USENIX , pp. 113-126
    • Douglis, F.1    Iyengar, A.2
  • 29
    • 33750716855
    • The engineering of a compression boosting library: Theory vs practice in BWT compression
    • P. Ferragina, R. Giancarlo, and G. Manzini. The engineering of a compression boosting library: Theory vs practice in BWT compression. ESA, LNCS 4168, 756-767, 2006.
    • (2006) ESA, LNCS 4168 , pp. 756-767
    • Ferragina, P.1    Giancarlo, R.2    Manzini, G.3
  • 32
    • 77950950499
    • P. Ferragina and G. Navarro. Pizza&Chili corpus home page. http://pizzachili.di.unipi.it/.
    • Ferragina, P.1    Navarro, G.2
  • 33
    • 70350408798
    • On optimally partitioning a text to improve its compression
    • P. Ferragina, I. Nitto, and R. Venturini. On optimally partitioning a text to improve its compression. ESA, LNCS 5757, 420-431, 2009.
    • (2009) ESA, LNCS 5757 , pp. 420-431
    • Ferragina, P.1    Nitto, I.2    Venturini, R.3
  • 35
    • 36448937133
    • Compressed permuterm index
    • P. Ferragina and R. Venturini. Compressed permuterm index. ACM SIGIR, 535-542, 2007.
    • (2007) ACM SIGIR , pp. 535-542
    • Ferragina, P.1    Venturini, R.2
  • 36
    • 64049096557
    • Reducing the storage burden via data deduplication
    • D. Geer. Reducing the storage burden via data deduplication. Computer, 41(12):15-17, 2008.
    • (2008) Computer , vol.41 , Issue.12 , pp. 15-17
    • Geer, D.1
  • 37
  • 38
    • 0003067623
    • Scalable techniques for clustering the web
    • T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB, 129-134, 2000.
    • (2000) WebDB , pp. 129-134
    • Haveliwala, T.1    Gionis, A.2    Indyk, P.3
  • 39
    • 0034188132
    • Grammar-based codes: A new class of universal lossless source codes
    • J. Kieffer and E.-H. Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Info. Theory, 46(3):737-754, 2000.
    • (2000) IEEE Trans. Info. Theory , vol.46 , Issue.3 , pp. 737-754
    • Kieffer, J.1    Yang, E.-H.2
  • 40
    • 85091109842
    • Redundancy elimination within large collections of files
    • P. Kulkarni, F. Douglis, J. LaVoie, and J. Tracey. Redundancy elimination within large collections of files. USENIX, 2004.
    • (2004) USENIX
    • Kulkarni, P.1    Douglis, F.2    LaVoie, J.3    Tracey, J.4
  • 42
    • 85043988965
    • Finding similar files in a large file system
    • U. Manber. Finding similar files in a large file system. USENIX, 1-10, 1994.
    • (1994) USENIX , pp. 1-10
    • Manber, U.1
  • 44
    • 0030609306
    • Potential benefits of delta encoding and data compression for HTTP
    • J. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. ACM SIGCOMM, 181-194, 1997.
    • (1997) ACM SIGCOMM , pp. 181-194
    • Mogul, J.1    Douglis, F.2    Feldmann, A.3    Krishnamurthy, B.4
  • 46
    • 84961214036
    • Cluster-based delta compression of a collection of files
    • Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. WISE, 257-268, 2002.
    • (2002) WISE , pp. 257-268
    • Ouyang, Z.1    Memon, N.2    Suel, T.3    Trendafilov, D.4
  • 47
    • 34250638291
    • A web-based kernel function for measuring the similarity of short text snippets
    • M. Sahami and T. Heilman. A web-based kernel function for measuring the similarity of short text snippets. WWW, 377-386, 2006.
    • (2006) WWW , pp. 377-386
    • Sahami, M.1    Heilman, T.2
  • 48
    • 1142267351
    • Winnowing: Local algorithms for document fingerprinting
    • S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: Local algorithms for document fingerprinting. SIGMOD, 76-85, 2003.
    • (2003) SIGMOD , pp. 76-85
    • Schleimer, S.1    Wilkerson, D.2    Aiken, A.3
  • 50
    • 37149046010
    • Sorting out the document identifier assignment problem
    • F. Silvestri. Sorting out the document identifier assignment problem. ECIR, LNCS 4425, 101-112, 2007.
    • (2007) ECIR, LNCS 4425 , pp. 101-112
    • Silvestri, F.1
  • 51
    • 2442558876
    • Algorithms for delta compression and remote file synchronization
    • T. Suel and N. Memon. Lossless Compression Handbook, chapter "Algorithms for delta compression and remote file synchronization", Academic Press, 2002.
    • (2002) Lossless Compression Handbook
    • Suel, T.1    Memon, N.2
  • 52
    • 2442563450
    • Improved file synchronization techniques for maintaining large replicated collections over slow networks
    • T. Suel, P. Noel, and D. Trendafilov. Improved file synchronization techniques for maintaining large replicated collections over slow networks. IEEE ICDE, 153-164, 2004.
    • (2004) IEEE ICDE , pp. 153-164
    • Suel, T.1    Noel, P.2    Trendafilov, D.3
  • 57
    • 72449208572
    • Compressing term positions in web indexes
    • H. Yan, S. Ding, and T. Suel. Compressing term positions in web indexes. ACM SIGIR, 2009.
    • (2009) ACM SIGIR
    • Yan, H.1    Ding, S.2    Suel, T.3
  • 58
    • 84865646680
    • Inverted index compression and query processing with optimized document ordering
    • H. Yan, S. Ding, and T. Suel. Inverted index compression and query processing with optimized document ordering. WWW, 401-410, 2009.
    • (2009) WWW , pp. 401-410
    • Yan, H.1    Ding, S.2    Suel, T.3
  • 59
    • 55149106898
    • Performance of compressed inverted list caching in search engines
    • J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. WWW, 387-396, 2008.
    • (2008) WWW , pp. 387-396
    • Zhang, J.1    Long, X.2    Suel, T.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.