-
1
-
-
77950934630
-
-
Apache Foundation. Hadoop. http://hadoop.apache.org/.
-
-
-
-
4
-
-
0037619265
-
Web search for a planet: The google cluster architecture
-
L. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The google cluster architecture. IEEE Micro, 23(2), 2003.
-
(2003)
IEEE Micro
, vol.23
, pp. 2
-
-
Barroso, L.1
Dean, J.2
Hölzle, U.3
-
5
-
-
35348849154
-
Scaling up all pairs similarity search
-
R. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. WWW, 131-140, 2007.
-
(2007)
WWW
, pp. 131-140
-
-
Bayardo, R.1
Ma, Y.2
Srikant, R.3
-
6
-
-
0035367637
-
Data compression with long repeated strings
-
J. Bentley and M. McIlroy. Data compression with long repeated strings. Information Sciences, 135(1-2):1-11, 2001.
-
(2001)
Information Sciences
, vol.135
, Issue.1-2
, pp. 1-11
-
-
Bentley, J.1
McIlroy, M.2
-
7
-
-
20744453675
-
Dictionaries using variable-length keys and data, with applications
-
D. Blandford and G. Blelloch. Dictionaries using variable-length keys and data, with applications. ACM-SIAM SODA, 1-10, 2005.
-
(2005)
ACM-SIAM SODA
, pp. 1-10
-
-
Blandford, D.1
Blelloch, G.2
-
8
-
-
3042680184
-
Ubicrawler: A scalable fully distributed web crawler
-
P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711-726, 2004.
-
(2004)
Software: Practice & Experience
, vol.34
, Issue.8
, pp. 711-726
-
-
Boldi, P.1
Codenotti, B.2
Santini, M.3
Vigna, S.4
-
10
-
-
19944376183
-
The webgraph framework I: Compression techniques
-
P. Boldi and S. Vigna. The webgraph framework I: compression techniques. WWW, 595-602, 2004.
-
(2004)
WWW
, pp. 595-602
-
-
Boldi, P.1
Vigna, S.2
-
11
-
-
77950926507
-
-
P. Boldi and S. Vigna. MG4J at TREC 2005. http://mg4j.dsi.unimi.it/.
-
(2005)
-
-
Boldi, P.1
Vigna, S.2
-
12
-
-
57349197117
-
Reorganizing compressed text
-
N. Brisaboa, A. Fariña, S. Ladra, and G. Navarro. Reorganizing compressed text. ACM SIGIR, 139-146, 2008.
-
(2008)
ACM SIGIR
, pp. 139-146
-
-
Brisaboa, N.1
Fariña, A.2
Ladra, S.3
Navarro, G.4
-
13
-
-
85041104131
-
Self-indexing natural language
-
N. Brisaboa, A. Fariña, G. Navarro, A. Places, and E. Rodríguez. Self-indexing natural language. SPIRE, LNCS 5280, 2008.
-
(2008)
SPIRE, LNCS 5280
-
-
Brisaboa, N.1
Fariña, A.2
Navarro, G.3
Places, A.4
Rodríguez, E.5
-
14
-
-
79956075292
-
Identifying and filtering near-duplicate documents
-
A. Broder. Identifying and filtering near-duplicate documents. CPM, LNCS 1848, 2000.
-
(2000)
CPM, LNCS 1848
-
-
Broder, A.1
-
15
-
-
0010362121
-
Syntactic clustering of the web
-
A. Broder, S. Glassman, M. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13), 1997.
-
(1997)
Computer Networks
, vol.29
, Issue.8-13
-
-
Broder, A.1
Glassman, S.2
Manasse, M.3
Zweig, G.4
-
16
-
-
42549139837
-
A scalable pattern mining approach to web graph compression with communities
-
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. WSDM, 95-106, 2008.
-
(2008)
WSDM
, pp. 95-106
-
-
Buehrer, G.1
Chellapilla, K.2
-
17
-
-
79960793459
-
Efficiency vs. Effectiveness in terabyte-scale information retrieval
-
S. Büttcher and C. Clarke. Efficiency vs. Effectiveness in terabyte-scale information retrieval. TREC, 2005.
-
(2005)
TREC
-
-
Büttcher, S.1
Clarke, C.2
-
18
-
-
63449115477
-
Index compression is good, especially for random access
-
S. Büttcher and C. Clarke. Index compression is good, especially for random access. CIKM, 761-770, 2007.
-
(2007)
CIKM
, pp. 761-770
-
-
Büttcher, S.1
Clarke, C.2
-
19
-
-
0032631733
-
Cache-based compaction: A new technique for optimizing web-transfer
-
M. Chan and T. Woo. Cache-based compaction: A new technique for optimizing web-transfer. INFOCOM, 1999.
-
(1999)
INFOCOM
-
-
Chan, M.1
Woo, T.2
-
20
-
-
47749140025
-
Bigtable: A distributed storage system for structured data
-
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2), 2008.
-
(2008)
ACM Trans. Comput. Syst.
, vol.26
, Issue.2
-
-
Chang, F.1
Dean, J.2
Ghemawat, S.3
Hsieh, W.4
Wallach, D.5
Burrows, M.6
Chandra, T.7
Fikes, A.8
Gruber, R.9
-
21
-
-
70350694219
-
On compressing social networks
-
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. KDD, 219-228, 2009.
-
(2009)
KDD
, pp. 219-228
-
-
Chierichetti, F.1
Kumar, R.2
Lattanzi, S.3
Mitzenmacher, M.4
Panconesi, A.5
Raghavan, P.6
-
22
-
-
38049015483
-
A fast and compact web graph representation
-
F. Claude and G. Navarro. A fast and compact web graph representation. SPIRE, LNCS 4726, 118-129, 2007.
-
(2007)
SPIRE, LNCS 4726
, pp. 118-129
-
-
Claude, F.1
Navarro, G.2
-
23
-
-
77950936440
-
-
D. Cutting. Apache Lucene. http://lucene.apache.org/.
-
-
-
Cutting, D.1
-
25
-
-
4544259509
-
Locality-sensitive hashing scheme based on p-stable distributions
-
M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. ACM SoCG, 253-262, 2004.
-
(2004)
ACM SoCG
, pp. 253-262
-
-
Datar, M.1
Immorlica, N.2
Indyk, P.3
Mirrokni, V.4
-
26
-
-
85030321143
-
MapReduce: Simplified data processing on large clusters
-
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
-
(2004)
OSDI
-
-
Dean, J.1
Ghemawat, S.2
-
27
-
-
4544256811
-
Application-specific delta-encoding via resemblance detection
-
F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. USENIX, 113-126, 2003.
-
(2003)
USENIX
, pp. 113-126
-
-
Douglis, F.1
Iyengar, A.2
-
29
-
-
33750716855
-
The engineering of a compression boosting library: Theory vs practice in BWT compression
-
P. Ferragina, R. Giancarlo, and G. Manzini. The engineering of a compression boosting library: Theory vs practice in BWT compression. ESA, LNCS 4168, 756-767, 2006.
-
(2006)
ESA, LNCS 4168
, pp. 756-767
-
-
Ferragina, P.1
Giancarlo, R.2
Manzini, G.3
-
30
-
-
30544455566
-
Boosting textual compression in optimal linear time
-
P. Ferragina, R. Giancarlo, G. Manzini, and M. Sciortino. Boosting textual compression in optimal linear time. Journal of the ACM, 52:688-713, 2005.
-
(2005)
Journal of the ACM
, vol.52
, pp. 688-713
-
-
Ferragina, P.1
Giancarlo, R.2
Manzini, G.3
Sciortino, M.4
-
31
-
-
84979917497
-
Compressed text indexes: From theory to practice
-
P. Ferragina, R. González, G. Navarro, and R. Venturini. Compressed text indexes: From theory to practice. ACM Journal of Experimental Algorithmics, 13, 2008.
-
(2008)
ACM Journal of Experimental Algorithmics
, vol.13
-
-
Ferragina, P.1
González, R.2
Navarro, G.3
Venturini, R.4
-
32
-
-
77950950499
-
-
P. Ferragina and G. Navarro. Pizza&Chili corpus home page. http://pizzachili.di.unipi.it/.
-
-
-
Ferragina, P.1
Navarro, G.2
-
33
-
-
70350408798
-
On optimally partitioning a text to improve its compression
-
P. Ferragina, I. Nitto, and R. Venturini. On optimally partitioning a text to improve its compression. ESA, LNCS 5757, 420-431, 2009.
-
(2009)
ESA, LNCS
, vol.5757
, pp. 420-431
-
-
Ferragina, P.1
Nitto, I.2
Venturini, R.3
-
36
-
-
64049096557
-
Reducing the storage burden via data deduplication
-
D. Geer. Reducing the storage burden via data deduplication. Computer, 41(12):15-17, 2008.
-
(2008)
Computer
, vol.41
, Issue.12
, pp. 15-17
-
-
Geer, D.1
-
37
-
-
77957931572
-
Detecting the origin of text segments efficiently
-
O.A. Hamid, B. Behzadi, S. Christoph, and M.R. Henzinger. Detecting the origin of text segments efficiently. WWW, 61-70, 2009.
-
(2009)
WWW
, pp. 61-70
-
-
Hamid, O.A.1
Behzadi, B.2
Christoph, S.3
Henzinger, M.R.4
-
38
-
-
0003067623
-
Scalable techniques for clustering the web
-
T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB, 129-134, 2000.
-
(2000)
WebDB
, pp. 129-134
-
-
Haveliwala, T.1
Gionis, A.2
Indyk, P.3
-
39
-
-
0034188132
-
Grammar-based codes: A new class of universal lossless source codes
-
J. Kieffer and E.-H. Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Info. Theory, 46(3):737-754, 2000.
-
(2000)
IEEE Trans. Info. Theory
, vol.46
, Issue.3
, pp. 737-754
-
-
Kieffer, J.1
Yang, E.-H.2
-
42
-
-
85043988965
-
Finding similar files in a large file system
-
U. Manber. Finding similar files in a large file system. USENIX, 1-10, 1994.
-
(1994)
USENIX
, pp. 1-10
-
-
Manber, U.1
-
44
-
-
0030609306
-
Potential benefits of delta encoding and data compression for HTTP
-
J. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. ACM SIGCOMM, 181-194, 1997.
-
(1997)
ACM SIGCOMM
, pp. 181-194
-
-
Mogul, J.1
Douglis, F.2
Feldmann, A.3
Krishnamurthy, B.4
-
46
-
-
84961214036
-
Cluster-based delta compression of a collection of files
-
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. WISE, 257-268, 2002.
-
(2002)
WISE
, pp. 257-268
-
-
Ouyang, Z.1
Memon, N.2
Suel, T.3
Trendafilov, D.4
-
47
-
-
34250638291
-
A web-based kernel function for measuring the similarity of short text snippets
-
M. Sahami and T. Heilman. A web-based kernel function for measuring the similarity of short text snippets. WWW, 377-386, 2006.
-
(2006)
WWW
, pp. 377-386
-
-
Sahami, M.1
Heilman, T.2
-
48
-
-
1142267351
-
Winnowing: Local algorithms for document fingerprinting
-
S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: Local algorithms for document fingerprinting. SIGMOD, 76-85, 2003.
-
(2003)
SIGMOD
, pp. 76-85
-
-
Schleimer, S.1
Wilkerson, D.2
Aiken, A.3
-
50
-
-
37149046010
-
Sorting out the document identifier assignment problem
-
F. Silvestri. Sorting out the document identifier assignment problem. ECIR, LNCS 4425, 101-112, 2007.
-
(2007)
ECIR, LNCS 4425
, pp. 101-112
-
-
Silvestri, F.1
-
51
-
-
2442558876
-
-
chapter "Algorithms for delta compression and remote file synchronization", Academic Press
-
T. Suel and N. Memon. Lossless Compression Handbook, chapter "Algorithms for delta compression and remote file synchronization", Academic Press, 2002.
-
(2002)
Lossless Compression Handbook
-
-
Suel, T.1
Memon, N.2
-
52
-
-
2442563450
-
Improved file synchronization techniques for maintaining large replicated collections over slow networks
-
T. Suel, P. Noel, and D. Trendafilov. Improved file synchronization techniques for maintaining large replicated collections over slow networks. IEEE ICDE, 153-164, 2004.
-
(2004)
IEEE ICDE
, pp. 153-164
-
-
Suel, T.1
Noel, P.2
Trendafilov, D.3
-
53
-
-
74549167103
-
Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression
-
D. Teodosiu, N. Bjorner, J. Porkka, M. Manasse, and Y. Gurevich. Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression. Microsoft Research TR-2006-2157, 2006.
-
(2006)
Microsoft Research TR-2006-2157
-
-
Teodosiu, D.1
Bjorner, N.2
Porkka, J.3
Manasse, M.4
Gurevich, Y.5
-
54
-
-
36448985970
-
Fast generation of result snippets in web search
-
DOI 10.1145/1277741.1277766, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
-
A. Turpin, Y. Tsegay, D. Hawking, and H. Williams. Fast generation of result snippets in web search. ACM SIGIR, 127-134, 2007.
-
(2007)
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
, pp. 127-134
-
-
Turpin, A.1
Tsegay, Y.2
Hawking, D.3
Williams, H.E.4
-
57
-
-
72449208572
-
Compressing term positions in web indexes
-
H. Yan, S. Ding, and T. Suel. Compressing term positions in web indexes. ACM SIGIR, 2009.
-
(2009)
ACM SIGIR
-
-
Yan, H.1
Ding, S.2
Suel, T.3
-
58
-
-
84865646680
-
Inverted index compression and query processing with optimized document ordering
-
H. Yan, S. Ding, and T. Suel. Inverted index compression and query processing with optimized document ordering. WWW, 401-410, 2009.
-
(2009)
WWW
, pp. 401-410
-
-
Yan, H.1
Ding, S.2
Suel, T.3
-
59
-
-
55149106898
-
Performance of compressed inverted list caching in search engines
-
J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. WWW, 387-396, 2008.
-
(2008)
WWW
, pp. 387-396
-
-
Zhang, J.1
Long, X.2
Suel, T.3