-
1
-
-
0027621699
-
Mining association rules between sets of items in large databases
-
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207-216, 1993.
-
(1993)
SIGMOD
, pp. 207-216
-
-
Agrawal, R.1
Imielinski, T.2
Swami, A.N.3
-
2
-
-
35448936512
-
On synopses for distinct-value estimation under multiset operations
-
K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, pages 199-210, 2007.
-
(2007)
SIGMOD
, pp. 199-210
-
-
Beyer, K.S.1
Haas, P.J.2
Reinwald, B.3
Sismanis, Y.4
Gemulla, R.5
-
3
-
-
33745756440
-
Mirror, mirror on the web: A study of host pairs with replicated content
-
K. Bharat and A. Z. Broder. Mirror, mirror on the web: A study of host pairs with replicated content. In WWW, pages 501-512, 1999.
-
(1999)
WWW
, pp. 501-512
-
-
Bharat, K.1
Broder, A.Z.2
-
4
-
-
0014814325
-
Space/time tradeoffs in hash coding with allowable errors
-
B. Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM, 13:422-426, 1970.
-
(1970)
Communications of the ACM
, vol.13
, pp. 422-426
-
-
Bloom, B.1
-
5
-
-
84990496348
-
Selecting several samples from a single population
-
K. R. W. Brewer, L. J. Early, and S. F. Joyce. Selecting several samples from a single population. Australian Journal of Statistics, 14(3):231-239, 1972.
-
(1972)
Australian Journal of Statistics
, vol.14
, Issue.3
, pp. 231-239
-
-
Brewer, K.R.W.1
Early, L.J.2
Joyce, S.F.3
-
7
-
-
79956075292
-
Identifying and filtering near-duplicate documents
-
A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM, pages 1-10, 2000.
-
(2000)
CPM
, pp. 1-10
-
-
Broder, A.Z.1
-
8
-
-
0034207121
-
Min-wise independent permutations
-
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. J. Comput. System Sci., 60(3):630-659, 2000.
-
(2000)
J. Comput. System Sci
, vol.60
, Issue.3
, pp. 630-659
-
-
Broder, A.Z.1
Charikar, M.2
Frieze, A.M.3
Mitzenmacher, M.4
-
10
-
-
0036040277
-
Similarity estimation techniques from rounding algorithms
-
M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
-
(2002)
STOC
-
-
Charikar, M.S.1
-
11
-
-
0013206133
-
Collection statistics for fast duplicate document detection
-
A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(1):171-191, 2002.
-
(2002)
ACM Transactions on Information Systems
, vol.20
, Issue.1
, pp. 171-191
-
-
Chowdhury, A.1
Frieder, O.2
Grossman, D.3
McCabe, M.C.4
-
12
-
-
0031353179
-
Size-estimation framework with applications to transitive closure and reachability
-
E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. System Sci., 55:441-453, 1997.
-
(1997)
J. Comput. System Sci
, vol.55
, pp. 441-453
-
-
Cohen, E.1
-
13
-
-
70349089219
-
Stream sampling for variance-optimal estimation of subset sums
-
E. Cohen, N. Duffield, H. Kaplan, C. Lund, and M. Thorup. Stream sampling for variance-optimal estimation of subset sums. In SODA, 2009.
-
(2009)
SODA
-
-
Cohen, E.1
Duffield, N.2
Kaplan, H.3
Lund, C.4
Thorup, M.5
-
14
-
-
0042014556
-
Associative search in Peer to Peer networks: Harnessing latent semantics
-
E. Cohen, A. Fiat, and H. Kaplan. Associative search in Peer to Peer networks: Harnessing latent semantics. In INFOCOM, 2003.
-
(2003)
INFOCOM
-
-
Cohen, E.1
Fiat, A.2
Kaplan, H.3
-
15
-
-
1842473627
-
Efficient estimation algorithms for neighborhood variance and other moments
-
E. Cohen and H. Kaplan. Efficient estimation algorithms for neighborhood variance and other moments. In SODA, 2004.
-
(2004)
SODA
-
-
Cohen, E.1
Kaplan, H.2
-
16
-
-
33845863420
-
Spatially-decaying aggregation over a network: Model and algorithms
-
E. Cohen and H. Kaplan. Spatially-decaying aggregation over a network: model and algorithms. J. Comput. System Sci., 73:265-288, 2007.
-
(2007)
J. Comput. System Sci
, vol.73
, pp. 265-288
-
-
Cohen, E.1
Kaplan, H.2
-
17
-
-
36849001315
-
Summarizing data using bottom-k sketches
-
E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In PODC, 2007.
-
(2007)
PODC
-
-
Cohen, E.1
Kaplan, H.2
-
18
-
-
67049114678
-
Estimating aggregates over multiple sets
-
E. Cohen and H. Kaplan. Estimating aggregates over multiple sets. In ICDM, 2008.
-
(2008)
ICDM
-
-
Cohen, E.1
Kaplan, H.2
-
19
-
-
70349672710
-
Tighter estimation using bottom-k sketches
-
E. Cohen and H. Kaplan. Tighter estimation using bottom-k sketches. In VLDB, 2008.
-
(2008)
VLDB
-
-
Cohen, E.1
Kaplan, H.2
-
20
-
-
33747616446
-
Maintaining time-decaying stream aggregates
-
E. Cohen and M. Strauss. Maintaining time-decaying stream aggregates. J. Algorithms, 59:19-36, 2006.
-
(2006)
J. Algorithms
, vol.59
, pp. 19-36
-
-
Cohen, E.1
Strauss, M.2
-
22
-
-
8644227073
-
Constructing a text corpus for inexact duplicate detection
-
J. G. Conrad and C. P. Schriber. Constructing a text corpus for inexact duplicate detection. In SIGIR, pages 582-583, 2004.
-
(2004)
SIGIR
, pp. 582-583
-
-
Conrad, J.G.1
Schriber, C.P.2
-
23
-
-
0036366837
-
Mining database structure; or, how to build a data quality browser
-
T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or, how to build a data quality browser. In SIGMOD, pages 240-251, 2002.
-
(2002)
SIGMOD
, pp. 240-251
-
-
Dasu, T.1
Johnson, T.2
Muthukrishnan, S.3
Shkapenyuk, V.4
-
24
-
-
37049036831
-
Priority sampling for estimating arbitrary subset sums
-
N. Duffield, M. Thorup, and C. Lund. Priority sampling for estimating arbitrary subset sums. J. Assoc. Comput. Mach., 54(6), 2007.
-
(2007)
J. Assoc. Comput. Mach
, vol.54
, Issue.6
-
-
Duffield, N.1
Thorup, M.2
Lund, C.3
-
25
-
-
0034206002
-
Summary cache: A scalable wide-area Web cache sharing protocol
-
L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable wide-area Web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3):281-293, 2000.
-
(2000)
IEEE/ACM Transactions on Networking
, vol.8
, Issue.3
, pp. 281-293
-
-
Fan, L.1
Cao, P.2
Almeida, J.3
Broder, A.Z.4
-
26
-
-
0033309273
-
An approximate L1-difference algorithm for massive data streams
-
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In FOCS, pages 501-511, 1999.
-
(1999)
FOCS
, pp. 501-511
-
-
Feigenbaum, J.1
Kannan, S.2
Strauss, M.3
Viswanathan, M.4
-
28
-
-
84944323337
-
Distinct sampling for highly-accurate answers to distinct values queries and event reports
-
P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In VLDB, pages 541-550, 2001.
-
(2001)
VLDB
, pp. 541-550
-
-
Gibbons, P.B.1
-
29
-
-
70349659026
-
Hashed samples: Selectivity estimators for set similarity selection queries
-
M. Hadjieleftheriou, X. Yu, N. Koudas, and D. Srivastava. Hashed samples: Selectivity estimators for set similarity selection queries. In VLDB, 2008.
-
(2008)
VLDB
-
-
Hadjieleftheriou, M.1
Yu, X.2
Koudas, N.3
Srivastava, D.4
-
31
-
-
33750296887
-
Finding near-duplicate web pages: A large-scale evaluation of algorithms
-
M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
-
(2006)
SIGIR
, pp. 284-291
-
-
Henzinger, M.R.1
-
32
-
-
84947396376
-
A generalization of sampling without replacement from a finite universe
-
D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.
-
(1952)
Journal of the American Statistical Association
, vol.47
, Issue.260
, pp. 663-685
-
-
Horvitz, D.G.1
Thompson, D.J.2
-
33
-
-
33244490258
-
Randomized incremental constructions of three-dimensional convex hulls and planar Voronoi diagrams, and approximate range counting
-
H. Kaplan and M. Sharir. Randomized incremental constructions of three-dimensional convex hulls and planar Voronoi diagrams, and approximate range counting. In SODA, pages 484-493, 2006.
-
(2006)
SODA
, pp. 484-493
-
-
Kaplan, H.1
Sharir, M.2
-
34
-
-
12244261882
-
Improved robustness of signature-based near-replica detection via lexicon randomization
-
A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In SIGKDD, pages 605-610, 2004.
-
(2004)
SIGKDD
, pp. 605-610
-
-
Kolcz, A.1
Chowdhury, A.2
Alspector, J.3
-
35
-
-
34748825544
-
A sketch algorithm for estimating two-way and multi-way associations
-
P. Li and K. W. Church. A sketch algorithm for estimating two-way and multi-way associations. Computational Linguistics, 33:305-354, 2007.
-
(2007)
Computational Linguistics
, vol.33
, pp. 305-354
-
-
Li, P.1
Church, K.W.2
-
36
-
-
85043988965
-
Finding similar files in a large file system
-
U. Manber. Finding similar files in a large file system. In Usenix, pages 1-10, 1994.
-
(1994)
Usenix
, pp. 1-10
-
-
Manber, U.1
-
37
-
-
35348911985
-
-
G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for web crawling. In WWW, 2007.
-
-
-
-
38
-
-
33748699331
-
Computing separable functions via gossip
-
D. Mosk-Aoyama and D. Shah. Computing separable functions via gossip. In PODC, 2006.
-
(2006)
PODC
-
-
Mosk-Aoyama, D.1
Shah, D.2
-
39
-
-
0035051307
-
Finding interesting associations without support pruning
-
R. Motwani, E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, J. Ullman, and C. Yang. Finding interesting associations without support pruning. IEEE Transactions on Knowledge and Data Engineering, 13:64-78, 2001.
-
(2001)
IEEE Transactions on Knowledge and Data Engineering
, vol.13
, pp. 64-78
-
-
Motwani, R.1
Cohen, E.2
Datar, M.3
Fujiwara, S.4
Gionis, A.5
Indyk, P.6
Ullman, J.7
Yang, C.8
-
41
-
-
0001022352
-
Sequential Poisson sampling
-
E. Ohlsson. Sequential Poisson sampling. J. Official Statistics, 14(2):149-162, 1998.
-
(1998)
J. Official Statistics
, vol.14
, Issue.2
, pp. 149-162
-
-
Ohlsson, E.1
-
42
-
-
0001630482
-
Asymptotic theory for successive sampling with varying probabilities without replacement, I
-
B. Rosén. Asymptotic theory for successive sampling with varying probabilities without replacement, I. The Annals of Mathematical Statistics, 43(2):373-397, 1972.
-
(1972)
The Annals of Mathematical Statistics
, vol.43
, Issue.2
, pp. 373-397
-
-
Rosén, B.1
-
43
-
-
0031571412
-
Asymptotic theory for order sampling
-
B. Rosén. Asymptotic theory for order sampling. J. Statistical Planning and Inference, 62(2):135-158, 1997.
-
(1997)
J. Statistical Planning and Inference
, vol.62
, Issue.2
, pp. 135-158
-
-
Rosén, B.1
-
44
-
-
0031571470
-
On sampling with probability proportional to size
-
B. Rosén. On sampling with probability proportional to size. J. Statistical Planning and Inference, 62(2):159-191, 1997.
-
(1997)
J. Statistical Planning and Inference
, vol.62
, Issue.2
, pp. 159-191
-
-
Rosén, B.1
-
45
-
-
1142267351
-
Winnowing: Local algorithms for document fingerprinting
-
S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In SIGMOD, 2003.
-
(2003)
SIGMOD
-
-
Schleimer, S.1
Wilkerson, D.2
Aiken, A.3
-
46
-
-
0005258822
-
A protocol-independent technique for eliminating redundant network traffic
-
N. T. Spring and D. Wetherall. A protocol-independent technique for eliminating redundant network traffic. In SIGCOMM, 2000.
-
(2000)
SIGCOMM
-
-
Spring, N.T.1
Wetherall, D.2
-
47
-
-
0041472305
-
Efficient content location using interest-based locality in peer-to-peer systems
-
K. Sripanidkulchai, B. Maggs, and H. Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In INFOCOM, 2003.
-
(2003)
INFOCOM
-
-
Sripanidkulchai, K.1
Maggs, B.2
Zhang, H.3
-
48
-
-
33748098958
-
The DLT priority sampling is essentially optimal
-
M. Szegedy. The DLT priority sampling is essentially optimal. In STOC, 2006.
-
(2006)
STOC
-
-
Szegedy, M.1
-
49
-
-
70449656148
-
-
M. Szegedy and M. Thorup. On the variance of subset sum estimation. In ESA, 2007.
-
-
-
-
50
-
-
3042749011
-
Peer-to-peer information retrieval using self-organizing semantic overlay networks
-
C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In SIGCOMM, 2003.
-
(2003)
SIGCOMM
-
-
Tang, C.1
Xu, Z.2
Dwarkadas, S.3