-
1
-
-
35448936512
-
On synopses for distinct-value estimation under multiset operations
-
DOI 10.1145/1247480.1247504, SIGMOD 2007: Proceedings of the ACM SIGMOD International Conference on Management of Data
-
K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, pages 199-210. ACM, 2007. (Pubitemid 47630807)
-
(2007)
Proceedings of the ACM SIGMOD International Conference on Management of Data
, pp. 199-210
-
-
Beyer, K.1
Haas, P.J.2
Reinwald, B.3
Sismanis, Y.4
Gemulla, R.5
-
2
-
-
0014814325
-
Space/time tradeoffs in hash coding with allowable errors
-
B. Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM, 13:422-426, 1970.
-
(1970)
Communications of the ACM
, vol.13
, pp. 422-426
-
-
Bloom, B.1
-
6
-
-
0013206133
-
Collection statistics for fast duplicate document detection
-
DOI 10.1145/506309.506311
-
A. Chowdhury, O. Frieder, D. Grossman, and M. C. McCabe. Collection statistics for fast duplicate document detection. ACM Transactions on Information Systems, 20(2):171-191, 2002. (Pubitemid 44642301)
-
(2002)
ACM Transactions on Information Systems
, vol.20
, Issue.2
, pp. 171-191
-
-
Chowdhury, A.1
Frieder, O.2
Grossman, D.3
McCabe, M.C.4
-
7
-
-
0031353179
-
Size-Estimation Framework with Applications to Transitive Closure and Reachability
-
E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. System Sci., 55:441-453, 1997. (Pubitemid 127432363)
-
(1997)
Journal of Computer and System Sciences
, vol.55
, Issue.3
, pp. 441-453
-
-
Cohen, E.1
-
8
-
-
67049085162
-
Variance optimal sampling based estimation of subset sums
-
ACM-SIAM
-
E. Cohen, N. Duffield, H. Kaplan, C. Lund, and M. Thorup. Variance optimal sampling based estimation of subset sums. In Proc. 20th ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, 2009.
-
(2009)
Proc. 20th ACM-SIAM Symposium on Discrete Algorithms
-
-
Cohen, E.1
Duffield, N.2
Kaplan, H.3
Lund, C.4
Thorup, M.5
-
11
-
-
8644227073
-
Constructing a text corpus for inexact duplicate detection
-
J. G. Conrad and C. P. Schriber. Constructing a text corpus for inexact duplicate detection. In SIGIR 2004, pages 582-583, 2004.
-
(2004)
SIGIR 2004
, pp. 582-583
-
-
Conrad, J.G.1
Schriber, C.P.2
-
12
-
-
0036366837
-
Mining database structure; Or, how to build a data quality browser
-
T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or, how to build a data quality browser. In SIGMOD, pages 240-251. ACM, 2002. (Pubitemid 34985551)
-
(2002)
Proceedings of the ACM SIGMOD International Conference on Management of Data
, pp. 240-251
-
-
Dasu, T.1
Johnson, T.2
Muthukrishnan, S.3
Shkapenyuk, V.4
-
13
-
-
37049036831
-
Priority sampling for estimating arbitrary subset sums
-
N. Duffield, M. Thorup, and C. Lund. Priority sampling for estimating arbitrary subset sums. J. Assoc. Comput. Mach., 54(6), 2007.
-
(2007)
J. Assoc. Comput. Mach.
, vol.54
, Issue.6
-
-
Duffield, N.1
Thorup, M.2
Lund, C.3
-
14
-
-
0034206002
-
Summary cache: A scalable wide-area Web cache sharing protocol
-
L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable wide-area Web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3):281-293, 2000.
-
(2000)
IEEE/ACM Transactions on Networking
, vol.8
, Issue.3
, pp. 281-293
-
-
Fan, L.1
Cao, P.2
Almeida, J.3
Broder, A.Z.4
-
15
-
-
0033309273
-
An approximate L1-difference algorithm for massive data streams
-
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In Proc. 40th IEEE Annual Symposium on Foundations of Computer Science, pages 501-511. IEEE, 1999. (Pubitemid 30539956)
-
(1999)
Annual Symposium on Foundations of Computer Science - Proceedings
, pp. 501-511
-
-
Feigenbaum, J.1
Kannan, S.2
Strauss, M.3
Viswanathan, M.4
-
16
-
-
33750296887
-
Finding near-duplicate web pages: A large-scale evaluation of algorithms
-
M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
-
(2006)
SIGIR
, pp. 284-291
-
-
Henzinger, M.R.1
-
17
-
-
84947396376
-
A generalization of sampling without replacement from a finite universe
-
D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.
-
(1952)
Journal of the American Statistical Association
, vol.47
, Issue.260
, pp. 663-685
-
-
Horvitz, D.G.1
Thompson, D.J.2
-
19
-
-
34748825544
-
A sketch algorithm for estimating two-way and multi-way associations
-
P. Li and K. W. Church. A sketch algorithm for estimating two-way and multi-way associations. Computational Linguistics, 33:305-354, 2007.
-
(2007)
Computational Linguistics
, vol.33
, pp. 305-354
-
-
Li, P.1
Church, K.W.2
-
20
-
-
85043988965
-
Finding similar files in a large file system
-
U. Manber. Finding similar files in a large file system. In Usenix Conference, pages 1-10, 1994.
-
(1994)
Usenix Conference
, pp. 1-10
-
-
Manber, U.1
-
22
-
-
84872255037
-
-
The Netflix Prize. http://www.netflixprize.com/.
-
The Netflix Prize
-
-
-
23
-
-
1142267351
-
Winnowing: Local algorithms for document fingerprinting
-
S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In SIGMOD, pages 76-85, 2003.
-
(2003)
SIGMOD
, pp. 76-85
-
-
Schleimer, S.1
Wilkerson, D.2
Aiken, A.3
-
24
-
-
33748098958
-
The DLT priority sampling is essentially optimal
-
STOC'06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing
-
M. Szegedy. The DLT priority sampling is essentially optimal. In Proc. 38th Annual ACM Symposium on Theory of Computing, pages 150-158. ACM, 2006. (Pubitemid 44306548)
-
(2006)
Proceedings of the Annual ACM Symposium on Theory of Computing
, vol.2006
, pp. 150-158
-
-
Szegedy, M.1