Volume 37, Issue 1, 2009, Pages 251-262

Leveraging discarded samples for tighter estimation of multiple-set aggregates

Author keywords

Algorithms; G.3: probabilistic algorithms; H.2 database management; Measurement; Performance

Indexed keywords

APPROXIMATE QUERY; DATA SETS; DATABASE MANAGEMENT; EMPIRICAL EVALUATIONS; ESTIMATION ERRORS; EXACT COMPUTATIONS; H.2 DATABASE MANAGEMENT; HYPERTEXT DOCUMENTS; MARKET BASKET; MASSIVE DATA SETS; PROBABILISTIC ALGORITHM; RANDOM SAMPLE; SYNTHETIC AND REAL DATA; TIME PERIODS; TYPICAL APPLICATION;

EID: 70449636034     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1555349.1555379     Document Type: Conference Paper
Times cited : (25)

References (50)
  • 1
    • 0027621699
    • Mining association rules between sets of items in large databases
    • R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207-216, 1993.
    • (1993) SIGMOD , pp. 207-216
    • Agrawal, R.1    Imielinski, T.2    Swami, A.N.3
  • 2
    • 35448936512
    • On synopses for distinct-value estimation under multiset operations
    • K. S. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, pages 199-210, 2007.
    • (2007) SIGMOD , pp. 199-210
    • Beyer, K.S.1    Haas, P.J.2    Reinwald, B.3    Sismanis, Y.4    Gemulla, R.5
  • 3
    • 33745756440
    • Mirror, mirror on the web: A study of host pairs with replicated content
    • K. Bharat and A. Z. Broder. Mirror, mirror on the web: A study of host pairs with replicated content. In WWW, pages 501-512, 1999.
    • (1999) WWW , pp. 501-512
    • Bharat, K.1    Broder, A.Z.2
  • 4
    • 0014814325
    • Space/time tradeoffs in hash coding with allowable errors
    • B. Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM, 13:422-426, 1970.
    • (1970) Communications of the ACM , vol.13 , pp. 422-426
    • Bloom, B.1
  • 7
    • 79956075292
    • Identifying and filtering near-duplicate documents
    • A. Z. Broder. Identifying and filtering near-duplicate documents. In CPM, pages 1-10, 2000.
    • (2000) CPM , pp. 1-10
    • Broder, A.Z.1
  • 10
    • 0036040277
    • Similarity estimation techniques from rounding algorithms
    • M. S. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, 2002.
    • (2002) STOC
    • Charikar, M.S.1
  • 12
    • 0031353179
    • Size-estimation framework with applications to transitive closure and reachability
    • E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. System Sci., 55:441-453, 1997.
    • (1997) J. Comput. System Sci , vol.55 , pp. 441-453
    • Cohen, E.1
  • 13
    • 70349089219
    • Stream sampling for variance-optimal estimation of subset sums
    • E. Cohen, N. Duffield, H. Kaplan, C. Lund, and M. Thorup. Stream sampling for variance-optimal estimation of subset sums. In SODA, 2009.
    • (2009) SODA
    • Cohen, E.1    Duffield, N.2    Kaplan, H.3    Lund, C.4    Thorup, M.5
  • 14
    • 0042014556
    • Associative search in Peer to Peer networks: Harnessing latent semantics
    • E. Cohen, A. Fiat, and H. Kaplan. Associative search in Peer to Peer networks: Harnessing latent semantics. In INFOCOM, 2003.
    • (2003) INFOCOM
    • Cohen, E.1    Fiat, A.2    Kaplan, H.3
  • 15
    • 1842473627
    • Efficient estimation algorithms for neighborhood variance and other moments
    • E. Cohen and H. Kaplan. Efficient estimation algorithms for neighborhood variance and other moments. In SODA, 2004.
    • (2004) SODA
    • Cohen, E.1    Kaplan, H.2
  • 16
    • 33845863420
    • Spatially-decaying aggregation over a network: Model and algorithms
    • E. Cohen and H. Kaplan. Spatially-decaying aggregation over a network: model and algorithms. J. Comput. System Sci., 73:265-288, 2007.
    • (2007) J. Comput. System Sci , vol.73 , pp. 265-288
    • Cohen, E.1    Kaplan, H.2
  • 17
    • 36849001315
    • Summarizing data using bottom-k sketches
    • E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In PODC, 2007.
    • (2007) PODC
    • Cohen, E.1    Kaplan, H.2
  • 18
    • 67049114678
    • Estimating aggregates over multiple sets
    • E. Cohen and H. Kaplan. Estimating aggregates over multiple sets. In ICDM, 2008.
    • (2008) ICDM
    • Cohen, E.1    Kaplan, H.2
  • 19
    • 70349672710
    • Tighter estimation using bottom-k sketches
    • E. Cohen and H. Kaplan. Tighter estimation using bottom-k sketches. In VLDB, 2008.
    • (2008) VLDB
    • Cohen, E.1    Kaplan, H.2
  • 20
    • 33747616446
    • Maintaining time-decaying stream aggregates
    • E. Cohen and M. Strauss. Maintaining time-decaying stream aggregates. J. Algorithms, 59:19-36, 2006.
    • (2006) J. Algorithms , vol.59 , pp. 19-36
    • Cohen, E.1    Strauss, M.2
  • 22
    • 8644227073
    • Constructing a text corpus for inexact duplicate detection
    • J. G. Conrad and C. P. Schriber. Constructing a text corpus for inexact duplicate detection. In SIGIR, pages 582-583, 2004.
    • (2004) SIGIR , pp. 582-583
    • Conrad, J.G.1    Schriber, C.P.2
  • 23
    • 0036366837
    • Mining database structure; or, how to build a data quality browser
    • T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or, how to build a data quality browser. In SIGMOD, pages 240-251, 2002.
    • (2002) SIGMOD , pp. 240-251
    • Dasu, T.1    Johnson, T.2    Muthukrishnan, S.3    Shkapenyuk, V.4
  • 24
    • 37049036831
    • Priority sampling for estimating arbitrary subset sums
    • N. Duffield, M. Thorup, and C. Lund. Priority sampling for estimating arbitrary subset sums. J. Assoc. Comput. Mach., 54(6), 2007.
    • (2007) J. Assoc. Comput. Mach , vol.54 , Issue.6
    • Duffield, N.1    Thorup, M.2    Lund, C.3
  • 26
    • 0033309273
    • An approximate L1-difference algorithm for massive data streams
    • J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In FOCS, pages 501-511, 1999.
    • (1999) FOCS , pp. 501-511
    • Feigenbaum, J.1    Kannan, S.2    Strauss, M.3    Viswanathan, M.4
  • 28
    • 84944323337
    • Distinct sampling for highly-accurate answers to distinct values queries and event reports
    • P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In VLDB, pages 541-550, 2001.
    • (2001) VLDB , pp. 541-550
    • Gibbons, P.B.1
  • 29
    • 70349659026
    • Hashed samples: Selectivity estimators for set similarity selection queries
    • M. Hadjieleftheriou, X. Yu, N. Koudas, and D. Srivastava. Hashed samples: Selectivity estimators for set similarity selection queries. In VLDB, 2008.
    • (2008) VLDB
    • Hadjieleftheriou, M.1    Yu, X.2    Koudas, N.3    Srivastava, D.4
  • 31
    • 33750296887
    • Finding near-duplicate web pages: A large-scale evaluation of algorithms
    • M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
    • (2006) SIGIR , pp. 284-291
    • Henzinger, M.R.1
  • 32
    • 84947396376
    • A generalization of sampling without replacement from a finite universe
    • D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952.
    • (1952) Journal of the American Statistical Association , vol.47 , Issue.260 , pp. 663-685
    • Horvitz, D.G.1    Thompson, D.J.2
  • 33
    • 33244490258
    • Randomized incremental constructions of three-dimensional convex hulls and planar Voronoi diagrams, and approximate range counting
    • H. Kaplan and M. Sharir. Randomized incremental constructions of three-dimensional convex hulls and planar Voronoi diagrams, and approximate range counting. In SODA, pages 484-493, 2006.
    • (2006) SODA , pp. 484-493
    • Kaplan, H.1    Sharir, M.2
  • 34
    • 12244261882
    • Improved robustness of signature-based near-replica detection via lexicon randomization
    • A. Kolcz, A. Chowdhury, and J. Alspector. Improved robustness of signature-based near-replica detection via lexicon randomization. In SIGKDD, pages 605-610, 2004.
    • (2004) SIGKDD , pp. 605-610
    • Kolcz, A.1    Chowdhury, A.2    Alspector, J.3
  • 35
    • 34748825544
    • A sketch algorithm for estimating two-way and multi-way associations
    • P. Li and K. W. Church. A sketch algorithm for estimating two-way and multi-way associations. Computational Linguistics, 33:305-354, 2007.
    • (2007) Computational Linguistics , vol.33 , pp. 305-354
    • Li, P.1    Church, K.W.2
  • 36
    • 85043988965
    • Finding similar files in a large file system
    • U. Manber. Finding similar files in a large file system. In Usenix, pages 1-10, 1994.
    • (1994) Usenix , pp. 1-10
    • Manber, U.1
  • 37
    • 35348911985
    • Detecting near-duplicates for web crawling
    • G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for web crawling. In WWW, 2007.
  • 38
    • 33748699331
    • Computing separable functions via gossip
    • D. Mosk-Aoyama and D. Shah. Computing separable functions via gossip. In PODC, 2006.
    • (2006) PODC
    • Mosk-Aoyama, D.1    Shah, D.2
  • 41
    • 0001022352
    • Sequential Poisson sampling
    • E. Ohlsson. Sequential Poisson sampling. J. Official Statistics, 14(2):149-162, 1998.
    • (1998) J. Official Statistics , vol.14 , Issue.2 , pp. 149-162
    • Ohlsson, E.1
  • 42
    • 0001630482
    • Asymptotic theory for successive sampling with varying probabilities without replacement, I
    • B. Rosén. Asymptotic theory for successive sampling with varying probabilities without replacement, I. The Annals of Mathematical Statistics, 43(2):373-397, 1972.
    • (1972) The Annals of Mathematical Statistics , vol.43 , Issue.2 , pp. 373-397
    • Rosén, B.1
  • 43
    • 0031571412
    • Asymptotic theory for order sampling
    • B. Rosén. Asymptotic theory for order sampling. J. Statistical Planning and Inference, 62(2):135-158, 1997.
    • (1997) J. Statistical Planning and Inference , vol.62 , Issue.2 , pp. 135-158
    • Rosén, B.1
  • 44
    • 0031571470
    • On sampling with probability proportional to size
    • B. Rosén. On sampling with probability proportional to size. J. Statistical Planning and Inference, 62(2):159-191, 1997.
    • (1997) J. Statistical Planning and Inference , vol.62 , Issue.2 , pp. 159-191
    • Rosén, B.1
  • 45
    • 1142267351
    • Winnowing: Local algorithms for document fingerprinting
    • S. Schleimer, D. Wilkerson, and A. Aiken. Winnowing: local algorithms for document fingerprinting. In SIGMOD, 2003.
    • (2003) SIGMOD
    • Schleimer, S.1    Wilkerson, D.2    Aiken, A.3
  • 46
    • 0005258822
    • A protocol-independent technique for eliminating redundant network traffic
    • N. T. Spring and D. Wetherall. A protocol-independent technique for eliminating redundant network traffic. In SIGCOMM, 2000.
    • (2000) SIGCOMM
    • Spring, N.T.1    Wetherall, D.2
  • 47
    • 0041472305
    • Efficient content location using interest-based locality in peer-to-peer systems
    • K. Sripanidkulchai, B. Maggs, and H. Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In INFOCOM, 2003.
    • (2003) INFOCOM
    • Sripanidkulchai, K.1    Maggs, B.2    Zhang, H.3
  • 48
    • 33748098958
    • The DLT priority sampling is essentially optimal
    • M. Szegedy. The DLT priority sampling is essentially optimal. In STOC, 2006.
    • (2006) STOC
    • Szegedy, M.1
  • 49
    • 70449656148
    • On the variance of subset sum estimation
    • M. Szegedy and M. Thorup. On the variance of subset sum estimation. In ESA, 2007.
  • 50
    • 3042749011
    • Peer-to-peer information retrieval using self-organizing semantic overlay networks
    • C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In SIGCOMM, 2003.
    • (2003) SIGCOMM
    • Tang, C.1    Xu, Z.2    Dwarkadas, S.3


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.