Volume 33, Issue 3, 2007, Pages 305-354

A sketch algorithm for estimating two-way and multi-way associations

Author keywords

[No Author keywords available]

Indexed keywords

CONTINGENCY TABLE; DOCUMENT FREQUENCY; SMALL SAMPLES; TWO WAYS

EID: 34748825544     PISSN: 08912017     EISSN: 15309312     Source Type: Journal    
DOI: 10.1162/coli.2007.33.3.305     Document Type: Article
Times cited: (40)

References (63)
  • 1
    • 0038166193
    • Database-friendly random projections: Johnson-Lindenstrauss with binary coins
    • Achlioptas, Dimitris. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671-687.
    • (2003) Journal of Computer and System Sciences , vol.66 , Issue.4 , pp. 671-687
    • Achlioptas, D.1
  • 3
    • 0347761797
    • A new method for similarity indexing of market basket data
    • Philadelphia, PA
    • Aggarwal, Charu C. and Joel L. Wolf. 1999. A new method for similarity indexing of market basket data. In SIGMOD, pages 407-418, Philadelphia, PA.
    • (1999) SIGMOD , pp. 407-418
    • Aggarwal, C.C.1    Wolf, J.L.2
  • 4
    • 0027621699
    • Mining association rules between sets of items in large databases
    • Washington, DC
    • Agrawal, Rakesh, Tomasz Imielinski, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In SIGMOD, pages 207-216, Washington, DC.
    • (1993) SIGMOD , pp. 207-216
    • Agrawal, R.1    Imielinski, T.2    Swami, A.3
  • 5
    • 0001371923
    • Fast discovery of association rules
    • U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, AAAI/MIT Press, Cambridge, MA
    • Agrawal, Rakesh, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and A. Inkeri Verkamo. 1996. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, pages 307-328, Cambridge, MA.
    • (1996) Advances in Knowledge Discovery and Data Mining , pp. 307-328
    • Agrawal, R.1    Mannila, H.2    Srikant, R.3    Toivonen, H.4    Verkamo, A.I.5
  • 6
    • 0001882616
    • Fast algorithms for mining association rules in large databases
    • Santiago de Chile, Chile
    • Agrawal, Rakesh and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules in large databases. In VLDB, pages 487-499, Santiago de Chile, Chile.
    • (1994) VLDB , pp. 487-499
    • Agrawal, R.1    Srikant, R.2
  • 7
    • 0003535936
    • John Wiley & Sons, Inc, Hoboken, NJ, second edition
    • Agresti, Alan. 2002. Categorical Data Analysis. John Wiley & Sons, Inc., Hoboken, NJ, second edition.
    • (2002) Categorical Data Analysis
    • Agresti, A.1
  • 8
    • 0029719644
    • The space complexity of approximating the frequency moments
    • Philadelphia, PA
    • Alon, Noga, Yossi Matias, and Mario Szegedy. 1996. The space complexity of approximating the frequency moments. In STOC, pages 20-29, Philadelphia, PA.
    • (1996) STOC , pp. 20-29
    • Alon, N.1    Matias, Y.2    Szegedy, M.3
  • 11
    • 84976810280
    • Copy detection mechanisms for digital documents
    • San Jose, CA
    • Brin, Sergey, James Davis, and Hector Garcia-Molina. 1995. Copy detection mechanisms for digital documents. In SIGMOD, pages 398-409, San Jose, CA.
    • (1995) SIGMOD , pp. 398-409
    • Brin, S.1    Davis, J.2    Garcia-Molina, H.3
  • 12
    • 0038589165
    • Brin, Sergey and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. In WWW, pages 107-117, Brisbane, Australia.
  • 13
    • 0031161999
    • Beyond market baskets: Generalizing association rules to correlations
    • Tucson, AZ
    • Brin, Sergey, Rajeev Motwani, and Craig Silverstein. 1997. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD, pages 265-276, Tucson, AZ.
    • (1997) SIGMOD , pp. 265-276
    • Brin, S.1    Motwani, R.2    Silverstein, C.3
  • 14
    • 0031162961
    • Dynamic itemset counting and implication rules for market basket data
    • Tucson, AZ
    • Brin, Sergey, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. 1997. Dynamic itemset counting and implication rules for market basket data. In SIGMOD, pages 255-264, Tucson, AZ.
    • (1997) SIGMOD , pp. 255-264
    • Brin, S.1    Motwani, R.2    Ullman, J.D.3    Tsur, S.4
  • 15
    • 0031346696
    • On the resemblance and containment of documents
    • Positano, Italy
    • Broder, Andrei Z. 1997. On the resemblance and containment of documents. In The Compression and Complexity of Sequences, pages 21-29, Positano, Italy.
    • (1997) The Compression and Complexity of Sequences , pp. 21-29
    • Broder, A.Z.1
  • 16
    • 34748822435
    • Broder, Andrei Z. 1998. Filtering near-duplicate documents. In FUN, Isola d'Elba, Italy.
  • 17
    • 0031620041
    • Min-wise independent permutations (extended abstract)
    • Dallas, TX
    • Broder, Andrei Z., Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher. 1998. Min-wise independent permutations (extended abstract). In STOC, pages 327-336, Dallas, TX.
    • (1998) STOC , pp. 327-336
    • Broder, A.Z.1    Charikar, M.2    Frieze, A.M.3    Mitzenmacher, M.4
  • 19
    • 0010362121
    • Broder, Andrei Z., Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. Syntactic clustering of the web. In WWW, pages 1157-1166, Santa Clara, CA.
  • 20
    • 0036040277
    • Similarity estimation techniques from rounding algorithms
    • Montreal, Canada
    • Charikar, Moses S. 2002. Similarity estimation techniques from rounding algorithms. In STOC, pages 380-388, Montreal, Canada.
    • (2002) STOC , pp. 380-388
    • Charikar, M.S.1
  • 21
    • 0032089874
    • Random sampling for histogram construction: How much is enough?
    • Seattle, WA
    • Chaudhuri, Surajit, Rajeev Motwani, and Vivek R. Narasayya. 1998. Random sampling for histogram construction: How much is enough? In SIGMOD, pages 436-447, Seattle, WA.
    • (1998) SIGMOD , pp. 436-447
    • Chaudhuri, S.1    Motwani, R.2    Narasayya, V.R.3
  • 22
    • 0347761807
    • On random sampling over joins
    • Philadelphia, PA
    • Chaudhuri, Surajit, Rajeev Motwani, and Vivek R. Narasayya. 1999. On random sampling over joins. In SIGMOD, pages 263-274, Philadelphia, PA.
    • (1999) SIGMOD , pp. 263-274
    • Chaudhuri, S.1    Motwani, R.2    Narasayya, V.R.3
  • 23
    • 0242625264
    • Chen, Bin, Peter Haas, and Peter Scheuermann. 2002. New two-phase sampling based algorithm for discovering association rules. In KDD, pages 462-468, Edmonton, Canada.
  • 24
    • 84936824188
    • Word association norms, mutual information and lexicography
    • Church, Kenneth and Patrick Hanks. 1991. Word association norms, mutual information and lexicography. Computational Linguistics, 16(1):22-29.
    • (1991) Computational Linguistics , vol.16 , Issue.1 , pp. 22-29
    • Church, K.1    Hanks, P.2
  • 26
    • 0003707560
    • John Wiley & Sons, Inc, New York, NY, second edition
    • David, Herbert A. 1981. Order Statistics. John Wiley & Sons, Inc., New York, NY, second edition.
    • (1981) Order Statistics
    • David, H.A.1
  • 27
    • 0000695960
    • On a least squares adjustment of a sampled frequency table when the expected marginal totals are known
    • Deming, W. Edwards and Frederick F. Stephan. 1940. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11(4):427-444.
    • (1940) The Annals of Mathematical Statistics , vol.11 , Issue.4 , pp. 427-444
    • Deming, W.E.1    Stephan, F.F.2
  • 28
    • 26944440870
    • Approximating a gram matrix for improved kernel-based learning
    • Bertinoro, Italy
    • Drineas, Petros and Michael W. Mahoney. 2005. Approximating a gram matrix for improved kernel-based learning. In COLT, pages 323-337, Bertinoro, Italy.
    • (2005) COLT , pp. 323-337
    • Drineas, P.1    Mahoney, M.W.2
  • 29
    • 85055298348
    • Accurate methods for the statistics of surprise and coincidence
    • Dunning, Ted. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61-74.
    • (1993) Computational Linguistics , vol.19 , Issue.1 , pp. 61-74
    • Dunning, T.1
  • 33
    • 0032091595
    • Cure: An efficient clustering algorithm for large databases
    • Seattle, WA
    • Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. 1998. Cure: An efficient clustering algorithm for large databases. In SIGMOD, pages 73-84, Seattle, WA.
    • (1998) SIGMOD , pp. 73-84
    • Guha, S.1    Rastogi, R.2    Shim, K.3
  • 35
    • 0003067623
    • Scalable techniques for clustering the Web
    • Dallas, TX
    • Haveliwala, Taher H., Aristides Gionis, and Piotr Indyk. 2000. Scalable techniques for clustering the Web. In WebDB, pages 129-134, Dallas, TX.
    • (2000) WebDB , pp. 129-134
    • Haveliwala, T.H.1    Gionis, A.2    Indyk, P.3
  • 36
    • 77953112255
    • Haveliwala, Taher H., Aristides Gionis, Dan Klein, and Piotr Indyk. 2002. Evaluating strategies for similarity search on the web. In WWW, pages 432-442, Honolulu, HI.
  • 37
    • 0346457324
    • Online association rule mining
    • Philadelphia, PA
    • Hidber, Christian. 1999. Online association rule mining. In SIGMOD, pages 145-156, Philadelphia, PA.
    • (1999) SIGMOD , pp. 145-156
    • Hidber, C.1
  • 38
    • 0003815920
    • Hornby, Albert Sydney, editor, Oxford University Press, Oxford, UK, fourth edition
    • Hornby, Albert Sydney, editor. 1989. Oxford Advanced Learner's Dictionary of Current English. Oxford University Press, Oxford, UK, fourth edition.
    • (1989) Oxford Advanced Learner's Dictionary of Current English
  • 39
    • 0344612511
    • A small approximately min-wise independent family of hash functions
    • Indyk, Piotr. 2001. A small approximately min-wise independent family of hash functions. Journal of Algorithms, 38(1):84-90.
    • (2001) Journal of Algorithms , vol.38 , Issue.1 , pp. 84-90
    • Indyk, P.1
  • 40
    • 0031644241
    • Approximate nearest neighbors: Towards removing the curse of dimensionality
    • Dallas, TX
    • Indyk, Piotr and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604-613, Dallas, TX.
    • (1998) STOC , pp. 604-613
    • Indyk, P.1    Motwani, R.2
  • 41
    • 0038784483
    • On the sample size of k-restricted min-wise independent permutations and other k-wise distributions
    • San Diego, CA
    • Itoh, Toshiya, Yoshinori Takei, and Jun Tarui. 2003. On the sample size of k-restricted min-wise independent permutations and other k-wise distributions. In STOC, pages 710-718, San Diego, CA.
    • (2003) STOC , pp. 710-718
    • Itoh, T.1    Takei, Y.2    Tarui, J.3
  • 43
    • 34748905473
    • Li, Ping. 2006. Very sparse stable random projections, estimators and tail bounds for stable random projections. Technical report, available from http://arxiv.org/PS_cache/cs/pdf/0611/0611114v2.pdf.
  • 44
    • 34748873316
    • Using sketches to estimate two-way and multi-way associations
    • Technical Report TR-2005-115, Microsoft Research, Redmond, WA, September
    • Li, Ping and Kenneth W. Church. 2005. Using sketches to estimate two-way and multi-way associations. Technical Report TR-2005-115, Microsoft Research, Redmond, WA, September.
    • (2005)
    • Li, P.1    Church, K.W.2
  • 45
    • 34748926449
    • Conditional random sampling: A sketch-based sampling technique for sparse data
    • Technical Report 2006-08, Department of Statistics, Stanford University
    • Li, Ping, Kenneth W. Church, and Trevor J. Hastie. 2006. Conditional random sampling: A sketch-based sampling technique for sparse data. Technical Report 2006-08, Department of Statistics, Stanford University.
    • (2006)
    • Li, P.1    Church, K.W.2    Hastie, T.J.3
  • 46
    • 84864064770
    • Conditional random sampling: A sketch-based sampling technique for sparse data
    • Vancouver, BC, Canada
    • Li, Ping, Kenneth W. Church, and Trevor J. Hastie. 2007. Conditional random sampling: A sketch-based sampling technique for sparse data. In NIPS, pages 873-880. Vancouver, BC, Canada.
    • (2007) NIPS , pp. 873-880
    • Li, P.1    Church, K.W.2    Hastie, T.J.3
  • 47
    • 33746094275
    • Improving random projections using marginal information
    • Pittsburgh, PA
    • Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006a. Improving random projections using marginal information. In COLT, pages 635-649, Pittsburgh, PA.
    • (2006) COLT , pp. 635-649
    • Li, P.1    Hastie, T.J.2    Church, K.W.3
  • 48
    • 33749573641
    • Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006b. Very sparse random projections. In KDD, pages 287-296, Philadelphia, PA.
  • 49
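    • 38049003198
    • Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections
    • San Diego, CA
    • Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2007. Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections. In COLT, pages 514-529, San Diego, CA.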
    • (2007) COLT , pp. 514-529
    • Li, P.1    Hastie, T.J.2    Church, K.W.3
  • 50
    • 0242698558
    • Random sampling techniques for space efficient online computation of order statistics of large datasets
    • Philadelphia, PA
    • Manku, Gurmeet Singh, Sridhar Rajagopalan, and Bruce G. Lindsay. 1999. Random sampling techniques for space efficient online computation of order statistics of large datasets. In SIGMOD, pages 251-262, Philadelphia, PA.
    • (1999) SIGMOD , pp. 251-262
    • Manku, G.S.1    Rajagopalan, S.2    Lindsay, B.G.3
  • 52
    • 0032094250
    • Wavelet-based histograms for selectivity estimation
    • Seattle, WA
    • Matias, Yossi, Jeffrey Scott Vitter, and Min Wang. 1998. Wavelet-based histograms for selectivity estimation. In SIGMOD, pages 448-459, Seattle, WA.
    • (1998) SIGMOD , pp. 448-459
    • Matias, Y.1    Vitter, J.S.2    Wang, M.3
  • 53
    • 85117198887
    • On log-likelihood-ratios and the significance of rare events
    • Barcelona, Spain
    • Moore, Robert C. 2004. On log-likelihood-ratios and the significance of rare events. In EMNLP, pages 333-340, Barcelona, Spain.
    • (2004) EMNLP , pp. 333-340
    • Moore, R.C.1
  • 54
    • 0003868769
    • Pearsall, Judy, editor, Oxford University Press, Oxford, UK
    • Pearsall, Judy, editor. 1998. The New Oxford Dictionary of English. Oxford University Press, Oxford, UK.
    • (1998) The New Oxford Dictionary of English
  • 55
    • 36949016905
    • Ravichandran, Deepak, Patrick Pantel, and Eduard Hovy. 2005. Randomized algorithms and NLP: Using locality sensitive hash function for high speed noun clustering. In ACL, pages 622-629, Ann Arbor, MI.
  • 56
    • 0001630482
    • Asymptotic theory for successive sampling with varying probabilities without replacement, I
    • Rosen, Bengt. 1972a. Asymptotic theory for successive sampling with varying probabilities without replacement, I. The Annals of Mathematical Statistics, 43(2):373-397.
    • (1972) The Annals of Mathematical Statistics , vol.43 , Issue.2 , pp. 373-397
    • Rosen, B.1
  • 57
    • 0001630482
    • Asymptotic theory for successive sampling with varying probabilities without replacement, II
    • Rosen, Bengt. 1972b. Asymptotic theory for successive sampling with varying probabilities without replacement, II. The Annals of Mathematical Statistics, 43(3):748-776.
    • (1972) The Annals of Mathematical Statistics , vol.43 , Issue.3 , pp. 748-776
    • Rosen, B.1
  • 59
    • 0040188435
    • An iterative method of adjusting sample frequency tables when expected marginal totals are known
    • Stephan, Frederick F. 1942. An iterative method of adjusting sample frequency tables when expected marginal totals are known. The Annals of Mathematical Statistics, 13(2):166-178.
    • (1942) The Annals of Mathematical Statistics , vol.13 , Issue.2 , pp. 166-178
    • Stephan, F.F.1
  • 60
    • 84947579437
    • A scalable approach to balanced, high-dimensional clustering of market-baskets
    • Bangalore, India
    • Strehl, Alexander and Joydeep Ghosh. 2000. A scalable approach to balanced, high-dimensional clustering of market-baskets. In HiPC, pages 525-536, Bangalore, India.
    • (2000) HiPC , pp. 525-536
    • Strehl, A.1    Ghosh, J.2
  • 61
    • 0002663971
    • Sampling large databases for association rules
    • Bombay, India
    • Toivonen, Hannu. 1996. Sampling large databases for association rules. In VLDB, pages 134-145, Bombay, India.
    • (1996) VLDB , pp. 134-145
    • Toivonen, H.1
  • 62
    • 14844315829
    • American Mathematical Society, Providence, RI
    • Vempala, Santosh. 2004. The Random Projection Method. American Mathematical Society, Providence, RI.
    • (2004) The Random Projection Method
    • Vempala, S.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.