-
1
-
-
0038166193
-
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
-
Achlioptas, Dimitris. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671-687.
-
(2003)
Journal of Computer and System Sciences
, vol.66
, Issue.4
, pp. 671-687
-
-
Achlioptas, D.1
-
2
-
-
0347718066
-
Fast algorithms for projected clustering
-
Philadelphia, PA
-
Aggarwal, Charu C., Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S. Yu, and Jong Soo Park. 1999. Fast algorithms for projected clustering. In SIGMOD, pages 61-72, Philadelphia, PA.
-
(1999)
SIGMOD
, pp. 61-72
-
-
Aggarwal, C.C.1
Procopiuc, C.M.2
Wolf, J.L.3
Yu, P.S.4
Park, J.S.5
-
3
-
-
0347761797
-
A new method for similarity indexing of market basket data
-
Philadelphia, PA
-
Aggarwal, Charu C. and Joel L. Wolf. 1999. A new method for similarity indexing of market basket data. In SIGMOD, pages 407-418, Philadelphia, PA.
-
(1999)
SIGMOD
, pp. 407-418
-
-
Aggarwal, C.C.1
Wolf, J.L.2
-
4
-
-
0027621699
-
Mining association rules between sets of items in large databases
-
Washington, DC
-
Agrawal, Rakesh, Tomasz Imielinski, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In SIGMOD, pages 207-216, Washington, DC.
-
(1993)
SIGMOD
, pp. 207-216
-
-
Agrawal, R.1
Imielinski, T.2
Swami, A.3
-
5
-
-
0001371923
-
Fast discovery of association rules
-
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, AAAI/MIT Press, Cambridge, MA
-
Agrawal, Rakesh, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and A. Inkeri Verkamo. 1996. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, pages 307-328, Cambridge, MA.
-
(1996)
Advances in Knowledge Discovery and Data Mining
, pp. 307-328
-
-
Agrawal, R.1
Mannila, H.2
Srikant, R.3
Toivonen, H.4
Verkamo, A.I.5
-
6
-
-
0001882616
-
Fast algorithms for mining association rules in large databases
-
Santiago de Chile, Chile
-
Agrawal, Rakesh and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules in large databases. In VLDB, pages 487-499, Santiago de Chile, Chile.
-
(1994)
VLDB
, pp. 487-499
-
-
Agrawal, R.1
Srikant, R.2
-
7
-
-
0003535936
-
-
John Wiley & Sons, Inc., Hoboken, NJ, second edition
-
Agresti, Alan. 2002. Categorical Data Analysis. John Wiley & Sons, Inc., Hoboken, NJ, second edition.
-
(2002)
Categorical Data Analysis
-
-
Agresti, A.1
-
8
-
-
0029719644
-
The space complexity of approximating the frequency moments
-
Philadelphia, PA
-
Alon, Noga, Yossi Matias, and Mario Szegedy. 1996. The space complexity of approximating the frequency moments. In STOC, pages 20-29, Philadelphia, PA.
-
(1996)
STOC
, pp. 20-29
-
-
Alon, N.1
Matias, Y.2
Szegedy, M.3
-
11
-
-
84976810280
-
Copy detection mechanisms for digital documents
-
San Jose, CA
-
Brin, Sergey, James Davis, and Hector Garcia-Molina. 1995. Copy detection mechanisms for digital documents. In SIGMOD, pages 398-409, San Jose, CA.
-
(1995)
SIGMOD
, pp. 398-409
-
-
Brin, S.1
Davis, J.2
Garcia-Molina, H.3
-
12
-
-
0038589165
-
-
Brin, Sergey and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. In WWW, pages 107-117, Brisbane, Australia.
-
-
-
-
13
-
-
0031161999
-
Beyond market baskets: Generalizing association rules to correlations
-
Tucson, AZ
-
Brin, Sergey, Rajeev Motwani, and Craig Silverstein. 1997. Beyond market baskets: Generalizing association rules to correlations. In SIGMOD, pages 265-276, Tucson, AZ.
-
(1997)
SIGMOD
, pp. 265-276
-
-
Brin, S.1
Motwani, R.2
Silverstein, C.3
-
14
-
-
0031162961
-
Dynamic itemset counting and implication rules for market basket data
-
Tucson, AZ
-
Brin, Sergey, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. 1997. Dynamic itemset counting and implication rules for market basket data. In SIGMOD, pages 265-276, Tucson, AZ.
-
(1997)
SIGMOD
, pp. 265-276
-
-
Brin, S.1
Motwani, R.2
Ullman, J.D.3
Tsur, S.4
-
15
-
-
0031346696
-
On the resemblance and containment of documents
-
Positano, Italy
-
Broder, Andrei Z. 1997. On the resemblance and containment of documents. In The Compression and Complexity of Sequences, pages 21-29, Positano, Italy.
-
(1997)
The Compression and Complexity of Sequences
, pp. 21-29
-
-
Broder, A.Z.1
-
16
-
-
34748822435
-
-
Broder, Andrei Z. 1998. Filtering near-duplicate documents. In FUN, Isola d'Elba, Italy.
-
-
-
-
17
-
-
0031620041
-
Min-wise independent permutations (extended abstract)
-
Dallas, TX
-
Broder, Andrei Z., Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher. 1998. Min-wise independent permutations (extended abstract). In STOC, pages 327-336, Dallas, TX.
-
(1998)
STOC
, pp. 327-336
-
-
Broder, A.Z.1
Charikar, M.2
Frieze, A.M.3
Mitzenmacher, M.4
-
18
-
-
0034207121
-
Min-wise independent permutations
-
Broder, Andrei Z., Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher. 2000. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3):630-659.
-
(2000)
Journal of Computer and System Sciences
, vol.60
, Issue.3
, pp. 630-659
-
-
Broder, A.Z.1
Charikar, M.2
Frieze, A.M.3
Mitzenmacher, M.4
-
19
-
-
0010362121
-
-
Broder, Andrei Z., Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. Syntactic clustering of the web. In WWW, pages 1157-1166, Santa Clara, CA.
-
-
-
-
20
-
-
0036040277
-
Similarity estimation techniques from rounding algorithms
-
Montreal, Canada
-
Charikar, Moses S. 2002. Similarity estimation techniques from rounding algorithms. In STOC, pages 380-388, Montreal, Canada.
-
(2002)
STOC
, pp. 380-388
-
-
Charikar, M.S.1
-
21
-
-
0032089874
-
Random sampling for histogram construction: How much is enough?
-
Seattle, WA
-
Chaudhuri, Surajit, Rajeev Motwani, and Vivek R. Narasayya. 1998. Random sampling for histogram construction: How much is enough? In SIGMOD, pages 436-447, Seattle, WA.
-
(1998)
SIGMOD
, pp. 436-447
-
-
Chaudhuri, S.1
Motwani, R.2
Narasayya, V.R.3
-
22
-
-
0347761807
-
On random sampling over joins
-
Philadelphia, PA
-
Chaudhuri, Surajit, Rajeev Motwani, and Vivek R. Narasayya. 1999. On random sampling over joins. In SIGMOD, pages 263-274, Philadelphia, PA.
-
(1999)
SIGMOD
, pp. 263-274
-
-
Chaudhuri, S.1
Motwani, R.2
Narasayya, V.R.3
-
23
-
-
0242625264
-
-
Chen, Bin, Peter Haas, and Peter Scheuermann. 2002. New two-phase sampling based algorithm for discovering association rules. In KDD, pages 462-468, Edmonton, Canada.
-
-
-
-
24
-
-
84936824188
-
Word association norms, mutual information and lexicography
-
Church, Kenneth and Patrick Hanks. 1991. Word association norms, mutual information and lexicography. Computational Linguistics, 16(1):22-29.
-
(1991)
Computational Linguistics
, vol.16
, Issue.1
, pp. 22-29
-
-
Church, K.1
Hanks, P.2
-
26
-
-
0003707560
-
-
John Wiley & Sons, Inc., New York, NY, second edition
-
David, Herbert A. 1981. Order Statistics. John Wiley & Sons, Inc., New York, NY, second edition.
-
(1981)
Order Statistics
-
-
David, H.A.1
-
27
-
-
0000695960
-
On a least squares adjustment of a sampled frequency table when the expected marginal totals are known
-
Deming, W. Edwards and Frederick F. Stephan. 1940. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11(4):427-444.
-
(1940)
The Annals of Mathematical Statistics
, vol.11
, Issue.4
, pp. 427-444
-
-
Deming, W.E.1
Stephan, F.F.2
-
28
-
-
26944440870
-
Approximating a gram matrix for improved kernel-based learning
-
Bertinoro, Italy
-
Drineas, Petros and Michael W. Mahoney. 2005. Approximating a gram matrix for improved kernel-based learning. In COLT, pages 323-337, Bertinoro, Italy.
-
(2005)
COLT
, pp. 323-337
-
-
Drineas, P.1
Mahoney, M.W.2
-
29
-
-
85055298348
-
Accurate methods for the statistics of surprise and coincidence
-
Dunning, Ted. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61-74.
-
(1993)
Computational Linguistics
, vol.19
, Issue.1
, pp. 61-74
-
-
Dunning, T.1
-
30
-
-
17644418833
-
Web-scale information extraction in knowitall (preliminary results)
-
New York, NY
-
Etzioni, Oren, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in knowitall (preliminary results). In WWW, pages 100-110, New York, NY.
-
(2004)
WWW
, pp. 100-110
-
-
Etzioni, O.1
Cafarella, M.2
Downey, D.3
Kok, S.4
Popescu, A.5
Shaked, T.6
Soderland, S.7
Weld, D.S.8
Yates, A.9
-
31
-
-
0013084629
-
-
Prentice Hall, New York, NY
-
Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. 2002. Database Systems: The Complete Book. Prentice Hall, New York, NY.
-
(2002)
Database Systems: The Complete Book
-
-
Garcia-Molina, H.1
Ullman, J.D.2
Widom, J.3
-
32
-
-
0037957085
-
One-pass wavelet decompositions of data streams
-
Gilbert, Anna C., Yannis Kotidis, S. Muthukrishnan, and Martin J. Strauss. 2003. One-pass wavelet decompositions of data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):541-554.
-
(2003)
IEEE Transactions on Knowledge and Data Engineering
, vol.15
, Issue.3
, pp. 541-554
-
-
Gilbert, A.C.1
Kotidis, Y.2
Muthukrishnan, S.3
Strauss, M.J.4
-
33
-
-
0032091595
-
Cure: An efficient clustering algorithm for large databases
-
Seattle, WA
-
Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. 1998. Cure: An efficient clustering algorithm for large databases. In SIGMOD, pages 73-84, Seattle, WA.
-
(1998)
SIGMOD
, pp. 73-84
-
-
Guha, S.1
Rastogi, R.2
Shim, K.3
-
34
-
-
0003684449
-
-
Springer, New York, NY
-
Hastie, T., R. Tibshirani, and J. Friedman. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY.
-
(2001)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
-
-
Hastie, T.1
Tibshirani, R.2
Friedman, J.3
-
35
-
-
0003067623
-
Scalable techniques for clustering the Web
-
Dallas, TX
-
Haveliwala, Taher H., Aristides Gionis, and Piotr Indyk. 2000. Scalable techniques for clustering the Web. In WebDB, pages 129-134, Dallas, TX.
-
(2000)
WebDB
, pp. 129-134
-
-
Haveliwala, T.H.1
Gionis, A.2
Indyk, P.3
-
36
-
-
77953112255
-
-
Haveliwala, Taher H., Aristides Gionis, Dan Klein, and Piotr Indyk. 2002. Evaluating strategies for similarity search on the web. In WWW, pages 432-442, Honolulu, HI.
-
-
-
-
37
-
-
0346457324
-
Online association rule mining
-
Philadelphia, PA
-
Hidber, Christian. 1999. Online association rule mining. In SIGMOD, pages 145-156, Philadelphia, PA.
-
(1999)
SIGMOD
, pp. 145-156
-
-
Hidber, C.1
-
38
-
-
0003815920
-
-
Hornby, Albert Sydney, editor, Oxford University Press, Oxford, UK, fourth edition
-
Hornby, Albert Sydney, editor. 1989. Oxford Advanced Learner's Dictionary of Current English. Oxford University Press, Oxford, UK, fourth edition.
-
(1989)
Oxford Advanced Learner's Dictionary of Current English
-
-
-
39
-
-
0344612511
-
A small approximately min-wise independent family of hash functions
-
Indyk, Piotr. 2001. A small approximately min-wise independent family of hash functions. Journal of Algorithms, 38(1):84-90.
-
(2001)
Journal of Algorithms
, vol.38
, Issue.1
, pp. 84-90
-
-
Indyk, P.1
-
40
-
-
0031644241
-
Approximate nearest neighbors: Towards removing the curse of dimensionality
-
Dallas, TX
-
Indyk, Piotr and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, pages 604-613, Dallas, TX.
-
(1998)
STOC
, pp. 604-613
-
-
Indyk, P.1
Motwani, R.2
-
41
-
-
0038784483
-
On the sample size of k-restricted min-wise independent permutations and other k-wise distributions
-
San Diego, CA
-
Itoh, Toshiya, Yoshinori Takei, and Jun Tarui. 2003. On the sample size of k-restricted min-wise independent permutations and other k-wise distributions. In STOC, pages 710-718, San Diego, CA.
-
(2003)
STOC
, pp. 710-718
-
-
Itoh, T.1
Takei, Y.2
Tarui, J.3
-
43
-
-
34748905473
-
-
Li, Ping. 2006. Very sparse stable random projections, estimators and tail bounds for stable random projections. Technical report, available from http://arxiv.org/PS_cache/cs/pdf/0611/0611114v2.pdf.
-
-
-
-
44
-
-
34748873316
-
Using sketches to estimate two-way and multi-way associations
-
Technical Report TR-2005-115, Microsoft Research, Redmond, WA, September
-
Li, Ping and Kenneth W. Church. 2005. Using sketches to estimate two-way and multi-way associations. Technical Report TR-2005-115, Microsoft Research, Redmond, WA, September.
-
(2005)
-
-
Li, P.1
Church, K.W.2
-
45
-
-
34748926449
-
Conditional random sampling: A sketched-based sampling technique for sparse data
-
Technical Report 2006-08, Department of Statistics, Stanford University
-
Li, Ping, Kenneth W. Church, and Trevor J. Hastie. 2006. Conditional random sampling: A sketched-based sampling technique for sparse data. Technical Report 2006-08, Department of Statistics, Stanford University.
-
(2006)
-
-
Li, P.1
Church, K.W.2
Hastie, T.J.3
-
46
-
-
84864064770
-
Conditional random sampling: A sketch-based sampling technique for sparse data
-
Vancouver, BC, Canada
-
Li, Ping, Kenneth W. Church, and Trevor J. Hastie. 2007. Conditional random sampling: A sketch-based sampling technique for sparse data. In NIPS, pages 873-880, Vancouver, BC, Canada.
-
(2007)
NIPS
, pp. 873-880
-
-
Li, P.1
Church, K.W.2
Hastie, T.J.3
-
47
-
-
33746094275
-
Improving random projections using marginal information
-
Pittsburgh, PA
-
Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006a. Improving random projections using marginal information. In COLT, pages 635-649, Pittsburgh, PA.
-
(2006)
COLT
, pp. 635-649
-
-
Li, P.1
Hastie, T.J.2
Church, K.W.3
-
48
-
-
33749573641
-
-
Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2006b. Very sparse random projections. In KDD, pages 287-296, Philadelphia, PA.
-
-
-
-
49
-
-
38049003198
-
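Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections
-
San Diego, CA
-
Li, Ping, Trevor J. Hastie, and Kenneth W. Church. 2007. Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections. In COLT, pages 514-529, San Diego, CA.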
-
(2007)
COLT
, pp. 514-529
-
-
Li, P.1
Hastie, T.J.2
Church, K.W.3
-
50
-
-
0242698558
-
Random sampling techniques for space efficient online computation of order statistics of large datasets
-
Philadelphia, PA
-
Manku, Gurmeet Singh, Sridhar Rajagopalan, and Bruce G. Lindsay. 1999. Random sampling techniques for space efficient online computation of order statistics of large datasets. In SIGMOD, pages 251-262, Philadelphia, PA.
-
(1999)
SIGMOD
, pp. 251-262
-
-
Manku, G.S.1
Rajagopalan, S.2
Lindsay, B.G.3
-
52
-
-
0032094250
-
Wavelet-based histograms for selectivity estimation
-
Seattle, WA
-
Matias, Yossi, Jeffrey Scott Vitter, and Min Wang. 1998. Wavelet-based histograms for selectivity estimation. In SIGMOD, pages 448-459, Seattle, WA.
-
(1998)
SIGMOD
, pp. 448-459
-
-
Matias, Y.1
Vitter, J.S.2
Wang, M.3
-
53
-
-
85117198887
-
On log-likelihood-ratios and the significance of rare events
-
Barcelona, Spain
-
Moore, Robert C. 2004. On log-likelihood-ratios and the significance of rare events. In EMNLP, pages 333-340, Barcelona, Spain.
-
(2004)
EMNLP
, pp. 333-340
-
-
Moore, R.C.1
-
54
-
-
0003868769
-
-
Pearsall, Judy, editor, Oxford University Press, Oxford, UK
-
Pearsall, Judy, editor. 1998. The New Oxford Dictionary of English. Oxford University Press, Oxford, UK.
-
(1998)
The New Oxford Dictionary of English
-
-
-
55
-
-
36949016905
-
-
Ravichandran, Deepak, Patrick Pantel, and Eduard Hovy. 2005. Randomized algorithms and NLP: Using locality sensitive hash function for high speed noun clustering. In ACL, pages 622-629, Ann Arbor, MI.
-
-
-
-
56
-
-
0001630482
-
Asymptotic theory for successive sampling with varying probabilities without replacement, I
-
Rosen, Bengt. 1972a. Asymptotic theory for successive sampling with varying probabilities without replacement, I. The Annals of Mathematical Statistics, 43(2):373-397.
-
(1972)
The Annals of Mathematical Statistics
, vol.43
, Issue.2
, pp. 373-397
-
-
Rosen, B.1
-
57
-
-
0001630482
-
Asymptotic theory for successive sampling with varying probabilities without replacement, II
-
Rosen, Bengt. 1972b. Asymptotic theory for successive sampling with varying probabilities without replacement, II. The Annals of Mathematical Statistics, 43(3):748-776.
-
(1972)
The Annals of Mathematical Statistics
, vol.43
, Issue.3
, pp. 748-776
-
-
Rosen, B.1
-
58
-
-
0003882234
-
-
Addison-Wesley, New York, NY
-
Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, New York, NY.
-
(1989)
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer
-
-
Salton, G.1
-
59
-
-
0040188435
-
An iterative method of adjusting sample frequency tables when expected marginal totals are known
-
Stephan, Frederick F. 1942. An iterative method of adjusting sample frequency tables when expected marginal totals are known. The Annals of Mathematical Statistics, 13(2):166-178.
-
(1942)
The Annals of Mathematical Statistics
, vol.13
, Issue.2
, pp. 166-178
-
-
Stephan, F.F.1
-
60
-
-
84947579437
-
A scalable approach to balanced, high-dimensional clustering of market-baskets
-
Bangalore, India
-
Strehl, Alexander and Joydeep Ghosh. 2000. A scalable approach to balanced, high-dimensional clustering of market-baskets. In HiPC, pages 525-536, Bangalore, India.
-
(2000)
HiPC
, pp. 525-536
-
-
Strehl, A.1
Ghosh, J.2
-
61
-
-
0002663971
-
Sampling large databases for association rules
-
Bombay, India
-
Toivonen, Hannu. 1996. Sampling large databases for association rules. In VLDB, pages 134-145, Bombay, India.
-
(1996)
VLDB
, pp. 134-145
-
-
Toivonen, H.1
-
62
-
-
14844315829
-
-
American Mathematical Society, Providence, RI
-
Vempala, Santosh. 2004. The Random Projection Method. American Mathematical Society, Providence, RI.
-
(2004)
The Random Projection Method
-
-
Vempala, S.1
-
63
-
-
0003756969
-
-
Morgan Kaufmann Publishing, San Francisco, CA, second edition
-
Witten, Ian H., Alistair Moffat, and Timothy C. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco, CA, second edition.
-
(1999)
Managing Gigabytes: Compressing and Indexing Documents and Images
-
-
Witten, I.H.1
Moffat, A.2
Bell, T.C.3