-
1
-
-
77954734537
-
-
Apache Hadoop. http://hadoop.apache.org.
-
-
-
-
2
-
-
77954693149
-
-
Apache Hive. http://hadoop.apache.org/hive.
-
-
-
-
3
-
-
85104914015
-
Efficient exact set-similarity joins
-
A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918-929, 2006.
-
(2006)
VLDB
, pp. 918-929
-
-
Arasu, A.1
Ganti, V.2
Kaushik, R.3
-
4
-
-
35348849154
-
Scaling up all pairs similarity search
-
R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, pages 131-140, 2007.
-
(2007)
WWW
, pp. 131-140
-
-
Bayardo, R.J.1
Ma, Y.2
Srikant, R.3
-
5
-
-
0010362121
-
Syntactic clustering of the web
-
A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
-
(1997)
Computer Networks
, vol.29
, Issue.8-13
, pp. 1157-1166
-
-
Broder, A.Z.1
Glassman, S.C.2
Manasse, M.S.3
Zweig, G.4
-
6
-
-
33749597967
-
A primitive operator for similarity joins in data cleaning
-
S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, page 5, 2006.
-
(2006)
ICDE
, pp. 5
-
-
Chaudhuri, S.1
Ganti, V.2
Kaushik, R.3
-
7
-
-
37549003336
-
MapReduce: Simplified data processing on large clusters
-
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.
-
(2008)
Commun. ACM
, vol.51
, Issue.1
, pp. 107-113
-
-
Dean, J.1
Ghemawat, S.2
-
8
-
-
0026870271
-
Parallel database systems: The future of high performance database systems
-
D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85-98, 1992.
-
(1992)
Commun. ACM
, vol.35
, Issue.6
, pp. 85-98
-
-
DeWitt, D.J.1
Gray, J.2
-
9
-
-
0002773778
-
An evaluation of non-equijoin algorithms
-
D. J. DeWitt, J. F. Naughton, and D. A. Schneider. An evaluation of non-equijoin algorithms. In VLDB, pages 443-452, 1991.
-
(1991)
VLDB
, pp. 443-452
-
-
DeWitt, D.J.1
Naughton, J.F.2
Schneider, D.A.3
-
10
-
-
77952278077
-
Building a high-level dataflow system on top of MapReduce: The Pig experience
-
A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: the Pig experience. PVLDB, 2(2):1414-1425, 2009.
-
(2009)
PVLDB
, vol.2
, Issue.2
, pp. 1414-1425
-
-
Gates, A.1
Natkovich, O.2
Chopra, S.3
Kamath, P.4
Narayanam, S.5
Olston, C.6
Reed, B.7
Srinivasan, S.8
Srivastava, U.9
-
11
-
-
77954730261
-
-
Genbank. http://www.ncbi.nlm.nih.gov/Genbank.
-
Genbank
-
-
-
12
-
-
15044355327
-
Similarity search in high dimensions via hashing
-
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518-529, 1999.
-
(1999)
VLDB
, pp. 518-529
-
-
Gionis, A.1
Indyk, P.2
Motwani, R.3
-
13
-
-
84944318804
-
Approximate string joins in a database (almost) for free
-
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 491-500, 2001.
-
(2001)
VLDB
, pp. 491-500
-
-
Gravano, L.1
Ipeirotis, P.G.2
Jagadish, H.V.3
Koudas, N.4
Muthukrishnan, S.5
Srivastava, D.6
-
14
-
-
33750296887
-
Finding near-duplicate web pages: A large-scale evaluation of algorithms
-
M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
-
(2006)
SIGIR
, pp. 284-291
-
-
Henzinger, M.R.1
-
15
-
-
0037319544
-
Methods for identifying versioned and plagiarized documents
-
T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
-
(2003)
JASIST
, vol.54
, Issue.3
, pp. 203-215
-
-
Hoad, T.C.1
Zobel, J.2
-
16
-
-
77954753681
-
-
Jaql. http://www.jaql.org.
-
-
-
-
18
-
-
0001793230
-
Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (SDC)
-
M. Kitsuregawa and Y. Ogawa. Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (SDC). In VLDB, pages 210-221, 1990.
-
(1990)
VLDB
, pp. 210-221
-
-
Kitsuregawa, M.1
Ogawa, Y.2
-
19
-
-
65749311706
-
Application of hash to data base machine and its architecture
-
M. Kitsuregawa, H. Tanaka, and T. Moto-Oka. Application of hash to data base machine and its architecture. New Generation Comput., 1(1):63-74, 1983.
-
(1983)
New Generation Comput.
, vol.1
, Issue.1
, pp. 63-74
-
-
Kitsuregawa, M.1
Tanaka, H.2
Moto-Oka, T.3
-
20
-
-
35348835502
-
Detectives: Detecting coalition hit inflation attacks in advertising networks streams
-
A. Metwally, D. Agrawal, and A. E. Abbadi. Detectives: detecting coalition hit inflation attacks in advertising networks streams. In WWW, pages 241-250, 2007.
-
(2007)
WWW
, pp. 241-250
-
-
Metwally, A.1
Agrawal, D.2
Abbadi, A.E.3
-
21
-
-
70350512695
-
A comparison of approaches to large-scale data analysis
-
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD Conference, pages 165-178, 2009.
-
(2009)
SIGMOD Conference
, pp. 165-178
-
-
Pavlo, A.1
Paulson, E.2
Rasin, A.3
Abadi, D.J.4
DeWitt, D.J.5
Madden, S.6
Stonebraker, M.7
-
22
-
-
34250638291
-
A web-based kernel function for measuring the similarity of short text snippets
-
M. Sahami and T. D. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In WWW, pages 377-386, 2006.
-
(2006)
WWW
, pp. 377-386
-
-
Sahami, M.1
Heilman, T.D.2
-
23
-
-
3142777876
-
Efficient set joins on similarity predicates
-
S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD Conference, pages 743-754, 2004.
-
(2004)
SIGMOD Conference
, pp. 743-754
-
-
Sarawagi, S.1
Kirpal, A.2
-
24
-
-
84976736061
-
A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment
-
D. A. Schneider and D. J. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In SIGMOD Conference, pages 110-121, 1989.
-
(1989)
SIGMOD Conference
, pp. 110-121
-
-
Schneider, D.A.1
DeWitt, D.J.2
-
25
-
-
32344452531
-
Evaluating similarity measures: A large-scale study in the Orkut social network
-
E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In KDD, pages 678-684, 2005.
-
(2005)
KDD
, pp. 678-684
-
-
Spertus, E.1
Sahami, M.2
Buyukkokten, O.3
-
26
-
-
77954743231
-
-
Technical report, Department of Computer Science, March
-
R. Vernica, M. Carey, and C. Li. Efficient parallel set-similarity joins using MapReduce. Technical report, Department of Computer Science, UC Irvine, March 2010. http://asterix.ics.uci.edu.
-
(2010)
Efficient Parallel Set-similarity Joins Using MapReduce
-
-
Vernica, R.1
Carey, M.2
Li, C.3
-
27
-
-
77954731952
-
-
version 1
-
Web 1t 5-gram version 1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13.
-
Web 1t 5-gram
-
-
-
28
-
-
70849105253
-
Ed-join: An efficient algorithm for similarity joins with edit distance constraints
-
C. Xiao, W. Wang, and X. Lin. Ed-join: An efficient algorithm for similarity joins with edit distance constraints. In VLDB, 2008.
-
(2008)
VLDB
-
-
Xiao, C.1
Wang, W.2
Lin, X.3
-
29
-
-
57349141410
-
Efficient similarity joins for near duplicate detection
-
C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, pages 131-140, 2008.
-
(2008)
WWW
, pp. 131-140
-
-
Xiao, C.1
Wang, W.2
Lin, X.3
Yu, J.X.4
-
30
-
-
35448944021
-
Map-Reduce-Merge: Simplified relational data processing on large clusters
-
H. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker Jr. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD Conference, pages 1029-1040, 2007.
-
(2007)
SIGMOD Conference
, pp. 1029-1040
-
-
Yang, H.1
Dasdan, A.2
Hsiao, R.-L.3
Parker Jr., D.S.4