SCOPUS Information Retrieval Platform

Proceedings of the ACM SIGMOD International Conference on Management of Data

Volume , Issue , 2010, Pages 495-506

Efficient parallel set-similarity joins using MapReduce

(3) Vernica, Rares ᵃ; Carey, Michael J. ᵃ; Li, Chen ᵃ

ᵃ Irvine (United States)

Author keywords

mapreduce; set similarity join

Indexed keywords

MAIN MEMORY; REAL DATA SETS; SCALE-UP; SELF-JOIN; SIMILARITY JOIN; ALGORITHMS; DATA PROCESSING

EID: 77954744650 PISSN: 07308078 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1807167.1807222 Document Type: Conference Paper

Times cited : (417)

References (30)

1
- 77954734537
- Apache Hadoop. http://hadoop.apache.org.

2
- 77954693149
- Apache Hive. http://hadoop.apache.org/hive.

3
- 85104914015
- Efficient exact set-similarity joins
- A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918-929, 2006.
- (2006) VLDB , pp. 918-929
- Arasu, A.¹ Ganti, V.² Kaushik, R.³

4
- 35348849154
- Scaling up all pairs similarity search
- R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, pages 131-140, 2007.
- (2007) WWW , pp. 131-140
- Bayardo, R.J.¹ Ma, Y.² Srikant, R.³

5
- 0010362121
- Syntactic clustering of the web
- A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157-1166, 1997.
- (1997) Computer Networks , vol.29 , Issue.8-13 , pp. 1157-1166
- Broder, A.Z.¹ Glassman, S.C.² Manasse, M.S.³ Zweig, G.⁴

6
- 33749597967
- A primitive operator for similarity joins in data cleaning
- S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, page 5, 2006.
- (2006) ICDE , pp. 5
- Chaudhuri, S.¹ Ganti, V.² Kaushik, R.³

7
- 37549003336
- MapReduce: Simplified data processing on large clusters
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.
- (2008) Commun. ACM , vol.51 , Issue.1 , pp. 107-113
- Dean, J.¹ Ghemawat, S.²

8
- 0026870271
- Parallel database systems: The future of high performance database systems
- D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85-98, 1992.
- (1992) Commun. ACM , vol.35 , Issue.6 , pp. 85-98
- DeWitt, D.J.¹ Gray, J.²

9
- 0002773778
- An evaluation of non-equijoin algorithms
- D. J. DeWitt, J. F. Naughton, and D. A. Schneider. An evaluation of non-equijoin algorithms. In VLDB, pages 443-452, 1991.
- (1991) VLDB , pp. 443-452
- DeWitt, D.J.¹ Naughton, J.F.² Schneider, D.A.³

10
- 77952278077
- Building a high-level dataflow system on top of MapReduce: The Pig experience
- A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: the Pig experience. PVLDB, 2(2):1414-1425, 2009.
- (2009) PVLDB , vol.2 , Issue.2 , pp. 1414-1425
- Gates, A.¹ Natkovich, O.² Chopra, S.³ Kamath, P.⁴ Narayanam, S.⁵ Olston, C.⁶ Reed, B.⁷ Srinivasan, S.⁸ Srivastava, U.⁹

11
- 77954730261
- Genbank. http://www.ncbi.nlm.nih.gov/Genbank.
- Genbank

12
- 15044355327
- Similarity search in high dimensions via hashing
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518-529, 1999.
- (1999) VLDB , pp. 518-529
- Gionis, A.¹ Indyk, P.² Motwani, R.³

13
- 84944318804
- Approximate string joins in a database (almost) for free
- L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 491-500, 2001.
- (2001) VLDB , pp. 491-500
- Gravano, L.¹ Ipeirotis, P.G.² Jagadish, H.V.³ Koudas, N.⁴ Muthukrishnan, S.⁵ Srivastava, D.⁶

14
- 33750296887
- Finding near-duplicate web pages: A large-scale evaluation of algorithms
- M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
- (2006) SIGIR , pp. 284-291
- Henzinger, M.R.¹

15
- 0037319544
- Methods for identifying versioned and plagiarized documents
- T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
- (2003) JASIST , vol.54 , Issue.3 , pp. 203-215
- Hoad, T.C.¹ Zobel, J.²

16
- 77954753681
- Jaql. http://www.jaql.org.

17
- 77954709256
- Jaql - Fuzzy join tutorial. http://code.google.com/p/jaql/wiki/fuzzyJoinTutorial.
- Jaql - Fuzzy Join Tutorial

18
- 0001793230
- Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (sdc)
- M. Kitsuregawa and Y. Ogawa. Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (sdc). In VLDB, pages 210-221, 1990.
- (1990) VLDB , pp. 210-221
- Kitsuregawa, M.¹ Ogawa, Y.²

19
- 65749311706
- Application of hash to data base machine and its architecture
- M. Kitsuregawa, H. Tanaka, and T. Moto-Oka. Application of hash to data base machine and its architecture. New Generation Comput., 1(1):63-74, 1983.
- (1983) New Generation Comput. , vol.1 , Issue.1 , pp. 63-74
- Kitsuregawa, M.¹ Tanaka, H.² Moto-Oka, T.³

20
- 35348835502
- Detectives: Detecting coalition hit inflation attacks in advertising networks streams
- A. Metwally, D. Agrawal, and A. E. Abbadi. Detectives: detecting coalition hit inflation attacks in advertising networks streams. In WWW, pages 241-250, 2007.
- (2007) WWW , pp. 241-250
- Metwally, A.¹ Agrawal, D.² Abbadi, A.E.³

21
- 70350512695
- A comparison of approaches to large-scale data analysis
- A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD Conference, pages 165-178, 2009.
- (2009) SIGMOD Conference , pp. 165-178
- Pavlo, A.¹ Paulson, E.² Rasin, A.³ Abadi, D.J.⁴ DeWitt, D.J.⁵ Madden, S.⁶ Stonebraker, M.⁷

22
- 34250638291
- A web-based kernel function for measuring the similarity of short text snippets
- M. Sahami and T. D. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In WWW, pages 377-386, 2006.
- (2006) WWW , pp. 377-386
- Sahami, M.¹ Heilman, T.D.²

23
- 3142777876
- Efficient set joins on similarity predicates
- S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD Conference, pages 743-754, 2004.
- (2004) SIGMOD Conference , pp. 743-754
- Sarawagi, S.¹ Kirpal, A.²

24
- 84976736061
- A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment
- D. A. Schneider and D. J. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In SIGMOD Conference, pages 110-121, 1989.
- (1989) SIGMOD Conference , pp. 110-121
- Schneider, D.A.¹ DeWitt, D.J.²

25
- 32344452531
- Evaluating similarity measures: A large-scale study in the orkut social network
- E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the orkut social network. In KDD, pages 678-684, 2005.
- (2005) KDD , pp. 678-684
- Spertus, E.¹ Sahami, M.² Buyukkokten, O.³

26
- 77954743231
- Technical report, Department of Computer Science, March
- R. Vernica, M. Carey, and C. Li. Efficient parallel set-similarity joins using MapReduce. Technical report, Department of Computer Science, UC Irvine, March 2010. http://asterix.ics.uci.edu.
- (2010) Efficient Parallel Set-similarity Joins Using MapReduce
- Vernica, R.¹ Carey, M.² Li, C.³

27
- 77954731952
- version 1
- Web 1t 5-gram version 1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13.
- Web 1t 5-gram

28
- 70849105253
- Ed-join: An efficient algorithm for similarity joins with edit distance constraints
- C. Xiao, W. Wang, and X. Lin. Ed-join: An efficient algorithm for similarity joins with edit distance constraints. In VLDB, 2008.
- (2008) VLDB
- Xiao, C.¹ Wang, W.² Lin, X.³

29
- 57349141410
- Efficient similarity joins for near duplicate detection
- C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, pages 131-140, 2008.
- (2008) WWW , pp. 131-140
- Xiao, C.¹ Wang, W.² Lin, X.³ Yu, J.X.⁴

30
- 35448944021
- Map-Reduce-Merge: Simplified relational data processing on large clusters
- H. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker Jr. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD Conference, pages 1029-1040, 2007.
- (2007) SIGMOD Conference , pp. 1029-1040
- Yang, H.¹ Dasdan, A.² Hsiao, R.-L.³ Parker Jr., D.S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.