
2010, pages 495-506

Efficient parallel set-similarity joins using MapReduce

Author keywords

MapReduce; set similarity join

Indexed keywords

MAIN MEMORY; REAL DATA SETS; SCALE-UP; SELF-JOIN; SIMILARITY JOIN

EID: 77954744650
PISSN: 0730-8078
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1807167.1807222
Document Type: Conference Paper
Times cited: 417

References (30)
  • 1
    • Apache Hadoop. http://hadoop.apache.org.
  • 2
    • Apache Hive. http://hadoop.apache.org/hive.
  • 3
    • A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918-929, 2006.
  • 4
    • R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, pages 131-140, 2007.
  • 6
    • S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, page 5, 2006.
  • 7
    • J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.
  • 8
    • D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85-98, 1992.
  • 9
    • D. J. DeWitt, J. F. Naughton, and D. A. Schneider. An evaluation of non-equijoin algorithms. In VLDB, pages 443-452, 1991.
  • 11
    • GenBank. http://www.ncbi.nlm.nih.gov/Genbank.
  • 12
    • A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518-529, 1999.
  • 14
    • M. R. Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR, pages 284-291, 2006.
  • 15
    • T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. JASIST, 54(3):203-215, 2003.
  • 16
    • Jaql. http://www.jaql.org.
  • 18
    • M. Kitsuregawa and Y. Ogawa. Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (SDC). In VLDB, pages 210-221, 1990.
  • 19
    • M. Kitsuregawa, H. Tanaka, and T. Moto-Oka. Application of hash to data base machine and its architecture. New Generation Comput., 1(1):63-74, 1983.
  • 20
    • A. Metwally, D. Agrawal, and A. El Abbadi. Detectives: detecting coalition hit inflation attacks in advertising networks streams. In WWW, pages 241-250, 2007.
  • 22
    • M. Sahami and T. D. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In WWW, pages 377-386, 2006.
  • 23
    • S. Sarawagi and A. Kirpal. Efficient set joins on similarity predicates. In SIGMOD Conference, pages 743-754, 2004.
  • 24
    • D. A. Schneider and D. J. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In SIGMOD Conference, pages 110-121, 1989.
  • 25
    • E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the orkut social network. In KDD, pages 678-684, 2005.
  • 27
    • Web 1T 5-gram version 1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13.
  • 28
    • C. Xiao, W. Wang, and X. Lin. Ed-Join: An efficient algorithm for similarity joins with edit distance constraints. In VLDB, 2008.
  • 29
    • C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, pages 131-140, 2008.
  • 30
    • H. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker Jr. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD Conference, pages 1029-1040, 2007.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.