Volume 27, Issue 1, 2012, Pages 45-63

Multi-pass sorted neighborhood blocking with MapReduce

Author keywords

Blocking; Cloud computing; Entity resolution; Hadoop; MapReduce; Sorted Neighborhood

EID: 84857059718
PISSN: 18652034
EISSN: 18652042
Source Type: Journal
DOI: 10.1007/s00450-011-0177-x
Document Type: Conference Paper
Times cited: 67

References (23)
  • 3. Baxter R, Christen P, Churches T (2003) A comparison of fast blocking methods for record linkage. In: ACM SIGKDD, vol 3, pp 25-27
  • 4. Bilenko M, Mooney RJ (2003) Adaptive duplicate detection using learnable string similarity measures. In: KDD, pp 39-48
  • 6. Christen P (2008) Febrl - an open source data cleaning, deduplication and record linkage system with a graphical user interface. In: KDD, pp 1065-1068
  • 7. Christen P, Churches T, Hegland M (2004) Febrl - a parallel open source data linkage system. In: PAKDD, Lecture Notes in Computer Science, vol 3056, pp 638-647
  • 8. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI, pp 137-150
  • 9. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107-113. doi:10.1145/1327452.1327492
  • 10. DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database systems. Commun ACM 35(6):85-98. doi:10.1145/129888.129894
  • 13. Apache Software Foundation (2006) Hadoop. http://hadoop.apache.org/mapreduce/
  • 14. Hernández MA, Stolfo SJ (1995) The merge/purge problem for large databases. In: SIGMOD Conference, pp 127-138
  • 15. Kim HS, Lee D (2007) Parallel linkage. In: CIKM, pp 283-292
  • 17. Kolb L, Thor A, Rahm E (2011) Parallel sorted neighborhood blocking with MapReduce. In: BTW, pp 45-64
  • 18. Köpcke H, Rahm E (2010) Frameworks for entity matching: a comparison. Data Knowl Eng 69(2):197-210. doi:10.1016/j.datak.2009.10.003
  • 19. Köpcke H, Thor A, Rahm E (2010) Evaluation of entity resolution approaches on real-world match problems. In: VLDB, pp 484-493
  • 20. Köpcke H, Thor A, Rahm E (2010) Learning-based approaches for matching web data entities. IEEE Internet Comput 14:23-31. doi:10.1109/MIC.2010.58
  • 21. Lin J, Dyer C (2010) Data-intensive text processing with MapReduce. Synth Lect Hum Lang Technol 3(1):1-177. doi:10.2200/S00274ED1V01Y201006HLT007
  • 22. Rahm E, Do HH (2000) Data cleaning: problems and current approaches. IEEE Data Eng Bull 23(4):3-13
  • 23. Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using MapReduce. In: SIGMOD Conference, pp 495-506. doi:10.1145/1807167.1807222


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.