



2012, Pages 1055-1064

An automatic blocking mechanism for large-scale de-duplication tasks

Author keywords

blocking; canopy formation; de-duplication

Indexed keywords

ARCHITECTURAL CONSTRAINTS; AUTOMATICALLY GENERATED; BLOCKING; BLOCKING MECHANISMS; BLOCKING PROCESS; CANOPY FORMATION; CENSUS DATA; DATA INTEGRATION; DATA SETS; DE-DUPLICATION; DESIGN PROBLEMS; DISJOINTNESS; GRID ENVIRONMENTS; HIERARCHICAL TREE; LARGE DATASETS; MAP-REDUCE; MEDICAL RECORD; MEMORY REQUIREMENTS; NEW DIMENSIONS; OBJECTIVE FUNCTIONS; PAIR-WISE COMPARISON; POST PROCESSING; PROBLEM SIZE; REAL-WORLD ENTITIES; STRUCTURED DATA

EID: 84871075183     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2396761.2398403     Document Type: Conference Paper
Times cited: 26

References (29)
  • 1
    • 84871056742
    • DBPedia. http://dbpedia.org/.
  • 3
    • 84871078931
    • MAHOUT. http://cwiki.apache.org/MAHOUT/canopy-clustering.html.
    • MAHOUT
  • 4
    • 5444258997
    • A comparison of fast blocking methods for record linkage
    • R. Baxter, P. Christen, and T. Churches. A comparison of fast blocking methods for record linkage. In KDD, 2003.
    • (2003) KDD
    • Baxter, R.1    Christen, P.2    Churches, T.3
  • 7
    • 84871099550
    • Adaptive blocking: Learning to scale up record linkage and clustering
    • M. Bilenko, B. Kamath, and R. J. Mooney. Adaptive blocking: Learning to scale up record linkage and clustering. In ICDM, 2006.
    • (2006) ICDM
    • Bilenko, M.1    Kamath, B.2    Mooney, R.J.3
  • 8
    • 84857183817
    • A survey of indexing techniques for scalable record linkage and deduplication
    • P. Christen. A survey of indexing techniques for scalable record linkage and deduplication. TKDE, 2011.
    • (2011) TKDE
    • Christen, P.1
  • 10
    • 0001366593
    • Discrete-variable extremum problems
    • G. B. Dantzig. Discrete-variable extremum problems. Operations Research, 5(2), 1957.
    • (1957) Operations Research , vol.5 , Issue.2
    • Dantzig, G.B.1
  • 14
    • 84976856849
    • The merge/purge problem for large databases
    • M. A. Hernandez and S. J. Stolfo. The merge/purge problem for large databases. In SIGMOD, 1995.
    • (1995) SIGMOD
    • Hernandez, M.A.1    Stolfo, S.J.2
  • 15
    • 85012195470
    • The history of histograms (abridged)
    • Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003.
    • (2003) VLDB
    • Ioannidis, Y.E.1
  • 16
    • 84950419860
    • Advances in record linkage methodology as applied to matching the 1985 census of Tampa
    • M. A. Jaro. Advances in record linkage methodology as applied to matching the 1985 census of Tampa. JASA, 84, 1989.
    • (1989) JASA , vol.84
    • Jaro, M.A.1
  • 17
    • 84943425383
    • Efficient record linkage in large data sets
    • L. Jin, C. Li, and S. Mehrotra. Efficient record linkage in large data sets. In DASFAA, 2003.
    • (2003) DASFAA
    • Jin, L.1    Li, C.2    Mehrotra, S.3
  • 18
    • 34748826012
    • Advances in record linkage methodology: A method for determining the best blocking strategy
    • R. P. Kelley. Advances in record linkage methodology: a method for determining the best blocking strategy. Record Linkage Techniques, 1985.
    • (1985) Record Linkage Techniques
    • Kelley, R.P.1
  • 20
    • 0034592784
    • Efficient clustering of high-dimensional data sets with application to reference matching
    • A. McCallum, K. Nigam, and L. H. Ungar. Efficient clustering of high-dimensional data sets with application to reference matching. In KDD, 2000.
    • (2000) KDD
    • McCallum, A.1    Nigam, K.2    Ungar, L.H.3
  • 21
    • 36348932551
    • Learning blocking schemes for record linkage
    • M. Michelson and C. A. Knoblock. Learning blocking schemes for record linkage. In AAAI, 2006.
    • (2006) AAAI
    • Michelson, M.1    Knoblock, C.A.2
  • 22
    • 79960020260
    • Processing theta-joins using mapreduce
    • A. Okcan and M. Riedewald. Processing theta-joins using mapreduce. In SIGMOD, 2011.
    • (2011) SIGMOD
    • Okcan, A.1    Riedewald, M.2
  • 23
    • 67049141149
    • Scaling record linkage to non-uniform distributed class sizes
    • S. Rendle and L. Schmidt-Thieme. Scaling record linkage to non-uniform distributed class sizes. In PAKDD, 2008.
    • (2008) PAKDD
    • Rendle, S.1    Schmidt-Thieme, L.2
  • 24
    • 0242456811
    • Interactive deduplication using active learning
    • S. Sarawagi and A. Bhamidipaty. Interactive deduplication using active learning. In SIGKDD, 2002.
    • (2002) SIGKDD
    • Sarawagi, S.1    Bhamidipaty, A.2
  • 25
    • 63449096255
    • Parallel linkage
    • H.-S. Kim and D. Lee. Parallel linkage. In CIKM, 2007.
    • (2007) CIKM
    • Kim, H.-S.1    Lee, D.2
  • 26
    • 77954744650
    • Efficient parallel set-similarity joins using mapreduce
    • R. Vernica, M. J. Carey, and C. Li. Efficient parallel set-similarity joins using mapreduce. In SIGMOD, 2010.
    • (2010) SIGMOD
    • Vernica, R.1    Carey, M.J.2    Li, C.3
  • 27
    • 11144240583
    • A comparison of string distance metrics for name-matching tasks
    • W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proc. of IJCAI, 2003.
    • (2003) Proc. of IJCAI
    • Cohen, W.W.1    Ravikumar, P.2    Fienberg, S.E.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.