Volume 18, Issue 5, 2009, Pages 1141-1166

Creating probabilistic databases from duplicated data

Author keywords

Duplicate detection; Probabilistic databases; String databases

Indexed keywords

ALTERNATIVE APPROACH; CLEANING STRATEGIES; DEDUPLICATION; DUPLICATE DETECTION; IMPERFECT DATA; JOIN METHOD; MODULAR FRAMEWORK; PROBABILISTIC DATABASE; PROBABILISTIC DATABASES; QUERY RESULTS; REAL-WORLD ENTITIES; STRING DATABASES; THRESHOLDING TECHNIQUES;

EID: 70349826301     PISSN: 10668888     EISSN: 0949877X     Source Type: Journal    
DOI: 10.1007/s00778-009-0161-2     Document Type: Article
Times cited: 46

References (51)
  • 5
    • 67649649597
    • Large-scale deduplication with constraints using dedupalog
    • Arasu, A., Ré, C., Suciu, D.: Large-scale deduplication with constraints using dedupalog. In: IEEE Proc. of the Int'l Conf. on Data Eng., pp. 952-963 (2009)
    • (2009) IEEE Proc. of the Int'l Conf. on Data Eng. , pp. 952-963
    • Arasu, A.1
  • 6
    • 4644233828
    • The star clustering algorithm for static and dynamic information organization
    • 1068.68098 2112265
    • J.A. Aslam E. Pelekhov D. Rus 2004 The star clustering algorithm for static and dynamic information organization J. Graph Algorithm. Appl. 8 1 95 129 1068.68098 2112265
    • (2004) J. Graph Algorithm. Appl. , vol.8 , Issue.1 , pp. 95-129
    • Aslam, J.A.1    Pelekhov, E.2    Rus, D.3
  • 7
    • 3142665421
    • Correlation clustering
    • 1089.68085 10.1023/B:MACH.0000033116.57574.95
    • N. Bansal A. Blum S. Chawla 2004 Correlation clustering Mach. Learn. 56 1-3 89 113 1089.68085 10.1023/B:MACH.0000033116.57574.95
    • (2004) Mach. Learn. , vol.56 , Issue.1-3 , pp. 89-113
    • Bansal, N.1    Blum, A.2    Chawla, S.3
  • 13
    • 36348996876
    • Collective entity resolution in relational data
    • I. Bhattacharya L. Getoor 2006 Collective entity resolution in relational data IEEE Data Eng. Bull. 29 2 4 12
    • (2006) IEEE Data Eng. Bull. , vol.29 , Issue.2 , pp. 4-12
    • Bhattacharya, I.1    Getoor, L.2
  • 14
    • 0141607824
    • Latent dirichlet allocation
    • 1112.68379 10.1162/jmlr.2003.3.4-5.993
    • D.M. Blei A.Y. Ng M.I. Jordan 2003 Latent dirichlet allocation J. Mach. Learn. Res. 3 993 1022 1112.68379 10.1162/jmlr.2003.3.4-5.993
    • (2003) J. Mach. Learn. Res. , vol.3 , pp. 993-1022
    • Blei, D.M.1    Ng, A.Y.2    Jordan, M.I.3
  • 19
    • 61349087255
    • Cleaning uncertain data with quality guarantees
    • R. Cheng J. Chen X. Xie 2008 Cleaning uncertain data with quality guarantees Proc. VLDB Endow. (PVLDB) 1 1 722 735
    • (2008) Proc. VLDB Endow. (PVLDB) , vol.1 , Issue.1 , pp. 722-735
    • Cheng, R.1    Chen, J.2    Xie, X.3
  • 21
    • 36148994784
    • Efficient query evaluation on probabilistic databases
    • 10.1007/s00778-006-0004-3
    • N. Dalvi D. Suciu 2007 Efficient query evaluation on probabilistic databases Int. J. Very Large Data Bases 16 4 523 544 10.1007/s00778-006-0004-3
    • (2007) Int. J. Very Large Data Bases , vol.16 , Issue.4 , pp. 523-544
    • Dalvi, N.1    Suciu, D.2
  • 23
    • 33746868385
    • Correlation clustering in general weighted graphs
    • DOI 10.1016/j.tcs.2006.05.008, PII S0304397506003227
    • E.D. Demaine D. Emanuel A. Fiat N. Immorlica 2006 Correlation clustering in general weighted graphs Theor. Comput. Sci. 361 2 172 187 1099.68074 10.1016/j.tcs.2006.05.008 2252576 (Pubitemid 44189103)
    • (2006) Theoretical Computer Science , vol.361 , Issue.2-3 , pp. 172-187
    • Demaine, E.D.1    Emanuel, D.2    Fiat, A.3    Immorlica, N.4
  • 26
    • 84947399464
    • A theory for record linkage
    • 10.2307/2286061
    • I.P. Fellegi A.B. Sunter 1969 A theory for record linkage J. Am. Stat. Assoc. 64 328 1183 1210 10.2307/2286061
    • (1969) J. Am. Stat. Assoc. , vol.64 , Issue.328 , pp. 1183-1210
    • Fellegi, I.P.1    Sunter, A.B.2
  • 27
    • 84906283185
    • Graph clustering and minimum cut trees
    • 1098.68095 2119992
    • G.W. Flake R.E. Tarjan K. Tsioutsiouliklis 2004 Graph clustering and minimum cut trees Internet Math. 1 4 385 408 1098.68095 2119992
    • (2004) Internet Math. , vol.1 , Issue.4 , pp. 385-408
    • Flake, G.W.1    Tarjan, R.E.2    Tsioutsiouliklis, K.3
  • 30
    • 0007508819
    • Algorithms on strings, trees, and sequences
    • Cambridge University Press, Cambridge
    • Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
    • (1997) Computer Science and Computational Biology
    • Gusfield, D.1
  • 35
    • 0013331361
    • Real-World Data is Dirty: Data Cleansing and the Merge/Purge Problem
    • DOI 10.1023/A:1009761603038
    • M.A. Hernández S.J. Stolfo 1998 Real-world data is dirty: data cleansing and the merge/purge problem Data Min. Know. Discov. 2 1 9 37 10.1023/A:1009761603038 (Pubitemid 128063205)
    • (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.1 , pp. 9-37
    • Hernandez, M.A.1    Stolfo, S.J.2
  • 37
    • 0026979939
    • Techniques for automatically correcting words in text
    • DOI 10.1145/146370.146380
    • K. Kukich 1992 Techniques for automatically correcting words in text ACM Comput. Surv. 24 4 377 439 10.1145/146370.146380 (Pubitemid 23687641)
    • (1992) ACM Computing Surveys , vol.24 , Issue.4 , pp. 377-439
    • Kukich, K.1
  • 38
    • 85011032600
    • VGRAM: Improving performance of approximate queries on string collections using variable-length grams
    • Vienna
    • Li, C., Wang, B., Yang, X.: VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In: Proc. of the Int'l Conf. on Very Large Data Bases (VLDB), pp. 303-314, Vienna (2007)
    • (2007) Proc. of the Int'l Conf. on Very Large Data Bases (VLDB) , pp. 303-314
    • Li, C.1    Wang, B.2    Yang, X.3
  • 42
    • 0002490026
    • Data cleaning: Problems and current approaches
    • E. Rahm H. Hai Do 2000 Data cleaning: problems and current approaches IEEE Data Eng. Bull. 23 4 3 13
    • (2000) IEEE Data Eng. Bull. , vol.23 , Issue.4 , pp. 3-13
    • Rahm, E.1    Hai Do, H.2
  • 44
    • 8844253324
    • Understanding inverse document frequency: On theoretical arguments for IDF
    • DOI 10.1108/00220410410560582
    • S. Robertson 2004 Understanding inverse document frequency: on theoretical arguments for IDF J. Doc. 60 5 503 520 10.1108/00220410410560582 (Pubitemid 39538229)
    • (2004) Journal of Documentation , vol.60 , Issue.5 , pp. 503-520
    • Robertson, S.1
  • 46
    • 49549112222
    • Representing tuple and attribute uncertainty in probabilistic databases
    • Sen, P., Deshpande, A., Getoor, L.: Representing tuple and attribute uncertainty in probabilistic databases. In: ICDM Workshops, pp. 507-512 (2007)
    • (2007) ICDM Workshops , pp. 507-512
    • Sen, P.1    Deshpande, A.2    Getoor, L.3
  • 49
    • 51149112283
    • Probabilistic top-k and ranking-aggregate queries
    • 10.1145/1386118.1386119
    • M.A. Soliman I.F. Ilyas K.C. Chang 2008 Probabilistic top-k and ranking-aggregate queries ACM Trans. Database Syst. (TODS) 33 3 1 54 10.1145/1386118.1386119
    • (2008) ACM Trans. Database Syst. (TODS) , vol.33 , Issue.3 , pp. 1-54
    • Soliman, M.A.1    Ilyas, I.F.2    Chang, K.C.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.