Volume 7, Issue 2, 2012, Pages 114-130

De-duplication of aggregation authority files

Author keywords

Authority control; Candidate identification; Cassandra; Metadata; Ontologies; Performance; Record aggregations; Record deduplication; Record merge; Scalability; Semantics; Sorted neighbourhood

Indexed keywords

DIGITAL STORAGE; ITERATIVE METHODS; METADATA; ONTOLOGY; SCALABILITY; SEMANTICS; USER INTERFACES

EID: 84868227103; PISSN: 1744-2621; EISSN: 1744-263X; Source type: Journal
DOI: 10.1504/IJMSO.2012.050014; Document type: Article
Times cited: 9

References (39)
  • [4] Charikar, M. (2002) 'Similarity estimation techniques from rounding algorithms', in 34th Annual Symposium on Theory of Computing, May, Montreal, Quebec, Canada.
  • [5] Christen, P. (2011) 'A survey of indexing techniques for scalable record linkage and deduplication', IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, doi: 10.1109/TKDE, Vol. 127, No. 1, pp.99.
  • [6] Christen, P.F. (2008) 'An open source data cleaning, deduplication and record linkage system with a graphical user interface', in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, ACM, New York, NY, USA, ISBN 978-1-60558-193-4, doi: 10.1145/1401890.1402020, pp.1065-1068.
  • [7] Christen, P. and Pudjijono, A. (2009) 'Accurate synthetic generation of realistic personal information', in Theeramunkong, T., Kijsirikul, B., Cercone, N. and Ho, T-B. (Eds.): Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Vol. 5476, Springer, Berlin, Heidelberg, ISBN 978-3-642-01306-5, doi: 10.1007/978-3-642-01307-2_47, pp.507-514.
  • [11] Cohen, W.W. and Richman, J. (2002) 'Learning to match and cluster large high-dimensional data sets for data integration', in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, ACM, New York, NY, USA, ISBN 1-58113-567-X, doi: 10.1145/775047.775116, pp.475-480.
  • [12] Dalrymple, P.W. and Young, J.A. (1991) 'From authority control to informed retrieval: framing the expanded domain of subject access', College and Research Libraries, Vol. 52, pp.139-149, URL http://hdl.handle.net/1860/3173.
  • [13] Elmagarmid, A.K., Ipeirotis, P.G. and Verykios, V.S. (2007) 'Duplicate record detection: a survey', IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 1, January, pp.1-16, ISSN 1041-4347, doi: 10.1109/TKDE.2007.250581, URL http://www.cs.purdue.edu/homes/ake/pub/survey2.pdf.
  • [16] Gorman, M. (2003) 'Authority control in the context of bibliographic control in the electronic environment', International Conference Authority Control: Definition and International Experiences, Florence, 10-12 February; published in Cataloging and Classification Quarterly (2004), Vol. 38, No. 3-4, pp.11-22, URL http://hdl.handle.net/10760/4164.
  • [17] Gravano, L., Ipeirotis, P.G., Koudas, N. and Srivastava, D. (2003) 'Text joins in an RDBMS for web data integration', in Proceedings of the 12th International Conference on World Wide Web, WWW '03, ACM, New York, NY, USA, ISBN 1-58113-680-3, doi: 10.1145/775152.775166, pp.90-101.
  • [18] Hernandez, M.A. and Stolfo, S.J. (1995) 'The merge/purge problem for large databases', SIGMOD Record, Vol. 24, May, pp.127-138, ISSN 0163-5808, doi: 10.1145/568271.223807.
  • [19] Jaro, M.A. (1989) 'Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida', Journal of the American Statistical Association, Vol. 84, No. 406, June, pp.414-420, URL http://www.jstor.org/stable/2289924.
  • [20] Jurczyk, P., Lu, J.J., Xiong, L., Cragan, J.D. and Correa, A. (2008) 'FRIL: fine-grained record integration and linkage tool', Birth Defects Research Part A: Clinical and Molecular Teratology, Vol. 82, No. 11, pp.822-829, ISSN 1542-0760, doi: 10.1002/bdra.20521.
  • [21] Kalashnikov, D.V. and Mehrotra, S. (2006) 'Domain-independent data cleaning via analysis of entity-relationship graph', ACM Transactions on Database Systems (TODS), Vol. 31, No. 2, pp.716-767, doi: 10.1145/1138394.1138401.
  • [23] Koepcke, H. and Rahm, E. (2010) 'Frameworks for entity matching: a comparison', Data & Knowledge Engineering, Vol. 69, No. 2, pp.197-210, ISSN 0169-023X, doi: 10.1016/j.datak.2009.10.003, URL http://www.sciencedirect.com/science/article/pii/S0169023X09001451.
  • [24] Koepcke, H. and Rahm, E. (2008) 'Training selection for tuning entity matching', in QDB/MUD, pp.3-12.
  • [25] Lagoze, C. and Van de Sompel, H. (2001) 'The open archives initiative: building a low-barrier interoperability framework', in Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, ACM Press, ISBN 1-58113-345-6, doi: 10.1145/379437.379449, pp.54-62.
  • [27] Lakshman, A. and Malik, P. (2010) 'Cassandra: a decentralized structured storage system', SIGOPS Oper. Syst. Rev., Vol. 44, April, pp.35-40, ISSN 0163-5980, doi: 10.1145/1773912.1773922.
  • [28] Lagoze, C. and Van de Sompel, H. (2003) 'The making of the open archives initiative protocol for metadata harvesting', Library Hi Tech, Vol. 21, No. 2, pp.118-128.
  • [29] Manghi, P. and Mikulicic, M. (2011) 'PACE: a general-purpose tool for authority control', in Garcia-Barriocanal, E., Cebeci, Z., Okur, M.C. and Ozturk, A. (Eds.): Metadata and Semantic Research, Communications in Computer and Information Science, Vol. 240, Springer, Berlin, Heidelberg, ISBN 978-3-642-24731-6, doi: 10.1007/978-3-642-24731-6_8, pp.80-92.
  • [31] Bennett, R., Hengel-Dittrich, C., O'Neill, E.T. and Tillett, B.B. (2007) 'Virtual international authority file: linking the Deutsche Nationalbibliothek and Library of Congress name authority files', International Cataloguing and Bibliographic Control, Vol. 36, No. 1, pp.12-19.
  • [32] Rudniy, A., Song, M. and Geller, J. (2010) 'Detecting duplicate biological entities using shortest path edit distance', International Journal of Data Mining and Bioinformatics, Vol. 4, No. 4, pp.395-410, doi: 10.1504/IJDMB.2010.034196, URL http://inderscience.metapress.com/content/TQ3737625VK1R573.
  • [33] Tejada, S., Knoblock, C.A. and Minton, S. (2001) 'Learning object identification rules for information integration', Information Systems (special issue: Data Extraction, Cleaning and Reconciliation), Vol. 26, No. 8, pp.607-633, doi: 10.1016/S0306-4379(01)00042-4.
  • [34] Tillett, B.T. (2003) 'Authority control: state of the art and new perspectives', in Authority Control International Conference, Florence, Italy.
  • [36] Wang, X. and Alexander, S.M. (2010) 'Linking medical records: a machine learning approach', International Journal of Collaborative Enterprise, Vol. 1, No. 3, pp.394-406, doi: 10.1504/IJCENT.2010.03836, URL http://inderscience.metapress.com/content/T93552025P150H36.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.