Volume 70, 2007, Pages 111-119

A two-step classification approach to unsupervised record linkage

Author keywords

Clustering; Data linkage; Data matching; Deduplication; Entity resolution; Quality measures; Support vector machines

Indexed keywords

CLUSTERING; DATA LINKAGE; DATA MATCHING; DEDUPLICATION; QUALITY MEASURES

EID: 44649135932     PISSN: 14451336     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Article
Times cited: 27

References (34)
  • 5
    • 33746054079
    • Adaptive product normalization: Using online learning for record linkage in comparison shopping
    • (ICDM05) Houston, Texas
    • Bilenko, M., Basu, S. & Sahami, M. (2005), Adaptive product normalization: Using online learning for record linkage in comparison shopping, in 'IEEE International Conference on Data Mining' (ICDM'05), Houston, Texas, pp. 58-65.
    • (2005) IEEE International Conference on Data Mining , pp. 58-65
    • Bilenko, M.1    Basu, S.2    Sahami, M.3
  • 6
    • 0003710380
    • Department of Computer Science, National Taiwan University. Software available at
    • Chang, C.-C. & Lin, C.-J. (2001), LIBSVM: a library for support vector machines, manual. Department of Computer Science, National Taiwan University. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
    • (2001) LIBSVM: a library for support vector machines, manual
    • Chang, C.-C.1    Lin, C.-J.2
  • 9
    • 78449293191
    • A comparison of personal name matching: techniques and practical issues
    • (MCD) held at IEEE ICDM06, Hong Kong
    • Christen, P. (2006), A comparison of personal name matching: techniques and practical issues, in 'Workshop on Mining Complex Data' (MCD), held at IEEE ICDM'06, Hong Kong.
    • (2006) Workshop on Mining Complex Data
    • Christen, P.1
  • 10
    • 33846428121
    • Quality and complexity measures for data linkage and deduplication
    • F. Guillet & H. Hamilton, eds, Springer Studies in Computational Intelligence
    • Christen, P. & Goiser, K. (2007), Quality and complexity measures for data linkage and deduplication, in F. Guillet & H. Hamilton, eds, 'Quality Measures in Data Mining', Springer Studies in Computational Intelligence, vol. 43, pp. 127-151.
    • (2007) Quality Measures in Data Mining , vol.43 , pp. 127-151
    • Christen, P.1    Goiser, K.2
  • 12
    • 4344570142
    • Practical introduction to record linkage for injury research
    • Clarke, D.E. (2004), 'Practical introduction to record linkage for injury research', Injury Prevention, vol. 10, pp. 186-191.
    • (2004) Injury Prevention , vol.10 , pp. 186-191
    • Clarke, D.E.1
  • 13
    • 0242540438
    • Learning to match and cluster large high-dimensional data sets for data integration
    • (SIGKDD02) Edmonton
    • Cohen, W.W. & Richman, J. (2002), Learning to match and cluster large high-dimensional data sets for data integration, in 'ACM International Conference on Knowledge Discovery and Data Mining' (SIGKDD'02), Edmonton, pp. 475-480.
    • (2002) ACM International Conference on Knowledge Discovery and Data Mining , pp. 475-480
    • Cohen, W.W.1    Richman, J.2
  • 17
    • 1642332418
    • Methods for automatic record matching and linking and their use in national statistics
    • National Statistics, London
    • Gill, L. (2001), 'Methods for automatic record matching and linking and their use in national statistics', National Statistics Methodology Series, no. 25, National Statistics, London.
    • (2001) National Statistics Methodology Series , Issue.25
    • Gill, L.1
  • 18
    • 65449179112
    • Towards automated record linkage
    • (AusDM06) Sydney, Conferences in Research and Practice in Information Technology (CRPIT)
    • Goiser, K. & Christen, P. (2006), Towards automated record linkage, in 'Australasian Data Mining Conference' (AusDM'06), Sydney, Conferences in Research and Practice in Information Technology (CRPIT), vol. 61, pp. 23-31.
    • (2006) Australasian Data Mining Conference , vol.61 , pp. 23-31
    • Goiser, K.1    Christen, P.2
  • 19
    • 37149056535
    • Decision models for record linkage
    • Springer LNCS 3755
    • Gu, L. & Baxter, R. (2006), Decision models for record linkage, in 'Selected Papers from AusDM', Springer LNCS 3755, pp. 146-160.
    • (2006) Selected Papers from AusDM , pp. 146-160
    • Gu, L.1    Baxter, R.2
  • 20
    • 0036450652
    • Research use of linked health data - A best practice protocol
    • Kelman, C.W., Bass, J. & Holman, D. (2002), 'Research use of linked health data - A best practice protocol', Aust NZ Journal of Public Health, vol. 26, pp. 251-255.
    • (2002) Aust NZ Journal of Public Health , vol.26 , pp. 251-255
    • Kelman, C.W.1    Bass, J.2    Holman, D.3
  • 21
    • 0742311711
    • Partially supervised classification of text documents
    • (ICML02) Sydney, Australia
    • Liu, B., Lee, W.S., Yu, P.S. & Li, X. (2002), Partially supervised classification of text documents, in 'International Conference on Machine Learning' (ICML'02), Sydney, Australia, pp. 387-394.
    • (2002) International Conference on Machine Learning , pp. 387-394
    • Liu, B.1    Lee, W.S.2    Yu, P.S.3    Li, X.4
  • 22
    • 78149306870
    • Building text classifiers using positive and unlabeled examples
    • (ICDM03) Melbourne, Florida
    • Liu, B., Dai, Y., Li, X., Lee, W.S. & Yu, P.S. (2003), Building text classifiers using positive and unlabeled examples, in 'IEEE International Conference on Data Mining' (ICDM'03), Melbourne, Florida, pp. 179-186.
    • (2003) IEEE International Conference on Data Mining , pp. 179-186
    • Liu, B.1    Dai, Y.2    Li, X.3    Lee, W.S.4    Yu, P.S.5
  • 24
    • 0037867900
    • Two approaches to handling noisy variation in text mining
    • (TextML02) Sydney, Australia
    • Nahm, U.Y., Bilenko, M. & Mooney, R.J. (2002), Two approaches to handling noisy variation in text mining, in 'ICML'02 workshop on text learning' (TextML'02), Sydney, Australia, pp. 18-27.
    • (2002) ICML02 workshop on text learning , pp. 18-27
    • Nahm, U.Y.1    Bilenko, M.2    Mooney, R.J.3
  • 26
    • 0033886806
    • Text classification from labeled and unlabeled documents using EM
    • Nigam, K., McCallum, A.K., Thrun, S. & Mitchell, T. (2000), 'Text classification from labeled and unlabeled documents using EM', Machine Learning, vol. 39, no. 2, pp. 103-134.
    • (2000) Machine Learning , vol.39 , Issue.2 , pp. 103-134
    • Nigam, K.1    McCallum, A.K.2    Thrun, S.3    Mitchell, T.4
  • 27
    • 0002490026
    • Data cleaning: Problems and current approaches
    • Rahm, E. & Do, H.H. (2000), 'Data cleaning: Problems and current approaches', IEEE Data Engineering Bulletin, vol. 23, no. 4, pp. 3-13.
    • (2000) IEEE Data Engineering Bulletin , vol.23 , Issue.4 , pp. 3-13
    • Rahm, E.1    Do, H.H.2
  • 29
    • 0242456803
    • Learning domain-independent string transformation weights for high accuracy object identification
    • (SIGKDD02) Edmonton, Canada
    • Tejada, S., Knoblock, C.A. & Minton, S. (2002), Learning domain-independent string transformation weights for high accuracy object identification, in 'ACM International Conference on Knowledge Discovery and Data Mining' (SIGKDD'02), Edmonton, Canada, pp. 350-359.
    • (2002) ACM International Conference on Knowledge Discovery and Data Mining , pp. 350-359
    • Tejada, S.1    Knoblock, C.A.2    Minton, S.3
  • 30
    • 85008018414
    • Automatically detecting criminal identity deception: An adaptive detection algorithm
    • Wang, G., Chen, H., Xu, J.J. & Atabakhsh, H. (2006), 'Automatically detecting criminal identity deception: An adaptive detection algorithm', IEEE Transactions on Systems, Man and Cybernetics (Part A), vol. 36, no. 5, pp. 988-999.
    • (2006) IEEE Transactions on Systems, Man and Cybernetics (Part A) , vol.36 , Issue.5 , pp. 988-999
    • Wang, G.1    Chen, H.2    Xu, J.J.3    Atabakhsh, H.4
  • 31
    • 33846411033
    • Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage
    • US Bureau of the Census
    • Winkler, W.E. (2000), 'Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage', Technical report RR2000/05, US Bureau of the Census.
    • (2000) Technical report RR2000/05
    • Winkler, W.E.1
  • 32
    • 2942709772
    • Methods for evaluating and creating data quality
    • Winkler, W.E. (2004), 'Methods for evaluating and creating data quality', Information Systems, Elsevier, vol. 29, no. 7, pp. 531-550.
    • (2004) Information Systems , vol.29 , Issue.7 , pp. 531-550
    • Winkler, W.E.1
  • 33
    • 33845615644
    • Overview of record linkage and current research directions
    • US Bureau of the Census
    • Winkler, W.E. (2006), 'Overview of record linkage and current research directions', Technical report RR2006/02, US Bureau of the Census.
    • (2006) Technical report RR2006/02
    • Winkler, W.E.1
  • 34


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.