Volume 66, Issue 3, 2008, Pages 368-381

Entity matching across heterogeneous data sources: An approach based on constrained cascade generalization

Author keywords

Cascade generalization; Decision tree; Entity matching; Heterogeneous databases; Record linkage

Indexed keywords

CLASSIFICATION (OF INFORMATION); DECISION THEORY; DECISION TREES; HUMAN COMPUTER INTERACTION;

EID: 47849087202     PISSN: 0169023X     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.datak.2008.04.007     Document Type: Article
Times cited: 16

References (66)
  • 1
    • 0002696312
    • Matching records in a national medical patient index
    • Bell G.B., and Sethi A. Matching records in a national medical patient index. Communications of the ACM 44 9 (2001) 83-88
    • (2001) Communications of the ACM , vol.44 , Issue.9 , pp. 83-88
    • Bell, G.B.1    Sethi, A.2
  • 2
    • 77954003729
    • I. Bhattacharya, L. Getoor, Iterative record linkage for cleaning and integration, in: Proceedings of the Ninth ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004.
  • 3
    • 77952372966
    • M. Bilenko, R.J. Mooney, Adaptive duplicate detection using learnable string similarity measures, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, 2003, pp. 39-48.
  • 4
    • 0030211964
    • Bagging predictors
    • Breiman L. Bagging predictors. Machine Learning 24 2 (1996) 123-140
    • (1996) Machine Learning , vol.24 , Issue.2 , pp. 123-140
    • Breiman, L.1
  • 5
    • 0001858087
    • Multivariate decision trees
    • Brodley C.E., and Utgoff P.E. Multivariate decision trees. Machine Learning 19 1 (1995) 45-77
    • (1995) Machine Learning , vol.19 , Issue.1 , pp. 45-77
    • Brodley, C.E.1    Utgoff, P.E.2
  • 6
    • 47849111084
    • C.D. Budzinsky, Automated spelling correction, Unpublished Report, Statistics Canada, Ottawa, 1991.
  • 7
    • 0029271867
    • Rule based joins in heterogeneous databases
    • Chatterjee A., and Segev A. Rule based joins in heterogeneous databases. Decision Support Systems 13 3-4 (1995) 313-333
    • (1995) Decision Support Systems , vol.13 , Issue.3-4 , pp. 313-333
    • Chatterjee, A.1    Segev, A.2
  • 8
    • 26444550791
    • S. Chaudhuri, V. Ganti, R. Motwani, Robust identification of fuzzy duplicates, in: Proceedings of the International Conference on Data Engineering, Tokyo, Japan, 2005, pp. 865-876.
  • 11
    • 84863154946
    • P.T. Davis, D.K. Elson, J.L. Klavans, Methods for precise named entity matching in digital collections, in: Proceedings of the 2003 Joint Conference on Digital Libraries, 2003, pp. 27-31.
  • 12
    • 4344686171
    • Record matching in data warehouses: a decision model for data consolidation
    • Dey D. Record matching in data warehouses: a decision model for data consolidation. Operations Research 51 2 (2003) 240-254
    • (2003) Operations Research , vol.51 , Issue.2 , pp. 240-254
    • Dey, D.1
  • 13
    • 0032182242
    • A probabilistic decision model for entity matching in heterogeneous databases
    • Dey D., Sarkar S., and De P. A probabilistic decision model for entity matching in heterogeneous databases. Management Science 44 10 (1998) 1379-1395
    • (1998) Management Science , vol.44 , Issue.10 , pp. 1379-1395
    • Dey, D.1    Sarkar, S.2    De, P.3
  • 14
    • 0036565014
    • A distance-based approach to entity reconciliation in heterogeneous databases
    • Dey D., Sarkar S., and De P. A distance-based approach to entity reconciliation in heterogeneous databases. IEEE Transactions on Knowledge and Data Engineering 14 3 (2002) 567-582
    • (2002) IEEE Transactions on Knowledge and Data Engineering , vol.14 , Issue.3 , pp. 567-582
    • Dey, D.1    Sarkar, S.2    De, P.3
  • 15
    • 2342615638
    • Profile-based object matching for information integration
    • Doan A., Lu Y., Lee Y., and Han J. Profile-based object matching for information integration. IEEE Intelligent Systems 18 5 (2003) 54-59
    • (2003) IEEE Intelligent Systems , vol.18 , Issue.5 , pp. 54-59
    • Doan, A.1    Lu, Y.2    Lee, Y.3    Han, J.4
  • 16
    • 47849091034
    • P. Domingos, A unified bias-variance decomposition and its applications, in: Proceedings of 17th International Conference on Machine Learning, 2000, pp. 231-238.
  • 17
    • 47849104793
    • M.E. Fair, Record linkage in an information age society, in: Proceedings of Record Linkage Techniques - 1997, 1997, pp. 427-441.
  • 19
    • 47849101981
    • Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the Thirteenth International Conference on Machine Learning, 1996, pp. 148-156.
  • 20
    • 0034541162
    • Cascade generalization
    • Gama J., and Brazdil P. Cascade generalization. Machine Learning 41 3 (2000) 315-343
    • (2000) Machine Learning , vol.41 , Issue.3 , pp. 315-343
    • Gama, J.1    Brazdil, P.2
  • 21
    • 47849104368
    • M. Ganesh, J. Srivastava, T. Richardson, Mining entity-identification rules for database integration, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 291-294.
  • 22
    • 47849132294
    • K. Gilhooly, Dirty data blights the bottom line, Computerworld, November 07, 2005.
  • 23
    • 47849106891
    • I.J. Haimowitz, Ö. Gür-Ali, H. Schwarz, Integrating and mining distributed customer databases, in: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997, pp. 179-182.
  • 25
    • 0013331361
    • Real-world data is dirty: data cleansing and the merge/purge problem
    • Hernández M.A., and Stolfo S.J. Real-world data is dirty: data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery 2 1 (1998) 9-37
    • (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.1 , pp. 9-37
    • Hernández, M.A.1    Stolfo, S.J.2
  • 27
    • 0026390019
    • Classifying schematic and data heterogeneity in multidatabase systems
    • Kim W., and Seo J. Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer 24 12 (1991) 12-18
    • (1991) IEEE Computer , vol.24 , Issue.12 , pp. 12-18
    • Kim, W.1    Seo, J.2
  • 28
    • 47849122086
    • R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1995, pp. 1137-1143.
  • 29
    • 47849123092
    • M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: One-sided selection, in: Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 179-186.
  • 30
    • 8644242446
    • W. Lam, R. Huang, P.-S. Cheung, Learning phonetic similarity for matching named entity translations and mining new translations, in: Proceedings of the Twenty-seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 289-296.
  • 32
    • 1342347426
    • The design and implementation of a corporate householding knowledge processor to improve data quality
    • Madnick S.E., Wang Y.R., and Xian X. The design and implementation of a corporate householding knowledge processor to improve data quality. Journal of Management Information Systems 20 3 (2003) 41-69
    • (2003) Journal of Management Information Systems , vol.20 , Issue.3 , pp. 41-69
    • Madnick, S.E.1    Wang, Y.R.2    Xian, X.3
  • 33
    • 33646765912
    • Conditional models of identity uncertainty with application to noun coreference
    • McCallum A., and Wellner B. Conditional models of identity uncertainty with application to noun coreference. Advances in Neural Information Processing Systems 17 (2005) 905-912
    • (2005) Advances in Neural Information Processing Systems , vol.17 , pp. 905-912
    • McCallum, A.1    Wellner, B.2
  • 34
    • 6444245574
    • Enhancing information systems management with natural language processing techniques
    • Métais E. Enhancing information systems management with natural language processing techniques. Data & Knowledge Engineering 41 2-3 (2002) 247-272
    • (2002) Data & Knowledge Engineering , vol.41 , Issue.2-3 , pp. 247-272
    • Métais, E.1
  • 35
    • 47849119593
    • A.E. Monge, C.P. Elkan, The field matching problem: algorithms and applications, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 267-270.
  • 36
    • 0002431740
    • Automatic construction of decision trees from data: a multi-disciplinary survey
    • Murthy S.K. Automatic construction of decision trees from data: a multi-disciplinary survey. Data Mining and Knowledge Discovery 2 4 (1998) 345-389
    • (1998) Data Mining and Knowledge Discovery , vol.2 , Issue.4 , pp. 345-389
    • Murthy, S.K.1
  • 39
    • 47849119104
    • J.C. Pinheiro, D.X. Sun, Methods for linking and mining massive heterogeneous databases, in: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998, pp. 309-313.
  • 40
    • 1142288175
    • Element matching across data-oriented XML sources using a multi-strategy clustering model
    • Pluempitiwiriyawej C., and Hammer J. Element matching across data-oriented XML sources using a multi-strategy clustering model. Data & Knowledge Engineering 48 3 (2004) 297-333
    • (2004) Data & Knowledge Engineering , vol.48 , Issue.3 , pp. 297-333
    • Pluempitiwiriyawej, C.1    Hammer, J.2
  • 41
    • 0002442571
    • Discovering rules by induction from large collections of examples
    • Michie D. (Ed), Edinburgh University Press, Edinburgh
    • Quinlan J.R. Discovering rules by induction from large collections of examples. In: Michie D. (Ed). Expert Systems in the Micro-electronic Age (1979), Edinburgh University Press, Edinburgh 168-201
    • (1979) Expert Systems in the Micro-electronic Age , pp. 168-201
    • Quinlan, J.R.1
  • 43
    • 0346970930
    • Multimedia indexing through multi-source and multi-language information extraction: the MUMIS project
    • Saggion H., Cunningham H., Bontcheva K., Maynard D., Hamza O., and Wilks Y. Multimedia indexing through multi-source and multi-language information extraction: the MUMIS project. Data & Knowledge Engineering 48 2 (2004) 247-264
    • (2004) Data & Knowledge Engineering , vol.48 , Issue.2 , pp. 247-264
    • Saggion, H.1    Cunningham, H.2    Bontcheva, K.3    Maynard, D.4    Hamza, O.5    Wilks, Y.6
  • 44
    • 0242456811
    • S. Sarawagi, A. Bhamidipaty, Interactive deduplication using active learning, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, 2002.
  • 48
    • 0003893064
    • World Scientific Publishing Co. Pte. Ltd., River Edge, NJ
    • Stephen G.A. String Searching Algorithms (1994), World Scientific Publishing Co. Pte. Ltd., River Edge, NJ
    • (1994) String Searching Algorithms
    • Stephen, G.A.1
  • 50
    • 0035545848
    • Learning object identification rules for information integration
    • Tejada S., Knoblock C.A., and Minton S. Learning object identification rules for information integration. Information Systems 26 8 (2001) 607-633
    • (2001) Information Systems , vol.26 , Issue.8 , pp. 607-633
    • Tejada, S.1    Knoblock, C.A.2    Minton, S.3
  • 52
    • 47849097021
    • A.C. Trembly, Poor data quality: A $600 billion issue, National Underwriter Property & Casualty - Risk & Benefits Management, March 18, 2002 Edition.
  • 53
    • 0001164493
    • Shift of Bias for Inductive Concept Learning
    • Michalski R., Carbonell J., and Mitchell T. (Eds), Morgan Kaufmann, Los Altos, CA (Chapter 5)
    • Utgoff P.E. Shift of Bias for Inductive Concept Learning. In: Michalski R., Carbonell J., and Mitchell T. (Eds). Machine Learning: An Artificial Intelligence Approach, Vol. II (1986), Morgan Kaufmann, Los Altos, CA 107-148 (Chapter 5)
    • (1986) Machine Learning: An Artificial Intelligence Approach, Vol. II , pp. 107-148
    • Utgoff, P.E.1
  • 56
    • 47849126967
    • W.E. Winkler, Matching and record linkage, in: Proceedings of Record Linkage Techniques - 1997, 1997, pp. 374-403.
  • 57
    • 47849119350
    • W.E. Winkler, Record linkage software and methods for merging administrative lists, Exchange of Technology and Know-How, Luxembourg, 1999, pp. 313-323.
  • 59
    • 0026692226
    • Stacked generalization
    • Wolpert D.H. Stacked generalization. Neural Networks 5 2 (1992) 241-259
    • (1992) Neural Networks , vol.5 , Issue.2 , pp. 241-259
    • Wolpert, D.H.1
  • 60
    • 85133070181
    • The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework
    • Addison-Wesley
    • Wolpert D.H. The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning (1994), Addison-Wesley 117-214
    • (1994) Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning , pp. 117-214
    • Wolpert, D.H.1
  • 61
    • 3142679542
    • W. Wu, C. Yu, A. Doan, W. Meng, An interactive clustering-based approach to integrating source query interfaces on the deep Web, in: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, 2004, pp. 95-106.
  • 62
    • 33845920025
    • Semantic matching across heterogeneous data sources
    • Zhao H. Semantic matching across heterogeneous data sources. Communications of the ACM 50 1 (2007) 45-50
    • (2007) Communications of the ACM , vol.50 , Issue.1 , pp. 45-50
    • Zhao, H.1
  • 64
    • 5644287747
    • Entity identification for heterogeneous database integration - a multiple classifier system approach and empirical evaluation
    • Zhao H., and Ram S. Entity identification for heterogeneous database integration - a multiple classifier system approach and empirical evaluation. Information Systems 30 2 (2005) 119-132
    • (2005) Information Systems , vol.30 , Issue.2 , pp. 119-132
    • Zhao, H.1    Ram, S.2
  • 66
    • 33947161876
    • Combining schema and instance information for integrating heterogeneous data sources
    • Zhao H., and Ram S. Combining schema and instance information for integrating heterogeneous data sources. Data & Knowledge Engineering 61 2 (2007) 281-303
    • (2007) Data & Knowledge Engineering , vol.61 , Issue.2 , pp. 281-303
    • Zhao, H.1    Ram, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.