



Volume 12, Issue 2-3, 2006, Pages 275-308

Bridging local and global data cleansing: Identifying class noise in large, distributed data datasets

Author keywords

Class noise; Data cleansing; Machine learning

Indexed keywords

ALGORITHMS; CLASSIFICATION (OF INFORMATION); DATA ACQUISITION; DISTRIBUTED DATABASE SYSTEMS; ERROR ANALYSIS; GLOBAL OPTIMIZATION; LEARNING SYSTEMS

EID: 33646598358     PISSN: 13845810     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10618-005-0012-8     Document Type: Article
Times cited : (44)

References (53)
  • 1
    • 0025725905
    • Instance-based learning algorithms
    • Aha, D., Kibler, D., and Albert, M. 1991. Instance-based learning algorithms. Machine Learning, 6(1):37-66.
    • (1991) Machine Learning , vol.6 , Issue.1 , pp. 37-66
    • Aha, D.1    Kibler, D.2    Albert, M.3
  • 6
    • 0347709675
    • Comparison of various routines for unknown attribute value processing: The covering paradigm
    • Bruha, I. and Franek, F. 1996. Comparison of various routines for unknown attribute value processing: The covering paradigm. IJPRAI, 10(8):939-955.
    • (1996) IJPRAI , vol.10 , Issue.8 , pp. 939-955
    • Bruha, I.1    Franek, F.2
  • 10
    • 34249966007
    • The CN2 induction algorithm
    • Clark, P. and Niblett, T. 1989. The CN2 induction algorithm. Machine Learning, 3(4):261-283.
    • (1989) Machine Learning , vol.3 , Issue.4 , pp. 261-283
    • Clark, P.1    Niblett, T.2
  • 11
    • 0002914363
    • Rule induction with CN2: Some recent improvements
    • Berlin, Springer-Verlag
    • Clark, P. and Boswell, R. 1991. Rule induction with CN2: Some recent improvements. Proc. of 5th ECML, Berlin, Springer-Verlag.
    • (1991) Proc. of 5th ECML
    • Clark, P.1    Boswell, R.2
  • 12
    • 80053403826
    • Ensemble methods in machine learning
    • J. Kittler and F. Roli, (Eds.), Springer, Berlin
    • Dietterich, T. 2000. Ensemble methods in machine learning. In Lecture Notes in Computer Science Vol. 1867, J. Kittler and F. Roli, (Eds.), Springer, Berlin: pp. 1-15.
    • (2000) Lecture Notes in Computer Science , vol.1867 , pp. 1-15
    • Dietterich, T.1
  • 13
    • 0011984911
    • Experiments with noise filtering in a medical domain
    • San Francisco, CA
    • Gamberger, D., Lavrac, N., and Groselj, C. 1999. Experiments with noise filtering in a medical domain. Proc. of 16th ICML Conference, San Francisco, CA, pp. 143-151.
    • (1999) Proc. of 16th ICML Conference , pp. 143-151
    • Gamberger, D.1    Lavrac, N.2    Groselj, C.3
  • 14
    • 0034143132
    • Noise detection and elimination in data preprocessing: Experiments in medical domains
    • Gamberger, D., Lavrac, N., and Dzeroski, S. 2000. Noise detection and elimination in data preprocessing: Experiments in medical domains. Applied Artificial Intelligence, 14:205-223.
    • (2000) Applied Artificial Intelligence , vol.14 , pp. 205-223
    • Gamberger, D.1    Lavrac, N.2    Dzeroski, S.3
  • 15
    • 4444353068
    • A comparison of several approaches to missing attribute values in data mining
    • Grzymala-Busse, J.W. and Hu, M. 2000. A comparison of several approaches to missing attribute values in data mining. Rough Sets and Current Trends in Computing, pp. 378-385.
    • (2000) Rough Sets and Current Trends in Computing , pp. 378-385
    • Grzymala-Busse, J.W.1    Hu, M.2
  • 17
    • 0017480535
    • A recursive partitioning decision rule for nonparametric classification
    • Friedman, J.H. 1977. A recursive partitioning decision rule for nonparametric classification. IEEE Transactions on Computers, 26(4):404-408.
    • (1977) IEEE Transactions on Computers , vol.26 , Issue.4 , pp. 404-408
    • Friedman, J.H.1
  • 19
    • 0027580356
    • Very simple classification rules perform well on most commonly used datasets
    • Holte, R.C. 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11.
    • (1993) Machine Learning , vol.11
    • Holte, R.C.1
  • 20
    • 33646585878
    • A grey-based nearest neighbor approach for predicting missing attribute values
    • Taiwan, NSC-90-2213-E-011-052
    • Huang, C.C. and Lee, H.M. 2001. A grey-based nearest neighbor approach for predicting missing attribute values. Proc. of 2001 National Computer Symposium, Taiwan, NSC-90-2213-E-011-052.
    • (2001) Proc. of 2001 National Computer Symposium
    • Huang, C.C.1    Lee, H.M.2
  • 21
    • 33646546132
    • IBM Almaden Research, Synthetic classification data generator
    • IBM Synthetic Data. IBM Almaden Research, Synthetic classification data generator, http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#classSynData.
    • IBM Synthetic Data
  • 23
    • 0003563503
    • Experiments in automatic learning of medical diagnostic rules
    • Jozef Stefan Institute, Ljubljana, Yugoslavia
    • Kononenko, I., Bratko, I., and Roskar, E. 1984. Experiments in automatic learning of medical diagnostic rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia.
    • (1984) Technical Report
    • Kononenko, I.1    Bratko, I.2    Roskar, E.3
  • 24
    • 78149286827
    • Probabilistic noise identification and data cleaning
    • FL, USA
    • Kubica, J. and Moore, A. 2003. Probabilistic noise identification and data cleaning. Proc. of ICDM, FL, USA
    • (2003) Proc. of ICDM
    • Kubica, J.1    Moore, A.2
  • 25
    • 85124125604
    • Heterogeneous uncertainty sampling for supervised learning
    • NJ, Morgan Kaufmann
    • Lewis, D. and Catlett, J. 1994. Heterogeneous uncertainty sampling for supervised learning. Proc. of the 11th ICML Conference, NJ, Morgan Kaufmann: pp. 148-156.
    • (1994) Proc. of the 11th ICML Conference , pp. 148-156
    • Lewis, D.1    Catlett, J.2
  • 27
    • 85005299854
    • The multi-purpose incremental learning system AQ15 and its testing application to three medical domains
    • Michalski, R.S., Mozetic, I., Hong, J., and Lavrac, N. 1986. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. Proceedings of AAAI, pp. 1041-1045.
    • (1986) Proceedings of AAAI , pp. 1041-1045
    • Michalski, R.S.1    Mozetic, I.2    Hong, J.3    Lavrac, N.4
  • 31
    • 0141771188
    • A survey of methods for scaling up inductive algorithms
    • Provost, F. and Kolluri, V. 1999. A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3(2):131-169.
    • (1999) Data Mining and Knowledge Discovery , vol.3 , Issue.2 , pp. 131-169
    • Provost, F.1    Kolluri, V.2
  • 32
    • 33744584654
    • Induction of decision trees
    • Quinlan, J.R. 1986. Induction of decision trees. Machine Learning, 1(1):81-106.
    • (1986) Machine Learning , vol.1 , Issue.1 , pp. 81-106
    • Quinlan, J.R.1
  • 35
    • 0025448521
    • The strength of weak learnability
    • Schapire, R.E. 1990. The strength of weak learnability. Machine Learning, 5(2):197-227.
    • (1990) Machine Learning , vol.5 , Issue.2 , pp. 197-227
    • Schapire, R.E.1
  • 37
    • 0012657799
    • Prototype and feature selection by sampling and random mutation hill climbing algorithms
    • New Brunswick, NJ. Morgan Kaufmann
    • Skalak, D. 1994. Prototype and feature selection by sampling and random mutation hill climbing algorithms. Proc. of 11th ICML Conference, New Brunswick, NJ. Morgan Kaufmann, pp. 293-301.
    • (1994) Proc. of 11th ICML Conference , pp. 293-301
    • Skalak, D.1
  • 40
    • 0016969272
    • An experiment with edited nearest-neighbor rule
    • Tomek, I. 1976. An experiment with edited nearest-neighbor rule. IEEE Trans. on Sys. Man and Cyber., 6(6):448-452.
    • (1976) IEEE Trans. on Sys. Man and Cyber. , vol.6 , Issue.6 , pp. 448-452
    • Tomek, I.1
  • 46
    • 0015361129
    • Asymptotic properties of nearest neighbor rules using edited data
    • Wilson, D. 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. on SMC, 2:408-421.
    • (1972) IEEE Trans. on SMC , vol.2 , pp. 408-421
    • Wilson, D.1
  • 47
    • 0343081513
    • Reduction techniques for exemplar-based learning algorithms
    • Wilson, D. and Martinez, T.R. 2000. Reduction techniques for exemplar-based learning algorithms. Machine Learning, 38(3):257-268.
    • (2000) Machine Learning , vol.38 , Issue.3 , pp. 257-268
    • Wilson, D.1    Martinez, T.R.2
  • 48
    • 0000027741
    • Learning structural descriptions from examples
    • McGraw-Hill, New York
    • Winston, P. 1975. Learning structural descriptions from examples. The Psychology of Computer Vision, McGraw-Hill, New York.
    • (1975) The Psychology of Computer Vision
    • Winston, P.1
  • 50
    • 0032049926
    • American Society for Information Science
    • Wu, X. 1998. Rule induction with extension matrices. American Society for Information Science, 49(5):435-454.
    • (1998) Rule Induction with Extension Matrices , vol.49 , Issue.5 , pp. 435-454
    • Wu, X.1
  • 52
  • 53
    • 19544372918
    • Class noise vs attribute noise: A quantitative study of their impacts
    • Zhu, X. and Wu, X. 2004. Class noise vs attribute noise: A quantitative study of their impacts. Artificial Intelligence Review, 22(3-4):177-210.
    • (2004) Artificial Intelligence Review , vol.22 , Issue.3-4 , pp. 177-210
    • Zhu, X.1    Wu, X.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.