Volume 25, Issue 5, 2008, Pages 1-54

Text mining infrastructure in R

Author keywords

Count based evaluation; R; String kernels; Text classification; Text clustering; Text mining

EID: 46749110035     PISSN: 15487660     EISSN: 15487660     Source Type: Journal    
DOI: 10.18637/jss.v025.i05     Document Type: Article
Times cited: 975

References (88)
  • 1
    • 33750170787
    • Adeva JJG, Calvo R (2006). "Mining Text with Pimiento." IEEE Internet Computing, 10(4), 27-35. ISSN 1089-7801. doi:10.1109/MIC.2006.85.
  • 4
    • 46749140131
    • Bates D, Maechler M (2007). Matrix: A Matrix Package for R. R package version 0.999375-2, URL http://CRAN.R-project.org/package=Matrix.
  • 8
    • 84867919822
    • Brill E (1995). "Transformation-based Error-driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging." Computational Linguistics, 21(4), 543-565.
  • 9
    • 0004342289
    • Boley D (1998). "Hierarchical Taxonomies Using Divisive Partitioning." Technical Report 98-012, University of Minnesota.
  • 17
    • 32444442667
    • Dhillon I, Guan Y, Kulis B (2005). "A Unified View of Kernel k-Means, Spectral Clustering and Graph Partitioning." Technical report, University of Texas at Austin.
  • 18
    • 32144458563
    • Dong QW, Wang XL, Lin L (2006). "Application of Latent Semantic Analysis to Protein Remote Homology Detection." Bioinformatics, 22(3), 285-290. ISSN 1367-4803. doi:10.1093/bioinformatics/bti801.
  • 19
    • 46749127238
    • Feinerer I (2007a). openNLP: OpenNLP Interface. R package version 0.1, URL http://CRAN.R-project.org/package=openNLP.
  • 20
    • 46749144027
    • Feinerer I (2007b). tm: Text Mining Package. R package version 0.3, URL http://CRAN.R-project.org/package=tm.
  • 21
    • 46749116252
    • Feinerer I (2007c). wordnet: WordNet Interface. R package version 0.1, URL http://CRAN.R-project.org/package=wordnet.
  • 22
    • 46749116670
    • Feinerer I, Hornik K (2007). "Text Mining of Supreme Administrative Court Jurisdictions." In C Preisach, H Burkhardt, L Schmidt-Thieme, R Decker (eds.), "Data Analysis, Machine Learning, and Applications (Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., March 7-9, 2007, Freiburg, Germany)," Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag.
  • 23
    • 46749094048
    • Feinerer I, Wild F (2007). "Automated Coding of Qualitative Interviews with Latent Semantic Analysis." In H Mayr, D Karagiannis (eds.), "Proceedings of the 6th International Conference on Information Systems Technology and its Applications, May 23-25, 2007, Kharkiv, Ukraine," volume 107 of Lecture Notes in Informatics, pp. 66-77. Gesellschaft für Informatik e.V., Bonn, Germany.
  • 24
    • 0004289791
    • Fellbaum C (ed.) (1998). WordNet: An Electronic Lexical Database. Bradford Books. ISBN 0-262-06197-X.
  • 28
    • 13544267765
    • Girón J, Ginebra J, Riba A (2005). "Bayesian Analysis of a Multinomial Sequence and Homogeneity of Literary Style." The American Statistician, 59(1), 19-30.
  • 31
    • 0001138328
    • Hartigan JA, Wong MA (1979). "Algorithm AS 136: A K-means Clustering Algorithm (AS R39: 81V30 P355-356)." Applied Statistics, 28, 100-108.
  • 35
    • 10344250352
    • Holmes D, Kardos J (2003). "Who was the Author? An Introduction to Stylometry." Chance, 16(2), 5-8.
  • 36
    • 27544491443
    • Hornik K (2005). "A CLUE for CLUster Ensembles." Journal of Statistical Software, 14(12). URL http://www.jstatsoft.org/v14/i12/.
  • 37
    • 46749087704
    • Hornik K (2007a). clue: Cluster Ensembles. R package version 0.3-17, URL http://CRAN.R-project.org/package=clue.
  • 38
    • 46749094900
    • Hornik K (2007b). Snowball: Snowball Stemmers. R package version 0.0-1.
  • 39
    • 46749091531
    • Hornik K, Zeileis A, Hothorn T, Buchta C (2007). RWeka: An R Interface to Weka. R package version 0.3-9, URL http://CRAN.R-project.org/package=RWeka.
  • 41
    • 0038167128
    • Joachims T (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston.
  • 42
    • 0014129195
    • Johnson S (1967). "Hierarchical Clustering Schemes." Psychometrika, 32, 241-254.
  • 43
    • 84879562945
    • Karatzoglou A, Feinerer I (2007). "Text Clustering with String Kernels in R." In R Decker, HJ Lenz (eds.), "Advances in Data Analysis (Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., Freie Universität Berlin, March 8-10, 2006)," Studies in Classification, Data Analysis, and Knowledge Organization, pp. 91-98. Springer-Verlag.
  • 44
    • 11244352554
    • Karatzoglou A, Smola A, Hornik K, Zeileis A (2004). "kernlab - An S4 Package for Kernel Methods in R." Journal of Statistical Software, 11(9). URL http://www.jstatsoft.org/v11/i09/.
  • 45
    • 46749155718
    • Karatzoglou A, Smola A, Hornik K, Zeileis A (2006). kernlab: Kernel Methods Lab. R package version 0.8-1, URL http://CRAN.R-project.org/package=kernlab.
  • 46
    • 80053431219
    • Landauer T, Foltz P, Laham D (1998). "An Introduction to Latent Semantic Analysis." Discourse Processes, 25, 259-284.
  • 48
  • 49
    • 46749140761
    • Y, Shawe-Taylor J (2007). "Using KCCA for Japanese-English Cross-Language Information Retrieval and Classification." Journal of Intelligent Information Systems. URL http://eprints.ees.soton.ac.uk/10786/.
  • 51
    • 0001457509
    • MacQueen J (1967). "Some Methods for Classification and Analysis of Multivariate Observations." In "Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability," volume 1, pp. 281-297. University of California Press, Berkeley.
  • 52
    • 5444246382
    • Manola F, Miller E (2004). RDF Primer. World Wide Web Consortium. URL http://www.w3.org/TR/rdf-primer/.
  • 54
    • 46749147840
    • Meyer D, Buchta C (2007). proxy: Distance and Similarity Measures. R package version 0.2, URL http://CRAN.R-project.org/package=proxy.
  • 57
    • 34249852033
    • Mitchell M, Santorini B, Marcinkiewicz MA (1993). "Building a Large Annotated Corpus of English: The Penn Treebank." Computational Linguistics, 19(2), 313-330. URL ftp://ftp.cis.upenn.edu/pub/treebank/doc/c193.ps.gz.
  • 58
    • 46749110943
    • Mueller JP (2006). "ttda: Tools for Textual Data Analysis." R package version 0.1.1, URL http://wwwpeople.unil.ch/jean-pierre.mueller/.
  • 60
    • 84899013108
    • Ng A, Jordan M, Weiss Y (2002). "On Spectral Clustering: Analysis and an Algorithm." In T Dietterich, S Becker, Z Ghahramani (eds.), "Advances in Neural Information Processing Systems," volume 14.
  • 61
    • 10344248121
    • Nilo J, Binongo G (2003). "Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution." Chance, 16(2), 9-17.
  • 62
    • 46749154907
    • Peng RD (2006). "Interacting with Data Using the Filehash Package." R News, 6(4), 19-24. URL http://CRAN.R-project.org/doc/Rnews/.
  • 63
    • 46749112102
    • Piatetsky-Shapiro G (2005). "Poll on Text Mining Tools Used in 2004." Checked on 2006-09-17, URL http://www.kdnuggets.com/polls/2005/text_mining_tools.htm.
  • 64
    • 0000582788
    • Porter M (1997). "An Algorithm for Suffix Stripping." Readings in Information Retrieval, pp. 313-316. Reprint.
  • 65
    • 46749145598
    • R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
  • 67
    • 24944532498
    • Sakurai S, Suyama A (2005). "An E-mail Analysis Method Based on Text Mining Techniques." Applied Soft Computing, 6(1), 62-71. doi:10.1016/j.asoc.2004.10.007.
  • 69
    • 0002442796
    • Sebastiani F (2002). "Machine Learning in Automated Text Categorization." ACM Computing Surveys, 34(1), 1-47. ISSN 0360-0300. doi:10.1145/505282.505283.
  • 73
    • 2442439674
    • Steinbach M, Karypis G, Kumar V (2000). "A Comparison of Document Clustering Techniques." In "KDD Workshop on Text Mining," URL http://www.cs.cmu.edu/~dunja/PapersWshKDD2000.html.
  • 74
    • 46749128384
    • Strehl A, Ghosh J, Mooney RJ (2000). "Impact of Similarity Measures on Web-page Clustering." In "Proc. AAAI Workshop on AI for Web Search (AAAI 2000), Austin," pp. 58-64. AAAI/MIT Press. ISBN 1-57735-116-9. URL http://strehl.com/download/strehl-aaai00.pdf.
  • 75
    • 46749104342
    • Temple Lang D (2004). Rstem: Interface to Snowball Implementation of Porter's Word Stemming Algorithm. R package version 0.2-0, URL http://www.omegahat.org/Rstem/.
  • 76
    • 46749131347
    • Temple Lang D (2006). XML: Tools for Parsing and Generating XML within R and S-Plus. R package version 0.99-8, URL http://CRAN.R-project.org/package=XML.
  • 78
    • 0003895851
    • Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Springer-Verlag, New York, fourth edition. ISBN 0-387-95457-0, URL http://www.stats.ox.ac.uk/pub/MASS4/.
  • 79
    • 33749236901
    • Vishwanathan S, Smola A (2004). "Fast Kernels for String and Tree Matching." In K Tsuda, B Schölkopf, J Vert (eds.), "Kernels and Bioinformatics," MIT Press, Cambridge, MA.
  • 80
    • 0002531715
    • Watkins C (2000). "Dynamic Alignment Kernels." In A Smola, P Bartlett, B Schölkopf, D Schuurmans (eds.), "Advances in Large Margin Classifiers," pp. 39-50. MIT Press, Cambridge, MA.
  • 82
    • 46749122912
    • Wild F (2005). lsa: Latent Semantic Analysis. R package version 0.57, URL http://CRAN.R-project.org/package=lsa.
  • 85
  • 86
    • 3543085722
    • Zhao Y, Karypis G (2004). "Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering." Machine Learning, 55(3), 311-331. ISSN 0885-6125. doi:10.1023/B:MACH.0000027785.44527.d6.
  • 87
    • 24044537630
    • Zhao Y, Karypis G (2005a). "Hierarchical Clustering Algorithms for Document Datasets." Data Mining and Knowledge Discovery, 10(2), 141-168.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.