SCOPUS Information Search Platform

Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Volume , Issue , 2015, Pages 9-16

A Parallel Distributed Weka Framework for Big Data Mining Using Spark

Authors (5): Koliopoulos, Aris Kyriakos (a); Yiapanis, Paraskevas (a); Tekiner, Firat (a); Nenadic, Goran (a); Keane, John (a)

(a) University of Manchester (United Kingdom)

Author keywords

Big Data; Data Mining; Distributed Systems; Machine Learning; Spark; Weka

Indexed keywords

ARTIFICIAL INTELLIGENCE; BIG DATA; ELECTRIC SPARKS; LEARNING SYSTEMS; USER INTERFACES;

DISTRIBUTED FRAMEWORK; DISTRIBUTED SYSTEMS; INTUITIVE INTERFACES; ITERATIVE COMPUTATION; KNOWLEDGE EXTRACTION; PROCESSING CAPABILITY; SEQUENTIAL EXECUTION; WEKA;

DATA MINING;

EID: 84953864174 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/BigDataCongress.2015.12 Document Type: Conference Paper

Times cited : (54)

References (29)

1
- 84885987846
- Stamford, CT:Gartner
- M. Beyer and D. Laney, "The importance of big data: A definition, " Stamford, CT:Gartner.
- The Importance of Big Data: A Definition
- Beyer, M.¹ Laney, D.²

2
- 84870452716
- "Apache Hadoop, " http://hadoop.apache.org/.
- Apache Hadoop

3
- 37549003336
- MapReduce: Simplified data processing on large clusters
- J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters, " Com ACM, pp. 107-113, 2008.
- (2008) Com ACM , pp. 107-113
- Dean, J.¹ Ghemawat, S.²

4
- 84900451613
- "Apache Spark, " https://spark.apache.org/.
- Apache Spark

5
- 84887447695
- The family of mapreduce and large-scale data processing systems
- S. Sakr, A. Liu, and A. G. Fayoumi, "The Family of Mapreduce and Large-scale Data Processing Systems, " ACM Comput. Surv., vol. 46, no. 1, pp. 11:1-11:44, 2013.
- (2013) ACM Comput. Surv , vol.46 , Issue.1 , pp. 11:1-11:44
- Sakr, S.¹ Liu, A.² Fayoumi, A.G.³

6
- 85026965461
- "SparkR, " http://amplab-extras.github.io/SparkR-pkg/.
- SparkR

7
- 79957859069
- SystemML: Declarative machine learning on mapreduce
- A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, and S. Vaithyanathan, "SystemML: Declarative Machine Learning on MapReduce, " in Interntl Conf. on Data Engineering, 2011, pp. 231-242.
- (2011) Interntl Conf. on Data Engineering , pp. 231-242
- Ghoting, A.¹ Krishnamurthy, R.² Pednault, E.³ Reinwald, B.⁴ Sindhwani, V.⁵ Tatikonda, S.⁶ Tian, Y.⁷ Vaithyanathan, S.⁸

8
- 76749092270
- The WEKA data mining software: An update
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update, " SIGKDD Explor. Newsl., pp. 10-18, 2009.
- (2009) SIGKDD Explor. Newsl , pp. 10-18
- Hall, M.¹ Frank, E.² Holmes, G.³ Pfahringer, B.⁴ Reutemann, P.⁵ Witten, I.H.⁶

9
- 80255141699
- accessed 2015-02-02
- R. A. Muenchen, "The Popularity of Data Analysis Software, " http://r4stats.com/articles/popularity//, accessed: 2015-02-02.
- The Popularity of Data Analysis Software
- Muenchen, R.A.¹

10
- 77951181190
- Toolkit-based high-performance data mining of large data on mapreduce clusters
- D. Wegener, M. Mock, D. Adranale, and S. Wrobel, "Toolkit-Based High-Performance Data Mining of Large Data on MapReduce Clusters, " in ICDM, 2009, pp. 296-301.
- (2009) ICDM , pp. 296-301
- Wegener, D.¹ Mock, M.² Adranale, D.³ Wrobel, S.⁴

11
- 85040175609
- Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, " in NSDI, 2012.
- (2012) NSDI
- Zaharia, M.¹ Chowdhury, M.² Das, T.³ Dave, A.⁴ Ma, J.⁵ McCauley, M.⁶ Franklin, M.J.⁷ Shenker, S.⁸ Stoica, I.⁹

12
- 0002433547
- From data mining to knowledge discovery: An overview
- U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview, " in Advances in KDDM, 1996, pp. 1-34.
- (1996) Advances in KDDM , pp. 1-34
- Fayyad, U.M.¹ Piatetsky-Shapiro, G.² Smyth, P.³

13
- 84991833843
- 3rd Edition
- I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, 2011.
- (2011) Data Mining: Practical Machine Learning Tools and Techniques
- Witten, I.H.¹ Frank, E.²

14
- 84953912432
- M. Hall, "Weka and Hadoop, " http://markahall.blogspot.co.uk/2013/10/weka-and-hadoop-part-1.html/.
- Weka and Hadoop
- Hall, M.¹

15
- 0030403087
- Parallel mining of association rules
- R. Agrawal and J. C. Shafer, "Parallel Mining of Association Rules, " Knowl. and Data Eng., pp. 962-969, 1996.
- (1996) Knowl. and Data Eng , pp. 962-969
- Agrawal, R.¹ Shafer, J.C.²

16
- 84858620646
- Disk-locality in datacenter computing considered irrelevant
- G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica, "Disk-locality in Datacenter Computing Considered Irrelevant, " in Conf. on Hot Topics in Operating Systems, 2011.
- (2011) Conf. on Hot Topics in Operating Systems
- Ananthanarayanan, G.¹ Ghodsi, A.² Shenker, S.³ Stoica, I.⁴

17
- 84893331360
- Scale-up vs scale-out for hadoop: Time to rethink?
- R. Appuswamy, C. Gkantsidis, D. Narayanan, O. Hodson, and A. Rowstron, "Scale-up vs Scale-out for Hadoop: Time to Rethink?" in Cloud Computing, 2013, pp. 20:1-20:13.
- (2013) Cloud Computing , pp. 20:1-20:13
- Appuswamy, R.¹ Gkantsidis, C.² Narayanan, D.³ Hodson, O.⁴ Rowstron, A.⁵

18
- 84904561738
- Chapman and Hall/CRC
- C. C. Aggarwal, Ed., Data Classification: Algorithms and Applications. Chapman and Hall/CRC, 2014.
- (2014) Data Classification: Algorithms and Applications
- Aggarwal, C.C.¹

19
- 84870749286
- "Apache Mahout, " http://mahout.apache.org/.
- Apache Mahout

20
- 84894647945
- MLI: An API for distributed machine learning
- E. R. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J. E. Gonzalez, M. J. Franklin, M. I. Jordan, and T. Kraska, "MLI: An API for distributed machine learning, " ICDM, 2013.
- (2013) ICDM
- Sparks, E.R.¹ Talwalkar, A.² Smith, V.³ Kottalam, J.⁴ Pan, X.⁵ Gonzalez, J.E.⁶ Franklin, M.J.⁷ Jordan, M.I.⁸ Kraska, T.⁹

21
- 84863169230
- Radoop: Analyzing big data with rapidminer and hadoop, 2011.
- (2011) Radoop: Analyzing Big Data with Rapidminer and Hadoop

22
- 77954751910
- Ricardo: Integrating R and Hadoop
- S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson, "Ricardo: Integrating R and Hadoop, " in Intl Conf. on Management of Data, 2010, pp. 987-998.
- (2010) Intl Conf. on Management of Data , pp. 987-998
- Das, S.¹ Sismanis, Y.² Beyer, K.S.³ Gemulla, R.⁴ Haas, P.J.⁵ McPherson, J.⁶

23
- 84959538188
- accessed 2015-03-03
- "RHIPE, " https://www.datadr.org/, accessed: 2015-03-03.
- RHIPE

24
- 84923930001
- RABID: A distributed parallel R for large datasets
- H. Lin, S. Yang, and S. Midkiff, "RABID: A distributed parallel R for large datasets, " in Congress on Big Data, 2014, pp. 725-732.
- (2014) Congress on Big Data , pp. 725-732
- Lin, H.¹ Yang, S.² Midkiff, S.³

25
- 85026967969
- "MLlib, " https://spark.apache.org/mllib/.
- MLlib

26
- 26944487959
- Adapting the weka data mining toolkit to a grid based environment
- M. Pérez, A. Sánchez, P. Herrero, V. Robles, and J. Peña, "Adapting the Weka Data Mining Toolkit to a Grid Based Environment, " Web Intelligence, pp. 492-497, 2005.
- (2005) Web Intelligence , pp. 492-497
- Pérez, M.¹ Sánchez, A.² Herrero, P.³ Robles, V.⁴ Peña, J.⁵

27
- 33646175301
- Weka-parallel: Machine learning in parallel
- S. Celis and D. Musicant, "Weka-parallel: machine learning in parallel, " Carleton College, Tech. Rep., 2002.
- (2002) Carleton College, Tech. Rep
- Celis, S.¹ Musicant, D.²

28
- 33646429469
- Weka4WS: A WSRF-enabled Weka toolkit for distributed data mining on grids
- D. Talia, P. Trunfio, and O. Verta, "Weka4WS: A WSRF-Enabled Weka Toolkit for Distributed Data Mining on Grids, " Knowledge Discovery in Databases, pp. 309-320, 2005.
- (2005) Knowledge Discovery in Databases , pp. 309-320
- Talia, D.¹ Trunfio, P.² Verta, O.³

29
- 0034592784
- Efficient clustering of high-dimensional data sets with application to reference matching
- A. McCallum, K. Nigam, and L. H. Ungar, "Efficient Clustering of High-dimensional Data Sets with Application to Reference Matching, " in KDD, 2000, pp. 169-178.
- (2000) KDD , pp. 169-178
- McCallum, A.¹ Nigam, K.² Ungar, L.H.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.