-
1
-
-
84919389078
-
Challenges of big data analysis
-
Fan, J., Han, F., Liu, H., Challenges of big data analysis. Nat. Sci. Rev. 1:2 (2014), 293–314, 10.1093/nsr/nwt032.
-
(2014)
Nat. Sci. Rev.
, vol.1
, Issue.2
, pp. 293-314
-
-
Fan, J.1
Han, F.2
Liu, H.3
-
2
-
-
84903270767
-
Applying statistical thinking to ‘Big Data’ problems
-
Hoerl, R., Snee, R., De Veaux, R., Applying statistical thinking to ‘Big Data’ problems. Wiley Interdiscip. Rev.: Comput. Stat. 6:4 (2014), 222–232, 10.1002/wics.1306.
-
(2014)
Wiley Interdiscip. Rev.: Comput. Stat.
, vol.6
, Issue.4
, pp. 222-232
-
-
Hoerl, R.1
Snee, R.2
De Veaux, R.3
-
3
-
-
84885041315
-
On statistics, computation and scalability
-
Jordan, M., On statistics, computation and scalability. Bernoulli 19:4 (2013), 1378–1390, 10.3150/12-BEJSP17.
-
(2013)
Bernoulli
, vol.19
, Issue.4
, pp. 1378-1390
-
-
Jordan, M.1
-
4
-
-
77954709174
-
Data warehousing and analytics infrastructure at Facebook
-
Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R., Liu, H., Data warehousing and analytics infrastructure at Facebook. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, 2010, 1013–1020.
-
(2010)
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010
, pp. 1013-1020
-
-
Thusoo, A.1
Shao, Z.2
Anthony, S.3
Borthakur, D.4
Jain, N.5
Sen Sarma, J.6
Murthy, R.7
Liu, H.8
-
5
-
-
85029404715
-
Big data – Retour vers le futur 3. De statisticien à data scientist
-
arXiv:1403.3758 arXiv preprint
-
Besse, P., Garivier, A., Loubes, J., Big data – Retour vers le futur 3. De statisticien à data scientist. arXiv preprint arXiv:1403.3758, 2014.
-
(2014)
-
-
Besse, P.1
Garivier, A.2
Loubes, J.3
-
6
-
-
84926319635
-
Big data for modern industry: challenges and trends
-
Yin, S., Kaynak, O., Big data for modern industry: challenges and trends. Proceedings of the IEEE, vol. 103, 2015, 143–146.
-
(2015)
Proceedings of the IEEE
, vol.103
, pp. 143-146
-
-
Yin, S.1
Kaynak, O.2
-
7
-
-
84888087862
-
Scalable strategies for computing with massive data
-
Kane, M., Emerson, J., Weston, S., Scalable strategies for computing with massive data. J. Stat. Softw., 55, 2013 http://www.jstatsoft.org/v55/i14.
-
(2013)
J. Stat. Softw.
, vol.55
-
-
Kane, M.1
Emerson, J.2
Weston, S.3
-
8
-
-
84975753826
-
R: A Language and Environment for Statistical Computing
-
R Foundation for Statistical Computing Vienna, Austria
-
R Core Team. R: A Language and Environment for Statistical Computing. 2016, R Foundation for Statistical Computing, Vienna, Austria http://www.R-project.org.
-
(2016)
-
-
-
9
-
-
85029433230
-
Statistique et big data analytics. Volumétrie, l'attaque des clones
-
arXiv:1405.6676 arXiv preprint
-
Besse, P., Villa-Vialaneix, N., Statistique et big data analytics. Volumétrie, l'attaque des clones. arXiv preprint arXiv:1405.6676, 2014.
-
(2014)
-
-
Besse, P.1
Villa-Vialaneix, N.2
-
10
-
-
84979940943
-
A survey of statistical methods and computing for big data
-
arXiv:1502.07989 arXiv preprint
-
Wang, C., Chen, M., Schifano, E., Wu, J., Yan, J., A survey of statistical methods and computing for big data. arXiv preprint arXiv:1502.07989, 2015.
-
(2015)
-
-
Wang, C.1
Chen, M.2
Schifano, E.3
Wu, J.4
Yan, J.5
-
11
-
-
70350657266
-
Fast approximate spectral clustering
-
J. Elder F. Soulié-Fogelman P. Flach M. Zaki ACM New York, NY, USA
-
Yan, D., Huang, L., Jordan, M., Fast approximate spectral clustering. Elder, J., Soulié-Fogelman, F., Flach, P., Zaki, M., (eds.) Proceedings of the 15th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining, 2009, ACM, New York, NY, USA, 907–916, 10.1145/1557019.1557118.
-
(2009)
Proceedings of the 15th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining
, pp. 907-916
-
-
Yan, D.1
Huang, L.2
Jordan, M.3
-
12
-
-
84905910853
-
A scalable bootstrap for massive data
-
Kleiner, A., Talwalkar, A., Sarkar, P., Jordan, M., A scalable bootstrap for massive data. J. R. Stat. Soc., Ser. B, Stat. Methodol. 76:4 (2014), 795–816.
-
(2014)
J. R. Stat. Soc., Ser. B, Stat. Methodol.
, vol.76
, Issue.4
, pp. 795-816
-
-
Kleiner, A.1
Talwalkar, A.2
Sarkar, P.3
Jordan, M.4
-
13
-
-
84897512709
-
Scalable simple random sampling and stratified sampling
-
W&CP Georgia, USA
-
Meng, X., Scalable simple random sampling and stratified sampling. Proceedings of the 30th International Conference on Machine Learning, ICML 2013 JMLR, vol. 28, 2013, W&CP, Georgia, USA.
-
(2013)
Proceedings of the 30th International Conference on Machine Learning, ICML 2013, JMLR
, vol.28
-
-
Meng, X.1
-
14
-
-
0036036832
-
Approximate clustering via core-sets
-
J. Reif ACM New York, NY, USA
-
Bǎdoiu, M., Har-Peled, S., Indyk, P., Approximate clustering via core-sets. Reif, J., (ed.) Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002, ACM, New York, NY, USA, 250–257, 10.1145/509907.509947.
-
(2002)
Proceedings of the 34th Annual ACM Symposium on Theory of Computing
, pp. 250-257
-
-
Bǎdoiu, M.1
Har-Peled, S.2
Indyk, P.3
-
15
-
-
84873118945
-
Early accurate results for advanced analytics on MapReduce
-
Istanbul, Turkey
-
Laptev, N., Zeng, K., Zaniolo, C., Early accurate results for advanced analytics on MapReduce. Proceedings of the 38th International Conference on Very Large Data Bases, Istanbul, Turkey Proc. VLDB Endow., 5, 2012.
-
(2012)
Proceedings of the 38th International Conference on Very Large Data Bases, Proc. VLDB Endow.
, vol.5
-
-
Laptev, N.1
Zeng, K.2
Zaniolo, C.3
-
16
-
-
56049109090
-
Map-Reduce for machine learning on multicore
-
J. Lafferty C. Williams J. Shawe-Taylor R. Zemel A. Culotta Hyatt Regency, Vancouver, Canada
-
Chu, C., Kim, S., Lin, Y., Yu, Y., Bradski, G., Ng, A., Olukotun, K., Map-Reduce for machine learning on multicore. Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A., (eds.) Advances in Neural Information Processing Systems (NIPS 2010), Hyatt Regency, Vancouver, Canada, vol. 23, 2010, 281–288.
-
(2010)
Advances in Neural Information Processing Systems (NIPS 2010)
, vol.23
, pp. 281-288
-
-
Chu, C.1
Kim, S.2
Lin, Y.3
Yu, Y.4
Bradski, G.5
Ng, A.6
Olukotun, K.7
-
17
-
-
84946566592
-
A split-and-conquer approach for analysis of extraordinarily large data
-
Chen, X., Xie, M., A split-and-conquer approach for analysis of extraordinarily large data. Stat. Sin. 24 (2014), 1655–1684.
-
(2014)
Stat. Sin.
, vol.24
, pp. 1655-1684
-
-
Chen, X.1
Xie, M.2
-
18
-
-
84875499797
-
Computational and statistical tradeoffs via convex relaxation
-
Chandrasekaran, V., Jordan, M., Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. USA 110:13 (2013), E1181–E1190.
-
(2013)
Proc. Natl. Acad. Sci. USA
, vol.110
, Issue.13
, pp. E1181-E1190
-
-
Chandrasekaran, V.1
Jordan, M.2
-
19
-
-
33745777639
-
Incremental support vector learning: analysis, implementation and application
-
Laskov, P., Gehl, C., Krüger, S., Müller, K., Incremental support vector learning: analysis, implementation and application. J. Mach. Learn. Res. 7 (2006), 1909–1936.
-
(2006)
J. Mach. Learn. Res.
, vol.7
, pp. 1909-1936
-
-
Laskov, P.1
Gehl, C.2
Krüger, S.3
Müller, K.4
-
20
-
-
77953178544
-
On-line random forests
-
IEEE
-
Saffari, A., Leistner, C., Santner, J., Godec, M., Bischof, H., On-line random forests. Proceedings of IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009, IEEE, 1393–1400.
-
(2009)
Proceedings of IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops
, pp. 1393-1400
-
-
Saffari, A.1
Leistner, C.2
Santner, J.3
Godec, M.4
Bischof, H.5
-
21
-
-
0035478854
-
Random forests
-
Breiman, L., Random forests. Mach. Learn. 45:1 (2001), 5–32 http://www.springerlink.com/content/u0p06167n6173512/fulltext.pdf.
-
(2001)
Mach. Learn.
, vol.45
, Issue.1
, pp. 5-32
-
-
Breiman, L.1
-
22
-
-
84933565370
-
Consistency of random forests
-
Scornet, E., Biau, G., Vert, J., Consistency of random forests. Ann. Stat. 43:4 (2015), 1716–1741, 10.1214/15-AOS1321.
-
(2015)
Ann. Stat.
, vol.43
, Issue.4
, pp. 1716-1741
-
-
Scornet, E.1
Biau, G.2
Vert, J.3
-
23
-
-
77958064179
-
Mining data with random forests: a survey and results of new tests
-
Verikas, A., Gelzinis, A., Bacauskiene, M., Mining data with random forests: a survey and results of new tests. Pattern Recognit. 44:2 (2011), 330–349, 10.1016/j.patcog.2010.08.011.
-
(2011)
Pattern Recognit.
, vol.44
, Issue.2
, pp. 330-349
-
-
Verikas, A.1
Gelzinis, A.2
Bacauskiene, M.3
-
24
-
-
84890868650
-
Mining data with random forests: current options for real-world applications
-
Ziegler, A., König, I., Mining data with random forests: current options for real-world applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 4:1 (2014), 55–63, 10.1002/widm.1114.
-
(2014)
Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
, vol.4
, Issue.1
, pp. 55-63
-
-
Ziegler, A.1
König, I.2
-
25
-
-
33846516584
-
Pattern Recognition and Machine Learning
-
Springer-Verlag New York, NY, USA
-
Bishop, C., Pattern Recognition and Machine Learning. 2006, Springer-Verlag, New York, NY, USA.
-
(2006)
-
-
Bishop, C.1
-
26
-
-
0003684449
-
The Elements of Statistical Learning
-
2nd edition Springer-Verlag New York, NY, USA
-
Hastie, T., Tibshirani, R., Friedman, J., The Elements of Statistical Learning. 2nd edition, 2009, Springer-Verlag, New York, NY, USA.
-
(2009)
-
-
Hastie, T.1
Tibshirani, R.2
Friedman, J.3
-
27
-
-
0003802343
-
Classification and Regression Trees
-
Chapman and Hall New York, USA
-
Breiman, L., Friedman, J., Olshen, R., Stone, C., Classification and Regression Trees. 1984, Chapman and Hall, New York, USA.
-
(1984)
-
-
Breiman, L.1
Friedman, J.2
Olshen, R.3
Stone, C.4
-
28
-
-
77957922514
-
Variable selection using random forests
-
Genuer, R., Poggi, J., Tuleau-Malot, C., Variable selection using random forests. Pattern Recognit. Lett. 31:14 (2010), 2225–2236, 10.1016/j.patrec.2010.03.014.
-
(2010)
Pattern Recognit. Lett.
, vol.31
, Issue.14
, pp. 2225-2236
-
-
Genuer, R.1
Poggi, J.2
Tuleau-Malot, C.3
-
29
-
-
0642310183
-
Resampling fewer than n observations: gains, losses and remedies for losses
-
Bickel, P., Götze, F., van Zwet, W., Resampling fewer than n observations: gains, losses and remedies for losses. Stat. Sin. 7:1 (1997), 1–31.
-
(1997)
Stat. Sin.
, vol.7
, Issue.1
, pp. 1-31
-
-
Bickel, P.1
Götze, F.2
van Zwet, W.3
-
30
-
-
53349123556
-
On the choice of m in the m out of n bootstrap and confidence bounds for extrema
-
Bickel, P., Sakov, A., On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Stat. Sin. 18:3 (2008), 967–985 http://www3.stat.sinica.edu.tw/statistica/J18N3/J18N38/J18N38.html.
-
(2008)
Stat. Sin.
, vol.18
, Issue.3
, pp. 967-985
-
-
Bickel, P.1
Sakov, A.2
-
31
-
-
84906873734
-
On the use of MapReduce for imbalanced big data using random forest
-
del Rio, S., López, V., Benítez, J., Herrera, F., On the use of MapReduce for imbalanced big data using random forest. Inf. Sci. 285 (2014), 112–137, 10.1016/j.ins.2014.03.043.
-
(2014)
Inf. Sci.
, vol.285
, pp. 112-137
-
-
del Rio, S.1
López, V.2
Benítez, J.3
Herrera, F.4
-
32
-
-
0012992955
-
Online bagging and boosting
-
Morgan Kaufmann, Key West, Florida, USA
-
Oza, N., Russell, S., Online bagging and boosting. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001, Morgan Kaufmann, Key West, Florida, USA, 105–112.
-
(2001)
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics
, pp. 105-112
-
-
Oza, N.1
Russell, S.2
-
33
-
-
27944503134
-
Online Bayesian bagging
-
Lee, H., Clyde, M., Online Bayesian bagging. J. Mach. Learn. Res. 5 (2004), 143–151.
-
(2004)
J. Mach. Learn. Res.
, vol.5
, pp. 143-151
-
-
Lee, H.1
Clyde, M.2
-
34
-
-
33745697989
-
Creating non-parametric bootstrap samples using Poisson frequencies
-
Hanley, J., MacGibbon, B., Creating non-parametric bootstrap samples using Poisson frequencies. Comput. Methods Programs Biomed. 83 (2006), 57–62.
-
(2006)
Comput. Methods Programs Biomed.
, vol.83
, pp. 57-62
-
-
Hanley, J.1
MacGibbon, B.2
-
35
-
-
33646430006
-
Extremely randomized trees
-
Geurts, P., Ernst, D., Wehenkel, L., Extremely randomized trees. Mach. Learn. 63:1 (2006), 3–42, 10.1007/s10994-006-6226-1.
-
(2006)
Mach. Learn.
, vol.63
, Issue.1
, pp. 3-42
-
-
Geurts, P.1
Ernst, D.2
Wehenkel, L.3
-
36
-
-
84897502867
-
Consistency of online random forests
-
Denil, M., Matheson, D., de Freitas, N., Consistency of online random forests. Proceedings of the 30th International Conference on Machine Learning, ICML 2013, 2013, 1256–1264.
-
(2013)
Proceedings of the 30th International Conference on Machine Learning, ICML 2013
, pp. 1256-1264
-
-
Denil, M.1
Matheson, D.2
de Freitas, N.3
-
37
-
-
84971637693
-
readr: Read Tabular Data
-
R package version 0.2.2
-
Wickham, H., François, R., readr: Read Tabular Data. R package version 0.2.2 http://CRAN.R-project.org/package=readr, 2015.
-
(2015)
-
-
Wickham, H.1
François, R.2
-
38
-
-
0345040873
-
Classification and regression by randomForest
-
Liaw, A., Wiener, M., Classification and regression by randomForest. R News 2:3 (2002), 18–22 http://CRAN.R-project.org/doc/Rnews.
-
(2002)
R News
, vol.2
, Issue.3
, pp. 18-22
-
-
Liaw, A.1
Wiener, M.2
-
39
-
-
84890520049
-
Use of the zero norm with linear model and kernel methods
-
Weston, J., Elisseeff, A., Schoelkopf, B., Tipping, M., Use of the zero norm with linear model and kernel methods. J. Mach. Learn. Res. 3 (2003), 1439–1461.
-
(2003)
J. Mach. Learn. Res.
, vol.3
, pp. 1439-1461
-
-
Weston, J.1
Elisseeff, A.2
Schoelkopf, B.3
Tipping, M.4
-
40
-
-
84953731160
-
foreach: Foreach looping construct for R
-
R package version 1.4.2
-
Revolution Analytics, Weston, S., foreach: Foreach looping construct for R. R package version 1.4.2 http://CRAN.R-project.org/package=foreach, 2014.
-
(2014)
-
-
Revolution Analytics1
Weston, S.2
-
41
-
-
85029452867
-
An outlier detection-based tree selection approach to extreme pruning of random forests
-
arXiv:1503.05187 arXiv preprint
-
Fawagreh, K., Gaber, M., Elyan, E., An outlier detection-based tree selection approach to extreme pruning of random forests. arXiv preprint arXiv:1503.05187, 2015.
-
(2015)
-
-
Fawagreh, K.1
Gaber, M.2
Elyan, E.3
-
42
-
-
33749018252
-
An analysis of diversity measures
-
Tang, E., Suganthan, P., Yao, X., An analysis of diversity measures. Mach. Learn. 65 (2006), 247–271.
-
(2006)
Mach. Learn.
, vol.65
, pp. 247-271
-
-
Tang, E.1
Suganthan, P.2
Yao, X.3
-
43
-
-
84890157266
-
A weighted random forests approach to improve predictive performance
-
Winham, S.J., Freimuth, R., Biernacka, J., A weighted random forests approach to improve predictive performance. Stat. Anal. Data Min. ASA Data Sci. J. 6:6 (2013), 496–505.
-
(2013)
Stat. Anal. Data Min. ASA Data Sci. J.
, vol.6
, Issue.6
, pp. 496-505
-
-
Winham, S.J.1
Freimuth, R.2
Biernacka, J.3
-
44
-
-
0031211090
-
A decision-theoretic generalization of on-line learning and an application to boosting
-
Freund, Y., Schapire, R., A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55:1 (1997), 119–139.
-
(1997)
J. Comput. Syst. Sci.
, vol.55
, Issue.1
, pp. 119-139
-
-
Freund, Y.1
Schapire, R.2
-
45
-
-
33646403804
-
PERT: perfect random tree ensembles
-
Cutler, A., Zhao, G., PERT: perfect random tree ensembles. Comput. Sci. Stat. 33 (2001), 490–497.
-
(2001)
Comput. Sci. Stat.
, vol.33
, pp. 490-497
-
-
Cutler, A.1
Zhao, G.2
-
46
-
-
54249099241
-
Consistency of random forests and other averaging classifiers
-
Biau, G., Devroye, L., Lugosi, G., Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9 (2008), 2015–2033.
-
(2008)
J. Mach. Learn. Res.
, vol.9
, pp. 2015-2033
-
-
Biau, G.1
Devroye, L.2
Lugosi, G.3
-
47
-
-
84961993377
-
Analysis of purely random forests bias
-
arXiv:1407.3939 arXiv preprint
-
Arlot, S., Genuer, R., Analysis of purely random forests bias. arXiv preprint arXiv:1407.3939, 2014.
-
(2014)
-
-
Arlot, S.1
Genuer, R.2
-
48
-
-
37349116573
-
Data Stream Management: Processing High-Speed Data Streams, Data-Centric Systems and Applications
-
Springer-Verlag Berlin, Heidelberg
-
Garofalakis, M., Gehrke, J., Rastogi, R., Data Stream Management: Processing High-Speed Data Streams, Data-Centric Systems and Applications. 2016, Springer-Verlag, Berlin, Heidelberg.
-
(2016)
-
-
Garofalakis, M.1
Gehrke, J.2
Rastogi, R.3
-
49
-
-
10044238664
-
Mining frequent patterns in data streams at multiple time granularities
-
H. Kargupta A. Joshi K. Sivakumar Y. Yesha AAAI Press/The MIT Press Menlo Park, CA, USA
-
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P., Mining frequent patterns in data streams at multiple time granularities. Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y., (eds.) Data Mining: Next Generation Challenges and Future Directions (Proceedings of the NSF Workshop on Next Generation Data Mining), 2004, AAAI Press/The MIT Press, Menlo Park, CA, USA, 191–212.
-
(2004)
Data Mining: Next Generation Challenges and Future Directions (Proceedings of the NSF Workshop on Next Generation Data Mining)
, pp. 191-212
-
-
Giannella, C.1
Han, J.2
Pei, J.3
Yan, X.4
Yu, P.5
-
50
-
-
78649394566
-
Classification using streaming random forests
-
Abdulsalam, H., Skillicorn, D., Martin, P., Classification using streaming random forests. IEEE Trans. Knowl. Data Eng. 23:1 (2011), 22–36, 10.1109/TKDE.2010.36.
-
(2011)
IEEE Trans. Knowl. Data Eng.
, vol.23
, Issue.1
, pp. 22-36
-
-
Abdulsalam, H.1
Skillicorn, D.2
Martin, P.3