-
1
-
-
79957809015
-
HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads
-
A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. VLDB, 2009.
-
(2009)
VLDB
-
-
Abouzeid, A.1
Bajda-Pawlikowski, K.2
Abadi, D.3
Silberschatz, A.4
Rasin, A.5
-
4
-
-
0345570094
-
Scaling to very very large corpora for natural language disambiguation
-
M. Banko and E. Brill. Scaling to very very large corpora for natural language disambiguation. ACL, 2001.
-
(2001)
ACL
-
-
Banko, M.1
Brill, E.2
-
5
-
-
80052691167
-
High-precision phrase-based document classification on a modern scale
-
R. Bekkerman and M. Gavish. High-precision phrase-based document classification on a modern scale. KDD, 2011.
-
(2011)
KDD
-
-
Bekkerman, R.1
Gavish, M.2
-
7
-
-
84904136037
-
Large-scale machine learning with stochastic gradient descent
-
L. Bottou. Large-scale machine learning with stochastic gradient descent. COMPSTAT, 2010.
-
(2010)
COMPSTAT
-
-
Bottou, L.1
-
8
-
-
80053375619
-
Large language models in machine translation
-
T. Brants, A. Popat, P. Xu, F. Och, and J. Dean. Large language models in machine translation. EMNLP, 2007.
-
(2007)
EMNLP
-
-
Brants, T.1
Popat, A.2
Xu, P.3
Och, F.4
Dean, J.5
-
9
-
-
0030211964
-
Bagging predictors
-
L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
-
(1996)
Machine Learning
, vol.24
, Issue.2
, pp. 123-140
-
-
Breiman, L.1
-
10
-
-
0346786584
-
Arcing classifiers
-
L. Breiman. Arcing classifiers. Annals of Statistics, 26(3):801-849, 1998.
-
(1998)
Annals of Statistics
, vol.26
, Issue.3
, pp. 801-849
-
-
Breiman, L.1
-
11
-
-
0035478854
-
Random forests
-
L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
-
(2001)
Machine Learning
, vol.45
, Issue.1
, pp. 5-32
-
-
Breiman, L.1
-
12
-
-
84862676370
-
PSVM: Parallel Support Vector Machines with incomplete Cholesky factorization
-
Cambridge University Press
-
E. Chang, H. Bai, K. Zhu, H. Wang, J. Li, and Z. Qiu. PSVM: Parallel Support Vector Machines with incomplete Cholesky factorization. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2012.
-
(2012)
Scaling Up Machine Learning: Parallel and Distributed Approaches
-
-
Chang, E.1
Bai, H.2
Zhu, K.3
Wang, H.4
Li, J.5
Qiu, Z.6
-
13
-
-
85071319367
-
Bigtable: A distributed storage system for structured data
-
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. OSDI, 2006.
-
(2006)
OSDI
-
-
Chang, F.1
Dean, J.2
Ghemawat, S.3
Hsieh, W.4
Wallach, D.5
Burrows, M.6
Chandra, T.7
Fikes, A.8
Gruber, R.9
-
14
-
-
84868307166
-
MAD skills: New analysis practices for big data
-
J. Cohen, B. Dolan, M. Dunlap, J. Hellerstein, and C. Welton. MAD skills: New analysis practices for big data. VLDB, 2009.
-
(2009)
VLDB
-
-
Cohen, J.1
Dolan, B.2
Dunlap, M.3
Hellerstein, J.4
Welton, C.5
-
15
-
-
85030321143
-
MapReduce: Simplified data processing on large clusters
-
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
-
(2004)
OSDI
-
-
Dean, J.1
Ghemawat, S.2
-
16
-
-
84957855798
-
Fast, easy, and cheap: Construction of statistical machine translation models with MapReduce
-
C. Dyer, A. Cordova, A. Mont, and J. Lin. Fast, easy, and cheap: Construction of statistical machine translation models with MapReduce. StatMT Workshop, 2008.
-
(2008)
StatMT Workshop
-
-
Dyer, C.1
Cordova, A.2
Mont, A.3
Lin, J.4
-
17
-
-
0002593344
-
Multi-interval discretization of continuous-valued attributes for classification learning
-
U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. IJCAI, 1993.
-
(1993)
IJCAI
-
-
Fayyad, U.1
Irani, K.2
-
19
-
-
77952278077
-
Building a high-level dataflow system on top of MapReduce: The Pig experience
-
A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. VLDB, 2009.
-
(2009)
VLDB
-
-
Gates, A.1
Natkovich, O.2
Chopra, S.3
Kamath, P.4
Narayanamurthy, S.5
Olston, C.6
Reed, B.7
Srinivasan, S.8
Srivastava, U.9
-
20
-
-
79957859069
-
SystemML: Declarative machine learning on MapReduce
-
A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, and S. Vaithyanathan. SystemML: Declarative machine learning on MapReduce. ICDE, 2011.
-
(2011)
ICDE
-
-
Ghoting, A.1
Krishnamurthy, R.2
Pednault, E.3
Reinwald, B.4
Sindhwani, V.5
Tatikonda, S.6
Tian, Y.7
Vaithyanathan, S.8
-
21
-
-
70849126253
-
The unreasonable effectiveness of data
-
A. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2):8-12, 2009.
-
(2009)
IEEE Intelligent Systems
, vol.24
, Issue.2
, pp. 8-12
-
-
Halevy, A.1
Norvig, P.2
Pereira, F.3
-
23
-
-
0003684449
-
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
-
Springer
-
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
-
(2009)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
-
-
Hastie, T.1
Tibshirani, R.2
Friedman, J.3
-
24
-
-
34247882698
-
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
-
T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM TOIS, 25(2):1-27, 2007.
-
(2007)
ACM TOIS
, vol.25
, Issue.2
, pp. 1-27
-
-
Joachims, T.1
Granka, L.2
Pan, B.3
Hembrooke, H.4
Radlinski, F.5
Gay, G.6
-
25
-
-
85128719106
-
Twitter sentiment analysis: The good the bad and the OMG!
-
E. Kouloumpis, T. Wilson, and J. Moore. Twitter sentiment analysis: The good the bad and the OMG! ICWSM, 2011.
-
(2011)
ICWSM
-
-
Kouloumpis, E.1
Wilson, T.2
Moore, J.3
-
29
-
-
79961034447
-
Full-text indexing for optimizing selection operations in large-scale data analytics
-
J. Lin, D. Ryaboy, and K. Weil. Full-text indexing for optimizing selection operations in large-scale data analytics. MAPREDUCE Workshop, 2011.
-
(2011)
MAPREDUCE Workshop
-
-
Lin, J.1
Ryaboy, D.2
Weil, K.3
-
30
-
-
79959945877
-
Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework
-
Y. Lin, D. Agrawal, C. Chen, B. Ooi, and S. Wu. Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework. SIGMOD, 2011.
-
(2011)
SIGMOD
-
-
Lin, Y.1
Agrawal, D.2
Chen, C.3
Ooi, B.4
Wu, S.5
-
31
-
-
84864232092
-
Data infrastructure at LinkedIn
-
LinkedIn
-
LinkedIn. Data infrastructure at LinkedIn. ICDE, 2012.
-
(2012)
ICDE
-
-
-
32
-
-
80052652249
-
Efficient large-scale distributed training of conditional maximum entropy models
-
G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. NIPS, 2009.
-
(2009)
NIPS
-
-
Mann, G.1
McDonald, R.2
Mohri, M.3
Silberman, N.4
Walker, D.5
-
33
-
-
80052650170
-
Distributed training strategies for the structured perceptron
-
R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. HLT, 2010.
-
(2010)
HLT
-
-
McDonald, R.1
Hall, K.2
Mann, G.3
-
35
-
-
52649085804
-
Map-reduce for machine learning on multicore
-
A. Ng, G. Bradski, C.-T. Chu, K. Olukotun, S. Kim, Y.-A. Lin, and Y. Yu. Map-reduce for machine learning on multicore. NIPS, 2006.
-
(2006)
NIPS
-
-
Ng, A.1
Bradski, G.2
Chu, C.-T.3
Olukotun, K.4
Kim, S.5
Lin, Y.-A.6
Yu, Y.7
-
36
-
-
84890614558
-
From Tweets to polls: Linking text sentiment to public opinion time series
-
B. O'Connor, R. Balasubramanyan, B. Routledge, and N. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 2010.
-
(2010)
ICWSM
-
-
O'Connor, B.1
Balasubramanyan, R.2
Routledge, B.3
Smith, N.4
-
37
-
-
55349148888
-
Pig Latin: A not-so-foreign language for data processing
-
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD, 2008.
-
(2008)
SIGMOD
-
-
Olston, C.1
Reed, B.2
Srivastava, U.3
Kumar, R.4
Tomkins, A.5
-
38
-
-
85028156346
-
Twitter as a corpus for sentiment analysis and opinion mining
-
A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. LREC, 2010.
-
(2010)
LREC
-
-
Pak, A.1
Paroubek, P.2
-
40
-
-
48449095896
-
Opinion mining and sentiment analysis
-
B. Pang and L. Lee. Opinion mining and sentiment analysis. FnTIR, 2(1-2):1-135, 2008.
-
(2008)
FnTIR
, vol.2
, Issue.1-2
, pp. 1-135
-
-
Pang, B.1
Lee, L.2
-
42
-
-
80052672008
-
Detecting adversarial advertisements in the wild
-
D. Sculley, M. Otey, M. Pohl, B. Spitznagel, J. Hainsworth, and Y. Zhou. Detecting adversarial advertisements in the wild. KDD, 2011.
-
(2011)
KDD
-
-
Sculley, D.1
Otey, M.2
Pohl, M.3
Spitznagel, B.4
Hainsworth, J.5
Zhou, Y.6
-
43
-
-
48849117633
-
Pegasos: Primal estimated sub-gradient solver for SVM
-
S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. ICML, 2007.
-
(2007)
ICML
-
-
Shalev-Shwartz, S.1
Singer, Y.2
Srebro, N.3
-
44
-
-
80052119994
-
An architecture for parallel topic models
-
A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB, 2010.
-
(2010)
VLDB
-
-
Smola, A.1
Narayanamurthy, S.2
-
45
-
-
80053360508
-
Cheap and fast - But is it good? Evaluating non-expert annotations for natural language tasks
-
R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. EMNLP, 2008.
-
(2008)
EMNLP
-
-
Snow, R.1
O'Connor, B.2
Jurafsky, D.3
Ng, A.4
-
47
-
-
77952775707
-
Hive - A petabyte scale data warehouse using Hadoop
-
A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. ICDE, 2010.
-
(2010)
ICDE
-
-
Thusoo, A.1
Sarma, J.2
Jain, N.3
Shao, Z.4
Chakka, P.5
Zhang, N.6
Anthony, S.7
Liu, H.8
Murthy, R.9
-
49
-
-
82155187960
-
Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing
-
Technical Report UCB/EECS-2011-82, Berkeley
-
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. Technical Report UCB/EECS-2011-82, Berkeley, 2011.
-
(2011)
Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing
-
-
Zaharia, M.1
Chowdhury, M.2
Das, T.3
Dave, A.4
Ma, J.5
McCauley, M.6
Franklin, M.7
Shenker, S.8
Stoica, I.9