SCOPUS Information Search Platform

Proceedings of the ACM SIGMOD International Conference on Management of Data

2012, Pages 793-804

Large-scale machine learning at Twitter

(2) Lin, Jimmyᵃ; Kolcz, Alekᵃ

ᵃ TWITTER INC (United States)

Author keywords

ensembles; logistic regression; online learning; stochastic gradient descent

Indexed keywords

DATA SAMPLING; ENSEMBLE METHODS; ENSEMBLES; FEATURE GENERATION; LARGE AMOUNTS OF DATA; LOGISTIC REGRESSIONS; ONLINE LEARNING; PREDICTIVE ANALYTICS; PRODUCTION ENVIRONMENTS; SEAMLESS INTEGRATION; STOCHASTIC GRADIENT DESCENT; STORAGE FUNCTION; SUPERVISED CLASSIFICATION; USER DEFINED FUNCTIONS;

DATA WAREHOUSES; INFORMATION MANAGEMENT; LEARNING SYSTEMS; LOGISTICS; MAMMALS; SCHEDULING; SOCIAL NETWORKING (ONLINE); TIME VARYING NETWORKS;

E-LEARNING;

EID: 84862684679 PISSN: 07308078 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2213836.2213958 Document Type: Conference Paper

Times cited : (153)

References (49)

1
- 79957809015
- HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads
- A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. VLDB, 2009.
- (2009) VLDB
- Abouzeid, A.¹ Bajda-Pawlikowski, K.² Abadi, D.³ Silberschatz, A.⁴ Rasin, A.⁵

2
- 84862688930
- arXiv:1110.4198v1
- A. Agarwal, O. Chapelle, M. Dudik, and J. Langford. A reliable effective terascale linear learning system. arXiv:1110.4198v1, 2011.
- (2011) A Reliable Effective Terascale Linear Learning System
- Agarwal, A.¹ Chapelle, O.² Dudik, M.³ Langford, J.⁴

3
- 79959994432
- Efficient processing of data warehousing queries in a split execution environment
- K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and E. Paulson. Efficient processing of data warehousing queries in a split execution environment. SIGMOD, 2011.
- (2011) SIGMOD
- Bajda-Pawlikowski, K.¹ Abadi, D.² Silberschatz, A.³ Paulson, E.⁴

4
- 0345570094
- Scaling to very very large corpora for natural language disambiguation
- M. Banko and E. Brill. Scaling to very very large corpora for natural language disambiguation. ACL, 2001.
- (2001) ACL
- Banko, M.¹ Brill, E.²

5
- 80052691167
- High-precision phrase-based document classification on a modern scale
- R. Bekkerman and M. Gavish. High-precision phrase-based document classification on a modern scale. KDD, 2011.
- (2011) KDD
- Bekkerman, R.¹ Gavish, M.²

6
- 33846516584
- Springer-Verlag
- C. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
- (2006) Pattern Recognition and Machine Learning
- Bishop, C.¹

7
- 84904136037
- Large-scale machine learning with stochastic gradient descent
- L. Bottou. Large-scale machine learning with stochastic gradient descent. COMPSTAT, 2010.
- (2010) COMPSTAT
- Bottou, L.¹

8
- 80053375619
- Large language models in machine translation
- T. Brants, A. Popat, P. Xu, F. Och, and J. Dean. Large language models in machine translation. EMNLP, 2007.
- (2007) EMNLP
- Brants, T.¹ Popat, A.² Xu, P.³ Och, F.⁴ Dean, J.⁵

9
- 0030211964
- Bagging predictors
- L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
- (1996) Machine Learning , vol.24 , Issue.2 , pp. 123-140
- Breiman, L.¹

10
- 0346786584
- Arcing classifiers
- L. Breiman. Arcing classifiers. Annals of Statistics, 26(3):801-849, 1998.
- (1998) Annals of Statistics , vol.26 , Issue.3 , pp. 801-849
- Breiman, L.¹

11
- 0035478854
- Random forests
- L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
- (2001) Machine Learning , vol.45 , Issue.1 , pp. 5-32
- Breiman, L.¹

12
- 84862676370
- PSVM: Parallel Support Vector Machines with incomplete Cholesky factorization
- Cambridge University Press
- E. Chang, H. Bai, K. Zhu, H. Wang, J. Li, and Z. Qiu. PSVM: Parallel Support Vector Machines with incomplete Cholesky factorization. Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2012.
- (2012) Scaling Up Machine Learning: Parallel and Distributed Approaches
- Chang, E.¹ Bai, H.² Zhu, K.³ Wang, H.⁴ Li, J.⁵ Qiu, Z.⁶

13
- 85071319367
- Bigtable: A distributed storage system for structured data
- F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. OSDI, 2006.
- (2006) OSDI
- Chang, F.¹ Dean, J.² Ghemawat, S.³ Hsieh, W.⁴ Wallach, D.⁵ Burrows, M.⁶ Chandra, T.⁷ Fikes, A.⁸ Gruber, R.⁹

14
- 84868307166
- MAD skills: New analysis practices for big data
- J. Cohen, B. Dolan, M. Dunlap, J. Hellerstein, and C. Welton. MAD skills: New analysis practices for big data. VLDB, 2009.
- (2009) VLDB
- Cohen, J.¹ Dolan, B.² Dunlap, M.³ Hellerstein, J.⁴ Welton, C.⁵

15
- 85030321143
- MapReduce: Simplified data processing on large clusters
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
- (2004) OSDI
- Dean, J.¹ Ghemawat, S.²

16
- 84957855798
- Fast, easy, and cheap: Construction of statistical machine translation models with MapReduce
- C. Dyer, A. Cordova, A. Mont, and J. Lin. Fast, easy, and cheap: Construction of statistical machine translation models with MapReduce. StatMT Workshop, 2008.
- (2008) StatMT Workshop
- Dyer, C.¹ Cordova, A.² Mont, A.³ Lin, J.⁴

17
- 0002593344
- Multi-interval discretization of continuous-valued attributes for classification learning
- U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. IJCAI, 1993.
- (1993) IJCAI
- Fayyad, U.¹ Irani, K.²

18
- 84862679713
- O'Reilly
- A. Gates. Programming Pig. O'Reilly, 2011.
- (2011) Programming Pig
- Gates, A.¹

19
- 77952278077
- Building a high-level dataflow system on top of MapReduce: The Pig experience
- A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. VLDB, 2009.
- (2009) VLDB
- Gates, A.¹ Natkovich, O.² Chopra, S.³ Kamath, P.⁴ Narayanamurthy, S.⁵ Olston, C.⁶ Reed, B.⁷ Srinivasan, S.⁸ Srivastava, U.⁹

20
- 79957859069
- SystemML: Declarative machine learning on MapReduce
- A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, and S. Vaithyanathan. SystemML: Declarative machine learning on MapReduce. ICDE, 2011.
- (2011) ICDE
- Ghoting, A.¹ Krishnamurthy, R.² Pednault, E.³ Reinwald, B.⁴ Sindhwani, V.⁵ Tatikonda, S.⁶ Tian, Y.⁷ Vaithyanathan, S.⁸

21
- 70849126253
- The unreasonable effectiveness of data
- A. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2):8-12, 2009.
- (2009) IEEE Intelligent Systems , vol.24 , Issue.2 , pp. 8-12
- Halevy, A.¹ Norvig, P.² Pereira, F.³

22
- 85006719793
- Information platforms and the rise of the data scientist
- O'Reilly
- J. Hammerbacher. Information platforms and the rise of the data scientist. Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly, 2009.
- (2009) Beautiful Data: The Stories behind Elegant Data Solutions
- Hammerbacher, J.¹

23
- 0003684449
- Springer
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
- (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Hastie, T.¹ Tibshirani, R.² Friedman, J.³

24
- 34247882698
- Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
- T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM TOIS, 25(2):1-27, 2007.
- (2007) ACM TOIS , vol.25 , Issue.2 , pp. 1-27
- Joachims, T.¹ Granka, L.² Pan, B.³ Hembrooke, H.⁴ Radlinski, F.⁵ Gay, G.⁶

25
- 85128719106
- Twitter sentiment analysis: The good the bad and the OMG!
- E. Kouloumpis, T. Wilson, and J. Moore. Twitter sentiment analysis: The good the bad and the OMG! ICWSM, 2011.
- (2011) ICWSM
- Kouloumpis, E.¹ Wilson, T.² Moore, J.³

26
- 14644422971
- Wiley-Interscience
- L. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
- (2004) Combining Pattern Classifiers: Methods and Algorithms
- Kuncheva, L.¹

27
- 81855213402
- Morgan & Claypool
- H. Li. Learning to Rank for Information Retrieval and Natural Language Processing. Morgan & Claypool, 2011.
- (2011) Learning to Rank for Information Retrieval and Natural Language Processing
- Li, H.¹

28
- 77956239200
- Morgan & Claypool
- J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan & Claypool, 2010.
- (2010) Data-Intensive Text Processing with MapReduce
- Lin, J.¹ Dyer, C.²

29
- 79961034447
- Full-text indexing for optimizing selection operations in large-scale data analytics
- J. Lin, D. Ryaboy, and K. Weil. Full-text indexing for optimizing selection operations in large-scale data analytics. MAPREDUCE Workshop, 2011.
- (2011) MAPREDUCE Workshop
- Lin, J.¹ Ryaboy, D.² Weil, K.³

30
- 79959945877
- Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework
- Y. Lin, D. Agrawal, C. Chen, B. Ooi, and S. Wu. Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework. SIGMOD, 2011.
- (2011) SIGMOD
- Lin, Y.¹ Agrawal, D.² Chen, C.³ Ooi, B.⁴ Wu, S.⁵

31
- 84864232092
- Data infrastructure at LinkedIn
- LinkedIn
- LinkedIn. Data infrastructure at LinkedIn. ICDE, 2012.
- (2012) ICDE

32
- 80052652249
- Efficient large-scale distributed training of conditional maximum entropy models
- G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. NIPS, 2009.
- (2009) NIPS
- Mann, G.¹ McDonald, R.² Mohri, M.³ Silberman, N.⁴ Walker, D.⁵

33
- 80052650170
- Distributed training strategies for the structured perceptron
- R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. HLT, 2010.
- (2010) HLT
- McDonald, R.¹ Hall, K.² Mann, G.³

34
- 79957860992
- Distributed cube materialization on holistic measures
- A. Nandi, C. Yu, P. Bohannon, and R. Ramakrishnan. Distributed cube materialization on holistic measures. ICDE, 2011.
- (2011) ICDE
- Nandi, A.¹ Yu, C.² Bohannon, P.³ Ramakrishnan, R.⁴

35
- 52649085804
- Map-reduce for machine learning on multicore
- A. Ng, G. Bradski, C.-T. Chu, K. Olukotun, S. Kim, Y.-A. Lin, and Y. Yu. Map-reduce for machine learning on multicore. NIPS, 2006.
- (2006) NIPS
- Ng, A.¹ Bradski, G.² Chu, C.-T.³ Olukotun, K.⁴ Kim, S.⁵ Lin, Y.-A.⁶ Yu, Y.⁷

36
- 84890614558
- From Tweets to polls: Linking text sentiment to public opinion time series
- B. O'Connor, R. Balasubramanyan, B. Routledge, and N. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 2010.
- (2010) ICWSM
- O'Connor, B.¹ Balasubramanyan, R.² Routledge, B.³ Smith, N.⁴

37
- 55349148888
- Pig Latin: A not-so-foreign language for data processing
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD, 2008.
- (2008) SIGMOD
- Olston, C.¹ Reed, B.² Srivastava, U.³ Kumar, R.⁴ Tomkins, A.⁵

38
- 85028156346
- Twitter as a corpus for sentiment analysis and opinion mining
- A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. LREC, 2010.
- (2010) LREC
- Pak, A.¹ Paroubek, P.²

39
- 84862681315
- MapReduce and its application to massively parallel learning of decision tree ensembles
- Cambridge University Press
- B. Panda, J. Herbach, S. Basu, and R. Bayardo. MapReduce and its application to massively parallel learning of decision tree ensembles. Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2012.
- (2012) Scaling Up Machine Learning: Parallel and Distributed Approaches
- Panda, B.¹ Herbach, J.² Basu, S.³ Bayardo, R.⁴

40
- 48449095896
- Opinion mining and sentiment analysis
- B. Pang and L. Lee. Opinion mining and sentiment analysis. FnTIR, 2(1-2):1-135, 2008.
- (2008) FnTIR , vol.2 , Issue.1-2 , pp. 1-135
- Pang, B.¹ Lee, L.²

41
- 84862695896
- O'Reilly
- D. Patil. Building Data Science Teams. O'Reilly, 2011.
- (2011) Building Data Science Teams
- Patil, D.¹

42
- 80052672008
- Detecting adversarial advertisements in the wild
- D. Sculley, M. Otey, M. Pohl, B. Spitznagel, J. Hainsworth, and Y. Zhou. Detecting adversarial advertisements in the wild. KDD, 2011.
- (2011) KDD
- Sculley, D.¹ Otey, M.² Pohl, M.³ Spitznagel, B.⁴ Hainsworth, J.⁵ Zhou, Y.⁶

43
- 48849117633
- Pegasos: Primal estimated sub-gradient solver for SVM
- Y. Singer and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. ICML, 2007.
- (2007) ICML
- Singer, Y.¹ Srebro, N.²

44
- 80052119994
- An architecture for parallel topic models
- A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB, 2010.
- (2010) VLDB
- Smola, A.¹ Narayanamurthy, S.²

45
- 80053360508
- Cheap and fast - But is it good? Evaluating non-expert annotations for natural language tasks
- R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. EMNLP, 2008.
- (2008) EMNLP
- Snow, R.¹ O'Connor, B.² Jurafsky, D.³ Ng, A.⁴

46
- 84857171809
- Large-scale learning to rank using boosted decision trees
- Cambridge University Press
- K. Svore and C. Burges. Large-scale learning to rank using boosted decision trees. Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2012.
- (2012) Scaling Up Machine Learning: Parallel and Distributed Approaches
- Svore, K.¹ Burges, C.²

47
- 77952775707
- Hive - A petabyte scale data warehouse using Hadoop
- A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. ICDE, 2010.
- (2010) ICDE
- Thusoo, A.¹ Sarma, J.² Jain, N.³ Shao, Z.⁴ Chakka, P.⁵ Zhang, N.⁶ Anthony, S.⁷ Liu, H.⁸ Murthy, R.⁹

48
- 84862695898
- Machine learning in ScalOps, a higher order cloud computing language
- M. Weimer, T. Condie, and R. Ramakrishnan. Machine learning in ScalOps, a higher order cloud computing language. Big Learning Workshop, 2011.
- (2011) Big Learning Workshop
- Weimer, M.¹ Condie, T.² Ramakrishnan, R.³

49
- 82155187960
- Technical Report UCB/EECS-2011-82, Berkeley
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. Technical Report UCB/EECS-2011-82, Berkeley, 2011.
- (2011) Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing
- Zaharia, M.¹ Chowdhury, M.² Das, T.³ Dave, A.⁴ Ma, J.⁵ McCauley, M.⁶ Franklin, M.⁷ Shenker, S.⁸ Stoica, I.⁹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.