-
2
-
-
0036590708
-
The data locality of work stealing
-
Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. The data locality of work stealing. Theory of Computing Systems (TOCS), 35(3):321-347, 2002.
-
(2002)
Theory of Computing Systems (TOCS)
, vol.35
, Issue.3
, pp. 321-347
-
-
Acar, U.A.1
Blelloch, G.E.2
Blumofe, R.D.3
-
3
-
-
0031628001
-
Thread scheduling for multiprogrammed multiprocessors
-
ACM Press
-
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA '98, pages 119-129. ACM Press, 1998.
-
(1998)
SPAA '98
, pp. 119-129
-
-
Arora, N.S.1
Blumofe, R.D.2
Plaxton, C.G.3
-
4
-
-
0344584867
-
The natural work-stealing algorithm is stable
-
May
-
Petra Berenbrink, Tom Friedetzky, and Leslie Ann Goldberg. The natural work-stealing algorithm is stable. SIAM J. Comput., 32:1260-1279, May 2003.
-
(2003)
SIAM J. Comput.
, vol.32
, pp. 1260-1279
-
-
Berenbrink, P.1
Friedetzky, T.2
Goldberg, L.A.3
-
5
-
-
58449090994
-
Provably good multicore cache performance for divide-and-conquer algorithms
-
Guy E. Blelloch, Rezaul A. Chowdhury, Phillip B. Gibbons, Vijaya Ramachandran, Shimin Chen, and Michael Kozuch. Provably good multicore cache performance for divide-and-conquer algorithms. In Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms, pages 501-510, 2008.
-
(2008)
Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms
, pp. 501-510
-
-
Blelloch, G.E.1
Chowdhury, R.A.2
Gibbons, P.B.3
Ramachandran, V.4
Chen, S.5
Kozuch, M.6
-
6
-
-
84858427811
-
Internally deterministic parallel algorithms can be fast
-
NY, USA, ACM
-
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. Internally deterministic parallel algorithms can be fast. In PPoPP '12, pages 181-192, NY, USA, 2012. ACM.
-
(2012)
PPoPP '12
, pp. 181-192
-
-
Blelloch, G.E.1
Fineman, J.T.2
Gibbons, P.B.3
Shun, J.4
-
7
-
-
0029696091
-
A provable time and space efficient implementation of NESL
-
ACM
-
Guy E. Blelloch and John Greiner. A provable time and space efficient implementation of NESL. In ICFP '96, pages 213-225. ACM, 1996.
-
(1996)
ICFP '96
, pp. 213-225
-
-
Blelloch, G.E.1
Greiner, J.2
-
9
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: an efficient multithreaded runtime system. In PPoPP, pages 207-216, 1995.
-
(1995)
PPoPP
, pp. 207-216
-
-
Blumofe, R.D.1
Joerg, C.F.2
Kuszmaul, B.C.3
Leiserson, C.E.4
Randall, K.H.5
Zhou, Y.6
-
10
-
-
0000269759
-
Scheduling multithreaded computations by work stealing
-
September
-
Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM, 46:720-748, September 1999.
-
(1999)
J. ACM
, vol.46
, pp. 720-748
-
-
Blumofe, R.D.1
Leiserson, C.E.2
-
12
-
-
32144435090
-
Dynamic circular work-stealing deque
-
David Chase and Yossi Lev. Dynamic circular work-stealing deque. In SPAA '05, pages 21-28, 2005.
-
(2005)
SPAA '05
, pp. 21-28
-
-
Chase, D.1
Lev, Y.2
-
14
-
-
55849100059
-
Solving large, irregular graph problems using adaptive work-stealing
-
Guojing Cong, Sreedhar B. Kodali, Sriram Krishnamoorthy, Doug Lea, Vijay A. Saraswat, and Tong Wen. Solving large, irregular graph problems using adaptive work-stealing. In ICPP, pages 536-545, 2008.
-
(2008)
ICPP
, pp. 536-545
-
-
Cong, G.1
Kodali, S.B.2
Krishnamoorthy, S.3
Lea, D.4
Saraswat, V.A.5
Wen, T.6
-
15
-
-
0030786221
-
The effect of scheduling discipline on dynamic load sharing in heterogeneous distributed systems
-
0
-
Sivarama P. Dandamudi. The effect of scheduling discipline on dynamic load sharing in heterogeneous distributed systems. Modeling, Analysis, and Simulation of Computer Systems, International Symposium on, 0:17, 1997.
-
(1997)
Modeling, Analysis, and Simulation of Computer Systems, International Symposium on
, pp. 17
-
-
Dandamudi, S.P.1
-
16
-
-
34548771395
-
Dynamic load balancing of unbalanced computations using message passing
-
J. Dinan, S. Olivier, G. Sabin, J. Prins, P. Sadayappan, and C.-W. Tseng. Dynamic load balancing of unbalanced computations using message passing. In IPDPS '07. IEEE International, March 2007.
-
IPDPS '07. IEEE International, March 2007
-
-
Dinan, J.1
Olivier, S.2
Sabin, G.3
Prins, J.4
Sadayappan, P.5
Tseng, C.-W.6
-
17
-
-
74049140383
-
Scalable work stealing
-
ACM
-
James Dinan, D. Brian Larkins, P. Sadayappan, Sriram Krishnamoorthy, and Jarek Nieplocha. Scalable work stealing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 53:1-53:11. ACM, 2009.
-
(2009)
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
-
-
Dinan, J.1
Larkins, D.B.2
Sadayappan, P.3
Krishnamoorthy, S.4
Nieplocha, J.5
-
18
-
-
0022676728
-
A comparison of receiver-initiated and sender-initiated adaptive load sharing
-
DOI 10.1016/0166-5316(86)90008-8
-
Derek L. Eager, Edward D. Lazowska, and John Zahorjan. A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval., 6(1):53-68, 1986. (Pubitemid 16538292)
-
(1986)
Performance Evaluation
, vol.6
, Issue.1
, pp. 53-68
-
-
Eager, D.L.1
Lazowska, E.D.2
Zahorjan, J.3
-
19
-
-
85028891596
-
A message passing implementation of lazy task creation
-
Marc Feeley. A message passing implementation of lazy task creation. In Parallel Symbolic Computing, pages 94-107, 1992.
-
(1992)
Parallel Symbolic Computing
, pp. 94-107
-
-
Feeley, M.1
-
21
-
-
0027844215
-
Polling efficiently on stock hardware
-
NY, USA, ACM
-
Marc Feeley. Polling efficiently on stock hardware. In Proceedings of the conference on Functional programming languages and computer architecture, FPCA '93, pages 179-187, NY, USA, 1993. ACM.
-
(1993)
Proceedings of the Conference on Functional Programming Languages and Computer Architecture, FPCA '93
, pp. 179-187
-
-
Feeley, M.1
-
22
-
-
80054926287
-
Implicitly threaded parallelism in Manticore
-
Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. Implicitly threaded parallelism in Manticore. Journal of Functional Programming, 20(5-6):1-40, 2011.
-
(2011)
Journal of Functional Programming
, vol.20
, Issue.5-6
, pp. 1-40
-
-
Fluet, M.1
Rainey, M.2
Reppy, J.3
Shaw, A.4
-
23
-
-
0347507496
-
The implementation of the Cilk-5 multithreaded language
-
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pages 212-223, 1998.
-
(1998)
PLDI
, pp. 212-223
-
-
Frigo, M.1
Leiserson, C.E.2
Randall, K.H.3
-
24
-
-
84862601360
-
A performance model for X10 applications: What's going on under the hood?
-
NY, USA, ACM
-
David Grove, Olivier Tardieu, David Cunningham, Ben Herta, Igor Peshansky, and Vijay Saraswat. A performance model for X10 applications: what's going on under the hood? In Proceedings of the 2011 ACM SIGPLAN X10 Workshop, pages 1:1-1:8, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the 2011 ACM SIGPLAN X10 Workshop
-
-
Grove, D.1
Tardieu, O.2
Cunningham, D.3
Herta, B.4
Peshansky, I.5
Saraswat, V.6
-
26
-
-
32844466488
-
A dynamic-sized nonblocking work stealing deque
-
February
-
Danny Hendler, Yossi Lev, Mark Moir, and Nir Shavit. A dynamic-sized nonblocking work stealing deque. Distrib. Comput., 18:189-207, February 2006.
-
(2006)
Distrib. Comput.
, vol.18
, pp. 189-207
-
-
Hendler, D.1
Lev, Y.2
Moir, M.3
Shavit, N.4
-
27
-
-
0036954275
-
Non-blocking steal-half work queues
-
Danny Hendler and Nir Shavit. Non-blocking steal-half work queues. In PODC, pages 280-289, 2002.
-
(2002)
PODC
, pp. 280-289
-
-
Hendler, D.1
Shavit, N.2
-
28
-
-
0036954486
-
Work dealing
-
ACM
-
Danny Hendler and Nir Shavit. Work dealing. In SPAA '02, pages 164-172. ACM, 2002.
-
(2002)
SPAA '02
, pp. 164-172
-
-
Hendler, D.1
Shavit, N.2
-
29
-
-
67650093461
-
Backtracking-based load balancing
-
ACM
-
Tasuku Hiraishi, Masahiro Yasugi, Seiji Umatani, and Taiichi Yuasa. Backtracking-based load balancing. In PPoPP '09, pages 55-64. ACM, 2009.
-
(2009)
PPoPP '09
, pp. 55-64
-
-
Hiraishi, T.1
Yasugi, M.2
Umatani, S.3
Yuasa, T.4
-
30
-
-
84875186606
-
-
Intel. Cilk Plus. http://software.intel.com/en-us/articles/intel-cilk-plus/.
-
Cilk Plus
-
-
-
31
-
-
84875155436
-
-
Specifications at
-
Intel. Intel Xeon Processor X7550. Specifications at http://ark.intel.com/products/46498/Intel-Xeon-Processor-X7550-(18M-Cache-2-00-GHz-6-40-GTs-Intel-QPI).
-
Intel Xeon Processor X7550
-
-
-
32
-
-
78249264449
-
Regular, shape-polymorphic, parallel arrays in Haskell
-
Gabriele Keller, Manuel M.T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. Regular, shape-polymorphic, parallel arrays in Haskell. In ICFP '10, pages 261-272, 2010.
-
(2010)
ICFP '10
, pp. 261-272
-
-
Keller, G.1
Chakravarty, M.M.T.2
Leshchinskiy, R.3
Peyton Jones, S.4
Lippmeier, B.5
-
33
-
-
35348855586
-
Carbon: Architectural support for fine-grained parallelism on chip multiprocessors
-
June
-
Sanjeev Kumar, Christopher J. Hughes, and Anthony Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. SIGARCH Computer Architecture News, 35:162-173, June 2007.
-
(2007)
SIGARCH Computer Architecture News
, vol.35
, pp. 162-173
-
-
Kumar, S.1
Hughes, C.J.2
Nguyen, A.3
-
35
-
-
0024771473
-
Analysis of the effects of delays on load sharing
-
November
-
R. Mirchandaney, D. Towsley, and J.A. Stankovic. Analysis of the effects of delays on load sharing. Computers, IEEE Transactions on, 38(11):1513-1525, November 1989.
-
(1989)
Computers, IEEE Transactions on
, vol.38
, Issue.11
, pp. 1513-1525
-
-
Mirchandaney, R.1
Towsley, D.2
Stankovic, J.A.3
-
36
-
-
0031635830
-
Analyses of load stealing models based on differential equations
-
NY, USA, ACM
-
Michael Mitzenmacher. Analyses of load stealing models based on differential equations. In SPAA '98, pages 212-221, NY, USA, 1998. ACM.
-
(1998)
SPAA '98
, pp. 212-221
-
-
Mitzenmacher, M.1
-
37
-
-
84987792525
-
A simple load balancing scheme for task allocation in parallel machines
-
NY, USA, ACM
-
Larry Rudolph, Miriam Slivkin-Allalouf, and Eli Upfal. A simple load balancing scheme for task allocation in parallel machines. In SPAA '91, pages 237-245, NY, USA, 1991. ACM.
-
(1991)
SPAA '91
, pp. 237-245
-
-
Rudolph, L.1
Slivkin-Allalouf, M.2
Upfal, E.3
-
38
-
-
84875199468
-
-
Technical Report. Intel Corp.
-
Bratin Saha, Ali-Reza Adl-Tabatabai, Anwar Ghuloum, Mohan Rajagopalan, Richard L. Hudson, Leaf Petersen, Vijay Menon, Brian Murphy, Tatiana Shpeisman, Jesse Fang, Eric Sprangle, Anwar Rohillah, and Doug Carmean. Enabling scalability and performance in a large scale chip multiprocessor environment. Technical Report. Intel Corp., 2006.
-
(2006)
Enabling Scalability and Performance in a Large Scale Chip Multiprocessor Environment
-
-
Saha, B.1
Adl-Tabatabai, A.-R.2
Ghuloum, A.3
Rajagopalan, M.4
Hudson, R.L.5
Petersen, L.6
Menon, V.7
Murphy, B.8
Shpeisman, T.9
Fang, J.10
Sprangle, E.11
Rohillah, A.12
Carmean, D.13
-
39
-
-
77952259532
-
Flexible architectural support for fine-grain scheduling
-
NY, USA, ACM
-
Daniel Sanchez, Richard M. Yoo, and Christos Kozyrakis. Flexible architectural support for fine-grain scheduling. In ASPLOS '10, pages 311-322, NY, USA, 2010. ACM.
-
(2010)
ASPLOS '10
, pp. 311-322
-
-
Sanchez, D.1
Yoo, R.M.2
Kozyrakis, C.3
-
40
-
-
0036395865
-
Randomized receiver initiated load-balancing algorithms for tree-shaped computations
-
Peter Sanders. Randomized receiver initiated load-balancing algorithms for tree-shaped computations. Comput. J., 45(5):561-573, 2002.
-
(2002)
Comput. J.
, vol.45
, Issue.5
, pp. 561-573
-
-
Sanders, P.1
-
41
-
-
84875158398
-
Miser - A dynamically loadable memory allocator for multi-threaded applications
-
Barry Tannenbaum. Miser - a dynamically loadable memory allocator for multi-threaded applications. Intel Software Network, 2009.
-
(2009)
Intel Software Network
-
-
Tannenbaum, B.1
-
42
-
-
78650866403
-
A tighter analysis of work stealing
-
Algorithms and Computation - 21st International Symposium, ISAAC 2010, Springer
-
Marc Tchiboukdjian, Nicolas Gast, Denis Trystram, Jean-Louis Roch, and Julien Bernard. A tighter analysis of work stealing. In Algorithms and Computation - 21st International Symposium, ISAAC 2010, volume 6507 of LNCS, pages 291-302. Springer, 2010.
-
(2010)
LNCS
, vol.6507
, pp. 291-302
-
-
Tchiboukdjian, M.1
Gast, N.2
Trystram, D.3
Roch, J.-L.4
Bernard, J.5
-
44
-
-
0242276122
-
Pursuing laziness for efficient implementation of modern multithreaded languages
-
Seiji Umatani, Masahiro Yasugi, Tsuneyasu Komiya, and Taiichi Yuasa. Pursuing laziness for efficient implementation of modern multithreaded languages. In ISHPC, pages 174-188, 2003.
-
(2003)
ISHPC
, pp. 174-188
-
-
Umatani, S.1
Yasugi, M.2
Komiya, T.3
Yuasa, T.4