SCOPUS Information Retrieval Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 24, Issue 12, 2013, Pages 2334-2343

Adaptive cache aware bitier work-stealing in multisocket multicore architectures

Authors (3): Chen, Quan (a); Guo, Minyi (a); Huang, Zhiyi (b)

a SHANGHAI JIAO TONG UNIVERSITY (China)

b UNIVERSITY OF OTAGO (New Zealand)

Author keywords

Cache aware; Divide and conquer programs; Multisocket multicore architectures; Work stealing

Indexed keywords

CACHE AWARE; DIRECTED ACYCLIC GRAPH (DAG); DIVIDE AND CONQUER; MEMORY FOOTPRINT; MEMORY-BOUND APPLICATIONS; MULTICORE ARCHITECTURES; SHARED CACHE; WORK-STEALING; SCHEDULING; SOFTWARE ARCHITECTURE

EID: 84887962188 PISSN: 10459219 EISSN: None Source Type: Journal
DOI: 10.1109/TPDS.2012.322 Document Type: Article

Times cited : (17)

References (31)

1
- 0036590708
- The data locality of work stealing
- DOI 10.1007/s00224-002-1057-3
- U. Acar, G. Blelloch, and R. Blumofe, "The Data Locality of Work Stealing," Theory of Computing Systems, vol. 35, no. 3, pp. 321-347, 2002. (Pubitemid 36156138)
- (2002) Theory of Computing Systems , vol.35 , Issue.3 , pp. 321-347
- Acar, U.A.¹ Blelloch, G.E.² Blumofe, R.D.³

2
- 60449097203
- The design of OpenMP tasks
- E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin, F. Massaioli, X. Teruel, P. Unnikrishnan, and G. Zhang, "The Design of OpenMP Tasks," IEEE Trans. Parallel and Distributed Systems, vol. 20, no. 3, pp. 404-418, Mar. 2009.
- (2009) IEEE Trans. Parallel and Distributed Systems , vol.20 , Issue.3 , pp. 404-418
- Ayguadé, E.¹ Copty, N.² Duran, A.³ Hoeflinger, J.⁴ Lin, Y.⁵ Massaioli, F.⁶ Teruel, X.⁷ Unnikrishnan, P.⁸ Zhang, G.⁹

3
- 32844456410
- Online performance analysis by statistical sampling of microprocessor performance counters
- DOI 10.1145/1088149.1088163, ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
- R. Azimi, M. Stumm, and R. Wisniewski, "Online Performance Analysis by Statistical Sampling of Microprocessor Performance Counters," Proc. 19th Ann. Int'l Conf. Supercomputing, pp. 101-110, 2005. (Pubitemid 43251314)
- (2005) Proceedings of the International Conference on Supercomputing , pp. 101-110
- Azimi, R.¹ Stumm, M.² Wisniewski, R.W.³

4
- 48749141209
- Adaptive mesh refinement for hyperbolic partial differential equations
- M. Berger and J. Oliger, "Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations," J. Computational Physics, vol. 53, no. 3, pp. 484-512, 1984.
- (1984) J. Computational Physics , vol.53 , Issue.3 , pp. 484-512
- Berger, M.¹ Oliger, J.²

5
- 58449090994
- Provably good multicore cache performance for divide-and-conquer algorithms
- G. Blelloch, R. Chowdhury, P. Gibbons, V. Ramachandran, S. Chen, and M. Kozuch, "Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms," Proc. 19th Ann. ACM-SIAM Symp. Discrete Algorithms, pp. 501-510, 2008.
- (2008) Proc. 19th Ann. ACM-SIAM Symp. Discrete Algorithms , pp. 501-510
- Blelloch, G.¹ Chowdhury, R.² Gibbons, P.³ Ramachandran, V.⁴ Chen, S.⁵ Kozuch, M.⁶

6
- 79959676391
- Scheduling irregular parallel computations on hierarchical caches
- G. Blelloch, J. Fineman, P. Gibbons, and H.V. Simhadri, "Scheduling Irregular Parallel Computations on Hierarchical Caches," Proc. 20th ACM Symp. Parallel Algorithms and Architectures, June 2011.
- (2011) Proc. 20th ACM Symp. Parallel Algorithms and Architectures
- Blelloch, G.¹ Fineman, J.² Gibbons, P.³ Simhadri, H.V.⁴

7
- 77954942935
- Low depth cache-oblivious algorithms
- G. Blelloch, P. Gibbons, and H. Simhadri, "Low Depth Cache-Oblivious Algorithms," Proc. 22nd ACM Symp. Parallelism in Algorithms and Architectures, pp. 189-199, 2010.
- (2010) Proc. 22nd ACM Symp. Parallelism in Algorithms and Architectures , pp. 189-199
- Blelloch, G.¹ Gibbons, P.² Simhadri, H.³

8
- 0003459808
- PhD thesis, Dept. of Electrical Eng. and Computer Science, Massachusetts Inst. of Technology, Technical Report MIT/LCS/TR-677, MIT Laboratory for Computer Science, Sept
- R.D. Blumofe, "Executing Multithreaded Programs Efficiently," PhD thesis, Dept. of Electrical Eng. and Computer Science, Massachusetts Inst. of Technology, Technical Report MIT/LCS/TR-677, MIT Laboratory for Computer Science, Sept. 1995.
- (1995) Executing Multithreaded Programs Efficiently
- Blumofe, R.D.¹

9
- 0030601279
- Cilk: An efficient multithreaded runtime system
- DOI 10.1006/jpdc.1996.0107
- R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, and Y. Zhou, "Cilk: An Efficient Multithreaded Runtime System," J. Parallel and Distributed Computing, vol. 37, no. 1, pp. 55-69, Aug. 1996. (Pubitemid 126167766)
- (1996) Journal of Parallel and Distributed Computing , vol.37 , Issue.1 , pp. 55-69
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

10
- 0004224686
- Addison-Wesley Longman Publishing Co., Inc
- D. Butenhof, Programming with POSIX Threads. Addison-Wesley Longman Publishing Co., Inc., 1997.
- (1997) Programming with POSIX Threads.
- Butenhof, D.¹

11
- 32144435090
- Dynamic circular work-stealing deque
- DOI 10.1145/1073970.1073974, SPAA 2005 - Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures
- D. Chase and Y. Lev, "Dynamic Circular Work-Stealing Deque," Proc. 17th Ann. ACM Symp. Parallelism Algorithms and Architectures, pp. 21-28, 2005. (Pubitemid 43206548)
- (2005) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 21-28
- Chase, D.¹ Lev, Y.²

12
- 84864066885
- Cats: Cache aware task-stealing based on online profiling in multi-socket multi-core architectures
- Q. Chen, M. Guo, and Z. Huang, "Cats: Cache Aware Task-Stealing Based on Online Profiling in Multi-Socket Multi-Core Architectures," Proc. 26th Int'l Conf. Supercomputing, pp. 163-172, 2012.
- (2012) Proc. 26th Int'l Conf. Supercomputing , pp. 163-172
- Chen, Q.¹ Guo, M.² Huang, Z.³

13
- 80155183145
- CAB: CachE-Aware bi-tier task-stealing in multi-socket multi-core architecture
- Q. Chen, Z. Huang, M. Guo, and J. Zhou, "CAB: CachE-Aware Bi-Tier Task-Stealing In Multi-Socket Multi-Core Architecture," Proc. 40th Int'l Conf. Parallel Processing, pp. 722-732, 2011.
- (2011) Proc. 40th Int'l Conf. Parallel Processing , pp. 722-732
- Chen, Q.¹ Huang, Z.² Guo, M.³ Zhou, J.⁴

14
- 35248852476
- Scheduling threads for constructive cache sharing on CMPs
- DOI 10.1145/1248377.1248396, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
- S. Chen et al., "Scheduling Threads for Constructive Cache Sharing on CMPs," Proc. 19th Ann. ACM Symp. Parallel Algorithms and Architectures, pp. 105-115, 2007. (Pubitemid 47568559)
- (2007) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 105-115
- Chen, S.¹ Gibbons, P.B.² Kozuch, M.³ Liaskovitis, V.⁴ Ailamaki, A.⁵ Blelloch, G.E.⁶ Falsafi, B.⁷ Fix, L.⁸ Hardavellas, N.⁹ Mowry, T.C.¹⁰ Wilkerson, C.¹¹

15
- 84864050289
- Analysis of randomized work stealing with false sharing
- R. Cole and V. Ramachandran, "Analysis of Randomized Work Stealing with False Sharing," ArXiv e-prints, Mar. 2011.
- (2011) ArXiv E-prints
- Cole, R.¹ Ramachandran, V.²

16
- 79952789017
- ULCC: A user-level facility for optimizing shared cache performance on multicores
- X. Ding, K. Wang, and X. Zhang, "ULCC: A User-Level Facility for Optimizing Shared Cache Performance on Multicores," Proc. ACM SIGPLAN Symp. Principles and Practice Parallel Programming, pp. 103-112, 2011.
- (2011) Proc. ACM SIGPLAN Symp. Principles and Practice Parallel Programming , pp. 103-112
- Ding, X.¹ Wang, K.² Zhang, X.³

17
- 44049113422
- A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors
- A. Gerasoulis and T. Yang, "A Comparison of Clustering Heuristics for Scheduling Directed Acyclic Graphs on Multiprocessors," J. Parallel and Distributed Computing, vol. 16, no. 4, pp. 276-291, 1992.
- (1992) J. Parallel and Distributed Computing , vol.16 , Issue.4 , pp. 276-291
- Gerasoulis, A.¹ Yang, T.²

18
- 0003417929
- MIT Press
- W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface. MIT Press, 1999.
- (1999) Using MPI: Portable Parallel Programming with the Message Passing Interface.
- Gropp, W.¹ Lusk, E.² Skjellum, A.³

19
- 70450029262
- Work-first and help-first scheduling policies for async-finish task parallelism
- Y. Guo, R. Barik, R. Raman, and V. Sarkar, "Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism," Proc. IEEE 23rd Int'l Parallel and Distributed Processing Symp., pp. 1-12, 2009.
- (2009) Proc. IEEE 23rd Int'l Parallel and Distributed Processing Symp. , pp. 1-12
- Guo, Y.¹ Barik, R.² Raman, R.³ Sarkar, V.⁴

20
- 77953967811
- Slaw: A scalable locality-aware adaptive work-stealing scheduler
- Y. Guo, J. Zhao, V. Cave, and V. Sarkar, "Slaw: A Scalable Locality-Aware Adaptive Work-Stealing Scheduler," Proc. IEEE 24th Int'l Parallel and Distributed Processing Symp., pp. 1-12, 2010.
- (2010) Proc. IEEE 24th Int'l Parallel and Distributed Processing Symp. , pp. 1-12
- Guo, Y.¹ Zhao, J.² Cave, V.³ Sarkar, V.⁴

21
- 32844470883
- A dynamic-sized nonblocking work stealing deque
- Sun Microsystems, Inc
- D. Hendler, Y. Lev, M. Moir, and N. Shavit, "A Dynamic-Sized Nonblocking Work Stealing Deque," Technical Report TR-2005-144, Sun Microsystems, Inc., p. 69, 2005.
- (2005) Technical Report TR-2005-144 , pp. 69
- Hendler, D.¹ Lev, Y.² Moir, M.³ Shavit, N.⁴

22
- 0036954275
- Non-blocking steal-half work queues
- D. Hendler and N. Shavit, "Non-Blocking Steal-Half Work Queues," Proc. 21st Ann. Symp. Principles Distributed Computing, pp. 280-289, 2002.
- (2002) Proc. 21st Ann. Symp. Principles Distributed Computing , pp. 280-289
- Hendler, D.¹ Shavit, N.²

23
- 0034593391
- A Java fork/join framework
- D. Lea, "A Java Fork/Join Framework," Proc. ACM Conf. Java Grande, pp. 36-43, 2000.
- (2000) Proc. ACM Conf. Java Grande , pp. 36-43
- Lea, D.¹

24
- 77957593108
- Featherweight X10: A core calculus for async-finish parallelism
- J. Lee and J. Palsberg, "Featherweight X10: A Core Calculus for Async-Finish Parallelism," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Computing, pp. 25-36, 2010.
- (2010) Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Computing , pp. 25-36
- Lee, J.¹ Palsberg, J.²

25
- 72249096886
- The design of a task parallel library
- D. Leijen, W. Schulte, and S. Burckhardt, "The Design of a Task Parallel Library," ACM SIGPLAN Notices, vol. 44, no. 10, pp. 227-242, 2009.
- (2009) ACM SIGPLAN Notices , vol.44 , Issue.10 , pp. 227-242
- Leijen, D.¹ Schulte, W.² Burckhardt, S.³

26
- 70350733812
- The Cilk++ concurrency platform
- C. Leiserson, "The Cilk++ Concurrency Platform," Proc. 46th Ann. Design Automation Conf., pp. 522-527, 2009.
- (2009) Proc. 46th Ann. Design Automation Conf. , pp. 522-527
- Leiserson, C.¹

27
- 67650093463
- Idempotent work stealing
- M.M. Michael, M.T. Vechev, and V.A. Saraswat, "Idempotent Work Stealing," Proc. 14th ACM SIGPLAN Symp. Principles and Practice Parallel Programming, pp. 45-54, 2009.
- (2009) Proc. 14th ACM SIGPLAN Symp. Principles and Practice Parallel Programming , pp. 45-54
- Michael, M.M.¹ Vechev, M.T.² Saraswat, V.A.³

28
- 78349246727
- Hierarchical work-stealing
- J.-N. Quintin and F. Wagner, "Hierarchical Work-Stealing," Proc. 16th Int'l Euro-Par Conf. Parallel Processing: Part I, pp. 217-229, 2010.
- (2010) Proc. 16th Int'l Euro-Par Conf. Parallel Processing: Part I , pp. 217-229
- Quintin, J.-N.¹ Wagner, F.²

29
- 43149087461
- O'Reilly
- J. Reinders, Intel Threading Building Blocks. O'Reilly, 2007.
- (2007) Intel Threading Building Blocks.
- Reinders, J.¹

30
- 77953990150
- An adaptive task creation strategy for work-stealing scheduling
- L. Wang, H. Cui, Y. Duan, F. Lu, X. Feng, and P. Yew, "An Adaptive Task Creation Strategy for Work-Stealing Scheduling," Proc. IEEE/ACM Eighth Ann. Int'l Symp. Code Generation and Optimization, pp. 266-277, 2010.
- (2010) Proc. IEEE/ACM Eighth Ann. Int'l Symp. Code Generation and Optimization , pp. 266-277
- Wang, L.¹ Cui, H.² Duan, Y.³ Lu, F.⁴ Feng, X.⁵ Yew, P.⁶

31
- 55849143328
- Maotai: View-oriented parallel programming on CMT processors
- J. Zhang, Z. Huang, W. Chen, Q. Huang, and W. Zheng, "Maotai: View-Oriented Parallel Programming on CMT Processors," Proc. 37th Int'l Conf. Parallel Processing, pp. 636-643, 2008.
- (2008) Proc. 37th Int'l Conf. Parallel Processing , pp. 636-643
- Zhang, J.¹ Huang, Z.² Chen, W.³ Huang, Q.⁴ Zheng, W.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.