SCOPUS Information Retrieval Platform

ACM SIGPLAN Notices

Volume 46, Issue 8, 2011, Pages 103-112

ULCC: A user-level facility for optimizing shared cache performance on multicores

(3) Ding, Xiaoning ᵃ Wang, Kaibo ᵃ Zhang, Xiaodong ᵃ

ᵃ Ohio State University (United States)

Author keywords

Cache; Multicore; Scientific Computing

Indexed keywords

APPLICATION EXECUTION; APPLICATION PERFORMANCE; CACHE; CACHE CONTROL; CACHE OPTIMIZATION; CACHE POLLUTION; CACHE-CONSCIOUS; CRITICAL ISSUES; DATA SETS; EXECUTION TIME; MEMORY ACCESS; MULTI CORE; MULTI-CORE PROCESSOR; MULTI-CORES; MULTI-THREADED PROGRAMS; MULTIPLE-CASE STUDY; PERFORMANCE IMPROVEMENTS; SCIENTIFIC APPLICATIONS; SHARED CACHE; SHARED SPACES; SOFTWARE RUNTIME; USER LEVELS;

ALGORITHMS; BUFFER STORAGE; OPTIMIZATION; POLLUTION; PROGRAM PROCESSORS;

MULTICORE PROGRAMMING;

EID: 80053979318 PISSN: 15232867 EISSN: None Source Type: Journal
DOI: 10.1145/2038037.1941568 Document Type: Conference Paper

Times cited : (27)

References (35)

1
- 80053933239
- NAS parallel benchmarks in OpenMP
- NAS parallel benchmarks in OpenMP. URL http://phase.hpcc.jp/Omni/benchmarks/NPB/index.html.

2
- 0025536635
- LAPACK: A portable linear algebra library for high-performance computers
- E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney, J. Du Croz, S. Hammarling, J. Demmel, C. Bischof, and D. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. In SC '90, pages 2-11, 1990. (Pubitemid 21675291)
- (1990) Proc Supercomput 90 , pp. 2-11
- Anderson, E.¹ Bai, Z.² Dongarra, J.³ Greenbaum, A.⁴ McKenney, A.⁵ Du Croz, J.⁶ Hammarling, S.⁷ Demmel, J.⁸ Bischof, C.⁹ Sorensen, D.¹⁰

3
- 19044386208
- An Updated Set of Basic Linear Algebra Subprograms (BLAS)
- DOI 10.1145/567806.567807
- L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley. An updated set of basic linear algebra subprograms (blas). ACM Trans. Math. Softw., 28(2):135-151, 2002. (Pubitemid 135701673)
- (2002) ACM Transactions on Mathematical Software , vol.28 , Issue.2 , pp. 135-151
- Blackford, L.S.¹ Demmel, J.² Dongarra, J.³ Duff, I.⁴ Hammarling, S.⁵ Henry, G.⁶ Heroux, M.⁷ Kaufman, L.⁸ Lumsdaine, A.⁹ Petitet, A.¹⁰ Pozo, R.¹¹ Remington, K.¹² Whaley, R.C.¹³

4
- 35248852476
- Scheduling threads for constructive cache sharing on CMPs
- DOI 10.1145/1248377.1248396, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
- S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. In SPAA'07, pages 105-115, 2007. (Pubitemid 47568559)
- (2007) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 105-115
- Chen, S.¹ Gibbons, P.B.² Kozuch, M.³ Liaskovitis, V.⁴ Ailamaki, A.⁵ Blelloch, G.E.⁶ Falsafi, B.⁷ Fix, L.⁸ Hardavellas, N.⁹ Mowry, T.C.¹⁰ Wilkerson, C.¹¹

5
- 17244375796
- Cache-conscious structure layout
- T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In PLDI '99, pages 1-12, 1999. (Pubitemid 129686073)
- (1999) SIGPLAN Notices (ACM Special Interest Group on Programming Languages) , vol.34 , Issue.5 , pp. 1-12
- Chilimbi, T.M.¹ Hill, M.D.² Larus, J.R.³

6
- 80053954369
- HP Corp. Perfmon project
- HP Corp. Perfmon project. URL http://www.hpl.hp.com/research/linux/perfmon.

7
- 63549085110
- Analysis and approximation of optimal co-scheduling on chip multiprocessors
- Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT'08, pages 220-229, 2008.
- (2008) PACT'08 , pp. 220-229
- Jiang, Y.¹ Shen, X.² Chen, J.³ Tripathi, R.⁴

8
- 0033894726
- Dynamic data layouts for cache-conscious factorization of DFT
- D. Kang. Dynamic data layouts for cache-conscious factorization of DFT. In IPDPS '00, page 693, 2000.
- (2000) IPDPS , pp. 693
- Kang, D.¹

9
- 84976736383
- Page placement algorithms for large real-indexed caches
- R. E. Kessler and M. D. Hill. Page placement algorithms for large real-indexed caches. ACM Trans. Comput. Syst., 10(4), 1992.
- (1992) ACM Trans. Comput. Syst. , vol.10 , Issue.4
- Kessler, R.E.¹ Hill, M.D.²

10
- 10444238444
- Fair cache sharing and partitioning in a chip multiprocessor architecture
- S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT'04, pages 111-122, 2004.
- (2004) PACT'04 , pp. 111-122
- Kim, S.¹ Chandra, D.² Solihin, Y.³

11
- 47249103334
- Using OS observations to improve performance in multicore systems
- R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. Using OS observations to improve performance in multicore systems. IEEE Micro, 28(3):54-66, 2008.
- (2008) IEEE Micro , vol.28 , Issue.3 , pp. 54-66
- Knauerhase, R.¹ Brett, P.² Hohlt, B.³ Li, T.⁴ Hahn, S.⁵

12
- 0026137116
- The cache performance and optimizations of blocked algorithms
- M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In ASPLOS '91, pages 63-74, 1991.
- (1991) ASPLOS '91 , pp. 63-74
- Lam, M.D.¹ Rothberg, E.E.² Wolf, M.E.³

13
- 0030733703
- The influence of caches on the performance of sorting
- A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. In SODA '97, pages 370-379.
- (1997) SODA '97 , pp. 370-379
- LaMarca, A.¹ Ladner, R.E.²

14
- 77955032509
- MCC-DB: Minimizing cache conflicts in multi-core processors for databases
- R. Lee, X. Ding, F. Chen, Q. Lu, and X. Zhang. MCC-DB: Minimizing cache conflicts in multi-core processors for databases. In VLDB'09.
- (2009) VLDB'09
- Lee, R.¹ Ding, X.² Chen, F.³ Lu, Q.⁴ Zhang, X.⁵

15
- 57749186047
- Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
- Salt Lake City, UT
- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In HPCA '08, pages 367-378, Salt Lake City, UT, 2008.
- (2008) HPCA '08 , pp. 367-378
- Lin, J.¹ Lu, Q.² Ding, X.³ Zhang, Z.⁴ Zhang, X.⁵ Sadayappan, P.⁶

16
- 79952791256
- Enabling software multicore cache management with lightweight hardware support
- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Enabling software multicore cache management with lightweight hardware support. In SC'09, 2009.
- (2009) SC'09
- Lin, J.¹ Lu, Q.² Ding, X.³ Zhang, Z.⁴ Zhang, X.⁵ Sadayappan, P.⁶

17
- 2342468635
- Organizing the last line of defense before hitting the memory wall for CMPs
- C. Liu, A. Sivasubramaniam, and M. Kandemir. Organizing the last line of defense before hitting the memory wall for CMPs. In HPCA'04, pages 176-185, 2004.
- (2004) HPCA'04 , pp. 176-185
- Liu, C.¹ Sivasubramaniam, A.² Kandemir, M.³

18
- 70449652924
- Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning
- Q. Lu, J. Lin, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning. In PACT '09, pages 246-257, 2009.
- (2009) PACT '09 , pp. 246-257
- Lu, Q.¹ Lin, J.² Ding, X.³ Zhang, Z.⁴ Zhang, X.⁵ Sadayappan, P.⁶

19
- 77951854292
- S. K. Moore. Multicore is bad news for supercomputers. pages 213-226, 2008.
- (2008) Multicore is Bad News for Supercomputers. , pp. 213-226
- Moore, S.K.¹

20
- 0035177611
- Cache-friendly implementations of transitive closure
- Barcelona, Spain
- M. Penner and V. K. Prasanna. Cache-friendly implementations of transitive closure. In PACT '01, page 185, Barcelona, Spain, 2001.
- (2001) PACT '01 , pp. 185
- Penner, M.¹ Prasanna, V.K.²

21
- 34548042910
- Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
- DOI 10.1109/MICRO.2006.49, 4041865, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-39
- M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO'06, pages 423-432, 2006. (Pubitemid 351337015)
- (2006) Proceedings of the Annual International Symposium on Microarchitecture, MICRO , pp. 423-432
- Qureshi, M.K.¹ Patt, Y.N.²

22
- 0036038691
- Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
- A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In SIGMETRICS '02, pages 66-76.
- (2002) SIGMETRICS '02 , pp. 66-76
- Snavely, A.¹ Tullsen, D.M.² Voelker, G.³

23
- 66749168716
- Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
- L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer. In MICRO '08, pages 258-269, 2008.
- (2008) MICRO '08 , pp. 258-269
- Soares, L.¹ Tam, D.² Stumm, M.³

24
- 1642371317
- Dynamic partitioning of shared cache memory
- G. E. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. J. Supercomputing, 28(1), 2002.
- (2002) J. Supercomputing , vol.28 , Issue.1
- Suh, G.E.¹ Rudolph, L.² Devadas, S.³

25
- 57749176037
- Managing shared L2 caches on multicore systems in software
- D. Tam, R. Azimi, L. Soares, and M. Stumm. Managing shared L2 caches on multicore systems in software. In WIOSCA '07, 2007.
- (2007) WIOSCA '07
- Tam, D.¹ Azimi, R.² Soares, L.³ Stumm, M.⁴

26
- 34548030923
- Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors
- DOI 10.1145/1272996.1273004, Operating Systems Review - Proceedings of the 2007 EuroSys Conference
- D. Tam, R. Azimi, and M. Stumm. Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys'07, pages 47-58, 2007. (Pubitemid 47281574)
- (2007) Operating Systems Review (ACM) , pp. 47-58
- Tam, D.¹ Azimi, R.² Stumm, M.³

27
- 80054007458
- TOP500.Org
- TOP500.Org. URL http://www.top500.org/lists/2010/06.

28
- 84863652586
- A new approach to array redistribution: Strip mining redistribution
- A. Wakatani and M. Wolfe. A new approach to array redistribution: Strip mining redistribution. In PARLE '94, pages 323-335, 1994.
- (1994) PARLE '94 , pp. 323-335
- Wakatani, A.¹ Wolfe, M.²

29
- 0003278639
- Automatically tuned linear algebra software
- R. C. Whaley and J. Dongarra. Automatically tuned linear algebra software. In SC '98, 1998.
- (1998) SC '98
- Whaley, R.C.¹ Dongarra, J.²

30
- 0002433589
- Iteration space tiling for memory hierarchies
- Philadelphia, PA
- M. Wolfe. Iteration space tiling for memory hierarchies. In PP '89, pages 357-361, Philadelphia, PA, 1989.
- (1989) PP '89 , pp. 357-361
- Wolfe, M.¹

31
- 0024935630
- More iteration space tiling
- M. Wolfe. More iteration space tiling. In SC '89, pages 655-664, 1989. (Pubitemid 20665965)
- (1989) Proc Supercomput 89 , pp. 655-664
- Wolfe, M.¹

32
- 85000339078
- Improving memory performance of sorting algorithms
- L. Xiao, X. Zhang, and S. A. Kubricht. Improving memory performance of sorting algorithms. ACM J. Exp. Algorithmics, 5:2000, 2000.
- (2000) ACM J. Exp. Algorithmics , vol.5
- Xiao, L.¹ Zhang, X.² Kubricht, S.A.³

33
- 35248846531
- An experimental comparison of cache-oblivious and cache-conscious programs
- DOI 10.1145/1248377.1248394, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
- K. Yotov, T. Roeder, K. Pingali, J. Gunnels, and F. Gustavson. An experimental comparison of cache-oblivious and cache-conscious programs. In SPAA '07, pages 93-104, 2007. (Pubitemid 47568558)
- (2007) Annual ACM Symposium on Parallelism in Algorithms and Architectures , pp. 93-104
- Yotov, K.¹ Roeder, T.² Pingali, K.³ Gunnels, J.⁴ Gustavson, F.⁵

34
- 70349111334
- Towards practical page coloring-based multicore cache management
- X. Zhang, S. Dwarkadas, and K. Shen. Towards practical page coloring-based multicore cache management. In EuroSys'09, pages 89-102, 2009.
- (2009) EuroSys'09 , pp. 89-102
- Zhang, X.¹ Dwarkadas, S.² Shen, K.³

35
- 77952248898
- Addressing shared resource contention in multicore processors via scheduling
- S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In ASPLOS '10, pages 129-142, 2010.
- (2010) ASPLOS '10 , pp. 129-142
- Zhuravlev, S.¹ Blagodurov, S.² Fedorova, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.