SCOPUS Information Search Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume , Issue , 2009, Pages 505-516

Optimizing shared cache behavior of chip multiprocessors

(5) Kandemir, Mahmut (a); Muralidhara, Sai Prashanth (a); Narayanan, Sri Hari Krishna (a, c); Zhang, Yuanrui (a); Ozturk, Ozcan (b)

a PENNSYLVANIA STATE UNIVERSITY (United States)

b BILKENT UNIVERSITY (Turkey)

c ARGONNE NATIONAL LABORATORY (United States)

Author keywords

Algorithm; B.3.2 memory structures : design styles cache memories; D.3.4 programming languages : processors compilers; Design; Experimentation; Performance

Indexed keywords

D.3.4 [PROGRAMMING LANGUAGES]: PROCESSORS - COMPILERS; DESIGN STYLES; EXPERIMENTATION; MEMORY STRUCTURE; PERFORMANCE; PROGRAMMING LANGUAGE;

COMPUTER SOFTWARE; DESIGN; EXPERIMENTS; LINGUISTICS; MICROPROCESSOR CHIPS; MULTIPROCESSING SYSTEMS; OPTIMIZATION; PROGRAM COMPILERS;

CACHE MEMORY;

EID: 76749137634 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1669112.1669176 Document Type: Conference Paper

Times cited : (29)

References (57)

1
- 76749126201
- AMD Athlon 64 X2 Dual-Core processor for desktop. http://www.amd.com /usen/Processors/ProductInformation/0,,30-118-9485-13041,00.html

2
- 76749162689
- Data and computation transformations for multiprocessors
- J. M. Anderson et al. Data and computation transformations for multiprocessors. In Proc. POPL, 1995.
- (1995) Proc. POPL
- Anderson, J.M.¹

3
- 0029373981
- Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
- A. Agarwal et al. Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. In TPDS, 1995.
- (1995) TPDS
- Agarwal, A.¹

4
- 76749087837
- Precise automatable analytical modeling of the cache behavior of codes with indirections
- D. Andrade et al. Precise automatable analytical modeling of the cache behavior of codes with indirections. In TACO, 2007.
- (2007) TACO
- Andrade, D.¹

5
- 76749103691
- Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
- C.S. Ballapuram et al. Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors. In Proc. ASPLOS, 2008.
- (2008) Proc. ASPLOS
- Ballapuram, C.S.¹

6
- 63549135938
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
- M. Baskaran et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In Proc. PPoPP, 2008.
- (2008) Proc. PPoPP
- Baskaran, M.¹

7
- 21644472427
- Managing wire delay in large chip-multiprocessor caches
- D. Beckmann, D. Wood. Managing wire delay in large chip-multiprocessor caches. In Proc. MICRO, 2004.
- (2004) Proc. MICRO
- Beckmann, D.¹ Wood, D.²

8
- 63549095070
- The PARSEC benchmark suite: Characterization and architectural implications
- C. Bienia, S. Kumar, J. P. Singh and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In Proc. PACT, October 2008.
- (2008) Proc. PACT
- Bienia, C.¹ Kumar, S.² Singh, J.P.³ Li, K.⁴

9
- 76749086882
- Programming for parallelism and locality with hierarchically tiled arrays
- G. Bikshandi et al. Programming for parallelism and locality with hierarchically tiled arrays. In Proc. PPOPP, 2006.
- (2006) Proc. PPOPP
- Bikshandi, G.¹

10
- 57349145904
- Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
- U. Bondhugula et al. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Proc. CC, 2008.
- (2008) Proc. CC
- Bondhugula, U.¹

11
- 85009364061
- Compiler optimizations for improving data locality
- S. Carr et al. Compiler optimizations for improving data locality. In Proc. ASPLOS, 1994.
- (1994) Proc. ASPLOS
- Carr, S.¹

12
- 0035338106
- Code transformations for data transfer and storage exploration preprocessing in multimedia processors
- F. Catthoor et al. Code transformations for data transfer and storage exploration preprocessing in multimedia processors. In IEEE Design Test, 2001.
- (2001) IEEE Design Test
- Catthoor, F.¹

13
- 0034832018
- Exact analysis of the cache behavior of nested loops
- S. Chatterjee et al. Exact analysis of the cache behavior of nested loops. In SIGPLAN Not., 2001.
- (2001) SIGPLAN Not.
- Chatterjee, S.¹

14
- 76749152674
- Dynamic partitioning of shared cache memory
- J. Chang, G. Sohi. Dynamic partitioning of shared cache memory. In Proc. ICS, 2007.
- (2007) Proc. ICS
- Chang, J.¹ Sohi, G.²

15
- 0028499023
- Communication-free data allocation techniques for parallelizing compilers on multicomputers
- T. S. Chen, J. P. Sheu. Communication-free data allocation techniques for parallelizing compilers on multicomputers. In TPDS, 1994.
- (1994) TPDS
- Chen, T.S.¹ Sheu, J.P.²

16
- 35248852476
- Scheduling threads for constructive cache sharing on CMPs
- S. Chen et al. Scheduling threads for constructive cache sharing on CMPs. In Proc. ACM SPAA, June 2007.
- (2007) Proc. ACM SPAA
- Chen, S.¹

17
- 76749093491
- A TDI system and its application to approximation algorithms
- M. Cheng et al. A TDI system and its application to approximation algorithms. In Proc. FOCS, 1998.
- (1998) Proc. FOCS
- Cheng, M.¹

18
- 27544432313
- Optimizing replication, communication, and capacity allocation in CMPs
- Z. Chishti et al. Optimizing replication, communication, and capacity allocation in CMPs. In Proc. ISCA, 2005.
- (2005) Proc. ISCA
- Chishti, Z.¹

19
- 84886020769
- Compiler-directed data partitioning for multicluster processors
- M. L. Chu, S. A. Mahlke. Compiler-directed data partitioning for multicluster processors. In Proc. CGO, 2006.
- (2006) Proc. CGO
- Chu, M.L.¹ Mahlke, S.A.²

20
- 0003795618
- Unifying Data and control transformations for distributed shared memory machines
- M. Cierniak, W. Li. Unifying Data and control transformations for distributed shared memory machines. In Tech. Rep. U. Rochester, 1994.
- (1994) Tech. Rep. U. Rochester
- Cierniak, M.¹ Li, W.²

21
- 76749100063
- K. Cooper, L. Torczon. Engineering a compiler. 2008.

22
- 0004116989
- T. H. Cormen et al. Introduction to algorithms. 2001.
- (2001) Introduction to algorithms
- Cormen, T.H.¹

23
- 0003662159
- D. Culler et al. Parallel computer architecture: a hardware/software approach. 1999.
- (1999) Parallel computer architecture: A hardware/software approach
- Culler, D.¹

24
- 0026891897
- Partitioning and labeling of loops by unimodular transformations
- E. D'Hollander. Partitioning and labeling of loops by unimodular transformations. In TPDS, 1992.
- (1992) TPDS
- D'Hollander, E.¹

25
- 0030675463
- Cache miss equations: An analytical representation of cache misses
- S. Ghosh et al. Cache miss equations: An analytical representation of cache misses. In Proc. ICS, 1997.
- (1997) Proc. ICS
- Ghosh, S.¹

26
- 0030380793
- Maximizing multiprocessor performance with the SUIF compiler
- M. W. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. In Computer, 1996.
- (1996) Computer
- Hall, M.W.¹

27
- 84868170244
- http://www.intel.com/p/en US/products/server/processor/xeon7000?iid= servproc+body xeon7400subtitle

28
- 76749110184
- Intel quad-core Xeon. http://www.intel.com/quad-core/?cid=cim:ggl|xeon us clovertown|k7449|s

29
- 84868176335
- http://www.intel.com/idf/.

30
- 25844503119
- Introduction to the Cell Multiprocessor
- J. Kahle et al. Introduction to the Cell Multiprocessor. In IBM Journal of Research and Development, 2005.
- (2005) IBM Journal of Research and Development
- Kahle, J.¹

31
- 3042669130
- IBM Power5 chip: A dual-core multithreaded processor
- R. Kalla et al. IBM Power5 chip: a dual-core multithreaded processor. In IEEE Micro, 2004.
- (2004) IEEE Micro
- Kalla, R.¹

32
- 50249115185
- Data locality enhancement for CMPs
- M. Kandemir. Data locality enhancement for CMPs. In Proc. ICCAD, 2007.
- (2007) Proc. ICCAD
- Kandemir, M.¹

33
- 76749105972
- Cache-aware iteration space partitioning
- A. Kejariwal et al. Cache-aware iteration space partitioning. In Proc. PPoPP, 2008.
- (2008) Proc. PPoPP
- Kejariwal, A.¹

34
- 0346865818
- Data-centric transformations for locality enhancement
- I. Kodukula, K. Pingali. Data-centric transformations for locality enhancement. In IJPP, 2001.
- (2001) IJPP
- Kodukula, I.¹ Pingali, K.²

35
- 20344374162
- Niagara: A 32-way multithreaded SPARC processor
- P. Kongetira et al. Niagara: A 32-way multithreaded SPARC processor. In IEEE Micro, 2005.
- (2005) IEEE Micro
- Kongetira, P.¹

36
- 62349131952
- The cache performance of blocked algorithms
- M. Lam et al. The cache performance of blocked algorithms. In Proc. ASPLOS, 1991.
- (1991) Proc. ASPLOS
- Lam, M.¹

37
- 37549032725
- IBM POWER6 microarchitecture
- H. Q. Le et al. IBM POWER6 microarchitecture. In IBM Jrnl. of R&D, 2007.
- (2007) IBM Jrnl. of R&D
- Le, H.Q.¹

38
- 0003888396
- Compiling for NUMA parallel machines
- W. Li. Compiling for NUMA parallel machines. In Ph.D. Thesis, Cornell University, 1993.
- (1993) Ph.D. Thesis, Cornell University
- Li, W.¹

39
- 57349101237
- Data and computation transformations for Brook streaming applications on multiprocessors
- S. Liao et al. Data and computation transformations for Brook streaming applications on multiprocessors. In Proc. CGO, 2006.
- (2006) Proc. CGO
- Liao, S.¹

40
- 33748870886
- Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset
- M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. R. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, September 2005.
- (2005) Computer Architecture News
- Martin, M.¹ Sorin, D.² Beckmann, B.³ Marty, M.⁴ Xu, M.⁵ Alameldeen, A.R.⁶ Moore, K.⁷ Hill, M.⁸ Wood, D.⁹

41
- 33751424104
- Adaptive designs for power and thermal optimization
- R. McGowen. Adaptive designs for power and thermal optimization. In Proc. ICCAD, 2005.
- (2005) Proc. ICCAD
- McGowen, R.¹

42
- 34247326334
- Omega library. http://www.cs.umd.edu/projects/omega.
- Omega library

43
- 0028132512
- Counting solutions to Presburger formulas: How and why
- W. Pugh. Counting solutions to Presburger formulas: how and why. Proc. PLDI, 1994.
- (1994) Proc. PLDI
- Pugh, W.¹

44
- 84868190036
- Quad-core AMD Opteron. http://multicore.amd.com/us-en/quadcore/
- Opteron

45
- 34548042910
- Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
- M. K. Qureshi, Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. MICRO, 2006.
- (2006) Proc. MICRO
- Qureshi, M.K.¹ Patt, Y.N.²

46
- 64949109435
- Architectural support for operating system-driven CMP cache management
- N. Rafique et al. Architectural support for operating system-driven CMP cache management. In Proc. PACT, 2006.
- (2006) Proc. PACT
- Rafique, N.¹

47
- 43249127323
- Dynamically configurable shared CMP helper engines for improved performance
- A. Shayesteh et al. Dynamically configurable shared CMP helper engines for improved performance. In SIGARCH Comput. Archit., 2005.
- (2005) SIGARCH Comput. Archit
- Shayesteh, A.¹

48
- 0004233425
- A. Silberschatz et al. Operating system concepts. 2008.
- (2008) Operating system concepts
- Silberschatz, A.¹

49
- 84868170241
- SIMICS
- SIMICS. http://www.virtutech.com/simics/simics.html.

50
- 76749094940
- Using locality surfaces to characterize the SPECint 2000 benchmark suite
- E. Sorenson, J. K. Flanagan. Using locality surfaces to characterize the SPECint 2000 benchmark suite. In Workload Characterization of Emerging Computer Applications, 2001.
- (2001) Workload Characterization of Emerging Computer Applications
- Sorenson, E.¹ Flanagan, J.K.²

51
- 70350634177
- Adaptive set pinning: Managing shared caches in CMPs
- S. Srikantaiah et al. Adaptive set pinning: managing shared caches in CMPs. In Proc. ASPLOS, 2008.
- (2008) Proc. ASPLOS
- Srikantaiah, S.¹

52
- 1642371317
- Dynamic partitioning of shared cache memory
- G. E. Suh et al. Dynamic partitioning of shared cache memory. In Journal of Supercomputing, 2004.
- (2004) Journal of Supercomputing
- Suh, G.E.¹

53
- 76749158772
- Dataflow analysis driven dynamic data partitioning
- J. Tims et al. Dataflow analysis driven dynamic data partitioning. In Proc. of Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers, 1998.
- (1998) Proc. of Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers
- Tims, J.¹

54
- 1842635044
- A fast and accurate framework to analyze and optimize cache memory behavior
- X. Vera et al. A fast and accurate framework to analyze and optimize cache memory behavior. In TOPLAS, 2004.
- (2004) TOPLAS
- Vera, X.¹

55
- 85013942562
- A data locality optimizing algorithm
- M. Wolf, M. Lam. A data locality optimizing algorithm. In Proc. PLDI, 1991.
- (1991) Proc. PLDI
- Wolf, M.¹ Lam, M.²

56
- 0002375353
- The SPLASH-2 programs: Characterization and methodological considerations
- S. Woo et al. The SPLASH-2 programs: characterization and methodological considerations. In Proc. ISCA, 1995.
- (1995) Proc. ISCA
- Woo, S.¹

57
- 76749139374
- A hierarchical model of data locality
- C. Zhang et al. A hierarchical model of data locality. In Proc. POPL, 2006.
- (2006) Proc. POPL
- Zhang, C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.