



Volume , Issue , 2009, Pages 505-516

Optimizing shared cache behavior of chip multiprocessors

Author keywords

Algorithm; B.3.2 memory structures: design styles cache memories; D.3.4 programming languages: processors compilers; Design; Experimentation; Performance

Indexed keywords

D.3.4 [PROGRAMMING LANGUAGES]: PROCESSORS - COMPILERS; DESIGN STYLES; EXPERIMENTATION; MEMORY STRUCTURE; PERFORMANCE; PROGRAMMING LANGUAGE

EID: 76749137634     PISSN: 10724451     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1669112.1669176     Document Type: Conference Paper
Times cited : (29)

References (57)
  • 1
    • 76749126201
    • AMD Athlon 64 X2 Dual-Core processor for desktop. http://www.amd.com /usen/Processors/ProductInformation/0,,30-118-9485-13041,00.html
  • 2
    • 76749162689
    • Data and computation transformations for multiprocessors
    • J. M. Anderson et al. Data and computation transformations for multiprocessors. In Proc. POPL, 1995.
    • (1995) Proc. POPL
    • Anderson, J.M.1
  • 3
    • 0029373981
    • Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
    • A. Agarwal et al. Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. In TPDS, 1995.
    • (1995) TPDS
    • Agarwal, A.1
  • 4
    • 76749087837
    • Precise automatable analytical modeling of the cache behavior of codes with indirections
    • D. Andrade et al. Precise automatable analytical modeling of the cache behavior of codes with indirections. In TACO, 2007.
    • (2007) TACO
    • Andrade, D.1
  • 5
    • 76749103691
    • Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
    • C.S. Ballapuram et al. Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors. In Proc. ASPLOS, 2008.
    • (2008) Proc. ASPLOS
    • Ballapuram, C.S.1
  • 6
    • 63549135938
    • Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
    • M. Baskaran et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In Proc. PPoPP, 2008.
    • (2008) Proc. PPoPP
    • Baskaran, M.1
  • 7
    • 21644472427
    • Managing wire delay in large chip-multiprocessor caches
    • D. Beckmann, D. Wood. Managing wire delay in large chip-multiprocessor caches. In Proc. MICRO, 2004.
    • (2004) Proc. MICRO
    • Beckmann, D.1    Wood, D.2
  • 8
    • 63549095070
    • The PARSEC benchmark suite: Characterization and architectural implications
    • October
    • C. Bienia, S. Kumar, J. P. Singh and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In Proc. PACT, October 2008.
    • (2008) Proc. PACT
    • Bienia, C.1    Kumar, S.2    Singh, J.P.3    Li, K.4
  • 9
    • 76749086882
    • Programming for parallelism and locality with hierarchically tiled arrays
    • G. Bikshandi et al. Programming for parallelism and locality with hierarchically tiled arrays. In Proc. PPOPP, 2006.
    • (2006) Proc. PPOPP
    • Bikshandi, G.1
  • 10
    • 57349145904
    • Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
    • U. Bondhugula et al. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Proc. CC, 2008.
    • (2008) Proc. CC
    • Bondhugula, U.1
  • 11
    • 85009364061
    • Compiler optimizations for improving data locality
    • S. Carr et al. Compiler optimizations for improving data locality. In Proc. ASPLOS, 1994.
    • (1994) Proc. ASPLOS
    • Carr, S.1
  • 12
    • 0035338106
    • Code transformations for data transfer and storage exploration preprocessing in multimedia processors
    • F. Catthoor et al. Code transformations for data transfer and storage exploration preprocessing in multimedia processors. In IEEE Design Test, 2001.
    • (2001) IEEE Design Test
    • Catthoor, F.1
  • 13
    • 0034832018
    • Exact analysis of the cache behavior of nested loops
    • S. Chatterjee et al. Exact analysis of the cache behavior of nested loops. In SIGPLAN Not., 2001.
    • (2001) SIGPLAN Not.
    • Chatterjee, S.1
  • 14
    • 76749152674
    • Dynamic partitioning of shared cache memory
    • J. Chang, G. Sohi. Dynamic partitioning of shared cache memory. In Proc. ICS, 2007.
    • (2007) Proc. ICS
    • Chang, J.1    Sohi, G.2
  • 15
    • 0028499023
    • Communication-free data allocation techniques for parallelizing compilers on multicomputers
    • T. S. Chen, J. P. Sheu. Communication-free data allocation techniques for parallelizing compilers on multicomputers. In TPDS, 1994.
    • (1994) TPDS
    • Chen, T.S.1    Sheu, J.P.2
  • 16
    • 35248852476
    • Scheduling threads for constructive cache sharing on CMPs
    • June
    • S. Chen et al. Scheduling threads for constructive cache sharing on CMPs. In Proc. ACM SPAA, June 2007.
    • (2007) Proc. ACM SPAA
    • Chen, S.1
  • 17
    • 76749093491
    • A TDI system and its application to approximation algorithms
    • M. Cheng et al. A TDI system and its application to approximation algorithms. In Proc. FOCS, 1998.
    • (1998) Proc. FOCS
    • Cheng, M.1
  • 18
    • 27544432313
    • Optimizing replication, communication, and capacity allocation in CMPs
    • Z. Chishti et al. Optimizing replication, communication, and capacity allocation in CMPs. In Proc. ISCA, 2005.
    • (2005) Proc. ISCA
    • Chishti, Z.1
  • 20
    • 0003795618
    • Unifying data and control transformations for distributed shared memory machines
    • M. Cierniak, W. Li. Unifying data and control transformations for distributed shared memory machines. Tech. Rep., U. Rochester, 1994.
    • (1994) Tech. Rep., U. Rochester
    • Cierniak, M.1    Li, W.2
  • 21
    • 76749100063
    • K. Cooper, L. Torczon. Engineering a Compiler. 2008.
  • 24
    • 0026891897
    • Partitioning and labeling of loops by unimodular transformations
    • E. D'Hollander. Partitioning and labeling of loops by unimodular transformations. In TPDS, 1992.
    • (1992) TPDS
    • D'Hollander, E.1
  • 25
    • 0030675463
    • Cache miss equations: An analytical representation of cache misses
    • S. Ghosh et al. Cache miss equations: An analytical representation of cache misses. In Proc. ICS, 1997.
    • (1997) Proc. ICS
    • Ghosh, S.1
  • 26
    • 0030380793
    • Maximizing multiprocessor performance with the SUIF compiler
    • M. W. Hall et al. Maximizing multiprocessor performance with the SUIF compiler. In Computer, 1996.
    • (1996) Computer
    • Hall, M.W.1
  • 27
    • 84868170244
    • http://www.intel.com/p/en US/products/server/processor/xeon7000?iid= servproc+body xeon7400subtitle
  • 28
    • 76749110184
    • Intel quad-core Xeon. http://www.intel.com/quad-core/?cid=cim:ggl|xeon us clovertown|k7449|s
  • 29
    • 84868176335
    • http://www.intel.com/idf/.
  • 31
    • 3042669130
    • IBM Power5 chip: A dual-core multithreaded processor
    • R. Kalla et al. IBM Power5 chip: a dual-core multithreaded processor. In IEEE Micro, 2004.
    • (2004) IEEE Micro
    • Kalla, R.1
  • 32
    • 50249115185
    • Data locality enhancement for CMPs
    • M. Kandemir. Data locality enhancement for CMPs. In Proc. ICCAD, 2007.
    • (2007) Proc. ICCAD
    • Kandemir, M.1
  • 33
    • 76749105972
    • Cache-aware iteration space partitioning
    • A. Kejariwal et al. Cache-aware iteration space partitioning. In Proc. PPoPP, 2008.
    • (2008) Proc. PPoPP
    • Kejariwal, A.1
  • 34
    • 0346865818
    • Data-centric transformations for locality enhancement
    • I. Kodukula, K. Pingali. Data-centric transformations for locality enhancement. In IJPP, 2001.
    • (2001) IJPP
    • Kodukula, I.1    Pingali, K.2
  • 35
    • 20344374162
    • Niagara: A 32-way multithreaded SPARC processor
    • P. Kongetira et al. Niagara: A 32-way multithreaded SPARC processor. In IEEE Micro, 2005.
    • (2005) IEEE Micro
    • Kongetira, P.1
  • 36
    • 62349131952
    • The cache performance of blocked algorithms
    • M. Lam et al. The cache performance of blocked algorithms. In Proc. ASPLOS, 1991.
    • (1991) Proc. ASPLOS
    • Lam, M.1
  • 37
    • 37549032725
    • IBM POWER6 microarchitecture
    • H. Q. Le et al. IBM POWER6 microarchitecture. In IBM Jrnl. of R&D, 2007.
    • (2007) IBM Jrnl. of R&D
    • Le, H.Q.1
  • 39
    • 57349101237
    • Data and computation transformations for Brook streaming applications on multiprocessors
    • S. Liao et al. Data and computation transformations for Brook streaming applications on multiprocessors. In Proc. CGO, 2006.
    • (2006) Proc. CGO
    • Liao, S.1
  • 41
    • 33751424104
    • Adaptive designs for power and thermal optimization
    • R. McGowen. Adaptive designs for power and thermal optimization. In Proc. ICCAD, 2005.
    • (2005) Proc. ICCAD
    • McGowen, R.1
  • 42
    • 34247326334
    • Omega library. http://www.cs.umd.edu/projects/omega.
    • Omega library
  • 43
    • 0028132512
    • Counting solutions to Presburger formulas: How and why
    • W. Pugh. Counting solutions to Presburger formulas: how and why. In Proc. PLDI, 1994.
    • (1994) Proc. PLDI
    • Pugh, W.1
  • 44
    • 84868190036
    • Quad-core AMD Opteron. http://multicore.amd.com/us-en/quadcore/
    • Opteron
  • 45
    • 34548042910
    • Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
    • M. K. Qureshi, Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. MICRO, 2006.
    • (2006) Proc. MICRO
    • Qureshi, M.K.1    Patt, Y.N.2
  • 46
    • 64949109435
    • Architectural support for operating system-driven CMP cache management
    • N. Rafique et al. Architectural support for operating system-driven CMP cache management. In Proc. PACT, 2006.
    • (2006) Proc. PACT
    • Rafique, N.1
  • 47
    • 43249127323
    • Dynamically configurable shared CMP helper engines for improved performance
    • A. Shayesteh et al. Dynamically configurable shared CMP helper engines for improved performance. In SIGARCH Comput. Archit., 2005.
    • (2005) SIGARCH Comput. Archit
    • Shayesteh, A.1
  • 49
    • 84868170241
    • SIMICS
    • SIMICS. http://www.virtutech.com/simics/simics.html.
  • 51
    • 70350634177
    • Adaptive set pinning: Managing shared caches in CMPs
    • S. Srikantaiah et al. Adaptive set pinning: managing shared caches in CMPs. In Proc. ASPLOS, 2008.
    • (2008) Proc. ASPLOS
    • Srikantaiah, S.1
  • 52
    • 1642371317
    • Dynamic partitioning of shared cache memory
    • G. E. Suh et al. Dynamic partitioning of shared cache memory. In Journal of Supercomputing, 2004.
    • (2004) Journal of Supercomputing
    • Suh, G.E.1
  • 54
    • 1842635044
    • A fast and accurate framework to analyze and optimize cache memory behavior
    • X. Vera et al. A fast and accurate framework to analyze and optimize cache memory behavior. In TOPLAS, 2004.
    • (2004) TOPLAS
    • Vera, X.1
  • 55
    • 85013942562
    • A data locality optimizing algorithm
    • M. Wolf, M. Lam. A data locality optimizing algorithm. In Proc. PLDI, 1991.
    • (1991) Proc. PLDI
    • Wolf, M.1    Lam, M.2
  • 56
    • 0002375353
    • The SPLASH-2 programs: Characterization and methodological considerations
    • S. Woo et al. The SPLASH-2 programs: characterization and methodological considerations. In Proc. ISCA, 1995.
    • (1995) Proc. ISCA
    • Woo, S.1
  • 57
    • 76749139374
    • A hierarchical model of data locality
    • C. Zhang et al. A hierarchical model of data locality. In Proc. POPL, 2006.
    • (2006) Proc. POPL
    • Zhang, C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.