SCOPUS Information Retrieval Platform

2015 IEEE 21st International Symposium on High Performance Computer Architecture, HPCA 2015

2015, Pages 538-550

Scaling distributed cache hierarchies through computation and data co-scheduling

Authors (3): Beckmann, Nathan (a); Tsai, Po-An (a); Sanchez, Daniel (a)

(a) Massachusetts Institute of Technology (United States)

Author keywords

cache; NUCA; partitioning; thread scheduling

Indexed keywords

SCHEDULING;

CACHE; CACHE HIERARCHIES; DISTRIBUTED CACHE; MONITORING HARDWARE; NUCA; PARTITIONING; SPACE ALLOCATION; THREAD SCHEDULING;

COMPUTER ARCHITECTURE;

EID: 84934297423
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/HPCA.2015.7056061
Document Type: Conference Paper

Times cited: 33

References (64)

1
- 33947715600
- IPC considered harmful for multiprocessor workloads
- A. Alameldeen and D. Wood, "IPC considered harmful for multiprocessor workloads," IEEE Micro, vol. 26, no. 4, 2006.
- (2006) IEEE Micro , vol.26 , Issue.4
- Alameldeen, A.¹ Wood, D.²

2
- 34548008288
- ASR: Adaptive selective replication for CMP caches
- B. Beckmann, M. Marty, and D. Wood, "ASR: Adaptive selective replication for CMP caches," in Proc. MICRO-39, 2006.
- (2006) Proc. MICRO-39
- Beckmann, B.¹ Marty, M.² Wood, D.³

3
- 21644472427
- Managing wire delay in large chip-multiprocessor caches
- B. Beckmann and D. Wood, "Managing wire delay in large chip-multiprocessor caches," in Proc. MICRO-37, 2004.
- (2004) Proc. MICRO-37
- Beckmann, B.¹ Wood, D.²

4
- 84887440618
- Jigsaw: Scalable software-defined caches
- N. Beckmann and D. Sanchez, "Jigsaw: Scalable Software-Defined Caches," in Proc. PACT-22, 2013.
- (2013) Proc. PACT-22
- Beckmann, N.¹ Sanchez, D.²

5
- 84934268669
- Talus: A simple way to remove cliffs in cache performance
- N. Beckmann and D. Sanchez, "Talus: A Simple Way to Remove Cliffs in Cache Performance," in Proc. HPCA-21, 2015.
- (2015) Proc. HPCA-21
- Beckmann, N.¹ Sanchez, D.²

6
- 49549108733
- TILE64 processor: A 64-core SoC with mesh interconnect
- S. Bell, B. Edwards, J. Amann et al., "TILE64 processor: A 64-core SoC with mesh interconnect," in Proc. ISSCC, 2008.
- (2008) Proc. ISSCC
- Bell, S.¹ Edwards, B.² Amann, J.³

7
- 85077067285
- A case for NUMA-aware contention management on multicore systems
- S. Blagodurov, S. Zhuravlev, M. Dashti, and A. Fedorova, "A case for NUMA-aware contention management on multicore systems," in Proc. USENIX ATC, 2011.
- (2011) Proc. USENIX ATC
- Blagodurov, S.¹ Zhuravlev, S.² Dashti, M.³ Fedorova, A.⁴

8
- 33845903561
- Cooperative caching for chip multiprocessors
- J. Chang and G. Sohi, "Cooperative caching for chip multiprocessors," in Proc. ISCA-33, 2006.
- (2006) Proc. ISCA-33
- Chang, J.¹ Sohi, G.²

9
- 0033683314
- Application-specific memory management for embedded systems using software-controlled caches
- D. Chiou, P. Jain, L. Rudolph, and S. Devadas, "Application-specific memory management for embedded systems using software-controlled caches," in Proc. DAC-37, 2000.
- (2000) Proc. DAC-37
- Chiou, D.¹ Jain, P.² Rudolph, L.³ Devadas, S.⁴

10
- 27544432313
- Optimizing replication, communication, and capacity allocation in CMPs
- Z. Chishti, M. Powell, and T. Vijaykumar, "Optimizing replication, communication, and capacity allocation in CMPs," in ISCA-32, 2005.
- (2005) ISCA-32
- Chishti, Z.¹ Powell, M.² Vijaykumar, T.³

11
- 40349095122
- Managing distributed, shared L2 caches through OS-level page allocation
- S. Cho and L. Jin, "Managing distributed, shared L2 caches through OS-level page allocation," in Proc. MICRO-39, 2006.
- (2006) Proc. MICRO-39
- Cho, S.¹ Jin, L.²

12
- 84881160871
- A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness
- H. Cook, M. Moreto, S. Bird et al., "A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness," in ISCA-40, 2013.
- (2013) ISCA-40
- Cook, H.¹ Moreto, M.² Bird, S.³

13
- 84906973650
- GPU Computing: To exascale and beyond
- W. J. Dally, "GPU Computing: To Exascale and Beyond," in SC Plenary Talk, 2010.
- (2010) SC Plenary Talk
- Dally, W.J.¹

14
- 84880278122
- Application-to-core mapping policies to reduce memory system interference in multi-core systems
- R. Das, R. Ausavarungnirun, O. Mutlu et al., "Application-to-core mapping policies to reduce memory system interference in multi-core systems," in Proc. HPCA-19, 2013.
- (2013) Proc. HPCA-19
- Das, R.¹ Ausavarungnirun, R.² Mutlu, O.³

15
- 84875650624
- Traffic management: A holistic approach to memory placement on NUMA systems
- M. Dashti, A. Fedorova, J. Funston et al., "Traffic management: a holistic approach to memory placement on NUMA systems," in Proc. ASPLOS-18, 2013.
- (2013) Proc. ASPLOS-18
- Dashti, M.¹ Fedorova, A.² Funston, J.³

16
- 84873622276
- The tail at scale
- J. Dean and L. Barroso, "The Tail at Scale," CACM, vol. 56, 2013.
- (2013) CACM , vol.56
- Dean, J.¹ Barroso, L.²

17
- 80052528714
- Dark silicon and the end of multicore scaling
- H. Esmaeilzadeh, E. Blem, R. St Amant et al., "Dark Silicon and The End of Multicore Scaling," in Proc. ISCA-38, 2011.
- (2011) Proc. ISCA-38
- Esmaeilzadeh, H.¹ Blem, E.² St Amant, R.³

18
- 47349085427
- A framework for providing quality of service in chip multi-processors
- F. Guo, Y. Solihin, L. Zhao, and R. Iyer, "A framework for providing quality of service in chip multi-processors," in Proc. MICRO-40, 2007.
- (2007) Proc. MICRO-40
- Guo, F.¹ Solihin, Y.² Zhao, L.³ Iyer, R.⁴

19
- 84890404001
- Gurobi
- Gurobi, "Gurobi optimizer reference manual version 5.6," 2013.
- (2013) Gurobi Optimizer Reference Manual Version 5.6

20
- 70350601187
- Reactive NUCA: Near-optimal block placement and replication in distributed caches
- N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, "Reactive NUCA: near-optimal block placement and replication in distributed caches," in Proc. ISCA-36, 2009.
- (2009) Proc. ISCA-36
- Hardavellas, N.¹ Ferdman, M.² Falsafi, B.³ Ailamaki, A.⁴

21
- 79961040286
- Toward dark silicon in servers
- N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, "Toward dark silicon in servers," IEEE Micro, vol. 31, no. 4, 2011.
- (2011) IEEE Micro , vol.31 , Issue.4
- Hardavellas, N.¹ Ferdman, M.² Falsafi, B.³ Ailamaki, A.⁴

22
- 84863354514
- Database servers on chip multiprocessors: Limitations and opportunities
- N. Hardavellas, I. Pandis, R. Johnson, and N. Mancheril, "Database Servers on Chip Multiprocessors: Limitations and Opportunities," in Proc. CIDR, 2007.
- (2007) Proc. CIDR
- Hardavellas, N.¹ Pandis, I.² Johnson, R.³ Mancheril, N.⁴

23
- 0004302191
- Computer Architecture: A Quantitative Approach
- J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach (5th ed.). Morgan Kaufmann, 2011.
- (2011) Computer Architecture: A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

24
- 0000800074
- Geometrical cluster growth models and kinetic gelation
- H. J. Herrmann, "Geometrical cluster growth models and kinetic gelation," Physics Reports, vol. 136, no. 3, pp. 153-224, 1986.
- (1986) Physics Reports , vol.136 , Issue.3 , pp. 153-224
- Herrmann, H.J.¹

25
- 48249118853
- Amdahl's law in the multicore era
- M. D. Hill and M. R. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, 2008.
- (2008) Computer , vol.41 , Issue.7
- Hill, M.D.¹ Marty, M.R.²

26
- 84910129119
- FIESTA: A sample-balanced multi-program workload methodology
- A. Hilton, N. Eswaran, and A. Roth, "FIESTA: A sample-balanced multi-program workload methodology," in Proc. MoBS, 2009.
- (2009) Proc. MoBS
- Hilton, A.¹ Eswaran, N.² Roth, A.³

27
- 84934313335
- Knights Landing: Next Generation Intel Xeon Phi
- Intel, "Knights Landing: Next Generation Intel Xeon Phi," in SC Presentation, 2013.
- (2013) SC Presentation
- Intel¹

28
- 34548225417
- A NUCA substrate for flexible CMP cache sharing
- J. Huh, C. Kim, H. Shafi et al., "A NUCA substrate for flexible CMP cache sharing," IEEE Trans. Par. Dist. Sys., vol. 18, no. 8, 2007.
- (2007) IEEE Trans. Par. Dist. Sys. , vol.18 , Issue.8
- Huh, J.¹ Kim, C.² Shafi, H.³

29
- 84858767531
- CRUISE: Cache replacement and utility-aware scheduling
- A. Jaleel, H. H. Najaf-Abadi, S. Subramaniam et al., "CRUISE: Cache replacement and utility-aware scheduling," in Proc. ASPLOS, 2012.
- (2012) Proc. ASPLOS
- Jaleel, A.¹ Najaf-Abadi, H.H.² Subramaniam, S.³

30
- 84904009167
- D. Kanter, "Silvermont, Intel's Low Power Architecture," 2013. [Online]. Available: http://www.realworldtech.com/silvermont/
- (2013) Silvermont, Intel's Low Power Architecture
- Kanter, D.¹

31
- 0032131147
- A fast and high quality multilevel scheme for partitioning irregular graphs
- G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Sci. Comput., vol. 20, 1998.
- (1998) SIAM J. Sci. Comput. , vol.20
- Karypis, G.¹ Kumar, V.²

32
- 84897791436
- Ubik: Efficient cache sharing with strict QoS for latency-critical workloads
- H. Kasture and D. Sanchez, "Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads," in Proc. ASPLOS-19, 2014.
- (2014) Proc. ASPLOS-19
- Kasture, H.¹ Sanchez, D.²

33
- 0028445155
- A comparison of trace-sampling techniques for multi-megabyte caches
- R. Kessler, M. Hill, and D. Wood, "A comparison of trace-sampling techniques for multi-megabyte caches," IEEE T. Comput., vol. 43, 1994.
- (1994) IEEE T. Comput. , vol.43
- Kessler, R.¹ Hill, M.² Wood, D.³

34
- 0036949388
- An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
- C. Kim, D. Burger, and S. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," in ASPLOS, 2002.
- (2002) ASPLOS
- Kim, C.¹ Burger, D.² Keckler, S.³

35
- 77952125596
- Westmere: A family of 32nm IA processors
- N. Kurd, S. Bhamidipati, C. Mozak et al., "Westmere: A family of 32nm IA processors," in Proc. ISSCC, 2010.
- (2010) Proc. ISSCC
- Kurd, N.¹ Bhamidipati, S.² Mozak, C.³

36
- 79955893556
- CloudCache: Expanding and shrinking private caches
- H. Lee, S. Cho, and B. R. Childers, "CloudCache: Expanding and shrinking private caches," in Proc. HPCA-17, 2011.
- (2011) Proc. HPCA-17
- Lee, H.¹ Cho, S.² Childers, B.R.³

37
- 76749146060
- McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures
- S. Li, J. H. Ahn, R. Strong et al., "McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures," in MICRO-42, 2009.
- (2009) MICRO-42
- Li, S.¹ Ahn, J.H.² Strong, R.³

38
- 57749186047
- Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
- J. Lin, Q. Lu, X. Ding et al., "Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems," in Proc. HPCA-14, 2008.
- (2008) Proc. HPCA-14
- Lin, J.¹ Lu, Q.² Ding, X.³

39
- 84934313338
- Memory system performance in a NUMA multicore multiprocessor
- Z. Majo and T. R. Gross, "Memory system performance in a NUMA multicore multiprocessor," in Proc. ISMM, 2011.
- (2011) Proc. ISMM
- Majo, Z.¹ Gross, T.R.²

40
- 84864836765
- Probabilistic shared cache management (PriSM)
- R. Manikantan, K. Rajan, and R. Govindarajan, "Probabilistic shared cache management (PriSM)," in Proc. ISCA-39, 2012.
- (2012) Proc. ISCA-39
- Manikantan, R.¹ Rajan, K.² Govindarajan, R.³

41
- 35348900723
- Virtual hierarchies to support server consolidation
- M. Marty and M. Hill, "Virtual hierarchies to support server consolidation," in Proc. ISCA-34, 2007.
- (2007) Proc. ISCA-34
- Marty, M.¹ Hill, M.²

42
- 77952573440
- ESP-NUCA: A low-cost adaptive non-uniform cache architecture
- J. Merino, V. Puente, and J. Gregorio, "ESP-NUCA: A low-cost adaptive non-uniform cache architecture," in Proc. HPCA-16, 2010.
- (2010) Proc. HPCA-16
- Merino, J.¹ Puente, V.² Gregorio, J.³

43
- 84934272528
- Micron, "1.35V DDR3L power calculator (4Gb x16 chips)," 2013.
- (2013) Micron 1.35V DDR3L Power Calculator (4Gb x16 Chips)

44
- 70449655189
- FlexDCP: A QoS framework for CMP architectures
- M. Moreto, F. J. Cazorla, A. Ramirez et al., "FlexDCP: A QoS framework for CMP architectures," ACM SIGOPS Operating Systems Review, vol. 43, no. 2, 2009.
- (2009) ACM SIGOPS Operating Systems Review , vol.43 , Issue.2
- Moreto, M.¹ Cazorla, F.J.² Ramirez, A.³

45
- 84934313339
- A general constraint-centric scheduling framework for spatial architectures
- T. Nowatzki, M. Tarm, L. Carli et al., "A general constraint-centric scheduling framework for spatial architectures," in Proc. PLDI-34, 2013.
- (2013) Proc. PLDI-34
- Nowatzki, T.¹ Tarm, M.² Carli, L.³

46
- 77954780208
- The case for RAMClouds: Scalable high-performance storage entirely in DRAM
- J. Ousterhout, P. Agrawal, D. Erickson et al., "The case for RAMClouds: scalable high-performance storage entirely in DRAM," ACM SIGOPS Operating Systems Review, vol. 43, no. 4, 2010.
- (2010) ACM SIGOPS Operating Systems Review , vol.43 , Issue.4
- Ousterhout, J.¹ Agrawal, P.² Erickson, D.³

47
- 38549120069
- Partitioned cache architecture as a side-channel defence mechanism
- 2005/280
- D. Page, "Partitioned cache architecture as a side-channel defence mechanism," IACR Cryptology ePrint archive, no. 2005/280, 2005.
- (2005) IACR Cryptology EPrint Archive
- Page, D.¹

48
- 0031635603
- Congestion driven quadratic placement
- P. N. Parakh, R. B. Brown, and K. A. Sakallah, "Congestion driven quadratic placement," in Proc. DAC-35, 1998.
- (1998) Proc. DAC-35
- Parakh, P.N.¹ Brown, R.B.² Sakallah, K.A.³

49
- 77954949789
- Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures
- J. Park and W. Dally, "Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures," in Proc. SPAA-22, 2010.
- (2010) Proc. SPAA-22
- Park, J.¹ Dally, W.²

50
- 0000529292
- SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs
- F. Pellegrini and J. Roman, "SCOTCH: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs," in Proc. HPCN, 1996.
- (1996) Proc. HPCN
- Pellegrini, F.¹ Roman, J.²

51
- 64949187933
- Adaptive spill-receive for robust high-performance caching in CMPs
- M. Qureshi, "Adaptive Spill-Receive for Robust High-Performance Caching in CMPs," in Proc. HPCA-15, 2009.
- (2009) Proc. HPCA-15
- Qureshi, M.¹

52
- 34548042910
- Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
- M. Qureshi and Y. Patt, "Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches," in Proc. MICRO-39, 2006.
- (2006) Proc. MICRO-39
- Qureshi, M.¹ Patt, Y.²

53
- 80052521720
- Vantage: Scalable and efficient fine-grain cache partitioning
- D. Sanchez and C. Kozyrakis, "Vantage: Scalable and Efficient Fine-Grain Cache Partitioning," in Proc. ISCA-38, 2011.
- (2011) Proc. ISCA-38
- Sanchez, D.¹ Kozyrakis, C.²

54
- 84881154274
- ZSim: Fast and accurate microarchitectural simulation of thousand-core systems
- D. Sanchez and C. Kozyrakis, "ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems," in ISCA-40, 2013.
- (2013) ISCA-40
- Sanchez, D.¹ Kozyrakis, C.²

55
- 0034443570
- Symbiotic jobscheduling for a simultaneous multithreading processor
- A. Snavely and D. M. Tullsen, "Symbiotic jobscheduling for a simultaneous multithreading processor," in Proc. ASPLOS-8, 2000.
- (2000) Proc. ASPLOS-8
- Snavely, A.¹ Tullsen, D.M.²

56
- 57749176037
- Managing shared L2 caches on multicore systems in software
- D. Tam, R. Azimi, L. Soares, and M. Stumm, "Managing shared L2 caches on multicore systems in software," in WIOSCA, 2007.
- (2007) WIOSCA
- Tam, D.¹ Azimi, R.² Soares, L.³ Stumm, M.⁴

57
- 47249165359
- Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors
- D. Tam, R. Azimi, and M. Stumm, "Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors," in Proc. Eurosys, 2007.
- (2007) Proc. Eurosys
- Tam, D.¹ Azimi, R.² Stumm, M.³

58
- 67649661466
- CACTI 5.1
- S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, "CACTI 5.1," HP Labs, Tech. Rep. HPL-2008-20, 2008.
- (2008) HP Labs, Tech. Rep. HPL-2008-20
- Thoziyoor, S.¹ Muralimanohar, N.² Ahn, J.H.³ Jouppi, N.P.⁴

59
- 84934313342
- Asymmetry-aware execution placement on manycore chips
- A. Tumanov, J. Wise, O. Mutlu, and G. R. Ganger, "Asymmetry-aware execution placement on manycore chips," in SFMA-3, 2013.
- (2013) SFMA-3
- Tumanov, A.¹ Wise, J.² Mutlu, O.³ Ganger, G.R.⁴

60
- 0001957806
- Operating system support for improving data locality on CC-NUMA compute servers
- B. Verghese, S. Devine, A. Gupta, and M. Rosenblum, "Operating system support for improving data locality on CC-NUMA compute servers," in Proc. ASPLOS, 1996.
- (1996) Proc. ASPLOS
- Verghese, B.¹ Devine, S.² Gupta, A.³ Rosenblum, M.⁴

61
- 0003876878
- Simulated annealing for VLSI design
- D. Wong, H. W. Leong, and C. L. Liu, Simulated annealing for VLSI design. Kluwer Academic Publishers, 1988.
- (1988) Simulated Annealing for VLSI Design
- Wong, D.¹ Leong, H.W.² Liu, C.L.³

62
- 80052529677
- A comparison of capacity management schemes for shared CMP caches
- C. Wu and M. Martonosi, "A Comparison of Capacity Management Schemes for Shared CMP Caches," in WDDD-7, 2008.
- (2008) WDDD-7
- Wu, C.¹ Martonosi, M.²

63
- 27544495466
- Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
- M. Zhang and K. Asanovic, "Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors," in ISCA, 2005.
- (2005) ISCA
- Zhang, M.¹ Asanovic, K.²

64
- 77952248898
- Addressing shared resource contention in multicore processors via scheduling
- S. Zhuravlev, S. Blagodurov, and A. Fedorova, "Addressing shared resource contention in multicore processors via scheduling," in Proc. ASPLOS, 2010.
- (2010) Proc. ASPLOS
- Zhuravlev, S.¹ Blagodurov, S.² Fedorova, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.