2014, Pages 729-742

Ubik: Efficient cache sharing with strict QoS for latency-critical workloads

Author keywords

Cache partitioning; Interference; Isolation; Multicore; Quality of service; Resource management; Tail latency

Indexed keywords

CACHE PARTITIONING; ISOLATION; MULTI CORE; RESOURCE MANAGEMENT; TAIL LATENCY

EID: 84897791436     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2541940.2541944     Document Type: Conference Paper
Times cited: 146

References (61)
  • 2
    • 33947715600
    • IPC considered harmful for multiprocessor workloads
    • A. Alameldeen and D. Wood. IPC considered harmful for multiprocessor workloads. IEEE Micro, 26(4), 2006.
    • (2006) IEEE Micro , vol.26 , Issue.4
    • Alameldeen, A.1    Wood, D.2
  • 3
    • 47249127725
    • The case for energy-proportional computing
    • L. Barroso and U. Hölzle. The case for energy-proportional computing. IEEE Computer, 40(12):33-37, 2007.
    • (2007) IEEE Computer , vol.40 , Issue.12 , pp. 33-37
    • Barroso, L.1    Hölzle, U.2
  • 4
    • 84887440618
    • Jigsaw: Scalable software-defined caches
    • N. Beckmann and D. Sanchez. Jigsaw: Scalable Software-Defined Caches. In Proc. PACT-22, 2013.
    • (2013) Proc. PACT-22
    • Beckmann, N.1    Sanchez, D.2
  • 5
    • 84887501582
    • PACORA: Performance aware convex optimization for resource allocation
    • S. Bird and B. Smith. PACORA: Performance aware convex optimization for resource allocation. In Proc. HotPar-3, 2011.
    • (2011) Proc. HotPar , vol.3
    • Bird, S.1    Smith, B.2
  • 6
    • 84880270753
    • Power struggles: Revisiting the RISC vs CISC debate on contemporary ARM and x86 architectures
    • E. Blem, J. Menon, and K. Sankaralingam. Power Struggles: Revisiting the RISC vs CISC Debate on Contemporary ARM and x86 Architectures. In Proc. HPCA-16, 2013.
    • (2013) Proc. HPCA , vol.16
    • Blem, E.1    Menon, J.2    Sankaralingam, K.3
  • 7
    • 84883366263
    • A 22nm high performance embedded DRAM SoC technology featuring tri-gate transistors and MIMCAP COB
    • R. Brain, A. Baran, N. Bisnik, et al. A 22nm High Performance Embedded DRAM SoC Technology Featuring Tri-Gate Transistors and MIMCAP COB. In Proc. of the Symposium on VLSI Technology, 2013.
    • (2013) Proc. of the Symposium on VLSI Technology
    • Brain, R.1    Baran, A.2    Bisnik, N.3
  • 8
    • 53549130720
    • Impact of cache partitioning on multi-tasking real time embedded systems
    • B. D. Bui, M. Caccamo, L. Sha, and J. Martinez. Impact of cache partitioning on multi-tasking real time embedded systems. In Proc. RTCSA-14, 2008.
    • (2008) Proc. RTCSA , vol.14
    • Bui, B.D.1    Caccamo, M.2    Sha, L.3    Martinez, J.4
  • 9
    • 0033683314
    • Application-specific memory management for embedded systems using software-controlled caches
    • D. Chiou, P. Jain, L. Rudolph, and S. Devadas. Application-specific memory management for embedded systems using software-controlled caches. In Proc. DAC-37, 2000.
    • (2000) Proc. DAC , vol.37
    • Chiou, D.1    Jain, P.2    Rudolph, L.3    Devadas, S.4
  • 10
    • 84881160871
    • A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness
    • H. Cook, M. Moreto, S. Bird, et al. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Proc. ISCA-40, 2013.
    • (2013) Proc. ISCA , vol.40
    • Cook, H.1    Moreto, M.2    Bird, S.3
  • 12
    • 84875649537
    • Paragon: QoS-aware scheduling for heterogeneous datacenters
    • C. Delimitrou and C. Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proc. ASPLOS-18, 2013.
    • (2013) Proc. ASPLOS , vol.18
    • Delimitrou, C.1    Kozyrakis, C.2
  • 13
    • 77952285828
    • Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems
    • E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt. Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems. In Proc. ASPLOS-15, 2010.
    • (2010) Proc. ASPLOS , vol.15
    • Ebrahimi, E.1    Lee, C.J.2    Mutlu, O.3    Patt, Y.N.4
  • 14
    • 34249813667
    • A performance counter architecture for computing accurate CPI components
    • S. Eyerman, L. Eeckhout, T. Karkhanis, and J. E. Smith. A performance counter architecture for computing accurate CPI components. In Proc. ASPLOS-12, 2006.
    • (2006) Proc. ASPLOS , vol.12
    • Eyerman, S.1    Eeckhout, L.2    Karkhanis, T.3    Smith, J.E.4
  • 15
    • 84858791438
    • Clearing the clouds: A study of emerging scale-out workloads on modern hardware
    • M. Ferdman, A. Adileh, O. Kocberber, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In Proc. ASPLOS-17, 2012.
    • (2012) Proc. ASPLOS , vol.17
    • Ferdman, M.1    Adileh, A.2    Kocberber, O.3
  • 16
    • 80052522708
    • Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees
    • B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu. Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees. In Proc. ISCA-38, 2011.
    • (2011) Proc. ISCA , vol.38
    • Grot, B.1    Hestness, J.2    Keckler, S.W.3    Mutlu, O.4
  • 17
    • 47349085427
    • A framework for providing quality of service in chip multi-processors
    • F. Guo, Y. Solihin, L. Zhao, and R. Iyer. A framework for providing quality of service in chip multi-processors. In Proc. MICRO-40, 2007.
    • (2007) Proc. MICRO , vol.40
    • Guo, F.1    Solihin, Y.2    Zhao, L.3    Iyer, R.4
  • 18
    • 70350601187
    • Reactive NUCA: Near-optimal block placement and replication in distributed caches
    • N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. In Proc. ISCA-36, 2009.
    • (2009) Proc. ISCA , vol.36
    • Hardavellas, N.1    Ferdman, M.2    Falsafi, B.3    Ailamaki, A.4
  • 19
    • 84910129119
    • FIESTA: A sample-balanced multi-program workload methodology
    • A. Hilton, N. Eswaran, and A. Roth. FIESTA: A sample-balanced multi-program workload methodology. In MoBS, 2009.
    • (2009) MoBS
    • Hilton, A.1    Eswaran, N.2    Roth, A.3
  • 20
    • 47349095214
    • QoS policies and architecture for cache/memory in CMP platforms
    • R. Iyer, L. Zhao, F. Guo, et al. QoS policies and architecture for cache/memory in CMP platforms. In Proc. SIGMETRICS, 2007.
    • (2007) Proc. SIGMETRICS
    • Iyer, R.1    Zhao, L.2    Guo, F.3
  • 21
    • 84863550145
    • A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC
    • M. K. Jeong, M. Erez, C. Sudanthi, and N. Paver. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC. In Proc. DAC-49, 2012.
    • (2012) Proc. DAC , vol.49
    • Jeong, M.K.1    Erez, M.2    Sudanthi, C.3    Paver, N.4
  • 22
    • 84860338234
    • Network congestion avoidance through speculative reservation
    • N. Jiang, D. Becker, G. Michelogiannakis, and W. Dally. Network congestion avoidance through speculative reservation. In Proc. HPCA-18, 2012.
    • (2012) Proc. HPCA , vol.18
    • Jiang, N.1    Becker, D.2    Michelogiannakis, G.3    Dally, W.4
  • 23
    • 70349141254
    • Shore-MT: A scalable storage manager for the multicore era
    • R. Johnson, I. Pandis, N. Hardavellas, et al. Shore-MT: A scalable storage manager for the multicore era. In Proc. EDBT-12, 2009.
    • (2009) Proc. EDBT , vol.12
    • Johnson, R.1    Pandis, I.2    Hardavellas, N.3
  • 24
    • 84870557554
    • Chronos: Predictable low latency for data center applications
    • R. Kapoor, G. Porter, M. Tewari, et al. Chronos: predictable low latency for data center applications. In Proc. SoCC-3, 2012.
    • (2012) Proc. SoCC , vol.3
    • Kapoor, R.1    Porter, G.2    Tewari, M.3
  • 25
    • 79951718838
    • Thread cluster memory scheduling: Exploiting differences in memory access behavior
    • Y. Kim, M. Papamichael, O. Mutlu, and M. Harchol-Balter. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In Proc. MICRO-43, 2010.
    • (2010) Proc. MICRO , vol.43
    • Kim, Y.1    Papamichael, M.2    Mutlu, O.3    Harchol-Balter, M.4
  • 26
    • 85110867932
    • Moses: Open source toolkit for statistical machine translation
    • P. Koehn, H. Hoang, A. Birch, et al. Moses: Open source toolkit for statistical machine translation. In Proc. ACL-45, 2007.
    • (2007) Proc. ACL , vol.45
    • Koehn, P.1    Hoang, H.2    Birch, A.3
  • 28
    • 84897787167
    • PRETI: Partitioned REal-TIme shared cache for mixed-criticality real-time systems
    • B. Lesage, I. Puaut, and A. Seznec. PRETI: Partitioned REal-TIme shared cache for mixed-criticality real-time systems. In Proc. ICRTNS-20, 2012.
    • (2012) Proc. ICRTNS , vol.20
    • Lesage, B.1    Puaut, I.2    Seznec, A.3
  • 30
    • 84977144248
    • Refining the utility metric for utility-based cache partitioning
    • X. Lin and R. Balasubramonian. Refining the utility metric for utility-based cache partitioning. In Proc. WDDD, 2011.
    • (2011) Proc. WDDD
    • Lin, X.1    Balasubramonian, R.2
  • 31
    • 85092783412
    • Tessellation: Space-time partitioning in a manycore client OS
    • R. Liu, K. Klues, S. Bird, et al. Tessellation: Space-time partitioning in a manycore client OS. In Proc. HotPar-1, 2009.
    • (2009) Proc. HotPar , vol.1
    • Liu, R.1    Klues, K.2    Bird, S.3
  • 32
    • 84860592643
    • Cache craftiness for fast multicore key-value storage
    • Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In Proc. EuroSys-7, 2012.
    • (2012) Proc. EuroSys , vol.7
    • Mao, Y.1    Kohler, E.2    Morris, R.T.3
  • 33
    • 84858783719
    • Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations
    • J. Mars, L. Tang, R. Hundt, et al. Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations. In Proc. MICRO-44, 2011.
    • (2011) Proc. MICRO , vol.44
    • Mars, J.1    Tang, L.2    Hundt, R.3
  • 34
    • 84885629106
    • Stochastic queuing simulation for data center workloads
    • D. Meisner and T. F. Wenisch. Stochastic queuing simulation for data center workloads. EXERT, 2010.
    • (2010) EXERT
    • Meisner, D.1    Wenisch, T.F.2
  • 35
    • 67650078267
    • PowerNap: Eliminating server idle power
    • D. Meisner, B. Gold, and T. Wenisch. PowerNap: Eliminating server idle power. Proc. ASPLOS-14, 2009.
    • (2009) Proc. ASPLOS , vol.14
    • Meisner, D.1    Gold, B.2    Wenisch, T.3
  • 36
    • 85084163128
    • Eliminating receive livelock in an interrupt-driven kernel
    • J. Mogul and K. Ramakrishnan. Eliminating receive livelock in an interrupt-driven kernel. In Proc. USENIX ATC, 1996.
    • (1996) Proc. USENIX ATC
    • Mogul, J.1    Ramakrishnan, K.2
  • 40
    • 77954780208
    • The case for RAMClouds: Scalable high-performance storage entirely in DRAM
    • J. Ousterhout, P. Agrawal, D. Erickson, et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM. SIGOPS Operating Systems Review, 43(4), 2010.
    • (2010) SIGOPS Operating Systems Review , vol.43 , Issue.4
    • Ousterhout, J.1    Agrawal, P.2    Erickson, D.3
  • 41
    • 34548304615
    • Scratchpad memories vs locked caches in hard real-time systems: A quantitative comparison
    • I. Puaut and C. Pais. Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison. In Proc. DATE, 2007.
    • (2007) Proc. DATE
    • Puaut, I.1    Pais, C.2
  • 42
    • 34548042910
    • Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
    • M. Qureshi and Y. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. MICRO-39, 2006.
    • (2006) Proc. MICRO , vol.39
    • Qureshi, M.1    Patt, Y.2
  • 43
    • 77954977639
    • Web search using mobile cores: Quantifying and mitigating the price of efficiency
    • V. Reddi, B. Lee, T. Chilimbi, and K. Vaid. Web search using mobile cores: quantifying and mitigating the price of efficiency. In Proc. ISCA-37, 2010.
    • (2010) Proc. ISCA , vol.37
    • Reddi, V.1    Lee, B.2    Chilimbi, T.3    Vaid, K.4
  • 44
    • 79951696261
    • The ZCache: Decoupling ways and associativity
    • D. Sanchez and C. Kozyrakis. The ZCache: Decoupling Ways and Associativity. In Proc. MICRO-43, 2010.
    • (2010) Proc. MICRO , vol.43
    • Sanchez, D.1    Kozyrakis, C.2
  • 45
    • 80052521720
    • Vantage: Scalable and efficient fine-grain cache partitioning
    • D. Sanchez and C. Kozyrakis. Vantage: Scalable and Efficient Fine-Grain Cache Partitioning. In Proc. ISCA-38, 2011.
    • (2011) Proc. ISCA , vol.38
    • Sanchez, D.1    Kozyrakis, C.2
  • 46
    • 84881154274
    • ZSim: Fast and accurate microarchitectural simulation of thousand-core systems
    • D. Sanchez and C. Kozyrakis. ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems. In Proc. ISCA-40, 2013.
    • (2013) Proc. ISCA , vol.40
    • Sanchez, D.1    Kozyrakis, C.2
  • 48
    • 0027307814
    • A case for two-way skewed-associative caches
    • A. Seznec. A case for two-way skewed-associative caches. In Proc. ISCA-20, 1993.
    • (1993) Proc. ISCA , vol.20
    • Seznec, A.1
  • 49
    • 84892655102
    • METE: Meeting end-to-end QoS in multicores through system-wide resource management
    • A. Sharifi, S. Srikantaiah, A. Mishra, et al. METE: meeting end-to-end QoS in multicores through system-wide resource management. In Proc. SIGMETRICS, 2011.
    • (2011) Proc. SIGMETRICS
    • Sharifi, A.1    Srikantaiah, S.2    Mishra, A.3
  • 50
    • 77952200539
    • A 40nm 16-core 128-thread CMT SPARC SoC processor
    • J. Shin, K. Tam, D. Huang, et al. A 40nm 16-core 128-thread CMT SPARC SoC processor. In ISSCC, 2010.
    • (2010) ISSCC
    • Shin, J.1    Tam, K.2    Huang, D.3
  • 51
    • 0034443570
    • Symbiotic jobscheduling for a simultaneous multithreading processor
    • A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proc. ASPLOS-8, 2000.
    • (2000) Proc. ASPLOS , vol.8
    • Snavely, A.1    Tullsen, D.M.2
  • 52
    • 76749118968
    • SHARP control: Controlled shared cache management in chip multiprocessors
    • S. Srikantaiah, M. Kandemir, and Q. Wang. SHARP control: Controlled shared cache management in chip multiprocessors. In MICRO-42, 2009.
    • (2009) MICRO , vol.42
    • Srikantaiah, S.1    Kandemir, M.2    Wang, Q.3
  • 54
    • 84875673650
    • ReQoS: Reactive static/dynamic compilation for QoS in warehouse scale computers
    • L. Tang, J. Mars, W. Wang, et al. ReQoS: Reactive Static/Dynamic Compilation for QoS in Warehouse Scale Computers. In Proc. ASPLOS-18, 2013.
    • (2013) Proc. ASPLOS , vol.18
    • Tang, L.1    Mars, J.2    Wang, W.3
  • 55
    • 79959879840
    • C4: The continuously concurrent compacting collector
    • G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. In Proc. ISMM, 2011.
    • (2011) Proc. ISMM
    • Tene, G.1    Iyengar, B.2    Wolf, M.3
  • 57
    • 0346935130
    • Data caches in multitasking hard real-time systems
    • X. Vera, B. Lisper, and J. Xue. Data caches in multitasking hard real-time systems. In Proc. RTSS-24, 2003.
    • (2003) Proc. RTSS , vol.24
    • Vera, X.1    Lisper, B.2    Xue, J.3
  • 58
    • 77952179543
    • The implementation of POWER7: A highly parallel and scalable multi-core high-end server processor
    • D. Wendel, R. Kalla, R. Cargoni, et al. The implementation of POWER7: A highly parallel and scalable multi-core high-end server processor. In ISSCC, 2010.
    • (2010) ISSCC
    • Wendel, D.1    Kalla, R.2    Cargoni, R.3
  • 59
    • 70450279102
    • PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches
    • Y. Xie and G. H. Loh. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches. In Proc. ISCA-36, 2009.
    • (2009) Proc. ISCA , vol.36
    • Xie, Y.1    Loh, G.H.2
  • 60
    • 84881190996
    • Bubble-flux: Precise online QoS management for increased utilization in warehouse scale computers
    • H. Yang, A. Breslow, J. Mars, and L. Tang. Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers. In Proc. ISCA-40, 2013.
    • (2013) Proc. ISCA , vol.40
    • Yang, H.1    Breslow, A.2    Mars, J.3    Tang, L.4
  • 61
    • 85077083345
    • Hardware execution throttling for multi-core resource management
    • X. Zhang, S. Dwarkadas, and K. Shen. Hardware execution throttling for multi-core resource management. In Proc. of USENIX ATC, 2009.
    • (2009) Proc. of USENIX ATC
    • Zhang, X.1    Dwarkadas, S.2    Shen, K.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.