1. Fluhr, E.J.; Friedrich, J.; Dreps, D.; Zyuban, V.; Still, G.; Gonzalez, C.; Hall, A.; Hogenmiller, D.; Malgioglio, F.; Nett, R. et al. 5.1 POWER8™: A 12-core server-class processor in 22 nm SOI with 7.6 Tb/s off-chip bandwidth. In Proceedings of the International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 9–13 February 2014; pp. 96–97.
2. Kurd, N.; Chowdhury, M.; Burton, E.; Thomas, T.P.; Mozak, C.; Boswell, B.; Lal, M.; Deval, A.; Douglas, J.; Elassal, M. et al. 5.9 Haswell: A family of IA 22 nm processors. In Proceedings of the International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 9–13 February 2014; pp. 112–113.
3. NVIDIA. NVIDIA’s Next Generation CUDA Compute Architecture: Fermi. 2009. Available online: http://goo.gl/X2AI0b (accessed on 27 April 2016).
4. NVIDIA. NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110/210. 2014. Available online: http://goo.gl/qOSWW1 (accessed on 27 April 2016).
6. Mittal, S. A survey of techniques for managing and leveraging caches in GPUs. J. Circuits Syst. Comput. 2014, 23, 229–236.
7. Huangfu, Y.; Zhang, W. Real-Time GPU Computing: Cache or No Cache? In Proceedings of the International Symposium on Real-Time Distributed Computing (ISORC), Auckland, New Zealand, 13–17 April 2015; pp. 182–189.
8. Chi, C.H.; Dietz, H. Improving cache performance by selective cache bypass. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences, Kailua-Kona, HI, USA, 3–6 January 1989; Volume 1, pp. 277–285.
9. John, L.K.; Subramanian, A. Design and performance evaluation of a cache assist to implement selective caching. In Proceedings of the International Conference on Computer Design, Austin, TX, USA, 12–15 October 1997; pp. 510–518.
10. Collins, J.D.; Tullsen, D.M. Hardware identification of cache conflict misses. In Proceedings of the International Symposium on Microarchitecture, Haifa, Israel, 16–18 November 1999; pp. 126–135.
11. Chen, X.; Chang, L.W.; Rodrigues, C.I.; Lv, J.; Wang, Z.; Hwu, W.M. Adaptive cache management for energy-efficient GPU computing. In Proceedings of the 47th International Symposium on Microarchitecture, Cambridge, UK, 13–17 December 2014; pp. 343–355.
12. Zhang, C.; Sun, G.; Li, P.; Wang, T.; Niu, D.; Chen, Y. SBAC: A statistics based cache bypassing method for asymmetric-access caches. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), La Jolla, CA, USA, 11–13 August 2014; pp. 345–350.
13. Ahn, J.; Yoo, S.; Choi, K. DASCA: Dead write prediction assisted STT-RAM cache architecture. In Proceedings of the 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, FL, USA, 15–19 February 2014; pp. 25–36.
14. Duong, N.; Zhao, D.; Kim, T.; Cammarota, R.; Valero, M.; Veidenbaum, A.V. Improving cache management policies using dynamic reuse distances. In Proceedings of the 45th International Symposium on Microarchitecture, Vancouver, BC, Canada, 1–5 December 2012; pp. 389–400.
15. Mittal, S. A Survey of Architectural Techniques For Improving Cache Power Efficiency. Sustain. Comput. Inform. Syst. 2014, 4, 33–43.
16. Belady, L.A. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 1966, 5, 78–101.
17. Atkins, M. Performance and the i860 microprocessor. IEEE Micro 1991, 11, 24–27.
18. Intel Corporation. Intel 64 and IA-32 Architectures, Software Developer’s Manual, Instruction Set Reference, A-Z; Intel Corporation: Santa Clara, CA, USA, 2011; Volume 2.
19. NVIDIA Corporation. Parallel Thread Execution ISA Version 4.2; NVIDIA Corporation: Santa Clara, CA, USA, 2015.
20. Kharbutli, M.; Solihin, Y. Counter-based cache replacement and bypassing algorithms. IEEE Trans. Comput. 2008, 57, 433–447.
21. Gaur, J.; Chaudhuri, M.; Subramoney, S. Bypass and insertion algorithms for exclusive last-level caches. In Proceedings of the 38th International Symposium on Computer Architecture (ISCA), San Jose, CA, USA, 4–8 June 2011; pp. 81–92.
22. Mittal, S.; Zhang, Z.; Vetter, J. FlexiWay: A Cache Energy Saving Technique Using Fine-grained Cache Reconfiguration. In Proceedings of the 31st IEEE International Conference on Computer Design (ICCD), Asheville, NC, USA, 6–9 October 2013.
23. Alves, M.; Khubaib, K.; Ebrahimi, E.; Narasiman, V.; Villavieja, C.; Navaux, P.O.A.; Patt, Y.N. Energy savings via dead sub-block prediction. In Proceedings of the 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), New York, NY, USA, 24–26 October 2012; pp. 51–58.
24. Mittal, S.; Zhang, Z. EnCache: A Dynamic Profiling Based Reconfiguration Technique for Improving Cache Energy Efficiency. J. Circuits Syst. Comput. 2014, 23, 1450147.
25. Mittal, S.; Vetter, J.S.; Li, D. A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 1524–1537.
26. Mittal, S. A Survey of Power Management Techniques for Phase Change Memory. Int. J. Comput. Aided Eng. Technol. 2014.
27. Mittal, S.; Poremba, M.; Vetter, J.; Xie, Y. Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool; Technical Report ORNL/TM-2014/636; Oak Ridge National Laboratory: Oak Ridge, TN, USA, 2014.
28. Mittal, S.; Vetter, J.S. A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 1537–1550.
29. Wang, J.; Dong, X.; Xie, Y. OAP: An obstruction-aware cache management policy for STT-RAM last-level caches. In Proceedings of the Conference on Design, Automation and Test in Europe, Grenoble, France, 18–22 March 2013; pp. 847–852.
31. AMD. AMD Graphics Cores Next (GCN) Architecture. 2012. Available online: https://goo.gl/NjNcDY (accessed on 27 April 2016).
32. Li, A.; van den Braak, G.J.; Kumar, A.; Corporaal, H. Adaptive and Transparent Cache Bypassing for GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Austin, TX, USA, 15–20 November 2015.
34. Jia, W.; Shaw, K.; Martonosi, M. MRPB: Memory request prioritization for massively parallel processors. In Proceedings of the 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, FL, USA, 15–19 February 2014; pp. 272–283.
35. Tian, Y.; Puthoor, S.; Greathouse, J.L.; Beckmann, B.M.; Jiménez, D.A. Adaptive GPU cache bypassing. In Proceedings of the 8th Workshop on General Purpose Processing Using GPUs, San Francisco, CA, USA, 7 February 2015; pp. 25–35.
36. Etsion, Y.; Feitelson, D.G. Exploiting core working sets to filter the L1 cache with random sampling. IEEE Trans. Comput. 2012, 61, 1535–1550.
37. Chou, C.; Jaleel, A.; Qureshi, M.K. BEAR: Techniques for Mitigating Bandwidth Bloat in Gigascale DRAM Caches. In Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), Portland, OR, USA, 13–17 June 2015.
38. Xie, X.; Liang, Y.; Wang, Y.; Sun, G.; Wang, T. Coordinated static and dynamic cache bypassing for GPUs. In Proceedings of the 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, CA, USA, 7–11 February 2015; pp. 76–88.
39. Li, L.; Tong, D.; Xie, Z.; Lu, J.; Cheng, X. Optimal bypass monitor for high performance last-level caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, Minneapolis, MN, USA, 19–23 September 2012; pp. 315–324.
40. Kharbutli, M.; Jarrah, M.; Jararweh, Y. SCIP: Selective cache insertion and bypassing to improve the performance of last-level caches. In Proceedings of the IEEE Conference on Applied Electrical Engineering and Computing Technologies (AEECT), Amman, Jordan, 3–5 December 2013; pp. 1–6.
41. Wang, P.H.; Liu, G.H.; Yeh, J.C.; Chen, T.M.; Huang, H.Y.; Yang, C.L.; Liu, S.L.; Greensky, J. Full system simulation framework for integrated CPU/GPU architecture. In Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 28–30 April 2014; pp. 1–4.
42. Mittal, S.; Vetter, J. A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Comput. Surv. 2015, 47, 69:1–69:35.
43. Gupta, S.; Gao, H.; Zhou, H. Adaptive cache bypassing for inclusive last level caches. In Proceedings of the International Symposium on Parallel & Distributed Processing (IPDPS), Cambridge, MA, USA, 20–24 May 2013; pp. 1243–1253.
44. Kim, M.K.; Choi, J.H.; Kwak, J.W.; Jhang, S.T.; Jhon, C.S. Bypassing method for STT-RAM based inclusive last-level cache. In Proceedings of the Conference on Research in Adaptive and Convergent Systems, Prague, Czech Republic, 9–12 October 2015; pp. 424–429.
45. Chaudhuri, M.; Gaur, J.; Bashyam, N.; Subramoney, S.; Nuzman, J. Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, Minneapolis, MN, USA, 19–23 September 2012; pp. 293–304.
46. Xu, R.; Li, Z. Using cache mapping to improve memory performance handheld devices. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, USA, 10–12 March 2004; pp. 106–114.
47. Li, C.; Song, S.L.; Dai, H.; Sidelnik, A.; Hari, S.K.S.; Zhou, H. Locality-Driven Dynamic GPU Cache Bypassing. In Proceedings of the International Conference on Supercomputing (ICS), Newport Beach, CA, USA, 8–11 June 2015.
48. Lee, Y.; Kim, J.; Jang, H.; Yang, H.; Kim, J.; Jeong, J.; Lee, J.W. A fully associative, tagless DRAM cache. In Proceedings of the International Symposium on Computer Architecture, Portland, OR, USA, 13–17 June 2015; pp. 211–222.
49. Xiang, L.; Chen, T.; Shi, Q.; Hu, W. Less reused filter: Improving L2 cache performance via filtering less reused lines. In Proceedings of the 23rd International Conference on Supercomputing, Yorktown Heights, NY, USA, 8–12 June 2009; pp. 68–79.
50. Liu, H.; Ferdman, M.; Huh, J.; Burger, D. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Proceedings of the International Symposium on Microarchitecture, Como, Italy, 8–12 November 2008; pp. 222–233.
51. Feng, M.; Tian, C.; Gupta, R. Enhancing LRU replacement via phantom associativity. In Proceedings of the 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT), New Orleans, LA, USA, 25 February 2012; pp. 9–16.
52. Park, J.; Yoo, R.M.; Khudia, D.S.; Hughes, C.J.; Kim, D. Location-aware cache management for many-core processors with deep cache hierarchy. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 17–22 November 2013; p. 20.
53. Wang, Z.; Jiménez, D.A.; Xu, C.; Sun, G.; Xie, Y. Adaptive placement and migration policy for an STT-RAM-based hybrid cache. In Proceedings of the 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, FL, USA, 15–19 February 2014; pp. 13–24.
54. Yu, B.; Ma, J.; Chen, T.; Wu, M. Global Priority Table for Last-Level Caches. In Proceedings of the International Conference on Dependable, Autonomic and Secure Computing (DASC), Sydney, Australia, 12–14 December 2011; pp. 279–285.
55. Das, S.; Aamodt, T.M.; Dally, W.J. SLIP: Reducing wire energy in the memory hierarchy. In Proceedings of the International Symposium on Computer Architecture, Portland, OR, USA, 13–17 June 2015; pp. 349–361.
57. Wu, Y.; Rakvic, R.; Chen, L.L.; Miao, C.C.; Chrysos, G.; Fang, J. Compiler managed micro-cache bypassing for high performance EPIC processors. In Proceedings of the 35th Annual IEEE International Symposium on Microarchitecture, Istanbul, Turkey, 18–22 November 2002; pp. 134–145.
58. Khairy, M.; Zahran, M.; Wassal, A.G. Efficient utilization of GPGPU cache hierarchy. In Proceedings of the 8th Workshop on General Purpose Processing Using GPUs, San Francisco, CA, USA, 7 February 2015; pp. 36–47.
59. Zheng, Z.; Wang, Z.; Lipasti, M. Adaptive cache and concurrency allocation on GPGPUs. IEEE Comput. Archit. Lett. 2015, 14, 90–93.
60. Ausavarungnirun, R.; Ghose, S.; Kayiran, O.; Loh, G.H.; Das, C.R.; Kandemir, M.T.; Mutlu, O. Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance. In Proceedings of the International Conference on Parallel Architecture and Compilation (PACT), San Francisco, CA, USA, 18–21 October 2015.
61. Tyson, G.; Farrens, M.; Matthews, J.; Pleszkun, A.R. A modified approach to data cache management. In Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, MI, USA, 29 November–1 December 1995; pp. 93–103.
62. Dai, H.; Gupta, S.; Li, C.; Kartsaklis, C.; Mantor, M.; Zhou, H. A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing. In Proceedings of the Design Automation Conference (DAC), Austin, TX, USA, 5–9 June 2016.
63. Choi, H.; Ahn, J.; Sung, W. Reducing off-chip memory traffic by selective cache management scheme in GPGPUs. In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, London, UK, 3 March 2012; pp. 110–119.
64. Mu, S.; Deng, Y.; Chen, Y.; Li, H.; Pan, J.; Zhang, W.; Wang, Z. Orchestrating cache management and memory scheduling for GPGPU applications. IEEE Trans. Very Large Scale Integr. Syst. 2014, 22, 1803–1814.
65. Johnson, T.L.; Hwu, W.M.W. Run-time adaptive cache hierarchy management via reference analysis. In Proceedings of the International Symposium on Computer Architecture, Denver, CO, USA, 1–4 June 1997; Volume 25, pp. 315–326.
66. Jalminger, J.; Stenström, P. A novel approach to cache block reuse prediction. In Proceedings of the 42nd International Conference on Parallel Processing, Kaohsiung, Taiwan, 6–9 October 2003; pp. 294–302.
67. Wang, Z.; Shan, S.; Cao, T.; Gu, J.; Xu, Y.; Mu, S.; Xie, Y.; Jiménez, D.A. WADE: Writeback-aware dynamic cache management for NVM-based main memory system. ACM Trans. Archit. Code Optim. 2013, 10, 51:1–51:21.
68. Wang, B.; Yu, W.; Sun, X.H.; Wang, X. DaCache: Memory Divergence-Aware GPU Cache Management. In Proceedings of the 29th International Conference on Supercomputing, Newport Beach, CA, USA, 8–11 June 2015; pp. 89–98.
69. Liang, Y.; Xie, X.; Sun, G.; Chen, D. An Efficient Compiler Framework for Cache Bypassing on GPUs. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 18–21 November 2013.
70. Malkowski, K.; Link, G.; Raghavan, P.; Irwin, M.J. Load miss prediction-exploiting power performance trade-offs. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, CA, USA, 26–30 March 2007; pp. 1–8.
71. González, A.; Aliagas, C.; Valero, M. A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality. In Proceedings of the 9th International Conference on Supercomputing, Barcelona, Spain, 3–7 July 1995; pp. 338–347.
72. Mittal, S.; Vetter, J. A Technique For Improving Lifetime of Non-volatile Caches using Write-minimization. J. Low Power Electron. Appl. 2016, 6, 1.
73. Chan, K.K.; Hay, C.C.; Keller, J.R.; Kurpanek, G.P.; Schumacher, F.X.; Zheng, J. Design of the HP PA 7200 CPU. HP J. 1996.
74
-
-
84965101162
-
-
Springer: New York, NY, USA
-
Karlsson, M.; Hagersten, E. Timestamp-based selective cache allocation. In High Performance Memory Systems; Springer: New York, NY, USA, 2004; pp. 43–59.
-
(2004)
Timestamp-Based Selective Cache Allocation. In High Performance Memory Systems
, pp. 43-59
-
-
Karlsson, M.1
Hagersten, E.2
-
75
-
-
84960893370
-
GREEN Cache: Exploiting the Disciplined Memory Model of OpenCL on GPUs
-
Lee, J.; Woo, D.H.; Kim, H.; Azimi, M. GREEN Cache: Exploiting the Disciplined Memory Model of OpenCL on GPUs. IEEE Trans. Comput. 2015, 64, 3167–3180.
-
(2015)
IEEE Trans. Comput
, vol.64
, pp. 3167-3180
-
-
Lee, J.1
Woo, D.H.2
Kim, H.3
Azimi, M.4
-
76
-
-
79951697650
-
Sampling dead block prediction for last-level caches
-
Atlanta, GA, USA, 4–8 December
-
Khan, S.; Tian, Y.; Jiménez, D. Sampling dead block prediction for last-level caches. In Proceedings of the International Symposium on Microarchitecture (MICRO), Atlanta, GA, USA, 4–8 December 2010; pp. 175–186.
-
(2010)
Proceedings of the International Symposium on Microarchitecture (MICRO)
, pp. 175-186
-
-
Khan, S.1
Tian, Y.2
Jiménez, D.3
-
77
-
-
84887456430
-
Managing shared last-level cache in a heterogeneous multicore processor
-
Edinburgh, UK, 7–11 September
-
Mekkat, V.; Holey, A.; Yew, P.C.; Zhai, A. Managing shared last-level cache in a heterogeneous multicore processor. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Edinburgh, UK, 7–11 September 2013; pp. 225–234.
-
(2013)
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT)
, pp. 225-234
-
-
Mekkat, V.1
Holey, A.2
Yew, P.C.3
Zhai, A.4
-
79
-
-
84965105240
-
A Survey of Recent Prefetching Techniques for Processor Caches
-
Mittal, S. A Survey of Recent Prefetching Techniques for Processor Caches. ACM Comput. Surv. 2016.
-
(2016)
ACM Comput. Surv
-
-
Mittal, S.1
-
80
-
-
84905112592
-
MASTER: A multicore cache energy saving technique using dynamic cache reconfiguration
-
Mittal, S.; Cao, Y.; Zhang, Z. MASTER: A multicore cache energy saving technique using dynamic cache reconfiguration. IEEE Trans. Very Large Scale Integr. Syst. 2014, 22, 1653–1665.
-
(2014)
IEEE Trans. Very Large Scale Integr. Syst
, vol.22
, pp. 1653-1665
-
-
Mittal, S.1
Cao, Y.2
Zhang, Z.3
-
81
-
-
4143137693
-
Self-correcting LRU replacement policies
-
Ischia, Italy, 14–16 April
-
Kampe, M.; Stenstrom, P.; Dubois, M. Self-correcting LRU replacement policies. In Proceedings of the 1st Conference on Computing Frontiers, Ischia, Italy, 14–16 April 2004; pp. 181–191.
-
(2004)
Proceedings of the 1st Conference on Computing Frontiers
, pp. 181-191
-
-
Kampe, M.1
Stenstrom, P.2
Dubois, M.3
-
82
-
-
84946136069
-
Improve LLC Bypassing Performance by Memory Controller Improvements in Heterogeneous Multicore System
-
Hong Kong, 9–11 December
-
Ma, J.; Meng, J.; Chen, T.; Shi, Q.; Wu, M.; Liu, L. Improve LLC Bypassing Performance by Memory Controller Improvements in Heterogeneous Multicore System. In Proceedings of the International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Hong Kong, 9–11 December 2014; pp. 82–89.
-
(2014)
Proceedings of the International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
, pp. 82-89
-
-
Ma, J.1
Meng, J.2
Chen, T.3
Shi, Q.4
Wu, M.5
Liu, L.6
-
83
-
-
84982084476
-
RACB: Resource Aware Cache Bypass on GPUs
-
Paris, France, 22–24 October
-
Dai, H.; Kartsaklis, C.; Li, C.; Janjusic, T.; Zhou, H. RACB: Resource Aware Cache Bypass on GPUs. In Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), Paris, France, 22–24 October 2014; pp. 24–29.
-
(2014)
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW)
, pp. 24-29
-
-
Dai, H.1
Kartsaklis, C.2
Li, C.3
Janjusic, T.4
Zhou, H.5
-
84
-
-
84894459157
-
Shared Data Caches Conflicts Reduction for WCET Computation in Multi-Core Architectures
-
Toulouse, France, 4–5 November
-
Lesage, B.; Hardy, D.; Puaut, I. Shared Data Caches Conflicts Reduction for WCET Computation in Multi-Core Architectures. In Proceedings of the 18th International Conference on Real-Time and Network Systems, Toulouse, France, 4–5 November 2010; p. 2283.
-
(2010)
Proceedings of the 18th International Conference on Real-Time and Network Systems
-
-
Lesage, B.1
Hardy, D.2
Puaut, I.3
-
85
-
-
77649302111
-
Using bypass to tighten WCET estimates for multi-core processors with shared instruction caches
-
Washington, DC, USA, 1–4 December
-
Hardy, D.; Piquet, T.; Puaut, I. Using bypass to tighten WCET estimates for multi-core processors with shared instruction caches. In Proceedings of the 34th IEEE Real-Time Systems Symposium (RTSS), Washington, DC, USA, 1–4 December 2009; pp. 68–77.
-
(2009)
Proceedings of the 34th IEEE Real-Time Systems Symposium (RTSS)
, pp. 68-77
-
-
Hardy, D.1
Piquet, T.2
Puaut, I.3
-
86
-
-
77954998134
-
High performance cache replacement using re-reference interval prediction (RRIP)
-
Saint-Malo, France, 19–23 June
-
Jaleel, A.; Theobald, K.B.; Steely, S.C., Jr.; Emer, J. High performance cache replacement using re-reference interval prediction (RRIP). In Proceedings of the 37th International Symposium on Computer Architecture, Saint-Malo, France, 19–23 June 2010; pp. 60–71.
-
(2010)
Proceedings of the 37th International Symposium on Computer Architecture
, pp. 60-71
-
-
Jaleel, A.1
Theobald, K.B.2
Steely, S.C.3
Emer, J.4
-
87
-
-
84965140743
-
-
Intel Corporation. Intel StrongARM SA-1110 Microprocessor Developer’s Manual; Intel Corporation: Santa Clara, CA, USA
-
Intel Corporation. Intel StrongARM SA-1110 Microprocessor Developer’s Manual; Intel Corporation: Santa Clara, CA, USA, 2000.
-
(2000)
-
-
-
88
-
-
84893396474
-
An efficient compiler framework for cache bypassing on GPUs
-
San Jose, CA, USA, 18–21 November
-
Xie, X.; Liang, Y.; Sun, G.; Chen, D. An efficient compiler framework for cache bypassing on GPUs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 18–21 November 2013; pp. 516–523.
-
(2013)
Proceedings of the International Conference on Computer-Aided Design (ICCAD)
, pp. 516-523
-
-
Xie, X.1
Liang, Y.2
Sun, G.3
Chen, D.4
-
89
-
-
84958770551
-
A survey of architectural techniques for managing process variation
-
Mittal, S. A survey of architectural techniques for managing process variation. ACM Comput. Surv. 2016, 48, Article No. 54.
-
(2016)
ACM Comput. Surv
-
-
Mittal, S.1
-
90
-
-
84963984095
-
A survey of techniques for approximate computing
-
Mittal, S. A survey of techniques for approximate computing. ACM Comput. Surv. 2016, 48, Article No. 62.
-
(2016)
ACM Comput. Surv
-
-
Mittal, S.1