SCOPUS Information Search Platform

Proceedings - International Symposium on High-Performance Computer Architecture

Volume , Issue , 2014, Pages 272-283

MRPB: Memory request prioritization for massively parallel processors

(3) Jia, Wenhao (a); Shaw, Kelly A. (b); Martonosi, Margaret (a)

a PRINCETON UNIVERSITY (United States)

b UNIVERSITY OF RICHMOND (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; COMPUTER GRAPHICS; PROGRAM PROCESSORS; SUPERCOMPUTERS;

CACHE MANAGEMENT TECHNIQUES; GPU PROGRAMMING; GRAPHICS PROCESSING UNITS; HARDWARE STRUCTURES; MASSIVELY PARALLEL PROCESSORS; MASSIVELY PARALLELS; MEMORY HIERARCHY; PERFORMANCE IMPACT;

CACHE MEMORY;

EID: 84903985058 PISSN: 15300897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HPCA.2014.6835938 Document Type: Conference Paper

Times cited : (142)

References (26)

1
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda et al. Analyzing CUDA workloads using a detailed GPU simulator. In Intl. Symp. on Performance Analysis of Systems and Software, 2009.
- (2009) Intl. Symp. on Performance Analysis of Systems and Software
- Bakhoda, A.¹

2
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In Intl. Symp. Workload Characterization, 2009.
- (2009) Intl. Symp. Workload Characterization
- Che, S.¹

3
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- S. Che, J.W. Sheaffer, and K. Skadron. Dymaxion: Optimizing memory access patterns for heterogeneous systems. In Intl. Conf. High Performance Computing, Networking, Storage and Analysis, 2011.
- (2011) Intl. Conf. High Performance Computing, Networking, Storage and Analysis
- Che, S.¹ Sheaffer, J.W.² Skadron, K.³

4
- 79951707102
- Memory latency reduction via thread throttling
- H.-Y. Cheng et al. Memory latency reduction via thread throttling. In Intl. Symp. Microarchitecture, 2010.
- (2010) Intl. Symp. Microarchitecture
- Cheng, H.-Y.¹

5
- 84877083867
- Merrimac: Supercomputing with streams
- W. J. Dally et al. Merrimac: Supercomputing with streams. In ACM/IEEE Conf. Supercomputing, 2003.
- (2003) ACM/IEEE Conf. Supercomputing
- Dally, W.J.¹

6
- 53349090243
- A closer look at GPUs
- October
- K. Fatahalian and M. Houston. A closer look at GPUs. Communications of the ACM, 51(10):50-57, October 2008.
- (2008) Communications of the ACM , vol.51 , Issue.10 , pp. 50-57
- Fatahalian, K.¹ Houston, M.²

7
- 79955923056
- Thread block compaction for efficient SIMD control flow
- W. W. L. Fung and T. M. Aamodt. Thread block compaction for efficient SIMD control flow. In Intl. Symp. on High Performance Computer Architecture, 2011.
- (2011) Intl. Symp. on High Performance Computer Architecture
- Fung, W.W.L.¹ Aamodt, T.M.²

8
- 80052533471
- Energy-efficient mechanisms for managing thread context in throughput processors
- M. Gebhart et al. Energy-efficient mechanisms for managing thread context in throughput processors. In Intl. Symp. Computer Architecture, 2011.
- (2011) Intl. Symp. Computer Architecture
- Gebhart, M.¹

9
- 84870700173
- Auto-tuning a high-level language targeted to GPU codes
- S. Grauer-Gray et al. Auto-tuning a high-level language targeted to GPU codes. In Innovative Parallel Computing, 2012.
- (2012) Innovative Parallel Computing
- Grauer-Gray, S.¹

10
- 67650635164
- Many-core vs many-thread machines: Stay away from the valley
- January-June
- Z. Guz et al. Many-core vs. many-thread machines: Stay away from the valley. IEEE Computer Architecture Letters, 8(1):25-28, January-June 2009.
- (2009) IEEE Computer Architecture Letters , vol.8 , Issue.1 , pp. 25-28
- Guz, Z.¹

11
- 0030677581
- The design and analysis of a cache architecture for texture mapping
- Z. S. Hakura and A. Gupta. The design and analysis of a cache architecture for texture mapping. In Intl. Symp. Computer Architecture, 1997.
- (1997) Intl. Symp. Computer Architecture
- Hakura, Z.S.¹ Gupta, A.²

12
- 77954998134
- High performance cache replacement using re-reference interval prediction (RRIP)
- A. Jaleel et al. High performance cache replacement using re-reference interval prediction (RRIP). In Intl. Symp. on Computer Architecture, 2010.
- (2010) Intl. Symp. on Computer Architecture
- Jaleel, A.¹

13
- 84864068497
- Characterizing and improving the use of demand-fetched caches in GPUs
- W. Jia, K. A. Shaw, and M. Martonosi. Characterizing and improving the use of demand-fetched caches in GPUs. In Intl. Conf. on Supercomputing, 2012.
- (2012) Intl. Conf. on Supercomputing
- Jia, W.¹ Shaw, K.A.² Martonosi, M.³

14
- 84904014280
- Many-thread aware prefetching mechanisms for GPGPU applications
- J. Lee et al. Many-thread aware prefetching mechanisms for GPGPU applications. In Intl. Symp. Microarchitecture, 2010.
- (2010) Intl. Symp. Microarchitecture
- Lee, J.¹

15
- 84860351946
- TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture
- J. Lee and H. Kim. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture. In Intl. Symp. on High Performance Computer Architecture, 2012.
- (2012) Intl. Symp. on High Performance Computer Architecture
- Lee, J.¹ Kim, H.²

16
- 77950987305
- Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
- J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In Intl. Conf. Computer Design, 2009.
- (2009) Intl. Conf. Computer Design
- Meng, J.¹ Skadron, K.²

17
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In Intl. Symp. Computer Architecture, 2010.
- (2010) Intl. Symp. Computer Architecture
- Meng, J.¹ Tarjan, D.² Skadron, K.³

18
- 77955655912
- CACTI 6.0: A tool to model large caches
- N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi. CACTI 6.0: A tool to model large caches. Technical report, HP Laboratories, 2009.
- (2009) Technical Report, HP Laboratories
- Muralimanohar, N.¹ Balasubramonian, R.² Jouppi, N.P.³

19
- 84863342255
- Improving GPU performance via large warps and two-level warp scheduling
- V. Narasiman et al. Improving GPU performance via large warps and two-level warp scheduling. In Intl. Symp. Microarchitecture, 2011.
- (2011) Intl. Symp. Microarchitecture
- Narasiman, V.¹

20
- 84904014282
- NVIDIA Corp
- NVIDIA Corp. NVIDIA's Next Generation CUDA Compute Architecture: Fermi, 2009. v. 1.1.
- (2009) NVIDIA's Next Generation CUDA Compute Architecture: Fermi , v. 1.1

21
- 84904014283
- NVIDIA Corp
- NVIDIA Corp. Tuning CUDA Applications for Fermi, 2011. v. 1.5.
- (2011) Tuning CUDA Applications for Fermi , v. 1.5

22
- 0003850954
- Prentice Hall, 2nd edition
- J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital Integrated Circuits: A Design Perspective, chapter 12. Prentice Hall, 2nd edition, 2003.
- (2003) Digital Integrated Circuits: A Design Perspective, Chapter 12
- Rabaey, J.M.¹ Chandrakasan, A.² Nikolic, B.³

23
- 0001583271
- Memory access scheduling
- S. Rixner et al. Memory access scheduling. In Intl. Symp. Computer Architecture, 2000.
- (2000) Intl. Symp. Computer Architecture
- Rixner, S.¹

24
- 84876590572
- Cache-conscious wavefront scheduling
- T. G. Rogers, M. O'Connor, and T. M. Aamodt. Cache-conscious wavefront scheduling. In Intl. Symp. Microarchitecture, 2012.
- (2012) Intl. Symp. Microarchitecture
- Rogers, T.G.¹ O'Connor, M.² Aamodt, T.M.³

25
- 84870691946
- DL: A data layout transformation system for heterogeneous computing
- I.-J. Sung, D. Liu, and W.-M. Hwu. DL: A data layout transformation system for heterogeneous computing. In Innovative Parallel Computing, 2012.
- (2012) Innovative Parallel Computing
- Sung, I.-J.¹ Liu, D.² Hwu, W.-M.³

26
- 80053044270
- Understanding the impact of CUDA tuning techniques for Fermi
- Y. Torres and A. Gonzalez-Escribano. Understanding the impact of CUDA tuning techniques for Fermi. In Intl. Conf. on High Performance Computing and Simulation, 2011.
- (2011) Intl. Conf. on High Performance Computing and Simulation
- Torres, Y.¹ Gonzalez-Escribano, A.²

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.