Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

2012, Pages 33-42

Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme

Authors (3): Pai, Sreepathi (a); Govindarajan, R. (a); Thazhuthaveetil, Matthew J. (a)

(a) Indian Institute of Science (India)

Author keywords

Automatic; Data transfers; GPU; Memory management; Software coherence

Indexed keywords

AUTOMATIC; AUTOMATIC MEMORY MANAGEMENT; BENCHMARK SUITES; COMPILER ANALYSIS; COMPILER-ASSISTED; ERROR PRONES; GPU; MANUAL MEMORY-MANAGEMENT; MEMORY MANAGEMENT; MEMORY MANAGER; PERFORMANCE POTENTIALS; RODINIA; RUNTIMES; DATA TRANSFER; PARALLEL ARCHITECTURES; PROGRAM COMPILERS

EID: 84867509022
PISSN: 1089795X
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/2370816.2370824
Document Type: Conference Paper

Times cited: 40

References (21)

1
- 79959456077
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
- M. M. Baskaran, U. Bondhugula, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In PPoPP, 2008.
- (2008) PPoPP
- Baskaran, M.M.¹ Bondhugula, U.²

2
- 34249696738
- Parallel programmability and the chapel language
- B. L. Chamberlain, D. Callahan, and H. P. Zima. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 21:291-312, 2007.
- (2007) International Journal of High Performance Computing Applications , vol.21 , pp. 291-312
- Chamberlain, B.L.¹ Callahan, D.² Zima, H.P.³

3
- 31744441529
- X10: An object-oriented approach to non-uniform cluster computing
- P. Charles, C. Grothoff, V. Saraswat, et al. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA, 2005.
- (2005) OOPSLA
- Charles, P.¹ Grothoff, C.² Saraswat, V.³

4
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, et al. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.
- (2009) IISWC
- Che, S.¹ Boyer, M.² Meng, J.³

5
- 33646009337
- Optimizing compiler for the CELL processor
- A. E. Eichenberger, K. O'Brien, et al. Optimizing compiler for the CELL processor. In PACT, 2005.
- (2005) PACT
- Eichenberger, A.E.¹ O'Brien, K.²

6
- 77952251540
- An asymmetric distributed shared memory model for heterogeneous parallel systems
- I. Gelado, J. E. Stone, et al. An asymmetric distributed shared memory model for heterogeneous parallel systems. In ASPLOS, 2010.
- (2010) ASPLOS
- Gelado, I.¹ Stone, J.E.²

7
- 84867518389
- I. Gelado et al. GMAC: Global Memory for Accelerators (version 0.0.20). URL http://code.google.com/p/adsm/.
- GMAC: Global Memory for Accelerators (Version 0.0.20)
- Gelado, I.¹

8
- 79959904195
- Automatic CPU-GPU communication management and optimization
- T. B. Jablin, P. Prabhu, et al. Automatic CPU-GPU communication management and optimization. In PLDI, 2011.
- (2011) PLDI
- Jablin, T.B.¹ Prabhu, P.²

9
- 84863423999
- Dynamically managed data for CPU-GPU architectures
- T. B. Jablin, J. A. Jablin, et al. Dynamically Managed Data for CPU-GPU architectures. In CGO, March 2012.
- (2012) CGO
- Jablin, T.B.¹ Jablin, J.A.²

10
- 74349092397
- Khronos. OpenCL: The open standard for parallel programming of heterogeneous systems. URL http://www.khronos.org/opencl.
- OpenCL: The Open Standard for Parallel Programming of Heterogeneous Systems

11
- 78650802947
- OpenMPC: Extended OpenMP programming and tuning for GPUs
- S. Lee and R. Eigenmann. OpenMPC: Extended OpenMP programming and tuning for GPUs. In SC, 2010.
- (2010) SC
- Lee, S.¹ Eigenmann, R.²

12
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- S. Lee, S.-J. Min, and R. Eigenmann. OpenMP to GPGPU: a compiler framework for automatic translation and optimization. In PPoPP, 2009.
- (2009) PPoPP
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

13
- 77954995885
- Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
- V. W. Lee, C. Kim, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In ISCA, 2010.
- (2010) ISCA
- Lee, V.W.¹ Kim, C.²

14
- 84873052000
- NVIDIA. CUDA: Compute Unified Device Architecture. URL http://developer.nvidia.com/cuda.
- CUDA: Compute Unified Device Architecture

15
- 84856182857
- NVIDIA. NVIDIA CUDA C Programming Guide version 4.0. 2011.
- (2011) NVIDIA CUDA C Programming Guide Version 4.0

16
- 84867511421
- G. Paoloni. How to benchmark code execution times on Intel IA-32 and IA-64 instruction set architectures. 2010.
- (2010) How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures
- Paoloni, G.¹

17
- 70450263364
- Programming model for a heterogeneous x86 platform
- B. Saha, X. Zhou, et al. Programming model for a heterogeneous x86 platform. In PLDI, 2009.
- (2009) PLDI
- Saha, B.¹ Zhou, X.²

18
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, et al. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3), 2008.
- (2008) ACM Trans. Graph , vol.27 , Issue.3
- Seiler, L.¹ Carmean, D.²

19
- 84867410835
- Using the high productivity language chapel to target GPGPU architectures
- A. Sidelnik, B. L. Chamberlain, M. J. Garzaran, and D. Padua. Using the High Productivity Language Chapel to target GPGPU architectures. Technical report, UIUC Dept. of Computer Science, 2011.
- (2011) Technical Report, UIUC Dept. of Computer Science
- Sidelnik, A.¹ Chamberlain, B.L.² Garzaran, M.J.³ Padua, D.⁴

20
- 70350441970
- TOP500.org. The Top 500. URL http://www.top500.org/.
- TOP500.org

21
- 84867556832
- X10 2.1 CUDA
- x10-lang.org. X10 2.1 CUDA. URL http://docs.codehaus.org/display/XTENLANG/X10+2.1+CUDA.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.