SCOPUS Information Search Platform

Proceedings of the International Conference on Parallel Processing Workshops

Volume , Issue , 2009, Pages 174-181

CUDA memory optimizations for large data-structures in the Gravit simulator

(3) Siegel, Jakob ᵃ; Ributzka, Juergen ᵃ; Li, Xiaoming ᵃ

ᵃ University of Delaware (United States)

Author keywords

CUDA; GPGPU; Memory layout; n body; Optimization

Indexed keywords

ACCESS PATTERNS; CUDA; GENERAL PURPOSE CPUS; GPU IMPLEMENTATION; GRAVITATIONAL FORCES; INSTRUCTION-LEVEL; LARGE DATA; LOOP UNROLLING; MEMORY HIERARCHY; MEMORY LAYOUT; MEMORY OPTIMIZATION; MEMORY USAGE; OPTIMIZING PROGRAMS; PERFORMANCE IMPROVEMENTS; PERFORMANCE OPTIMIZATIONS; PROGRAM OPTIMIZATION;

COMPUTER GRAPHICS EQUIPMENT; GRAVITATION; PARALLEL ALGORITHMS; PROGRAM PROCESSORS; STRUCTURED PROGRAMMING;

OPTIMIZATION;

EID: 77949494317 PISSN: 15302016 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICPPW.2009.78 Document Type: Conference Paper

Times cited : (10)

References (19)

1
- 77949509509
- home page, online
- Gravit home page. [online]. http://gravit.slowchop.com.
- Gravit

2
- 77949527681
- Open64. http://www.open64.net.
- Open64

3
- 84900342836
- SPEComp: A new benchmark suite for measuring parallel computer performance
- V. Aslot, M. Domeika, R. Eigenmann, G. Gaertner, W. Jones, and B. Parady. SPEComp: A new benchmark suite for measuring parallel computer performance. Lecture Notes in Computer Science, pages 1-10, 2001.
- (2001) Lecture Notes in Computer Science , pp. 1-10
- Aslot, V.¹ Domeika, M.² Eigenmann, R.³ Gaertner, G.⁴ Jones, W.⁵ Parady, B.⁶

4
- 33846349887
- A hierarchical O(N log N) force-calculation algorithm
- J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(6096):446-449, 1986.
- (1986) Nature , vol.324 , Issue.6096 , pp. 446-449
- Barnes, J.¹ Hut, P.²

5
- 0032630166
- T. M. Chilimbi, B. Davidson, and J. R. Larus. Cache-conscious structure definition. In PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, pages 13-24, New York, NY, USA, 1999. ACM Press. Separate a class into "hot" class and "cold" class.

6
- 17244375796
- T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, pages 1-12, New York, NY, USA, 1999. ACM Press. (1) Organize tree-like data structure together in cache. (2) Allocate contemporary elements in a cache block as much as possible.

7
- 84976745804
- Tile Selection Using Cache Organization and Data Layout
- June
- S. Coleman and K. S. McKinley. Tile Selection Using Cache Organization and Data Layout. In Proc. of Int. Conference Programming Language Design and Implementation, pages 279-290, June 1995.
- (1995) Proc. of Int. Conference Programming Language Design and Implementation , pp. 279-290
- Coleman, S.¹ McKinley, K.S.²

8
- 66349127226
- Turning FPGAs Into Supercomputers
- A. Dellson, G. Sandberg, and S. Möhl. Turning FPGAs Into Supercomputers. Cray User Group, 2006.
- (2006) Cray User Group
- Dellson, A.¹ Sandberg, G.² Möhl, S.³

9
- 3042686026
- The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling
- P. Kisubi, P. Knijnenburg, and M. O'Boyle. The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling. In Proc. of the International Conference on Parallel Architectures and Compilation Techniques, pages 237-246, 2000.
- (2000) Proc. of the International Conference on Parallel Architectures and Compilation Techniques , pp. 237-246
- Kisubi, P.¹ Knijnenburg, P.² O'Boyle, M.³

10
- 0026137116
- The Cache Performance and Optimizations of Blocked Algorithms
- October
- M. Lam, E. Rothberg, and M. E. Wolf. The Cache Performance and Optimizations of Blocked Algorithms. In Proc. of the Int. conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 63-74, October 1991.
- (1991) Proc. of the Int. conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 63-74
- Lam, M.¹ Rothberg, E.² Wolf, M.E.³

11
- 44849094749
- chapter 32, GPU Gems. Addison-Wesley
- J. P. Lars Nyland, Mark Harris. Fast N-Body Simulation with CUDA, chapter 32, pages 677-695. GPU Gems. Addison-Wesley, 2007.
- (2007) Fast N-Body Simulation with CUDA , pp. 677-695
- Lars Nyland, J.P.¹ Harris, M.²

12
- 31844446709
- Automatic pool allocation: Improving performance by controlling data structure layout in the heap
- C. Lattner and V. Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. SIGPLAN Not., 40(6):129-142, 2005.
- (2005) SIGPLAN Not , vol.40 , Issue.6 , pp. 129-142
- Lattner, C.¹ Adve, V.²

13
- 77949526544
- C. NVIDIA. NVIDIA CUDA Compute Unified Device Architecture Programming Guide, 1.1 edition, 11 2007.

14
- 0033076195
- Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
- February
- P. Panda, H. Nakamura, N. Dutt, and A. Nicolau. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance. IEEE Trans. on Computers, 48(2):142-149, February 1999.
- (1999) IEEE Trans. on Computers , vol.48 , Issue.2 , pp. 142-149
- Panda, P.¹ Nakamura, H.² Dutt, N.³ Nicolau, A.⁴

15
- 0031622954
- Data Transformations for Eliminating Conflict Misses
- June
- G. Rivera and C. Tseng. Data Transformations for Eliminating Conflict Misses. In Proc. of Int. Conference Programming Language Design and Implementation, pages 38-49, June 1998.
- (1998) Proc. of Int. Conference Programming Language Design and Implementation , pp. 38-49
- Rivera, G.¹ Tseng, C.²

16
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- New York, NY, USA, ACM
- S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, pages 195-204, New York, NY, USA, 2008. ACM.
- (2008) CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

17
- 0343462141
- Automated Empirical Optimizations of Software and the ATLAS Project
- R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 27(1-2):3-35, 2001.
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Whaley, R.¹ Petitet, A.² Dongarra, J.³

18
- 0003651470
- Addison-Wesley Longman Publishing Co, Inc. Boston, MA, USA
- M. Woo and M. Sheridan. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2. Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA, 1999.
- (1999) OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2
- Woo, M.¹ Sheridan, M.²

19
- 0038378242
- A Comparison of Empirical and Model-driven Optimization
- June
- K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzarán, D. Padua, K. Pingali, P. Stodghill, and P. Wu. A Comparison of Empirical and Model-driven Optimization. In Proc. of Programming Language Design and Implementation, pages 63-76, June 2003.
- (2003) Proc. of Programming Language Design and Implementation , pp. 63-76
- Yotov, K.¹ Li, X.² Ren, G.³ Cibulskis, M.⁴ DeJong, G.⁵ Garzarán, M.⁶ Padua, D.⁷ Pingali, K.⁸ Stodghill, P.⁹ Wu, P.¹⁰

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.