Volume , Issue , 2009, Pages 174-181

CUDA memory optimizations for large data-structures in the Gravit simulator

Author keywords

CUDA; GPGPU; Memory layout; n body; Optimization

Indexed keywords

ACCESS PATTERNS; CUDA; GENERAL PURPOSE CPUS; GPU IMPLEMENTATION; GRAVITATIONAL FORCES; INSTRUCTION-LEVEL; LARGE DATA; LOOP UNROLLING; MEMORY HIERARCHY; MEMORY LAYOUT; MEMORY OPTIMIZATION; MEMORY USAGE; OPTIMIZING PROGRAMS; PERFORMANCE IMPROVEMENTS; PERFORMANCE OPTIMIZATIONS; PROGRAM OPTIMIZATION;

EID: 77949494317     PISSN: 15302016     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICPPW.2009.78     Document Type: Conference Paper
Times cited : (10)

References (19)
  • 1
    • 77949509509
    • home page, online
    • Gravit home page. [online]. http://gravit.slowchop.com.
    • Gravit
  • 2
    • 77949527681
    • Open64. http://www.open64.net.
    • Open64
  • 4
    • 33846349887
    • A hierarchical O(N log N) force-calculation algorithm
    • J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(6096):446-449, 1986.
    • (1986) Nature , vol.324 , Issue.6096 , pp. 446-449
    • Barnes, J.1    Hut, P.2
  • 5
    • 0032630166
    • T. M. Chilimbi, B. Davidson, and J. R. Larus. Cache-conscious structure definition. In PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, pages 13-24, New York, NY, USA, 1999. ACM Press. Separate a class into "hot" class and "cold" class.
  • 6
    • 17244375796
    • T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, pages 1-12, New York, NY, USA, 1999. ACM Press. (1) Organize tree-like data structure together in cache. (2) Allocate contemporary elements in a cache block as much as possible.
  • 12
    • 31844446709
    • Automatic pool allocation: Improving performance by controlling data structure layout in the heap
    • C. Lattner and V. Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. SIGPLAN Not., 40(6):129-142, 2005.
    • (2005) SIGPLAN Not , vol.40 , Issue.6 , pp. 129-142
    • Lattner, C.1    Adve, V.2
  • 13
    • 77949526544
    • NVIDIA Corporation. NVIDIA CUDA Compute Unified Device Architecture Programming Guide, 1.1 edition, November 2007.
  • 14
    • 0033076195
    • Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
    • P. Panda, H. Nakamura, N. Dutt, and A. Nicolau. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance. IEEE Trans. on Computers, 48(2):142-149, February 1999.
    • (1999) IEEE Trans. on Computers , vol.48 , Issue.2 , pp. 142-149
    • Panda, P.1    Nakamura, H.2    Dutt, N.3    Nicolau, A.4
  • 17
    • 0343462141
    • Automated Empirical Optimizations of Software and the ATLAS Project
    • R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 27(1-2):3-35, 2001.
    • (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
    • Whaley, R.1    Petitet, A.2    Dongarra, J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.