



Volume , Issue , 2010, Pages

5-D blocking optimization for stencil computations on modern CPUs and GPUs

Author keywords

[No Author keywords available]

Indexed keywords

BLOCKING ALGORITHMS; DATA-LEVEL PARALLELISM; LARGE CLASS; LATTICE BOLTZMANN METHOD; MEMORY BANDWIDTHS; MULTIPLE TIME STEP; NEAREST-NEIGHBORS; ON CHIP MEMORY; PRECISION FLOATING POINT; SPATIAL GRIDS; STENCIL COMPUTATIONS; TEMPORAL BLOCKING;

EID: 78650806116     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/SC.2010.2     Document Type: Conference Paper
Times cited: 260

References (34)
  • 1
    • 48749141209
    • Adaptive mesh refinement for hyperbolic partial differential equations
    • M. Berger and J. Oliger, "Adaptive mesh refinement for hyperbolic partial differential equations," Journal of Computational Physics, vol. 53, no. 1, pp. 484-512, 1984.
    • (1984) Journal of Computational Physics , vol.53 , Issue.1 , pp. 484-512
    • Berger, M.1    Oliger, J.2
  • 2
    • 0000148916
    • Salinity-driven thermocline transients in a wind- and thermohaline-forced isopycnic coordinate model of the North Atlantic
    • R. Bleck, C. Rooth, D. Hu, and L. T. Smith, "Salinity-driven thermocline transients in a wind- and thermohaline-forced isopycnic coordinate model of the North Atlantic," Journal of Physical Oceanography, vol. 22, no. 12, pp. 1486-1505, 1992.
    • (1992) Journal of Physical Oceanography , vol.22 , Issue.12 , pp. 1486-1505
    • Bleck, R.1    Rooth, C.2    Hu, D.3    Smith, L.T.4
  • 4
    • 0028714453
    • Multiresolution molecular dynamics for realistic materials modeling on parallel computers
    • A. Nakano, P. Vashishta, and R. K. Kalra, "Multiresolution molecular dynamics for realistic materials modeling on parallel computers," Computer Physics Communications, vol. 83, no. 1, pp. 197-214, 1994.
    • (1994) Computer Physics Communications , vol.83 , Issue.1 , pp. 197-214
    • Nakano, A.1    Vashishta, P.2    Kalra, R.K.3
  • 6
    • 38849206150
    • Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications
    • F. Shimojo, R. K. Kalia, A. Nakano, and P. Vashishta, "Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications," Physical Review B, vol. 77, pp. 1-12, 2008.
    • (2008) Physical Review B , vol.77 , pp. 1-12
    • Shimojo, F.1    Kalia, R.K.2    Nakano, A.3    Vashishta, P.4
  • 10
    • 77953972043
    • Auto-tuning stencil codes for cache-based multicore platforms
    • K. Datta, "Auto-tuning stencil codes for cache-based multicore platforms," Ph.D. dissertation, EECS Department, University of California, Berkeley, Dec 2009. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-177.html.
    • (2009) Auto-tuning Stencil Codes for Cache-based Multicore Platforms
    • Datta, K.1
  • 12
    • 33947307610
    • The memory behavior of cache oblivious stencil computations
    • M. Frigo and V. Strumpen, "The memory behavior of cache oblivious stencil computations," J. Supercomput., vol. 39, no. 2, pp. 93-112, 2007.
    • (2007) J. Supercomput. , vol.39 , Issue.2 , pp. 93-112
    • Frigo, M.1    Strumpen, V.2
  • 18
    • 67650998701
    • Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms
    • S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms," J. Parallel Distrib. Comput., vol. 69, no. 9, pp. 762-777, 2009.
    • (2009) J. Parallel Distrib. Comput. , vol.69 , Issue.9 , pp. 762-777
    • Williams, S.1    Carter, J.2    Oliker, L.3    Shalf, J.4    Yelick, K.5
  • 19
    • 59749100826
    • Optimization and performance modeling of stencil computations on modern microprocessors
    • K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors," SIAM Rev., vol. 51, no. 1, pp. 129-159, 2009.
    • (2009) SIAM Rev. , vol.51 , Issue.1 , pp. 129-159
    • Datta, K.1    Kamil, S.2    Williams, S.3    Oliker, L.4    Shalf, J.5    Yelick, K.6
  • 20
    • 1242352441
    • Optimization and profiling of the cache performance of parallel lattice Boltzmann codes in 2D and 3D
    • T. Pohl, M. Kowarschik, J. Wilke, K. Iglberger, and U. Rüde, "Optimization and profiling of the cache performance of parallel lattice Boltzmann codes in 2D and 3D," Parallel Processing Letters, vol. 13, no. 4, pp. 549-560, 2003.
    • (2003) Parallel Processing Letters , vol.13 , Issue.4 , pp. 549-560
    • Pohl, T.1    Kowarschik, M.2    Wilke, J.3    Iglberger, K.4    Rüde, U.5
  • 23
    • 79953269601
    • Efficient multicore-aware parallelization strategies for iterative stencil computations
    • J. Treibig, G. Wellein, and G. Hager, "Efficient multicore-aware parallelization strategies for iterative stencil computations," Submitted to Computing Research Repository (CoRR), vol. abs/1004.1741, 2010.
    • (2010) Computing Research Repository (CoRR)
    • Treibig, J.1    Wellein, G.2    Hager, G.3
  • 26
    • 77949484883
    • LBM based flow simulation using GPU computing processor
    • F. Kuznik, C. Obrecht, G. Rusaouen, and J.-J. Roux, "LBM based flow simulation using GPU computing processor," Computers & Mathematics with Applications, vol. 59, no. 7, pp. 2380-2392, 2010, Mesoscopic Methods in Engineering and Science, International Conferences on Mesoscopic Methods in Engineering and Science. [Online]. Available: http://www.sciencedirect.com/science/article/B6TYJ-4X9D5D0-3/2/9e7676667251dd6bdc7ea63fbc0232a8.
    • (2010) Computers & Mathematics with Applications , vol.59 , Issue.7 , pp. 2380-2392
    • Kuznik, F.1    Obrecht, C.2    Rusaouen, G.3    Roux, J.-J.4
  • 28
    • 67349120241
    • Implementation of a lattice-Boltzmann method for numerical fluid mechanics using the NVIDIA CUDA technology
    • E. Riegel, T. Indinger, and N. A. Adams, "Implementation of a lattice-Boltzmann method for numerical fluid mechanics using the NVIDIA CUDA technology," Computer Science - Research and Development, vol. 23, no. 3-4, pp. 241-247, 2009.
    • (2009) Computer Science - Research and Development , vol.23 , Issue.3-4 , pp. 241-247
    • Riegel, E.1    Indinger, T.2    Adams, N.A.3
  • 29
    • 72149122150
    • Implementation of a lattice Boltzmann kernel using the Compute Unified Device Architecture developed by NVIDIA
    • J. Tolke, "Implementation of a lattice Boltzmann kernel using the Compute Unified Device Architecture developed by NVIDIA," Comput. Vis. Sci., vol. 13, no. 1, pp. 29-39, 2009.
    • (2009) Comput. Vis. Sci. , vol.13 , Issue.1 , pp. 29-39
    • Tolke, J.1
  • 30
    • 80052538275
    • When multicore isn't enough: Trends and the future for multi-multicore systems
    • M. Reilly, "When multicore isn't enough: Trends and the future for multi-multicore systems," in HPEC, 2008.
    • (2008) HPEC
    • Reilly, M.1
  • 32
    • 35948991669
    • NVIDIA CUDA TM Programming Guide, Version 3.0
    • NVIDIA, "NVIDIA CUDA TM Programming Guide, Version 3.0," 2010. [Online]. Available: http://download.intel.com/pressroom/kits/32nm/westmere/Intel32nmOverview.pdf.
    • (2010) NVIDIA CUDA TM Programming Guide, Version 3.0
  • 33
    • 84976718540
    • Algorithms for scalable synchronization on shared-memory multiprocessors
    • J. M. Mellor-Crummey and M. L. Scott, "Algorithms for scalable synchronization on shared-memory multiprocessors," ACM Trans. Comput. Syst., vol. 9, no. 1, pp. 21-65, 1991.
    • (1991) ACM Trans. Comput. Syst. , vol.9 , Issue.1 , pp. 21-65
    • Mellor-Crummey, J.M.1    Scott, M.L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.