SCOPUS Information Search Platform

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010

Volume , Issue , 2010, Pages

5-D blocking optimization for stencil computations on modern CPUs and GPUs

(5) Nguyen, Anthonyᵃ Satish, Nadathurᵃ Chhugani, Jatinᵃ Kim, Changkyuᵃ Dubey, Pradeepᵃ

ᵃ INTEL CORPORATION (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BLOCKING ALGORITHMS; DATA-LEVEL PARALLELISM; LARGE CLASS; LATTICE BOLTZMANN METHOD; MEMORY BANDWIDTHS; MULTIPLE TIME STEP; NEAREST-NEIGHBORS; ON CHIP MEMORY; PRECISION FLOATING POINT; SPATIAL GRIDS; STENCIL COMPUTATIONS; TEMPORAL BLOCKING;

ALGORITHMS; BANDWIDTH;

PROGRAM PROCESSORS;

EID: 78650806116 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SC.2010.2 Document Type: Conference Paper

Times cited : (260)

References (34)

1
- 48749141209
- Adaptive mesh refinement for hyperbolic partial differential equations
- M. Berger and J. Oliger, "Adaptive mesh refinement for hyperbolic partial differential equations," Journal of Computational Physics, vol. 53, no. 1, pp. 484-512, 1984.
- (1984) Journal of Computational Physics , vol.53 , Issue.1 , pp. 484-512
- Berger, M.¹ Oliger, J.²

2
- 0000148916
- Salinity-driven thermocline transients in a wind- and thermohaline-forced isopycnic coordinate model of the north atlantic
- R. Bleck, C. Rooth, D. Hu, and L. T. Smith, "Salinity-driven thermocline transients in a wind- and thermohaline-forced isopycnic coordinate model of the north atlantic," Journal of Physical Oceanography, vol. 22, no. 12, pp. 1486-1505, 1992.
- (1992) Journal of Physical Oceanography , vol.22 , Issue.12 , pp. 1486-1505
- Bleck, R.¹ Rooth, C.² Hu, D.³ Smith, L.T.⁴

3
- 70350630432
- A multilevel parallelization framework for high-order stencil computations
- H. Dursun, K.-I. Nomura, L. Peng, R. Seymour, W. Wang, R. K. Kalia, A. Nakano, and P. Vashishta, "A multilevel parallelization framework for high-order stencil computations," in Euro-Par, 2009, pp. 642-653.
- (2009) Euro-Par , pp. 642-653
- Dursun, H.¹ Nomura, K.I.² Peng, L.³ Seymour, R.⁴ Wang, W.⁵ Kalia, R.K.⁶ Nakano, A.⁷ Vashishta, P.⁸

4
- 0028714453
- Multiresolution molecular dynamics for realistic materials modeling on parallel computers
- A. Nakano, P. Vashishta, and R. K. Kalia, "Multiresolution molecular dynamics for realistic materials modeling on parallel computers," Computer Physics Communications, vol. 83, no. 1, pp. 197-214, 1994.
- (1994) Computer Physics Communications , vol.83 , Issue.1 , pp. 197-214
- Nakano, A.¹ Vashishta, P.² Kalia, R.K.³

5
- 34548752231
- Towards optimal multi-level tiling for stencil computations
- L. Renganarayanan, M. Harthikote-Matha, R. Dewri, and S. V. Rajopadhye, "Towards optimal multi-level tiling for stencil computations," in IPDPS, 2007, pp. 1-10.
- (2007) IPDPS , pp. 1-10
- Renganarayanan, L.¹ Harthikote-Matha, M.² Dewri, R.³ Rajopadhye, S.V.⁴

6
- 38849206150
- Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications
- F. Shimojo, R. K. Kalia, A. Nakano, and P. Vashishta, "Divide-and-conquer density functional theory on hierarchical real-space grids: parallel implementation and applications," Physical Review B, vol. 77, pp. 1-12, 2008.
- (2008) Physical Review B , vol.77 , pp. 1-12
- Shimojo, F.¹ Kalia, R.K.² Nakano, A.³ Vashishta, P.⁴

7
- 77954729232
- "Intel Advanced Vector Extensions Programming Reference," http://softwarecommunity.intel.com/isn/downloads/intelavx/Intel-AVX-Programming-Reference-31943302.pdf, 2008.
- (2008) Intel Advanced Vector Extensions Programming Reference

8
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- August
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: a many-core x86 architecture for visual computing," ACM Trans. Graph., vol. 27, no. 3, pp. 1-15, August 2008.
- (2008) ACM Trans. Graph. , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

9
- 77954966585
- [Online]. Available
- Nikolaj Leischner and Vitaly Osipov and Peter Sanders, "Fermi Architecture White Paper," 2009. [Online]. Available: http://www.nvidia.com/content/PDF/fermiwhitepapers/NVIDIAFermiComputeArchitectureWhitepaper.pdf.
- (2009) Fermi Architecture White Paper
- Leischner, N.¹ Osipov, V.² Sanders, P.³

10
- 77953972043
- Ph.D. dissertation, University of California, Berkeley, Dec, [Online]. Available
- K. Datta, "Auto-tuning stencil codes for cache-based multicore platforms," Ph.D. dissertation, EECS Department, University of California, Berkeley, Dec 2009. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-177.html.
- (2009) Auto-tuning Stencil Codes for Cache-based Multicore Platforms
- Datta, K.¹

11
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- Piscataway, NJ, USA: IEEE Press
- K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures," in SC'08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing. Piscataway, NJ, USA: IEEE Press, 2008, pp. 1-12.
- (2008) SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing , pp. 1-12
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

12
- 33947307610
- The memory behavior of cache oblivious stencil computations
- M. Frigo and V. Strumpen, "The memory behavior of cache oblivious stencil computations," J. Supercomput., vol. 39, no. 2, pp. 93-112, 2007.
- (2007) J. Supercomput. , vol.39 , Issue.2 , pp. 93-112
- Frigo, M.¹ Strumpen, V.²

13
- 78650849839
- Enabling temporal blocking for a lattice boltzmann flow solver through multicore-aware wavefront parallelization
- J. Habich, T. Zeiser, G. Hager, and G. Wellein, "Enabling temporal blocking for a lattice boltzmann flow solver through multicore-aware wavefront parallelization," 21st International Conference on Parallel Computational Fluid Dynamics, pp. 178-182, 2009.
- (2009) 21st International Conference on Parallel Computational Fluid Dynamics , pp. 178-182
- Habich, J.¹ Zeiser, T.² Hager, G.³ Wellein, G.⁴

14
- 34547500808
- Implicit and explicit optimizations for stencil computations
- New York, NY, USA: ACM
- S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Implicit and explicit optimizations for stencil computations," in MSPC'06: Proceedings of the 2006 workshop on Memory system performance and correctness. New York, NY, USA: ACM, 2006, pp. 51-60.
- (2006) MSPC'06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness , pp. 51-60
- Kamil, S.¹ Datta, K.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

15
- 67650671606
- 3d finite difference computation on gpus using cuda
- New York, NY, USA: ACM
- P. Micikevicius, "3d finite difference computation on gpus using cuda," in GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. New York, NY, USA: ACM, 2009, pp. 79-84.
- (2009) GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 79-84
- Micikevicius, P.¹

16
- 78649765479
- Tiling optimizations for 3d scientific computations
- Washington, DC, USA: IEEE Computer Society
- G. Rivera and C.-W. Tseng, "Tiling optimizations for 3d scientific computations," in Supercomputing'00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM). Washington, DC, USA: IEEE Computer Society, 2000, p. 32.
- (2000) Supercomputing'00: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM) , pp. 32
- Rivera, G.¹ Tseng, C.-W.²

17
- 70449657442
- Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
- Washington, DC, USA: IEEE Computer Society
- G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske, "Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization," in COMPSAC'09: Proceedings of the 2009 33rd Annual IEEE International Computer Software and Applications Conference. Washington, DC, USA: IEEE Computer Society, 2009, pp. 579-586.
- (2009) COMPSAC'09: Proceedings of the 2009 33rd Annual IEEE International Computer Software and Applications Conference , pp. 579-586
- Wellein, G.¹ Hager, G.² Zeiser, T.³ Wittmann, M.⁴ Fehske, H.⁵

18
- 67650998701
- Optimization of a lattice boltzmann computation on state-of-the-art multicore platforms
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Optimization of a lattice boltzmann computation on state-of-the-art multicore platforms," J. Parallel Distrib. Comput., vol. 69, no. 9, pp. 762-777, 2009.
- (2009) J. Parallel Distrib. Comput. , vol.69 , Issue.9 , pp. 762-777
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

19
- 59749100826
- Optimization and performance modeling of stencil computations on modern microprocessors
- K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors," SIAM Rev., vol. 51, no. 1, pp. 129-159, 2009.
- (2009) SIAM Rev. , vol.51 , Issue.1 , pp. 129-159
- Datta, K.¹ Kamil, S.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

20
- 1242352441
- Optimization and profiling of the cache performance of parallel lattice boltzmann codes in 2d and 3d
- T. Pohl, M. Kowarschik, J. Wilke, K. Iglberger, and U. Rüde, "Optimization and profiling of the cache performance of parallel lattice boltzmann codes in 2d and 3d," Parallel Processing Letters, vol. 13, no. 4, pp. 549-560, 2003.
- (2003) Parallel Processing Letters , vol.13 , Issue.4 , pp. 549-560
- Pohl, T.¹ Kowarschik, M.² Wilke, J.³ Iglberger, K.⁴ Rüde, U.⁵

21
- 34250216007
- Scientific computing kernels on the cell processor
- S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. Yelick, "Scientific computing kernels on the cell processor," International Journal of Parallel Programming, vol. 35, p. 2007, 2007.
- (2007) International Journal of Parallel Programming , vol.35 , pp. 2007
- Williams, S.¹ Shalf, J.² Oliker, L.³ Kamil, S.⁴ Husbands, P.⁵ Yelick, K.⁶

22
- 77954056084
- Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory
- M. Wittmann, G. Hager, and G. Wellein, "Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory," LSPP10: Workshop on Large-Scale Parallel Processing at IPDPS, 2010.
- (2010) LSPP10: Workshop on Large-Scale Parallel Processing at IPDPS
- Wittmann, M.¹ Hager, G.² Wellein, G.³

23
- 79953269601
- Efficient multicore-aware parallelization strategies for iterative stencil computations
- Submitted to, vol. abs/1004.1741
- J. Treibig, G. Wellein, and G. Hager, "Efficient multicore-aware parallelization strategies for iterative stencil computations," Submitted to Computing Research Repository (CoRR), vol. abs/1004.1741, 2010.
- (2010) Computing Research Repository (CoRR)
- Treibig, J.¹ Wellein, G.² Hager, G.³

24
- 77951435761
- Accelerating lattice boltzmann fluid flow simulations using graphics processors
- Vienna, Austria
- P. Bailey, J. Myre, S. Walsh, D. Lilja, and M. Saar, "Accelerating lattice boltzmann fluid flow simulations using graphics processors," in ICPP-2009: 38th International Conference on Parallel Processing, Vienna, Austria, 2009.
- (2009) ICPP-2009: 38th International Conference on Parallel Processing
- Bailey, P.¹ Myre, J.² Walsh, S.³ Lilja, D.⁴ Saar, M.⁵

25
- 70449378728
- Implementing the lattice boltzmann model on commodity graphics hardware
- June
- A. Kaufman, Z. Fan, and K. Petkov, "Implementing the lattice boltzmann model on commodity graphics hardware," Journal of Statistical Mechanics: Theory and Experiment, vol. 2009, June 2009.
- (2009) Journal of Statistical Mechanics: Theory and Experiment , vol.2009
- Kaufman, A.¹ Fan, Z.² Petkov, K.³

26
- 77949484883
- Lbm based flow simulation using gpu computing processor
- mesoscopic Methods in Engineering and Science, International Conferences on Mesoscopic Methods in Engineering and Science. [Online]. Available
- F. Kuznik, C. Obrecht, G. Rusaouen, and J.-J. Roux, "Lbm based flow simulation using gpu computing processor," Computers & Mathematics with Applications, vol. 59, no. 7, pp. 2380-2392, 2010, mesoscopic Methods in Engineering and Science, International Conferences on Mesoscopic Methods in Engineering and Science. [Online]. Available: http://www.sciencedirect.com/science/article/B6TYJ-4X9D5D0-3/2/9e7676667251dd6bdc7ea63fbc0232a8.
- (2010) Computers & Mathematics with Applications , vol.59 , Issue.7 , pp. 2380-2392
- Kuznik, F.¹ Obrecht, C.² Rusaouen, G.³ Roux, J.-J.⁴

27
- 51849160421
- Parallel lattice boltzmann flow simulation on emerging multi-core platforms
- Berlin, Heidelberg: Springer-Verlag
- L. Peng, K.-I. Nomura, T. Oyakawa, R. K. Kalia, A. Nakano, and P. Vashishta, "Parallel lattice boltzmann flow simulation on emerging multi-core platforms," in Euro-Par'08: Proceedings of the 14th international Euro-Par conference on Parallel Processing. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 763-777.
- (2008) Euro-Par'08: Proceedings of the 14th International Euro-Par Conference on Parallel Processing , pp. 763-777
- Peng, L.¹ Nomura, K.-I.² Oyakawa, T.³ Kalia, R.K.⁴ Nakano, A.⁵ Vashishta, P.⁶

28
- 67349120241
- Implementation of a lattice-boltzmann method for numerical fluid mechanics using the nvidia cuda technology
- E. Riegel, T. Indinger, and N. A. Adams, "Implementation of a lattice-boltzmann method for numerical fluid mechanics using the nvidia cuda technology," Computer Science - Research and Development, vol. 23, no. 3-4, pp. 241-247, 2009.
- (2009) Computer Science - Research and Development , vol.23 , Issue.3-4 , pp. 241-247
- Riegel, E.¹ Indinger, T.² Adams, N.A.³

29
- 72149122150
- Implementation of a lattice boltzmann kernel using the compute unified device architecture developed by nvidia
- J. Tolke, "Implementation of a lattice boltzmann kernel using the compute unified device architecture developed by nvidia," Comput. Vis. Sci., vol. 13, no. 1, pp. 29-39, 2009.
- (2009) Comput. Vis. Sci. , vol.13 , Issue.1 , pp. 29-39
- Tolke, J.¹

30
- 80052538275
- When multicore isn't enough: Trends and the future for multi-multicore systems
- M. Reilly, "When multicore isn't enough: Trends and the future for multi-multicore systems," in HPEC, 2008.
- (2008) HPEC
- Reilly, M.¹

31
- 51849135153
- "Intel SSE4 programming reference," 2007, http://www.intel.com/design/processor/manuals/253667.pdf.
- (2007) Intel SSE4 Programming Reference

32
- 35948991669
- NVIDIA, [Online]. Available
- NVIDIA, "NVIDIA CUDA TM Programming Guide, Version 3.0," 2010. [Online]. Available: http://download.intel.com/pressroom/kits/32nm/westmere/Intel32nmOverview.pdf.
- (2010) NVIDIA CUDA TM Programming Guide, Version 3.0

33
- 84976718540
- Algorithms for scalable synchronization on shared-memory multiprocessors
- J. M. Mellor-Crummey and M. L. Scott, "Algorithms for scalable synchronization on shared-memory multiprocessors," ACM Trans. Comput. Syst., vol. 9, no. 1, pp. 21-65, 1991.
- (1991) ACM Trans. Comput. Syst. , vol.9 , Issue.1 , pp. 21-65
- Mellor-Crummey, J.M.¹ Scott, M.L.²

34
- 78650843310
- Intel Corporation
- Intel Corporation, "Introduction to Intel's 32nm Process Technology," 2009.
- (2009) Introduction to Intel's 32nm Process Technology

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.