SCOPUS Information Search Platform

Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Volume , Issue , 2011, Pages

Extracting ultra-scale lattice Boltzmann performance via hierarchical and distributed auto-tuning

(4) Williams, Samuel (a); Oliker, Leonid (a); Carter, Jonathan (a); Shalf, John (a)

(a) LAWRENCE BERKELEY NATIONAL LABORATORY (United States)

Author keywords

Auto tuning; Bluegene; Hybrid programming models; Lattice Boltzmann; OpenMP; SIMD

Indexed keywords

AUTOTUNING; BLUEGENE; HYBRID PROGRAMMING MODEL; LATTICE BOLTZMANN; OPENMP; SIMD;

APPLICATION PROGRAMMING INTERFACES (API); COMPUTER SOFTWARE SELECTION AND EVALUATION; MAGNETOHYDRODYNAMICS; NUMERICAL METHODS; OPTIMIZATION;

PARALLEL PROGRAMMING;

EID: 83155188480 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2063384.2063458 Document Type: Conference Paper

Times cited : (34)

References (39)

1
- 26344468007
- A model for collisional processes in gases I: Small amplitude processes in charged and neutral one-component systems
- P. Bhatnagar, E. Gross, and M. Krook. A model for collisional processes in gases I: small amplitude processes in charged and neutral one-component systems. Phys. Rev., 94:511, 1954.
- (1954) Phys. Rev. , vol.94 , pp. 511
- Bhatnagar, P.¹ Gross, E.² Krook, M.³

2
- 1242267320
- Cambridge University Press
- D. Biskamp. Magnetohydrodynamic Turbulence. Cambridge University Press, 2003.
- (2003) Magnetohydrodynamic Turbulence
- Biskamp, D.¹

3
- 84899683182
- Magnetohydrodynamic turbulence simulations on the earth simulator using the lattice Boltzmann method
- Seattle, WA
- J. Carter, M. Soe, L. Oliker, Y. Tsuda, G. Vahala, L. Vahala, and A. Macnab. Magnetohydrodynamic turbulence simulations on the earth simulator using the lattice Boltzmann method. In SC05, Seattle, WA, 2005.
- (2005) SC05
- Carter, J.¹ Soe, M.² Oliker, L.³ Tsuda, Y.⁴ Vahala, G.⁵ Vahala, L.⁶ Macnab, A.⁷

4
- 77953980209
- Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
- Atlanta, Georgia
- A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. In International Conference on Parallel and Distributed Computing Systems (IPDPS), Atlanta, Georgia, 2010.
- (2010) International Conference on Parallel and Distributed Computing Systems (IPDPS)
- Chandramowlishwaran, A.¹ Williams, S.² Oliker, L.³ Lashuk, I.⁴ Biros, G.⁵ Vuduc, R.⁶

5
- 70449959487
- CHiLL: A framework for composing high-level loop transformations
- June
- C. Chen, J. Chame, and M. Hall. CHiLL: A framework for composing high-level loop transformations. Technical Report 08-897, University of Southern California, June 2008.
- (2008) Technical Report 08-897, University of Southern California
- Chen, C.¹ Chame, J.² Hall, M.³

6
- 59749100826
- Optimization and performance modeling of stencil computations on modern microprocessors
- K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. A. Yelick. Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review, 51(1):129-159, 2009.
- (2009) SIAM Review , vol.51 , Issue.1 , pp. 129-159
- Datta, K.¹ Kamil, S.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.A.⁶

7
- 70350771127
- Stencil computation optimization and autotuning on state-of-the-art multicore architectures
- Nov.
- K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and autotuning on state-of-the-art multicore architectures. In Proc. SC2008: High performance computing, networking, and storage conference, Nov. 2008.
- (2008) Proc. SC2008: High Performance Computing, Networking, and Storage Conference
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

8
- 84971423310
- Auto-tuning the 27-point stencil for multicore
- K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Auto-tuning the 27-point stencil for multicore. In Proc. iWAPT2009: The Fourth International Workshop on Automatic Performance Tuning, 2009.
- (2009) Proc. IWAPT2009: The Fourth International Workshop on Automatic Performance Tuning
- Datta, K.¹ Williams, S.² Volkov, V.³ Carter, J.⁴ Oliker, L.⁵ Shalf, J.⁶ Yelick, K.⁷

9
- 0037054259
- Lattice kinetic schemes for magnetohydrodynamics
- P. Dellar. Lattice kinetic schemes for magnetohydrodynamics. J. Comput. Phys., 79, 2002.
- (2002) J. Comput. Phys. , vol.79
- Dellar, P.¹

10
- 0031636309
- FFTW: An adaptive software architecture for the FFT
- IEEE
- M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing, volume 3, pages 1381-1384. IEEE, 1998.
- (1998) Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing , vol.3 , pp. 1381-1384
- Frigo, M.¹ Johnson, S.G.²

11
- 34547539633
- Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations
- Boston, MA
- M. Frigo and V. Strumpen. Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations. In Proc. of the 19th ACM International Conference on Supercomputing (ICS05), Boston, MA, 2005.
- (2005) Proc. of the 19th ACM International Conference on Supercomputing (ICS05)
- Frigo, M.¹ Strumpen, V.²

12
- 77954022347
- An auto-tuning framework for parallel multicore stencil computations
- Atlanta, Georgia
- S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams. An auto-tuning framework for parallel multicore stencil computations. In International Conference on Parallel and Distributed Computing Systems (IPDPS), Atlanta, Georgia, 2010.
- (2010) International Conference on Parallel and Distributed Computing Systems (IPDPS)
- Kamil, S.¹ Chan, C.² Oliker, L.³ Shalf, J.⁴ Williams, S.⁵

13
- 84958661690
- Impact of modern memory subsystems on cache optimizations for stencil computations
- ACM
- S. Kamil, P. Husbands, L. Oliker, J. Shalf, and K. Yelick. Impact of modern memory subsystems on cache optimizations for stencil computations. In Memory System Performance, pages 36-43. ACM, 2005.
- (2005) Memory System Performance , pp. 36-43
- Kamil, S.¹ Husbands, P.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

14
- 33645446819
- Lattice Boltzmann model for dissipative MHD
- Montreux, Switzerland, June 17-21
- A. Macnab, G. Vahala, L. Vahala, and P. Pavlo. Lattice Boltzmann model for dissipative MHD. In Proc. 29th EPS Conference on Controlled Fusion and Plasma Physics, volume 26B, Montreux, Switzerland, June 17-21, 2002.
- (2002) Proc. 29th EPS Conference on Controlled Fusion and Plasma Physics , vol.26 B
- Macnab, A.¹ Vahala, G.² Vahala, L.³ Pavlo, P.⁴

15
- 74049134929
- Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors
- K. Madduri, S. Williams, S. Ethier, L. Oliker, J. Shalf, E. Strohmaier, and K. Yelick. Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors. In Proc. SC2009: High performance computing, networking, and storage conference, 2009.
- (2009) Proc. SC2009: High Performance Computing, Networking, and Storage Conference
- Madduri, K.¹ Williams, S.² Ethier, S.³ Oliker, L.⁴ Shalf, J.⁵ Strohmaier, E.⁶ Yelick, K.⁷

16
- 0000979764
- Lattice Boltzmann magnetohydrodynamics
- June
- D. Martinez, S. Chen, and W. Matthaeus. Lattice Boltzmann magnetohydrodynamics. Physics of Plasmas, 1:1850-1867, June 1994.
- (1994) Physics of Plasmas , vol.1 , pp. 1850-1867
- Martinez, D.¹ Chen, S.² Matthaeus, W.³

17
- 34547503691
- Time skewing: A value-based approach to optimizing for memory locality
- Rutgers University
- J. McCalpin and D. Wonnacott. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, Department of Computer Science, Rutgers University, 1999.
- (1999) Technical Report DCS-TR-379, Department of Computer Science
- McCalpin, J.¹ Wonnacott, D.²

18
- 74049146136
- Minimizing communication in sparse matrix solvers
- M. Mohiyuddin, M. Hoemmen, J. Demmel, and K. Yelick. Minimizing communication in sparse matrix solvers. In Proc. SC2009: High performance computing, networking, and storage conference, 2009. http://dx.doi.org/10.1145/1654059.1654096.
- (2009) Proc. SC2009: High Performance Computing, Networking, and Storage Conference
- Mohiyuddin, M.¹ Hoemmen, M.² Demmel, J.³ Yelick, K.⁴

19
- 78650806116
- 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
- Washington, DC, USA, IEEE Computer Society
- A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey. 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC'10, pages 1-13, Washington, DC, USA, 2010. IEEE Computer Society.
- (2010) Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC'10 , pp. 1-13
- Nguyen, A.¹ Satish, N.² Chhugani, J.³ Kim, C.⁴ Dubey, P.⁵

20
- 52249091740
- Efficient algorithms for ghost cell updates on two classes of MPP architectures
- B. Palmer and J. Nieplocha. Efficient algorithms for ghost cell updates on two classes of MPP architectures. In Proc. PDCS International Conference on Parallel and Distributed Computing Systems, pages 192-197, 2002.
- (2002) Proc. PDCS International Conference on Parallel and Distributed Computing Systems , pp. 192-197
- Palmer, B.¹ Nieplocha, J.²

21
- 42749090414
- Progress in lattice Boltzmann methods for magnetohydrodynamic flows relevant to fusion applications
- M. Pattison, K. Premnath, N. Morley, and M. Abdou. Progress in lattice Boltzmann methods for magnetohydrodynamic flows relevant to fusion applications. Fusion Eng. Des., 83:557-572, 2008.
- (2008) Fusion Eng. Des. , vol.83 , pp. 557-572
- Pattison, M.¹ Premnath, K.² Morley, N.³ Abdou, M.⁴

22
- 1242352441
- Optimization and profiling of the cache performance of parallel lattice Boltzmann codes
- T. Pohl, M. Kowarschik, J. Wilke, K. Iglberger, and U. Rüde. Optimization and profiling of the cache performance of parallel lattice Boltzmann codes. Parallel Processing Letters, 13(4):549, 2003.
- (2003) Parallel Processing Letters , vol.13 , Issue.4 , pp. 549
- Pohl, T.¹ Kowarschik, M.² Wilke, J.³ Iglberger, K.⁴ Rüde, U.⁵

23
- 84905678244
- SPIRAL Project. http://www.spiral.net.
- SPIRAL Project

24
- 0345025793
- STREAM: Sustainable memory bandwidth in high performance computers. http://www.cs.virginia.edu/stream.
- STREAM: Sustainable Memory Bandwidth in High Performance Computers

25
- 0003841357
- Oxford Science Publ.
- S. Succi. The Lattice Boltzmann equation for fluids and beyond. Oxford Science Publ., 2001.
- (2001) The Lattice Boltzmann Equation for Fluids and beyond
- Succi, S.¹

26
- 32844469834
- Top500 Supercomputer Sites. http://www.top500.org.
- Top500 Supercomputer Sites

27
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- Institute of Physics Publishing, June
- R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. of SciDAC 2005, J. of Physics: Conference Series. Institute of Physics Publishing, June 2005.
- (2005) Proc. of SciDAC 2005, J. of Physics: Conference Series
- Vuduc, R.¹ Demmel, J.² Yelick, K.³

28
- 70449657442
- Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
- G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske. Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In International Computer Software and Applications Conference, pages 579-586, 2009.
- (2009) International Computer Software and Applications Conference , pp. 579-586
- Wellein, G.¹ Hager, G.² Zeiser, T.³ Wittmann, M.⁴ Fehske, H.⁵

29
- 33646809359
- On the single processor performance of simple lattice Boltzmann kernels
- Nov. ISSN 0045-7930
- G. Wellein, T. Zeiser, G. Hager, and S. Donath. On the single processor performance of simple lattice Boltzmann kernels. Computers & Fluids, 35(8-9):910-919, Nov. 2006. ISSN 0045-7930.
- (2006) Computers & Fluids , vol.35 , Issue.8-9 , pp. 910-919
- Wellein, G.¹ Zeiser, T.² Hager, G.³ Donath, S.⁴

30
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- DOI 10.1016/S0167-8191(00)00087-9
- R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1-2):3-35, 2001. (Pubitemid 32264775)
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Clint Whaley, R.¹ Petitet, A.² Dongarra, J.J.³

31
- 65649090648
- PhD thesis, EECS Department, University of California, Berkeley, December
- S. Williams. Auto-tuning Performance on Multicore Computers. PhD thesis, EECS Department, University of California, Berkeley, December 2008.
- (2008) Auto-tuning Performance on Multicore Computers
- Williams, S.¹

32
- 51049106193
- Lattice Boltzmann simulation optimization on leading multicore platforms
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Lattice Boltzmann simulation optimization on leading multicore platforms. In International Parallel & Distributed Processing Symposium, 2008.
- (2008) International Parallel & Distributed Processing Symposium
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

33
- 67650998701
- Lattice Boltzmann simulation optimization on leading multicore platforms
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Lattice Boltzmann simulation optimization on leading multicore platforms. Journal of Parallel and Distributed Computing, 69(9):762-777, 2009.
- (2009) Journal of Parallel and Distributed Computing , vol.69 , Issue.9 , pp. 762-777
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

34
- 83155177858
- Resource-efficient, hierarchical auto-tuning of a hybrid lattice Boltzmann computation on the Cray XT4
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Resource-efficient, hierarchical auto-tuning of a hybrid lattice Boltzmann computation on the Cray XT4. In Proc. CUG09: Cray User Group meeting, 2009.
- (2009) Proc. CUG09: Cray User Group Meeting
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

35
- 56749158843
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proc. SC2007: High performance computing, networking, and storage conference, 2007.
- (2007) Proc. SC2007: High Performance Computing, Networking, and Storage Conference
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

36
- 67650797544
- Roofline: An insightful visual performance model for floating-point programs and multicore architectures
- April
- S. Williams, A. Waterman, and D. Patterson. Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Communications of the ACM, April 2009.
- (2009) Communications of the ACM
- Williams, S.¹ Waterman, A.² Patterson, D.³

37
- 0000331979
- Lattice Boltzmann method for 3D flows with curved boundary
- D. Yu, R. Mei, W. Shyy, and L. Luo. Lattice Boltzmann method for 3D flows with curved boundary. Journal of Comp. Physics, 161:680-699, 2000.
- (2000) Journal of Comp. Physics , vol.161 , pp. 680-699
- Yu, D.¹ Mei, R.² Shyy, W.³ Luo, L.⁴

38
- 73849092882
- Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems
- T. Zeiser, G. Hager, and G. Wellein. Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems. Parallel Processing Letters, 19(4):491-511, 2009.
- (2009) Parallel Processing Letters , vol.19 , Issue.4 , pp. 491-511
- Zeiser, T.¹ Hager, G.² Wellein, G.³

39
- 56349170328
- Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method
- T. Zeiser, G. Wellein, A. Nitsure, K. Iglberger, U. Rüde, and G. Hager. Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method. Progress in Computational Fluid Dynamics, 8, 2008.
- (2008) Progress in Computational Fluid Dynamics , vol.8
- Zeiser, T.¹ Wellein, G.² Nitsure, A.³ Iglberger, K.⁴ Rüde, U.⁵ Hager, G.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.