SCOPUS Information Search Platform

Concurrency and Computation: Practice and Experience

Volume 28, Issue 7, 2016, Pages 2295-2315

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations

Authors (5): Wittmann, Markus (a); Hager, Georg (a); Zeiser, Thomas (a); Treibig, Jan (a); Wellein, Gerhard (a)

(a) University of Erlangen-Nuremberg (Germany)

Author keywords

ECM performance model; energy optimization; lattice Boltzmann method

Indexed keywords

CLOCKS; COMPUTER AIDED SOFTWARE ENGINEERING; ENERGY UTILIZATION; KINETIC THEORY; MESSAGE PASSING;

COMMUNICATION OVERHEADS; ENERGY OPTIMIZATION; ENERGY SAVING POTENTIAL; LATTICE BOLTZMANN METHOD; MESSAGE PASSING INTERFACE; OPTIMAL OPERATING POINT; PERFORMANCE DEGRADATION; PERFORMANCE MODEL;

ENERGY CONSERVATION;

EID: 84929440437 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.3489 Document Type: Conference Paper

Times cited: 36

References (35)

1
- 33846243532
- Schulz M, Krafczyk M, Tölke J, Rank E. Parallelization strategies and efficiency of CFD computations in complex geometries using lattice Boltzmann methods on high performance computers. In High Performance Scientific and Engineering Computing: Proceedings of the 3rd International FORTWIHR Conference on HPSEC, Erlangen, March 12-14, 2001, vol. 21, Breuer M, Durst F, Zenger C (eds). Lecture Notes in Computational Science and Engineering. Springer: Berlin, Heidelberg, 2002; 115-122. DOI: 10.1016/j.jcp.2008.01.013.

2
- 1642342275
- Pan C, Prins JF, Miller CT. A high-performance lattice Boltzmann implementation to model flow in porous media. Computer Physics Communications 2004; 158 (2): 89-105.

3
- 27244459147
- Wang J, Zhang X, Bengough AG, Crawford JW. Domain-decomposition method for parallel lattice Boltzmann simulation of incompressible flow in porous media. Physical Review E 2005; 72 (1): 016706.

4
- 33646809359
- Wellein G, Zeiser T, Donath S, Hager G. On the single processor performance of simple lattice Boltzmann kernels. Computers & Fluids 2006; 35: 910-919.

5
- 51049102365
- Bernaschi M, Succi S, Fyta M, Kaxiras E, Melchionna S, Sircar JK. MUPHY: A parallel high performance MUlti PHYsics/Scale code. IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), 2008; 1-8. DOI: 10.1109/IPDPS.2008.4536464.

6
- 39449120748
- Mattila K, Hyvaluoma J, Timonen J, Rossi T. Comparison of implementations of the lattice-Boltzmann method. Computers & Mathematics with Applications 2008; 55 (7): 1514-1524.

7
- 77951435761
- Bailey P, Myre J, Walsh SDC, Lilja DJ, Saar MO. Accelerating lattice Boltzmann fluid flow simulations using graphics processors. IEEE International Conference on Parallel Processing 2009 (ICPP'09), 2009; 550-557. DOI: 10.1109/ICPP.2009.38.

8
- 70350719194
- Vidal D, Roy R, Bertrand F. On improving the performance of large parallel lattice Boltzmann flow simulations in heterogeneous porous media. Computers & Fluids 2010; 39 (2): 324-337.

9
- 84896617265
- Zudrop J, Klimach H, Hasert M, Masilamani K, Roller S. A fully distributed CFD framework for massively parallel systems. Cray Users Group Conference 2012, Stuttgart, Germany, 2012. (Available from: https://cug.org/proceedings/attendee-program-cug2012/includes/files/pap136.pdf) [Accessed on 26 March 2015].

10
- 84875236800
- Wittmann M, Zeiser T, Hager G, Wellein G. Comparison of different propagation steps for lattice Boltzmann methods. Computers & Mathematics with Applications 2013; 65 (6): 924-935.

11
- 65949107549
- Williams S, Waterman A, Patterson D. Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM 2009; 52 (4): 65-76.

12
- 78650813616
- Peters A, Melchionna S, Kaxiras E, Lätt J, Sircar JK, Bernaschi M, Bisson M, Succi S. Multiscale simulation of cardiovascular flows on the IBM Bluegene/P: full heart-circulation system at red-blood cell resolution. Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking and Storage, SC 2010, New Orleans, LA, USA, IEEE, November 13-19, 2010; 1-10. DOI: 10.1109/SC.2010.33.

13
- 84899683182
- Carter J, Soe M, Oliker L, Tsuda Y, Vahala G, Vahala L, Macnab A. Magnetohydrodynamic turbulence simulations on the Earth Simulator using the lattice Boltzmann method. Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking and Storage (SC05), Seattle, WA, November 12-18, 2005.

14
- 67650998701
- Williams S, Carter J, Oliker L, Shalf J, Yelick K. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms. Journal of Parallel and Distributed Computing 2009; 69 (9): 762-777.

15
- 84962933449
- Keller V. Energy-to-solution: a today's metric for tomorrow's concerns. Talk at the Symposium on Future Generations of Processors and Systems (FGPS'175), Mons, Belgium, November 9, 2012. (Available from: http://www.ig.fpms.ac.be/sites/default/files/FGPS175-Keller.pdf) [Accessed on 26 March 2015].

16
- 84883192796
- Bode A. Energy to solution: a new mission for parallel computing. In Euro-Par 2013 Parallel Processing, vol. 8097, Wolf F, Mohr B, an Mey D (eds). Lecture Notes in Computer Science. Springer: Berlin, Heidelberg, 2013; 1-2. (Available from: http://dx.doi.org/10.1007/978-3-642-40047-6_1).

17
- 77958509771
- Bekas C, Curioni A. A new energy aware performance metric. Computer Science - Research and Development 2010; 25 (3-4): 187-195.

18
- 84861042376
- Hsu C-H, Kuehn JA, Poole SW. Towards efficient supercomputing: searching for the right efficiency metric. Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, ICPE '12, ACM, New York, NY, USA, 2012; 157-162.

19
- 84870895827
- Li D, de Supinski BR, Schulz M, Nikolopoulos DS, Cameron KW. Strategies for energy efficient resource management of hybrid programming models. IEEE Transactions on Parallel and Distributed Systems 2012; 99. DOI: 10.1109/TPDS.2012.95.

20
- 84956679644
- Hager G, Treibig J, Habich J, Wellein G. Exploring performance and power properties of modern multicore chips via simple machine models. Concurrency and Computation: Practice and Experience 2014. DOI: 10.1002/cpe.3180.

21
- 85086888962
- Schöne R, Hackenberg D, Molka D. Memory performance at reduced CPU clock speeds: an analysis of current x86-64 processors. Proceedings of the 2012 USENIX Conference on Power-Aware Computing and Systems, HotPower'12, USENIX Association, Berkeley, CA, USA, 2012. (Available from: https://www.usenix.org/system/files/conference/hotpower12/hotpower12-final5.pdf) [Accessed on 26 March 2015].

22
- 84884824369
- Choi JW, Bedard D, Fowler R, Vuduc R. A roofline model of energy. 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), 2013; 661-672. DOI: 10.1109/IPDPS.2013.77.

23
- 73849092882
- Zeiser T, Hager G, Wellein G. Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems. Parallel Processing Letters 2009; 19 (4): 491-511.

24
- 40549109949
- Ginzburg I, Verhaeghe F, d'Humieres D. Two-relaxation-time lattice Boltzmann scheme: about parametrization, velocity, pressure and mixed boundary conditions. Communications in Computational Physics 2008; 3 (2): 427-478.

25
- 84884480524
- SuperMUC petascale system. (Available from: http://www.lrz.de/services/compute/supermuc) [Accessed on 26 March 2015].

26
- 77955113636
- Treibig J, Hager G. Introducing a performance model for bandwidth-limited loop kernels. In Proceedings of the Workshop Memory Issues on Multi- and Manycore Platforms at PPAM 2009, the 8th International Conference on Parallel Processing and Applied Mathematics, vol. 6067, Lecture Notes in Computer Science. Springer: Wroclaw, Poland, 2010; 615-624.

27
- 24444442056
- Schönauer W. Scientific Supercomputing: Architecture and Use of Shared and Distributed Memory Parallel Computers (Self-edition), 2000. (Available from: http://www.rz.uni-karlsruhe.de/~rx03/book) [Accessed on 26 March 2015].

28
- 78650871519
- Wittmann M, Hager G, Treibig J, Wellein G. Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. Parallel Processing Letters 2010; 20 (4): 359-376.

29
- 84877281508
- Treibig J, Hager G, Hofmann HG, Hornegger J, Wellein G. Pushing the limits for medical image reconstruction on recent standard multicore processors. International Journal of High Performance Computing Applications 2013; 27 (2): 162-177.

30
- 85050288436
- Intel. Intel Architecture Code Analyzer, June 2012. (Available from: http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/) [Accessed on 26 March 2015].

31
- 67650784628
- Suleman MA, Qureshi MK, Patt YN. Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. SIGARCH Computer Architecture News 2008; 36 (1): 277-286.

32
- 34447569672
- Intel Corp. Intel 64 and IA-32 Architectures Software Developer's Manual, 2013. (Available from: http://download.intel.com/products/processor/manual/325384.pdf) [Accessed on 26 March 2015].

33
- 84859729360
- Rotem E, Naveh A, Ananthakrishnan A, Rajwan D, Weissmann E. Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 2012; 32: 20-27.

34
- 84962953842
- Huber H. LRZ, Private Communication.

35
- 84962964547
- Intel Corp. Intel MPI benchmarks. (Available from: http://software.intel.com/en-us/articles/intel-mpi-benchmarks) [Accessed on 26 March 2015].

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.