SCOPUS Information Search Platform

International Journal of High Performance Computing Applications

Volume 27, Issue 2, 2013, Pages 193-209

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

(5) Malas, Tareq (a); Ahmadia, Aron J (a); Brown, Jed (b); Gunnels, John A (c); Keyes, David E (a)

a KING ABDULLAH UNIVERSITY OF SCIENCE AND TECHNOLOGY (Saudi Arabia)

b ARGONNE NATIONAL LABORATORY (United States)

c IBM T J WATSON RESEARCH CENTER (United States)

Author keywords

Blue Gene P; code generation; high performance computing; performance optimization; SIMD

Indexed keywords

BLUE GENE/P; CODE GENERATION; HIGH-PERFORMANCE COMPUTING; PERFORMANCE OPTIMIZATIONS; SIMD;

COMPUTER ARCHITECTURE; OPTIMIZATION; PARTIAL DIFFERENTIAL EQUATIONS; SUPERCOMPUTERS;

NUMERICAL METHODS;

EID: 84877260365 PISSN: 10943420 EISSN: 17412846 Source Type: Journal
DOI: 10.1177/1094342012444795 Document Type: Article

Times cited : (4)

References (45)

1
- 85062050911
- New York: ACM Press;
- Ananthanarayanan R, Esser SK, Simon HD, Modha DS Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09). New York: ACM Press ; 2009: 1-63.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09) , pp. 1-63
- Ananthanarayanan, R.¹ Esser, S.K.² Simon, H.D.³ Modha, D.S.⁴

2
- 60649098999
- 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors
- Araya-Polo M, Rubio F, De R, Hanzich M, María J. 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors. Scientific Programming. 2009 ; 17: 185-198
- (2009) Scientific Programming , vol.17 , pp. 185-198
- Araya-Polo, M.¹ Rubio, F.² De, R.³ Hanzich, M.⁴ María, J.⁵

3
- 84973836157
- The NAS parallel benchmarks
- Bailey D, Barszcz E, Barton J, et al. The NAS parallel benchmarks. International Journal of High Performance Computing Applications. 1991 ; 5 (3). 63
- (1991) International Journal of High Performance Computing Applications , vol.5 , Issue.3 , pp. 63
- Bailey, D.¹ Barszcz, E.² Barton, J.³

4
- 48749141209
- Adaptive mesh refinement for hyperbolic partial differential equations
- Berger MJ, Oliger J. Adaptive mesh refinement for hyperbolic partial differential equations. Journal of Computational Physics. 1984 ; 53: 484-512
- (1984) Journal of Computational Physics , vol.53 , pp. 484-512
- Berger, M.J.¹ Oliger, J.²

5
- 0000493064
- Estimating interlock and improving balance for pipelined architectures - 1
- Callahan D, Cocke J, Kennedy K. Estimating interlock and improving balance for pipelined architectures - 1. Journal of Parallel and Distributed Computing. 1988 ; 5: 334-358
- (1988) Journal of Parallel and Distributed Computing , vol.5 , pp. 334-358
- Callahan, D.¹ Cocke, J.² Kennedy, K.³

6
- 0028549474
- Improving the ratio of memory operations to floating-point operations in loops
- Carr S, Kennedy K. Improving the ratio of memory operations to floating-point operations in loops. ACM Transactions on Programming Languages and Systems (TOPLAS). 1994 ; 16: 1768-1810
- (1994) ACM Transactions on Programming Languages and Systems (TOPLAS) , vol.16 , pp. 1768-1810
- Carr, S.¹ Kennedy, K.²

7
- 0031268141
- Using integer linear programming for instruction scheduling and register allocation in multi-issue processors - 1
- Chang C, Chen C, King C. Using integer linear programming for instruction scheduling and register allocation in multi-issue processors - 1. Computers and Mathematics with Applications. 1997 ; 34 (9). 1-14
- (1997) Computers and Mathematics with Applications , vol.34 , Issue.9 , pp. 1-14
- Chang, C.¹ Chen, C.² King, C.³

8
- 80051670105
- Automatic code generation and tuning for stencil kernels on modern shared memory architectures
- Christen M, Schenk O, Burkhart H. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. Computer Science - Research and Development. 2011 ; 26: 205-210
- (2011) Computer Science - Research and Development , vol.26 , pp. 205-210
- Christen, M.¹ Schenk, O.² Burkhart, H.³

9
- 84877247710
- Piscataway, NJ: IEEE Press;
- Christen M, Schenk O, Neufeld E, Messmer P, Burkhart H 2009 IEEE International Symposium on Parallel and Distributed Processing. Piscataway, NJ: IEEE Press ; 2009: 1-10.
- (2009) 2009 IEEE International Symposium on Parallel and Distributed Processing , pp. 1-10
- Christen, M.¹ Schenk, O.² Neufeld, E.³ Messmer, P.⁴ Burkhart, H.⁵

10
- 77953972043
- PhD thesis, EECS Department, University of California, Berkeley, CA
- Datta K (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, EECS Department, University of California, Berkeley, CA. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-177.html.
- (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms
- Datta, K.¹

11
- 84971423310
- Auto-tuning the 27-point Stencil for Multicore
- Datta K, Williams S, Volkov V, et al. (2009) Auto-tuning the 27-point Stencil for Multicore. In Proc. iWAPT2009: The Fourth International Workshop on Automatic Performance Tuning. http://crd.lbl.gov/~oliker/papers/iwapt09.pdf.
- (2009) Proc. iWAPT2009: The Fourth International Workshop on Automatic Performance Tuning
- Datta, K.¹ Williams, S.² Volkov, V.³

12
- 79951595196
- The international exascale software project roadmap
- Dongarra J, Beckman P, Moore T, et al. The international exascale software project roadmap. International Journal of High Performance Computing Applications. 2011 ; 25 (1). 3-60
- (2011) International Journal of High Performance Computing Applications , vol.25 , Issue.1 , pp. 3-60
- Dongarra, J.¹ Beckman, P.² Moore, T.³

13
- 70350630432
- Berlin: Springer-Verlag;
- Dursun H, Nomura K-I, Peng L, et al Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Euro-Par '09). Berlin: Springer-Verlag ; 2009: 642-653.
- (2009) Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Euro-Par '09) , pp. 642-653
- Dursun, H.¹ Nomura, K.-I.² Peng, L.³

14
- 4544335844
- Vectorization for SIMD architectures with alignment constraints
- Eichenberger A, Wu P, O'Brien K. Vectorization for SIMD architectures with alignment constraints. ACM SIGPLAN Notices. 2004 ; 39 (6). 82-93
- (2004) ACM SIGPLAN Notices , vol.39 , Issue.6 , pp. 82-93
- Eichenberger, A.¹ Wu, P.² O'Brien, K.³

15
- 64349099995
- The Green500 List: Encouraging sustainable supercomputing
- Feng W, Cameron K. The Green500 List: Encouraging Sustainable Supercomputing. Computer. 2007 ;: 50-55
- (2007) Computer , pp. 50-55
- Feng, W.¹ Cameron, K.²

16
- 84877245515
- Berkeley, CA: USENIX Association;
- Ganapathi A, Datta K, Fox A, Patterson D Proceedings of the First USENIX conference on Hot topics in parallelism. Berkeley, CA: USENIX Association ; 2009 :
- (2009) Proceedings of the First USENIX Conference on Hot Topics in Parallelism
- Ganapathi, A.¹ Datta, K.² Fox, A.³ Patterson, D.⁴

17
- 84877262456
- New York: ACM Press;
- Ghoting A, Makarychev K Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09). New York: ACM Press ; 2009: 61.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09) , pp. 61
- Ghoting, A.¹ Makarychev, K.²

18
- 84976829023
- Postpass code optimization of pipeline constraints
- Hennessy JL, Gross T. Postpass code optimization of pipeline constraints. ACM Transactions on Programming Language Systems. 1983 ; 5: 422-448
- (1983) ACM Transactions on Programming Language Systems , vol.5 , pp. 422-448
- Hennessy, J.L.¹ Gross, T.²

19
- 79953274591
- New York: Springer;
- Henretty T, Stock K, Pouchet L, Franchetti F, Ramanujam J, Sadayappan P Compiler Construction. New York: Springer ; 2011: 225-245.
- (2011) Compiler Construction , pp. 225-245
- Henretty, T.¹ Stock, K.² Pouchet, L.³ Franchetti, F.⁴ Ramanujam, J.⁵ Sadayappan, P.⁶

20
- 40749160036
- Overview of the IBM Blue Gene/P project
- Overview of the IBM Blue Gene/P project. IBM Journal of Research and Development. 2008 ; 52 (1/2). 199
- (2008) IBM Journal of Research and Development , vol.52 , Issue.1-2 , pp. 199

21
- 79551702774
- Piscataway, NJ: IEEE Press;
- Kamil S, Chan C, Oliker L, Shalf J, Williams S 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS). Piscataway, NJ: IEEE Press ; 2010: 1-12.
- (2010) 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS) , pp. 1-12
- Kamil, S.¹ Chan, C.² Oliker, L.³ Shalf, J.⁴ Williams, S.⁵

22
- 34547500808
- Implicit and explicit optimizations for stencil computations
- Kamil S, Datta K, Williams S, Oliker L, Shalf J, Yelick K. Implicit and explicit optimizations for stencil computations. Proceedings of the 2006 workshop on Memory system performance and correctness - MSPC '06. 2006 ;: 51
- (2006) Proceedings of the 2006 Workshop on Memory System Performance and Correctness - MSPC '06 , pp. 51
- Kamil, S.¹ Datta, K.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

23
- 84958661690
- Impact of modern memory subsystems on cache optimizations for stencil computations
- Kamil S, Husbands P, Oliker L, Shalf J, Yelick K. Impact of modern memory subsystems on cache optimizations for stencil computations. Memory System Performance. 2005 ;: 36-43
- (2005) Memory System Performance , pp. 36-43
- Kamil, S.¹ Husbands, P.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

24
- 79551674713
- Exaflop/s: The why and the how
- Keyes D. Exaflop/s: The why and the how. Comptes Rendus Mécanique. 2011 ; 339 (2-3). 70-77
- (2011) Comptes Rendus Mécanique , vol.339 , Issue.2-3 , pp. 70-77
- Keyes, D.¹

25
- 66749092384
- Kogge P, Bergman K, Borkar S, et al. (2008) Exascale computing study: Technology challenges in achieving exascale systems. http://www.cse.nd.edu/Reports/2008/TR-2008-13.pdf
- (2008) Exascale Computing Study: Technology Challenges in Achieving Exascale Systems
- Kogge, P.¹ Bergman, K.² Borkar, S.³

26
- 35448944792
- Effective automatic parallelization of stencil computations
- Krishnamoorthy S, Baskaran M, Bondhugula U, Ramanujam J, Rountev A, Sadayappan P. Effective automatic parallelization of stencil computations. ACM Sigplan Notices. 2007 ; 42 (6). 235
- (2007) ACM Sigplan Notices , vol.42 , Issue.6 , pp. 235
- Krishnamoorthy, S.¹ Baskaran, M.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

27
- 24644456455
- Automatic tiling of iterative stencil loops
- Li Z, Song Y. Automatic tiling of iterative stencil loops. ACM Transactions on Programming Languages and Systems (TOPLAS). 2004 ; 26: 975-1028
- (2004) ACM Transactions on Programming Languages and Systems (TOPLAS) , vol.26 , pp. 975-1028
- Li, Z.¹ Song, Y.²

28
- 44849137198
- NVIDIA Tesla: A unified graphics and computing architecture
- Lindholm E, Nickolls J, Oberman S, Montrym J. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro. 2008 ; 28 (2). 39-55
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

29
- 50649115040
- CorePy: High-productivity Cell/BE programming
- Mueller C and Martin B (2007) CorePy: high-productivity Cell/BE programming. Applications for the Cell/BE, http://sti.cc.gatech.edu/Slides/Mueller-070619.pdf.
- (2007) Applications for the Cell/BE
- Mueller, C.¹ Martin, B.²

30
- 79957475280
- Intel's array building blocks: A retargetable, dynamic compiler and embedded language
- Newburn C, So B, Liu Z, et al. (2011) Intel's Array Building Blocks: A Retargetable, Dynamic Compiler and Embedded Language. Proceedings of Code Generation and Optimization, http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/03/ArBB-CGO2011-distr.pdf.
- (2011) Proceedings of Code Generation and Optimization
- Newburn, C.¹ So, B.² Liu, Z.³

31
- 78650806116
- 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
- Nguyen A, Satish N, Chhugani J, Kim C, Dubey P. 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. Proceedings of SuperComputing. 2010 ;: 1-13
- (2010) Proceedings of SuperComputing , pp. 1-13
- Nguyen, A.¹ Satish, N.² Chhugani, J.³ Kim, C.⁴ Dubey, P.⁵

32
- 70449975635
- High-order stencil computations on multicore clusters
- Peng L, Seymour R, Nomura K-I, et al. High-order stencil computations on multicore clusters. Proceedings of IPDPS. 2009 ;: 1-11
- (2009) Proceedings of IPDPS , pp. 1-11
- Peng, L.¹ Seymour, R.² Nomura, K.-I.³

33
- 31344457004
- Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor
- Pham DC, Aipperspach T, Boerstler D, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits. 2006 ; 41: 179-196
- (2006) IEEE Journal of Solid-State Circuits , vol.41 , pp. 179-196
- Pham, D.C.¹ Aipperspach, T.² Boerstler, D.³

34
- 84877288469
- New York: ACM Press;
- Richards DF, Glosli JN, Chan B, et al Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09). New York: ACM Press ; 2009: 1.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09) , pp. 1
- Richards, D.F.¹ Glosli, J.N.² Chan, B.³

35
- 23544482118
- Los Alamitos, CA: IEEE Computer Society Press;
- Rivera G, Tseng CW Proceedings of SC'00. Los Alamitos, CA: IEEE Computer Society Press ; 2000 :
- (2000) Proceedings of SC'00
- Rivera, G.¹ Tseng, C.W.²

36
- 84877252225
- New York: ACM Press;
- Seiler L, Carmean D, Sprangle E, et al ACM SIGGRAPH 2008 papers. New York: ACM Press ; 2008: 1-15.
- (2008) ACM SIGGRAPH 2008 Papers , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³

37
- 0037383334
- High-order finite difference and finite volume WENO schemes and discontinuous Galerkin methods for CFD
- Shu C. High-order finite difference and finite volume WENO schemes and discontinuous Galerkin methods for CFD. International Journal of Computational Fluid Dynamics. 2003 ; 17: 107-118
- (2003) International Journal of Computational Fluid Dynamics , vol.17 , pp. 107-118
- Shu, C.¹

38
- 35449003235
- New York: ACM Press;
- Solar-Lezama A, Arnold G, Tancau L, Bodik R, Saraswat V, Seshia S Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM Press ; 2007: 167-178.
- (2007) Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 167-178
- Solar-Lezama, A.¹ Arnold, G.² Tancau, L.³ Bodik, R.⁴ Saraswat, V.⁵ Seshia, S.⁶

39
- 70350747817
- IBM System Blue Gene Solution: Blue Gene/P Application Development. 2008 :
- (2008) IBM System Blue Gene Solution: Blue Gene/P Application Development

40
- 79959673844
- New York: ACM Press;
- Tang Y, Chowdhury RA, Kuszmaul BC, Luk C-K, Leiserson CE Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11). New York: ACM Press ; 2011: 117.
- (2011) Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '11) , pp. 117
- Tang, Y.¹ Chowdhury, R.A.² Kuszmaul, B.C.³ Luk, C.-K.⁴ Leiserson, C.E.⁵

41
- 77955113636
- Wyrzykowski R, Dongarra J, Karczewski K, Wasniewski J, eds. Berlin: Springer;
- Treibig J, Hager G Parallel Processing and Applied Mathematics (Lecture Notes in Computer Science). Wyrzykowski R, Dongarra J, Karczewski K, Wasniewski J, eds. Berlin: Springer ; 2010: 615-624.
- (2010) Parallel Processing and Applied Mathematics (Lecture Notes in Computer Science) , pp. 615-624
- Treibig, J.¹ Hager, G.²

42
- 70449657442
- Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
- Wellein G, Hager G, Zeiser T, Wittmann M, Fehske H. Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. 2009 33rd Annual IEEE International Computer Software and Applications Conference. 2009 ;: 579-586
- (2009) 2009 33rd Annual IEEE International Computer Software and Applications Conference , pp. 579-586
- Wellein, G.¹ Hager, G.² Zeiser, T.³ Wittmann, M.⁴ Fehske, H.⁵

43
- 0034448098
- Optimal instruction scheduling using integer programming
- Wilken K. Optimal instruction scheduling using integer programming. ACM SIGPLAN Notices. 2000 ;:
- (2000) ACM SIGPLAN Notices
- Wilken, K.¹

44
- 51049106193
- Lattice Boltzmann simulation optimization on leading multicore platforms
- Williams S, Carter J, Oliker L, Shalf J, Yelick K. Lattice Boltzmann simulation optimization on leading multicore platforms. 2008 IEEE International Symposium on Parallel and Distributed Processing. 2008 ;: 1-14
- (2008) 2008 IEEE International Symposium on Parallel and Distributed Processing , pp. 1-14
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

45
- 78650871519
- Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
- Wittmann M, Hager G, Treibig J, Wellein G. Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. Parallel Processing Letters. 2010 ; 20: 359-376
- (2010) Parallel Processing Letters , vol.20 , pp. 359-376
- Wittmann, M.¹ Hager, G.² Treibig, J.³ Wellein, G.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.