Volume 27, Issue 2, 2013, Pages 193-209

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

Author keywords

Blue Gene P; code generation; high performance computing; performance optimization; SIMD

Indexed keywords

BLUE GENE/P; CODE GENERATION; HIGH-PERFORMANCE COMPUTING; PERFORMANCE OPTIMIZATIONS; SIMD

EID: 84877260365     PISSN: 10943420     EISSN: 17412846     Source Type: Journal    
DOI: 10.1177/1094342012444795     Document Type: Article
Times cited : (4)

References (45)
  • 2
    • 60649098999
    • 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors
    • Araya-Polo M, Rubio F, De R, Hanzich M, María J. 3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors. Scientific Programming. 2009 ; 17: 185-198
    • (2009) Scientific Programming , vol.17 , pp. 185-198
    • Araya-Polo, M.1    Rubio, F.2    De, R.3    Hanzich, M.4    María, J.5
  • 4
    • 48749141209
    • Adaptive mesh refinement for hyperbolic partial differential equations
    • Berger MJ, Oliger J. Adaptive mesh refinement for hyperbolic partial differential equations. Journal of Computational Physics. 1984 ; 53: 484-512
    • (1984) Journal of Computational Physics , vol.53 , pp. 484-512
    • Berger, M.J.1    Oliger, J.2
  • 7
    • 0031268141
    • Using integer linear programming for instruction scheduling and register allocation in multi-issue processors - 1
    • Chang C, Chen C, King C. Using integer linear programming for instruction scheduling and register allocation in multi-issue processors - 1. Computers and Mathematics with Applications. 1997 ; 34 (9). 1-14
    • (1997) Computers and Mathematics with Applications , vol.34 , Issue.9 , pp. 1-14
    • Chang, C.1    Chen, C.2    King, C.3
  • 8
    • 80051670105
    • Automatic code generation and tuning for stencil kernels on modern shared memory architectures
    • Christen M, Schenk O, Burkhart H. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. Computer Science - Research and Development. 2011 ; 26: 205-210
    • (2011) Computer Science - Research and Development , vol.26 , pp. 205-210
    • Christen, M.1    Schenk, O.2    Burkhart, H.3
  • 10
    • 77953972043
    • PhD thesis, EECS Department, University of California, Berkeley, CA
    • Datta K (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, EECS Department, University of California, Berkeley, CA. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-177.html.
    • (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms
    • Datta, K.1
  • 14
    • 4544335844
    • Vectorization for SIMD architectures with alignment constraints
    • Eichenberger A, Wu P, O'Brien K. Vectorization for SIMD architectures with alignment constraints. ACM SIGPLAN Notices. 2004 ; 39 (6). 82-93
    • (2004) ACM SIGPLAN Notices , vol.39 , Issue.6 , pp. 82-93
    • Eichenberger, A.1    Wu, P.2    O'Brien, K.3
  • 15
    • 64349099995
    • The Green500 List: Encouraging sustainable supercomputing
    • Feng W, Cameron K. The Green500 List: Encouraging Sustainable Supercomputing. Computer. 2007 ;: 50-55
    • (2007) Computer , pp. 50-55
    • Feng, W.1    Cameron, K.2
  • 20
    • 40749160036
    • Overview of the IBM Blue Gene/P project
    • Overview of the IBM Blue Gene/P project. IBM Journal of Research and Development. 2008 ; 52 (1/2). 199
    • (2008) IBM Journal of Research and Development , vol.52 , Issue.1-2 , pp. 199
  • 24
    • 79551674713
    • Exaflop/s: The why and the how
    • Keyes D. Exaflop/s: The why and the how. Comptes Rendus Mécanique. 2011 ; 339 (2-3). 70-77
    • (2011) Comptes Rendus Mécanique , vol.339 , Issue.2-3 , pp. 70-77
    • Keyes, D.1
  • 28
    • 44849137198
    • NVIDIA Tesla: A unified graphics and computing architecture
    • Lindholm E, Nickolls J, Oberman S, Montrym J. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro. 2008 ; 28 (2). 39-55
    • (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
    • Lindholm, E.1    Nickolls, J.2    Oberman, S.3    Montrym, J.4
  • 29
    • 50649115040
    • CorePy: High-productivity Cell/BE programming
    • Mueller C and Martin B (2007) CorePy: high-productivity Cell/BE programming. Applications for the Cell/BE, http://sti.cc.gatech.edu/Slides/Mueller-070619.pdf.
    • (2007) Applications for the Cell/BE
    • Mueller, C.1    Martin, B.2
  • 30
    • 79957475280
    • Intel's array building blocks: A retargetable, dynamic compiler and embedded language
    • Newburn C, So B, Liu Z, et al. (2011) Intel's Array Building Blocks: A Retargetable, Dynamic Compiler and Embedded Language. Proceedings of Code Generation and Optimization, http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2011/03/ArBB-CGO2011-distr.pdf.
    • (2011) Proceedings of Code Generation and Optimization
    • Newburn, C.1    So, B.2    Liu, Z.3
  • 32
    • 70449975635
    • High-order stencil computations on multicore clusters
    • Peng L, Seymour R, Nomura K-I, et al. High-order stencil computations on multicore clusters. Proceedings of IPDPS. 2009 ;: 1-11
    • (2009) Proceedings of IPDPS , pp. 1-11
    • Peng, L.1    Seymour, R.2    Nomura, K.-I.3
  • 33
    • 31344457004
    • Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor
    • Pham DC, Aipperspach T, Boerstler D, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits. 2006 ; 41: 179-196
    • (2006) IEEE Journal of Solid-State Circuits , vol.41 , pp. 179-196
    • Pham, D.C.1    Aipperspach, T.2    Boerstler, D.3
  • 37
    • 0037383334
    • High-order finite difference and finite volume WENO schemes and discontinuous Galerkin methods for CFD
    • Shu C. High-order finite difference and finite volume WENO schemes and discontinuous Galerkin methods for CFD. International Journal of Computational Fluid Dynamics. 2003 ; 17: 107-118
    • (2003) International Journal of Computational Fluid Dynamics , vol.17 , pp. 107-118
    • Shu, C.1
  • 43
    • 0034448098
    • Optimal instruction scheduling using integer programming
    • Wilken K. Optimal instruction scheduling using integer programming. ACM SIGPLAN Notices. 2000 ;:
    • (2000) ACM SIGPLAN Notices
    • Wilken, K.1
  • 45
    • 78650871519
    • Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
    • Wittmann M, Hager G, Treibig J, Wellein G. Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. Parallel Processing Letters. 2010 ; 20: 359-376
    • (2010) Parallel Processing Letters , vol.20 , pp. 359-376
    • Wittmann, M.1    Hager, G.2    Treibig, J.3    Wellein, G.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.