-
1
-
-
78650818575
-
Scalable Earthquake Simulation on Petascale Supercomputers
-
Y. Cui, K. Olsen, T. Jordan, K. Lee, J. Zhou, P. Small, D. Roten, G. Ely, D. Panda, A. Chourasia, J. Levesque, S. Day, and P. Maechling, "Scalable Earthquake Simulation on Petascale Supercomputers," in Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010), 2010, pp. 1-20.
-
Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010), 2010
, pp. 1-20
-
-
Cui, Y.1
Olsen, K.2
Jordan, T.3
Lee, K.4
Zhou, J.5
Small, P.6
Roten, D.7
Ely, G.8
Panda, D.9
Chourasia, A.10
Levesque, J.11
Day, S.12
Maechling, P.13
-
2
-
-
80053238973
-
PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures
-
M. Christen, O. Schenk, and H. Burkhart, "PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures," in Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2011), 2011, pp. 1-12.
-
Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2011), 2011
, pp. 1-12
-
-
Christen, M.1
Schenk, O.2
Burkhart, H.3
-
5
-
-
0042885467
-
On the Implementation of Perfectly Matched Layers in a 3D Fourth-Order Velocity-Stress Finite-Difference Scheme
-
C. Marcinkovich and K. Olsen, "On the Implementation of Perfectly Matched Layers in a 3D Fourth-Order Velocity-Stress Finite-Difference Scheme," J. Geophys. Res., vol. 108 (B5), 2003.
-
(2003)
J. Geophys. Res.
, vol.108
, Issue.B5
-
-
Marcinkovich, C.1
Olsen, K.2
-
6
-
-
70450077422
-
Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures
-
M. Christen, O. Schenk, E. Neufeld, P. Messmer, and H. Burkhart, "Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures," in Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2009), May 2009, pp. 1-10.
-
Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2009), May 2009
, pp. 1-10
-
-
Christen, M.1
Schenk, O.2
Neufeld, E.3
Messmer, P.4
Burkhart, H.5
-
8
-
-
77954709215
-
Cache oblivious parallelograms in iterative stencil computations
-
R. Strzodka, M. Shaheen, D. Pajak, and H. Seidel, "Cache oblivious parallelograms in iterative stencil computations," in Proc. ACM Int'l Conference on Supercomputing (ICS 2010), 2010, pp. 49-59.
-
Proc. ACM Int'l Conference on Supercomputing (ICS 2010), 2010
, pp. 49-59
-
-
Strzodka, R.1
Shaheen, M.2
Pajak, D.3
Seidel, H.4
-
9
-
-
79959673844
-
The Pochoir Stencil Compiler
-
Y. Tang, R. Chowdhury, B. Kuszmaul, C. Luk, and C. Leiserson, "The Pochoir Stencil Compiler," in Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011, pp. 117-128.
-
Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011
, pp. 117-128
-
-
Tang, Y.1
Chowdhury, R.2
Kuszmaul, B.3
Luk, C.4
Leiserson, C.5
-
10
-
-
79958773431
-
Efficient Multicore-Aware Parallelization Strategies for Iterative Stencil Computations
-
J. Treibig, G. Wellein, and G. Hager, "Efficient Multicore-Aware Parallelization Strategies for Iterative Stencil Computations," J. Comp. Sci., vol. 2, no. 2, pp. 130-137, 2011.
-
(2011)
J. Comp. Sci.
, vol.2
, Issue.2
, pp. 130-137
-
-
Treibig, J.1
Wellein, G.2
Hager, G.3
-
12
-
-
79959601133
-
Mint: Realizing CUDA Performance in 3D Stencil Methods with Annotated C
-
D. Unat, X. Cai, and S. Baden, "Mint: Realizing CUDA Performance in 3D Stencil Methods with Annotated C," in Proc. ACM Int'l Conference on Supercomputing (ICS 2011), 2011, pp. 214-224.
-
Proc. ACM Int'l Conference on Supercomputing (ICS 2011), 2011
, pp. 214-224
-
-
Unat, D.1
Cai, X.2
Baden, S.3
-
13
-
-
83155190224
-
Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers
-
IEEE Computer Society
-
N. Maruyama, T. Nomura, K. Sato, and S. Matsuoka, "Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers," in Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011). IEEE Computer Society, 2011.
-
(2011)
Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011)
-
-
Maruyama, N.1
Nomura, T.2
Sato, K.3
Matsuoka, S.4
-
14
-
-
79551491518
-
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
-
February
-
J. Meng and K. Skadron, "A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations," Int. J. Parallel Prog., vol. 39, pp. 115-142, February 2011.
-
(2011)
Int. J. Parallel Prog.
, vol.39
, pp. 115-142
-
-
Meng, J.1
Skadron, K.2
-
15
-
-
84861635761
-
A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs
-
Y. Yang, H. Cui, X. Feng, and J. Xue, "A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs," J. Comp. Sci. Tech., vol. 27, pp. 57-74, 2012.
-
(2012)
J. Comp. Sci. Tech.
, vol.27
, pp. 57-74
-
-
Yang, Y.1
Cui, H.2
Feng, X.3
Xue, J.4
-
16
-
-
24644456455
-
Automatic tiling of iterative stencil loops
-
DOI 10.1145/1034774.1034777
-
Z. Li and Y. Song, "Automatic Tiling of Iterative Stencil Loops," ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028, 2004. (Pubitemid 41270296)
-
(2004)
ACM Transactions on Programming Languages and Systems
, vol.26
, Issue.6
, pp. 975-1028
-
-
Li, Z.1
Song, Y.2
-
18
-
-
77954022347
-
An Auto-tuning Framework For Parallel Multicore Stencil Computations
-
S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams, "An Auto-tuning Framework For Parallel Multicore Stencil Computations," in Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2010), April 2010.
-
Proc. IEEE Int'l Parallel & Distributed Processing Symposium (IPDPS 2010), April 2010
-
-
Kamil, S.1
Chan, C.2
Oliker, L.3
Shalf, J.4
Williams, S.5
-
19
-
-
78650806116
-
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
-
IEEE Computer Society
-
A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey, "3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs," in Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010). IEEE Computer Society, 2010.
-
(2010)
Proc. ACM/IEEE Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010)
-
-
Nguyen, A.1
Satish, N.2
Chhugani, J.3
Kim, C.4
Dubey, P.5
-
20
-
-
84877698136
-
Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor
-
T. Malas, A. Ahmadia, J. Brown, J. Gunnels, and D. Keyes, "Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor," IJHPCA, 2012.
-
(2012)
IJHPCA
-
-
Malas, T.1
Ahmadia, A.2
Brown, J.3
Gunnels, J.4
Keyes, D.5
-
21
-
-
35448985754
-
Parameterized Tiled Loops for Free
-
L. Renganarayanan, D. Kim, S. Rajopadhye, and M. Strout, "Parameterized Tiled Loops for Free," in Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007), 2007, pp. 405-414.
-
Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007), 2007
, pp. 405-414
-
-
Renganarayanan, L.1
Kim, D.2
Rajopadhye, S.3
Strout, M.4
-
22
-
-
57349139452
-
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
-
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer," in Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2008), 2008.
-
Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2008), 2008
-
-
Bondhugula, U.1
Hartono, A.2
Ramanujam, J.3
Sadayappan, P.4
-
23
-
-
77954412565
-
Loop Transformation Recipes for Code Generation and Auto-Tuning
-
Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., Springer Berlin / Heidelberg
-
M. Hall, J. Chame, C. Chen, J. Shin, G. Rudy, and M. Khan, "Loop Transformation Recipes for Code Generation and Auto-Tuning," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., vol. 5898. Springer Berlin / Heidelberg, 2010, pp. 50-64.
-
(2010)
Lecture Notes in Computer Science
, vol.5898
, pp. 50-64
-
-
Hall, M.1
Chame, J.2
Chen, C.3
Shin, J.4
Rudy, G.5
Khan, M.6
-
24
-
-
84863015363
-
A Heterogeneous Parallel Framework for Domain-Specific Languages
-
K. Brown, A. Sujeeth, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun, "A Heterogeneous Parallel Framework for Domain-Specific Languages," in Proc. Int'l Conference on Parallel Architectures and Compilation Techniques (PACT 2011), 2011, pp. 89-100.
-
Proc. Int'l Conference on Parallel Architectures and Compilation Techniques (PACT 2011), 2011
, pp. 89-100
-
-
Brown, K.1
Sujeeth, A.2
Lee, H.3
Rompf, T.4
Chafi, H.5
Odersky, M.6
Olukotun, K.7
-
25
-
-
84955498575
-
Cetus: A Source-to-Source Compiler Infrastructure for Multicores
-
H. Bae, L. Bachega, C. Dave, S. Lee, S. Lee, S. Min, R. Eigenmann, and S. Midkiff, "Cetus: A Source-to-Source Compiler Infrastructure for Multicores," in Proc. Int'l Workshop on Compilers for Parallel Computing (CPC 2009), 2009.
-
Proc. Int'l Workshop on Compilers for Parallel Computing (CPC 2009), 2009
-
-
Bae, H.1
Bachega, L.2
Dave, C.3
Lee, S.4
Lee, S.5
Min, S.6
Eigenmann, R.7
Midkiff, S.8
-
28
-
-
78649844813
-
LIKWID: A Lightweight Performance-oriented Tool Suite for x86 Multicore Environments
-
J. Treibig, G. Hager, and G. Wellein, "LIKWID: A Lightweight Performance-oriented Tool Suite for x86 Multicore Environments," in Proc. First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI2010), San Diego CA, 2010.
-
Proc. First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI2010), San Diego CA, 2010
-
-
Treibig, J.1
Hager, G.2
Wellein, G.3