-
1
-
-
70350771127
-
Stencil computation optimization and autotuning on state-of-the-art multicore architectures
-
K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, "Stencil computation optimization and autotuning on state-of-the-art multicore architectures," in Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC08), pp. 1-12, 2008.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC08)
, pp. 1-12
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
2
-
-
83155190228
-
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
-
Takashi Shimokawabe, Takayuki Aoki, Tomohiro Takaki, Akinori Yamanaka, Akira Nukada, Toshio Endo, Naoya Maruyama, and Satoshi Matsuoka, "Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer," in Proceedings of IEEE/ACM International Conference on Supercomputing (SC11), pp. 1-11, 2011.
-
(2011)
Proceedings of IEEE/ACM International Conference on Supercomputing (SC11)
, pp. 1-11
-
-
Shimokawabe, T.1
Aoki, T.2
Takaki, T.3
Yamanaka, A.4
Nukada, A.5
Endo, T.6
Maruyama, N.7
Matsuoka, S.8
-
3
-
-
84893593562
-
Physis: An implicitly-parallel programming model for stencil computing on large-scale GPU-accelerated supercomputers
-
Naoya Maruyama, Tatsuo Nomura, Kento Sato, and Satoshi Matsuoka, "Physis: An implicitly-parallel programming model for stencil computing on large-scale GPU-accelerated supercomputers," IEEE SC11, 2011.
-
(2011)
IEEE SC11
-
-
Maruyama, N.1
Nomura, T.2
Sato, K.3
Matsuoka, S.4
-
5
-
-
77954056084
-
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory
-
April
-
M. Wittmann, G. Hager, and G. Wellein, "Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory," Workshop on Large-Scale Parallel Processing (LSPP10), in conjunction with IEEE IPDPS2010, 7 pages, April 2010.
-
(2010)
Workshop on Large-Scale Parallel Processing (LSPP10), in Conjunction with IEEE IPDPS2010
, pp. 7
-
-
Wittmann, M.1
Hager, G.2
Wellein, G.3
-
6
-
-
70449657442
-
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
-
Gerhard Wellein, Georg Hager, Thomas Zeiser, Markus Wittmann, and Holger Fehske, "Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization," Computer Software and Applications Conference, vol. 1, pp. 579-586, 2009.
-
(2009)
Computer Software and Applications Conference
, vol.1
, pp. 579-586
-
-
Wellein, G.1
Hager, G.2
Zeiser, T.3
Wittmann, M.4
Fehske, H.5
-
7
-
-
84893521469
-
Performance model for automatic optimization of communication in data-parallel stencil computations
-
2012-HPC-135
-
Tomoki Kawamura, Naoya Maruyama, and Satoshi Matsuoka, "Performance model for automatic optimization of communication in data-parallel stencil computations," IPSJ SIG Technical Report, vol. 2012-HPC-135, 8 pages, 2012.
-
(2012)
IPSJ SIG Technical Report
, pp. 8
-
-
Kawamura, T.1
Maruyama, N.2
Matsuoka, S.3
-
8
-
-
79958272014
-
3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
-
Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, and Pradeep Dubey, "3.5-D blocking optimization for stencil computations on modern CPUs and GPUs," IEEE SC10, 2010.
-
(2010)
IEEE SC10
-
-
Nguyen, A.1
Satish, N.2
Chhugani, J.3
Kim, C.4
Dubey, P.5
-
9
-
-
79953768747
-
Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids
-
Leonardo Mattes and Sergio Kofuji, "Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids," ICMMT, pp. 1536-1539, 2010.
-
(2010)
ICMMT
, pp. 1536-1539
-
-
Mattes, L.1
Kofuji, S.2
-
10
-
-
77954903012
-
The use of overlapping subgrids to accelerate the FDTD on GPU devices
-
Leonardo Mattes and Sergio Kofuji, "The use of overlapping subgrids to accelerate the FDTD on GPU devices," Radar Conference, pp. 807-810, 2010.
-
(2010)
Radar Conference
, pp. 807-810
-
-
Mattes, L.1
Kofuji, S.2
-
11
-
-
84893560097
-
Cache-aware performance improvement of FDTD kernel
-
2010-HPC-124 No.5
-
Takeshi Minami, Takeshi Iwashita, Yasuhito Takahashi, and Hiroshi Nakashima, "Cache-aware performance improvement of FDTD kernel," IPSJ SIG Technical Report, vol. 2010-HPC-124, no. 5, 7 pages, 2010.
-
(2010)
IPSJ SIG Technical Report
, pp. 7
-
-
Minami, T.1
Iwashita, T.2
Takahashi, Y.3
Nakashima, H.4