-
1
-
-
30344463648
-
Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus
-
Allen, G., Dramlitsch, T., Foster, I., Karonis, N.T., Ripeanu, M., Seidel, E., Toonen, B.: Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus. In: SC'01, pp. 52-52 (2001)
-
(2001)
SC'01
, pp. 52-52
-
-
Allen, G.1
Dramlitsch, T.2
Foster, I.3
Karonis, N.T.4
Ripeanu, M.5
Seidel, E.6
Toonen, B.7
-
3
-
-
84976690230
-
Fortran at ten gigaflops: The Connection Machine convolution compiler
-
10.1145/113445.113458
-
Bromley, M., Heller, S., McNerney, T., Steele Jr., G.L.: Fortran at ten gigaflops: the Connection Machine convolution compiler. In: PLDI '91, vol. 26(6), pp. 145-156 (1991). doi:10.1145/113445.113458
-
(1991)
PLDI '91
, vol.26
, Issue.6
, pp. 145-156
-
-
Bromley, M.1
Heller, S.2
McNerney, T.3
Steele Jr., G.L.4
-
4
-
-
0027763868
-
Mobile and replicated alignment of arrays in data-parallel programs
-
November
-
Chatterjee, S., Gilbert, J.R., Schreiber, R.: Mobile and replicated alignment of arrays in data-parallel programs. In: SC'93, pp. 420-429 November (1993)
-
(1993)
SC'93
, pp. 420-429
-
-
Chatterjee, S.1
Gilbert, J.R.2
Schreiber, R.3
-
5
-
-
70449701626
-
-
June
-
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance study of general purpose applications on graphics processors using CUDA, June (2008)
-
(2008)
A Performance Study of General Purpose Applications on Graphics Processors Using CUDA
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Skadron, K.6
-
6
-
-
79551499286
-
-
NVIDIA Corporation. GeForce GTX 280 specifications. (2008)
-
NVIDIA Corporation. GeForce GTX 280 specifications. (2008)
-
-
-
-
7
-
-
79551493358
-
-
NVIDIA Corporation. NVIDIA CUDA visual profiler. June (2008)
-
NVIDIA Corporation. NVIDIA CUDA visual profiler. June (2008)
-
-
-
-
9
-
-
70350771127
-
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
-
Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: SC '08, pp. 1-12 (2008)
-
(2008)
SC '08
, pp. 1-12
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
10
-
-
0034818853
-
Eliminating redundancies in sum-of-product array computations
-
Deitz, S.J., Chamberlain, B.L., Snyder, L.: Eliminating redundancies in sum-of-product array computations. In: ICS '01, pp. 65-77 (2001)
-
(2001)
ICS '01
, pp. 65-77
-
-
Deitz, S.J.1
Chamberlain, B.L.2
Snyder, L.3
-
11
-
-
0003338620
-
Partial differential equations
-
Evans, L.C.: Partial differential equations. Am. Math. Soc. (1998)
-
(1998)
Am. Math. Soc.
-
-
Evans, L.C.1
-
12
-
-
84964699790
-
Redundant computation partition on distributed-memory systems
-
Chen, L., Zhang, Z.-Q., Feng, X.-B.: Redundant computation partition on distributed-memory systems. In: ICA3PP '02, p. 252 (2002)
-
(2002)
ICA3PP '02
, p. 252
-
-
Chen, L.1
Zhang, Z.-Q.2
Feng, X.-B.3
-
13
-
-
32844463802
-
Cache oblivious stencil computations
-
Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: ICS'05, pp. 361-366 (2005)
-
(2005)
ICS'05
, pp. 361-366
-
-
Frigo, M.1
Strumpen, V.2
-
15
-
-
34247376580
-
Chip multiprocessing and the Cell Broadband Engine
-
Gschwind, M.: Chip multiprocessing and the Cell Broadband Engine. In: CF'06 (2006)
-
(2006)
CF'06
-
-
Gschwind, M.1
-
16
-
-
38249000489
-
Communication-free hyperplane partitioning of nested loops
-
0783.68027 10.1006/jpdc.1993.1094
-
Huang, C.-H., Sadayappan, P.: Communication-free hyperplane partitioning of nested loops. J. Parallel Distrib. Comput. 19(2), 90-102 (1993). Zbl 0783.68027, doi:10.1006/jpdc.1993.1094
-
(1993)
J. Parallel Distrib. Comput.
, vol.19
, Issue.2
, pp. 90-102
-
-
Huang, C.-H.1
Sadayappan, P.2
-
17
-
-
4444374512
-
Compact thermal modeling for temperature-aware design
-
Huang, W., Stan, M.R., Skadron, K., Ghosh, S., Sankaranarayanan, K., Velusamy, S.: Compact thermal modeling for temperature-aware design. In: DAC'04 (2004)
-
(2004)
DAC'04
-
-
Huang, W.1
Stan, M.R.2
Skadron, K.3
Ghosh, S.4
Sankaranarayanan, K.5
Velusamy, S.6
-
20
-
-
84958661690
-
Impact of modern memory subsystems on cache optimizations for stencil computations
-
Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP'05, pp. 36-43 (2005)
-
(2005)
MSP'05
, pp. 36-43
-
-
Kamil, S.1
Husbands, P.2
Oliker, L.3
Shalf, J.4
Yelick, K.5
-
21
-
-
0010828753
-
Cache-aware multigrid methods for solving Poisson's equation in two dimensions
-
0957.68002 10.1007/s006070070032 1783469
-
Kowarschik, M., Weiß, C., Karl, W., Rüde, U.: Cache-aware multigrid methods for solving Poisson's equation in two dimensions. Computing 64(4), 381-399 (2000). Zbl 0957.68002, doi:10.1007/s006070070032, MR 1783469
-
(2000)
Computing
, vol.64
, Issue.4
, pp. 381-399
-
-
Kowarschik, M.1
Weiß, C.2
Karl, W.3
Rüde, U.4
-
23
-
-
0029490313
-
Techniques for compiling programs on distributed memory multicomputers
-
10.1016/0167-8191(95)00052-6 1369230
-
Lee, P.: Techniques for compiling programs on distributed memory multicomputers. Parallel Comput. 21, 1895-1923 (1995). doi:10.1016/0167-8191(95)00052-6, MR 1369230
-
(1995)
Parallel Comput.
, vol.21
, pp. 1895-1923
-
-
Lee, P.1
-
24
-
-
24644456455
-
Automatic tiling of iterative stencil loops
-
DOI 10.1145/1034774.1034777
-
Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM Trans. Program. Lang. Syst. 26(6), 975-1028 (2004). doi:10.1145/1034774.1034777
-
(2004)
ACM Transactions on Programming Languages and Systems
, vol.26
, Issue.6
, pp. 975-1028
-
-
Li, Z.1
Song, Y.2
-
26
-
-
70449723385
-
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
-
Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS '09, pp. 256-265 (2009)
-
(2009)
ICS '09
, pp. 256-265
-
-
Meng, J.1
Skadron, K.2
-
27
-
-
78651550268
-
Scalable parallel programming with CUDA
-
10.1145/1365490.1365500
-
Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. Queue 6(2), 40-53 (2008). doi:10.1145/1365490.1365500
-
(2008)
Queue
, vol.6
, Issue.2
, pp. 40-53
-
-
Nickolls, J.1
Buck, I.2
Garland, M.3
Skadron, K.4
-
28
-
-
34248593308
-
Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow
-
1116.76066 10.1016/j.jcp.2006.10.023 2330283
-
Premnath, K.N., Abraham, J.: Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow. J. Comput. Phys. 224(2), 539-559 (2007). Zbl 1116.76066, doi:10.1016/j.jcp.2006.10.023, MR 2330283
-
(2007)
J. Comput. Phys.
, vol.224
, Issue.2
, pp. 539-559
-
-
Premnath, K.N.1
Abraham, J.2
-
30
-
-
34548752231
-
Towards optimal multi-level tiling for stencil computations
-
March
-
Renganarayana, L., Harthikote-Matha, M., Dewri, R., Rajopadhye, S.: Towards optimal multi-level tiling for stencil computations. In: IPDPS'07, pp. 1-10, March (2007)
-
(2007)
IPDPS'07
, pp. 1-10
-
-
Renganarayana, L.1
Harthikote-Matha, M.2
Dewri, R.3
Rajopadhye, S.4
-
31
-
-
70350786558
-
Positivity, posynomials and tile size selection
-
Renganarayana, L., Rajopadhye, S.: Positivity, posynomials and tile size selection. In: SC '08, pp. 1-12 (2008)
-
(2008)
SC '08
, pp. 1-12
-
-
Renganarayana, L.1
Rajopadhye, S.2
-
32
-
-
70449722959
-
Cactus application: Performance predictions in a grid environment
-
Ripeanu, M., Iamnitchi, A., Foster, I.: Cactus application: Performance predictions in a grid environment. In: EuroPar'01 (2001)
-
(2001)
EuroPar'01
-
-
Ripeanu, M.1
Iamnitchi, A.2
Foster, I.3
-
33
-
-
33845574641
-
Tiling optimizations for 3D scientific computations
-
Rivera, G., Tseng, C.-W.: Tiling optimizations for 3D scientific computations. In: SC '00, p. 32 (2000)
-
(2000)
SC '00
, p. 32
-
-
Rivera, G.1
Tseng, C.-W.2
-
34
-
-
70449706048
-
CUDA-lite: Reducing GPU programming complexity
-
Ueng, S.-Z., Baghsorkhi, S., Lathara, M., Hwu, W.M.: CUDA-lite: Reducing GPU programming complexity. In: LCPC'08 (2008)
-
(2008)
LCPC'08
-
-
Ueng, S.-Z.1
Baghsorkhi, S.2
Lathara, M.3
Hwu, W.M.4
-
35
-
-
84863436006
-
Time skewing for parallel computers
-
Wonnacott, D.: Time skewing for parallel computers. In: WLCPC'99, pp. 477-480 (1999)
-
(1999)
WLCPC'99
, pp. 477-480
-
-
Wonnacott, D.1
-
36
-
-
1542392248
-
Achieving scalable locality with time skewing
-
1019.68024 10.1023/A:1015460304860
-
Wonnacott, D.: Achieving scalable locality with time skewing. Int. J. Parallel Program. 30(3), 181-221 (2002). Zbl 1019.68024, doi:10.1023/A:1015460304860
-
(2002)
Int. J. Parallel Program.
, vol.30
, Issue.3
, pp. 181-221
-
-
Wonnacott, D.1
-
37
-
-
79951471318
-
Parallel image processing based on CUDA
-
10.1109/CSSE.2008.1448
-
Yang, Z., Zhu, Y., Pu, Y.: Parallel image processing based on CUDA. In: Int. Conf. Comput. Sci. Software Eng., vol. 3, pp. 198-201 (2008). doi:10.1109/CSSE.2008.1448
-
(2008)
Int. Conf. Comput. Sci. Software Eng.
, vol.3
, pp. 198-201
-
-
Yang, Z.1
Zhu, Y.2
Pu, Y.3