-
1
-
-
0031139728
-
Interprocedural data flow based optimizations for distributed memory compilation
-
May
-
G. Agrawal and J. Saltz. Interprocedural data flow based optimizations for distributed memory compilation. Software Practice and Experience, 27(5):519-546, May 1997.
-
(1997)
Software Practice and Experience
, vol.27
, Issue.5
, pp. 519-546
-
-
Agrawal, G.1
Saltz, J.2
-
3
-
-
63549135938
-
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
-
NY, USA, ACM
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In PPoPP, pages 1-10, NY, USA, 2008. ACM.
-
(2008)
PPoPP
, pp. 1-10
-
-
Baskaran, M.M.1
Bondhugula, U.2
Krishnamoorthy, S.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
4
-
-
0030382364
-
Parallel programming with Polaris
-
Dec.
-
W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, B. Pottenger, L. Rauchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, 29(12):78-82, Dec. 1996.
-
(1996)
IEEE Computer
, vol.29
, Issue.12
, pp. 78-82
-
-
Blume, W.1
Doallo, R.2
Eigenmann, R.3
Grout, J.4
Hoeflinger, J.5
Lawrence, T.6
Lee, J.7
Padua, D.8
Paek, Y.9
Pottenger, B.10
Rauchwerger, L.11
Tu, P.12
-
5
-
-
77749340082
-
Model-driven Autotuning of Sparse Matrix-vector Multiply on GPUs
-
Feb.
-
J. W. Choi, A. Singh, and R. W. Vuduc. Model-driven Autotuning of Sparse Matrix-vector Multiply on GPUs. In PPoPP, Feb. 2010.
-
(2010)
PPoPP
-
-
Choi, J.W.1
Singh, A.2
Vuduc, R.W.3
-
6
-
-
77958483977
-
Running unstructured grid CFD solvers on modern graphics hardware
-
number AIAA 2009-4001, June
-
A. Corrigan, F. Camelli, R. Löhner, and J. Wallin. Running unstructured grid CFD solvers on modern graphics hardware. In 19th AIAA Computational Fluid Dynamics Conference, number AIAA 2009-4001, June 2009.
-
(2009)
19th AIAA Computational Fluid Dynamics Conference
-
-
Corrigan, A.1
Camelli, F.2
Löhner, R.3
Wallin, J.4
-
7
-
-
0029430697
-
Index array flattening through program transformation
-
IEEE Computer Society Press, Dec.
-
R. Das, P. Havlak, J. Saltz, and K. Kennedy. Index array flattening through program transformation. In SC95. IEEE Computer Society Press, Dec. 1995.
-
(1995)
SC95
-
-
Das, R.1
Havlak, P.2
Saltz, J.3
Kennedy, K.4
-
8
-
-
0028386843
-
The design and implementation of a parallel unstructured Euler solver using software primitives
-
Mar.
-
R. Das, D. J. Mavriplis, J. Saltz, S. Gupta, and R. Ponnusamy. The design and implementation of a parallel unstructured Euler solver using software primitives. AIAA Journal, 32(3):489-496, Mar. 1994.
-
(1994)
AIAA Journal
, vol.32
, Issue.3
, pp. 489-496
-
-
Das, R.1
Mavriplis, D.J.2
Saltz, J.3
Gupta, S.4
Ponnusamy, R.5
-
9
-
-
79954630742
-
Improving cache performance of dynamic applications with computation and data layout transformations
-
May
-
C. Ding and K. Kennedy. Improving cache performance of dynamic applications with computation and data layout transformations. In PLDI99, May 1999.
-
(1999)
PLDI99
-
-
Ding, C.1
Kennedy, K.2
-
10
-
-
64649105762
-
Accelerating molecular dynamic simulation on graphics processing units
-
M. S. Friedrichs, P. Eastman, V. Vaidyanathan, M. Houston, S. Legrand, A. L. Beberg, D. L. Ensign, C. M. Bruns, and V. S. Pande. Accelerating molecular dynamic simulation on graphics processing units. Journal of Computational Chemistry, 30:864-872, 2009.
-
(2009)
Journal of Computational Chemistry
, vol.30
, pp. 864-872
-
-
Friedrichs, M.S.1
Eastman, P.2
Vaidyanathan, V.3
Houston, M.4
Legrand, S.5
Beberg, A.L.6
Ensign, D.L.7
Bruns, C.M.8
Pande, V.S.9
-
11
-
-
51549093017
-
Sparse matrix computations on manycore GPUs
-
M. Garland. Sparse matrix computations on manycore GPUs. In DAC, 2008.
-
(2008)
DAC
-
-
Garland, M.1
-
12
-
-
0033707876
-
A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors
-
ACM Press, May
-
E. Gutierrez, O. Plata, and E. L. Zapata. A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors. In ICS00, pages 78-87. ACM Press, May 2000.
-
(2000)
ICS00
, pp. 78-87
-
-
Gutierrez, E.1
Plata, O.2
Zapata, E.L.3
-
13
-
-
0030380793
-
Maximizing multiprocessor performance with the SUIF compiler
-
Dec.
-
M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, (12), Dec. 1996.
-
(1996)
IEEE Computer
, Issue.12
-
-
Hall, M.1
Amarasinghe, S.2
Murphy, B.3
Liao, S.4
Lam, M.5
-
14
-
-
79959622062
-
Improving compiler and runtime support for irregular reductions
-
Aug.
-
H. Han and C.-W. Tseng. Improving compiler and runtime support for irregular reductions. In LCPC98, Aug. 1998.
-
(1998)
LCPC98
-
-
Han, H.1
Tseng, C.-W.2
-
16
-
-
0342622933
-
Handling irregular problems with Fortran D - A preliminary report
-
Also available as CRPC Technical Report CRPC-TR93339-S
-
R. v. Hanxleden. Handling irregular problems with Fortran D - a preliminary report. In CPC, Delft, The Netherlands, Dec. 1993. Also available as CRPC Technical Report CRPC-TR93339-S.
-
CPC, Delft, the Netherlands, Dec. 1993
-
-
Hanxleden, R.V.1
-
17
-
-
0029322399
-
Parallelizing molecular dynamics programs for distributed memory machines
-
Summer Also available as University of Maryland Technical Report CS-TR-3374 and UMIACS-TR-94-125
-
Y.-S. Hwang, R. Das, J. H. Saltz, M. Hodoscek, and B. R. Brooks. Parallelizing molecular dynamics programs for distributed memory machines. IEEE Computational Science & Engineering, 2(2):18-29, Summer 1995. Also available as University of Maryland Technical Report CS-TR-3374 and UMIACS-TR-94-125.
-
(1995)
IEEE Computational Science & Engineering
, vol.2
, Issue.2
, pp. 18-29
-
-
Hwang, Y.-S.1
Das, R.2
Saltz, J.H.3
Hodoscek, M.4
Brooks, B.R.5
-
18
-
-
0029375750
-
Partitioning unstructured computational graphs for nonuniform and adaptive environments
-
Fall
-
M. Kaddoura, C.-W. Ou, and S. Ranka. Partitioning unstructured computational graphs for nonuniform and adaptive environments. IEEE Parallel & Distributed Technology, 3(3):63-69, Fall 1995.
-
(1995)
IEEE Parallel & Distributed Technology
, vol.3
, Issue.3
, pp. 63-69
-
-
Kaddoura, M.1
Ou, C.-W.2
Ranka, S.3
-
19
-
-
84990479742
-
An efficient heuristic procedure for partitioning graphs
-
Feb.
-
B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291-307, Feb. 1970.
-
(1970)
Bell System Technical Journal
, vol.49
, Issue.2
, pp. 291-307
-
-
Kernighan, B.1
Lin, S.2
-
20
-
-
79959588500
-
-
Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries
-
M. Khan, G. Rudy, C. Chen, M. Hall, and J. Chame. Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries. SC 2010 Poster Session, 2010.
-
SC 2010 Poster Session, 2010
-
-
Khan, M.1
Rudy, G.2
Chen, C.3
Hall, M.4
Chame, J.5
-
22
-
-
0029229672
-
Exploiting spatial regularity in irregular iterative applications
-
IEEE Computer Society Press, Apr.
-
A. Lain and P. Banerjee. Exploiting spatial regularity in irregular iterative applications. In IPPS95, pages 820-826. IEEE Computer Society Press, Apr. 1995.
-
(1995)
IPPS95
, pp. 820-826
-
-
Lain, A.1
Banerjee, P.2
-
23
-
-
78650802947
-
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
-
S. Lee and R. Eigenmann. OpenMPC: Extended OpenMP Programming and Tuning for GPUs. In SC, Nov 2010.
-
SC, Nov 2010
-
-
Lee, S.1
Eigenmann, R.2
-
24
-
-
67650081010
-
OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization
-
S. Lee, S.-J. Min, and R. Eigenmann. OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization. In PPoPP'09, 2009.
-
(2009)
PPoPP'09
-
-
Lee, S.1
Min, S.-J.2
Eigenmann, R.3
-
26
-
-
38349105400
-
Molecular dynamics simulations on commodity GPUs with CUDA
-
W. Liu, B. Schmidt, G. Voss, and W. Müller-Wittig. Molecular dynamics simulations on commodity GPUs with CUDA. In HiPC, pages 185-196, 2007.
-
(2007)
HiPC
, pp. 185-196
-
-
Liu, W.1
Schmidt, B.2
Voss, G.3
Müller-Wittig, W.4
-
27
-
-
70449707774
-
A Translation System for Enabling Data Mining Applications on GPUs
-
June
-
W. Ma and G. Agrawal. A Translation System for Enabling Data Mining Applications on GPUs. In ICS, June 2009.
-
(2009)
ICS
-
-
Ma, W.1
Agrawal, G.2
-
28
-
-
79952788812
-
An Integer Programming Framework for Optimizing Shared Memory Use on GPUs
-
Dec.
-
W. Ma and G. Agrawal. An Integer Programming Framework for Optimizing Shared Memory Use on GPUs. In HiPC, Dec. 2010.
-
(2010)
HiPC
-
-
Ma, W.1
Agrawal, G.2
-
29
-
-
0032684978
-
Improving memory hierarchy performance of irregular applications
-
June
-
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance of irregular applications. In ICS, June 1999.
-
(1999)
ICS
-
-
Mellor-Crummey, J.1
Whalley, D.2
Kennedy, K.3
-
30
-
-
0033362479
-
Localizing non-affine array references
-
Oct.
-
N. Mitchell, L. Carter, and J. Ferrante. Localizing non-affine array references. In PACT, Oct. 1999.
-
(1999)
PACT
-
-
Mitchell, N.1
Carter, L.2
Ferrante, J.3
-
32
-
-
0029192463
-
Efficient support for irregular applications on distributed-memory machines
-
ACM Press, July
-
S. Mukherjee, S. Sharma, M. Hill, J. Larus, A. Rogers, and J. Saltz. Efficient support for irregular applications on distributed-memory machines. In PPOPP, pages 68-79. ACM Press, July 1995.
-
(1995)
PPOPP
, pp. 68-79
-
-
Mukherjee, S.1
Sharma, S.2
Hill, M.3
Larus, J.4
Rogers, A.5
Saltz, J.6
-
34
-
-
0029356841
-
Runtime support and compilation methods for user-specified irregular data distributions
-
Aug.
-
R. Ponnusamy, J. Saltz, A. Choudhary, Y.-S. Hwang, and G. Fox. Runtime support and compilation methods for user-specified irregular data distributions. TPDS, 6(8):815-831, Aug. 1995.
-
(1995)
TPDS
, vol.6
, Issue.8
, pp. 815-831
-
-
Ponnusamy, R.1
Saltz, J.2
Choudhary, A.3
Hwang, Y.-S.4
Fox, G.5
-
35
-
-
0036505103
-
Parallel static and dynamic multi-constraint graph partitioning
-
DOI 10.1002/cpe.605
-
K. Schloegel, G. Karypis, and V. Kumar. Parallel static and dynamic multi-constraint graph partitioning. Concurrency and Computation: Practice and Experience, 14(3):219-240, 2002.
-
(2002)
Concurrency Computation Practice and Experience
, vol.14
, Issue.3
, pp. 219-240
-
-
Schloegel, K.1
Karypis, G.2
Kumar, V.3
-
36
-
-
70450029523
-
A framework for efficient and scalable execution of domain-specific templates on GPUs
-
N. Sundaram, A. Raghunathan, and S. Chakradhar. A framework for efficient and scalable execution of domain-specific templates on GPUs. In IPDPS, 2009.
-
(2009)
IPDPS
-
-
Sundaram, N.1
Raghunathan, A.2
Chakradhar, S.3
-
37
-
-
84963626200
-
Accelerating molecular dynamics simulations with GPUs
-
J. P. Walters, V. Balu, V. Chaudhary, D. Kofke, and A. Schultz. Accelerating molecular dynamics simulations with GPUs. In ISCA PDCCS, pages 44-49, 2008.
-
(2008)
ISCA PDCCS
, pp. 44-49
-
-
Walters, J.P.1
Balu, V.2
Chaudhary, V.3
Kofke, D.4
Schultz, A.5
-
38
-
-
0029322543
-
Distributed memory compiler design for sparse problems
-
June
-
J. Wu, R. Das, J. Saltz, H. Berryman, and S. Hiranandani. Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44(6):737-753, June 1995.
-
(1995)
IEEE Transactions on Computers
, vol.44
, Issue.6
, pp. 737-753
-
-
Wu, J.1
Das, R.2
Saltz, J.3
Berryman, H.4
Hiranandani, S.5
-
39
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
-
(2010)
PLDI
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
40
-
-
0033703286
-
Adaptive reduction parallelization techniques
-
ACM Press, May
-
H. Yu and L. Rauchwerger. Adaptive reduction parallelization techniques. In ICS00, pages 66-75. ACM Press, May 2000.
-
(2000)
ICS00
, pp. 66-75
-
-
Yu, H.1
Rauchwerger, L.2