-
1
-
-
84856515925
-
-
libpfm4
-
libpfm4. http://perfmon2.sourceforge.net/docs.html.
-
-
-
-
2
-
-
84856519788
-
-
NVIDIA CUDA. http://www.nvidia.com/cuda.
-
-
-
-
3
-
-
57349180412
-
A compiler framework for optimization of affine loop nests for GPGPUs
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In Proceedings of ICS, 2008.
-
(2008)
Proceedings of ICS
-
-
Baskaran, M.M.1
Bondhugula, U.2
Krishnamoorthy, S.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
4
-
-
0023346636
-
Partitioning strategy for nonuniform problems on multiprocessors
-
M. Berger and S. Bokhari. A partitioning strategy for non-uniform problems on multiprocessors. IEEE Trans. Computers, C-36(5):570-580, 1987. (Pubitemid 17582501)
-
(1987)
IEEE Transactions on Computers
, vol.C-36
, Issue.5
, pp. 570-580
-
-
Berger Marsha, J.1
Bokhari Shahid, H.2
-
6
-
-
0037883031
-
The design and implementation of a parallel unstructured Euler solver using software primitives
-
R. Das, D. Mavriplis, J. Saltz, S. Gupta, and R. Ponnusamy. The design and implementation of a parallel unstructured Euler solver using software primitives. In Proceedings of the 30th Aerospace Science Meeting, 1992.
-
(1992)
Proceedings of the 30th Aerospace Science Meeting
-
-
Das, R.1
Mavriplis, D.2
Saltz, J.3
Gupta, S.4
Ponnusamy, R.5
-
7
-
-
0001483604
-
Communication optimizations for irregular scientific computations on distributed memory architectures
-
R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462-479, 1994.
-
(1994)
Journal of Parallel and Distributed Computing
, vol.22
, Issue.3
, pp. 462-479
-
-
Das, R.1
Uysal, M.2
Saltz, J.3
Hwang, Y.-S.4
-
8
-
-
1642502420
-
Improving effective bandwidth through compiler enhancement of global cache reuse
-
DOI 10.1016/j.jpdc.2003.09.005
-
C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1):108-134, 2004. (Pubitemid 38117742)
-
(2004)
Journal of Parallel and Distributed Computing
, vol.64
, Issue.1
, pp. 108-134
-
-
Ding, C.1
Kennedy, K.2
-
12
-
-
33745715056
-
Exploiting locality for irregular scientific codes
-
DOI 10.1109/TPDS.2006.88
-
H. Han and C.-W. Tseng. Exploiting locality for irregular scientific codes. IEEE Transactions on Parallel and Distributed Systems, 17(7):606-618, 2006. (Pubitemid 43997184)
-
(2006)
IEEE Transactions on Parallel and Distributed Systems
, vol.17
, Issue.7
, pp. 606-618
-
-
Han, H.1
Tseng, C.-W.2
-
14
-
-
0009406160
-
A fast and high quality multilevel scheme for partitioning irregular graphs
-
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. In Proceedings of ICPP, 1995.
-
(1995)
Proceedings of ICPP
-
-
Karypis, G.1
Kumar, V.2
-
15
-
-
79958785075
-
Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
-
S. Kim, H. Han, and K. Choe. Region-based parallelization of irregular reductions on explicitly managed memory hierarchies. Journal of Supercomputing, 2009.
-
(2009)
Journal of Supercomputing
-
-
Kim, S.1
Han, H.2
Choe, K.3
-
16
-
-
77957808385
-
Optimistic parallelism benefits from data partitioning
-
DOI 10.1145/1346281.1346311, ASPLOS XIII - Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems
-
M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In Proceedings of ASPLOS, pages 233-243, 2008. (Pubitemid 351585410)
-
(2008)
International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
, pp. 233-243
-
-
Kulkarni, M.1
Pingali, K.2
Ramanarayanan, G.3
Walter, B.4
Bala, K.5
Chew, L.P.6
-
17
-
-
67650081010
-
OpenMP to GPGPU: A compiler framework for automatic translation and optimization
-
S. Lee, S. Min, and R. Eigenmann. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In Proceedings of PPoPP, 2009.
-
(2009)
Proceedings of PPoPP
-
-
Lee, S.1
Min, S.2
Eigenmann, R.3
-
18
-
-
0016940739
-
Comparative analysis of the Cuthill-McKee and the reverse Cuthill-McKee ordering algorithms for sparse matrices
-
April
-
W. Liu and A. Sherman. Comparative analysis of the Cuthill-McKee and the reverse Cuthill-McKee ordering algorithms for sparse matrices. SIAM J. Numerical Analysis, 13(2), April 1976.
-
(1976)
SIAM J. Numerical Analysis
, vol.13
, Issue.2
-
-
Liu, W.1
Sherman, A.2
-
24
-
-
77954709868
-
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations
-
V. Ravi, W. Ma, D. Chiu, and G. Agrawal. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In Proceedings of ICS, 2010.
-
(2010)
Proceedings of ICS
-
-
Ravi, V.1
Ma, W.2
Chiu, D.3
Agrawal, G.4
-
25
-
-
77953978573
-
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
-
J. Stratton, V. Grover, J. Marathe, B. Aarts, M. Murphy, Z. Hu, and W. Hwu. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO '10: Proceedings of the International Symposium on Code Generation and Optimization, 2010.
-
(2010)
CGO '10: Proceedings of the International Symposium on Code Generation and Optimization
-
-
Stratton, J.1
Grover, V.2
Marathe, J.3
Aarts, B.4
Murphy, M.5
Hu, Z.6
Hwu, W.7
-
27
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
-
(2010)
PLDI
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
28
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pages 369-380, 2011.
-
(2011)
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 369-380
-
-
Zhang, E.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
-
30
-
-
8344272049
-
Array regrouping and structure splitting using whole-program reference affinity
-
June
-
Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 255-266, June 2004.
-
(2004)
Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 255-266
-
-
Zhong, Y.1
Orlovich, M.2
Shen, X.3
Ding, C.4