-
1
-
-
0004072686
-
-
Addison Wesley, 2nd edition, August
-
A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 2nd edition, August 2006.
-
(2006)
Compilers: Principles, Techniques, and Tools
-
-
Aho, A.V.1
Lam, M.S.2
Sethi, R.3
Ullman, J.D.4
-
2
-
-
57349180412
-
A compiler framework for optimization of affine loop nests for GPGPUs
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS'08, pages 225-234, 2008.
-
(2008)
ICS'08
, pp. 225-234
-
-
Baskaran, M.M.1
Bondhugula, U.2
Krishnamoorthy, S.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
5
-
-
83155184570
-
Dymaxion: Optimizing memory access patterns for heterogeneous systems
-
S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: Optimizing memory access patterns for heterogeneous systems. In SC, 2011.
-
(2011)
SC
-
-
Che, S.1
Sheaffer, J.W.2
Skadron, K.3
-
6
-
-
33746070806
-
Cache-conscious coallocation of hot data streams
-
T. M. Chilimbi and R. Shaham. Cache-conscious coallocation of hot data streams. In PLDI, 2006.
-
(2006)
PLDI
-
-
Chilimbi, T.M.1
Shaham, R.2
-
7
-
-
77954719557
-
-
A. Danalis, G. Marin, C. McCurdy, J. Meredith, P. Roth, K. Spafford, V. Tipparaju, and J. Vetter. The scalable heterogeneous computing (SHOC) benchmark suite. 2010.
-
(2010)
The Scalable Heterogeneous Computing (SHOC) Benchmark Suite
-
-
Danalis, A.1
Marin, G.2
McCurdy, C.3
Meredith, J.4
Roth, P.5
Spafford, K.6
Tipparaju, V.7
Vetter, J.8
-
8
-
-
1642502420
-
Improving effective bandwidth through compiler enhancement of global cache reuse
-
C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1): 108-134, 2004.
-
(2004)
Journal of Parallel and Distributed Computing
, vol.64
, Issue.1
, pp. 108-134
-
-
Ding, C.1
Kennedy, K.2
-
9
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
Washington, DC, USA, IEEE Computer Society
-
W. Fung, I. Sham, G. Yuan, and T. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO'07, pages 407-420, Washington, DC, USA, 2007. IEEE Computer Society.
-
(2007)
MICRO'07
, pp. 407-420
-
-
Fung, W.1
Sham, I.2
Yuan, G.3
Aamodt, T.4
-
12
-
-
79953071805
-
Sponge: Portable stream programming on graphics engines
-
A. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: portable stream programming on graphics engines. In ASPLOS, 2011.
-
(2011)
ASPLOS
-
-
Hormati, A.1
Samadi, M.2
Woh, M.3
Mudge, T.4
Mahlke, S.5
-
13
-
-
79959575872
-
An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs
-
X. Huo, V. Ravi, W. Ma, and G. Agrawal. An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs. In ICS, 2011.
-
(2011)
ICS
-
-
Huo, X.1
Ravi, V.2
Ma, W.3
Agrawal, G.4
-
14
-
-
81455141868
-
Enhancing locality for recursive traversals of recursive structures
-
Y. Jo and M. Kulkarni. Enhancing locality for recursive traversals of recursive structures. In OOPSLA, 2011.
-
(2011)
OOPSLA
-
-
Jo, Y.1
Kulkarni, M.2
-
15
-
-
0035029828
-
A compiler technique for improving whole-program locality
-
M. Kandemir. A compiler technique for improving whole-program locality. In POPL, 2001.
-
(2001)
POPL
-
-
Kandemir, M.1
-
16
-
-
84863371431
-
OpenCL as a unified programming model for heterogeneous CPU/GPU clusters
-
J. Kim, S. Seo, J. Lee, J. Nah, G. Jo, and J. Lee. OpenCL as a unified programming model for heterogeneous CPU/GPU clusters. In PPoPP, 2012.
-
(2012)
PPoPP
-
-
Kim, J.1
Seo, S.2
Lee, J.3
Nah, J.4
Jo, G.5
Lee, J.6
-
17
-
-
77957808385
-
Optimistic parallelism benefits from data partitioning
-
M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS, pages 233-243, 2008.
-
(2008)
ASPLOS
, pp. 233-243
-
-
Kulkarni, M.1
Pingali, K.2
Ramanarayanan, G.3
Walter, B.4
Bala, K.5
Chew, L.P.6
-
18
-
-
67650081010
-
OpenMP to GPGPU: A compiler framework for automatic translation and optimization
-
S. Lee, S. Min, and R. Eigenmann. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In PPoPP, 2009.
-
(2009)
PPoPP
-
-
Lee, S.1
Min, S.2
Eigenmann, R.3
-
19
-
-
77954976292
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA, 2010.
-
(2010)
ISCA
-
-
Meng, J.1
Tarjan, D.2
Skadron, K.3
-
20
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, pages 73-82, 2008.
-
(2008)
PPoPP
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.W.6
-
21
-
-
0038039924
-
Compile-time composition of run-time data and iteration reorderings
-
San Diego, CA, June
-
M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In PLDI, San Diego, CA, June 2003.
-
(2003)
PLDI
-
-
Strout, M.M.1
Carter, L.2
Ferrante, J.3
-
22
-
-
74049151553
-
Increasing memory miss tolerance for SIMD cores
-
D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for SIMD cores. In SC, 2009.
-
(2009)
SC
-
-
Tarjan, D.1
Meng, J.2
Skadron, K.3
-
23
-
-
84856544146
-
Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control
-
B. Wu, E. Zhang, and X. Shen. Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control. In PACT, 2011.
-
(2011)
PACT
-
-
Wu, B.1
Zhang, E.2
Shen, X.3
-
24
-
-
0033703258
-
Cacheminer: A runtime approach to exploit cache locality on SMP
-
Y. Yan, X. Zhang, and Z. Zhang. Cacheminer: A runtime approach to exploit cache locality on SMP. IEEE Transactions on Parallel and Distributed Systems, 11(4): 357-374, 2000.
-
(2000)
IEEE Transactions on Parallel and Distributed Systems
, vol.11
, Issue.4
, pp. 357-374
-
-
Yan, Y.1
Zhang, X.2
Zhang, Z.3
-
25
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
-
(2010)
PLDI
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
26
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, 2011.
-
(2011)
ASPLOS
-
-
Zhang, E.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5