-
1
-
-
77957561221
-
An adaptive performance modeling tool for GPU architectures
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-M. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10, 2010, pp. 105-114.
-
(2010)
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '10
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
2
-
-
77957561221
-
An adaptive performance modeling tool for GPU architectures
-
ACM SIGPLAN symposium on Principles and practice of parallel programming
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-M. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010.
-
(2010)
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
3
-
-
57349180412
-
A compiler framework for optimization of affine loop nests for GPGPUs
-
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "A compiler framework for optimization of affine loop nests for GPGPUs," in ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, 2008, pp. 225-234.
-
(2008)
ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing
, pp. 225-234
-
-
Baskaran, M.M.1
Bondhugula, U.2
Krishnamoorthy, S.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
6
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in IISWC, 2009.
-
(2009)
IISWC
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
7
-
-
83155184570
-
Dymaxion: Optimizing memory access patterns for heterogeneous systems
-
S. Che, J. W. Sheaffer, and K. Skadron, "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11, 2011, pp. 13:1-13:11.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Ser. SC '11
, pp. 13:1-13:11
-
-
Che, S.1
Sheaffer, J.W.2
Skadron, K.3
-
8
-
-
77954719557
-
The scalable heterogeneous computing (SHOC) benchmark suite
-
A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (SHOC) benchmark suite," in GPGPU, 2010.
-
(2010)
GPGPU
-
-
Danalis, A.1
Marin, G.2
McCurdy, C.3
Meredith, J.S.4
Roth, P.C.5
Spafford, K.6
Tipparaju, V.7
Vetter, J.S.8
-
9
-
-
81355161778
-
The University of Florida sparse matrix collection
-
Dec.
-
T. A. Davis and Y. Hu, "The University of Florida sparse matrix collection," ACM Trans. Math. Softw., vol. 38, no. 1, pp. 1:1-1:25, Dec. 2011.
-
(2011)
ACM Trans. Math. Softw.
, vol.38
, Issue.1
, pp. 1:1-1:25
-
-
Davis, T.A.1
Hu, Y.2
-
10
-
-
1442313416
-
Predicting whole-program locality with reuse distance analysis
-
San Diego, CA, June
-
C. Ding and Y. Zhong, "Predicting whole-program locality with reuse distance analysis," in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003, pp. 245-257.
-
(2003)
Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 245-257
-
-
Ding, C.1
Zhong, Y.2
-
11
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," in International Symposium on Computer Architecture, 2009.
-
(2009)
International Symposium on Computer Architecture
-
-
Hong, S.1
Kim, H.2
-
12
-
-
78649824847
-
Exploiting memory access patterns to improve memory performance in data-parallel architectures
-
B. Jang, D. Schaa, P. Mistry, and D. Kaeli, "Exploiting memory access patterns to improve memory performance in data-parallel architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 105-118, 2011.
-
(2011)
IEEE Transactions on Parallel and Distributed Systems
, vol.22
, Issue.1
, pp. 105-118
-
-
Jang, B.1
Schaa, D.2
Mistry, P.3
Kaeli, D.4
-
13
-
-
84881191462
-
Die-stacked DRAM caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache
-
D. Jevdjic, S. Volos, and B. Falsafi, "Die-stacked DRAM caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache," in ISCA, 2013, pp. 404-415.
-
(2013)
ISCA
, pp. 404-415
-
-
Jevdjic, D.1
Volos, S.2
Falsafi, B.3
-
14
-
-
84864068497
-
Characterizing and improving the use of demand-fetched caches in GPUs
-
W. Jia, K. A. Shaw, and M. Martonosi, "Characterizing and improving the use of demand-fetched caches in GPUs," in Proceedings of the 26th ACM International Conference on Supercomputing, ser. ICS '12, 2012.
-
(2012)
Proceedings of the 26th ACM International Conference on Supercomputing, Ser. ICS '12
-
-
Jia, W.1
Shaw, K.A.2
Martonosi, M.3
-
15
-
-
35048854568
-
Cetus - an extensible compiler infrastructure for source-to-source transformation
-
S. Lee, T. Johnson, and R. Eigenmann, "Cetus - an extensible compiler infrastructure for source-to-source transformation," in Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003, pp. 539-553.
-
(2003)
Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC)
, pp. 539-553
-
-
Lee, S.1
Johnson, T.2
Eigenmann, R.3
-
16
-
-
78149272414
-
An integer programming framework for optimizing shared memory use on GPUs
-
W. Ma and G. Agrawal, "An integer programming framework for optimizing shared memory use on GPUs," in PACT, 2010, pp. 553-554.
-
(2010)
PACT
, pp. 553-554
-
-
Ma, W.1
Agrawal, G.2
-
17
-
-
33750831358
-
Generic database cost models for hierarchical memory systems
-
S. Manegold, P. Boncz, and M. L. Kersten, "Generic Database Cost Models for Hierarchical Memory Systems," in Proceedings of VLDB, 2002, pp. 191-202.
-
(2002)
Proceedings of VLDB
, pp. 191-202
-
-
Manegold, S.1
Boncz, P.2
Kersten, M.L.3
-
18
-
-
70450273507
-
Scalable high performance main memory system using phase-change memory technology
-
M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09, 2009, pp. 24-33.
-
(2009)
Proceedings of the 36th Annual International Symposium on Computer Architecture, Ser. ISCA '09
, pp. 24-33
-
-
Qureshi, M.K.1
Srinivasan, V.2
Rivers, J.A.3
-
19
-
-
79959583242
-
Page placement in hybrid memory systems
-
L. E. Ramos, E. Gorbatov, and R. Bianchini, "Page placement in hybrid memory systems," in Proceedings of the International Conference on Supercomputing, ser. ICS '11, 2011, pp. 85-95.
-
(2011)
Proceedings of the International Conference on Supercomputing, Ser. ICS '11
, pp. 85-95
-
-
Ramos, L.E.1
Gorbatov, E.2
Bianchini, R.3
-
20
-
-
84863347222
-
A performance analysis framework for identifying potential benefits in GPGPU applications
-
J. Sim, A. Dasgupta, H. Kim, and R. W. Vuduc, "A performance analysis framework for identifying potential benefits in GPGPU applications," in ACM SIGPLAN symposium on Principles and practice of parallel programming, 2012.
-
(2012)
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
-
-
Sim, J.1
Dasgupta, A.2
Kim, H.3
Vuduc, R.W.4
-
21
-
-
33846547030
-
On the effectiveness of set associative page mapping and its applications in main memory management
-
A. J. Smith, "On the effectiveness of set associative page mapping and its applications in main memory management," in Proceedings of the 2nd International Conference on Software Engineering, 1976, pp. 286-292.
-
(1976)
Proceedings of the 2nd International Conference on Software Engineering
, pp. 286-292
-
-
Smith, A.J.1
-
22
-
-
78149251414
-
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
-
I.-J. Sung, J. A. Stratton, and W.-M. W. Hwu, "Data layout transformation exploiting memory-level parallelism in structured grid many-core applications," in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '10, 2010, pp. 513-522.
-
(2010)
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT '10
, pp. 513-522
-
-
Sung, I.-J.1
Stratton, J.A.2
Hwu, W.-M.W.3
-
23
-
-
84887454272
-
Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
-
Piscataway, NJ, USA: IEEE Press
-
B. Wang, B. Wu, D. Li, X. Shen, W. Yu, Y. Jiao, and J. S. Vetter, "Exploring hybrid memory for GPU energy efficiency through software-hardware co-design," in Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 93-102.
-
(2013)
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT '13
, pp. 93-102
-
-
Wang, B.1
Wu, B.2
Li, D.3
Shen, X.4
Yu, W.5
Jiao, Y.6
Vetter, J.S.7
-
24
-
-
77952579552
-
Demystifying GPU microarchitecture through microbenchmarking
-
H. Wong, M.-M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying GPU microarchitecture through microbenchmarking," in ISPASS. IEEE Computer Society, 2010, pp. 235-246.
-
(2010)
ISPASS. IEEE Computer Society
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.-M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4
-
25
-
-
84875195366
-
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU
-
B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen, "Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU," in Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013.
-
(2013)
Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
-
-
Wu, B.1
Zhao, Z.2
Zhang, E.Z.3
Jiang, Y.4
Shen, X.5
-
26
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou, "A GPGPU compiler for memory optimization and parallelism management," in Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '10, 2010, pp. 86-97.
-
(2010)
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '10
, pp. 86-97
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
27
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for GPU computing," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2011.
-
(2011)
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems
-
-
Zhang, E.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
-
30
-
-
84968739606
-
Miss rate prediction across all program inputs
-
New Orleans, Louisiana, September
-
Y. Zhong, S. G. Dropsho, and C. Ding, "Miss rate prediction across all program inputs," in Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.
-
(2003)
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
-
-
Zhong, Y.1
Dropsho, S.G.2
Ding, C.3