-
2
-
-
70349100958
-
-
Khronos group, Version 2.0
-
Khronos group, "The OpenCL specification, " Version 2.0, 2015.
-
(2015)
The OpenCL Specification
-
-
-
4
-
-
84873470137
-
Parboil: A revised benchmark suite for scientific and commercial throughput computing
-
J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, N. Anssari, G. D. Liu, and W.-m. W. Hwu, "Parboil: A revised benchmark suite for scientific and commercial throughput computing, " IMPACT Technical Report, 2012.
-
(2012)
IMPACT Technical Report
-
-
Stratton, J.A.1
Rodrigues, C.2
Sung, I.-J.3
Obeid, N.4
Chang, L.-W.5
Anssari, N.6
Liu, G.D.7
Hwu, W.-M.W.8
-
5
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing, " in Workload Characterization, IEEE International Symposium on, pp. 44-54, 2009.
-
(2009)
Workload Characterization, IEEE International Symposium on
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.5
Lee, S.-H.6
Skadron, K.7
-
6
-
-
77952273045
-
The scalable heterogeneous computing (SHOC) benchmark suite
-
A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (SHOC) benchmark suite, " in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 63-74, 2010.
-
(2010)
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
, pp. 63-74
-
-
Danalis, A.1
Marin, G.2
McCurdy, C.3
Meredith, J.S.4
Roth, P.C.5
Spafford, K.6
Tipparaju, V.7
Vetter, J.S.8
-
7
-
-
84923879310
-
NUPAR: A benchmark suite for modern GPU architectures
-
Y. Ukidave, F. N. Paravecino, L. Yu, C. Kalra, A. Momeni, Z. Chen, N. Materise, B. Daley, P. Mistry, and D. Kaeli, "NUPAR: A benchmark suite for modern GPU architectures, " in Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 2015.
-
(2015)
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering
-
-
Ukidave, Y.1
Paravecino, F.N.2
Yu, L.3
Kalra, C.4
Momeni, A.5
Chen, Z.6
Materise, N.7
Daley, B.8
Mistry, P.9
Kaeli, D.10
-
8
-
-
84873458159
-
A quantitative study of irregular programs on GPUs
-
M. Burtscher, R. Nasre, and K. Pingali, "A quantitative study of irregular programs on GPUs, " in Workload Characterization, IEEE International Symposium on, pp. 141-151, 2012.
-
(2012)
Workload Characterization, IEEE International Symposium on
, pp. 141-151
-
-
Burtscher, M.1
Nasre, R.2
Pingali, K.3
-
9
-
-
84875979403
-
Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems
-
P. Mistry, Y. Ukidave, D. Schaa, and D. Kaeli, "Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems, " in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pp. 54-65, 2013.
-
(2013)
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
, pp. 54-65
-
-
Mistry, P.1
Ukidave, Y.2
Schaa, D.3
Kaeli, D.4
-
10
-
-
84994777428
-
Hetero-Mark, a benchmark suite for CPU-GPU collaborative computing
-
Y. Sun, X. Gong, A. K. Ziabari, L. Yu, X. Li, S. Mukherjee, C. McCardwell, A. Villegas, and D. Kaeli, "Hetero-Mark, a benchmark suite for CPU-GPU collaborative computing, " in Workload Characterization, IEEE International Symposium on, 2016.
-
(2016)
Workload Characterization, IEEE International Symposium on
-
-
Sun, Y.1
Gong, X.2
Ziabari, A.K.3
Yu, L.4
Li, X.5
Mukherjee, S.6
McCardwell, C.7
Villegas, A.8
Kaeli, D.9
-
11
-
-
84958535612
-
Exploring the features of OpenCL 2.0
-
S. Mukherjee, X. Gong, L. Yu, C. McCardwell, Y. Ukidave, T. Dao, F. N. Paravecino, and D. Kaeli, "Exploring the features of OpenCL 2.0, " in Proceedings of the 3rd International Workshop on OpenCL, pp. 51-55, 2015.
-
(2015)
Proceedings of the 3rd International Workshop on OpenCL
, pp. 51-55
-
-
Mukherjee, S.1
Gong, X.2
Yu, L.3
McCardwell, C.4
Ukidave, Y.5
Dao, T.6
Paravecino, F.N.7
Kaeli, D.8
-
12
-
-
84978733890
-
A comprehensive performance analysis of HSA and OpenCL 2.0
-
S. Mukherjee, Y. Sun, P. Blinzer, A. K. Ziabari, and D. Kaeli, "A comprehensive performance analysis of HSA and OpenCL 2.0, " in Performance Analysis of Systems and Software, IEEE International Symposium on, pp. 183-193, 2016.
-
(2016)
Performance Analysis of Systems and Software, IEEE International Symposium on
, pp. 183-193
-
-
Mukherjee, S.1
Sun, Y.2
Blinzer, P.3
Ziabari, A.K.4
Kaeli, D.5
-
13
-
-
84962221365
-
Implementing cross-device atomics in heterogeneous processors
-
M. Gupta, D. Das, P. Raghavendra, T. Tye, L. Lobachev, A. Agarwal, and R. Hegde, "Implementing cross-device atomics in heterogeneous processors, " in Parallel and Distributed Processing Symposium Workshop, IEEE International, pp. 659-668, 2015.
-
(2015)
Parallel and Distributed Processing Symposium Workshop, IEEE International
, pp. 659-668
-
-
Gupta, M.1
Das, D.2
Raghavendra, P.3
Tye, T.4
Lobachev, L.5
Agarwal, A.6
Hegde, R.7
-
14
-
-
0022808786
-
A computational approach to edge detection
-
J. Canny, "A computational approach to edge detection, " Pattern Analysis and Machine Intelligence, IEEE Transactions on, no. 6, pp. 679-698, 1986.
-
(1986)
Pattern Analysis and Machine Intelligence, IEEE Transactions on
, Issue.6
, pp. 679-698
-
-
Canny, J.1
-
15
-
-
84976501593
-
In-place data sliding algorithms for many-core architectures
-
J. Gómez-Luna, L.-W. Chang, I.-J. Sung, W.-M. Hwu, and N. Guil, "In-place data sliding algorithms for many-core architectures, " in Parallel Processing, 44th International Conference on, pp. 210-219, 2015.
-
(2015)
Parallel Processing, 44th International Conference on
, pp. 210-219
-
-
Gómez-Luna, J.1
Chang, L.-W.2
Sung, I.-J.3
Hwu, W.-M.4
Guil, N.5
-
16
-
-
0019574599
-
Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography
-
June
-
M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, " Communications of the ACM, vol. 24, pp. 381-395, June 1981.
-
(1981)
Communications of the ACM
, vol.24
, pp. 381-395
-
-
Fischler, M.A.1
Bolles, R.C.2
-
17
-
-
84870691946
-
DL: A data layout transformation system for heterogeneous computing
-
I.-J. Sung, G. Liu, and W.-M. Hwu, "DL: A data layout transformation system for heterogeneous computing, " in Innovative Parallel Computing, pp. 1-11, 2012.
-
(2012)
Innovative Parallel Computing
, pp. 1-11
-
-
Sung, I.-J.1
Liu, G.2
Hwu, W.-M.3
-
18
-
-
77953985375
-
Dynamic load balancing on single- and multi-GPU systems
-
L. Chen, O. Villa, S. Krishnamoorthy, and G. Gao, "Dynamic load balancing on single- and multi-GPU systems, " in Parallel Distributed Processing, IEEE International Symposium on, pp. 1-12, 2010.
-
(2010)
Parallel Distributed Processing, IEEE International Symposium on
, pp. 1-12
-
-
Chen, L.1
Villa, O.2
Krishnamoorthy, S.3
Gao, G.4
-
19
-
-
77956200064
-
An effective GPU implementation of breadth-first search
-
L. Luo, M. Wong, and W.-m. Hwu, "An effective GPU implementation of breadth-first search, " in Proceedings of the 47th Design Automation Conference, pp. 52-55, 2010.
-
(2010)
Proceedings of the 47th Design Automation Conference
, pp. 52-55
-
-
Luo, L.1
Wong, M.2
Hwu, W.-M.3
-
20
-
-
84903968515
-
Gem5-GPU: A heterogeneous CPU-GPU simulator
-
Jan
-
J. Power, J. Hestness, M. Orr, M. Hill, and D. Wood, "gem5-gpu: A heterogeneous CPU-GPU simulator, " Computer Architecture Letters, vol. 13, Jan 2014.
-
(2014)
Computer Architecture Letters
, vol.13
-
-
Power, J.1
Hestness, J.2
Orr, M.3
Hill, M.4
Wood, D.5
-
22
-
-
85027453863
-
-
AMD
-
AMD, "App profiler settings." http://developer.amd.com/tools-And-sdks/archive/compute/amd-App-profiler/user-guide/app-profiler-settings/.
-
App Profiler Settings
-
-
-
23
-
-
85027451719
-
-
S. Kelley. https://github.com/smskelley/canny-opencl.
-
-
-
Kelley, S.1
-
25
-
-
85027468688
-
-
bshaozi, September
-
bshaozi, "Compile problem." https://github.com/RadeonOpenCompute/hcc/issues/124, September 2016.
-
(2016)
Compile Problem
-
-
-
26
-
-
84946020782
-
MachSuite: Benchmarks for accelerator design and customized architectures
-
B. Reagen, R. Adolf, Y. S. Shao, G. Y. Wei, and D. Brooks, "MachSuite: Benchmarks for accelerator design and customized architectures, " in Workload Characterization, IEEE International Symposium on, pp. 110-119, 2014.
-
(2014)
Workload Characterization, IEEE International Symposium on
, pp. 110-119
-
-
Reagen, B.1
Adolf, R.2
Shao, Y.S.3
Wei, G.Y.4
Brooks, D.5
-
27
-
-
84862695013
-
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
-
K. L. Spafford, J. S. Meredith, S. Lee, D. Li, P. C. Roth, and J. S. Vetter, "The tradeoffs of fused memory hierarchies in heterogeneous computing architectures, " in Proceedings of the 9th conference on Computing Frontiers, pp. 103-112, 2012.
-
(2012)
Proceedings of the 9th Conference on Computing Frontiers
, pp. 103-112
-
-
Spafford, K.L.1
Meredith, J.S.2
Lee, S.3
Li, D.4
Roth, P.C.5
Vetter, J.S.6
-
28
-
-
84882833309
-
Performance characterization of data-intensive kernels on AMD fusion architectures
-
K. Lee, H. Lin, and W.-c. Feng, "Performance characterization of data-intensive kernels on AMD fusion architectures, " Computer Science - Research and Development, vol. 28, no. 2-3, pp. 175-184, 2013.
-
(2013)
Computer Science - Research and Development
, vol.28
, Issue.2-3
, pp. 175-184
-
-
Lee, K.1
Lin, H.2
Feng, W.-C.3
-
29
-
-
85016777931
-
Understanding co-run performance on CPU-GPU integrated processors: Observations, insights, directions
-
Q. Zhu, B. Wu, X. Shen, K. Shen, L. Shen, and Z. Wang, "Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions, " Frontiers of Computer Science, pp. 1-17, 2016.
-
(2016)
Frontiers of Computer Science
, pp. 1-17
-
-
Zhu, Q.1
Wu, B.2
Shen, X.3
Shen, K.4
Shen, L.5
Wang, Z.6
-
30
-
-
84978477088
-
Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations
-
N. Farooqui, I. Roy, Y. Chen, V. Talwar, and K. Schwan, "Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations, " in Proceedings of the ACM International Conference on Computing Frontiers, pp. 19-28, 2016.
-
(2016)
Proceedings of the ACM International Conference on Computing Frontiers
, pp. 19-28
-
-
Farooqui, N.1
Roy, I.2
Chen, Y.3
Talwar, V.4
Schwan, K.5
-
31
-
-
85027447368
-
Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications
-
V. Garcia-Flores, J. Gómez-Luna, T. Grass, A. Rico, E. Ayguade, and A. J. Pena, "Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications, " in Workload Characterization, IEEE International Symposium on, pp. 1-10, 2016.
-
(2016)
Workload Characterization, IEEE International Symposium on
, pp. 1-10
-
-
Garcia-Flores, V.1
Gómez-Luna, J.2
Grass, T.3
Rico, A.4
Ayguade, E.5
Pena, A.J.6
-
32
-
-
85015994987
-
Dynamic buffer overflow detection for GPGPUs
-
C. Erb, M. Collins, and J. L. Greathouse, "Dynamic buffer overflow detection for GPGPUs, " in Proceedings of the 2017 International Symposium on Code Generation and Optimization, pp. 61-73, 2017.
-
(2017)
Proceedings of the 2017 International Symposium on Code Generation and Optimization
, pp. 61-73
-
-
Erb, C.1
Collins, M.2
Greathouse, J.L.3
-
34
-
-
85027464144
-
Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs
-
J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs, " in ACM SIGARCH Computer Architecture News, vol. 43, pp. 528-540, 2015.
-
(2015)
ACM SIGARCH Computer Architecture News
, vol.43
, pp. 528-540
-
-
Wang, J.1
Rubin, N.2
Sidelnik, A.3
Yalamanchili, S.4
-
35
-
-
84983239150
-
Compiler-assisted workload consolidation for efficient dynamic parallelism on GPU
-
H. Wu, D. Li, and M. Becchi, "Compiler-assisted workload consolidation for efficient dynamic parallelism on GPU, " in Parallel and Distributed Processing Symposium, 2016 IEEE International, pp. 534-543, 2016.
-
(2016)
Parallel and Distributed Processing Symposium, 2016 IEEE International
, pp. 534-543
-
-
Wu, H.1
Li, D.2
Becchi, M.3
-
36
-
-
85009382810
-
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism
-
IEEE
-
I. El Hajj, J. Gómez-Luna, C. Li, L.-W. Chang, D. Milojicic, and W.-m. Hwu, "KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism, " in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on, pp. 1-12, IEEE, 2016.
-
(2016)
Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on
, pp. 1-12
-
-
Hajj, I.E.1
Gómez-Luna, J.2
Li, C.3
Chang, L.-W.4
Milojicic, D.5
Hwu, W.-M.6
-
37
-
-
85027441431
-
-
X. Tang, A. Pattnaik, H. Jiang, O. Kayiran, A. Jog, M. I. Sreepathi Pai, M. T. Kandemir, and C. R. Das, "Controlled kernel launch for dynamic parallelism in GPUs."
-
Controlled Kernel Launch for Dynamic Parallelism in GPUs
-
-
Tang, X.1
Pattnaik, A.2
Jiang, H.3
Kayiran, O.4
Jog, A.5
Pai, M.I.S.6
Kandemir, M.T.7
Das, C.R.8