-
1
-
-
84857846339
-
OpenCL, parallel computing on GPU and CPU
-
AAFTAB, M. 2008. OpenCL, parallel computing on GPU and CPU. In Proceedings of SigGraph.
-
(2008)
Proceedings of SigGraph
-
-
Aaftab, M.1
-
3
-
-
77952660587
-
Visualizing complex dynamics in many-core accelerator architectures
-
ARIEL, A., FUNG, W. W. L., TURNER, A., AND AAMODT, T. M. Visualizing complex dynamics in many-core accelerator architectures. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 164-174.
-
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
, pp. 164-174
-
-
Ariel, A.1
Fung, W.W.L.2
Turner, A.3
Aamodt, T.M.4
-
4
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
BAKHODA, A., YUAN, G. L., FUNG, W. W. L., WONG, H., AND AAMODT, T. M. 2009. Analyzing CUDA workloads using a detailed GPU simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 163-174.
-
(2009)
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
5
-
-
0000269759
-
Scheduling multithreaded computations by work stealing
-
BLUMOFE, R. D. AND LEISERSON, C. E. 1999. Scheduling multithreaded computations by work stealing. J. ACM. 46, 5.
-
(1999)
J. ACM
, vol.46
, Issue.5
-
-
Blumofe, R.D.1
Leiserson, C.E.2
-
6
-
-
51449118065
-
A performance study of general purpose applications on graphics processors using CUDA
-
CHE, S., BOYER, M., MENG, J., TARJAN, D., SHEAFER, J. W., AND SKADRON, K. 2008. A performance study of general purpose applications on graphics processors using CUDA. J. Parall. Distrib. Comput. 68, 10.
-
(2008)
J. Parall. Distrib. Comput.
, vol.68
, Issue.10
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheafer, J.W.5
Skadron, K.6
-
7
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
CHE, S., BOYER, M., MENG, J., TARJAN, D., SHEAFER, J., LEE, S.-H., AND SKADRON, K. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the IEEE International Symposium on Workload Characterization. 44-54.
-
(2009)
Proceedings of the IEEE International Symposium on Workload Characterization
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheafer, J.5
Lee, S.-H.6
Skadron, K.7
-
9
-
-
80054828244
-
-
Computer Science Department, Carleton University, Ottawa, Canada
-
DEHNE, F. AND YOGARATNAM, K. 2010. Exploring the limits of GPU's with parallel graph algorithms. Computer Science Department, Carleton University, Ottawa, Canada.
-
(2010)
Exploring the Limits of GPU's with Parallel Graph Algorithms
-
-
Dehne, F.1
Yogaratnam, K.2
-
10
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
FUNG, W. W. L., SHAM, I., YUAN, G., AND AAMODT, T. M. 2007. Dynamic warp formation and scheduling for efficient GPU control flow. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.
-
(2007)
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
12
-
-
38349041620
-
Accelerating large graph algorithms on the GPU using CUDA
-
HARISH, P. AND NARAYANAN, P. J. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of HiPC. 197-208.
-
(2007)
Proceedings of HiPC
, pp. 197-208
-
-
Harish, P.1
Narayanan, P.J.2
-
14
-
-
79952811127
-
Accelerating CUDA graph algorithms at maximum warp
-
HONG, S., KIM, S. K., OGUNTEBI, T., AND OLUKOTUN, K. 2011. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of PPoPP.
-
(2011)
Proceedings of PPoPP
-
-
Hong, S.1
Kim, S.K.2
Oguntebi, T.3
Olukotun, K.4
-
15
-
-
84856541553
-
Efficient parallel graph exploration for multi-core CPU and GPU
-
HONG, S., OGUNTEBI, T., AND OLUKOTUN, K. 2011. Efficient parallel graph exploration for multi-core CPU and GPU. In Proceedings of PACT.
-
(2011)
Proceedings of PACT
-
-
Hong, S.1
Oguntebi, T.2
Olukotun, K.3
-
16
-
-
0034459255
-
Efficient conditional operations for data-parallel architectures
-
KAPASI, U. J., DALLY, J., RIXNER, W. S., MATTSON, P. R., OWENS, J. D., AND KHAILANY, B. 2000. Efficient conditional operations for data-parallel architectures. In Proceedings of MICRO.
-
(2000)
Proceedings of MICRO
-
-
Kapasi, U.J.1
Dally, J.2
Rixner, W.S.3
Mattson, P.R.4
Owens, J.D.5
Khailany, B.6
-
20
-
-
84857846338
-
-
MAXIME, B. 2010. Ray tracing in CUDA. http://ercbench.ece.wisc.edu/index.php?option=com-content&view=article&id=59:ray-tracing&catid=18:gpgpu&Itemid=20.
-
(2010)
Ray Tracing in CUDA
-
-
Maxime, B.1
-
21
-
-
77954994930
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
University of Virginia
-
MENG, J., TARJAN, D., AND SKADRON, K. 2010. Dynamic warp subdivision for integrated branch and memory divergence tolerance. Tech. rep. CS-2010-5, University of Virginia.
-
(2010)
Tech. Rep. CS-2010-5
-
-
Meng, J.1
Tarjan, D.2
Skadron, K.3
-
23
-
-
84857874275
-
CUDA C programming best practices guide
-
NVIDIA
-
NVIDIA. 2009. CUDA C programming best practices guide. CUDA Toolkit 2.3.
-
(2009)
CUDA Toolkit 2.3
-
-
-
25
-
-
0033691565
-
Memory access scheduling
-
RIXNER, S., DALLY, W. J., KAPASI, U. J., MATTSON, P., AND OWENS, J. D. 2000. Memory access scheduling. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00).
-
(2000)
Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00)
-
-
Rixner, S.1
Dally, W.J.2
Kapasi, U.J.3
Mattson, P.4
Owens, J.D.5
-
26
-
-
38849131252
-
High-throughput sequence alignment using graphics processing units
-
SCHATZ, M., TRAPNELL, C., DELCHER, A., AND VARSHNEY, A. 2007. High-throughput sequence alignment using graphics processing units. BMC Bioinformatics 8, 1, 474.
-
(2007)
BMC Bioinformatics
, vol.8
, Issue.1
, pp. 474
-
-
Schatz, M.1
Trapnell, C.2
Delcher, A.3
Varshney, A.4
-
27
-
-
83755201638
-
-
Department of Electrical and Computer Engineering, University of Toronto
-
WONG, H., PAPADOPOULOU, M., SADOOGHI-ALVANDI, M., AND MOSHOVOS, A. 2010. Demystifying GPU microarchitecture through microbenchmarking. Department of Electrical and Computer Engineering, University of Toronto.
-
(2010)
Demystifying GPU Microarchitecture Through Microbenchmarking
-
-
Wong, H.1
Papadopoulou, M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4