-
2
-
-
84964582380
-
-
3.1 ed. AMD
-
AMD. CodeXL, 3.1 ed. AMD.
-
AMD. CodeXL
-
-
-
3
-
-
77952660587
-
Visualizing complex dynamics in many-core accelerator architectures
-
IEEE Computer Society
-
Ariel, A., Fung, W. W. L., Turner, A. E., and Aamodt, T. M. Visualizing complex dynamics in many-core accelerator architectures. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (White Plains, NY, USA, March 2010), IEEE Computer Society, pp. 164-174.
-
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (White Plains, NY, USA, March 2010)
, pp. 164-174
-
-
Ariel, A.1
Fung, W.W.L.2
Turner, A.E.3
Aamodt, T.M.4
-
4
-
-
77749337497
-
An adaptive performance modeling tool for GPU architectures
-
PPoPP '10, ACM
-
Baghsorkhi, S. S., Delahaye, M., Patel, S. J., Gropp, W. D., and Hwu, W.-m. W. An adaptive performance modeling tool for GPU architectures. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2010), PPoPP '10, ACM, pp. 105-114.
-
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA 2010)
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
5
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
Boston, MA, USA, April
-
Bakhoda, A., Yuan, G., Fung, W. W. L., Wong, H., and Aamodt, T. M. Analyzing CUDA workloads using a detailed GPU simulator. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (Boston, MA, USA, April 2009), pp. 163-174.
-
(2009)
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
6
-
-
84900589248
-
Efficient mapping of irregular C++ applications to integrated GPUs
-
CGO '14, ACM
-
Barik, R., Kaleem, R., Majeti, D., Lewis, B. T., Shpeisman, T., Hu, C., Ni, Y., and Adl-Tabatabai, A.-R. Efficient mapping of irregular C++ applications to integrated GPUs. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (New York, NY, USA, 2014), CGO '14, ACM, pp. 33:33-33:43.
-
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (New York, NY, USA 2014)
, pp. 33:33-33:43
-
-
Barik, R.1
Kaleem, R.2
Majeti, D.3
Lewis, B.T.4
Shpeisman, T.5
Hu, C.6
Ni, Y.7
Adl-Tabatabai, A.-R.8
-
7
-
-
84863973589
-
A virtual memory based runtime to support multi-tenancy in clusters with GPUs
-
HPDC '12, ACM
-
Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., and Chakradhar, S. A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (New York, NY, USA, 2012), HPDC '12, ACM, pp. 97-108.
-
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (New York, NY, USA 2012)
, pp. 97-108
-
-
Becchi, M.1
Sajjapongse, K.2
Graves, I.3
Procter, A.4
Ravi, V.5
Chakradhar, S.6
-
8
-
-
84879533965
-
Load balancing in a changing world: Dealing with heterogeneity and performance variability
-
CF '13, ACM
-
Boyer, M., Skadron, K., Che, S., and Jayasena, N. Load balancing in a changing world: Dealing with heterogeneity and performance variability. In Proceedings of the ACM International Conference on Computing Frontiers (New York, NY, USA, 2013), CF '13, ACM, pp. 21:1-21:10.
-
Proceedings of the ACM International Conference on Computing Frontiers (New York, NY, USA 2013)
, pp. 21:1-21:10
-
-
Boyer, M.1
Skadron, K.2
Che, S.3
Jayasena, N.4
-
9
-
-
84873458159
-
A quantitative study of irregular programs on GPUs
-
Burtscher, M., Nasre, R., and Pingali, K. A quantitative study of irregular programs on GPUs. In Workload Characterization (IISWC), 2012 IEEE International Symposium on (2012), pp. 141-151.
-
(2012)
Workload Characterization (IISWC) 2012 IEEE International Symposium on
, pp. 141-151
-
-
Burtscher, M.1
Nasre, R.2
Pingali, K.3
-
10
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.-H., and Skadron, K. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on (Oct. 2009), pp. 44-54.
-
(2009)
Workload Characterization 2009 IISWC 2009 IEEE International Symposium on (Oct)
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.5
Lee, S.-H.6
Skadron, K.7
-
11
-
-
70649096881
-
-
Tech. Rep. hal-00359342
-
Collange, S., Defour, D., and Parello, D. Barra, a modular functional GPU simulator for GPGPU. Tech. Rep. hal-00359342, 2009.
-
(2009)
Barra, a modular functional GPU simulator for GPGPU
-
-
Collange, S.1
Defour, D.2
Parello, D.3
-
12
-
-
84862085532
-
Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures
-
Farooqui, N., Kerr, A., Eisenhauer, G., Schwan, K., and Yalamanchili, S. Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures. In Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on (April 2012), pp. 58-67.
-
(2012)
Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on (April)
, pp. 58-67
-
-
Farooqui, N.1
Kerr, A.2
Eisenhauer, G.3
Schwan, K.4
Yalamanchili, S.5
-
13
-
-
78751477137
-
Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications
-
Goswami, N., Shankar, R., Joshi, M., and Li, T. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications. In Workload Characterization (IISWC), 2010 IEEE International Symposium on (2010), pp. 1-10.
-
(2010)
Workload Characterization (IISWC) 2010 IEEE International Symposium on
, pp. 1-10
-
-
Goswami, N.1
Shankar, R.2
Joshi, M.3
Li, T.4
-
14
-
-
84863043723
-
Pegasus: Coordinated scheduling for virtualized accelerator-based systems
-
Gupta, V., Schwan, K., Tolia, N., Talwar, V., and Ranganathan, P. Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In Proceedings of the 2011 Usenix Annual Technical Conference (Portland, USA, 2011).
-
(2011)
Proceedings of the 2011 Usenix Annual Technical Conference (Portland, USA)
-
-
Gupta, V.1
Schwan, K.2
Tolia, N.3
Talwar, V.4
Ranganathan, P.5
-
15
-
-
80053955412
-
Accelerating CUDA graph algorithms at maximum warp
-
PPoPP '11, ACM
-
Hong, S., Kim, S. K., Oguntebi, T., and Olukotun, K. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2011), PPoPP '11, ACM, pp. 267-276.
-
Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (New York, NY, USA 2011)
, pp. 267-276
-
-
Hong, S.1
Kim, S.K.2
Oguntebi, T.3
Olukotun, K.4
-
17
-
-
59049085159
-
Predictive runtime code scheduling for heterogeneous architectures
-
HiPEAC '09, Springer-Verlag
-
Jimenez, V. J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., and Navarro, N. Predictive runtime code scheduling for heterogeneous architectures. In Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers (Berlin, Heidelberg, 2009), HiPEAC '09, Springer-Verlag, pp. 19-33.
-
Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers (Berlin, Heidelberg 2009)
, pp. 19-33
-
-
Jimenez, V.J.1
Vilanova, L.2
Gelado, I.3
Gil, M.4
Fursin, G.5
Navarro, N.6
-
18
-
-
84907087776
-
Adaptive heterogeneous scheduling for integrated GPUs
-
PACT '14, ACM
-
Kaleem, R., Barik, R., Shpeisman, T., Lewis, B. T., Hu, C., and Pingali, K. Adaptive heterogeneous scheduling for integrated GPUs. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (New York, NY, USA, 2014), PACT '14, ACM, pp. 151-162.
-
Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (New York, NY, USA 2014)
, pp. 151-162
-
-
Kaleem, R.1
Barik, R.2
Shpeisman, T.3
Lewis, B.T.4
Hu, C.5
Pingali, K.6
-
19
-
-
84855757761
-
Timegraph: GPU scheduling for real-time multi-tasking environments
-
USENIXATC'11, USENIX Association
-
Kato, S., Lakshmanan, K., Rajkumar, R., and Ishikawa, Y. Timegraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (Berkeley, CA, USA, 2011), USENIXATC'11, USENIX Association, pp. 2-2.
-
Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (Berkeley, CA, USA 2011)
, pp. 2
-
-
Kato, S.1
Lakshmanan, K.2
Rajkumar, R.3
Ishikawa, Y.4
-
20
-
-
84878156908
-
Gdev: First-class GPU resource management in the operating system
-
USENIX ATC'12, USENIX Association
-
Kato, S., McThrow, M., Maltzahn, C., and Brandt, S. Gdev: First-class GPU resource management in the operating system. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Berkeley, CA, USA, 2012), USENIX ATC'12, USENIX Association, pp. 37-37.
-
Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Berkeley, CA, USA 2012)
, pp. 37
-
-
Kato, S.1
McThrow, M.2
Maltzahn, C.3
Brandt, S.4
-
21
-
-
70649104826
-
A characterization and analysis of PTX kernels
-
Kerr, A., Diamos, G., and Yalamanchili, S. A characterization and analysis of PTX kernels. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on (Oct. 2009), pp. 3-12.
-
(2009)
Workload Characterization 2009 IISWC 2009 IEEE International Symposium on (Oct)
, pp. 3-12
-
-
Kerr, A.1
Diamos, G.2
Yalamanchili, S.3
-
22
-
-
84889682621
-
Evaluating integrated graphics processors for data center workloads
-
HotPower '13, ACM
-
Kim, S., Roy, I., and Talwar, V. Evaluating integrated graphics processors for data center workloads. In Proceedings of the Workshop on Power-Aware Computing and Systems (New York, NY, USA, 2013), HotPower '13, ACM, pp. 8:1-8:5.
-
Proceedings of the Workshop on Power-Aware Computing and Systems (New York, NY, USA 2013)
, pp. 8:1-8:5
-
-
Kim, S.1
Roy, I.2
Talwar, V.3
-
23
-
-
84899673745
-
Efficient data partitioning model for heterogeneous graphs in the cloud
-
SC '13, ACM
-
Lee, K., and Liu, L. Efficient data partitioning model for heterogeneous graphs in the cloud. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (New York, NY, USA, 2013), SC '13, ACM, pp. 46:1-46:12.
-
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (New York, NY, USA 2013)
, pp. 46:1-46:12
-
-
Lee, K.1
Liu, L.2
-
25
-
-
84863735533
-
Distributed GraphLab: A framework for machine learning and data mining in the cloud
-
Apr
-
Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., and Hellerstein, J. M. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 8 (Apr. 2012), 716-727.
-
(2012)
Proc. VLDB Endow
, vol.5
, Issue.8
, pp. 716-727
-
-
Low, Y.1
Bickson, D.2
Gonzalez, J.3
Guestrin, C.4
Kyrola, A.5
Hellerstein, J.M.6
-
26
-
-
77956200064
-
An effective GPU implementation of breadth-first search
-
ACM
-
Luo, L., Wong, M., and Hwu, W.-m. An effective GPU implementation of breadth-first search. In Proceedings of the 47th Design Automation Conference (2010), ACM, pp. 52-55.
-
(2010)
Proceedings of the 47th Design Automation Conference
, pp. 52-55
-
-
Luo, L.1
Wong, M.2
Hwu, W.-M.3
-
27
-
-
84858783719
-
Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations
-
MICRO-44, ACM
-
Mars, J., Tang, L., Hundt, R., Skadron, K., and Soffa, M. L. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (New York, NY, USA, 2011), MICRO-44, ACM, pp. 248-259.
-
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (New York, NY, USA 2011)
, pp. 248-259
-
-
Mars, J.1
Tang, L.2
Hundt, R.3
Skadron, K.4
Soffa, M.L.5
-
28
-
-
84897749415
-
Disengaged scheduling for fair, protected access to fast computational accelerators
-
ASPLOS '14, ACM
-
Menychtas, K., Shen, K., and Scott, M. L. Disengaged scheduling for fair, protected access to fast computational accelerators. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2014), ASPLOS '14, ACM, pp. 301-316.
-
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA 2014)
, pp. 301-316
-
-
Menychtas, K.1
Shen, K.2
Scott, M.L.3
-
29
-
-
84889648298
-
A lightweight infrastructure for graph analytics
-
SOSP '13, ACM
-
Nguyen, D., Lenharth, A., and Pingali, K. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA, 2013), SOSP '13, ACM, pp. 456-471.
-
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA 2013)
, pp. 456-471
-
-
Nguyen, D.1
Lenharth, A.2
Pingali, K.3
-
31
-
-
84978529944
-
-
4.0 ed. NVIDIA Corporation, Santa Clara, California, May
-
NVIDIA. NVIDIA Compute Visual Profiler, 4.0 ed. NVIDIA Corporation, Santa Clara, California, May 2011.
-
(2011)
NVIDIA. NVIDIA Compute Visual Profiler
-
-
-
32
-
-
84875671819
-
Portable performance on heterogeneous architectures
-
ASPLOS '13, ACM
-
Phothilimthana, P. M., Ansel, J., Ragan-Kelley, J., and Amarasinghe, S. Portable performance on heterogeneous architectures. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2013), ASPLOS '13, ACM, pp. 431-444.
-
Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA 2013)
, pp. 431-444
-
-
Phothilimthana, P.M.1
Ansel, J.2
Ragan-Kelley, J.3
Amarasinghe, S.4
-
33
-
-
84863933095
-
Interference-driven resource management for GPU-based heterogeneous clusters
-
HPDC '12, ACM
-
Phull, R., Li, C.-H., Rao, K., Cadambi, H., and Chakradhar, S. Interference-driven resource management for GPU-based heterogeneous clusters. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (New York, NY, USA, 2012), HPDC '12, ACM, pp. 109-120.
-
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (New York, NY, USA 2012)
, pp. 109-120
-
-
Phull, R.1
Li, C.-H.2
Rao, K.3
Cadambi, H.4
Chakradhar, S.5
-
34
-
-
79960506159
-
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
-
HPDC '11, ACM
-
Ravi, V. T., Becchi, M., Agrawal, G., and Chakradhar, S. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In Proceedings of the 20th International Symposium on High Performance Distributed Computing (New York, NY, USA, 2011), HPDC '11, ACM, pp. 217-228.
-
Proceedings of the 20th International Symposium on High Performance Distributed Computing (New York, NY, USA 2011)
, pp. 217-228
-
-
Ravi, V.T.1
Becchi, M.2
Agrawal, G.3
Chakradhar, S.4
-
35
-
-
84863676008
-
Scheduling concurrent applications on a cluster of CPU-GPU nodes
-
CCGRID '12, IEEE Computer Society
-
Ravi, V. T., Becchi, M., Jiang, W., Agrawal, G., and Chakradhar, S. Scheduling concurrent applications on a cluster of CPU-GPU nodes. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012) (Washington, DC, USA, 2012), CCGRID '12, IEEE Computer Society, pp. 140-147.
-
Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012) (Washington, DC, USA 2012)
, pp. 140-147
-
-
Ravi, V.T.1
Becchi, M.2
Jiang, W.3
Agrawal, G.4
Chakradhar, S.5
-
36
-
-
82655162782
-
Ptask: Operating system abstractions to manage GPUs as compute devices
-
SOSP '11, ACM
-
Rossbach, C. J., Currey, J., Silberstein, M., Ray, B., and Witchel, E. Ptask: Operating system abstractions to manage GPUs as compute devices. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (New York, NY, USA, 2011), SOSP '11, ACM, pp. 233-248.
-
Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (New York, NY, USA 2011)
, pp. 233-248
-
-
Rossbach, C.J.1
Currey, J.2
Silberstein, M.3
Ray, B.4
Witchel, E.5
-
37
-
-
84889679621
-
Dandelion: A compiler and runtime for heterogeneous systems
-
SOSP '13, ACM
-
Rossbach, C. J., Yu, Y., Currey, J., Martin, J.-P., and Fetterly, D. Dandelion: A compiler and runtime for heterogeneous systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA, 2013), SOSP '13, ACM, pp. 49-68.
-
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA 2013)
, pp. 49-68
-
-
Rossbach, C.J.1
Yu, Y.2
Currey, J.3
Martin, J.-P.4
Fetterly, D.5
-
38
-
-
84900624911
-
Red fox: An execution environment for relational query processing on GPUs
-
CGO '14, ACM
-
Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., and Yalamanchili, S. Red fox: An execution environment for relational query processing on GPUs. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (New York, NY, USA, 2014), CGO '14, ACM, pp. 44:44-44:54.
-
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (New York, NY, USA 2014)
, pp. 44:44-44:54
-
-
Wu, H.1
Diamos, G.2
Sheard, T.3
Aref, M.4
Baxter, S.5
Garland, M.6
Yalamanchili, S.7
-
39
-
-
79955921273
-
A quantitative performance analysis model for GPU architectures
-
San Antonio, TX, USA February IEEE Computer Society
-
Zhang, Y., and Owens, J. D. A quantitative performance analysis model for GPU architectures. In 17th International Conference on High-Performance Computer Architecture (HPCA-17) (San Antonio, TX, USA, February 2011), IEEE Computer Society, pp. 382-393.
-
(2011)
17th International Conference on High-Performance Computer Architecture (HPCA-17)
, pp. 382-393
-
-
Zhang, Y.1
Owens, J.D.2