[1] Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu, An adaptive performance modeling tool for GPU architectures, in: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, ACM, New York, NY, USA, 2010, pp. 105-114.
[2] Juan Gómez-Luna, José María González-Linares, José Ignacio Benavides, Nicolás Guil, Parallelization of a video segmentation algorithm on CUDA enabled graphics processing units, in: Proc. of the Int'l Euro-Par Conference on Parallel Processing, EuroPar'09, 2009, pp. 924-935.
[3] Juan Gómez-Luna, José María González-Linares, José Ignacio Benavides, Nicolás Guil, Performance models for CUDA streams on NVIDIA GeForce series, Technical Report, University of Málaga, 2011. http://www.ac.uma.es/~vip/publications/UMA-DAC-11-02.pdf.
[4] Wan Han, Gao Xiaopeng, Wang Zhiqiang, Li Yi, Using GPU to accelerate cache simulation, in: IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009, pp. 565-570.
[5] Sunpyo Hong, Hyesoon Kim, An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, in: Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA'09, ACM, New York, NY, USA, 2009, pp. 152-163.
[6] Amir H. Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge, Scott Mahlke, Sponge: portable stream programming on graphics engines, in: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, ACM, New York, NY, USA, 2011, pp. 381-392.
[7] Khronos Group, OpenCL. http://www.khronos.org/opencl/.
[8] Supada Laosooksathit, Chokchai B. Leangsuksun, Abdelkader Baggag, Clayton F. Chandler, Stream experiments: toward latency hiding in GPGPU, in: Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN'10, 2010, pp. 240-248.
[9] Vladimir Marjanović, Jesús Labarta, Eduard Ayguadé, Mateo Valero, Overlapping communication and computation by using a hybrid MPI/SMPSS approach, in: Proceedings of the 24th ACM International Conference on Supercomputing, ICS'10, ACM, New York, NY, USA, 2010, pp. 5-16.
[11] NVIDIA, CUDA C best practices guide 3.2, August 2010. http://developer.download.nvidia.com/compute/cuda/3-2/toolkit/docs/CUDA-C-Best-Practices-Guide.pdf.
[12] NVIDIA, CUDA C programming guide 3.2, September 2010. http://developer.download.nvidia.com/compute/cuda/3-2/toolkit/docs/CUDA-C-Programming-Guide.pdf.
[14] NVIDIA, CUDA Zone. http://www.nvidia.com/object/cuda-home-new.html.
[15] Peripheral Component Interconnect Special Interest Group, PCI Express. http://www.pcisig.com/.
[16] James C. Phillips, John E. Stone, Klaus Schulten, Adapting a message-driven parallel application to GPU-accelerated clusters, in: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC'08, IEEE Press, Piscataway, NJ, USA, 2008, pp. 8:1-8:9.
[17] V. Podlozhnyuk, Histogram calculation in CUDA, White Paper, 2007. http://developer.download.nvidia.com/compute/cuda/1-1/Website/projects/histogram256/doc/histogram.pdf.
[18] Abhishek Udupa, R. Govindarajan, Matthew J. Thazhuthaveetil, Synergistic execution of stream programs on multicores with accelerators, in: Proceedings of the 2009 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES'09, ACM, New York, NY, USA, 2009, pp. 99-108.
[19] Ta Quoc Viet, Tsutomu Yoshinaga, Improving linpack performance on SMP clusters with asynchronous MPI programming, IPSJ Digital Courier 2 (2006) 598-606.