-
1
-
-
70450183916
-
Understanding the efficiency of ray traversal on GPUs
-
Aila, T. and Laine, S. 2009. Understanding the efficiency of ray traversal on GPUs. Proceedings of the Conference on High Performance Graphics 2009 (New York, NY, USA, 2009), 145-149.
-
(2009)
Proceedings of the Conference on High Performance Graphics 2009 (New York, NY, USA, 2009)
, pp. 145-149
-
-
Aila, T.
Laine, S.
-
2
-
-
67650786281
-
PetaBricks: A language and compiler for algorithmic choice
-
Ansel, J. et al. 2009. PetaBricks: a language and compiler for algorithmic choice. Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation (New York, NY, USA, 2009), 38-49.
-
(2009)
Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2009)
, pp. 38-49
-
-
Ansel, J.
-
5
-
-
84870721658
-
-
Accessed: 2011-08-25
-
CUDA: http://www.nvidia.com/object/cuda-home-new.html. Accessed: 2011-08-25.
-
CUDA
-
-
-
6
-
-
70449710961
-
-
Google Project Hosting: Accessed: 2011-07-12
-
cudpp - CUDA Data Parallel Primitives Library - Google Project Hosting: http://code.google.com/p/cudpp/. Accessed: 2011-07-12.
-
Cudpp - CUDA Data Parallel Primitives Library
-
-
-
7
-
-
0002806690
-
OpenMP: An industry standard API for shared-memory programming
-
Mar. 1998
-
Dagum, L. and Menon, R. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering. 5, (Mar. 1998), 46-55.
-
(1998)
IEEE Computational Science and Engineering
, vol.5
, pp. 46-55
-
-
Dagum, L.
Menon, R.
-
8
-
-
37549003336
-
MapReduce: Simplified data processing on large clusters
-
Jan. 2008
-
Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM. 51, 1 (Jan. 2008), 107-113.
-
(2008)
Commun. ACM
, vol.51
, Issue.1
, pp. 107-113
-
-
Dean, J.
Ghemawat, S.
-
9
-
-
20744452904
-
Self-Adapting Linear Algebra Algorithms and Software
-
Feb. 2005
-
Demmel, J. et al. 2005. Self-Adapting Linear Algebra Algorithms and Software. Proceedings of the IEEE. 93, 2 (Feb. 2005), 293-312.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 293-312
-
-
Demmel, J.
-
12
-
-
0348209599
-
A fast Fourier transform compiler
-
Frigo, M. 1999. A fast Fourier transform compiler. Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation (New York, NY, USA, 1999), 169-180.
-
(1999)
Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation (New York, NY, USA, 1999)
, pp. 169-180
-
-
Frigo, M.
-
13
-
-
84976721284
-
MULTILISP: A language for concurrent symbolic computation
-
Oct. 1985
-
Halstead, Jr., R.H. 1985. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst. 7, 4 (Oct. 1985), 501-538.
-
(1985)
ACM Trans. Program. Lang. Syst.
, vol.7
, Issue.4
, pp. 501-538
-
-
Halstead Jr., R.H.
-
14
-
-
0003568839
-
-
IEEE Computer Society 2009. IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002)
-
IEEE Computer Society 2009. IEEE Standard VHDL Language Reference Manual. IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002). (2009), c1-626.
-
(2009)
IEEE Standard VHDL Language Reference Manual
-
-
-
15
-
-
84870669933
-
PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation
-
Sep. 2011
-
Klöckner, A. et al. 2011. PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation. Parallel Computing. (Sep. 2011).
-
(2011)
Parallel Computing
-
-
Klöckner, A.
-
16
-
-
84866532295
-
-
Technical Report #245. LAPACK Working Note
-
Kurzak, J. et al. 2011. Autotuning GEMMs for Fermi. Technical Report #245. LAPACK Working Note.
-
(2011)
Autotuning GEMMs for Fermi
-
-
Kurzak, J.
-
19
-
-
79959718248
-
High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing
-
2011
-
Merrill, D. and Grimshaw, A. 2011. High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing. Parallel Processing Letters. 21, 2 (2011), 245-272.
-
(2011)
Parallel Processing Letters
, vol.21
, Issue.2
, pp. 245-272
-
-
Merrill, D.
Grimshaw, A.
-
20
-
-
78149268496
-
-
Technical Report #CS2009-14. Department of Computer Science, University of Virginia
-
Merrill, D. and Grimshaw, A. 2009. Parallel Scan for Stream Architectures. Technical Report #CS2009-14. Department of Computer Science, University of Virginia.
-
(2009)
Parallel Scan for Stream Architectures
-
-
Merrill, D.
Grimshaw, A.
-
21
-
-
67650661447
-
-
Accessed: 2009-12-12
-
Optimizing parallel reduction in CUDA: 2007. http://developer.download.nvidia.com/compute/cuda/1-1/Website/projects/reduction/doc/reduction.pdf. Accessed: 2009-12-12.
-
(2007)
Optimizing Parallel Reduction in CUDA
-
-
-
22
-
-
49049088756
-
GPU Computing
-
May 2008
-
Owens, J.D. et al. 2008. GPU Computing. Proceedings of the IEEE. 96, 5 (May 2008), 879-899.
-
(2008)
Proceedings of the IEEE
, vol.96
, Issue.5
, pp. 879-899
-
-
Owens, J.D.
-
23
-
-
19344368072
-
SPIRAL: Code Generation for DSP Transforms
-
Feb. 2005
-
Püschel, M. et al. 2005. SPIRAL: Code Generation for DSP Transforms. Proceedings of the IEEE. 93, 2 (Feb. 2005), 232-275.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.
-
24
-
-
33749908081
-
Classes of Recursively Enumerable Sets and Their Decision Problems
-
1953
-
Rice, H.G. 1953. Classes of Recursively Enumerable Sets and Their Decision Problems. Transactions of the American Mathematical Society. 74, 2 (1953), 358-366.
-
(1953)
Transactions of the American Mathematical Society
, vol.74
, Issue.2
, pp. 358-366
-
-
Rice, H.G.
-
27
-
-
84870714547
-
-
Google Project Hosting: Accessed: 2011-08-25
-
Thrust - Code at the speed of light - Google Project Hosting: http://code.google.com/p/thrust/. Accessed: 2011-08-25.
-
Thrust - Code at the Speed of Light
-
-
-
28
-
-
70449844310
-
A scalable auto-tuning framework for compiler optimization
-
Tiwari, A. et al. 2009. A scalable auto-tuning framework for compiler optimization. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009), 1-12.
-
(2009)
Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009)
, pp. 1-12
-
-
Tiwari, A.
-
29
-
-
84934313374
-
Task management for irregular-parallel workloads on the GPU
-
Tzeng, S. et al. 2010. Task management for irregular-parallel workloads on the GPU. Proceedings of the Conference on High Performance Graphics (Aire-la-Ville, Switzerland, 2010), 29-37.
-
(2010)
Proceedings of the Conference on High Performance Graphics (Aire-la-Ville, Switzerland, 2010)
, pp. 29-37
-
-
Tzeng, S.
-
30
-
-
0025467711
-
A bridging model for parallel computation
-
Aug. 1990
-
Valiant, L.G. 1990. A bridging model for parallel computation. Commun. ACM. 33, 8 (Aug. 1990), 103-111.
-
(1990)
Commun. ACM
, vol.33
, Issue.8
, pp. 103-111
-
-
Valiant, L.G.
-
31
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
Volkov, V. and Demmel, J.W. 2008. Benchmarking GPUs to tune dense linear algebra. Proceedings of the 2008 ACM/IEEE conference on Supercomputing (Piscataway, NJ, USA, 2008), 31:1-31:11.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (Piscataway, NJ, USA, 2008)
-
-
Volkov, V.
Demmel, J.W.
-
32
-
-
24344485098
-
OSKI: A library of automatically tuned sparse matrix kernels
-
Jan. 2005
-
Vuduc, R. et al. 2005. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series. 16, (Jan. 2005), 521-530.
-
(2005)
Journal of Physics: Conference Series
, vol.16
, pp. 521-530
-
-
Vuduc, R.