-
1
-
-
77957660323
-
-
NVIDIA CUDA (Compute Unified Device Architecture): Programming Guide, Version 2.1
-
NVIDIA CUDA (Compute Unified Device Architecture): Programming Guide, Version 2.1, December 2008.
-
(2008)
-
-
-
3
-
-
77956260008
-
Efficient sparse matrix-vector multiplication on CUDA
-
Portland, OR, USA, November, (to appear)
-
Nathan Bell and Michael Garland. Efficient sparse matrix-vector multiplication on CUDA. In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009. (to appear).
-
(2009)
Proc. ACM/IEEE Conf. Supercomputing (SC)
-
-
Bell, N.1
Garland, M.2
-
4
-
-
77953998137
-
Sparse matrix solvers on the GPU: Conjugate gradients and multigrid
-
San Diego, CA, USA, July
-
Jeff Bolz, Ian Farmer, Eitan Grinspun, and Peter Schröder. Sparse matrix solvers on the GPU: Conjugate gradients and multigrid. In Proc. Special Interest Group on Graphics Conf. (SIGGRAPH), San Diego, CA, USA, July 2003. doi: http://dx.doi.org/10.1145/882262.882364.
-
(2003)
Proc. Special Interest Group on Graphics Conf. (SIGGRAPH)
-
-
Bolz, J.1
Farmer, I.2
Grinspun, E.3
Schröder, P.4
-
6
-
-
25144499116
-
Vectorized sparse matrix multiply for compressed row storage
-
2005 of LNCS, Springer Berlin / Heidelberg
-
Eduardo F. D'Azevedo, Mark R. Fahey, and Richard T. Mills. Vectorized sparse matrix multiply for compressed row storage. In Proc. Int'l. Conf. Computational Science (ICCS), volume 3514 of LNCS, pages 99-106. Springer Berlin / Heidelberg, 2005. doi: http://dx.doi.org/10.1007/11428831_13.
-
(2005)
Proc. Int'l. Conf. Computational Science (ICCS)
, vol.3514
, pp. 99-106
-
-
D'Azevedo, E.F.1
Fahey, M.R.2
Mills, R.T.3
-
7
-
-
20744452904
-
Self-adapting linear algebra algorithms and software
-
February
-
James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick. Self-adapting linear algebra algorithms and software. Proc. IEEE, 93(2):293-312, February 2005. doi: http://dx.doi.org/10.1109/JPROC.2004.840848.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 293-312
-
-
Demmel, J.1
Dongarra, J.2
Eijkhout, V.3
Fuentes, E.4
Petitet, A.5
Vuduc, R.6
Whaley, R.C.7
Yelick, K.8
-
8
-
-
51549093017
-
Sparse matrix computations on manycore GPUs
-
Anaheim, CA, USA
-
Michael Garland. Sparse matrix computations on manycore GPUs. In Proc. ACM/IEEE Design Automation Conf. (DAC), pages 2-6, Anaheim, CA, USA, 2008. doi: http://dx.doi.org/10.1145/1391469.1391473.
-
(2008)
Proc. ACM/IEEE Design Automation Conf. (DAC)
, pp. 2-6
-
-
Garland, M.1
-
9
-
-
0035370546
-
Towards a fast sparse symmetric matrix-vector multiplication
-
June
-
Roman Geus and Stefan Röllin. Towards a fast sparse symmetric matrix-vector multiplication. Parallel Computing, 27(7):883-896, June 2001. doi: http://dx.doi.org/10.1016/S0167-8191(01)00073-4.
-
(2001)
Parallel Computing
, vol.27
, Issue.7
, pp. 883-896
-
-
Geus, R.1
Röllin, S.2
-
10
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
Austin, TX, USA, June
-
Sunpyo Hong and Hyesoon Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proc. ACM Int'l. Symp. Comp. Arch. (ISCA), pages 152-163, Austin, TX, USA, June 2009. doi: http://dx.doi.org/10.1145/1555815.1555775.
-
(2009)
Proc. ACM Int'l. Symp. Comp. Arch. (ISCA)
, pp. 152-163
-
-
Hong, S.1
Kim, H.2
-
11
-
-
1542501019
-
SPARSITY: Optimization framework for sparse matrix kernels
-
February
-
Eun-Jin Im, Katherine Yelick, and Richard Vuduc. SPARSITY: Optimization framework for sparse matrix kernels. Int'l J. of High Performance Computing Applications (IJHPCA), 18(1):135-158, February 2004. doi: http://dx.doi.org/10.1177/1094342004041296.
-
(2004)
Int'l J. of High Performance Computing Applications (IJHPCA)
, vol.18
, Issue.1
, pp. 135-158
-
-
Im, E.-J.1
Yelick, K.2
Vuduc, R.3
-
12
-
-
35248834555
-
Parallel finite element analysis platform for the Earth Simulator: GeoFEM
-
of LNCS, Springer
-
Hiroshi Okuda, Kengo Nakajima, Mikio Iizuka, Li Chen, and Hisashi Nakamura. Parallel finite element analysis platform for the Earth Simulator: GeoFEM. In Proc. Int'l. Conf. Computational Science (ICCS), volume 2659 of LNCS, pages 773-780. Springer, 2003. doi: http://dx.doi.org/10.1007/3-540-44863-2_75.
-
(2003)
Proc. Int'l. Conf. Computational Science (ICCS)
, vol.2659
, pp. 773-780
-
-
Okuda, H.1
Nakajima, K.2
Iizuka, M.3
Chen, L.4
Nakamura, H.5
-
13
-
-
85031264203
-
Improving performance of sparse matrix-vector multiplication
-
Portland, OR, USA
-
Ali Pinar and Michael T. Heath. Improving performance of sparse matrix-vector multiplication. In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, 1999. doi: http://dx.doi.org/10.1145/331532.331562.
-
(1999)
Proc. ACM/IEEE Conf. Supercomputing (SC)
-
-
Pinar, A.1
Heath, M.T.2
-
16
-
-
78651284120
-
Scan primitives for GPU computing
-
San Diego, CA, USA
-
Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens. Scan primitives for GPU computing. In Proc. ACM SIGGRAPH/EUROGRAPHICS Symp. Graphics Hardware, San Diego, CA, USA, 2007.
-
(2007)
Proc. ACM SIGGRAPH/EUROGRAPHICS Symp. Graphics Hardware
-
-
Sengupta, S.1
Harris, M.2
Zhang, Y.3
Owens, J.D.4
-
18
-
-
24344485098
-
OSKI: A library of automatically tuned sparse matrix kernels
-
Richard Vuduc, James W. Demmel, and Katherine A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. SciDAC, J. Phys.: Conf. Series, volume 16, pages 521-530, 2005. doi: http://dx.doi.org/10.1088/1742-6596/16/1/071.
-
(2005)
Proc. SciDAC, J. Phys.: Conf. Series
, vol.16
, pp. 521-530
-
-
Vuduc, R.1
Demmel, J.W.2
Yelick, K.A.3
-
19
-
-
10044233808
-
-
PhD thesis, University of California, Berkeley, CA, USA, January
-
Richard W. Vuduc. Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, CA, USA, January 2004.
-
(2004)
Automatic Performance Tuning of Sparse Matrix Kernels
-
-
Vuduc, R.W.1
-
20
-
-
33646389518
-
Fast sparse matrix-vector multiplication by exploiting variable block structure
-
LNCS, Sorrento, Italy, September, Springer
-
Richard W. Vuduc and Hyun-Jin Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proc. High-Performance Computing and Communications Conf., volume 3726 of LNCS, pages 807-816, Sorrento, Italy, September 2005. Springer. doi: http://dx.doi.org/10.1007/11557654_91.
-
(2005)
Proc. High-Performance Computing and Communications Conf.
, vol.3726
, pp. 807-816
-
-
Vuduc, R.W.1
Moon, H.-J.2
-
21
-
-
60949098907
-
Optimizing sparse matrix-vector multiply on emerging multicore platforms
-
March
-
Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel. Optimizing sparse matrix-vector multiply on emerging multicore platforms. Journal of Parallel Computing, 35(3):178-194, March 2009. doi: http://dx.doi.org/10.1016/j.parco.2008.12.006.
-
(2009)
Journal of Parallel Computing
, vol.35
, Issue.3
, pp. 178-194
-
-
Williams, S.1
Vuduc, R.2
Oliker, L.3
Shalf, J.4
Yelick, K.5
Demmel, J.6
-
22
-
-
20744459570
-
Is search really necessary to generate high-performance BLAS?
-
February
-
Kamen Yotov, Xiaoming Li, Gang Ren, María Jesús Garzarán, David Padua, Keshav Pingali, and Paul Stodghill. Is search really necessary to generate high-performance BLAS? Proc. IEEE, 93(2):358-386, February 2005.
-
(2005)
Proc. IEEE
, vol.93
, Issue.2
, pp. 358-386
-
-
Yotov, K.1
Li, X.2
Ren, G.3
Garzarán, M.J.4
Padua, D.5
Pingali, K.6
Stodghill, P.7