[1] D. P. Scarpazza, O. Villa, and F. Petrini, "Efficient breadth-first search on the Cell/BE processor," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 10, pp. 1381-1395, Oct. 2008.
[2] Y. Shan, W. Tianji, Y. Wang, B. Wang, Z. Wang, N. Xu, and H. Yang, "FPGA and GPU implementation of large scale SpMV," in Proc. 8th Symp. Appl. Sp., 2010, pp. 64-70.
[3] V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, "Debunking the 100X GPU versus CPU myth: An evaluation of throughput computing on CPU and GPU," in Proc. Int. Symp. Comput. Arch., vol. 38, no. 3, 2010, pp. 451-460.
[4] A. Pedram, R. A. van de Geijn, and A. Gerstlauer, "Codesign tradeoffs for high-performance, low-power linear algebra architectures," IEEE Trans. Comput., vol. 61, no. 12, pp. 1724-1736, Dec. 2012.
[7] W. Yang, K. Li, Y. Liu, L. Shi, and L. Wan, "Optimization of quasi-diagonal matrix vector multiplication on GPU," Int. J. High Performance Comput. Appl., vol. 28, no. 2, pp. 183-195, 2014.
[8] J. Bolz, I. Farmer, E. Grinspun, and P. Schroder, "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid," ACM Trans. Graph., vol. 22, no. 3, pp. 917-924, 2003.
[10] Z. Fu, T. J. Lewis, R. M. Kirby, and R. T. Whitaker, "Architecting the finite element method pipeline for the GPU," J. Comput. Appl. Math., vol. 257, pp. 195-211, Feb. 2014.
[11] N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in Proc. Conf. High Performance Comput., Netw., Storage Anal., 2009, p. 18.
[12] W. T. Tang, W. J. Tan, R. Ray, Y. W. Wong, W. Chan, S. H. Kuo, R. S. M. Goh, S. J. Turner, and W. F. Wong, "Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes," in Proc. Int. Conf. High Performance Comput., Netw., Storage Anal., 2013.
[13] F. Vazquez, G. Ortega, J. J. Fernandez, and E. M. Garzon, "Improving the performance of the sparse matrix vector product with GPUs," in Proc. 10th IEEE Int. Conf. Comput. Inform. Technol. (CIT), 2010, pp. 1146-1151.
[14] S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms," J. Parallel Distrib. Comput., vol. 69, no. 9, pp. 762-777, 2009.
[15] B. Boyer, J. G. Dumas, and P. Giorgi, "Exact sparse matrix vector multiplication on GPU's and multicore architectures," in Proc. 4th Int. Workshop Parallel Symbolic Comput., Jul. 2010, pp. 80-88.
[16] A. Buluc, S. Williams, L. Oliker, and J. Demmel, "Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication," in Proc. IEEE Int. Parallel Distrib. Process. Symp., May 2009, pp. 721-733.
[17] S. Sun, M. Monga, P. H. Jones, and J. Zambreno, "An I/O bandwidth-sensitive sparse matrix vector multiplication engine on FPGAs," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 59, no. 1, pp. 113-123, Jan. 2012.
[18] J. C. Pichel and F. F. Rivera, "Sparse matrix-vector multiplication on the single-chip cloud computer many-core processor," J. Parallel Distrib. Comput., vol. 73, no. 12, pp. 1539-1550, 2013.
[19] L. Buatois, G. Caumon, and B. Levy, "Concurrent number cruncher: A GPU implementation of a general sparse linear solver," Int. J. Parallel Emerg. Distrib. Syst., vol. 24, no. 3, pp. 205-223, 2009.
[21] M. Belgin, G. Back, and C. J. Ribbens, "Pattern-based sparse matrix representation for memory-efficient SMVM kernels," in Proc. 23rd Int. Conf. Supercomput., Jun. 2009, pp. 100-109.
[22] S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of sparse matrix vector multiplication on emerging multicore platforms," Parallel Comput., vol. 35, no. 3, pp. 178-194, 2009.
[23] A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix vector multiplication for GPU architectures," in High Performance Embedded Architectures and Compilers. Berlin, Germany: Springer, 2010, pp. 111-125.
[24] J. W. Choi, A. Singh, and R. W. Vuduc, "Model-driven autotuning of sparse matrix vector multiply on GPUs," in Proc. 15th ACM SIGPLAN Symp. Principles Practice Parallel Programming, 2010, pp. 115-126.
[25] X. Feng, H. Jin, R. Zheng, K. Hu, J. Zeng, and Z. Shao, "Optimization of sparse matrix vector multiplication with variant CSR on GPUs," in Proc. IEEE 17th Int. Conf. Parallel Distrib. Syst., 2011, pp. 165-172.
[26] M. Kreutzer, G. Hager, G. Wellein, H. Fehske, A. Basermann, and A. R. Bishop, "Sparse matrix vector multiplication on GPGPU clusters: A new storage format and a scalable implementation," in Proc. IEEE 26th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, May 2012, pp. 1696-1702.
[27] A. N. Yzelman and R. H. Bisseling, "Two-dimensional cache-oblivious sparse matrix vector multiplication," Parallel Comput., vol. 37, no. 12, pp. 806-819, 2011.
[28] V. Karakasis, T. Gkountouvas, K. Kourtis, G. Goumas, and N. Koziris, "An extended compression format for the optimization of sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 10, pp. 1930-1940, Oct. 2013.
[29] P. Chen, "A performance modeling and optimization analysis tool for sparse matrix vector multiplication on GPUs," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 5, pp. 1112-1123, May 2014.
[30] B. Schmidt, H. Aribowo, and H.-V. Dang, "Iterative sparse matrix vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems," Concurrency Comput.: Practice Experience, vol. 25, no. 4, pp. 586-603, 2013.
[31] M. Cenk, C. Negre, and M. A. Hasan, "Improved three-way split formulas for binary polynomial and Toeplitz matrix vector products," IEEE Trans. Comput., vol. 62, no. 7, pp. 1345-1361, Jul. 2013.
[32] M. A. Hasan and C. Negre, "Multiway splitting method for Toeplitz matrix vector product," IEEE Trans. Comput., vol. 62, no. 7, pp. 1467-1471, Jul. 2013.
[33] A.-J. N. Yzelman and D. Roose, "High-level strategies for parallel shared-memory sparse matrix vector multiplication," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 116-125, Jan. 2014.
[34] K. Li, W. Yang, and K. Li, "Performance analysis and optimization for SpMV on GPU using probabilistic modeling," IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 196-205, Jan. 2015.
[35] N. L. Johnson, S. Kotz, and A. Kemp, Univariate Discrete Distributions, 2nd ed. New York, NY, USA: Wiley, 1993, p. 36, ISBN 0-471-54897-9.