[1] Frigo, M., Johnson, S.: The design and implementation of FFTW3. Proceedings of the IEEE 93(2) (2005) 216-231
[2] Kohlhoff, K., Pande, V., Altman, R.: K-means for parallel architectures using all-prefix-sum sorting and updating steps. IEEE Transactions on Parallel and Distributed Systems 24(8) (2013) 1602-1612
[3] Goto, K., Geijn, R.A.v.d.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3) (May 2008) 12:1-12:25
[4] Intel MKL: Intel Math Kernel Library (January 2013)
[6] Windley, P.F.: Transposing matrices in a digital computer. The Computer Journal 2(1) (1959) 47-48
[7] Berman, M.F.: A method for transposing a matrix. J. ACM 5(4) (October 1958) 383-384
[9] Sung, I.J.: Data layout transformation through in-place transposition. PhD thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering (May 2013) http://hdl.handle.net/2142/44300
[11] Gustavson, F., Karlsson, L., Kågström, B.: Parallel and cache-efficient in-place matrix storage format conversion. ACM Transactions on Mathematical Software 38(3) (April 2012) 17:1-17:32
[12] Sung, I.J., Liu, G., Hwu, W.M.: DL: A data layout transformation system for heterogeneous computing. In: Innovative Parallel Computing, InPar. (May 2012) 1-11
[13] Cate, E.G., Twigg, D.W.: Algorithm 513: Analysis of in-situ transposition [F1]. ACM Trans. Math. Softw. 3(1) (March 1977) 104-110
[14] Kaushik, S.D., Huang, C.H., Johnson, J.R., Johnson, R.W., Sadayappan, P.: Efficient transposition algorithms for large matrices. In: Supercomputing. (November 1993)
[15] Dotsenko, Y., Baghsorkhi, S.S., Lloyd, B., Govindaraju, N.K.: Auto-tuning of fast Fourier transform on graphics processors. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. PPoPP '11, New York, NY, USA, ACM (2011) 257-266
[16] Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: An optimized approach to histogram computation on GPU. Machine Vision and Applications 24(5) (2013) 899-908
[17] Van den Braak, G.J., Nugteren, C., Mesman, B., Corporaal, H.: GPU-vote: A framework for accelerating voting algorithms on GPU. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P., eds.: Euro-Par Parallel Processing. Volume 7484 of Lecture Notes in Computer Science. (2012) 945-956
[19] Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: Performance modeling of atomic additions on GPU scratchpad memory. IEEE Transactions on Parallel and Distributed Systems 24(11) (2013) 2273-2282
[20] NVIDIA: CUDA C Programming Guide 5.0 (July 2012)
[21] Volkov, V., Demmel, J.W.: Benchmarking GPUs to tune dense linear algebra. In: Supercomputing, Piscataway, NJ, USA, IEEE Press (2008) 31:1-31:11
[22] Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: Performance models for asynchronous data transfers on consumer graphics processing units. Journal of Parallel and Distributed Computing 72(9) (2012) 1117-1126. Accelerators for High-Performance Computing
[23] Che, S., Sheaffer, J.W., Skadron, K.: Dymaxion: Optimizing memory access patterns for heterogeneous systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. SC '11, New York, NY, USA, ACM (2011) 13:1-13:11
[24] AMD: ATI Stream SDK OpenCL Programming Guide (2010)
[27] Intel: OpenCL design and programming guide for the Intel Xeon Phi coprocessor (2013)