2014, Pages 207-218

In-place transposition of rectangular matrices on accelerators

Author keywords

GPU; In place; Transposition

Indexed keywords

ASYNCHRONOUS EXECUTIONS; GPU; IN-PLACE; MATRIX TRANSPOSITION; MULTI-THREADED IMPLEMENTATION; PARALLELISM AND LOCALITIES; RECTANGULAR MATRIX; TRANSPOSITION

EID: 84896819561     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2555243.2555266     Document Type: Conference Paper
Times cited: 12

References (27)
  • 1
    • 20744449792
    • The design and implementation of fftw3
    • Frigo, M., Johnson, S.: The design and implementation of fftw3. Proceedings of the IEEE 93(2) (2005) 216-231
    • (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
    • Frigo, M.1    Johnson, S.2
  • 2
  • 3
    • 44249094647
    • Anatomy of high-performance matrix multiplication
    • May
    • Goto, K., Geijn, R.A.v.d.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3) (May 2008) 12:1-12:25
    • (2008) ACM Trans. Math. Softw. , vol.34 , Issue.3 , pp. 12:1-12:25
    • Goto, K.1    Geijn, R.A.V.D.2
  • 4
    • 84896900508
    • Intel MKL: Intel Math Kernel Library (January 2013)
  • 6
    • 38049073987
    • Transposing matrices in a digital computer
    • Windley, P.F.: Transposing matrices in a digital computer. The Computer Journal 2(1) (1959) 47-48
    • (1959) The Computer Journal , vol.2 , Issue.1 , pp. 47-48
    • Windley, P.F.1
  • 7
    • 34247127338
    • A method for transposing a matrix
    • October
    • Berman, M.F.: A method for transposing a matrix. J. ACM 5(4) (October 1958) 383-384
    • (1958) J ACM , vol.5 , Issue.4 , pp. 383-384
    • Berman, M.F.1
  • 9
    • 84896808494
    • PhD thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering (May)
    • Sung, I.J.: Data layout transformation through in-place transposition. PhD thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering (May 2013) http://hdl.handle.net/2142/44300.
    • (2013) Data Layout Transformation Through In-place Transposition
    • Sung, I.J.1
  • 12
    • 84870691946
    • DL: A data layout transformation system for heterogeneous computing
    • May
    • Sung, I.J., Liu, G., Hwu, W.M.: DL: A data layout transformation system for heterogeneous computing. In: Innovative Parallel Computing, InPar. (May 2012) 1-11
    • (2012) Innovative Parallel Computing, InPar , pp. 1-11
    • Sung, I.J.1    Liu, G.2    Hwu, W.M.3
  • 13
    • 0006359598
    • Algorithm 513: Analysis of in-situ transposition [f1]
    • March
    • Cate, E.G., Twigg, D.W.: Algorithm 513: Analysis of in-situ transposition [f1]. ACM Trans. Math. Softw. 3(1) (March 1977) 104-110
    • (1977) ACM Trans. Math. Softw. , vol.3 , Issue.1 , pp. 104-110
    • Cate, E.G.1    Twigg, D.W.2
  • 20
    • 84896846609
    • NVIDIA: CUDA C Programming Guide 5.0 (July 2012)
  • 21
    • 70350771131
    • Benchmarking GPUs to tune dense linear algebra
    • Piscataway NJ USA, IEEE Press
    • Volkov, V., Demmel, J.W.: Benchmarking GPUs to tune dense linear algebra. In: Supercomputing, Piscataway, NJ, USA, IEEE Press (2008) 31:1-31:11
    • (2008) Supercomputing , pp. 31:1-31:11
    • Volkov, V.1    Demmel, J.W.2
  • 22
    • 84865705401
    • Performance models for asynchronous data transfers on consumer graphics processing units
    • Accelerators for High-Performance Computing
    • Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: Performance models for asynchronous data transfers on consumer graphics processing units. Journal of Parallel and Distributed Computing 72(9) (2012) 1117-1126. Accelerators for High-Performance Computing.
    • (2012) Journal of Parallel and Distributed Computing , vol.72 , Issue.9 , pp. 1117-1126
    • Gómez-Luna, J.1    González-Linares, J.M.2    Benavides, J.I.3    Guil, N.4
  • 24
    • 84896890428
    • AMD: ATI Stream SDK OpenCL Programming Guide (2010)
  • 27
    • 84896852495
    • Intel: OpenCL design and programming guide for the Intel Xeon Phi coprocessor. (2013)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.