SCOPUS Information Search Platform

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Volume , Issue , 2014, Pages 207-218

In-place transposition of rectangular matrices on accelerators

(5) Sung, I Jui a Gómez Luna, Juan b González Linares, José María c Guil, Nicolás c Hwu, Wen Mei W d

a MulticoreWare Inc (Spain)

b UNIVERSITY OF CÓRDOBA (Spain)

c UNIVERSITY OF MÁLAGA (Spain)

d UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

Author keywords

GPU; In place; Transposition

Indexed keywords

ASYNCHRONOUS EXECUTIONS; GPU; IN-PLACE; MATRIX TRANSPOSITION; MULTI-THREADED IMPLEMENTATION; PARALLELISM AND LOCALITIES; RECTANGULAR MATRIX; TRANSPOSITION;

BUILDING MATERIALS; COMPUTER PROGRAMMING LANGUAGES; DATA TRANSFER; MATRIX ALGEBRA; PARALLEL PROGRAMMING; PROGRAM PROCESSORS;

ALGORITHMS;

EID: 84896819561 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2555243.2555266 Document Type: Conference Paper

Times cited : (12)

References (27)

1
- 20744449792
- The design and implementation of fftw3
- Frigo, M., Johnson, S.: The design and implementation of fftw3. Proceedings of the IEEE 93(2) (2005) 216-231
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.²

2
- 84880048054
- K-means for parallel architectures using all-prefix-sum sorting and updating steps
- Kohlhoff, K., Pande, V., Altman, R.: K-means for parallel architectures using all-prefix-sum sorting and updating steps. IEEE Transactions on Parallel and Distributed Systems 24(8) (2013) 1602-1612
- (2013) IEEE Transactions on Parallel and Distributed Systems , vol.24 , Issue.8 , pp. 1602-1612
- Kohlhoff, K.¹ Pande, V.² Altman, R.³

3
- 44249094647
- Anatomy of high-performance matrix multiplication
- May
- Goto, K., Geijn, R.A.v.d.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3) (May 2008) 12:1-12:25
- (2008) ACM Trans. Math. Softw. , vol.34 , Issue.3 , pp. 12:1-12:25
- Goto, K.¹ Geijn, R.A.V.D.²

4
- 84896900508
- Intel MKL: Intel Math Kernel Library (January 2013)

5
- 77952265152
- January
- Ruetsch, G., Micikevicius, P.: Optimizing matrix transpose in CUDA. (January 2009)
- (2009) Optimizing Matrix Transpose in CUDA
- Ruetsch, G.¹ Micikevicius, P.²

6
- 38049073987
- Transposing matrices in a digital computer
- Windley, P.F.: Transposing matrices in a digital computer. The Computer Journal 2(1) (1959) 47-48
- (1959) The Computer Journal , vol.2 , Issue.1 , pp. 47-48
- Windley, P.F.¹

7
- 34247127338
- A method for transposing a matrix
- October
- Berman, M.F.: A method for transposing a matrix. J. ACM 5(4) (October 1958) 383-384
- (1958) J ACM , vol.5 , Issue.4 , pp. 383-384
- Berman, M.F.¹

8
- 84896855734
- Abstract algebra: An introduction
- Hungerford, T.: Abstract algebra: an introduction. Saunders College Publishing (1997)
- (1997) Saunders College Publishing
- Hungerford, T.¹

9
- 84896808494
- PhD thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering (May)
- Sung, I.J.: Data layout transformation through in-place transposition. PhD thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering (May 2013) http://hdl.handle.net/2142/44300.
- (2013) Data Layout Transformation Through In-place Transposition
- Sung, I.J.¹

10
- 84857492607
- Technical report
- Karlsson, L.: Blocked in-place transposition with application to storage format conversion. Technical report (2009)
- (2009) Blocked In-place Transposition with Application to Storage Format Conversion
- Karlsson, L.¹

11
- 84862107202
- Parallel and cache-efficient in-place matrix storage format conversion
- April
- Gustavson, F., Karlsson, L., Kågström, B.: Parallel and cache-efficient in-place matrix storage format conversion. ACM Transactions on Mathematical Software 38(3) (April 2012) 17:1-17:32
- (2012) ACM Transactions on Mathematical Software , vol.38 , Issue.3 , pp. 17:1-17:32
- Gustavson, F.¹ Karlsson, L.² Kågström, B.³

12
- 84870691946
- DL: A data layout transformation system for heterogeneous computing
- May
- Sung, I.J., Liu, G., Hwu, W.M.: DL: A data layout transformation system for heterogeneous computing. In: Innovative Parallel Computing, InPar. (May 2012) 1-11
- (2012) Innovative Parallel Computing, InPar , pp. 1-11
- Sung, I.J.¹ Liu, G.² Hwu, W.M.³

13
- 0006359598
- Algorithm 513: Analysis of in-situ transposition [f1]
- March
- Cate, E.G., Twigg, D.W.: Algorithm 513: Analysis of in-situ transposition [f1]. ACM Trans. Math. Softw. 3(1) (March 1977) 104-110
- (1977) ACM Trans. Math. Softw. , vol.3 , Issue.1 , pp. 104-110
- Cate, E.G.¹ Twigg, D.W.²

14
- 0027719668
- Efficient transposition algorithms for large matrices
- November
- Kaushik, S.D., Huang, C.H., Johnson, J.R., Johnson, R.W., Sadayappan, P.: Efficient transposition algorithms for large matrices. In: Supercomputing. (November 1993)
- (1993) Supercomputing
- Kaushik, S.D.¹ Huang, C.H.² Johnson, J.R.³ Johnson, R.W.⁴ Sadayappan, P.⁵

15
- 79952782168
- Auto-tuning of fast fourier transform on graphics processors
- New York, NY, USA, ACM
- Dotsenko, Y., Baghsorkhi, S.S., Lloyd, B., Govindaraju, N.K.: Auto-tuning of fast fourier transform on graphics processors. In: Proceedings of the 16th ACM symposium on Principles and Practice of Parallel Programming. PPoPP '11, New York, NY, USA, ACM (2011) 257-266
- (2011) Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. PPoPP '11 , pp. 257-266
- Dotsenko, Y.¹ Baghsorkhi, S.S.² Lloyd, B.³ Govindaraju, N.K.⁴

16
- 84879555900
- An optimized approach to histogram computation on gpu
- Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: An optimized approach to histogram computation on gpu. Machine Vision and Applications 24(5) (2013) 899-908
- (2013) Machine Vision and Applications , vol.24 , Issue.5 , pp. 899-908
- Gómez-Luna, J.¹ González-Linares, J.M.² Benavides, J.I.³ Guil, N.⁴

17
- 84867653909
- GPU-vote: A framework for accelerating voting algorithms on GPU
- Kaklamanis, C., Papatheodorou, T., Spirakis, P., eds.
- Van den Braak, G.J., Nugteren, C., Mesman, B., Corporaal, H.: GPU-vote: A framework for accelerating voting algorithms on GPU. In Kaklamanis, C., Papatheodorou, T., Spirakis, P., eds.: Euro-Par Parallel Processing. Volume 7484 of Lecture Notes in Computer Science. (2012) 945-956
- (2012) Euro-Par Parallel Processing. Volume 7484 of Lecture Notes in Computer Science , pp. 945-956
- Van den Braak, G.J.¹ Nugteren, C.² Mesman, B.³ Corporaal, H.⁴

18
- 0003657590
- 2nd Edition Addison-Wesley
- Knuth, D.E.: The Art of Computer Programming, Volume II: Seminumerical Algorithms, 2nd Edition. Addison-Wesley (1981)
- (1981) The Art of Computer Programming, Volume II: Seminumerical Algorithms
- Knuth, D.E.¹

19
- 84885108092
- Performance modeling of atomic additions on GPU scratchpad memory
- Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: Performance modeling of atomic additions on GPU scratchpad memory. IEEE Transactions on Parallel and Distributed Systems 24(11) (2013) 2273-2282
- (2013) IEEE Transactions on Parallel and Distributed Systems , vol.24 , Issue.11 , pp. 2273-2282
- Gómez-Luna, J.¹ González-Linares, J.M.² Benavides, J.I.³ Guil, N.⁴

20
- 84896846609
- NVIDIA: CUDA C Programming Guide 5.0 (July 2012)

21
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Piscataway NJ USA, IEEE Press
- Volkov, V., Demmel, J.W.: Benchmarking GPUs to tune dense linear algebra. In: Supercomputing, Piscataway, NJ, USA, IEEE Press (2008) 31:1-31:11
- (2008) Supercomputing , pp. 31:1-31:11
- Volkov, V.¹ Demmel, J.W.²

22
- 84865705401
- Performance models for asynchronous data transfers on consumer graphics processing units
- Accelerators for High-Performance Computing
- Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: Performance models for asynchronous data transfers on consumer graphics processing units. Journal of Parallel and Distributed Computing 72(9) (2012) 1117-1126 Accelerators for High-Performance Computing.
- (2012) Journal of Parallel and Distributed Computing , vol.72 , Issue.9 , pp. 1117-1126
- Gómez-Luna, J.¹ González-Linares, J.M.² Benavides, J.I.³ Guil, N.⁴

23
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- New York, NY, USA, ACM
- Che, S., Sheaffer, J.W., Skadron, K.: Dymaxion: optimizing memory access patterns for heterogeneous systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. SC '11, New York, NY, USA, ACM (2011) 13:1-13:11
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. SC '11 , pp. 13:1-13:11
- Che, S.¹ Sheaffer, J.W.² Skadron, K.³

24
- 84896890428
- AMD: ATI Stream SDK OpenCL Programming Guide (2010)

25
- 84950113779
- A decomposition for in-place matrix transposition
- February
- Catanzaro, B., Keller, A., Garland, M.: A decomposition for in-place matrix transposition. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP '14 (February 2014)
- (2014) Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP '14
- Catanzaro, B.¹ Keller, A.² Garland, M.³

26
- 84896838047
- Improving GPU performance prediction with data transfer modeling
- Boyer, M., Meng, J., Kumaran, K.: Improving GPU performance prediction with data transfer modeling. In: 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW). (2013) 1097-1106
- (2013) 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW) , pp. 1097-1106
- Boyer, M.¹ Meng, J.² Kumaran, K.³

27
- 84896852495
- Intel: OpenCL design and programming guide for the Intel Xeon Phi coprocessor. (2013)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.