SCOPUS Information Search Platform - View Paper

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013

Volume , Issue , 2013, Pages

Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs

(2) Lai, Junjie (a); Seznec, Andre (a)

(a) INRIA (France)

Author keywords

CUDA; Fermi GPU; Kepler GPU; Performance Upper Bound Analysis; SGEMM

Indexed keywords

CUDA; FERMI GPU; KEPLER GPU; SGEMM; UPPER BOUND ANALYSIS;

NETWORK COMPONENTS; OPTIMIZATION; PROGRAM PROCESSORS;

BENCHMARKING;

EID: 84876904433
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/CGO.2013.6494986
Document Type: Conference Paper

Times cited: 81

References (20)

1
- 84876947481
- Asfermi. http://code.google.com/p/asfermi/.
- Asfermi

2
- 84876915752
- Netlib. http://www.netlib.org/blas/.

3
- 33645207150
- Nvidia. Visual Profiler, https://developer.nvidia.com/nvidia-visual-profiler.
- Visual Profiler

4
- 0025028257
- The tera computer system
- New York, NY, USA. ACM
- R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and B. Smith. The tera computer system. In Proceedings of the 4th international conference on Supercomputing, ICS '90, New York, NY, USA, 1990. ACM.
- (1990) Proceedings of the 4th International Conference on Supercomputing, ICS '90
- Alverson, R.¹ Callahan, D.² Cummings, D.³ Koblenz, B.⁴ Porterfield, A.⁵ Smith, B.⁶

5
- 70349169075
- Analyzing cuda workloads using a detailed gpu simulator
- april
- A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt. Analyzing cuda workloads using a detailed gpu simulator. In Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on, april 2009.
- (2009) Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on
- Bakhoda, A.¹ Yuan, G.² Fung, W.³ Wong, H.⁴ Aamodt, T.⁵

6
- 70450231944
- An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness
- New York, NY, USA. ACM
- S. Hong and H. Kim. An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness. In Proceedings of the 36th annual international symposium on Computer architecture, ISCA '09, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09
- Hong, S.¹ Kim, H.²

7
- 84867310932
- Autotuning gemm kernels for the fermi gpu
- PP
- J. Kurzak, S. Tomov, and J. Dongarra. Autotuning gemm kernels for the fermi gpu. Parallel and Distributed Systems, IEEE Transactions on, PP(99):1, 2012.
- (2012) Parallel and Distributed Systems, IEEE Transactions on , Issue.99 , pp. 1
- Kurzak, J.¹ Tomov, S.² Dongarra, J.³

8
- 0026137116
- The cache performance and optimizations of blocked algorithms
- Apr
- M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. SIGPLAN Not. , 26(4):63-74, Apr. 1991.
- (1991) SIGPLAN Not. , vol.26 , Issue.4 , pp. 63-74
- Lam, M.D.¹ Rothberg, E.E.² Wolf, M.E.³

9
- 84945709131
- Organizing matrices and matrix operations for paged memory systems
- Mar
- A. C. McKellar and E. G. Coffman, Jr. Organizing matrices and matrix operations for paged memory systems. Commun. ACM, 12(3):153-165, Mar. 1969.
- (1969) Commun. ACM , vol.12 , Issue.3 , pp. 153-165
- McKellar, A.C.¹ Coffman Jr., E.G.²

10
- 83155184571
- Grophecy: Gpu performance projection from cpu code skeletons
- New York, NY, USA. ACM
- J. Meng, V. A. Morozov, K. Kumaran, V. Vishwanath, and T. D. Uram. Grophecy: Gpu performance projection from cpu code skeletons. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11
- Meng, J.¹ Morozov, V.A.² Kumaran, K.³ Vishwanath, V.⁴ Uram, T.D.⁵

11
- 79958284905
- R. Nath, S. Tomov, and J. Dongarra. An improved magma gemm for fermi gpus, 2010.
- (2010) An Improved Magma Gemm for Fermi Gpus
- Nath, R.¹ Tomov, S.² Dongarra, J.³

12
- 84874434384
- NVIDIA
- NVIDIA. Nvidia cuda c programming guide 4.2.
- Nvidia Cuda C Programming Guide 4.2

13
- 84876900901
- NVIDIA. Fermi Whitepaper. http://www.nvidia.com/content/PDF/Fermi-white-papers/NVIDIA-Fermi-Compute-Architecture-Whitepaper.pdf, 2009.
- (2009) Fermi Whitepaper

14
- 84876891538
- NVIDIA
- NVIDIA. GTX680 Whitepaper. http://www.geforce.com/Active/en-US/en-US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf, 2012.
- (2012) GTX680 Whitepaper

15
- 84876911285
- NVIDIA, Nov
- NVIDIA. NVIDIA Tesla K20/K20X GPU Accelerators Application Performance Technical Brief. http://www.nvidia.com/docs/IO/122874/K20-and-K20X-application-performance-technical-brief.pdf, Nov. 2012.
- (2012) NVIDIA Tesla K20/K20X GPU Accelerators Application Performance Technical Brief

16
- 43449094719
- Program optimization space pruning for a multithreaded gpu
- New York, NY, USA. ACM
- S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu. Program optimization space pruning for a multithreaded gpu. In CGO '08: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, New York, NY, USA, 2008. ACM.
- (2008) CGO '08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

17
- 84863347222
- A performance analysis framework for identifying potential benefits in gpgpu applications
- New York, NY, USA. ACM
- J. Sim, A. Dasgupta, H. Kim, and R. Vuduc. A performance analysis framework for identifying potential benefits in gpgpu applications. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, New York, NY, USA, 2012. ACM.
- (2012) Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12
- Sim, J.¹ Dasgupta, A.² Kim, H.³ Vuduc, R.⁴

18
- 83155160943
- Fast implementation of dgemm on fermi gpu
- New York, NY, USA. ACM
- G. Tan, L. Li, S. Triechle, E. Phillips, Y. Bao, and N. Sun. Fast implementation of dgemm on fermi gpu. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 35:1-35:11, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11 , pp. 35:1-35:11
- Tan, G.¹ Li, L.² Triechle, S.³ Phillips, E.⁴ Bao, Y.⁵ Sun, N.⁶

19
- 65949107549
- Roofline: An insightful visual performance model for multicore architectures
- Apr
- S. Williams, A. Waterman, and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM, 52(4), Apr. 2009.
- (2009) Commun. ACM , vol.52 , Issue.4
- Williams, S.¹ Waterman, A.² Patterson, D.³

20
- 79955921273
- A quantitative performance analysis model for gpu architectures
- Feb
- Y. Zhang and J. D. Owens. A quantitative performance analysis model for gpu architectures. In Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA 17), Feb. 2011.
- (2011) Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA 17)
- Zhang, Y.¹ Owens, J.D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.