SCOPUS Information Search Platform

ISPASS 2009 - International Symposium on Performance Analysis of Systems and Software

Volume , Issue , 2009, Pages 163-174

Analyzing CUDA workloads using a detailed GPU simulator

(5) Bakhoda, Ali ᵃ; Yuan, George L ᵃ; Fung, Wilson W L ᵃ; Wong, Henry ᵃ; Aamodt, T M ᵃ

ᵃ UNIVERSITY OF BRITISH COLUMBIA (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

BISECTION BANDWIDTH; COMPUTING POWER; DATA-LEVEL PARALLELISM; FLEXIBLE PROGRAMMING MODEL; GRAPHIC PROCESSING UNITS; HIGH-END GRAPHICS; INSTRUCTION SET; INTERCONNECT TOPOLOGY; MANY-CORE; MEMORY CONTROLLER; MEMORY SYSTEMS; MICRO ARCHITECTURES; MICRO-ARCHITECTURE DESIGN; MULTITHREADED; NON-TRIVIAL; ON CHIPS; ORDERS OF MAGNITUDE; PEAK PERFORMANCE; PERFORMANCE IMPACT; PERFORMANCE IMPROVEMENTS; PERFORMANCE SIMULATOR; PROGRAMMING MODELS; THREAD LEVEL PARALLELISM; WORK-LOAD DISTRIBUTION;

COMPUTER GRAPHICS EQUIPMENT; FLOCCULATION; MACHINE DESIGN; PROGRAM PROCESSORS; SIMULATORS;

CACHE MEMORY;

EID: 70349169075 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ISPASS.2009.4919648 Document Type: Conference Paper

Times cited : (1368)

References (46)

1
- 43649092214
- Advanced Micro Devices, Inc, 1.01 edition
- Advanced Micro Devices, Inc. ATI CTM Guide, 1.01 edition, 2006.
- (2006) ATI CTM Guide

2
- 84964634356
- Advanced Micro Devices, Inc, 28 January
- Advanced Micro Devices, Inc. Press Release: AMD Delivers Enthusiast Performance Leadership with the Introduction of the ATI Radeon HD 3870 X2, 28 January 2008.
- (2008) Press Release: AMD Delivers Enthusiast Performance Leadership with the Introduction of the ATI Radeon HD 3870 X2

3
- 4644295630
- Evaluating the Imagine stream architecture
- J. H. Ahn, W. J. Dally, B. Khailany, U. J. Kapasi, and A. Das. Evaluating the Imagine stream architecture. In Proc. 31st Int'l Symp. on Computer Architecture, page 14, 2004.
- (2004) Proc. 31st Int'l Symp. on Computer Architecture , pp. 14
- Ahn, J.H.¹ Dally, W.J.² Khailany, B.³ Kapasi, U.J.⁴ Das, A.⁵

4
- 57349130987
- StoreGPU: Exploiting graphics processing units to accelerate distributed storage systems
- S. Al-Kiswany, A. Gharaibeh, E. Santos-Neto, G. Yuan, and M. Ripeanu. StoreGPU: exploiting graphics processing units to accelerate distributed storage systems. In Proc. 17th Int'l Symp. on High Performance Distributed Computing, pages 165-174, 2008.
- (2008) Proc. 17th Int'l Symp. on High Performance Distributed Computing , pp. 165-174
- Al-Kiswany, S.¹ Gharaibeh, A.² Santos-Neto, E.³ Yuan, G.⁴ Ripeanu, M.⁵

5
- 70349169252
- Billconan and Kavinguy. A Neural Network on GPU. http://www.codeproject.com/KB/graphics/GPUNN.aspx.
- Billconan and Kavinguy. A Neural Network on GPU

6
- 0033725306
- Methodology for I/O cell placement and checking in ASIC designs using area-array power grid
- P. Buffet, J. Natonio, R. Proctor, Y. Sun, and G. Yasar. Methodology for I/O cell placement and checking in ASIC designs using area-array power grid. In IEEE Custom Integrated Circuits Conference, 2000.
- (2000) IEEE Custom Integrated Circuits Conference
- Buffet, P.¹ Natonio, J.² Proctor, R.³ Sun, Y.⁴ Yasar, G.⁵

7
- 34247371330
- Cell Broadband Engine interconnect and memory interface
- Palo Alto, CA, August
- S. Clark, K. Haselhorst, K. Imming, J. Irish, D. Krolak, and T. Ozguner. Cell Broadband Engine interconnect and memory interface. In Hot Chips 17, Palo Alto, CA, August 2005.
- (2005) Hot Chips 17
- Clark, S.¹ Haselhorst, K.² Imming, K.³ Irish, J.⁴ Krolak, D.⁵ Ozguner, T.⁶

8
- 84877083867
- Merrimac: Supercomputing with streams
- W. J. Dally, F. Labonte, A. Das, P. Hanrahan, J.-H. Ahn, J. Gummaraju, M. Erez, N. Jayasena, I. Buck, T. J. Knight, and U. J. Kapasi. Merrimac: Supercomputing with streams. In SC '03: Proc. 2003 ACM/IEEE Conf. on Supercomputing, page 35, 2003.
- (2003) SC '03: Proc. 2003 ACM/IEEE Conf. on Supercomputing , pp. 35
- Dally, W.J.¹ Labonte, F.² Das, A.³ Hanrahan, P.⁴ Ahn, J.-H.⁵ Gummaraju, J.⁶ Erez, M.⁷ Jayasena, N.⁸ Buck, I.⁹ Knight, T.J.¹⁰ Kapasi, U.J.¹¹

9
- 4043097206
- Morgan Kaufmann
- W. J. Dally and B. Towles. Interconnection Networks. Morgan Kaufmann, 2004.
- (2004) Interconnection Networks
- Dally, W.J.¹ Towles, B.²

10
- 33750834456
- V. del Barrio, C. Gonzalez, J. Roca, A. Fernandez, and E. E. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. Int'l Symp. on Performance Analysis of Systems and Software, pages 231-241, March 2006.

11
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture, 2007.
- (2007) Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

12
- 70349166146
- M. Giles. Jacobi iteration for a Laplace discretisation on a 3D structured grid. http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/laplace3d.pdf.
- Jacobi iteration for a Laplace discretisation on a 3D structured grid
- Giles, M.¹

13
- 77952620490
- M. Giles and S. Xiaoke. Notes on using the NVIDIA 8800 GTX graphics card. http://people.maths.ox.ac.uk/~gilesm/hpc/.
- Notes on using the NVIDIA 8800 GTX graphics card
- Giles, M.¹ Xiaoke, S.²

14
- 0030677581
- The design and analysis of a cache architecture for texture mapping
- Z. S. Hakura and A. Gupta. The design and analysis of a cache architecture for texture mapping. In Proc. 24th Int'l Symp. on Computer Architecture, pages 108-120, 1997.
- (1997) Proc. 24th Int'l Symp. on Computer Architecture , pp. 108-120
- Hakura, Z.S.¹ Gupta, A.²

15
- 38349041620
- Accelerating Large Graph Algorithms on the GPU Using CUDA
- P. Harish and P. J. Narayanan. Accelerating Large Graph Algorithms on the GPU Using CUDA. In HiPC, pages 197-208, 2007.
- (2007) HiPC , pp. 197-208
- Harish, P.¹ Narayanan, P.J.²

16
- 0003278283
- The Microarchitecture of the Pentium® 4 Processor
- G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The Microarchitecture of the Pentium® 4 Processor. Intel® Technology Journal, 5(1), 2001.
- (2001) Intel® Technology Journal , vol.5 , Issue.1
- Hinton, G.¹ Sager, D.² Upton, M.³ Boggs, D.⁴ Carmean, D.⁵ Kyker, A.⁶ Roussel, P.⁷

17
- 0031606564
- Prefetching in a texture cache architecture
- H. Igehy, M. Eldridge, and K. Proudfoot. Prefetching in a texture cache architecture. In Proc. SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 1998.
- (1998) Proc. SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware
- Igehy, H.¹ Eldridge, M.² Proudfoot, K.³

18
- 67650692011
- Illinois Microarchitecture Project utilizing Advanced Compiler Technology Research Group
- Illinois Microarchitecture Project utilizing Advanced Compiler Technology Research Group. Parboil benchmark suite. http://www.crhc.uiuc.edu/IMPACT/parboil.php.
- Parboil benchmark suite

19
- 70349173991
- Infineon. 256Mbit GDDR3 DRAM, Revision 1.03 (Part No. HYB18H256321AF). http://www.infineon.com, December 2005.

20
- 84955473128
- Exploring the VLSI scalability of stream processors
- B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, J. D. Owens, and B. Towles. Exploring the VLSI scalability of stream processors. In Proc. 9th Int'l Symp. on High Performance Computer Architecture, page 153, 2003.
- (2003) Proc. 9th Int'l Symp. on High Performance Computer Architecture , pp. 153
- Khailany, B.¹ Dally, W.J.² Rixner, S.³ Kapasi, U.J.⁴ Owens, J.D.⁵ Towles, B.⁶

21
- 0019892368
- Lockup-free Instruction Fetch/Prefetch Cache Organization
- D. Kroft. Lockup-free Instruction Fetch/Prefetch Cache Organization. In Proc. 8th Int'l Symp. Computer Architecture, pages 81-87, 1981.
- (1981) Proc. 8th Int'l Symp. Computer Architecture , pp. 81-87
- Kroft, D.¹

22
- 44849137198
- NVIDIA Tesla: A Unified Graphics and Computing Architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, 2008.
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

23
- 66749170578
- Tradeoffs in designing accelerator architectures for visual computing
- A. Mahesri, D. Johnson, N. Crago, and S. J. Patel. Tradeoffs in designing accelerator architectures for visual computing. In Proc. 41st IEEE/ACM Int'l Symp. on Microarchitecture, 2008.
- (2008) Proc. 41st IEEE/ACM Int'l Symp. on Microarchitecture
- Mahesri, A.¹ Johnson, D.² Crago, N.³ Patel, S.J.⁴

24
- 51049111938
- CUDA compatible GPU as an efficient hardware accelerator for AES cryptography
- S. A. Manavski. CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In ICSPC 2007: Proc. of IEEE Int'l Conf. on Signal Processing and Communication, pages 65-68, 2007.
- (2007) ICSPC 2007: Proc. of IEEE Int'l Conf. on Signal Processing and Communication , pp. 65-68
- Manavski, S.A.¹

25
- 70349167821
- Marco Chiappetta. ATI Radeon HD 2900 XT - R600 Has Arrived. http://www.hothardware.com/printarticle.aspx?articleid=966.

26
- 85015171905
- Maxime. Ray tracing. http://www.nvidia.com/cuda.
- Ray tracing
- Maxime¹

27
- 51049099597
- J. Michalakes and M. Vachharajani. GPU acceleration of numerical weather prediction. IPDPS 2008: IEEE Int'l Symp. on Parallel and Distributed Processing, pages 1-7, April 2008.

28
- 70349189057
- NVIDIA's Experience with Open64
- M. Murphy. NVIDIA's Experience with Open64. In 1st Annual Workshop on Open64, 2008.
- (2008) 1st Annual Workshop on Open64
- Murphy, M.¹

29
- 78651550268
- Scalable Parallel Programming with CUDA
- Mar.-Apr
- J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, Mar.-Apr. 2008.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

30
- 70349186177
- NVIDIA. CUDA ZONE. http://www.nvidia.com/cuda.

31
- 70349170944
- NVIDIA. Geforce 8 series. http://www.nvidia.com/page/geforce8.html.

32
- 84872053761
- NVIDIA Corporation. NVIDIA CUDA SDK code samples. http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html.
- NVIDIA CUDA SDK code samples

33
- 70349170942
- NVIDIA Corporation. NVIDIA CUDA Programming Guide, 1.1 edition, 2007.

34
- 70349189054
- NVIDIA Corporation, 20 June
- NVIDIA Corporation. Press Release: NVIDIA Tesla GPU Computing Processor Ushers In the Era of Personal Supercomputing, 20 June 2007.
- (2007) Press Release: NVIDIA Tesla GPU Computing Processor Ushers In the Era of Personal Supercomputing

35
- 70349183057
- NVIDIA Corporation. PTX: Parallel Thread Execution ISA, 1.1 edition, 2007.

36
- 84892357909
- Open64. The open research compiler. http://www.open64.net/.
- Open64. The open research compiler

37
- 70349167820
- Pcchen. N-Queens Solver. http://forums.nvidia.com/index.php?showtopic=76893.

38
- 27344435504
- D. Pham, S. Asano, M. Bolliger, M. D., H. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, D. S. M. Riley, D. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. W. D. Wendel, T. Yamazaki, and K. Yazawa. The design and implementation of a first-generation Cell processor. Digest of Technical Papers, IEEE Int'l Solid-State Circuits Conference (ISSCC), pages 184-592 Vol. 1, 10-10 Feb. 2005.

39
- 0033691565
- Memory access scheduling
- S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens. Memory access scheduling. In Proc. 27th Int'l Symp. on Computer Architecture, pages 128-138, 2000.
- (2000) Proc. 27th Int'l Symp. on Computer Architecture , pp. 128-138
- Rixner, S.¹ Dally, W.J.² Kapasi, U.J.³ Mattson, P.⁴ Owens, J.D.⁵

40
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- April
- S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S.-Z. Ueng, J. Stratton, and W.W. Hwu. Program optimization space pruning for a multithreaded GPU. In Proc. 6th Int'l Symp. on Code Generation and Optimization (CGO), pages 195-204, April 2008.
- (2008) Proc. 6th Int'l Symp. on Code Generation and Optimization (CGO) , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.² Stone, S.³ Baghsorkhi, S.⁴ Ueng, S.-Z.⁵ Stratton, J.⁶ Hwu, W.W.⁷

41
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. 13th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 73-82, 2008.
- (2008) Proc. 13th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

42
- 38849131252
- High-throughput sequence alignment using Graphics Processing Units
- M. Schatz, C. Trapnell, A. Delcher, and A. Varshney. High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

43
- 78650725832
- A flexible simulation framework for graphics architectures
- J. W. Sheaffer, D. Luebke, and K. Skadron. A flexible simulation framework for graphics architectures. In Proc. ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 85-94, 2004.
- (2004) Proc. ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware , pp. 85-94
- Sheaffer, J.W.¹ Luebke, D.² Skadron, K.³

44
- 52449089583
- Sun Microsystems, Inc
- TM T2 Core Microarchitecture Specification, 2007.
- (2007) TM T2 Core Microarchitecture Specification

45
- 40349098914
- Scalable Cache Miss Handling for High Memory-Level Parallelism
- J. Tuck, L. Ceze, and J. Torrellas. Scalable Cache Miss Handling for High Memory-Level Parallelism. In Proc. 39th IEEE/ACM Int'l Symp. on Microarchitecture, pages 409-422, 2006.
- (2006) Proc. 39th IEEE/ACM Int'l Symp. on Microarchitecture , pp. 409-422
- Tuck, J.¹ Ceze, L.² Torrellas, J.³

46
- 70349184438
- T. C. Warburton. Mini Discontinuous Galerkin Solvers. http://www.caam.rice.edu/~timwar/RMMC/MIDG.html.
- Mini Discontinuous Galerkin Solvers
- Warburton, T.C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.