



Volume , Issue , 2009, Pages 163-174

Analyzing CUDA workloads using a detailed GPU simulator

Author keywords

[No Author keywords available]

Indexed keywords

BISECTION BANDWIDTH; COMPUTING POWER; DATA-LEVEL PARALLELISM; FLEXIBLE PROGRAMMING MODEL; GRAPHIC PROCESSING UNITS; HIGH-END GRAPHICS; INSTRUCTION SET; INTERCONNECT TOPOLOGY; MANY-CORE; MEMORY CONTROLLER; MEMORY SYSTEMS; MICRO ARCHITECTURES; MICRO-ARCHITECTURE DESIGN; MULTITHREADED; NON-TRIVIAL; ON CHIPS; ORDERS OF MAGNITUDE; PEAK PERFORMANCE; PERFORMANCE IMPACT; PERFORMANCE IMPROVEMENTS; PERFORMANCE SIMULATOR; PROGRAMMING MODELS; THREAD LEVEL PARALLELISM; WORK-LOAD DISTRIBUTION;

EID: 70349169075     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ISPASS.2009.4919648     Document Type: Conference Paper
Times cited : (1368)

References (46)
  • 1
    • 43649092214
    • Advanced Micro Devices, Inc. ATI CTM Guide, 1.01 edition, 2006.
    • (2006) ATI CTM Guide
  • 10
    • 33750834456
    • V. del Barrio, C. Gonzalez, J. Roca, A. Fernandez, and E. E. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. Int'l Symp. on Performance Analysis of Systems and Software, pages 231-241, March 2006.
  • 15
    • 38349041620
    • Accelerating Large Graph Algorithms on the GPU Using CUDA
    • P. Harish and P. J. Narayanan. Accelerating Large Graph Algorithms on the GPU Using CUDA. In HiPC, pages 197-208, 2007.
    • (2007) HiPC , pp. 197-208
    • Harish, P.1    Narayanan, P.J.2
  • 18
    • 67650692011
    • Illinois Microarchitecture Project utilizing Advanced Compiler Technology Research Group
    • Illinois Microarchitecture Project utilizing Advanced Compiler Technology Research Group. Parboil benchmark suite. http://www.crhc.uiuc.edu/IMPACT/parboil.php.
    • Parboil benchmark suite
  • 19
    • 70349173991
    • Infineon. 256Mbit GDDR3 DRAM, Revision 1.03 (Part No. HYB18H256321AF). http://www.infineon.com, December 2005.
  • 21
    • 0019892368
    • Lockup-free Instruction Fetch/Prefetch Cache Organization
    • D. Kroft. Lockup-free Instruction Fetch/Prefetch Cache Organization. In Proc. 8th Int'l Symp. Computer Architecture, pages 81-87, 1981.
    • (1981) Proc. 8th Int'l Symp. Computer Architecture , pp. 81-87
    • Kroft, D.1
  • 22
    • 44849137198
    • NVIDIA Tesla: A Unified Graphics and Computing Architecture
    • E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, 2008.
    • (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
    • Lindholm, E.1    Nickolls, J.2    Oberman, S.3    Montrym, J.4
  • 25
    • 70349167821
    • Marco Chiappetta. ATI Radeon HD 2900 XT - R600 Has Arrived. http://www.hothardware.com/printarticle.aspx?articleid=966.
  • 27
    • 51049099597
    • J. Michalakes and M. Vachharajani. GPU acceleration of numerical weather prediction. IPDPS 2008: IEEE Int'l Symp. on Parallel and Distributed Processing, pages 1-7, April 2008.
  • 29
    • 78651550268
    • Scalable Parallel Programming with CUDA
    • J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, Mar.-Apr. 2008.
    • (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
    • Nickolls, J.1    Buck, I.2    Garland, M.3    Skadron, K.4
  • 30
    • 70349186177
    • NVIDIA. CUDA ZONE. http://www.nvidia.com/cuda.
  • 31
    • 70349170944
    • NVIDIA. Geforce 8 series. http://www.nvidia.com/page/geforce8.html.
  • 32
    • 84872053761
    • NVIDIA Corporation. NVIDIA CUDA SDK code samples. http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html.
    • NVIDIA CUDA SDK code samples
  • 33
    • 70349170942
    • NVIDIA Corporation. NVIDIA CUDA Programming Guide, 1.1 edition, 2007.
  • 35
    • 70349183057
    • NVIDIA Corporation. PTX: Parallel Thread Execution ISA, 1.1 edition, 2007.
  • 37
    • 70349167820
    • Pcchen. N-Queens Solver. http://forums.nvidia.com/index.php?showtopic=76893.
  • 38
    • 27344435504
    • D. Pham, S. Asano, M. Bolliger, M. D. , H. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, D. S. M. Riley, D. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. W. D. Wendel, T. Yamazaki, and K. Yazawa. The design and implementation of a first-generation Cell processor. Digest of Technical Papers, IEEE Int'l Solid-State Circuits Conference (ISSCC), pages 184-592 Vol. 1, 10-10 Feb. 2005.
  • 42
    • 38849131252
    • High-throughput sequence alignment using Graphics Processing Units
    • M. Schatz, C. Trapnell, A. Delcher, and A. Varshney. High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.
    • (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
    • Schatz, M.1    Trapnell, C.2    Delcher, A.3    Varshney, A.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.