Volume , Issue , 2009, Pages 152-163

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Author keywords

Analytical model; CUDA; GPU architecture; Memory level parallelism; Performance estimation; Warp level parallelism

Indexed keywords

ABSOLUTE ERROR; ANALYTICAL MODEL; APPLICATION PERFORMANCE; DEGREE OF MEMORY; DESIGN SPACES; EXECUTION TIME; GEOMETRIC MEAN; GPU COMPUTING; KEY COMPONENT; MEMORY BANDWIDTHS; MEMORY LEVEL PARALLELISMS; MODEL ESTIMATES; MULTI CORE; OVERALL EXECUTION; PARALLEL APPLICATION; PARALLEL MEMORY; PARALLEL PROCESSOR; PARALLEL PROGRAM; PERFORMANCE BOTTLENECKS; PERFORMANCE CHARACTERISTICS; PERFORMANCE ESTIMATION; PROGRAMMING LANGUAGE; SOFTWARE ENGINEERS; THREAD LEVEL PARALLELISM;

EID: 70450231944     PISSN: 10636897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1555754.1555775     Document Type: Conference Paper
Times cited : (521)

References (28)
  • 1
    • 84869687636
    • ATI Mobility Radeon™ HD4850/4870 Graphics-Overview
    • ATI Mobility Radeon™ HD4850/4870 Graphics-Overview. http://ati.amd.com/products/radeonhd4800.
  • 5
    • 84869664151
    • Advanced Micro Devices, Inc
    • Advanced Micro Devices, Inc. AMD Brook+. http://ati.amd.com/technology/streamcomputing/AMD-Brookplus.pdf.
    • AMD Brook+
  • 7
    • 64949101685
    • A first-order fine-grained multithreaded throughput model
    • X. E. Chen and T. M. Aamodt. A first-order fine-grained multithreaded throughput model. In HPCA, 2009.
    • (2009) HPCA
    • Chen, X.E.1    Aamodt, T.M.2
  • 8
    • 44849137198
    • NVIDIA Tesla: A Unified Graphics and Computing Architecture
    • March-April
    • E. Lindholm, J. Nickolls, S. Oberman and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, March-April 2008.
    • (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
    • Lindholm, E.1    Nickolls, J.2    Oberman, S.3    Montrym, J.4
  • 12
    • 70450274279
    • An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
    • Technical Report TR-2009-003, Atlanta, GA, USA
    • S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. Technical Report TR-2009-003, Atlanta, GA, USA, 2009.
    • (2009)
    • Hong, S.1    Kim, H.2
  • 14
    • 70450275951
    • Intel SSE/MMX2/KNI documentation. http://www.intel80386.com/simd/mmx2-doc.html.
  • 17
    • 68149168035
    • Merge: A programming model for heterogeneous multi-core systems
    • M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: a programming model for heterogeneous multi-core systems. In ASPLOS XIII, 2008.
    • (2008) ASPLOS XIII
    • Linderman, M.D.1    Collins, J.D.2    Wang, H.3    Meng, T.H.4
  • 18
    • 0034824085
    • Data-flow prescheduling for large instruction windows in out-of-order processors
    • P. Michaud and A. Seznec. Data-flow prescheduling for large instruction windows in out-of-order processors. In HPCA, 2001.
    • (2001) HPCA
    • Michaud, P.1    Seznec, A.2
  • 19
    • 0033365427
    • Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors
    • P. Michaud, A. Seznec, and S. Jourdan. Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors. In PACT, 1999.
    • (1999) PACT
    • Michaud, P.1    Seznec, A.2    Jourdan, S.3
  • 20
    • 78651550268
    • Scalable Parallel Programming with CUDA
    • March-April
    • J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, March-April 2008.
    • (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
    • Nickolls, J.1    Buck, I.2    Garland, M.3    Skadron, K.4
  • 21
    • 85016676932
    • Theoretical modeling of superscalar processor performance
    • D. B. Noonburg and J. P. Shen. Theoretical modeling of superscalar processor performance. In MICRO-27, 1994.
    • (1994) MICRO-27
    • Noonburg, D.B.1    Shen, J.P.2
  • 24
    • 43449094719
    • S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Stratton, and W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO, 2008.
  • 25
    • 0342373102
    • An analytical solution for a Markov chain modeling multithreaded
    • Technical report, Berkeley, CA, USA
    • R. H. Saavedra-Barrera and D. E. Culler. An analytical solution for a Markov chain modeling multithreaded. Technical report, Berkeley, CA, USA, 1991.
    • (1991)
    • Saavedra-Barrera, R.H.1    Culler, D.E.2
  • 27
    • 0031593993
    • Analytic evaluation of shared-memory systems with ILP processors
    • D. J. Sorin, V. S. Pai, S. V. Adve, M. K. Vernon, and D. A. Wood. Analytic evaluation of shared-memory systems with ILP processors. In ISCA, 1998.
    • (1998) ISCA
    • Sorin, D.J.1    Pai, V.S.2    Adve, S.V.3    Vernon, M.K.4    Wood, D.A.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.