SCOPUS Information Retrieval Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume 2015-January, Issue January, 2015, Pages 88-100

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

(4) Chen, Guoyang (a); Wu, Bo (b); Li, Dong (c); Shen, Xipeng (a)

a North Carolina State University (United States)

b Oak Ridge National Laboratory (United States)

c Department of Metallurgical and Materials Engineering (United States)

Author keywords

cache; compiler; data placement; hardware specification language

Indexed keywords

CACHE MEMORY; GRAPHICS PROCESSING UNIT; MEMORY ARCHITECTURE; SPECIFICATION LANGUAGES; SPECIFICATIONS; TEXTURES;

CACHE; COMPILER; CONSTANT MEMORY; DATA ACCESS PATTERNS; DATA PLACEMENT; HARDWARE SPECIFICATIONS; PLACEMENT SCHEME; RUN-TIME PROFILING;

PROGRAM COMPILERS;

EID: 84937693610 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/MICRO.2014.20 Document Type: Conference Paper

Times cited : (46)

References (30)

1
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-M. W. Hwu, "An adaptive performance modeling tool for gpu architectures," in Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10, 2010, pp. 105-114.
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '10 , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

2
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- ACM SIGPLAN symposium on Principles and practice of parallel programming
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W. Mei W. Hwu, "An adaptive performance modeling tool for GPU architectures," in ACM SIGPLAN symposium on Principles and practice of parallel programming, 2010.
- (2010) ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

3
- 57349180412
- A compiler framework for optimization of affine loop nests for GPGPUs
- M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "A compiler framework for optimization of affine loop nests for GPGPUs," in ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, 2008, pp. 225-234.
- (2008) ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing , pp. 225-234
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

4
- 80053007267
- Measurements of major locality phases in symbolic reference strings
- Cambridge, MA, March
- A. P. Batson and A. W. Madison, "Measurements of major locality phases in symbolic reference strings," in Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Cambridge, MA, March 1976.
- (1976) Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems
- Batson, A.P.¹ Madison, A.W.²

5
- 0003510310
- Ph.D. dissertation University of Illinois at Urbana-Champaign
- G. C. Cascaval, "Compile-time performance prediction of scientific programs," Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2000.
- (2000) Compile-time Performance Prediction of Scientific Programs
- Cascaval, G.C.¹

6
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in IISWC, 2009.
- (2009) IISWC
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

7
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- S. Che, J. W. Sheaffer, and K. Skadron, "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11, 2011, pp. 13:1-13:11.
- (2011) Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Ser. SC '11 , pp. 13:1-13:11
- Che, S.¹ Sheaffer, J.W.² Skadron, K.³

8
- 77954719557
- The scalable heterogeneous computing (SHOC) benchmark suite
- A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (shoc) benchmark suite," in GPGPU, 2010.
- (2010) GPGPU
- Danalis, A.¹ Marin, G.² McCurdy, C.³ Meredith, J.S.⁴ Roth, P.C.⁵ Spafford, K.⁶ Tipparaju, V.⁷ Vetter, J.S.⁸

9
- 81355161778
- The University of Florida sparse matrix collection
- Dec.
- T. A. Davis and Y. Hu, "The university of Florida sparse matrix collection," ACM Trans. Math. Softw., vol. 38, no. 1, pp. 1:1-1:25, Dec. 2011.
- (2011) ACM Trans. Math. Softw. , vol.38 , Issue.1 , pp. 1:1-1:25
- Davis, T.A.¹ Hu, Y.²

10
- 1442313416
- Predicting whole-program locality with reuse distance analysis
- San Diego, CA, June
- C. Ding and Y. Zhong, "Predicting whole-program locality with reuse distance analysis," in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003, pp. 245-257.
- (2003) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 245-257
- Ding, C.¹ Zhong, Y.²

11
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," in International Symposium on Computer Architecture, 2009.
- (2009) International Symposium on Computer Architecture
- Hong, S.¹ Kim, H.²

12
- 78649824847
- Exploiting memory access patterns to improve memory performance in data-parallel architectures
- B. Jang, D. Schaa, P. Mistry, and D. Kaeli, "Exploiting memory access patterns to improve memory performance in data-parallel architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 105-118, 2011.
- (2011) IEEE Transactions on Parallel and Distributed Systems , vol.22 , Issue.1 , pp. 105-118
- Jang, B.¹ Schaa, D.² Mistry, P.³ Kaeli, D.⁴

13
- 84881191462
- Die-stacked DRAM caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache
- D. Jevdjic, S. Volos, and B. Falsafi, "Die-stacked dram caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache." in ISCA, 2013, pp. 404-415.
- (2013) ISCA , pp. 404-415
- Jevdjic, D.¹ Volos, S.² Falsafi, B.³

14
- 84864068497
- Characterizing and improving the use of demand-fetched caches in GPUs
- W. Jia, K. A. Shaw, and M. Martonosi, "Characterizing and improving the use of demand-fetched caches in gpus," in Proceedings of the 26th ACM international conference on Supercomputing, ser. ICS '12, 2012.
- (2012) Proceedings of the 26th ACM International Conference on Supercomputing, Ser. ICS '12
- Jia, W.¹ Shaw, K.A.² Martonosi, M.³

15
- 35048854568
- Cetus-an extensible compiler infrastructure for source-to-source transformation
- S. Lee, T. Johnson, and R. Eigenmann, "Cetus-an extensible compiler infrastructure for source-to-source transformation," in Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003, pp. 539-553.
- (2003) Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC) , pp. 539-553
- Lee, S.¹ Johnson, T.² Eigenmann, R.³

16
- 78149272414
- An integer programming framework for optimizing shared memory use on GPUs
- W. Ma and G. Agrawal, "An integer programming framework for optimizing shared memory use on gpus," in PACT, 2010, pp. 553-554.
- (2010) PACT , pp. 553-554
- Ma, W.¹ Agrawal, G.²

17
- 33750831358
- Generic database cost models for hierarchical memory systems
- S. Manegold, P. Boncz, and M. L. Kersten, "Generic Database Cost Models for Hierarchical Memory Systems," in Proceedings of VLDB, 2002, pp. 191-202.
- (2002) Proceedings of VLDB , pp. 191-202
- Manegold, S.¹ Boncz, P.² Kersten, M.L.³

18
- 70450273507
- Scalable high performance main memory system using phase-change memory technology
- M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09, 2009, pp. 24-33.
- (2009) Proceedings of the 36th Annual International Symposium on Computer Architecture, Ser. ISCA '09 , pp. 24-33
- Qureshi, M.K.¹ Srinivasan, V.² Rivers, J.A.³

19
- 79959583242
- Page placement in hybrid memory systems
- L. E. Ramos, E. Gorbatov, and R. Bianchini, "Page placement in hybrid memory systems," in Proceedings of the International Conference on Supercomputing, ser. ICS '11, 2011, pp. 85-95.
- (2011) Proceedings of the International Conference on Supercomputing, Ser. ICS '11 , pp. 85-95
- Ramos, L.E.¹ Gorbatov, E.² Bianchini, R.³

20
- 84863347222
- A performance analysis framework for identifying potential benefits in GPGPU applications
- J. Sim, A. Dasgupta, H. Kim, and R. W. Vuduc, "A performance analysis framework for identifying potential benefits in GPGPU applications," in ACM SIGPLAN symposium on Principles and practice of parallel programming, 2012.
- (2012) ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Sim, J.¹ Dasgupta, A.² Kim, H.³ Vuduc, R.W.⁴

21
- 33846547030
- On the effectiveness of set associative page mapping and its applications in main memory management
- A. J. Smith, "On the effectiveness of set associative page mapping and its applications in main memory management," in Proceedings of the 2nd International Conference on Software Engineering, 1976, pp. 286-292.
- (1976) Proceedings of the 2nd International Conference on Software Engineering , pp. 286-292
- Smith, A.J.¹

22
- 78149251414
- Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
- I.-J. Sung, J. A. Stratton, and W.-M. W. Hwu, "Data layout transformation exploiting memory-level parallelism in structured grid many-core applications," in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '10, 2010, pp. 513-522.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT '10 , pp. 513-522
- Sung, I.-J.¹ Stratton, J.A.² Hwu, W.-M.W.³

23
- 84887454272
- Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
- Piscataway, NJ, USA: IEEE Press
- B. Wang, B. Wu, D. Li, X. Shen, W. Yu, Y. Jiao, and J. S. Vetter, "Exploring hybrid memory for gpu energy efficiency through software-hardware co-design," in Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 93-102.
- (2013) Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT '13 , pp. 93-102
- Wang, B.¹ Wu, B.² Li, D.³ Shen, X.⁴ Yu, W.⁵ Jiao, Y.⁶ Vetter, J.S.⁷

24
- 77952579552
- Demystifying GPU microarchitecture through microbenchmarking
- H. Wong, M.-M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying gpu microarchitecture through microbenchmarking." in ISPASS. IEEE Computer Society, 2010, pp. 235-246.
- (2010) ISPASS. IEEE Computer Society , pp. 235-246
- Wong, H.¹ Papadopoulou, M.-M.² Sadooghi-Alvandi, M.³ Moshovos, A.⁴

25
- 84875195366
- Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU
- B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen, "Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on gpu," in Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2013.
- (2013) Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Wu, B.¹ Zhao, Z.² Zhang, E.Z.³ Jiang, Y.⁴ Shen, X.⁵

26
- 77954691442
- A GPGPU compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou, "A gpgpu compiler for memory optimization and parallelism management," in Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '10, 2010, pp. 86-97.
- (2010) Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '10 , pp. 86-97
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

27
- 79953126288
- On-the-fly elimination of dynamic irregularities for GPU computing
- E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, "On-the-fly elimination of dynamic irregularities for gpu computing," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2011.
- (2011) Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems
- Zhang, E.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

28
- 79955921273
- A quantitative performance analysis model for GPU architectures
- Y. Zhang and J. D. Owens, "A quantitative performance analysis model for gpu architectures," in Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture, ser. HPCA '11, 2011, pp. 382-393.
- (2011) Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture, Ser. HPCA '11 , pp. 382-393
- Zhang, Y.¹ Owens, J.D.²

29
- 84863496063
- Auto-generation and auto-tuning of 3D stencil codes on GPU clusters
- Y. Zhang and F. Mueller, "Auto-generation and auto-tuning of 3d stencil codes on gpu clusters," in Proceedings of the Tenth International Symposium on Code Generation and Optimization, ser. CGO '12, 2012, pp. 155-164.
- (2012) Proceedings of the Tenth International Symposium on Code Generation and Optimization, Ser. CGO '12 , pp. 155-164
- Zhang, Y.¹ Mueller, F.²

30
- 84968739606
- Miss rate prediction across all program inputs
- New Orleans, Louisiana, September
- Y. Zhong, S. G. Dropsho, and C. Ding, "Miss rate prediction across all program inputs," in Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.
- (2003) Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
- Zhong, Y.¹ Dropsho, S.G.² Ding, C.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.