SCOPUS Information Retrieval Platform

ACM SIGPLAN Notices

Volume 48, Issue 8, 2013, Pages 57-67

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

(5) Wu, Bo (a); Zhao, Zhijia (a); Zhang, Eddy Z. (b); Jiang, Yunlian (c); Shen, Xipeng (a)

a The College of William and Mary (United States)

b RUTGERS UNIVERSITY (United States)

c GOOGLE INC (United States)

Author keywords

Computational complexity; Data transformation; GPGPU; Memory coalescing; Runtime optimizations; Thread data remapping

Indexed keywords

DATA TRANSFORMATION; GPGPU; MEMORY COALESCING; RUNTIME OPTIMIZATION; THREAD-DATA REMAPPING; COMPUTATIONAL COMPLEXITY; FLOCCULATION; HEURISTIC METHODS; PROGRAM PROCESSORS; ALGORITHMS

EID: 84885201786 PISSN: 15232867 EISSN: None Source Type: Journal
DOI: 10.1145/2517327.2442523 Document Type: Conference Paper

Times cited: 39

References (26)

1
- 0004072686
- Addison Wesley, 2nd edition, August
- A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 2nd edition, August 2006.
- (2006) Compilers: Principles, Techniques, and Tools
- Aho, A.V.¹ Lam, M.S.² Sethi, R.³ Ullman, J.D.⁴

2
- 57349180412
- A compiler framework for optimization of affine loop nests for gpgpus
- M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS'08, pages 225-234, 2008.
- (2008) ICS'08 , pp. 225-234
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

3
- 77951159230
- A control-structure splitting optimization for gpgpu
- S. Carrillo, J. Siegel, and X. Li. A control-structure splitting optimization for gpgpu. In Proceedings of ACM Computing Frontiers, 2009.
- (2009) Proceedings of ACM Computing Frontiers
- Carrillo, S.¹ Siegel, J.² Li, X.³

4
- 0003510310
- PhD thesis, University of Illinois at Urbana-Champaign
- G. C. Cascaval. Compile-time Performance Prediction of Scientific Programs. PhD thesis, University of Illinois at Urbana-Champaign, 2000.
- (2000) Compile-time Performance Prediction of Scientific Programs
- Cascaval, G.C.¹

5
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: Optimizing memory access patterns for heterogeneous systems. In SC, 2011.
- (2011) SC
- Che, S.¹ Sheaffer, J.W.² Skadron, K.³

6
- 33746070806
- Cache-conscious coallocation of hot data streams
- T. M. Chilimbi and R. Shaham. Cache-conscious coallocation of hot data streams. In PLDI, 2006.
- (2006) PLDI
- Chilimbi, T.M.¹ Shaham, R.²

7
- 77954719557
- A. Danalis, G. Marin, C. McCurdy, J. Meredith, P. Roth, K. Spafford, V. Tipparaju, and J. Vetter. The scalable heterogeneous computing (shoc) benchmark suite. 2010.
- (2010) The Scalable Heterogeneous Computing (Shoc) Benchmark Suite
- Danalis, A.¹ Marin, G.² McCurdy, C.³ Meredith, J.⁴ Roth, P.⁵ Spafford, K.⁶ Tipparaju, V.⁷ Vetter, J.⁸

8
- 1642502420
- Improving effective bandwidth through compiler enhancement of global cache reuse
- C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1): 108-134, 2004.
- (2004) Journal of Parallel and Distributed Computing , vol.64 , Issue.1 , pp. 108-134
- Ding, C.¹ Kennedy, K.²

9
- 47349104432
- Dynamic warp formation and scheduling for efficient gpu control flow
- Washington, DC, USA, IEEE Computer Society
- W. Fung, I. Sham, G. Yuan, and T. Aamodt. Dynamic warp formation and scheduling for efficient gpu control flow. In MICRO'07, pages 407-420, Washington, DC, USA, 2007. IEEE Computer Society.
- (2007) MICRO'07 , pp. 407-420
- Fung, W.¹ Sham, I.² Yuan, G.³ Aamodt, T.⁴

10
- 33745715056
- Exploiting locality for irregular scientific codes
- H. Han and C.-W. Tseng. Exploiting locality for irregular scientific codes. IEEE Transactions on Parallel Distributed Systems, 17(7):606-618, 2006.
- (2006) IEEE Transactions on Parallel Distributed Systems , vol.17 , Issue.7 , pp. 606-618
- Han, H.¹ Tseng, C.-W.²

11
- 0003777592
- PWS Publishing Company
- D. S. Hochbaum. Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, 1995.
- (1995) Approximation Algorithms for NP-Hard Problems
- Hochbaum, D.S.¹

12
- 79953071805
- Sponge: Portable stream programming on graphics engines
- A. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: portable stream programming on graphics engines. In ASPLOS, 2011.
- (2011) ASPLOS
- Hormati, A.¹ Samadi, M.² Woh, M.³ Mudge, T.⁴ Mahlke, S.⁵

13
- 79959575872
- An execution strategy and optimized runtime support for parallelizing irregular reductions on modern gpus
- X. Huo, V. Ravi, W. Ma, and G. Agrawal. An execution strategy and optimized runtime support for parallelizing irregular reductions on modern gpus. In ICS, 2011.
- (2011) ICS
- Huo, X.¹ Ravi, V.² Ma, W.³ Agrawal, G.⁴

14
- 81455141868
- Enhancing locality for recursive traversals of recursive structures
- Y. Jo and M. Kulkarni. Enhancing locality for recursive traversals of recursive structures. In OOPSLA, 2011.
- (2011) OOPSLA
- Jo, Y.¹ Kulkarni, M.²

15
- 0035029828
- A compiler technique for improving whole-program locality
- M. Kandemir. A compiler technique for improving whole-program locality. In POPL, 2001.
- (2001) POPL
- Kandemir, M.¹

16
- 84863371431
- Opencl as a unified programming model for heterogeneous cpu/gpu clusters
- J. Kim, S. Seo, J. Lee, J. Nah, G. Jo, and J. Lee. Opencl as a unified programming model for heterogeneous cpu/gpu clusters. In PPoPP, 2012.
- (2012) PPoPP
- Kim, J.¹ Seo, S.² Lee, J.³ Nah, J.⁴ Jo, G.⁵ Lee, J.⁶

17
- 77957808385
- Optimistic parallelism benefits from data partitioning
- M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS, pages 233-243, 2008.
- (2008) ASPLOS , pp. 233-243
- Kulkarni, M.¹ Pingali, K.² Ramanarayanan, G.³ Walter, B.⁴ Bala, K.⁵ Chew, L.P.⁶

18
- 67650081010
- Openmp to gpgpu: A compiler framework for automatic translation and optimization
- S. Lee, S. Min, and R. Eigenmann. Openmp to gpgpu: A compiler framework for automatic translation and optimization. In PPoPP, 2009.
- (2009) PPoPP
- Lee, S.¹ Min, S.² Eigenmann, R.³

19
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In ISCA, 2010.
- (2010) ISCA
- Meng, J.¹ Tarjan, D.² Skadron, K.³

20
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded gpu using cuda
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, pages 73-82, 2008.
- (2008) PPoPP , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

21
- 0038039924
- Compile-time composition of run-time data and iteration reorderings
- San Diego, CA, June
- M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In PLDI, San Diego, CA, June 2003.
- (2003) PLDI
- Strout, M.M.¹ Carter, L.² Ferrante, J.³

22
- 74049151553
- Increasing memory miss tolerance for simd cores
- D. Tarjan, J. Meng, and K. Skadron. Increasing memory miss tolerance for simd cores. In SC, 2009.
- (2009) SC
- Tarjan, D.¹ Meng, J.² Skadron, K.³

23
- 84856544146
- Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control
- B. Wu, E. Zhang, and X. Shen. Enhancing data locality for dynamic simulations through asynchronous data transformations and adaptive control. In PACT, 2011.
- (2011) PACT
- Wu, B.¹ Zhang, E.² Shen, X.³

24
- 0033703258
- Cacheminer: A runtime approach to exploit cache locality on smp
- Y. Yan, X. Zhang, and Z. Zhang. Cacheminer: A runtime approach to exploit cache locality on smp. IEEE Transactions on Parallel Distributed Systems, 11(4): 357-374, 2000.
- (2000) IEEE Transactions on Parallel Distributed Systems , vol.11 , Issue.4 , pp. 357-374
- Yan, Y.¹ Zhang, X.² Zhang, Z.³

25
- 77954691442
- A gpgpu compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou. A gpgpu compiler for memory optimization and parallelism management. In PLDI, 2010.
- (2010) PLDI
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

26
- 79953126288
- On-the-fly elimination of dynamic irregularities for gpu computing
- E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for gpu computing. In ASPLOS, 2011.
- (2011) ASPLOS
- Zhang, E.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.