SCOPUS Information Retrieval Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume 2015-January, Issue January, 2015, Pages 63-74

Locality-aware mapping of nested parallel patterns on GPUs

(5) Lee, Hyoukjoong a; Brown, Kevin J a; Sujeeth, Arvind K a; Rompf, Tiark b,c; Olukotun, Kunle a

a STANFORD UNIVERSITY (United States)

b PURDUE UNIVERSITY (United States)

c ORACLE CORPORATION (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BENCHMARKING; COMPUTER ARCHITECTURE; COMPUTER HARDWARE; HIGH LEVEL LANGUAGES; PROGRAM PROCESSORS; SEMANTICS;

AUTOMATIC COMPILATION; COMPILER OPTIMIZATIONS; DEGREE OF PARALLELISM; DYNAMIC MEMORY ALLOCATION; HARD AND SOFT CONSTRAINTS; HIGHER-LEVEL LANGUAGES; OPTIMIZED IMPLEMENTATION; PROGRAMMER PRODUCTIVITY;

MAPPING;

EID: 84937692188 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/MICRO.2014.23 Document Type: Conference Paper

Times cited : (18)

References (29)

1
- 79952784184
- Copperhead: Compiling an embedded data parallel language
- New York, NY, USA: ACM
- B. Catanzaro, M. Garland, and K. Keutzer, "Copperhead: compiling an embedded data parallel language," in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, ser. PPoPP. New York, NY, USA: ACM, 2011, pp. 47-56.
- (2011) Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP , pp. 47-56
- Catanzaro, B.¹ Garland, M.² Keutzer, K.³

2
- 84860295764
- Nikola: Embedding compiled GPU functions in Haskell
- New York, NY, USA: ACM
- G. Mainland and G. Morrisett, "Nikola: embedding compiled GPU functions in Haskell," in Proceedings of the third ACM Haskell symposium on Haskell, ser. Haskell '10. New York, NY, USA: ACM, 2010, pp. 67-78.
- (2010) Proceedings of the Third ACM Haskell Symposium on Haskell, Ser. Haskell '10 , pp. 67-78
- Mainland, G.¹ Morrisett, G.²

3
- 84887171337
- Optimising purely functional GPU programs
- New York, NY, USA: ACM. [Online]
- T. L. McDonell, M. M. Chakravarty, G. Keller, and B. Lippmeier, "Optimising purely functional GPU programs," in Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '13. New York, NY, USA: ACM, 2013, pp. 49-60. [Online]. Available: http://doi.acm.org/10.1145/2500365.2500595
- (2013) Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, Ser. ICFP '13 , pp. 49-60
- McDonell, T.L.¹ Chakravarty, M.M.² Keller, G.³ Lippmeier, B.⁴

4
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- Oct.
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, Oct 2009, pp. 44-54.
- (2009) Workload Characterization 2009. IISWC 2009. IEEE International Symposium on , pp. 44-54
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

5
- 78349247407
- J. Hoberock and N. Bell, "Thrust: C++ template library for CUDA," 2009.
- (2009) Thrust: C++ Template Library for CUDA
- Hoberock, J.¹ Bell, N.²

6
- 81455154935
- Firepile: Run-time compilation for GPUs in Scala
- New York, NY, USA: ACM
- N. Nystrom, D. White, and K. Das, "Firepile: run-time compilation for GPUs in Scala," in Proceedings of the 10th ACM international conference on Generative programming and component engineering, ser. GPCE. New York, NY, USA: ACM, 2011, pp. 107-116.
- (2011) Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering, Ser. GPCE , pp. 107-116
- Nystrom, N.¹ White, D.² Das, K.³

7
- 79952811127
- Accelerating CUDA graph algorithms at maximum warp
- S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun, "Accelerating CUDA graph algorithms at maximum warp," in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, ser. PPoPP, 2011, pp. 267-276.
- (2011) Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP , pp. 267-276
- Hong, S.¹ Kim, S.K.² Oguntebi, T.³ Olukotun, K.⁴

8
- 84863015363
- A heterogeneous parallel framework for domain-specific languages
- K. J. Brown, A. K. Sujeeth, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun, "A heterogeneous parallel framework for domain-specific languages," ser. PACT, 2011.
- (2011) Ser. PACT
- Brown, K.J.¹ Sujeeth, A.K.² Lee, H.³ Rompf, T.⁴ Chafi, H.⁵ Odersky, M.⁶ Olukotun, K.⁷

9
- 84968716973
- A generic parallel collection framework
- A. Prokopec, P. Bagwell, T. Rompf, and M. Odersky, "A generic parallel collection framework," ser. Euro-Par, 2010.
- (2010) Ser. Euro-Par
- Prokopec, A.¹ Bagwell, P.² Rompf, T.³ Odersky, M.⁴

10
- 84880234733
- Harnessing the multicores: Nested data parallelism in Haskell
- S. L. P. Jones, R. Leshchinskiy, G. Keller, and M. M. T. Chakravarty, "Harnessing the multicores: Nested data parallelism in Haskell," in FSTTCS, 2008, pp. 383-414.
- (2008) FSTTCS , pp. 383-414
- Jones, S.L.P.¹ Leshchinskiy, R.² Keller, G.³ Chakravarty, M.M.T.⁴

11
- 84872943015
- Polyhedral parallel code generation for CUDA
- Jan. [Online]
- S. Verdoolaege, J. Carlos Juega, A. Cohen, J. Ignacio Gómez, C. Tenllado, and F. Catthoor, "Polyhedral parallel code generation for CUDA," ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 54:1-54:23, Jan. 2013. [Online]. Available: http://doi.acm.org/10.1145/2400682.2400713
- (2013) ACM Trans. Archit. Code Optim. , vol.9 , Issue.4 , pp. 54:1-54:23
- Verdoolaege, S.¹ Carlos Juega, J.² Cohen, A.³ Ignacio Gómez, J.⁴ Tenllado, C.⁵ Catthoor, F.⁶

12
- 84908238161
- Par4all: From convex array regions to heterogeneous computing
- M. Amini, O. Goubier, S. Guelton, J. O. McMahon, F.-X. Pasquier, G. Péan, and P. Villalon, "Par4all: From convex array regions to heterogeneous computing," in Second International Workshop on Polyhedral Compilation Techniques, ser. IMPACT 2012, 2012.
- (2012) Second International Workshop on Polyhedral Compilation Techniques, Ser. IMPACT 2012
- Amini, M.¹ Goubier, O.² Guelton, S.³ McMahon, J.O.⁴ Pasquier, F.-X.⁵ Péan, G.⁶ Villalon, P.⁷

13
- 0003780986
- Stanford InfoLab, Technical Report 1999-66, November 1999, previous number = SIDL-WP-1999-0120. [Online]
- L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web." Stanford InfoLab, Technical Report 1999-66, November 1999, previous number = SIDL-WP-1999-0120. [Online]. Available: http://ilpubs.stanford.edu:8090/422/
- The PageRank Citation Ranking: Bringing Order to the Web
- Page, L.¹ Brin, S.² Motwani, R.³ Winograd, T.⁴

14
- 85162467517
- Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
- F. Niu, B. Recht, C. Ré, and S. J. Wright, "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," Advances in Neural Information Processing Systems, vol. 24, pp. 693-701, 2011.
- (2011) Advances in Neural Information Processing Systems , vol.24 , pp. 693-701
- Niu, F.¹ Recht, B.² Ré, C.³ Wright, S.J.⁴

15
- 63449118443
- Using generalized ensemble simulations and Markov state models to identify conformational states
- [Online]
- G. R. Bowman, X. Huang, and V. S. Pande, "Using generalized ensemble simulations and Markov state models to identify conformational states," Methods, vol. 49, no. 2, pp. 197-201, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1046202309000978
- (2009) Methods , vol.49 , Issue.2 , pp. 197-201
- Bowman, G.R.¹ Huang, X.² Pande, V.S.³

16
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- New York, NY, USA: ACM. [Online]
- S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY, USA: ACM, 2009, pp. 152-163. [Online]. Available: http://doi.acm.org/10.1145/1555754.1555775
- (2009) Proceedings of the 36th Annual International Symposium on Computer Architecture, Ser. ISCA '09 , pp. 152-163
- Hong, S.¹ Kim, H.²

17
- 77749337497
- An adaptive performance modeling tool for GPU architectures
- New York, NY, USA: ACM. [Online]
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10. New York, NY, USA: ACM, 2010, pp. 105-114. [Online]. Available: http://doi.acm.org/10.1145/1693453.1693470
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '10 , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

18
- 67650673468
- HiCUDA: A high-level directive-based language for GPU programming
- ACM
- T. D. Han and T. S. Abdelrahman, "hiCUDA: a high-level directive-based language for GPU programming," in Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ACM, 2009, pp. 52-61.
- (2009) Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 52-61
- Han, T.D.¹ Abdelrahman, T.S.²

19
- 58449127539
- CUDAlite: Reducing GPU programming complexity
- Springer
- S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-m. W. Hwu, "CUDAlite: Reducing GPU programming complexity," in Languages and Compilers for Parallel Computing. Springer, 2008, pp. 1-15.
- (2008) Languages and Compilers for Parallel Computing , pp. 1-15
- Ueng, S.-Z.¹ Lathara, M.² Baghsorkhi, S.S.³ Hwu, W.-M.W.⁴

20
- 77954691442
- A GPGPU compiler for memory optimization and parallelism management
- New York, NY, USA: ACM. [Online]
- Y. Yang, P. Xiang, J. Kong, and H. Zhou, "A GPGPU compiler for memory optimization and parallelism management," in Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '10. New York, NY, USA: ACM, 2010, pp. 86-97. [Online]. Available: http://doi.acm.org/10.1145/1806596.1806606
- (2010) Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '10 , pp. 86-97
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

21
- 84863827989
- Sponge: Portable stream programming on graphics engines
- ACM
- A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke, "Sponge: portable stream programming on graphics engines," in ACM SIGPLAN Notices, vol. 46, no. 3. ACM, 2011, pp. 381-392.
- (2011) ACM SIGPLAN Notices , vol.46 , Issue.3 , pp. 381-392
- Hormati, A.H.¹ Samadi, M.² Woh, M.³ Mudge, T.⁴ Mahlke, S.⁵

22
- 84959045524
- StreamIt: A language for streaming applications
- Springer
- W. Thies, M. Karczmarek, and S. Amarasinghe, "StreamIt: A language for streaming applications," in Compiler Construction. Springer, 2002, pp. 179-196.
- (2002) Compiler Construction , pp. 179-196
- Thies, W.¹ Karczmarek, M.² Amarasinghe, S.³

23
- 67650563116
- Software pipelined execution of stream programs on GPUs
- IEEE
- A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil, "Software pipelined execution of stream programs on GPUs," in Code Generation and Optimization, 2009. CGO 2009. International Symposium on. IEEE, 2009, pp. 200-209.
- (2009) Code Generation and Optimization 2009. CGO 2009. International Symposium on , pp. 200-209
- Udupa, A.¹ Govindarajan, R.² Thazhuthaveetil, M.J.³

24
- 79959904195
- Automatic CPU-GPU communication management and optimization
- T. B. Jablin, P. Prabhu, J. A. Jablin, N. P. Johnson, S. R. Beard, and D. I. August, "Automatic CPU-GPU communication management and optimization," ACM SIGPLAN Notices, vol. 46, no. 6, pp. 142-151, 2011.
- (2011) ACM SIGPLAN Notices , vol.46 , Issue.6 , pp. 142-151
- Jablin, T.B.¹ Prabhu, P.² Jablin, J.A.³ Johnson, N.P.⁴ Beard, S.R.⁵ August, D.I.⁶

25
- 78650145768
- Lime: A Java-compatible and synthesizable language for heterogeneous architectures
- New York, NY, USA: ACM
- J. Auerbach, D. F. Bacon, P. Cheng, and R. Rabbah, "Lime: a Java-compatible and synthesizable language for heterogeneous architectures," in Proceedings of the ACM international conference on Object oriented programming systems languages and applications, ser. OOPSLA. New York, NY, USA: ACM, 2010, pp. 89-108.
- (2010) Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, Ser. OOPSLA , pp. 89-108
- Auerbach, J.¹ Bacon, D.F.² Cheng, P.³ Rabbah, R.⁴

26
- 84863463369
- Compiling a high-level language for GPUs: (Via language support for architectures and compilers)
- C. Dubach, P. Cheng, R. Rabbah, D. F. Bacon, and S. J. Fink, "Compiling a high-level language for GPUs: (via language support for architectures and compilers)," in Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, ser. PLDI '12, 2012, pp. 1-12.
- (2012) Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '12 , pp. 1-12
- Dubach, C.¹ Cheng, P.² Rabbah, R.³ Bacon, D.F.⁴ Fink, S.J.⁵

27
- 84889679621
- Dandelion: A compiler and runtime for heterogeneous systems
- ACM
- C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly, "Dandelion: a compiler and runtime for heterogeneous systems," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013, pp. 49-68.
- (2013) Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles , pp. 49-68
- Rossbach, C.J.¹ Yu, Y.² Currey, J.³ Martin, J.-P.⁴ Fetterly, D.⁵

28
- 84867546922
- Nested data-parallelism on the GPU
- New York, NY, USA: ACM. [Online]
- L. Bergstrom and J. Reppy, "Nested data-parallelism on the GPU," in Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '12. New York, NY, USA: ACM, 2012, pp. 247-258. [Online]. Available: http://doi.acm.org/10.1145/2364527.2364563
- (2012) Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, Ser. ICFP '12 , pp. 247-258
- Bergstrom, L.¹ Reppy, J.²

29
- 84896893237
- CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
- New York, NY, USA: ACM. [Online]
- Y. Yang and H. Zhou, "CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications," in Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '14. New York, NY, USA: ACM, 2014, pp. 93-106. [Online]. Available: http://doi.acm.org/10.1145/2555243.2555254.
- (2014) Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '14 , pp. 93-106
- Yang, Y.¹ Zhou, H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.