SCOPUS Information Retrieval Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

Volume 2016-December, 2016

Efficient kernel synthesis for performance portable programming

(5) Chang, Li Wen a; El Hajj, Izzat a; Rodrigues, Christopher b; Gomez Luna, Juan c; Hwu, Wen Mei a

a UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

b Huawei America Research Lab (United States)

c UNIVERSITY OF CÓRDOBA (Spain)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SOFTWARE PORTABILITY; ENERGY EFFICIENCY;

ARCHITECTURAL MODELING; COMPUTING ELEMENT; HETEROGENEOUS COMPUTING SYSTEM; HIERARCHICAL ORGANIZATIONS; MICRO-ARCHITECTURE DESIGN; PERFORMANCE PORTABILITY; PORTABLE PROGRAMMING; TARGET ARCHITECTURES;

COMPUTER ARCHITECTURE;

EID: 85008936147 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/MICRO.2016.7783715 Document Type: Conference Paper

Times cited : (15)

References (43)

1
- 84961311076
- N. Rotem, "Intel OpenCL implicit vectorization module, " 2011.
- (2011) Intel OpenCL Implicit Vectorization Module
- Rotem, N.¹

2
- 78149276036
- Twin peaks: A software platform for heterogeneous computing on general-purpose and graphics processors
- J. Gummaraju, L. Morichetti, M. Houston, B. Sander, B. R. Gaster, and B. Zheng, "Twin peaks: A software platform for heterogeneous computing on general-purpose and graphics processors, " in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 205-216, 2010.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 205-216
- Gummaraju, J.¹ Morichetti, L.² Houston, M.³ Sander, B.⁴ Gaster, B.R.⁵ Zheng, B.⁶

3
- 84859143447
- Improving performance of OpenCL on CPUs
- R. Karrenberg and S. Hack, "Improving Performance of OpenCL on CPUs, " in Proceedings of the 21st International Conference on Compiler Construction, pp. 1-20, 2012.
- (2012) Proceedings of the 21st International Conference on Compiler Construction , pp. 1-20
- Karrenberg, R.¹ Hack, S.²

4
- 84938982672
- pocl: A performance-portable OpenCL implementation
- P. Jääskeläinen, C. S. de La Lama, E. Schnetter, K. Raiskila, J. Takala, and H. Berg, "pocl: A performance-portable OpenCL implementation, " International Journal of Parallel Programming, vol. 43, no. 5, pp. 752-785, 2015.
- (2015) International Journal of Parallel Programming , vol.43 , Issue.5 , pp. 752-785
- Jääskeläinen, P.¹ De La Lama, C.S.² Schnetter, E.³ Raiskila, K.⁴ Takala, J.⁵ Berg, H.⁶

5
- 84961314978
- Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures
- H.-S. Kim, I. El Hajj, J. Stratton, S. Lumetta, and W.-M. Hwu, "Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures, " in Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 257-268, 2015.
- (2015) Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization , pp. 257-268
- Kim, H.-S.¹ El Hajj, I.² Stratton, J.³ Lumetta, S.⁴ Hwu, W.-M.⁵

6
- 84937693610
- PORPLE: An extensible optimizer for portable data placement on GPU
- G. Chen, B. Wu, D. Li, and X. Shen, "PORPLE: An extensible optimizer for portable data placement on GPU, " in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 88-100, 2014.
- (2014) Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 88-100
- Chen, G.¹ Wu, B.² Li, D.³ Shen, X.⁴

7
- 78649824847
- Exploiting memory access patterns to improve memory performance in data-parallel architectures
- B. Jang, D. Schaa, P. Mistry, and D. Kaeli, "Exploiting memory access patterns to improve memory performance in data-parallel architectures, " IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 1, pp. 105-118, 2011.
- (2011) IEEE Trans. Parallel Distrib. Syst , vol.22 , Issue.1 , pp. 105-118
- Jang, B.¹ Schaa, D.² Mistry, P.³ Kaeli, D.⁴

8
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- R. C. Whaley, A. Petitet, and J. J. Dongarra, "Automated empirical optimizations of software and the ATLAS project, " Parallel Computing, vol. 27, no. 1, pp. 3-35, 2001.
- (2001) Parallel Computing , vol.27 , Issue.1 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.J.³

9
- 1542396679
- Spiral: A generator for platform-adapted libraries of signal processing algorithms
- M. Püschel, J. M. Moura, B. Singer, J. Xiong, J. Johnson, D. Padua, M. Veloso, and R. W. Johnson, "Spiral: A generator for platform-adapted libraries of signal processing algorithms, " International Journal of High Performance Computing Applications, vol. 18, no. 1, pp. 21-45, 2004.
- (2004) International Journal of High Performance Computing Applications , vol.18 , Issue.1 , pp. 21-45
- Püschel, M.¹ Moura, J.M.² Singer, B.³ Xiong, J.⁴ Johnson, J.⁵ Padua, D.⁶ Veloso, M.⁷ Johnson, R.W.⁸

10
- 84870725376
- Policy-based tuning for performance portability and library co-optimization
- D. Merrill, M. Garland, and A. Grimshaw, "Policy-based tuning for performance portability and library co-optimization, " in Innovative Parallel Computing, pp. 1-10, 2012.
- (2012) Innovative Parallel Computing , pp. 1-10
- Merrill, D.¹ Garland, M.² Grimshaw, A.³

11
- 84883116448
- Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
- J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe, "Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, " ACM SIGPLAN Notices, vol. 48, no. 6, pp. 519-530, 2013.
- (2013) ACM SIGPLAN Notices , vol.48 , Issue.6 , pp. 519-530
- Ragan-Kelley, J.¹ Barnes, C.² Adams, A.³ Paris, S.⁴ Durand, F.⁵ Amarasinghe, S.⁶

12
- 0003966887
- tech. rep., DTIC Document
- G. E. Blelloch, "NESL: A nested data-parallel language (version 3.1), " tech. rep., DTIC Document, 1995.
- (1995) NESL: A Nested Data-parallel Language (version 3.1)
- Blelloch, G.E.¹

13
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan, "Sequoia: Programming the memory hierarchy, " in Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, 2006.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, ACM
- Fatahalian, K.¹ Horn, D.R.² Knight, T.J.³ Leem, L.⁴ Houston, M.⁵ Park, J.Y.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

14
- 70450227331
- PetaBricks: A language and compiler for algorithmic choice
- J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe, "PetaBricks: A language and compiler for algorithmic choice, " in Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 38-49, 2009.
- (2009) Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 38-49
- Ansel, J.¹ Chan, C.² Wong, Y.L.³ Olszewski, M.⁴ Zhao, Q.⁵ Edelman, A.⁶ Amarasinghe, S.⁷

15
- 80053955412
- Accelerating CUDA graph algorithms at maximum warp
- S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun, "Accelerating CUDA graph algorithms at maximum warp, " in ACM SIGPLAN Notices, vol. 46, pp. 267-276, 2011.
- (2011) ACM SIGPLAN Notices , vol.46 , pp. 267-276
- Hong, S.¹ Kim, S.K.² Oguntebi, T.³ Olukotun, K.⁴

16
- 85009382810
- KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism
- in press
- I. El Hajj, J. Gómez-Luna, C. Li, L.-W. Chang, D. Milojicic, and W.-M. Hwu, "KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism, " in Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016 (in press).
- (2016) Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture
- El Hajj, I.¹ Gómez-Luna, J.² Li, C.³ Chang, L.-W.⁴ Milojicic, D.⁵ Hwu, W.-M.⁶

17
- 85009366731
- NVIDIA, CUDA C best practices guide, v. 7.0
- NVIDIA, "CUDA C best practices guide v. 7.0, " 2015.
- (2015)

18
- 84975230376
- DySel: Lightweight dynamic selection for kernel-based data-parallel programming model
- ACM
- L.-W. Chang, H.-S. Kim, and W.-m. Hwu, "DySel: Lightweight dynamic selection for kernel-based data-parallel programming model, " in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 667-680, ACM, 2016.
- (2016) Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 667-680
- Chang, L.-W.¹ Kim, H.-S.² Hwu, W.-M.³

19
- 20344394051
- "The Matrix Market." http://math.nist.gov/MatrixMarket/.
- The Matrix Market

20
- 3042658703
- LLVM: A compilation framework for lifelong program analysis & transformation
- C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation, " in Code Generation and Optimization, International Symposium on, pp. 75-86, 2004.
- (2004) Code Generation and Optimization, International Symposium on , pp. 75-86
- Lattner, C.¹ Adve, V.²

21
- 84882564541
- Thrust: A productivity-oriented library for CUDA
- N. Bell and J. Hoberock, "Thrust: A productivity-oriented library for CUDA, " GPU Computing Gems Jade Edition, p. 359, 2011.
- (2011) GPU Computing Gems Jade Edition , p. 359
- Bell, N.¹ Hoberock, J.²

22
- 85009381347
- Intel Math Kernel Library
- "Intel Math Kernel Library." http://software.intel.com/en-us/articles/intel-mkl/.

23
- 84977938542
- NVIDIA, v7.0 ed., Oct.
- NVIDIA, CUBLAS Library User Guide. NVIDIA, v7.0 ed., Oct. 2015.
- (2015) CUBLAS Library User Guide

24
- 84989261328
- NVIDIA, Aug.
- NVIDIA, CUDA CUSPARSE Library, Aug. 2015.
- (2015) CUDA CUSPARSE Library

25
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing, " in Workload Characterization, 2009, IEEE International Symposium on, pp. 44-54, 2009.
- (2009) Workload Characterization 2009 IEEE International Symposium on , pp. 44-54
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

26
- 57349184047
- Fast scan algorithms on graphics processors
- Y. Dotsenko, N. K. Govindaraju, P.-P. Sloan, C. Boyd, and J. Manferdelli, "Fast scan algorithms on graphics processors, " in Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 205-213, 2008.
- (2008) Proceedings of the 22nd Annual International Conference on Supercomputing , pp. 205-213
- Dotsenko, Y.¹ Govindaraju, N.K.² Sloan, P.-P.³ Boyd, C.⁴ Manferdelli, J.⁵

27
- 84875175606
- StreamScan: Fast scan algorithms for GPUs without global barrier synchronization
- S. Yan, G. Long, and Y. Zhang, "StreamScan: Fast scan algorithms for GPUs without global barrier synchronization, " in Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 229-238, 2013.
- (2013) Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 229-238
- Yan, S.¹ Long, G.² Zhang, Y.³

28
- 84976501593
- In-place data sliding algorithms for many-core architectures
- IEEE
- J. Gómez-Luna, L.-W. Chang, I.-J. Sung, W.-M. Hwu, and N. Guil, "In-place data sliding algorithms for many-core architectures, " in Parallel Processing, 2015 44th International Conference on, pp. 210-219, IEEE, 2015.
- (2015) Parallel Processing 2015 44th International Conference on , pp. 210-219
- Gómez-Luna, J.¹ Chang, L.-W.² Sung, I.-J.³ Hwu, W.-M.⁴ Guil, N.⁵

29
- 84876904433
- Performance upper bound analysis and optimization of sgemm on Fermi and Kepler GPUs
- J. Lai and A. Seznec, "Performance upper bound analysis and optimization of sgemm on Fermi and Kepler GPUs, " in Code Generation and Optimization, 2013 IEEE/ACM International Symposium on, pp. 1-10, 2013.
- (2013) Code Generation and Optimization 2013 IEEE/ACM International Symposium on , pp. 1-10
- Lai, J.¹ Seznec, A.²

30
- 70350368872
- NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation
- N. Bell and M. Garland, "Efficient sparse matrix-vector multiplication on CUDA, " NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, 2008.
- (2008) Efficient Sparse Matrix-vector Multiplication on CUDA
- Bell, N.¹ Garland, M.²

31
- 77952273045
- The scalable heterogeneous computing (SHOC) benchmark suite
- A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (SHOC) benchmark suite, " in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 63-74, 2010.
- (2010) Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units , pp. 63-74
- Danalis, A.¹ Marin, G.² McCurdy, C.³ Meredith, J.S.⁴ Roth, P.C.⁵ Spafford, K.⁶ Tipparaju, V.⁷ Vetter, J.S.⁸

32
- 84936931250
- Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format
- IEEE
- J. L. Greathouse and M. Daga, "Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format, " in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 769-780, IEEE, 2014.
- (2014) Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 769-780
- Greathouse, J.L.¹ Daga, M.²

33
- 84939147992
- A collection-oriented programming model for performance portability
- S. Muralidharan, M. Garland, B. Catanzaro, A. Sidelnik, and M. Hall, "A collection-oriented programming model for performance portability, " in Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 263-264, 2015.
- (2015) Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 263-264
- Muralidharan, S.¹ Garland, M.² Catanzaro, B.³ Sidelnik, A.⁴ Hall, M.⁵

34
- 84957710915
- Generating performance portable code using rewrite rules: From high-level functional expressions to high-performance OpenCL code
- M. Steuwer, C. Fensch, S. Lindley, and C. Dubach, "Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code, " in Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, pp. 205-217, 2015.
- (2015) Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming , pp. 205-217
- Steuwer, M.¹ Fensch, C.² Lindley, S.³ Dubach, C.⁴

35
- 80054864401
- PEPPHER: Efficient and productive usage of hybrid computing systems
- S. Benkner, S. Pllana, J. L. Träff, P. Tsigas, U. Dolinsky, C. Augonnet, B. Bachmayer, C. Kessler, D. Moloney, and V. Osipov, "PEPPHER: Efficient and productive usage of hybrid computing systems, " IEEE Micro, vol. 31, no. 5, pp. 28-41, 2011.
- (2011) IEEE Micro , vol.31 , Issue.5 , pp. 28-41
- Benkner, S.¹ Pllana, S.² Träff, J.L.³ Tsigas, P.⁴ Dolinsky, U.⁵ Augonnet, C.⁶ Bachmayer, B.⁷ Kessler, C.⁸ Moloney, D.⁹ Osipov, V.¹⁰

36
- 84876535618
- The PEPPHER composition tool: Performance-aware dynamic composition of applications for GPU-based systems
- U. Dastgeer, L. Li, and C. Kessler, "The PEPPHER composition tool: Performance-aware dynamic composition of applications for GPU-based systems, " in High Performance Computing, Networking, Storage and Analysis, 2012 SC Companion, pp. 711-720, 2012.
- (2012) High Performance Computing, Networking, Storage and Analysis 2012 SC Companion , pp. 711-720
- Dastgeer, U.¹ Li, L.² Kessler, C.³

37
- 84937692188
- Locality-aware mapping of nested parallel patterns on GPUs
- IEEE Computer Society
- H. Lee, K. J. Brown, A. K. Sujeeth, T. Rompf, and K. Olukotun, "Locality-aware mapping of nested parallel patterns on GPUs, " in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 63-74, IEEE Computer Society, 2014.
- (2014) Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 63-74
- Lee, H.¹ Brown, K.J.² Sujeeth, A.K.³ Rompf, T.⁴ Olukotun, K.⁵

38
- 70449959487
- CHiLL: A framework for composing high-level loop transformations
- C. Chen, J. Chame, and M. Hall, "CHiLL: A framework for composing high-level loop transformations, " tech. rep., 2008.
- (2008) Tech. Rep
- Chen, C.¹ Chame, J.² Hall, M.³

39
- 44249094647
- Anatomy of high-performance matrix multiplication
- May
- K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix multiplication, " ACM Transactions on Mathematical Software, vol. 34, pp. 12:1-12:25, May 2008.
- (2008) ACM Transactions on Mathematical Software , vol.34 , pp. 12:1-12:25
- Goto, K.¹ Van De Geijn, R.A.²

40
- 85009397782
- CUB: kernel-level software reuse and library design
- D. Merrill, "CUB: kernel-level software reuse and library design, " in GPU Technology Conference Presentation, 2013.
- (2013) GPU Technology Conference Presentation
- Merrill, D.¹

41
- 84905980170
- Delite: A compiler architecture for performance-oriented embedded domain-specific languages
- A. K. Sujeeth, K. J. Brown, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun, "Delite: A compiler architecture for performance-oriented embedded domain-specific languages, " ACM Trans. Embed. Comput. Syst., vol. 13, no. 4s, pp. 134:1-134:25, 2014.
- (2014) ACM Trans. Embed. Comput. Syst , vol.13 , Issue.4s , pp. 134:1-134:25
- Sujeeth, A.K.¹ Brown, K.J.² Lee, H.³ Rompf, T.⁴ Chafi, H.⁵ Odersky, M.⁶ Olukotun, K.⁷

42
- 78650151869
- Lightweight modular staging: A pragmatic approach to runtime code generation and compiled DSLs
- T. Rompf and M. Odersky, "Lightweight modular staging: A pragmatic approach to runtime code generation and compiled DSLs, " in Proceedings of the Ninth International Conference on Generative Programming and Component Engineering, pp. 127-136, 2010.
- (2010) Proceedings of the Ninth International Conference on Generative Programming and Component Engineering , pp. 127-136
- Rompf, T.¹ Odersky, M.²

43
- 84875671819
- Portable performance on heterogeneous architectures
- ACM
- P. M. Phothilimthana, J. Ansel, J. Ragan-Kelley, and S. Amarasinghe, "Portable performance on heterogeneous architectures, " in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, vol. 48, pp. 431-444, ACM, 2013.
- (2013) Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems , vol.48 , pp. 431-444
- Phothilimthana, P.M.¹ Ansel, J.² Ragan-Kelley, J.³ Amarasinghe, S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.