SCOPUS Information Retrieval Platform

International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Volume , Issue , 2011, Pages 381-392

Sponge: Portable stream programming on graphics engines

(5) Hormati, Amir (a); Samadi, Mehrzad (a); Woh, Mark (a); Mudge, Trevor (a); Mahlke, Scott (a)

(a) University of Michigan (United States)

Author keywords

Compiler; GPU; Optimization; Portability; Streaming

Indexed keywords

COMPILER; COMPILER OPTIMIZATIONS; GPU; GPU PROGRAMMING; GRAPHICS ENGINE; GRAPHICS PROCESSING UNITS; HARDWARE DIFFERENCES; HIGH PERFORMANCE COMPUTATION; LOW COSTS; MEMORY HIERARCHY; NON-TRIVIAL TASKS; PERFORMANCE OPTIMIZATIONS; PORTABILITY; PROGRAMMING LANGUAGE; SOFTWARE PARADIGM; STREAM PROGRAMMING; STREAMING; SYNCHRONOUS DATA FLOW; THREADING MODEL; TIME-CONSUMING TASKS; WRITE ONCE;

ALGORITHMS; OPTIMIZATION; PROGRAM COMPILERS;

COMPUTER GRAPHICS EQUIPMENT;

EID: 79953071805 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1950365.1950409 Document Type: Conference Paper

Times cited : (74)

References (30)

1
- 10644248153
- Brook for GPUs: Stream computing on graphics hardware
- I. Buck et al. Brook for GPUs: Stream computing on graphics hardware. ACM Transactions on Graphics, 23(3):777-786, Aug. 2004.
- (2004) ACM Transactions on Graphics , vol.23 , Issue.3 , pp. 777-786
- Buck, I.¹

2
- 77952565941
- Weak execution ordering - Exploiting iterative methods on many-core gpus
- J. Chen, Z. Huang, F. Su, J.-K. Peir, J. Ho, and L. Peng. Weak execution ordering - exploiting iterative methods on many-core gpus. In Proc. of the 2010 IEEE Symposium on Performance Analysis of Systems and Software, pages 154-163, 2010.
- (2010) Proc. of the 2010 IEEE Symposium on Performance Analysis of Systems and Software , pp. 154-163
- Chen, J.¹ Huang, Z.² Su, F.³ Peir, J.-K.⁴ Ho, J.⁵ Peng, L.⁶

3
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 83, 2006.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 83
- Fatahalian, K.¹ Horn, D.R.² Knight, T.J.³ Leem, L.⁴ Houston, M.⁵ Park, J.Y.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

4
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In Proc. of the 40th Annual International Symposium on Microarchitecture, pages 407-420, 2007.
- (2007) Proc. of the 40th Annual International Symposium on Microarchitecture , pp. 407-420
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

5
- 34547423880
- Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
- M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 151-162, 2006.
- (2006) 14th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 151-162
- Gordon, M.I.¹ Thies, W.² Amarasinghe, S.³

6
- 0036959649
- A stream compiler for communication-exposed architectures
- DOI 10.1145/635508.605428
- M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A stream compiler for communication-exposed architectures. In Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 291-303, Oct. 2002. (Pubitemid 44892241)
- (2002) Operating Systems Review (ACM) , vol.36 , Issue.5 , pp. 291-303
- Gordon, M.I.¹ Thies, W.² Karczmarek, M.³ Lin, J.⁴ Meli, A.S.⁵ Lamb, A.A.⁶ Leger, C.⁷ Wong, J.⁸ Hoffmann, H.⁹ Maze, D.¹⁰ Amarasinghe, S.¹¹

7
- 0026255448
- Compile-time scheduling and assignment of data-flow program graphs with data-dependent iteration
- S. Ha and E. A. Lee. Compile-time scheduling and assignment of data-flow program graphs with data-dependent iteration. IEEE Transactions on Computers, 40(11):1225-1238, 1991.
- (1991) IEEE Transactions on Computers , vol.40 , Issue.11 , pp. 1225-1238
- Ha, S.¹ Lee, E.A.²

8
- 79952031801
- Hicuda: High-level gpgpu programming
- T. Han and T. Abdelrahman. hicuda: High-level gpgpu programming. IEEE Transactions on Parallel and Distributed Systems, (99):1-1, 2010.
- (2010) IEEE Transactions on Parallel and Distributed Systems , vol.99 , pp. 1-1
- Han, T.¹ Abdelrahman, T.²

9
- 70450231944
- An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim. An analytical model for a gpu architecture with memory-level and thread-level parallelism awareness. In Proc. of the 36th Annual International Symposium on Computer Architecture, pages 152-163, 2009.
- (2009) Proc. of the 36th Annual International Symposium on Computer Architecture , pp. 152-163
- Hong, S.¹ Kim, H.²

10
- 63349092007
- Optimus: Efficient realization of streaming applications on FPGAs
- A. Hormati, M. Kudlur, D. Bacon, S. Mahlke, and R. Rabbah. Optimus: Efficient realization of streaming applications on FPGAs. In Proc. of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 41-50, Oct. 2008.
- (2008) Proc. of the 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems , pp. 41-50
- Hormati, A.¹ Kudlur, M.² Bacon, D.³ Mahlke, S.⁴ Rabbah, R.⁵

11
- 70449669477
- Flextream: Adaptive compilation of streaming applications for heterogeneous architectures
- A. H. Hormati, Y. Choi, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In Proc. of the 18th International Conference on Parallel Architectures and Compilation Techniques, pages 214-223, 2009.
- (2009) Proc. of the 18th International Conference on Parallel Architectures and Compilation Techniques , pp. 214-223
- Hormati, A.H.¹ Choi, Y.² Kudlur, M.³ Rabbah, R.⁴ Mudge, T.⁵ Mahlke, S.⁶

12
- 77952252026
- Macross: Macro-simdization of streaming applications
- A. H. Hormati, Y. Choi, M. Woh, M. Kudlur, T. Mudge, and S. Mahlke. Macross: Macro-simdization of streaming applications. In 18th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 285-296, 2010.
- (2010) 18th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 285-296
- Hormati, A.H.¹ Choi, Y.² Woh, M.³ Kudlur, M.⁴ Mudge, T.⁵ Mahlke, S.⁶

13
- 74349092397
- KHRONOS Group
- KHRONOS Group. OpenCL - the open standard for parallel programming of heterogeneous systems, 2010.
- (2010) OpenCL - The Open Standard for Parallel Programming of Heterogeneous Systems

14
- 57349172999
- Orchestrating the execution of stream programs on multicore platforms
- M. Kudlur and S. Mahlke. Orchestrating the execution of stream programs on multicore platforms. In Proc. of the '08 Conference on Programming Language Design and Implementation, pages 114-124, June 2008.
- (2008) Proc. of the '08 Conference on Programming Language Design and Implementation , pp. 114-124
- Kudlur, M.¹ Mahlke, S.²

15
- 84939698077
- Synchronous data flow
- E. Lee and D. Messerschmitt. Synchronous data flow. Proceedings of the IEEE, 75(9):1235-1245, 1987.
- (1987) Proceedings of the IEEE , vol.75 , Issue.9 , pp. 1235-1245
- Lee, E.¹ Messerschmitt, D.²

16
- 67650081010
- Openmp to gpgpu: A compiler framework for automatic translation and optimization
- S. Lee, S.-J. Min, and R. Eigenmann. Openmp to gpgpu: a compiler framework for automatic translation and optimization. In Proc. of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 101-110, 2009.
- (2009) Proc. of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 101-110
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

17
- 77954995885
- Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
- V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In Proc. of the 37th Annual International Symposium on Computer Architecture, pages 451-460, 2010.
- (2010) Proc. of the 37th Annual International Symposium on Computer Architecture , pp. 451-460
- Lee, V.W.¹ Kim, C.² Chhugani, J.³ Deisher, M.⁴ Kim, D.⁵ Nguyen, A.D.⁶ Satish, N.⁷ Smelyanskiy, M.⁸ Chennupaty, S.⁹ Hammarlund, P.¹⁰ Singhal, R.¹¹ Dubey, P.¹²

18
- 77953983400
- Cg: A system for programming graphics hardware in a C-like language
- th International Conference on Computer Graphics and Interactive Techniques, pages 893-907, July 2003.
- (2003) th International Conference on Computer Graphics and Interactive Techniques , pp. 893-907
- Mark, W.¹ Glanville, R.² Akeley, K.³ Kilgard, J.⁴

19
- 34547309668
- NVIDIA. CUDA Programming Guide, June 2007. http://developer.download.nvidia.com/compute/cuda.
- (2007) CUDA Programming Guide

20
- 77951900491
- NVIDIA. Fermi: Nvidias next generation cuda compute architecture, 2009. http://www.nvidia.com/content/PDF/fermi-whitepapers/NVIDIA-Fermi-Compute-Architecture-Whitepaper.pdf.
- (2009) Fermi: Nvidias Next Generation Cuda Compute Architecture

21
- 79953108478
- NVIDIA. Gpus are only up to 14 times faster than cpus says intel, 2010. http://blogs.nvidia.com/ntersect/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel.html.
- (2010) Gpus Are only Up to 14 Times Faster Than Cpus Says Intel

22
- 33846545187
- A hierarchical multiprocessor scheduling framework for synchronous dataflow graphs
- J. L. Pino, S. S. Bhattacharyya, and E. A. Lee. A hierarchical multiprocessor scheduling framework for synchronous dataflow graphs. Technical Report UCB/ERL M95/36, University of California, Berkeley, May 1995.
- (1995) Technical Report UCB/ERL M95/36
- Pino, J.L.¹ Bhattacharyya, S.S.² Lee, E.A.³

23
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded gpu using cuda
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-M. W. Hwu. Optimization principles and application performance evaluation of a multithreaded gpu using cuda. In Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
- (2008) Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.-M.W.⁶

24
- 58449109179
- Mcuda: An efficient implementation of cuda kernels for multi-core cpus
- J. A. Stratton, S. S. Stone, and W.-M. W. Hwu. Mcuda: An efficient implementation of cuda kernels for multi-core cpus. In Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 16-30, 2008.
- (2008) Proc. of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 16-30
- Stratton, J.A.¹ Stone, S.S.² Hwu, W.-M.W.³

25
- 78149262760
- An empirical characterization of stream programs and its implications for language and compiler design
- W. Thies and S. Amarasinghe. An empirical characterization of stream programs and its implications for language and compiler design. In Proc. of the 19th International Conference on Parallel Architectures and Compilation Techniques, page To Appear, 2010.
- (2010) Proc. of the 19th International Conference on Parallel Architectures and Compilation Techniques
- Thies, W.¹ Amarasinghe, S.²

26
- 84959045524
- StreamIt: A language for streaming applications
- W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: A language for streaming applications. In Proc. of the 2002 International Conference on Compiler Construction, pages 179-196, 2002.
- (2002) Proc. of the 2002 International Conference on Compiler Construction , pp. 179-196
- Thies, W.¹ Karczmarek, M.² Amarasinghe, S.P.³

27
- 67650563116
- Software pipelined execution of stream programs on gpus
- A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil. Software pipelined execution of stream programs on gpus. In Proc. of the 2009 International Symposium on Code Generation and Optimization, pages 200-209, 2009.
- (2009) Proc. of the 2009 International Symposium on Code Generation and Optimization , pp. 200-209
- Udupa, A.¹ Govindarajan, R.² Thazhuthaveetil, M.J.³

28
- 57349101237
- Data and computation transformations for brook streaming applications on multiprocessors
- S.-W. Liao, Z. Du, G. Wu, and G.-Y. Lueh. Data and computation transformations for brook streaming applications on multiprocessors. Proc. of the 2006 International Symposium on Code Generation and Optimization, 0(1):196-207, 2006.
- (2006) Proc. of the 2006 International Symposium on Code Generation and Optimization , Issue.1 , pp. 196-207
- Liao, S.W.¹ Du, Z.² Wu, G.³ Lueh, G.-Y.⁴

29
- 77954691442
- A gpgpu compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou. A gpgpu compiler for memory optimization and parallelism management. In Proc. of the '10 Conference on Programming Language Design and Implementation, pages 86-97, 2010.
- (2010) Proc. of the '10 Conference on Programming Language Design and Implementation , pp. 86-97
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

30
- 58449127539
- Cuda-lite: Reducing gpu programming complexity
- S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-M. W. Hwu. Cuda-lite: Reducing gpu programming complexity. In Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing, pages 1-15, 2008.
- (2008) Proc. of the 21st Workshop on Languages and Compilers for Parallel Computing , pp. 1-15
- Ueng, S.Z.¹ Lathara, M.² Baghsorkhi, S.S.³ Hwu, W.-M.W.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.