SCOPUS Information Search Platform

Proceedings of the International Conference on Supercomputing

2013, Pages 409-420

Efficient scheduling of recursive control flow on GPUs

(3) Huo, Xin (a); Krishnamoorthy, Sriram (b); Agrawal, Gagan (a)

(a) The Ohio State University (United States)

(b) Pacific Northwest National Laboratory (United States)

Author keywords

GPU; reconvergence methods; recursion; SIMD

Indexed keywords

EFFICIENT SCHEDULING; GPU; GRAPHICS PROCESSING UNITS; HIGH PERFORMANCE COMPUTING; RE CONVERGENCES; RECURSIONS; SIMD; SINGLE INSTRUCTION MULTIPLE THREADS (SIMT); COMPUTER GRAPHICS; INTELLIGENT CONTROL; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS

EID: 84879836252 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2464996.2479870 Document Type: Conference Paper

Times cited : (10)

References (30)

1
- 77956773183
- Extending openmp to survive the heterogeneous multi-core era
- E. Ayguadé, R. M. Badia, P. Bellens, D. Cabrera, A. Duran, R. Ferrer, M. González, F. D. Igual, D. Jiménez-González, and J. Labarta. Extending openmp to survive the heterogeneous multi-core era. International Journal of Parallel Programming, 38(5-6):440-459, 2010.
- (2010) International Journal of Parallel Programming , vol.38 , Issue.5-6 , pp. 440-459
- Ayguadé, E.¹ Badia, R.M.² Bellens, P.³ Cabrera, D.⁴ Duran, A.⁵ Ferrer, R.⁶ González, M.⁷ Igual, F.D.⁸ Jiménez-González, D.⁹ Labarta, J.¹⁰

2
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '09), pages 163-174, 2009.
- (2009) IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '09) , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

3
- 0015330108
- The Illiac IV system
- W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, and D. L. Slotnick. The Illiac IV system. Proc. of IEEE, 60(4):369-388, 1972.
- (1972) Proc. of IEEE , vol.60 , Issue.4 , pp. 369-388
- Bouknight, W.J.¹ Denenberg, S.A.² McIntyre, D.E.³ Randall, J.M.⁴ Sameh, A.H.⁵ Slotnick, D.L.⁶

4
- 84864834311
- Simultaneous branch and warp interweaving for sustained GPU performance
- N. Brunie, S. Collange, and G. F. Diamos. Simultaneous branch and warp interweaving for sustained GPU performance. In International Symposium on Computer Architecture (ISCA '12), pages 49-60, 2012.
- (2012) International Symposium on Computer Architecture (ISCA '12) , pp. 49-60
- Brunie, N.¹ Collange, S.² Diamos, G.F.³

5
- 84879815667
- Dynamic task parallelism with a gpu work-stealing runtime system
- S. Chatterjee, M. Grossman, A. S. Sbîrlea, and V. Sarkar. Dynamic task parallelism with a gpu work-stealing runtime system. In International Workshop on Languages and Compilers for Parallel Computing (LCPC'11), pages 203-217, 2011.
- (2011) International Workshop on Languages and Compilers for Parallel Computing (LCPC'11) , pp. 203-217
- Chatterjee, S.¹ Grossman, M.² Sbîrlea, A.S.³ Sarkar, V.⁴

6
- 84863351470
- SIMD re-convergence at thread frontiers
- G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu, and S. Yalamanchili. SIMD re-convergence at thread frontiers. In IEEE/ACM International Symposium on Microarchitecture (MICRO '11), pages 477-488, 2011.
- (2011) IEEE/ACM International Symposium on Microarchitecture (MICRO '11) , pp. 477-488
- Diamos, G.¹ Ashbaugh, B.² Maiyuran, S.³ Kerr, A.⁴ Wu, H.⁵ Yalamanchili, S.⁶

7
- 77951455429
- Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP
- A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In International Conference on Parallel Processing (ICPP '09), pages 124-131, 2009.
- (2009) International Conference on Parallel Processing (ICPP '09) , pp. 124-131
- Duran, A.¹ Teruel, X.² Ferrer, R.³ Martorell, X.⁴ Ayguade, E.⁵

8
- 84879816910
- Dwarf mine. http://view.eecs.berkeley.edu/wiki/Dwarf\-Mine.
- Dwarf Mine

9
- 34249839187
- The area of the mandelbrot set
- J. Ewing and G. Schober. The area of the mandelbrot set. Numerische Mathematik, 61(1):59-72, 1992.
- (1992) Numerische Mathematik , vol.61 , Issue.1 , pp. 59-72
- Ewing, J.¹ Schober, G.²

10
- 84966570185
- A SIMD vectorizing compiler for digital signal processing algorithms
- F. Franchetti and M. Puschel. A SIMD vectorizing compiler for digital signal processing algorithms. In International Parallel and Distributed Processing Symposium (IPDPS '02), pages 20-26, 2002.
- (2002) International Parallel and Distributed Processing Symposium (IPDPS '02) , pp. 20-26
- Franchetti, F.¹ Puschel, M.²

11
- 79955923056
- Thread block compaction for efficient SIMT control flow
- W. W. L. Fung and T. M. Aamodt. Thread block compaction for efficient SIMT control flow. In IEEE International Symposium on High Performance Computer Architecture (HPCA '11), pages 25-36, 2011.
- (2011) IEEE International Symposium on High Performance Computer Architecture (HPCA '11) , pp. 25-36
- Fung, W.W.L.¹ Aamodt, T.M.²

12
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In IEEE/ACM International Symposium on Microarchitecture (MICRO '07), pages 407-420, 2007.
- (2007) IEEE/ACM International Symposium on Microarchitecture (MICRO '07) , pp. 407-420
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

13
- 84947255974
- Vectorization of multigrid codes using SIMD ISA extensions
- C. Garcia, R. Lario, M. Prieto, L. Pinuel, and F. Tirado. Vectorization of multigrid codes using SIMD ISA extensions. In International Parallel and Distributed Processing Symposium (IPDPS '03), pages 8-pp, 2003.
- (2003) International Parallel and Distributed Processing Symposium (IPDPS '03) , pp. 8
- Garcia, C.¹ Lario, R.² Prieto, M.³ Pinuel, L.⁴ Tirado, F.⁵

14
- 84892549898
- GPGPU-Sim 3.x manual. http://gpgpu-sim.org/manual/index.php5/GPGPU-Sim-3.x-Manual#Introduction.
- GPGPU-Sim 3.X Manual

15
- 0034228634
- 2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing
- N. Ide, M. Hirano, Y. Endo, S. Yoshioka, H. Murakami, A. Kunimatsu, T. Sato, T. Kamei, T. Okada, and M. Suzuoki. 2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing. IEEE Journal of Solid-State Circuits, 35(7):1025-1033, Jul 2000.
- (2000) IEEE Journal of Solid-State Circuits , vol.35 , Issue.7 , pp. 1025-1033
- Ide, N.¹ Hirano, M.² Endo, Y.³ Yoshioka, S.⁴ Murakami, H.⁵ Kunimatsu, A.⁶ Sato, T.⁷ Kamei, T.⁸ Okada, T.⁹ Suzuoki, M.¹⁰

16
- 4544235747
- Graph coloring algorithms
- W. Klotz. Graph coloring algorithms. Mathematics Report, pages 1-9, 2002.
- (2002) Mathematics Report , pp. 1-9
- Klotz, W.¹

17
- 0004671788
- Textbook examples of recursion
- D. E. Knuth. Textbook examples of recursion. Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, Academic Press, San Diego, CA, pages 207-229, 1991.
- (1991) Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy , pp. 207-229
- Knuth, D.E.¹

18
- 0003657590
- The art of computer programming, volume 1: (3rd ed.) fundamental algorithms
- D. E. Knuth. The art of computer programming, volume 1: (3rd ed.) fundamental algorithms. Addison Wesley Longman Publishing Co., Inc., 1997.
- (1997) The Art of Computer Programming , vol.1
- Knuth, D.E.¹

19
- 0021458622
- Chap - A SIMD graphics processor
- A. Levinthal and T. Porter. Chap - a SIMD graphics processor. In Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '84), pages 77-82, 1984.
- (1984) Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '84) , pp. 77-82
- Levinthal, A.¹ Porter, T.²

20
- 77955001720
- Method for conditional branch execution in simd vector processors
- R. A. Lorie and H. R. Strong Jr. Method for conditional branch execution in simd vector processors, Mar. 6 1984. US Patent 4,435,758.
- (1984)
- Lorie, R.A.¹ Strong Jr., H.R.²

21
- 77954976292
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- J. Meng, D. Tarjan, and K. Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In International Symposium on Computer architecture (ISCA '10), pages 235-246, 2010.
- (2010) International Symposium on Computer Architecture (ISCA '10) , pp. 235-246
- Meng, J.¹ Tarjan, D.² Skadron, K.³

22
- 84863342255
- Improving gpu performance via large warps and two-level warp scheduling
- V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt. Improving gpu performance via large warps and two-level warp scheduling. In IEEE/ACM International Symposium on Microarchitecture (MICRO '11), pages 308-317, 2011.
- (2011) IEEE/ACM International Symposium on Microarchitecture (MICRO '11) , pp. 308-317
- Narasiman, V.¹ Shebanow, M.² Lee, C.J.³ Miftakhutdinov, R.⁴ Mutlu, O.⁵ Patt, Y.N.⁶

23
- 84879820077
- Cuda dynamic parallelism programming guide
- NVIDIA. Cuda dynamic parallelism programming guide. August 2012.
- (2012) Cuda Dynamic Parallelism Programming Guide

24
- 84880298026
- The dual-path execution model for efficient GPU control flow
- M. Rhu and M. Erez. The dual-path execution model for efficient GPU control flow. In IEEE International Symposium on High Performance Computer Architecture (HPCA '13), 2013.
- (2013) IEEE International Symposium on High Performance Computer Architecture (HPCA '13)
- Rhu, M.¹ Erez, M.²

25
- 0033743209
- Six-fold speed-up of smith-waterman sequence database searches using parallel processing on common microprocessors
- T. Rognes and E. Seeberg. Six-fold speed-up of smith-waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 16(8):699-706, 2000.
- (2000) Bioinformatics , vol.16 , Issue.8 , pp. 699-706
- Rognes, T.¹ Seeberg, E.²

26
- 84870721374
- Top 500 supercomputers. http://www.top500.org/lists/2012/11/.
- Top 500 Supercomputers

27
- 30744459395
- RPU: A programmable ray processing unit for realtime ray tracing
- S. Woop, J. Schmittler, and P. Slusallek. RPU: a programmable ray processing unit for realtime ray tracing. In ACM SIGGRAPH 2005 Papers, SIGGRAPH '05, pages 434-444, 2005.
- (2005) ACM SIGGRAPH 2005 Papers, SIGGRAPH '05 , pp. 434-444
- Woop, S.¹ Schmittler, J.² Slusallek, P.³

28
- 70350600765
- Stack-based parallel recursion on graphics processors
- K. Yang, B. He, Q. Luo, P. V. Sander, and J. Shi. Stack-based parallel recursion on graphics processors. In ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '09, pages 299-300, 2009.
- (2009) ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09 , pp. 299-300
- Yang, K.¹ He, B.² Luo, Q.³ Sander, P.V.⁴ Shi, J.⁵

29
- 84860322837
- CPU-assisted GPGPU on fused CPU-GPU architectures
- Y. Yang, P. Xiang, M. Mantor, and H. Zhou. CPU-assisted GPGPU on fused CPU-GPU architectures. In IEEE International Symposium on High Performance Computer Architecture (HPCA '12), pages 1-12. IEEE, 2012.
- (2012) IEEE International Symposium on High Performance Computer Architecture (HPCA '12) , pp. 1-12
- Yang, Y.¹ Xiang, P.² Mantor, M.³ Zhou, H.⁴

30
- 0029182293
- Translation of serial recursive codes to parallel SIMD codes
- A. Youssef. Translation of serial recursive codes to parallel SIMD codes. In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, pages 254-263, 1995.
- (1995) IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques , pp. 254-263
- Youssef, A.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.