SCOPUS Information Retrieval Platform

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Volume 2015-June, 2015, Pages 509-520

Efficient execution of recursive programs on commodity vector hardware

(5) Ren, Bin a; Jo, Youngjoon b; Krishnamoorthy, Sriram a; Agrawal, Kunal c; Kulkarni, Milind b

a PACIFIC NORTHWEST NATIONAL LABORATORY (United States)

b PURDUE UNIVERSITY (United States)

c WASHINGTON UNIVERSITY (United States)

Author keywords

Recursive programs; Task parallelism; Vectorization

Indexed keywords

COMPUTATIONAL EFFICIENCY; COMPUTATIONAL LINGUISTICS; COMPUTER PROGRAMMING LANGUAGES; COSINE TRANSFORMS; HARDWARE; PROGRAM PROCESSORS; VECTORS;

CODE TRANSFORMATION; COMMODITY HARDWARE; COMMODITY PROCESSORS; RECURSIVE PROGRAMS; RESOURCE UTILIZATIONS; SCHEDULING POLICIES; TASK PARALLELISM; VECTORIZATION;

VECTOR SPACES;

EID: 84951798257 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2737924.2738004 Document Type: Conference Paper

Times cited: 15

References (37)

1
- 78651324376
- Understanding the efficiency of ray traversal on GPUs
- T. Aila and S. Laine. Understanding the Efficiency of Ray Traversal on GPUs. In HPG'09, pages 145-149, 2009.
- (2009) HPG'09 , pp. 145-149
- Aila, T.¹ Laine, S.²

2
- 77955398216
- Barcelona OpenMP Task Suite (BOTS). Barcelona OpenMP Task Suite (BOTS). https://pm.bsc.es/projects/bots.
- Barcelona OpenMP Task Suite (BOTS)
- Barcelona OpenMP Task Suite (BOTS)¹

3
- 84875205533
- From relational verification to SIMD loop synthesis
- G. Barthe, J. M. Crespo, S. Gulwani, C. Kunz, and M. Marron. From Relational Verification to SIMD Loop Synthesis. In PPoPP'13, pages 123-134, 2013.
- (2013) PPoPP'13 , pp. 123-134
- Barthe, G.¹ Crespo, J.M.² Gulwani, S.³ Kunz, C.⁴ Marron, M.⁵

4
- 0029191296
- Cilk: An efficient multithreaded runtime system
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In PPOPP'95, pages 207-216, 1995.
- (1995) PPOPP'95 , pp. 207-216
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

5
- 84877715459
- Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
- J. Chhugani, C. Kim, H. Shukla, J. Park, P. Dubey, J. Shalf, and H. D. Simon. Billion-particle SIMD-friendly Two-point Correlation on Large-scale HPC Cluster Systems. In SC'12, pages 1:1-1:11, 2012.
- (2012) SC'12 , pp. 1:1-1:11
- Chhugani, J.¹ Kim, C.² Shukla, H.³ Park, J.⁴ Dubey, P.⁵ Shalf, J.⁶ Simon, H.D.⁷

6
- 84951776710
- Cilk. Cilk. http://supertech.csail.mit.edu/cilk/.
- Cilk
- Cilk¹

7
- 51549087961
- Shallow bounding volume hierarchies for fast SIMD ray tracing of incoherent rays
- H. Dammertz, J. Hanika, and A. Keller. Shallow Bounding Volume Hierarchies for Fast SIMD Ray Tracing of Incoherent Rays. In EGSR'08, pages 1225-1233, 2008.
- (2008) EGSR'08 , pp. 1225-1233
- Dammertz, H.¹ Hanika, J.² Keller, A.³

8
- 33749253908
- Programming with Exceptions in JCilk
- Dec.
- J. S. Danaher, I.-T. A. Lee, and C. E. Leiserson. Programming with Exceptions in JCilk. Sci. Comput. Program., 63(2):147-171, Dec. 2006.
- (2006) Sci. Comput. Program. , vol.63 , Issue.2 , pp. 147-171
- Danaher, J.S.¹ Lee, I.-T.A.² Leiserson, C.E.³

9
- 0000011164
- A fast computer method for matrix transposing
- July
- J. O. Eklundh. A Fast Computer Method for Matrix Transposing. IEEE Trans. Comput., 21(7):801-803, July 1972.
- (1972) IEEE Trans. Comput. , vol.21 , Issue.7 , pp. 801-803
- Eklundh, J.O.¹

10
- 0347507496
- The implementation of the Cilk-5 multithreaded language
- M. Frigo, C. E. Leiserson, and K. H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In PLDI'98, pages 212-223, 1998.
- (1998) PLDI'98 , pp. 212-223
- Frigo, M.¹ Leiserson, C.E.² Randall, K.H.³

11
- 70449631676
- Reducers and other Cilk++ hyperobjects
- M. Frigo, P. Halpern, C. E. Leiserson, and S. Lewin-Berlin. Reducers and Other Cilk++ Hyperobjects. In SPAA'09, pages 79-90, 2009.
- (2009) SPAA'09 , pp. 79-90
- Frigo, M.¹ Halpern, P.² Leiserson, C.E.³ Lewin-Berlin, S.⁴

12
- 84865327496
- Can GPGPU programming be liberated from the data-parallel bottleneck?
- August
- B. Gaster and L. Howes. Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck? Computer, 45(8):42-52, August 2012.
- (2012) Computer , vol.45 , Issue.8 , pp. 42-52
- Gaster, B.¹ Howes, L.²

13
- 70450029262
- Work-first and help-first scheduling policies for async-finish task parallelism
- Y. Guo, R. Barik, R. Raman, and V. Sarkar. Work-first and Help-first Scheduling Policies for Async-finish Task Parallelism. In IPDPS'09, pages 1-12, 2009.
- (2009) IPDPS'09 , pp. 1-12
- Guo, Y.¹ Barik, R.² Raman, R.³ Sarkar, V.⁴

14
- 77949653560
- Yet faster ray-triangle intersection (using SSE4)
- May
- J. Havel and A. Herout. Yet Faster Ray-Triangle Intersection (Using SSE4). IEEE Transactions on Visualization and Computer Graphics, 16(3):434-438, May 2010.
- (2010) IEEE Transactions on Visualization and Computer Graphics , vol.16 , Issue.3 , pp. 434-438
- Havel, J.¹ Herout, A.²

15
- 0141427127
- Vectorization of tree traversals
- Mar.
- L. Hernquist. Vectorization of Tree Traversals. J. Comput. Phys., 87(1):137-147, Mar. 1990.
- (1990) J. Comput. Phys. , vol.87 , Issue.1 , pp. 137-147
- Hernquist, L.¹

16
- 84976759390
- Graphinators and the Duality of SIMD and MIMD
- P. Hudak and E. Mohr. Graphinators and the Duality of SIMD and MIMD. In LFP'88, pages 224-234, 1988.
- (1988) LFP'88 , pp. 224-234
- Hudak, P.¹ Mohr, E.²

17
- 84879836252
- Efficient scheduling of recursive control flow on GPUs
- X. Huo, S. Krishnamoorthy, and G. Agrawal. Efficient Scheduling of Recursive Control Flow on GPUs. In ICS'13, pages 409-420, 2013.
- (2013) ICS'13 , pp. 409-420
- Huo, X.¹ Krishnamoorthy, S.² Agrawal, G.³

18
- 84858310773
- Enhancing locality for recursive traversals of recursive structures
- Y. Jo and M. Kulkarni. Enhancing Locality for Recursive Traversals of Recursive Structures. In OOPSLA'11, pages 463-482, 2011.
- (2011) OOPSLA'11 , pp. 463-482
- Jo, Y.¹ Kulkarni, M.²

19
- 84887467173
- Automatic vectorization of tree traversals
- Y. Jo, M. Goldfarb, and M. Kulkarni. Automatic Vectorization of Tree Traversals. In PACT'13, pages 363-374, 2013.
- (2013) PACT'13 , pp. 363-374
- Jo, Y.¹ Goldfarb, M.² Kulkarni, M.³

20
- 77954701719
- FAST: Fast architecture sensitive tree search on modern CPUs and GPUs
- C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. D. Nguyen, T. Kaldewey, V. W. Lee, S. A. Brandt, and P. Dubey. FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs. In SIGMOD'10, pages 339-350, 2010.
- (2010) SIGMOD'10 , pp. 339-350
- Kim, C.¹ Chhugani, J.² Satish, N.³ Sedlar, E.⁴ Nguyen, A.D.⁵ Kaldewey, T.⁶ Lee, V.W.⁷ Brandt, S.A.⁸ Dubey, P.⁹

21
- 84878542156
- Efficient SIMD code generation for irregular kernels
- S. Kim and H. Han. Efficient SIMD Code Generation for Irregular Kernels. In PPoPP'12, pages 55-64, 2012.
- (2012) PPoPP'12 , pp. 55-64
- Kim, S.¹ Han, H.²

22
- 34547358180
- Efficient parallel out-of-core matrix transposition
- S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C.-C. Lam, and P. Sadayappan. Efficient Parallel Out-of-core Matrix Transposition. International Journal of High Performance Computing and Networking, 2(2):110-119, 2004.
- (2004) International Journal of High Performance Computing and Networking , vol.2 , Issue.2 , pp. 110-119
- Krishnamoorthy, S.¹ Baumgartner, G.² Cociorva, D.³ Lam, C.-C.⁴ Sadayappan, P.⁵

23
- 84863012838
- An evaluation of vectorizing compilers
- S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua. An Evaluation of Vectorizing Compilers. In PACT'11, pages 372-382, 2011.
- (2011) PACT'11 , pp. 372-382
- Maleki, S.¹ Gao, Y.² Garzarán, M.J.³ Wong, T.⁴ Padua, D.A.⁵

24
- 84897807567
- Data-parallel Finite-state Machines
- T. Mytkowicz, M. Musuvathi, and W. Schulte. Data-parallel Finite-state Machines. In ASPLOS'14, pages 529-542, 2014.
- (2014) ASPLOS'14 , pp. 529-542
- Mytkowicz, T.¹ Musuvathi, M.² Schulte, W.³

25
- 63549093768
- Outer-loop Vectorization: Revisited for Short SIMD Architectures
- D. Nuzman and A. Zaks. Outer-loop Vectorization: Revisited for Short SIMD Architectures. In PACT'08, pages 2-11, 2008.
- (2008) PACT'08 , pp. 2-11
- Nuzman, D.¹ Zaks, A.²

26
- 84922773010
- NVIDIA. CUDA. http://www.nvidia.com/object/cuda-home-new.html.
- CUDA
- NVIDIA¹

27
- 38149069665
- UTS: An unbalanced tree search benchmark
- S. Olivier, J. Huan, J. Liu, J. Prins, J. Dinan, P. Sadayappan, and C.-W. Tseng. UTS: An Unbalanced Tree Search Benchmark. In LCPC'06, pages 235-250, 2007.
- (2007) LCPC'06 , pp. 235-250
- Olivier, S.¹ Huan, J.² Liu, J.³ Prins, J.⁴ Dinan, J.⁵ Sadayappan, P.⁶ Tseng, C.-W.⁷

28
- 84951139900
- May
- OpenMP Architecture Review Board. OpenMP Specification and Features. http://openmp.org/wp/, May 2008.
- (2008) OpenMP Specification and Features
- OpenMP Architecture Review Board¹

29
- 84905454859
- Fine-grain task aggregation and coordination on GPUs
- M. S. Orr, B. M. Beckmann, S. K. Reinhardt, and D. A. Wood. Fine-grain Task Aggregation and Coordination on GPUs. In ISCA'14, pages 181-192, 2014.
- (2014) ISCA'14 , pp. 181-192
- Orr, M.S.¹ Beckmann, B.M.² Reinhardt, S.K.³ Wood, D.A.⁴

30
- 19344368072
- SPIRAL: Code generation for DSP transforms
- M. Puschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code Generation for DSP Transforms. Proceedings of the IEEE, 93(2):232-275, 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 232-275
- Puschel, M.¹ Moura, J.M.² Johnson, J.R.³ Padua, D.⁴ Veloso, M.M.⁵ Singer, B.W.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gacic, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

31
- 43149087461
- O'Reilly
- J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O'Reilly, 2007.
- (2007) Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism
- Reinders, J.¹

32
- 84876909157
- SIMD parallelization of applications that traverse irregular data structures
- B. Ren, G. Agrawal, J. R. Larus, T. Mytkowicz, T. Poutanen, and W. Schulte. SIMD Parallelization of Applications that Traverse Irregular Data Structures. In CGO'13, pages 1-10, 2013.
- (2013) CGO'13 , pp. 1-10
- Ren, B.¹ Agrawal, G.² Larus, J.R.³ Mytkowicz, T.⁴ Poutanen, T.⁵ Schulte, W.⁶

33
- 79951700098
- Improving SIMT efficiency of global rendering algorithms with architectural support for dynamic micro-kernels
- M. Steffen and J. Zambreno. Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels. In MICRO'43, pages 237-248, 2010.
- (2010) MICRO'43 , pp. 237-248
- Steffen, M.¹ Zambreno, J.²

34
- 77952162137
- OpenCL: A parallel programming standard for heterogeneous computing systems
- May
- J. E. Stone, D. Gohara, and G. Shi. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. IEEE Des. Test, 12(3):66-73, May 2010.
- (2010) IEEE Des. Test , vol.12 , Issue.3 , pp. 66-73
- Stone, J.E.¹ Gohara, D.² Shi, G.³

35
- 84951008030
- Oct.
- TPL. The Task Parallel Library. http://msdn.microsoft.com/en-us/magazine/cc163340.aspx, Oct. 2007.
- (2007) The Task Parallel Library
- TPL¹

36
- 84934313374
- Task management for irregular-parallel workloads on the GPU
- S. Tzeng, A. Patney, and J. D. Owens. Task Management for Irregular-parallel Workloads on the GPU. In HPG'10, pages 29-37, 2010.
- (2010) HPG'10 , pp. 29-37
- Tzeng, S.¹ Patney, A.² Owens, J.D.³

37
- 84867430990
- Mar.
- X10. The X10 Programming Language. www.research.ibm.com/x10/, Mar. 2006.
- (2006) The X10 Programming Language
- X10¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.