SCOPUS Information Retrieval Platform

ACM SIGPLAN Notices

Volume 47, Issue 9, 2012, Pages 247-258

Nested data-parallelism on the GPU

Authors (2): Bergstrom, Lars (a); Reppy, John (a)

(a) University of Chicago (United States)

Author keywords

GPGPU; GPU; NESL; Nested data parallelism

Indexed keywords

ARITHMETIC PERFORMANCE; DATA PARALLELISM; DATA-LEVEL PARALLELISM; DIVIDE-AND-CONQUER ALGORITHM; EMPIRICAL EVIDENCE; FIRST-ORDER FUNCTIONAL LANGUAGES; GPGPU; GPU; GRAPHICS PROCESSING UNITS; LANGUAGE IMPLEMENTATIONS; MEMORY BANDWIDTHS; NESL; NESTED DATA; PARALLEL COMPUTER; COMPUTER GRAPHICS; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS

EID: 84870410502 PISSN: 15232867 EISSN: None Source Type: Journal
DOI: 10.1145/2398856.2364563 Document Type: Conference Paper

Times cited : (17)

References (38)

1
- 0025545476
- VCODE: A data-parallel intermediate language
- BC90
- [BC90] Blelloch, G. and S. Chatterjee. VCODE: A data-parallel intermediate language. In FOMPC3, 1990, pp. 471-480.
- (1990) FOMPC3 , pp. 471-480
- Blelloch, G.¹ Chatterjee, S.²

2
- 84867541841
- BC93
- [BC93] Blelloch, G. and S. Chatterjee. CVL: A C vector language, 1993.
- (1993) CVL: A C Vector Language
- Blelloch, G.¹ Chatterjee, S.²

3
- 43949161602
- Implementation of a portable nested data-parallel language
- BCH+94
- [BCH+94] Blelloch, G. E., S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha. Implementation of a portable nested data-parallel language. JPDC, 21(1), 1994, pp. 4-14.
- (1994) JPDC , vol.21 , Issue.1 , pp. 4-14
- Blelloch, G.E.¹ Chatterjee, S.² Hardwick, J.C.³ Sipelstein, J.⁴ Zagha, M.⁵

4
- 0030381077
- The quickhull algorithm for convex hulls
- BDH96
- [BDH96] Barber, C. B., D. P. Dobkin, and H. Huhdanpaa. The quickhull algorithm for convex hulls. ACM TOMS, 22(4), 1996, pp. 469-483.
- (1996) ACM TOMS , vol.22 , Issue.4 , pp. 469-483
- Barber, C.B.¹ Dobkin, D.P.² Huhdanpaa, H.³

5
- 78249233242
- Lazy tree splitting
- BFR+10
- [BFR+10] Bergstrom, L., M. Fluet, M. Rainey, J. Reppy, and A. Shaw. Lazy tree splitting. In ICFP '10. ACM, September 2010, pp. 93-104.
- (2010) ICFP '10 , pp. 93-104
- Bergstrom, L.¹ Fluet, M.² Rainey, M.³ Reppy, J.⁴ Shaw, A.⁵

6
- 33846349887
- A hierarchical O(N log N) force calculation algorithm
- BH86
- [BH86] Barnes, J. and P. Hut. A hierarchical O(N log N) force calculation algorithm. Nature, 324, December 1986, pp. 446-449.
- (1986) Nature , pp. 446-449
- Barnes, J.¹ Hut, P.²

7
- 0030105185
- Programming parallel algorithms
- Ble96
- [Ble96] Blelloch, G. E. Programming parallel algorithms. CACM, 39(3), March 1996, pp. 85-97.
- (1996) CACM , vol.39 , Issue.3 , pp. 85-97
- Blelloch, G.E.¹

8
- 84858427151
- An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm
- BP11
- [BP11] Burtscher, M. and K. Pingali. An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In GPU Computing Gems Emerald Edition, chapter 6, pp. 75-92. Elsevier Science Publishers, New York, NY, 2011.
- (2011) GPU Computing Gems Emerald Edition
- Burtscher, M.¹ Pingali, K.²

9
- 85015692260
- The pricing of options and corporate liabilities
- BS73
- [BS73] Black, F. and M. Scholes. The pricing of options and corporate liabilities. JPE, 81(3), 1973, pp. 637-654.
- (1973) JPE , vol.81 , Issue.3 , pp. 637-654
- Black, F.¹ Scholes, M.²

10
- 0025380943
- Compiling collection-oriented languages onto massively parallel computers
- BS90
- [BS90] Blelloch, G. E. and G.W. Sabot. Compiling collection-oriented languages onto massively parallel computers. JPDC, 8(2), 1990, pp. 119-134.
- (1990) JPDC , vol.8 , Issue.2 , pp. 119-134
- Blelloch, G.E.¹ Sabot, G.W.²

11
- 84862632175
- GPU programming in a high level language compiling X10 to CUDA
- CBS11
- [CBS11] Cunningham, D., R. Bordawekar, and V. Saraswat. GPU programming in a high level language compiling X10 to CUDA. In X10'11, San Jose, CA, May 2011. Available from http://x10-lang.org/.
- (2011) X10'11
- Cunningham, D.¹ Bordawekar, R.² Saraswat, V.³

12
- 79952784184
- Copperhead: Compiling an embedded data parallel language
- CGK11
- [CGK11] Catanzaro, B., M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. In PPoPP '11, San Antonio, TX, February 2011. ACM, pp. 47-56.
- (2011) PPoPP '11 , pp. 47-56
- Catanzaro, B.¹ Garland, M.² Keutzer, K.³

13
- 0027632582
- Compiling nested data-parallel programs for shared-memory multiprocessors
- Cha93
- [Cha93] Chatterjee, S. Compiling nested data-parallel programs for shared-memory multiprocessors. ACM TOPLAS, 15(3), July 1993, pp. 400-462.
- (1993) ACM TOPLAS , vol.15 , Issue.3 , pp. 400-462
- Chatterjee, S.¹

14
- 79952136178
- Accelerating Haskell array codes with multicore GPUs
- CKL+11
- [CKL+11] Chakravarty, M. M., G. Keller, S. Lee, T. L. McDonell, and V. Grover. Accelerating Haskell array codes with multicore GPUs. In DAMP '11, Austin, January 2011. ACM, pp. 3-14.
- (2011) DAMP '11 , pp. 3-14
- Chakravarty, M.M.¹ Keller, G.² Lee, S.³ McDonell, T.L.⁴ Grover, V.⁵

15
- 84937389888
- Nepal - Nested data parallelism in Haskell
- CKLP01
- [CKLP01] Chakravarty, M. M. T., G. Keller, R. Leshchinskiy, and W. Pfannenstiel. Nepal - nested data parallelism in Haskell. In Euro-Par '01, vol. 2150 of LNCS. Springer-Verlag, August 2001, pp. 524-534.
- (2001) Euro-Par '01 , vol.2150 , pp. 524-534
- Chakravarty, M.M.T.¹ Keller, G.² Leshchinskiy, R.³ Pfannenstiel, W.⁴

16
- 79551658111
- Partial vectorisation of Haskell programs
- CLPK08
- [CLPK08] Chakravarty, M. M. T., R. Leshchinskiy, S. Peyton Jones, and G. Keller. Partial vectorisation of Haskell programs. In DAMP '08. ACM, January 2008, pp. 2-16. Available from http://clip.dia.fi.upm.es/Conferences/DAMP08/.
- (2008) DAMP '08 , pp. 2-16
- Chakravarty, M.M.T.¹ Leshchinskiy, R.² Peyton Jones, S.³ Keller, G.⁴

17
- 84872376298
- A new method for GPU based irregular reductions and its application to k-means clustering
- DR11
- [DR11] Dhanasekaran, B. and N. Rubin. A new method for GPU based irregular reductions and its application to k-means clustering. In GPGPU-4, Newport Beach, California, March 2011. ACM.
- (2011) GPGPU-4
- Dhanasekaran, B.¹ Rubin, N.²

18
- 12744262557
- Threaded code variations and optimizations
- Ert01
- [Ert01] Ertl, M. A. Threaded code variations and optimizations. In EuroForth 2001, Schloss Dagstuhl, Germany, November 2001, pp. 49-55. Available from http://www.complang.tuwien.ac.at/papers/.
- (2001) EuroForth 2001 , pp. 49-55
- Ertl, M.A.¹

19
- 84867517229
- GCN+12
- [GCN+12] Gao, M., T.-T. Cao, A. Nanjappa, T.-S. Tan, and Z. Huang. A GPU Algorithm for Convex Hull. Technical Report TRA1/12, National University of Singapore, School of Computing, January 2012.
- (2012) A GPU Algorithm for Convex Hull
- Gao, M.¹ Cao, T.-T.² Nanjappa, A.³ Tan, T.-S.⁴ Huang, Z.⁵

20
- 84870435416
- GHC
- GHC
- [GHC] GHC. The Glasgow Haskell Compiler. Available from http://www.haskell.org/ghc.
- The Glasgow Haskell Compiler

21
- 33747508171
- SAC - A Functional Array Language for Efficient Multi-threaded Execution
- GS06
- [GS06] Grelck, C. and S.-B. Scholz. SAC - A Functional Array Language for Efficient Multi-threaded Execution. IJPP, 34(4), August 2006, pp. 383-427.
- (2006) IJPP , vol.34 , Issue.4 , pp. 383-427
- Grelck, C.¹ Scholz, S.-B.²

22
- 79952162843
- Breaking the GPU programming barrier with the auto-parallelising SAC compiler
- GTS11
- [GTS11] Guo, J., J. Thiyagalingam, and S.-B. Scholz. Breaking the GPU programming barrier with the auto-parallelising SAC compiler. In DAMP '11, Austin, January 2011. ACM, pp. 15-24.
- (2011) DAMP '11 , pp. 15-24
- Guo, J.¹ Thiyagalingam, J.² Scholz, S.-B.³

23
- 84882564541
- Thrust: A productivity-oriented library for CUDA
- HB11
- [HB11] Hoberock, J. and N. Bell. Thrust: A productivity-oriented library for CUDA. In W.W. Hwu (ed.), GPU Computing Gems, Jade Edition, chapter 26, pp. 359-372. Morgan Kaufmann Publishers, October 2011.
- (2011) GPU Computing Gems, Jade Edition
- Hoberock, J.¹ Bell, N.²

24
- 38849195846
- Kel99
- [Kel99] Keller, G. Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 1999.
- (1999) Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines
- Keller, G.¹

25
- 84870456255
- Khronos OpenCL Working Group
- Khr11
- [Khr11] Khronos OpenCL Working Group. OpenCL 1.2 Specification, November 2011. Available from http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.
- (2011) OpenCL 1.2 Specification

26
- 79952182078
- Simple optimizations for an applicative array language for graphics processors
- Lar11
- [Lar11] Larsen, B. Simple optimizations for an applicative array language for graphics processors. In DAMP '11, Austin, January 2011. ACM, pp. 25-34.
- (2011) DAMP '11 , pp. 25-34
- Larsen, B.¹

27
- 33746637093
- Higher order flattening
- LCK06
- [LCK06] Leshchinskiy, R., M. M. T. Chakravarty, and G. Keller. Higher order flattening. In V. Alexandrov, D. van Albada, P. Sloot, and J. Dongarra (eds.), ICCS '06, number 3992 in LNCS. Springer-Verlag, May 2006, pp. 920-928.
- (2006) ICCS '06 , pp. 920-928
- Leshchinskiy, R.¹ Chakravarty, M.M.T.² Keller, G.³

28
- 33746593471
- Les05
- [Les05] Leshchinskiy, R. Higher-Order Nested Data Parallelism: Semantics and Implementation. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 2005.
- (2005) Higher-Order Nested Data Parallelism: Semantics and Implementation
- Leshchinskiy, R.¹

29
- 84858391043
- Scalable GPU graph traversal
- MGG12
- [MGG12] Merrill, D., M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 117-128.
- (2012) PPoPP '12 , pp. 117-128
- Merrill, D.¹ Garland, M.² Grimshaw, A.³

30
- 84858374841
- A GPU implementation of inclusion-based points-to analysis
- MLBP12
- [MLBP12] Mendez-Lojo, M., M. Burtscher, and K. Pingali. A GPU implementation of inclusion-based points-to analysis. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 107-116.
- (2012) PPoPP '12 , pp. 107-116
- Mendez-Lojo, M.¹ Burtscher, M.² Pingali, K.³

31
- 78249272964
- Nikola: Embedding compiled GPU functions in Haskell
- MM10
- [MM10] Mainland, G. and G. Morrisett. Nikola: Embedding compiled GPU functions in Haskell. In HASKELL '10, Baltimore, MD, September 2010. ACM, pp. 67-78.
- (2010) HASKELL '10 , pp. 67-78
- Mainland, G.¹ Morrisett, G.²

32
- 84870408716
- NVI11a
- [NVI11a] NVIDIA. NVIDIA CUDA C Best Practices Guide, 2011.
- (2011) NVIDIA. NVIDIA CUDA C Best Practices Guide

33
- 35948991669
- NVI11b
- [NVI11b] NVIDIA. NVIDIA CUDA C Programming Guide, 2011. Available from http://developer.nvidia.com/category/zone/cuda-zone.
- (2011) NVIDIA. NVIDIA CUDA C Programming Guide

34
- 77956373685
- OptiX: A general purpose ray tracing engine
- PBD+10
- [PBD+10] Parker, S. G., J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich. OptiX: a general purpose ray tracing engine. ACM TOG, 29, July 2010.
- (2010) ACM TOG
- Parker, S.G.¹ Bigler, J.² Dietrich, A.³ Friedrich, H.⁴ Hoberock, J.⁵ Luebke, D.⁶ McAllister, D.⁷ McGuire, M.⁸ Morley, K.⁹ Robison, A.¹⁰ Stich, M.¹¹

35
- 0029196596
- Work-efficient nested data-parallelism
- PPW95
- [PPW95] Palmer, D. W., J. F. Prins, and S. Westfold. Work-efficient nested data-parallelism. In FoMPP5. IEEE Computer Society Press, 1995, pp. 186-193.
- (1995) FoMPP5 , pp. 186-193
- Palmer, D.W.¹ Prins, J.F.² Westfold, S.³

36
- 0029204372
- Optimizing an ANSI C interpreter with superoperators
- Pro95
- [Pro95] Proebsting, T. A. Optimizing an ANSI C interpreter with superoperators. In POPL '95, San Francisco, January 1995. ACM. pp. 322-332.
- (1995) POPL '95 , pp. 322-332
- Proebsting, T.A.¹

37
- 78651284120
- Scan primitives for GPU computing
- SHZO07
- [SHZO07] Sengupta, S., M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for GPU computing. In GH '07, San Diego, CA, August 2007. Eurographics Association, pp. 97-106.
- (2007) GH '07 , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

38
- 67650065270
- Stack-based parallel recursion on graphics processors
- YHL+09
- [YHL+09] Yang, K., B. He, Q. Luo, P. V. Sander, and J. Shi. Stack-based parallel recursion on graphics processors. In PPoPP '09, Raleigh, NC, February 2009. ACM, pp. 299-300.
- (2009) PPoPP '09 , pp. 299-300
- Yang, K.¹ He, B.² Luo, Q.³ Sander, P.V.⁴ Shi, J.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.