SCOPUS Information Search Platform

Proceedings of the ACM SIGPLAN International Conference on Functional Programming, ICFP

Volume , Issue , 2012, Pages 247-258

Nested data-parallelism on the GPU

(2) Bergstrom, Lars (a); Reppy, John (a)

(a) University of Chicago (United States)

Author keywords

gpgpu; gpu; nesl; nested data parallelism

Indexed keywords

ARITHMETIC PERFORMANCE; DATA PARALLELISM; DATA-LEVEL PARALLELISM; DIVIDE-AND-CONQUER ALGORITHM; EMPIRICAL EVIDENCE; FIRST-ORDER FUNCTIONAL LANGUAGES; GPGPU; GPU; GRAPHICS PROCESSING UNITS; LANGUAGE IMPLEMENTATIONS; MEMORY BANDWIDTHS; NESL; NESTED DATA; PARALLEL COMPUTER;

COMPUTER GRAPHICS; COMPUTER PROGRAMMING LANGUAGES; FUNCTIONAL PROGRAMMING; PARALLEL ARCHITECTURES;

PROGRAM PROCESSORS;

EID: 84867546922 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2364527.2364563 Document Type: Conference Paper

Times cited : (38)

References (38)

1
- 0025545476
- VCODE: A data-parallel intermediate language
- Blelloch, G. and S. Chatterjee. VCODE: A data-parallel intermediate language. In FOMPC3, 1990, pp. 471-480.
- (1990) FOMPC3 , pp. 471-480
- Blelloch, G.¹ Chatterjee, S.²

2
- 84867541841
- Blelloch, G. and S. Chatterjee. CVL: A C vector language, 1993.
- (1993) CVL: A C Vector Language
- Blelloch, G.¹ Chatterjee, S.²

3
- 43949161602
- Implementation of a portable nested data-parallel language
- Blelloch, G. E., S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha. Implementation of a portable nested data-parallel language. JPDC, 21(1), 1994, pp. 4-14.
- (1994) JPDC , vol.21 , Issue.1 , pp. 4-14
- Blelloch, G.E.¹ Chatterjee, S.² Hardwick, J.C.³ Sipelstein, J.⁴ Zagha, M.⁵

4
- 0030381077
- The quickhull algorithm for convex hulls
- Barber, C. B., D. P. Dobkin, and H. Huhdanpaa. The quickhull algorithm for convex hulls. ACM TOMS, 22(4), 1996, pp. 469-483.
- (1996) ACM TOMS , vol.22 , Issue.4 , pp. 469-483
- Barber, C.B.¹ Dobkin, D.P.² Huhdanpaa, H.³

5
- 78249233242
- Lazy tree splitting
- ACM, September
- Bergstrom, L., M. Fluet, M. Rainey, J. Reppy, and A. Shaw. Lazy tree splitting. In ICFP '10. ACM, September 2010, pp. 93-104.
- (2010) ICFP '10 , pp. 93-104
- Bergstrom, L.¹ Fluet, M.² Rainey, M.³ Reppy, J.⁴ Shaw, A.⁵

6
- 33846349887
- A hierarchical O(N log N) force calculation algorithm
- December
- Barnes, J. and P. Hut. A hierarchical O(N log N) force calculation algorithm. Nature, 324, December 1986, pp. 446-449.
- (1986) Nature , vol.324 , pp. 446-449
- Barnes, J.¹ Hut, P.²

7
- 0030105185
- Programming parallel algorithms
- March
- Blelloch, G. E. Programming parallel algorithms. CACM, 39(3), March 1996, pp. 85-97.
- (1996) CACM , vol.39 , Issue.3 , pp. 85-97
- Blelloch, G.E.¹

8
- 84858427151
- An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm
- chapter 6, Elsevier Science Publishers, New York, NY
- Burtscher, M. and K. Pingali. An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In GPU Computing Gems Emerald Edition, chapter 6, pp. 75-92. Elsevier Science Publishers, New York, NY, 2011.
- (2011) GPU Computing Gems Emerald Edition , pp. 75-92
- Burtscher, M.¹ Pingali, K.²

9
- 85015692260
- The pricing of options and corporate liabilities
- Black, F. and M. Scholes. The pricing of options and corporate liabilities. JPE, 81(3), 1973, pp. 637-654.
- (1973) JPE , vol.81 , Issue.3 , pp. 637-654
- Black, F.¹ Scholes, M.²

10
- 0025380943
- Compiling collection-oriented languages onto massively parallel computers
- Blelloch, G. E. and G.W. Sabot. Compiling collection-oriented languages onto massively parallel computers. JPDC, 8(2), 1990, pp. 119-134.
- (1990) JPDC , vol.8 , Issue.2 , pp. 119-134
- Blelloch, G.E.¹ Sabot, G.W.²

11
- 84862632175
- GPU programming in a high level language compiling X10 to CUDA
- Available from
- Cunningham, D., R. Bordawekar, and V. Saraswat. GPU programming in a high level language compiling X10 to CUDA. In X10 '11, San Jose, CA, May 2011. Available from http://x10-lang.org/.
- X10 '11, San Jose, CA, May 2011
- Cunningham, D.¹ Bordawekar, R.² Saraswat, V.³

12
- 80053989560
- Copperhead: Compiling an embedded data parallel language
- ACM
- Catanzaro, B., M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. In PPoPP '11, San Antonio, TX, February 2011. ACM, pp. 47-56.
- PPoPP '11, San Antonio, TX, February 2011 , pp. 47-56
- Catanzaro, B.¹ Garland, M.² Keutzer, K.³

13
- 0027632582
- Compiling nested data-parallel programs for shared-memory multiprocessors
- July
- Chatterjee, S. Compiling nested data-parallel programs for shared-memory multiprocessors. ACM TOPLAS, 15(3), July 1993, pp. 400-462.
- (1993) ACM TOPLAS , vol.15 , Issue.3 , pp. 400-462
- Chatterjee, S.¹

14
- 79952136178
- Accelerating Haskell array codes with multicore GPUs
- ACM
- Chakravarty, M. M., G. Keller, S. Lee, T. L. McDonell, and V. Grover. Accelerating Haskell array codes with multicore GPUs. In DAMP '11, Austin, January 2011. ACM, pp. 3-14.
- DAMP '11, Austin, January 2011 , pp. 3-14
- Chakravarty, M.M.¹ Keller, G.² Lee, S.³ McDonell, T.L.⁴ Grover, V.⁵

15
- 84937389888
- Nepal - Nested data parallelism in Haskell
- Euro-Par '01, Springer-Verlag, August
- Chakravarty, M. M. T., G. Keller, R. Leshchinskiy, and W. Pfannenstiel. Nepal - nested data parallelism in Haskell. In Euro-Par '01, vol. 2150 of LNCS. Springer-Verlag, August 2001, pp. 524-534.
- (2001) LNCS , vol.2150 , pp. 524-534
- Chakravarty, M.M.T.¹ Keller, G.² Leshchinskiy, R.³ Pfannenstiel, W.⁴

16
- 79551658111
- Partial vectorisation of Haskell programs
- ACM, January Available from
- Chakravarty, M. M. T., R. Leshchinskiy, S. Peyton Jones, and G. Keller. Partial vectorisation of Haskell programs. In DAMP '08. ACM, January 2008, pp. 2-16. Available from http://clip.dia.fi.upm.es/Conferences/DAMP08/.
- (2008) DAMP '08 , pp. 2-16
- Chakravarty, M.M.T.¹ Leshchinskiy, R.² Jones, S.P.³ Keller, G.⁴

17
- 84867517226
- A new method for GPU based irregular reductions and its application to k-means clustering
- ACM
- Dhanasekaran, B. and N. Rubin. A new method for GPU based irregular reductions and its application to k-means clustering. In GPGPU-4, Newport Beach, California, March 2011. ACM.
- GPGPU-4, Newport Beach, California, March 2011
- Dhanasekaran, B.¹ Rubin, N.²

18
- 12744262557
- Threaded code variations and optimizations
- Available from
- Ertl, M. A. Threaded code variations and optimizations. In EuroForth 2001, Schloss Dagstuhl, Germany, November 2001. pp. 49-55. Available from http://www.complang.tuwien.ac.at/papers/.
- EuroForth 2001, Schloss Dagstuhl, Germany, November 2001 , pp. 49-55
- Ertl, M.A.¹

19
- 84867517229
- Technical Report TRA1/12, National University of Singapore, School of Computing, January
- Gao, M., T.-T. Cao, A. Nanjappa, T.-S. Tan, and Z. Huang. A GPU Algorithm for Convex Hull. Technical Report TRA1/12, National University of Singapore, School of Computing, January 2012.
- (2012) A GPU Algorithm for Convex Hull
- Gao, M.¹ Cao, T.-T.² Nanjappa, A.³ Tan, T.-S.⁴ Huang, Z.⁵

20
- 84870436907
- GHC. Available from
- GHC. The Glasgow Haskell Compiler. Available from http://www.haskell.org/ghc.
- The Glasgow Haskell Compiler

21
- 33747508171
- SAC - A Functional Array Language for Efficient Multi-threaded Execution
- August
- Grelck, C. and S.-B. Scholz. SAC - A Functional Array Language for Efficient Multi-threaded Execution. IJPP, 34(4), August 2006, pp. 383-427.
- (2006) IJPP , vol.34 , Issue.4 , pp. 383-427
- Grelck, C.¹ Scholz, S.-B.²

22
- 79952162843
- Breaking the GPU programming barrier with the auto-parallelising SAC compiler
- ACM
- Guo, J., J. Thiyagalingam, and S.-B. Scholz. Breaking the GPU programming barrier with the auto-parallelising SAC compiler. In DAMP '11, Austin, January 2011. ACM, pp. 15-24.
- DAMP '11, Austin, January 2011 , pp. 15-24
- Guo, J.¹ Thiyagalingam, J.² Scholz, S.-B.³

23
- 84882564541
- Thrust: A productivity-oriented library for CUDA
- W.W. Hwu (ed.), chapter 26, Morgan Kaufmann Publishers, October
- Hoberock, J. and N. Bell. Thrust: A productivity-oriented library for CUDA. In W.W. Hwu (ed.), GPU Computing Gems, Jade Edition, chapter 26, pp. 359-372. Morgan Kaufmann Publishers, October 2011.
- (2011) GPU Computing Gems, Jade Edition , pp. 359-372
- Hoberock, J.¹ Bell, N.²

24
- 38849195846
- Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany
- Keller, G. Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 1999.
- (1999) Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines
- Keller, G.¹

25
- 70349100958
- November Available from
- Khronos OpenCL Working Group. OpenCL 1.2 Specification, November 2011. Available from http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.
- (2011) OpenCL 1.2 Specification

26
- 79952182078
- Simple optimizations for an applicative array language for graphics processors
- ACM
- Larsen, B. Simple optimizations for an applicative array language for graphics processors. In DAMP '11, Austin, January 2011. ACM, pp. 25-34.
- DAMP '11, Austin, January 2011 , pp. 25-34
- Larsen, B.¹

27
- 33746637093
- Higher order flattening
- V. Alexandrov, D. van Albada, P. Sloot, and J. Dongarra (eds.), ICCS '06, Springer-Verlag, May
- Leshchinskiy, R., M. M. T. Chakravarty, and G. Keller. Higher order flattening. In V. Alexandrov, D. van Albada, P. Sloot, and J. Dongarra (eds.), ICCS '06, number 3992 in LNCS. Springer-Verlag, May 2006, pp. 920-928.
- (2006) LNCS , vol.3992 , pp. 920-928
- Leshchinskiy, R.¹ Chakravarty, M.M.T.² Keller, G.³

28
- 33746593471
- Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany
- Leshchinskiy, R. Higher-Order Nested Data Parallelism: Semantics and Implementation. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 2005.
- (2005) Higher-Order Nested Data Parallelism: Semantics and Implementation
- Leshchinskiy, R.¹

29
- 84858391043
- Scalable GPU graph traversal
- ACM
- Merrill, D., M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 117-128.
- PPoPP '12, New Orleans, LA, February 2012 , pp. 117-128
- Merrill, D.¹ Garland, M.² Grimshaw, A.³

30
- 84858374841
- A GPU implementation of inclusion-based points-to analysis
- ACM
- Mendez-Lojo, M., M. Burtscher, and K. Pingali. A GPU implementation of inclusion-based points-to analysis. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 107-116.
- PPoPP '12, New Orleans, LA, February 2012 , pp. 107-116
- Mendez-Lojo, M.¹ Burtscher, M.² Pingali, K.³

31
- 78249272964
- Nikola: Embedding compiled GPU functions in Haskell
- ACM
- Mainland, G. and G. Morrisett. Nikola: Embedding compiled GPU functions in Haskell. In HASKELL '10, Baltimore, MD, September 2010. ACM, pp. 67-78.
- HASKELL '10, Baltimore, MD, September 2010 , pp. 67-78
- Mainland, G.¹ Morrisett, G.²

32
- 84862941846
- NVIDIA
- NVIDIA. NVIDIA CUDA C Best Practices Guide, 2011.
- (2011) NVIDIA CUDA C Best Practices Guide

33
- 79551704836
- Available from
- NVIDIA. NVIDIA CUDA C Programming Guide, 2011. Available from http://developer.nvidia.com/category/zone/cuda-zone.
- (2011) NVIDIA CUDA C Programming Guide

34
- 77956373685
- OptiX: A general purpose ray tracing engine
- 29, July
- Parker, S. G., J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich. OptiX: a general purpose ray tracing engine. ACM TOG, 29, July 2010.
- (2010) ACM TOG
- Parker, S.G.¹ Bigler, J.² Dietrich, A.³ Friedrich, H.⁴ Hoberock, J.⁵ Luebke, D.⁶ McAllister, D.⁷ McGuire, M.⁸ Morley, K.⁹ Robison, A.¹⁰ Stich, M.¹¹

35
- 0029196596
- Work-efficient nested data-parallelism
- IEEE Computer Society Press
- Palmer, D. W., J. F. Prins, and S. Westfold. Work-efficient nested data-parallelism. In FoMPP5. IEEE Computer Society Press, 1995, pp. 186-193.
- (1995) FoMPP5 , pp. 186-193
- Palmer, D.W.¹ Prins, J.F.² Westfold, S.³

36
- 0029204372
- Optimizing an ANSI C interpreter with superoperators
- ACM
- Proebsting, T. A. Optimizing an ANSI C interpreter with superoperators. In POPL '95, San Francisco, January 1995. ACM, pp. 322-332.
- POPL '95, San Francisco, January 1995 , pp. 322-332
- Proebsting, T.A.¹

37
- 78651284120
- Scan primitives for GPU computing
- Eurographics Association
- Sengupta, S., M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for GPU computing. In GH '07, San Diego, CA, August 2007. Eurographics Association, pp. 97-106.
- GH '07, San Diego, CA, August 2007 , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

38
- 67650065270
- Stack-based parallel recursion on graphics processors
- ACM
- Yang, K., B. He, Q. Luo, P. V. Sander, and J. Shi. Stack-based parallel recursion on graphics processors. In PPoPP '09, Raleigh, NC, February 2009. ACM, pp. 299-300.
- PPoPP '09, Raleigh, NC, February 2009 , pp. 299-300
- Yang, K.¹ He, B.² Luo, Q.³ Sander, P.V.⁴ Shi, J.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.