Volume 47, Issue 9, 2012, Pages 247-258

Nested data-parallelism on the GPU

Author keywords

GPGPU; GPU; NESL; Nested data parallelism

Indexed keywords

ARITHMETIC PERFORMANCE; DATA PARALLELISM; DATA-LEVEL PARALLELISM; DIVIDE-AND-CONQUER ALGORITHM; EMPIRICAL EVIDENCE; FIRST-ORDER FUNCTIONAL LANGUAGES; GPGPU; GPU; GRAPHICS PROCESSING UNITS; LANGUAGE IMPLEMENTATIONS; MEMORY BANDWIDTHS; NESL; NESTED DATA; PARALLEL COMPUTER;

EID: 84870410502     PISSN: 15232867     EISSN: None     Source Type: Journal    
DOI: 10.1145/2398856.2364563     Document Type: Conference Paper
Times cited: 17

References (38)
  • 1
    • 0025545476
    • VCODE: A data-parallel intermediate language
    • [BC90]
    • [BC90] Blelloch, G. and S. Chatterjee. VCODE: A data-parallel intermediate language. In FOMPC3, 1990, pp. 471-480.
    • (1990) FOMPC3 , pp. 471-480
    • Blelloch, G.1    Chatterjee, S.2
  • 3
    • 43949161602
    • Implementation of a portable nested data-parallel language
    • [BCH+94]
    • [BCH+94] Blelloch, G. E., S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha. Implementation of a portable nested data-parallel language. JPDC, 21(1), 1994, pp. 4-14.
    • (1994) JPDC , vol.21 , Issue.1 , pp. 4-14
    • Blelloch, E.G.1    Chatterjee, S.2    Hardwick, J.C.3    Sipelstein, J.4    Zagha, M.5
  • 4
    • 0030381077
    • The quickhull algorithm for convex hulls
    • [BDH96]
    • [BDH96] Barber, C. B., D. P. Dobkin, and H. Huhdanpaa. The quickhull algorithm for convex hulls. ACM TOMS, 22(4), 1996, pp. 469-483.
    • (1996) ACM TOMS , vol.22 , Issue.4 , pp. 469-483
    • Barber, B.C.1    Dobkin, D.P.2    Huhdanpaa, H.3
  • 6
    • 33846349887
    • A hierarchical O(N log N) force calculation algorithm
    • [BH86]
    • [BH86] Barnes, J. and P. Hut. A hierarchical O(N log N) force calculation algorithm. Nature, 324, December 1986, pp. 446-449.
    • (1986) Nature , pp. 446-449
    • Barnes, J.1    Hut, P.2
  • 7
    • 0030105185
    • Programming parallel algorithms
    • [Ble96]
    • [Ble96] Blelloch, G. E. Programming parallel algorithms. CACM, 39(3), March 1996, pp. 85-97.
    • (1996) CACM , vol.39 , Issue.3 , pp. 85-97
    • Blelloch, E.G.1
  • 8
    • 84858427151
    • An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm
    • [BP11]
    • [BP11] Burtscher, M. and K. Pingali. An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In GPU Computing Gems Emerald Edition, chapter 6, pp. 75-92. Elsevier Science Publishers, New York, NY, 2011.
    • (2011) GPU Computing Gems Emerald Edition
    • Burtscher, M.1    Pingali, K.2
  • 9
    • 85015692260
    • The pricing of options and corporate liabilities
    • [BS73]
    • [BS73] Black, F. and M. Scholes. The pricing of options and corporate liabilities. JPE, 81(3), 1973, pp. 637-654.
    • (1973) JPE , vol.81 , Issue.3 , pp. 637-654
    • Black, F.1    Scholes, M.2
  • 10
    • 0025380943
    • Compiling collection-oriented languages onto massively parallel computers
    • [BS90]
    • [BS90] Blelloch, G. E. and G.W. Sabot. Compiling collection-oriented languages onto massively parallel computers. JPDC, 8(2), 1990, pp. 119-134.
    • (1990) JPDC , vol.8 , Issue.2 , pp. 119-134
    • Blelloch, E.G.1    Sabot, G.W.2
  • 11
    • 84862632175
    • GPU programming in a high level language compiling X10 to CUDA
    • [CBS11]
    • [CBS11] Cunningham, D., R. Bordawekar, and V. Saraswat. GPU programming in a high level language compiling X10 to CUDA. In X10'11, San Jose, CA, May 2011. Available from http://x10-lang.org/.
    • (2011) X10'11
    • Cunningham, D.1    Bordawekar, R.2    Saraswat, V.3
  • 12
    • 79952784184
    • Copperhead: Compiling an embedded data parallel language
    • [CGK11]
    • [CGK11] Catanzaro, B., M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. In PPoPP '11, San Antonio, TX, February 2011. ACM, pp. 47-56.
    • (2011) PPoPP '11 , pp. 47-56
    • Catanzaro, B.1    Garland, M.2    Keutzer, K.3
  • 13
    • 0027632582
    • Compiling nested data-parallel programs for shared-memory multiprocessors
    • [Cha93]
    • [Cha93] Chatterjee, S. Compiling nested data-parallel programs for shared-memory multiprocessors. ACM TOPLAS, 15(3), July 1993, pp. 400-462.
    • (1993) ACM TOPLAS , vol.15 , Issue.3 , pp. 400-462
    • Chatterjee, S.1
  • 14
    • 79952136178
    • Accelerating Haskell array codes with multicore GPUs
    • [CKL+11]
    • [CKL+11] Chakravarty, M. M., G. Keller, S. Lee, T. L. McDonell, and V. Grover. Accelerating Haskell array codes with multicore GPUs. In DAMP '11, Austin, January 2011. ACM, pp. 3-14.
    • (2011) DAMP '11 , pp. 3-14
    • Chakravarty, M.M.1    Keller, G.2    Lee, S.3    McDonell, T.L.4    Grover, V.5
  • 15
    • 84937389888
    • Nepal - Nested data parallelism in Haskell
    • [CKLP01]
    • [CKLP01] Chakravarty, M. M. T., G. Keller, R. Leshchinskiy, and W. Pfannenstiel. Nepal - nested data parallelism in Haskell. In Euro-Par '01, vol. 2150 of LNCS. Springer-Verlag, August 2001, pp. 524-534.
    • (2001) Euro-Par '01 , vol.2150 , pp. 524-534
    • Chakravarty, T.M.M.1    Keller, G.2    Leshchinskiy, R.3    Pfannenstiel, W.4
  • 16
    • 79551658111
    • Partial vectorisation of Haskell programs
    • [CLPK08]
    • [CLPK08] Chakravarty, M. M. T., R. Leshchinskiy, S. Peyton Jones, and G. Keller. Partial vectorisation of Haskell programs. In DAMP '08. ACM, January 2008, pp. 2-16. Available from http://clip.dia.fi.upm.es/Conferences/DAMP08/.
    • (2008) DAMP '08 , pp. 2-16
    • Chakravarty, T.M.M.1    Leshchinskiy, R.2    Peyton Jones, S.3    Keller, G.4
  • 17
    • 84872376298
    • A new method for GPU based irregular reductions and its application to k-means clustering
    • [DR11]
    • [DR11] Dhanasekaran, B. and N. Rubin. A new method for GPU based irregular reductions and its application to k-means clustering. In GPGPU-4, Newport Beach, California, March 2011. ACM.
    • (2011) GPGPU-4
    • Dhanasekaran, B.1    Rubin, N.2
  • 18
    • 12744262557
    • Threaded code variations and optimizations
    • [Ert01]
    • [Ert01] Ertl, M. A. Threaded code variations and optimizations. In EuroForth 2001, Schloss Dagstuhl, Germany, November 2001. pp. 49-55. Available from http://www.complang.tuwien.ac.at/papers/.
    • (2001) EuroForth 2001 , pp. 49-55
    • Ertl, A.M.1
  • 19
  • 21
    • 33747508171
    • SAC - A Functional Array Language for Efficient Multi-threaded Execution
    • [GS06]
    • [GS06] Grelck, C. and S.-B. Scholz. SAC - A Functional Array Language for Efficient Multi-threaded Execution. IJPP, 34(4), August 2006, pp. 383-427.
    • (2006) IJPP , vol.34 , Issue.4 , pp. 383-427
    • Grelck, C.1    Scholz, S.-B.2
  • 22
    • 79952162843
    • Breaking the GPU programming barrier with the auto-parallelising SAC compiler
    • [GTS11]
    • [GTS11] Guo, J., J. Thiyagalingam, and S.-B. Scholz. Breaking the GPU programming barrier with the auto-parallelising SAC compiler. In DAMP '11, Austin, January 2011. ACM, pp. 15-24.
    • (2011) DAMP '11 , pp. 15-24
    • Guo, J.1    Thiyagalingam, J.2    Scholz, S.-B.3
  • 23
    • 84882564541
    • Thrust: A productivity-oriented library for CUDA
    • [HB11]
    • [HB11] Hoberock, J. and N. Bell. Thrust: A productivity-oriented library for CUDA. In W.W. Hwu (ed.), GPU Computing Gems, Jade Edition, chapter 26, pp. 359-372. Morgan Kaufmann Publishers, October 2011.
    • (2011) GPU Computing Gems, Jade Edition
    • Hoberock, J.1    Bell, N.2
  • 25
    • 84870456255
    • OpenCL 1.2 Specification
    • [Khr11]
    • [Khr11] Khronos OpenCL Working Group. OpenCL 1.2 Specification, November 2011. Available from http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.
    • (2011) OpenCL 1.2 Specification
  • 26
    • 79952182078
    • Simple optimizations for an applicative array language for graphics processors
    • [Lar11]
    • [Lar11] Larsen, B. Simple optimizations for an applicative array language for graphics processors. In DAMP '11, Austin, January 2011. ACM, pp. 25-34.
    • (2011) DAMP '11 , pp. 25-34
    • Larsen, B.1
  • 27
    • 33746637093
    • Higher order flattening
    • [LCK06]
    • [LCK06] Leshchinskiy, R., M. M. T. Chakravarty, and G. Keller. Higher order flattening. In V. Alexandrov, D. van Albada, P. Sloot, and J. Dongarra (eds.), ICCS '06, number 3992 in LNCS. Springer-Verlag, May 2006, pp. 920-928.
    • (2006) ICCS '06 , pp. 920-928
    • Leshchinskiy, R.1    Chakravarty, M.M.T.2    Keller, G.3
  • 29
    • 84858391043
    • Scalable GPU graph traversal
    • [MGG12]
    • [MGG12] Merrill, D., M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 117-128.
    • (2012) PPoPP '12 , pp. 117-128
    • Merrill, D.1    Garland, M.2    Grimshaw, A.3
  • 30
    • 84858374841
    • A GPU implementation of inclusion-based points-to analysis
    • [MLBP12]
    • [MLBP12] Mendez-Lojo, M., M. Burtscher, and K. Pingali. A GPU implementation of inclusion-based points-to analysis. In PPoPP '12, New Orleans, LA, February 2012. ACM, pp. 107-116.
    • (2012) PPoPP '12 , pp. 107-116
    • Mendez-Lojo, M.1    Burtscher, M.2    Pingali, K.3
  • 31
    • 78249272964
    • Nikola: Embedding compiled GPU functions in Haskell
    • [MM10]
    • [MM10] Mainland, G. and G. Morrisett. Nikola: Embedding compiled GPU functions in Haskell. In HASKELL '10, Baltimore, MD, September 2010. ACM, pp. 67-78.
    • (2010) HASKELL '10 , pp. 67-78
    • Mainland, G.1    Morrisett, G.2
  • 33
    • 35948991669
    • [NVI11b]
    • [NVI11b] NVIDIA. NVIDIA CUDA C Programming Guide, 2011. Available from http://developer.nvidia.com/category/zone/cuda-zone.
    • (2011) NVIDIA. NVIDIA CUDA C Programming Guide
  • 35
    • 0029196596
    • Work-efficient nested data-parallelism
    • [PPW95]
    • [PPW95] Palmer, D. W., J. F. Prins, and S. Westfold. Work-efficient nested data-parallelism. In FoMPP5. IEEE Computer Society Press, 1995, pp. 186-193.
    • (1995) FoMPP5 , pp. 186-193
    • Palmer, W.D.1    Prins, J.F.2    Westfold, S.3
  • 36
    • 0029204372
    • Optimizing an ANSI C interpreter with superoperators
    • [Pro95]
    • [Pro95] Proebsting, T. A. Optimizing an ANSI C interpreter with superoperators. In POPL '95, San Francisco, January 1995. ACM. pp. 322-332.
    • (1995) POPL '95 , pp. 322-332
    • Proebsting, A.T.1
  • 37
    • 78651284120
    • Scan primitives for GPU computing
    • [SHZO07]
    • [SHZO07] Sengupta, S., M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for GPU computing. In GH '07, San Diego, CA, August 2007. Eurographics Association, pp. 97-106.
    • (2007) GH '07 , pp. 97-106
    • Sengupta, S.1    Harris, M.2    Zhang, Y.3    Owens, J.D.4
  • 38
    • 67650065270
    • Stack-based parallel recursion on graphics processors
    • [YHL+09]
    • [YHL+09] Yang, K., B. He, Q. Luo, P. V. Sander, and J. Shi. Stack-based parallel recursion on graphics processors. In PPoPP '09, Raleigh, NC, February 2009. ACM, pp. 299-300.
    • (2009) PPoPP '09 , pp. 299-300
    • Yang, K.1    He, B.2    Luo, Q.3    Sander, P.V.4    Shi, J.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.