



Volume 21, Issue 2, 2011, Pages 245-272

High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing

Author keywords

GPU; kernel fusion; Parallel sorting; prefix scan; prefix sum; radix sorting

Indexed keywords

GPU; KERNEL FUSION; PARALLEL SORTING; PREFIX SCAN; PREFIX SUM; RADIX SORTING;

EID: 79959718248     PISSN: 01296264     EISSN: None     Source Type: Journal    
DOI: 10.1142/S0129626411000187     Document Type: Article
Times cited : (146)

References (47)
  • 1
    • 49049088756
    • GPU computing
    • May
    • J D Owens et al., "GPU Computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
    • (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
    • Owens, J.D.
  • 2
    • 77954995885
    • Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
    • Saint-Malo, France
    • Victor W Lee et al., "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU," in Proceedings of the 37th annual international symposium on Computer architecture, Saint-Malo, France, 2010, pp. 451-460.
    • (2010) Proceedings of the 37th Annual International Symposium on Computer Architecture , pp. 451-460
    • Lee, V.W.
  • 6
    • 79959717805
    • GPGPU.org. [Online].
    • GPGPU.org. [Online]. http://gpgpu.org/developer/cudpp
  • 8
    • 84865096511
    • Efficient implementation of sorting on multi-core SIMD CPU architecture
    • Jatin Chhugani et al., "Efficient implementation of sorting on multi-core SIMD CPU architecture," Proc. VLDB Endow., pp. 1313-1324, 2008.
    • (2008) Proc. VLDB Endow , pp. 1313-1324
    • Chhugani, J.
  • 10
    • 0003657590
    • Reading, MA, USA: Addison-Wesley, Sorting and Searching
    • Donald Knuth, The Art of Computer Programming. Reading, MA, USA: Addison-Wesley, 1973, vol. III: Sorting and Searching.
    • (1973) The Art of Computer Programming , vol.3
    • Knuth, D.
  • 14
    • 57749174539
    • Real-time KD-tree construction on graphics hardware
    • papers, Singapore
    • Kun Zhou, Qiming Hou, Rui Wang, and Baining Guo, "Real-time KD-tree construction on graphics hardware," in SIGGRAPH Asia '08: ACM SIGGRAPH Asia 2008 papers, Singapore, 2008, pp. 1-11.
    • (2008) SIGGRAPH Asia '08: ACM SIGGRAPH Asia 2008 , pp. 1-11
    • Zhou, K.; Hou, Q.; Wang, R.; Guo, B.
  • 17
    • 77749271078
    • RenderAnts: Interactive reyes rendering on GPUs
    • papers, Yokohama, Japan
    • Kun Zhou et al., "RenderAnts: interactive Reyes rendering on GPUs," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-11.
    • (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-11
    • Zhou, K.
  • 18
    • 77749264949
    • Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore GPUs
    • papers, Yokohama, Japan
    • Bernhard Kainz et al., "Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore GPUs," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-9.
    • (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-9
    • Kainz, B.
  • 21
    • 79959706230
    • Fast ray sorting and breadth-first packet traversal for GPU ray tracing
    • Charles Loop and Kirill Garanzha, "Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing," in Eurographics, 2010.
    • (2010) Eurographics
    • Loop, C.; Garanzha, K.
  • 22
    • 80955153103
    • High Quality DXT Compression Using CUDA
    • Ignacio Castaño. (2007, February) High Quality DXT Compression Using CUDA. [Online]. http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/dxtc/doc/cuda-dxtc.pdf
    • (2007) High Quality DXT Compression Using CUDA. [Online]
  • 23
    • 77749295512
    • Real-time parallel hashing on the GPU
    • papers, Yokohama, Japan
    • Dan A Alcantara et al., "Real-time parallel hashing on the GPU," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-9.
    • (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-9
    • Alcantara, D.A.
  • 25
    • 33947607609
    • GPUTeraSort: High performance graphics co-processor sorting for large database management
    • DOI 10.1145/1142473.1142511, SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
    • Naga Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha, "GPUTeraSort: high performance graphics co-processor sorting for large database management," in SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, Chicago, IL, 2006, pp. 325-336. (Pubitemid 46950863)
    • (2006) Proceedings of the ACM SIGMOD International Conference on Management of Data , pp. 325-336
    • Govindaraju, N.; Gray, J.; Kumar, R.; Manocha, D.
  • 26
    • 29844438097
    • Fast and approximate stream mining of quantiles and frequencies using graphics processors
    • DOI 10.1145/1066157.1066227, SIGMOD 2005: Proceedings of the ACM SIGMOD International Conference on Management of Data
    • Naga Govindaraju, Nikunj Raghuvanshi, and Dinesh Manocha, "Fast and approximate stream mining of quantiles and frequencies using graphics processors," in SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, Baltimore, MD, 2005, pp. 611-622. (Pubitemid 43038962)
    • (2005) Proceedings of the ACM SIGMOD International Conference on Management of Data , pp. 611-622
    • Govindaraju, N.K.; Raghuvanshi, N.; Manocha, D.
  • 27
    • 65249086158
    • March of the froblins: Simulation and rendering massive crowds of intelligent and detailed creatures on GPU
    • classes, Los Angeles, CA
    • Jeremy Shopf, Joshua Barczak, Christopher Oat, and Natalya Tatarchuk, "March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU," in SIGGRAPH '08: ACM SIGGRAPH 2008 classes, Los Angeles, CA, 2008, pp. 52-101.
    • (2008) SIGGRAPH '08: ACM SIGGRAPH 2008 , pp. 52-101
    • Shopf, J.; Barczak, J.; Oat, C.; Tatarchuk, N.
  • 29
    • 0020102009
    • A regular layout for parallel adders
    • March
    • R P Brent and H T Kung, "A Regular Layout for Parallel Adders," IEEE Trans. Comput., vol. 31, no. 3, pp. 260-264, March 1982.
    • (1982) IEEE Trans. Comput. , vol.31 , Issue.3 , pp. 260-264
    • Brent, R.P.; Kung, H.T.
  • 30
    • 0025550099
    • Scan primitives for vector computers
    • Siddhartha Chatterjee, Guy Blelloch, and Marco Zagha, "Scan primitives for vector computers," in Supercomputing '90: Proceedings of the 1990 ACM/IEEE conference on Supercomputing, New York, New York, 1990, pp. 666-675. (Pubitemid 21675225)
    • (1990) Proc Supercomput 90 , pp. 666-675
    • Chatterjee, S.; Blelloch, G.E.; Zagha, M.
  • 37
    • 78149268496
    • University of Virginia, Department of Computer Science, Charlottesville, VA, USA, Technical Report CS2009-14
    • Duane Merrill and Andrew Grimshaw, "Parallel Scan for Stream Architectures," University of Virginia, Department of Computer Science, Charlottesville, VA, USA, Technical Report CS2009-14, 2009.
    • (2009) Parallel Scan for Stream Architectures
    • Merrill, D.; Grimshaw, A.
  • 40
    • 0015651305
    • A parallel algorithm for the efficient solution of a general class of recurrence equations
    • Peter M Kogge and Harold S Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. Comput., vol. 22, pp. 786-793, 1973.
    • (1973) IEEE Trans. Comput. , vol.22 , pp. 786-793
    • Kogge, P.M.; Stone, H.S.
  • 43
    • 78650745912
    • GPU-Quicksort: A practical Quicksort algorithm for graphics processors
    • Daniel Cederman and Philippas Tsigas, "GPU-Quicksort: A practical Quicksort algorithm for graphics processors," J. Exp. Algorithmics, vol. 14, pp. 1.4-1.24, 2009.
    • (2009) J. Exp. Algorithmics , vol.14 , pp. 1.4-1.24
    • Cederman, D.; Tsigas, P.
  • 46
    • 0002924004
    • School of Computer Science, Carnegie Mellon University, Technical Report CMU-CS-
    • Guy Blelloch, "Prefix Sums and Their Applications," School of Computer Science, Carnegie Mellon University, Technical Report CMU-CS-90-190, 1990.
    • (1990) Prefix Sums and Their Applications , Technical Report CMU-CS-90-190
    • Blelloch, G.
  • 47
    • 79959727793
    • Thrust. [Online]
    • Thrust. [Online]. http://code.google.com/p/thrust/.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.