SCOPUS Information Retrieval Platform

Parallel Processing Letters

Volume 21, Issue 2, 2011, Pages 245-272

High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing

Authors (2): Merrill, Duane (a); Grimshaw, Andrew (a)

(a) University of Virginia (United States)

Author keywords

GPU; kernel fusion; Parallel sorting; prefix scan; prefix sum; radix sorting

Indexed keywords

GPU; KERNEL FUSION; PARALLEL SORTING; PREFIX SCAN; PREFIX SUM; RADIX SORTING; ALGORITHMS; COMPUTER ARCHITECTURE; PARALLEL PROCESSING SYSTEMS; PROGRAM PROCESSORS; SORTING

EID: 79959718248 PISSN: 01296264 EISSN: None Source Type: Journal
DOI: 10.1142/S0129626411000187 Document Type: Article

Times cited : (147)

References (47)

1
- 49049088756
- GPU computing
- May
- J D Owens et al., "GPU Computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
- (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
- Owens, J.D.¹

2
- 77954995885
- Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
- Saint-Malo, France
- Victor W Lee et al., "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU," in Proceedings of the 37th annual international symposium on Computer architecture, Saint-Malo, France, 2010, pp. 451-460.
- (2010) Proceedings of the 37th Annual International Symposium on Computer Architecture , pp. 451-460
- Lee, V.W.¹

3
- 85092761228
- On the limits of GPU acceleration
- Berkeley, CA
- Richard Vuduc, Aparna Chandramowlishwaran, Jee Choi, Murat Guney, and Aashay Shringarpure, "On the limits of GPU acceleration," in Proceedings of the 2nd USENIX conference on Hot topics in parallelism (HotPar'10), Berkeley, CA, 2010, pp. 13-13.
- (2010) Proceedings of the 2nd USENIX conference on Hot topics in parallelism (HotPar'10) , pp. 13-13
- Vuduc, R.¹ Chandramowlishwaran, A.² Choi, J.³ Guney, M.⁴ Shringarpure, A.⁵

4
- 0003880013
- Reading, MA, USA: Addison-Wesley
- Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA, USA: Addison-Wesley, 1995.
- (1995) Design Patterns: Elements of Reusable Object-Oriented Software
- Gamma, E.¹ Helm, R.² Johnson, R.³ Vlissides, J.⁴

5
- 70450077484
- Designing efficient sorting algorithms for manycore GPUs
- Nadathur Satish, Mark Harris, and Michael Garland, "Designing efficient sorting algorithms for manycore GPUs," in IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009, pp. 1-10.
- (2009) IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing , pp. 1-10
- Satish, N.¹ Harris, M.² Garland, M.³

6
- 79959717805
- GPGPU.org. [Online].
- GPGPU.org. [Online]. http://gpgpu.org/developer/cudpp

7
- 77954743119
- Nadathur Satish et al., "Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort," 2010, pp. 351-362.
- (2010) Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort , pp. 351-362
- Satish, N.¹

8
- 84865096511
- Efficient implementation of sorting on multi-core SIMD CPU architecture
- Jatin Chhugani et al., "Efficient implementation of sorting on multi-core SIMD CPU architecture," Proc. VLDB Endow., pp. 1313-1324, 2008.
- (2008) Proc. VLDB Endow , pp. 1313-1324
- Chhugani, J.¹

9
- 79959733546
- techreport
- Nadathur Satish et al., "Fast Sort on CPUs, GPUs and Intel MIC Architectures," technical report, 2010.
- (2010) Fast Sort on CPUs, GPUs and Intel MIC Architectures
- Satish, N.¹

10
- 0003657590
- Reading, MA, USA: Addison-Wesley, Sorting and Searching
- Donald Knuth, The Art of Computer Programming. Reading, MA, USA: Addison-Wesley, 1973, vol. III: Sorting and Searching.
- (1973) The Art of Computer Programming , vol.3
- Knuth, D.¹

11
- 0004116989
- 2nd ed.: McGraw-Hill
- Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein, Introduction to Algorithms, 2nd ed.: McGraw-Hill, 2001.
- (2001) Introduction to Algorithms
- Cormen, T.H.¹ Leiserson, C.E.² Rivest, R.L.³ Stein, C.⁴

12
- 78650814463
- Fast PGAS implementation of distributed graph algorithms
- Guojing Cong, George Almasi, and Vijay Saraswat, "Fast PGAS Implementation of Distributed Graph Algorithms," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), 2010, pp. 1-11.
- (2010) Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10) , pp. 1-11
- Cong, G.¹ Almasi, G.² Saraswat, V.³

13
- 78650885777
- Data-parallel octrees for surface reconstruction
- to appear
- Kun Zhou, Minmin Gong, Xin Huang, and Baining Guo, "Data-Parallel Octrees for Surface Reconstruction," IEEE Transactions on Visualization & Computer Graphics, to appear, 2010.
- (2010) IEEE Transactions on Visualization & Computer Graphics
- Zhou, K.¹ Gong, M.² Huang, X.³ Guo, B.⁴

14
- 57749174539
- Real-time KD-tree construction on graphics hardware
- papers, Singapore
- Kun Zhou, Qiming Hou, Rui Wang, and Baining Guo, "Real-time KD-tree construction on graphics hardware," in SIGGRAPH Asia '08: ACM SIGGRAPH Asia 2008 papers, Singapore, 2008, pp. 1-11.
- (2008) SIGGRAPH Asia '08: ACM SIGGRAPH Asia 2008 , pp. 1-11
- Zhou, K.¹ Hou, Q.² Wang, R.³ Guo, B.⁴

15
- 85053417076
- HLBVH: Hierarchical LBVH construction for real-time ray tracing of dynamic geometry
- Saarbrucken, Germany
- J Pantaleoni and D Luebke, "HLBVH: hierarchical LBVH construction for real-time ray tracing of dynamic geometry," in Proceedings of the Conference on High Performance Graphics (HPG '10), Saarbrucken, Germany, 2010, pp. 87-95.
- (2010) Proceedings of the Conference on High Performance Graphics (HPG '10) , pp. 87-95
- Pantaleoni, J.¹ Luebke, D.²

16
- 77950453346
- Real-time approximate sorting for self shadowing and transparency in hair rendering
- Redwood City, California
- Erik Sintorn and Ulf Assarsson, "Real-time approximate sorting for self shadowing and transparency in hair rendering," in I3D '08: Proceedings of the 2008 symposium on Interactive 3D graphics and games, Redwood City, California, 2008, pp. 157-162.
- (2008) I3D '08: Proceedings of the 2008 symposium on Interactive 3D Graphics and Games , pp. 157-162
- Sintorn, E.¹ Assarsson, U.²

17
- 77749271078
- RenderAnts: Interactive reyes rendering on GPUs
- papers, Yokohama, Japan
- Kun Zhou et al., "RenderAnts: interactive Reyes rendering on GPUs," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-11.
- (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-11
- Zhou, K.¹

18
- 77749264949
- Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore GPUs
- papers, Yokohama, Japan
- Bernhard Kainz et al., "Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore GPUs," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-9.
- (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-9
- Kainz, B.¹

19
- 27144467106
- UberFlow: A GPU-based particle engine
- Grenoble, France
- Peter Kipfer, Mark Segal, and Rüdiger Westermann, "UberFlow: a GPU-based particle engine," in HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Grenoble, France, 2004, pp. 115-122.
- (2004) HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics Hardware , pp. 115-122
- Kipfer, P.¹ Segal, M.² Westermann, R.³

20
- 77951267602
- Interactive fluid-particle simulation using translating Eulerian grids
- Washington, D.C.
- Jonathan M Cohen, Sarah Tariq, and Simon Green, "Interactive fluid-particle simulation using translating Eulerian grids," in I3D '10: Proceedings of the 2010 ACM SIGGRAPH symposium on Interactive 3D Graphics and Games, Washington, D.C., 2010, pp. 15-22.
- (2010) I3D '10: Proceedings of the 2010 ACM SIGGRAPH symposium on Interactive 3D Graphics and Games , pp. 15-22
- Cohen, J.M.¹ Tariq, S.² Green, S.³

21
- 79959706230
- Fast ray sorting and breadth-first packet traversal for GPU ray tracing
- Charles Loop and Kirill Garanzha, "Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing," in Eurographics, 2010.
- (2010) Eurographics
- Loop, C.¹ Garanzha, K.²

22
- 80955153103
- Ignacio Castaño. (2007, February)
- Ignacio Castaño. (2007, February) High Quality DXT Compression Using CUDA. [Online]. http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/dxtc/doc/cuda-dxtc.pdf
- (2007) High Quality DXT Compression Using CUDA. [Online]

23
- 77749295512
- Real-time parallel hashing on the GPU
- papers, Yokohama, Japan
- Dan A Alcantara et al., "Real-time parallel hashing on the GPU," in SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 papers, Yokohama, Japan, 2009, pp. 1-9.
- (2009) SIGGRAPH Asia '09: ACM SIGGRAPH Asia 2009 , pp. 1-9
- Alcantara, D.A.¹

24
- 54749089017
- Relational joins on graphics processors
- Vancouver, Canada
- Bingsheng He et al., "Relational joins on graphics processors," in SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Vancouver, Canada, 2008, pp. 511-524.
- (2008) SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of Data , pp. 511-524
- He, B.¹

25
- 33947607609
- GPUTeraSort: High performance graphics co-processor sorting for large database management
- DOI 10.1145/1142473.1142511, SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
- Naga Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha, "GPUTeraSort: high performance graphics co-processor sorting for large database management," in SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, Chicago, IL, 2006, pp. 325-336. (Pubitemid 46950863)
- (2006) Proceedings of the ACM SIGMOD International Conference on Management of Data , pp. 325-336
- Govindaraju, N.¹ Gray, J.² Kumar, R.³ Manocha, D.⁴

26
- 29844438097
- Fast and approximate stream mining of quantiles and frequencies using graphics processors
- DOI 10.1145/1066157.1066227, SIGMOD 2005: Proceedings of the ACM SIGMOD International Conference on Management of Data
- Naga Govindaraju, Nikunj Raghuvanshi, and Dinesh Manocha, "Fast and approximate stream mining of quantiles and frequencies using graphics processors," in SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, Baltimore, MD, 2005, pp. 611-622. (Pubitemid 43038962)
- (2005) Proceedings of the ACM SIGMOD International Conference on Management of Data , pp. 611-622
- Govindaraju, N.K.¹ Raghuvanshi, N.² Manocha, D.³

27
- 65249086158
- March of the froblins: Simulation and rendering massive crowds of intelligent and detailed creatures on GPU
- classes, Los Angeles, CA
- Jeremy Shopf, Joshua Barczak, Christopher Oat, and Natalya Tatarchuk, "March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU," in SIGGRAPH '08: ACM SIGGRAPH 2008 classes, Los Angeles, CA, 2008, pp. 52-101.
- (2008) SIGGRAPH '08: ACM SIGGRAPH 2008 , pp. 52-101
- Shopf, J.¹ Barczak, J.² Oat, C.³ Tatarchuk, N.⁴

28
- 34247381686
- Cache miss behavior: Is it √2?
- Ischia, Italy
- A Hartstein, V Srinivasan, T R Puzak, and P G Emma, "Cache miss behavior: is it √2?," in CF '06: Proceedings of the 3rd conference on Computing frontiers, Ischia, Italy, 2006, pp. 313-320.
- (2006) CF '06: Proceedings of the 3rd conference on Computing Frontiers , pp. 313-320
- Hartstein, A.¹ Srinivasan, V.² Puzak, T.R.³ Emma, P.G.⁴

29
- 0020102009
- A regular layout for parallel adders
- March
- R P Brent and H T Kung, "A Regular Layout for Parallel Adders," IEEE Trans. Comput., vol. 31, no. 3, pp. 260-264, March 1982.
- (1982) IEEE Trans. Comput. , vol.31 , Issue.3 , pp. 260-264
- Brent, R.P.¹ Kung, H.T.²

30
- 0025550099
- Scan primitives for vector computers
- Siddhartha Chatterjee, Guy Blelloch, and Marco Zagha, "Scan primitives for vector computers," in Supercomputing '90: Proceedings of the 1990 ACM/IEEE conference on Supercomputing, New York, New York, 1990, pp. 666-675. (Pubitemid 21675225)
- (1990) Proc Supercomput 90 , pp. 666-675
- Chatterjee, S.¹ Blelloch, G.E.² Zagha, M.³

31
- 0030216116
- Fast parallel sorting under LogP: Experience with the CM-5
- Andrea C Dusseau, David E Culler, Klaus E Schauser, and Richard P Martin, "Fast Parallel Sorting Under LogP: Experience with the CM-5," IEEE Trans. Parallel Distrib. Syst., vol. 7, pp. 791-805, 1996. (Pubitemid 126784485)
- (1996) IEEE Transactions on Parallel and Distributed Systems , vol.7 , Issue.8 , pp. 791-805
- Dusseau, A.C.¹ Culler, D.E.² Schauser, K.E.³ Martin, R.P.⁴

32
- 0026310281
- Radix sort for vector multiprocessors
- Albuquerque, NM
- Marco Zagha and Guy Blelloch, "Radix sort for vector multiprocessors," in Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Albuquerque, NM, 1991, pp. 712-721.
- (1991) Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing , pp. 712-721
- Zagha, M.¹ Blelloch, G.²

33
- 78651284120
- Scan primitives for GPU computing
- San Diego, CA
- Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens, "Scan Primitives for GPU Computing," in GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, San Diego, CA, 2007, pp. 97-106.
- (2007) GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics Hardware , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

34
- 56849107345
- Efficient gather and scatter operations on graphics processors
- Reno, NV
- Bingsheng He, Naga K Govindaraju, Qiong Luo, and Burton Smith, "Efficient gather and scatter operations on graphics processors," in SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, Reno, NV, 2007, pp. 1-12.
- (2007) SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing , pp. 1-12
- He, B.¹ Govindaraju, N.K.² Luo, Q.³ Smith, B.⁴

35
- 77952833958
- Efficient parallel scan algorithms for GPUs
- Shubhabrata Sengupta, Mark Harris, and Michael Garland, "Efficient Parallel Scan Algorithms for GPUs," NVIDIA, Technical Report NVR-2008-003, 2008.
- (2008) NVIDIA, Technical Report NVR-2008-003
- Sengupta, S.¹ Harris, M.² Garland, M.³

36
- 57349184047
- Fast scan algorithms on graphics processors
- Island of Kos, Greece
- Yuri Dotsenko, Naga K Govindaraju, Peter-Pike Sloan, Charles Boyd, and John Manferdelli, "Fast scan algorithms on graphics processors," in ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Island of Kos, Greece, 2008, pp. 205-213.
- (2008) ICS '08: Proceedings of the 22nd annual international conference on Supercomputing , pp. 205-213
- Dotsenko, Y.¹ Govindaraju, N.K.² Sloan, P.-P.³ Boyd, C.⁴ Manferdelli, J.⁵

37
- 78149268496
- University of Virginia, Department of Computer Science, Charlottesville, VA, USA, Technical Report CS2009-14
- Duane Merrill and Andrew Grimshaw, "Parallel Scan for Stream Architectures," University of Virginia, Department of Computer Science, Charlottesville, VA, USA, Technical Report CS2009-14, 2009.
- (2009) Parallel Scan for Stream Architectures
- Merrill, D.¹ Grimshaw, A.²

38
- 0026984897
- Solving linear recurrences with loop raking
- Guy E Blelloch, Siddhartha Chatterjee, and Marco Zagha, "Solving Linear Recurrences with Loop Raking," in Proceedings of the 6th International Parallel Processing Symposium, 1992, pp. 416-424. (Pubitemid 23612403)
- (1992) Proceedings of the International Conference on Parallel Processing , pp. 416-424
- Blelloch, G.E.¹ Chatterjee, S.² Zagha, M.³

39
- 67650661447
- Mark Harris. (2007) Optimizing parallel reduction in CUDA. [Online]. http://developer.download.nvidia.com/compute/cuda/1-1/Website/projects/reduction/doc/reduction.pdf
- (2007) Optimizing parallel reduction in CUDA. [Online]
- Harris, M.¹

40
- 0015651305
- A parallel algorithm for the efficient solution of a general class of recurrence equations
- Peter M Kogge and Harold S Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. Comput., vol. 22, pp. 786-793, 1973.
- (1973) IEEE Trans. Comput. , vol.22 , pp. 786-793
- Kogge, P.M.¹ Stone, H.S.²

41
- 78149265385
- University of Virginia, Charlottesville, VA, Technical Report CS2010-03
- Duane Merrill and Andrew Grimshaw, "Revisiting Sorting for GPGPU Stream Architectures," University of Virginia, Charlottesville, VA, Technical Report CS2010-03, 2010.
- (2010) Revisiting Sorting for GPGPU Stream Architectures
- Merrill, D.¹ Grimshaw, A.²

42
- 84978498075
- An improved supercomputer sorting benchmark
- Minneapolis, Minnesota
- K Thearling and S Smith, "An improved supercomputer sorting benchmark," in Proceedings of the 1992 ACM/IEEE conference on Supercomputing (SC '92), Minneapolis, Minnesota, 1992, pp. 14-19.
- (1992) Proceedings of the 1992 ACM/IEEE conference on Supercomputing (SC '92) , pp. 14-19
- Thearling, K.¹ Smith, S.²

43
- 78650745912
- GPU-Quicksort: A practical Quicksort algorithm for graphics processors
- Daniel Cederman and Philippas Tsigas, "GPU-Quicksort: A practical Quicksort algorithm for graphics processors," J. Exp. Algorithmics, vol. 14, pp. 1.4-1.24, 2009.
- (2009) J. Exp. Algorithmics , vol.14 , pp. 1.4-1.24
- Cederman, D.¹ Tsigas, P.²

44
- 77954709551
- Nikolaj Leischner, Vitaly Osipov, and Peter Sanders, GPU sample sort, 2009.
- (2009) GPU Sample Sort
- Leischner, N.¹ Osipov, V.² Sanders, P.³

45
- 79959760800
- Frank Dehne and Hamidreza Zaboli, Deterministic Sample Sort For GPUs, 2010.
- (2010) Deterministic Sample Sort For GPUs
- Dehne, F.¹ Zaboli, H.²

46
- 0002924004
- School of Computer Science, Carnegie Mellon University, Technical Report CMU-CS-90-190
- Guy Blelloch, "Prefix Sums and Their Applications," School of Computer Science, Carnegie Mellon University, Technical Report CMU-CS-90-190, 1990.
- (1990) Prefix Sums and Their Applications , Technical Report CMU-CS-90-190
- Blelloch, G.¹

47
- 79959727793
- Thrust. [Online]
- Thrust. [Online]. http://code.google.com/p/thrust/.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.