SCOPUS Information Retrieval Platform

Concurrency and Computation: Practice and Experience

Volume 23, Issue 7, 2011, Pages 681-693

Fast in-place, comparison-based sorting with CUDA: A study with bitonic sort

(3) Peters, Hagen (a); Schulz-Hildebrandt, Ole (a); Luttenberger, Norbert (a)

(a) University of Kiel (Germany)

Author keywords

bitonic sort; CUDA; GPGPU; parallel sorting

Indexed keywords

MEMORY ARCHITECTURE; NETWORK ARCHITECTURE; PARALLEL PROCESSING SYSTEMS; PROGRAM PROCESSORS;

BITONIC SORT; CUDA; CUDA (COMPUTE UNIFIED DEVICE ARCHITECTURE); GENERAL-PURPOSE COMPUTING; GPGPU; HIGH PROCESSING POWER; PARALLEL SORTING; PARALLEL SORTING ALGORITHMS;

GRAPHICS PROCESSING UNIT;

EID: 79953269092 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.1686 Document Type: Article

Times cited : (31)

References (24)

1
- 0034459255
- Efficient conditional operations for data-parallel architectures
- Kapasi UJ, Dally WJ, Rixner S, Mattson PR, Owens JD, Khailany B. Efficient conditional operations for data-parallel architectures. MICRO 33: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM: New York, NY, U.S.A., 2000; 159-170. (Pubitemid 32255840)
- (2000) Proceedings of the Annual International Symposium on Microarchitecture , pp. 159-170
- Kapasi, U.J.¹ Dally, W.J.² Rixner, S.³ Mattson, P.R.⁴ Owens, J.D.⁵ Khailany, B.⁶

2
- 10444224900
- Photon mapping on programmable graphics hardware
- Eurographics Association, Aire-la-Ville, Switzerland
- Purcell TJ, Donner C, Cammarano M, Wann Jensen H, Hanrahan P. Photon mapping on programmable graphics hardware. HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware. Eurographics Association, Aire-la-Ville, Switzerland, 2003; 41-50.
- (2003) HWWS '03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware , pp. 41-50
- Purcell, T.J.¹ Donner, C.² Cammarano, M.³ Wann Jensen, H.⁴ Hanrahan, P.⁵

3
- 27144467106
- Uberflow: A GPU-based particle engine
- ACM: New York, NY, U.S.A
- Kipfer P, Segal M, Westermann R. Uberflow: A GPU-based particle engine. HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware. ACM: New York, NY, U.S.A., 2004; 115-122.
- (2004) HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware , pp. 115-122
- Kipfer, P.¹ Segal, M.² Westermann, R.³

4
- 29844445666
- University of North Carolina-Chapel Hill
- Govindaraju N, Raghuvanshi N, Henson M, Manocha D. A cache-efficient sorting algorithm for database and data mining computations using graphics processors. Technical Report, University of North Carolina-Chapel Hill, 2005.
- (2005) A Cache-efficient Sorting Algorithm for Database and Data Mining Computations Using Graphics Processors. Technical Report
- Govindaraju, N.¹ Raghuvanshi, N.² Henson, M.³ Manocha, D.⁴

5
- 33847140219
- GPU-ABiSort: Optimal parallel sorting on stream architectures
- Rhodes Island, Greece
- Greb A, Zachmann G. GPU-ABiSort: Optimal parallel sorting on stream architectures. 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, 2006.
- (2006) 20th International Parallel and Distributed Processing Symposium (IPDPS 2006)
- Greb, A.¹ Zachmann, G.²

6
- 0000773795
- Sorting networks and their applications
- Atlantic City, U.S.A.
- Batcher KE. Sorting networks and their applications. AFIPS Spring Joint Computer Conference, Atlantic City, U.S.A., 1967.
- (1967) AFIPS Spring Joint Computer Conference
- Batcher, K.E.¹

7
- 36049035884
- Parallel prefix sum (scan) with CUDA
- Addison-Wesley: Reading, MA
- Harris M, Sengupta S, Owens JD. Parallel prefix sum (scan) with CUDA. GPU Gems 3. Addison-Wesley: Reading, MA, 2007.
- (2007) GPU Gems 3
- Harris, M.¹ Sengupta, S.² Owens, J.D.³

8
- 46749134899
- Broad-phase collision detection with CUDA
- Addison-Wesley: Reading, MA
- Grand SL. Broad-phase collision detection with CUDA. GPU Gems 3. Addison-Wesley: Reading, MA, 2007.
- (2007) GPU Gems 3
- Grand, S.L.¹

9
- 56849107345
- Efficient gather and scatter operations on graphics processors
- ACM: New York, NY, U.S.A
- He B, Govindaraju NK, Luo Q, Smith B. Efficient gather and scatter operations on graphics processors. SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. ACM: New York, NY, U.S.A., 2007; 1-12.
- (2007) SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing , pp. 1-12
- He, B.¹ Govindaraju, N.K.² Luo, Q.³ Smith, B.⁴

10
- 78651284120
- Scan primitives for GPU computing
- Eurographics Association, Aire-la-Ville, Switzerland
- Sengupta S, Harris M, Zhang Y, Owens JD. Scan primitives for GPU computing. GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware. Eurographics Association, Aire-la-Ville, Switzerland, 2007; 97-106.
- (2007) GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

11
- 57749169896
- A practical quicksort algorithm for graphics processors
- Springer: Berlin, Heidelberg
- Cederman D, Tsigas P. A practical quicksort algorithm for graphics processors. ESA '08: Proceedings of the 16th Annual European Symposium on Algorithms. Springer: Berlin, Heidelberg, 2008; 246-258.
- (2008) ESA '08: Proceedings of the 16th Annual European Symposium on Algorithms , pp. 246-258
- Cederman, D.¹ Tsigas, P.²

12
- 51449089689
- Academic Press, Inc.: Orlando, FL, U.S.A
- Sintorn E, Assarsson U. Fast Parallel GPU-sorting using a Hybrid Algorithm, vol. 68. Academic Press, Inc.: Orlando, FL, U.S.A., 2008; 1381-1388.
- (2008) Fast Parallel GPU-sorting Using A Hybrid Algorithm , vol.68 , pp. 1381-1388
- Sintorn, E.¹ Assarsson, U.²

13
- 70450077484
- Designing efficient sorting algorithms for many core GPUs
- Rome, Italy, May
- Satish N, Harris M, Garland M. Designing efficient sorting algorithms for many core GPUs. Proceedings 23rd IEEE International Parallel and Distributed Processing Symposium, Rome, Italy, May 2009.
- (2009) Proceedings 23rd IEEE International Parallel and Distributed Processing Symposium
- Satish, N.¹ Harris, M.² Garland, M.³

14
- 77954709551
- Leischner N, Osipov V, Sanders P. GPU sample sort. CoRR; abs/0909.5649, 2009.
- (2009) GPU Sample Sort. CoRR; abs/0909.5649
- Leischner, N.¹ Osipov, V.² Sanders, P.³

15
- 77954068819
- Fast in-place sorting with CUDA based on bitonic sort
- Wroclaw, Poland, September
- Peters H, Schulz-Hildebrandt O, Luttenberger N. Fast in-place sorting with CUDA based on bitonic sort. PPAM09: Proceedings of the International Conference on Parallel Processing and Applied Mathematics, Wroclaw, Poland, September 2009.
- (2009) PPAM09: Proceedings of the International Conference on Parallel Processing and Applied Mathematics
- Peters, H.¹ Schulz-Hildebrandt, O.² Luttenberger, N.³

16
- 77954082286
- Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead
- Atlanta, U.S.A., April
- Peters H, Schulz-Hildebrandt O, Luttenberger N. Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead. IPDPS2010: Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium, Workshops and Phd Forum, Atlanta, U.S.A., April 2010.
- (2010) IPDPS2010: Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium, Workshops and Phd Forum
- Peters, H.¹ Schulz-Hildebrandt, O.² Luttenberger, N.³

17
- 35948991669
- Available at: [December ]
- NVIDIA. Nvidia CUDA programming guide. Available at: [December 2010].
- (2010) NVIDIA. Nvidia CUDA Programming Guide

18
- 63549097654
- Mars: A MapReduce framework on graphics processors
- ACM: New York, NY, U.S.A
- He B, Fang W, Luo Q, Govindaraju NK, Wang T. Mars: A MapReduce framework on graphics processors. PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM: New York, NY, U.S.A., 2008; 260-269.
- (2008) PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques , pp. 260-269
- He, B.¹ Fang, W.² Luo, Q.³ Govindaraju, N.K.⁴ Wang, T.⁵

19
- 85030321143
- Google Inc. MapReduce: Simplified data processing on large clusters
- New York, U.S.A., USENIX Association
- Dean J, Ghemawat S. Google Inc. MapReduce: Simplified data processing on large clusters. OSDI'04: Proceedings of the Sixth Conference on Symposium on Operating Systems Design and Implementation, New York, U.S.A., USENIX Association, 2004.
- (2004) OSDI'04: Proceedings of the Sixth Conference on Symposium on Operating Systems Design and Implementation
- Dean, J.¹ Ghemawat, S.²

20
- 77952245006
- Accelerating SQL database operations on a GPU with CUDA
- ACM: New York, NY, U.S.A
- Bakkum P, Skadron K. Accelerating SQL database operations on a GPU with CUDA. GPGPU'10: Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units. ACM: New York, NY, U.S.A., 2010; 94-103.
- (2010) GPGPU'10: Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units , pp. 94-103
- Bakkum, P.¹ Skadron, K.²

21
- 33947607609
- GPUTeraSort: High performance graphics co-processor sorting for large database management
- DOI 10.1145/1142473.1142511, SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
- Govindaraju N, Gray J, Kumar R, Manocha D. GPUTeraSort: High performance graphics co-processor sorting for large database management. SIGMOD'06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. ACM: New York, NY, U.S.A., 2006; 325-336. (Pubitemid 46950863)
- (2006) Proceedings of the ACM SIGMOD International Conference on Management of Data , pp. 325-336
- Govindaraju, N.¹ Gray, J.² Kumar, R.³ Manocha, D.⁴

22
- 80051640506
- Available at: [December ]
- NVIDIA. Fermi compute architecture white paper. Available at: [December 2010].
- (2010) NVIDIA. Fermi Compute Architecture White Paper

23
- 79953282067
- [April 2000 ]
- [April 2000 ].

24
- 84865096511
- Efficient implementation of sorting on multi-core SIMD CPU architecture
- Chhugani J, Nguyen AD, Lee VW, Macy W, Hagog M, Chen Y-K, Baransi A, Kumar S, Dubey P. Efficient implementation of sorting on multi-core SIMD CPU architecture. Proceedings of the VLDB Endowment 2008; 1 (2): 1313-1324.
- (2008) Proceedings of the VLDB Endowment , vol.1 , Issue.2 , pp. 1313-1324
- Chhugani, J.¹ Nguyen, A.D.² Lee, V.W.³ Macy, W.⁴ Hagog, M.⁵ Chen, Y.-K.⁶ Baransi, A.⁷ Kumar, S.⁸ Dubey, P.⁹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.