SCOPUS Information Retrieval Platform

Proceedings of the International Conference on Parallel Processing

Volume 2015-December, 2015, Pages 210-219

In-place data sliding algorithms for many-core architectures

(5) Gómez-Luna, Juan (a); Chang, Li-Wen (b); Sung, I-Jui (c); Hwu, Wen-Mei (b); Guil, Nicolás (d)

a UNIVERSITY OF CÓRDOBA (Spain)

b UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

c MulticoreWare Inc (United States)

d UNIVERSITY OF MÁLAGA (Spain)

Author keywords

In place; Relational algebra; Stream compaction

Indexed keywords

ALGEBRA; ALGORITHMS; COMPACTION; MEMORY ARCHITECTURE; PROGRAM PROCESSORS;

BULK SYNCHRONOUS PARALLEL MODEL; DATA MANIPULATIONS; DATA MOVEMENTS; MANY-CORE ARCHITECTURE; MEMORY REQUIREMENTS; ON-BOARD MEMORY; RELATIONAL ALGEBRA; STATE OF THE ART;

COMPUTER ARCHITECTURE;

EID: 84976501593; PISSN: 01903918; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/ICPP.2015.30; Document Type: Conference Paper

Times cited: 16

References (23)

1
- 84855693023
- Speeding up the evaluation phase of GP classification algorithms on GPUs
- A. Cano, A. Zafra, and S. Ventura, "Speeding up the evaluation phase of GP classification algorithms on GPUs," Soft Computing, vol. 16, no. 2, pp. 187-202, 2012.
- (2012) Soft Computing, vol. 16, Issue 2, pp. 187-202
- Cano, A.¹ Zafra, A.² Ventura, S.³

2
- 70449723384
- Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
- S. Venkatasubramanian and R. W. Vuduc, "Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems," in Proceedings of the 23rd International Conference on Supercomputing, 2009, pp. 244-255.
- (2009) Proceedings of the 23rd International Conference on Supercomputing, pp. 244-255
- Venkatasubramanian, S.¹ Vuduc, R.W.²

3
- 84885199802
- Relational algorithms for multi-bulk-synchronous processors
- G. Diamos, H. Wu, J. Wang, A. Lele, and S. Yalamanchili, "Relational algorithms for multi-bulk-synchronous processors," in Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013, pp. 301-302.
- (2013) Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 301-302
- Diamos, G.¹ Wu, H.² Wang, J.³ Lele, A.⁴ Yalamanchili, S.⁵

4
- 84942543488
- A portable benchmark suite for highly parallel data intensive query processing
- I. Saeed, J. Young, and S. Yalamanchili, "A portable benchmark suite for highly parallel data intensive query processing," in Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications, 2015, pp. 31-38.
- (2015) Proceedings of the 2nd Workshop on Parallel Programming for Analytics Applications, pp. 31-38
- Saeed, I.¹ Young, J.² Yalamanchili, S.³

5
- 63449109979
- Fast BVH construction on GPUs
- C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, and D. Manocha, "Fast BVH construction on GPUs," Computer Graphics Forum, vol. 28, no. 2, 2009.
- (2009) Computer Graphics Forum, vol. 28, Issue 2
- Lauterbach, C.¹ Garland, M.² Sengupta, S.³ Luebke, D.⁴ Manocha, D.⁵

6
- 38149002407
- Whitted raytracing for dynamic scenes using a ray-space hierarchy on the GPU
- D. Roger, U. Assarsson, and N. Holzschuch, "Whitted raytracing for dynamic scenes using a ray-space hierarchy on the GPU," in Proceedings of the 18th Eurographics Conference on Rendering Techniques, 2007, pp. 99-110.
- (2007) Proceedings of the 18th Eurographics Conference on Rendering Techniques, pp. 99-110
- Roger, D.¹ Assarsson, U.² Holzschuch, N.³

7
- 79955825340
- Load balancing versus occupancy maximization on graphics processing units: The generalized hough transform as a case study
- J. Gómez-Luna, J. M. González-Linares, J. Ignacio Benavides, E. L. Zapata, and N. Guil, "Load balancing versus occupancy maximization on graphics processing units: The generalized hough transform as a case study," International Journal of High Performance Computing Applications, vol. 25, no. 2, pp. 205-222, 2011.
- (2011) International Journal of High Performance Computing Applications, vol. 25, Issue 2, pp. 205-222
- Gómez-Luna, J.¹ González-Linares, J.M.² Ignacio Benavides, J.³ Zapata, E.L.⁴ Guil, N.⁵

8
- 84865331893
- Algorithm and data optimization techniques for scaling to massively threaded systems
- J. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu, W.-M. Hwu, and N. Obeid, "Algorithm and data optimization techniques for scaling to massively threaded systems," Computer, vol. 45, no. 8, pp. 26-32, 2012.
- (2012) Computer, vol. 45, Issue 8, pp. 26-32
- Stratton, J.¹ Rodrigues, C.² Sung, I.-J.³ Chang, L.-W.⁴ Anssari, N.⁵ Liu, G.⁶ Hwu, W.-M.⁷ Obeid, N.⁸

9
- 84896811291
- A decomposition for in-place matrix transposition
- B. Catanzaro, A. Keller, and M. Garland, "A decomposition for in-place matrix transposition," in Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014, pp. 193-206.
- (2014) Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 193-206
- Catanzaro, B.¹ Keller, A.² Garland, M.³

10
- 84976494728
- In-place matrix transposition on GPUs
- J. Gómez-Luna, I. Sung, L.-W. Chang, J. González-Linares, N. Guil, and W.-M. W. Hwu, "In-place matrix transposition on GPUs," Parallel and Distributed Systems, IEEE Transactions on, vol. PP, no. 99, pp. 1-1, 2015.
- (2015) Parallel and Distributed Systems, IEEE Transactions on, vol. PP, Issue 99, pp. 1
- Gómez-Luna, J.¹ Sung, I.² Chang, L.-W.³ González-Linares, J.⁴ Guil, N.⁵ Hwu, W.-M.W.⁶

11
- 84896808494
- Ph.D. dissertation, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering
- I.-J. Sung, "Data layout transformation through in-place transposition," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, 2013.
- (2013) Data Layout Transformation Through In-place Transposition
- Sung, I.-J.¹

12
- 84882564541
- Thrust: A productivity-oriented library for CUDA
- N. Bell and J. Hoberock, "Thrust: A productivity-oriented library for CUDA," GPU Computing Gems: Jade Edition, 2012.
- (2012) GPU Computing Gems: Jade Edition
- Bell, N.¹ Hoberock, J.²

13
- 0029492798
- Transposing a matrix on a vector computer
- M. Dow, "Transposing a matrix on a vector computer," Parallel Computing, vol. 21, no. 12, pp. 1997-2005, 1995.
- (1995) Parallel Computing, vol. 21, Issue 12, pp. 1997-2005
- Dow, M.¹

14
- 84875175606
- StreamScan: Fast scan algorithms for GPUs without global barrier synchronization
- S. Yan, G. Long, and Y. Zhang, "StreamScan: Fast scan algorithms for GPUs without global barrier synchronization," in Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013, pp. 229-238.
- (2013) Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 229-238
- Yan, S.¹ Long, G.² Zhang, Y.³

15
- 84946479869
- Master's thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering
- L.-W. Chang, "Scalable parallel tridiagonal algorithms with diagonal pivoting and their optimization for many-core architectures," Master's thesis, University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, 2014.
- (2014) Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-core Architectures
- Chang, L.-W.¹

16
- 57349184047
- Fast scan algorithms on graphics processors
- Y. Dotsenko, N. K. Govindaraju, P.-P. Sloan, C. Boyd, and J. Manferdelli, "Fast scan algorithms on graphics processors," in Proceedings of the 22nd Annual International Conference on Supercomputing, 2008, pp. 205-213.
- (2008) Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 205-213
- Dotsenko, Y.¹ Govindaraju, N.K.² Sloan, P.-P.³ Boyd, C.⁴ Manferdelli, J.⁵

17
- 67650661447
- NVIDIA CUDA SDK. NVIDIA
- M. Harris, "Optimizing parallel reduction in CUDA," in NVIDIA CUDA SDK. NVIDIA, 2007.
- (2007) Optimizing Parallel Reduction in CUDA
- Harris, M.¹

18
- 0002924004
- Carnegie Mellon University, Technical Report CMU-CS-90-190
- G. E. Blelloch, "Prefix sums and their applications," Carnegie Mellon University, Technical Report CMU-CS-90-190, 1990.
- (1990) Prefix Sums and Their Applications
- Blelloch, G.E.¹

19
- 84877899022
- Optimizing parallel prefix operations for the Fermi architecture
- M. Harris and M. Garland, "Optimizing parallel prefix operations for the Fermi architecture," GPU Computing Gems: Jade Edition, 2012.
- (2012) GPU Computing Gems: Jade Edition
- Harris, M.¹ Garland, M.²

20
- 84964843902
- Kepler shuffle: Tips and tricks
- Julien Demouth, "Kepler shuffle: Tips and tricks," in GPU Technology Conference, 2013.
- (2013) GPU Technology Conference
- Demouth, J.¹

21
- 84961314978
- Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures
- H.-S. Kim, I. El Hajj, J. Stratton, S. Lumetta, and W.-M. Hwu, "Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures," in Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015, pp. 257-268.
- (2015) Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 257-268
- Kim, H.-S.¹ El Hajj, I.² Stratton, J.³ Lumetta, S.⁴ Hwu, W.-M.⁵

22
- 84976513969
- Andrew Adinetz, "CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics," 2014, http://devblogs.nvidia.com/parallelforall/CUDA-pro-tip-optimized-filtering-warp-aggregated-atomics/.
- (2014) CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics
- Adinetz, A.¹

23
- 77953983493
- Inter-block GPU communication via fast barrier synchronization
- S. Xiao and W.-c. Feng, "Inter-block GPU communication via fast barrier synchronization," in Parallel Distributed Processing, 2010 IEEE International Symposium on, 2010, pp. 1-12.
- (2010) Parallel Distributed Processing, 2010 IEEE International Symposium on, pp. 1-12
- Xiao, S.¹ Feng, W.-C.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.