SCOPUS Information Search Platform

Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

Volume , Issue , 2009, Pages 292-299

Program optimization of array-intensive SPEC2k benchmarks on multithreaded GPU using CUDA and Brook+

(4) Wang, Guibin (a); Tang, Tao (a); Fang, Xudong (a); Ren, Xiaoguang (a)

(a) National University of Defense Technology (China)

Author keywords

Brook+; CUDA; GPGPU; mgrid; Optimization; Swim

Indexed keywords

BROOK+; COMPUTING CAPACITY; DATA LOCALITY; DATA PARALLEL; ELIMINATION TECHNOLOGY; EQUILIBRIUM POINT; GENERAL PURPOSE; GRAPHIC PROCESSING UNITS; HARDWARE AND SOFTWARE; LONG MEMORIES; MULTI-LEVEL MEMORY HIERARCHY; MULTITHREADED; PARALLEL COMPUTING; PROGRAM OPTIMIZATION; SOFTWARE PLATFORMS;

COMPUTER GRAPHICS EQUIPMENT; COMPUTER SYSTEMS; MULTIPROCESSING SYSTEMS; OPTIMIZATION; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS;

EID: 77949647837 PISSN: 15219097 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICPADS.2009.12 Document Type: Conference Paper

Times cited: 12

References (15)

1
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA," in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (13th PPoPP 2008). Salt Lake City, UT: ACM SIGPLAN, Feb. 2008, pp. 73-82.

2
- 51449118065
- A performance study of general-purpose applications on graphics processors using CUDA
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, "A performance study of general-purpose applications on graphics processors using CUDA," J. Parallel Distrib. Comput, vol. 68, no. 10, pp. 1370-1380, 2008.
- (2008) J. Parallel Distrib. Comput , vol.68 , Issue.10 , pp. 1370-1380
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

3
- 67650021816
- Solving dense linear systems on platforms with multiple hardware accelerators
- G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, "Solving dense linear systems on platforms with multiple hardware accelerators," in PPOPP, D. A. Reed and V. Sarkar, Eds. ACM, 2009, pp. 121-130.

4
- 35948931417
- Cache-efficient numerical algorithms using graphics hardware
- N. K. Govindaraju and D. Manocha, "Cache-efficient numerical algorithms using graphics hardware," Parallel Comput., vol. 33, no. 10-11, pp. 663-684, 2007.
- (2007) Parallel Comput , vol.33 , Issue.10-11 , pp. 663-684
- Govindaraju, N.K.¹ Manocha, D.²

5
- 10644248153
- Brook for GPUs: Stream computing on graphics hardware
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," ACM Transactions on Graphics, vol. 23, no. 3, pp. 777-786, Aug. 2004.
- (2004) ACM Transactions on Graphics , vol.23 , Issue.3 , pp. 777-786
- Buck, I.¹ Foley, T.² Horn, D.³ Sugerman, J.⁴ Fatahalian, K.⁵ Houston, M.⁶ Hanrahan, P.⁷

6
- 70749131714
- AMD, April
- AMD, "ATI stream computing user guide v1.4beta," April 2009.
- (2009) ATI stream computing user guide v1.4beta

7
- 62949190469
- NVIDIA
- NVIDIA, "Compute unified device architecture programming guide v2.1beta," 2009.
- (2009) Compute unified device architecture programming guide v2.1beta

8
- 24644456455
- Automatic tiling of iterative stencil loops
- Z. Li and Y. Song, "Automatic tiling of iterative stencil loops," ACM Trans. Program. Lang. Syst, vol. 26, no. 6, pp. 975-1028, 2004.
- (2004) ACM Trans. Program. Lang. Syst , vol.26 , Issue.6 , pp. 975-1028
- Li, Z.¹ Song, Y.²

9
- 84877082695
- Identifying and exploiting spatial regularity in data memory references
- T. Mohan, B. R. de Supinski, S. A. McKee, F. Mueller, A. Yoo, and M. Schulz, "Identifying and exploiting spatial regularity in data memory references," in SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing. Washington, DC, USA: IEEE Computer Society, 2003, p. 49.
- (2003) SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing , pp. 49
- Mohan, T.¹ de Supinski, B.R.² McKee, S.A.³ Mueller, F.⁴ Yoo, A.⁵ Schulz, M.⁶

10
- 23544482118
- Tiling optimizations for 3D scientific computations
- G. Rivera and C.-W. Tseng, "Tiling optimizations for 3D scientific computations," in SC, 2000.
- (2000) SC
- Rivera, G.¹ Tseng, C.-W.²

11
- 67650702543
- Architecture-aware optimization targeting multithreaded stream computing
- B. Jang, S. Do, H. Pien, and D. Kaeli, "Architecture-aware optimization targeting multithreaded stream computing," in GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. New York, NY, USA: ACM, 2009, pp. 62-70.
- (2009) GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 62-70
- Jang, B.¹ Do, S.² Pien, H.³ Kaeli, D.⁴

12
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- S. Lee, S.-J. Min, and R. Eigenmann, "OpenMP to GPGPU: a compiler framework for automatic translation and optimization," in PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming. New York, NY, USA: ACM, 2009, pp. 101-110.
- (2009) PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming , pp. 101-110
- Lee, S.¹ Min, S.-J.² Eigenmann, R.³

13
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu, "Program optimization space pruning for a multithreaded GPU," in CGO, M. L. Soffa and E. Duesterwald, Eds. ACM, 2008, pp. 195-204.

14
- 67650784628
- Feedback-driven threading: Power-efficient and high-performance execution of multi-threaded workloads on CMPs
- M. A. Suleman, M. K. Qureshi, and Y. N. Patt, "Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs," SIGARCH Comput. Archit. News, vol. 36, no. 1, pp. 277-286, 2008.
- (2008) SIGARCH Comput. Archit. News , vol.36 , Issue.1 , pp. 277-286
- Suleman, M.A.¹ Qureshi, M.K.² Patt, Y.N.³

15
- 70349169075
- Analyzing CUDA Workloads Using a Detailed GPU Simulator
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2009), April 2009, pp. 163-174.
- (2009) IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2009) , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.