SCOPUS Information Search Platform

16th International Conference on High Performance Computing, HiPC 2009 - Proceedings

2009, Pages 463-472

A performance prediction model for the CUDA GPGPU platform

Authors (6): Kothapalli, Kishore (a); Mukherjee, Rishabh (a); Suhail Rehman, M. (a); Patidar, Suryakant (a); Narayanan, P. J. (a); Srinathan, Kannan (a)

(a) International Institute of Information Technology (India)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL POWER; COMPUTING PLATFORM; GENERAL PURPOSE PROGRAMMING; GRAPHICS PROCESSING UNITS; MATRIX MULTIPLICATION; MEMORY ACCESS; MEMORY HIERARCHY; PARALLEL COMPUTING PLATFORM; PERFORMANCE MODEL; PERFORMANCE PREDICTION; PSEUDO CODES; ASYMPTOTIC ANALYSIS; MATHEMATICAL MODELS; PARALLEL ARCHITECTURES; PROGRAM PROCESSORS

EID: 77952204218 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HIPC.2009.5433179 Document Type: Conference Paper

Times cited: 74

References (27)

1
- 0025256474
- A Simple Randomized Parallel Algorithm for List-Ranking
- ANDERSON, R. J., AND MILLER, G. L. A Simple Randomized Parallel Algorithm for List-Ranking. Information Processing Letters 33, 5 (1990), 269-273.
- (1990) Information Processing Letters , vol.33 , Issue.5 , pp. 269-273
- Anderson, R.J.¹ Miller, G.L.²

2
- 34548718683
- On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking
- BADER, D. A., AGARWAL, V., AND MADDURI, K. On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking. In Proc. of IEEE IPDPS (2007), pp. 1-10.
- Proc. of IEEE IPDPS (2007) , pp. 1-10
- Bader, D.A.¹ Agarwal, V.² Madduri, K.³

3
- 0024684158
- Faster Optimal Parallel Prefix sums and List Ranking
- COLE, R., AND VISHKIN, U. Faster Optimal Parallel Prefix sums and List Ranking. Information and Computation 81, 3 (1989), 334-352.
- (1989) Information and Computation , vol.81 , Issue.3 , pp. 334-352
- Cole, R.¹ Vishkin, U.²

4
- 0009346826
- LogP: Towards a Realistic Model of Parallel Computation
- CULLER, D., KARP, R., PATTERSON, D., SAHAY, A., SCHAUSER, K. E., SANTOS, E., SUBRAMONIAN, R., AND VON EICKEN, T. LogP: Towards a Realistic Model of Parallel Computation. In Proc. ACM PPoPP (1993), pp. 1-12.
- Proc. ACM PPoPP (1993) , pp. 1-12
- Culler, D.¹ Karp, R.² Patterson, D.³ Sahay, A.⁴ Schauser, K.E.⁵ Santos, E.⁶ Subramonian, R.⁷ Von Eicken, T.⁸

5
- 0018052202
- Parallelism in Random Access Machines
- FORTUNE, S., AND WYLLIE, J. Parallelism in Random Access Machines. In Proc. ACM STOC (1978), pp. 114-118.
- Proc. ACM STOC (1978) , pp. 114-118
- Fortune, S.¹ Wyllie, J.²

6
- 77952113176
- The Queue-Read Queue-Write Asynchronous PRAM model
- GIBBONS, P. B., MATIAS, Y., AND RAMACHANDRAN, V. The Queue-Read Queue-Write Asynchronous PRAM model. In Proc. of EURO-PAR (1996).
- Proc. of EURO-PAR (1996)
- Gibbons, P.B.¹ Matias, Y.² Ramachandran, V.³

7
- 0032107941
- The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
- GIBBONS, P. B., MATIAS, Y., AND RAMACHANDRAN, V. The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms. SIAM J. Comp. 28, 2 (1999), 733-769.
- (1999) SIAM J. Comp. , vol.28 , Issue.2 , pp. 733-769
- Gibbons, P.B.¹ Matias, Y.² Ramachandran, V.³

8
- 35948931417
- Cache-efficient numerical algorithms using graphics hardware
- DOI 10.1016/j.parco.2007.09.006, PII S0167819107001056, High-Performance Computing Using Accelerators
- GOVINDARAJU, N., AND MANOCHA, D. Cache-efficient Numerical Algorithms using Graphics Hardware. Parallel Computing 33, 10-11 (2007), 663-684.
- (2007) Parallel Computing , vol.33 , Issue.10-11 , pp. 663-684
- Govindaraju, N.K.¹ Manocha, D.²

9
- 58349086140
- Memory Locality Exploitation Strategies for FFT on the CUDA Architecture
- GUTIERREZ, E., ROMERO, S., TRENAS, M. A., AND ZAPATA, E. L. Memory Locality Exploitation Strategies for FFT on the CUDA Architecture. In Proc. of High Perf. Comp. for Comp. Sci. (2008), pp. 430-443.
- Proc. of High Perf. Comp. for Comp. Sci. (2008) , pp. 430-443
- Gutierrez, E.¹ Romero, S.² Trenas, M.A.³ Zapata, E.L.⁴

10
- 38349041620
- Accelerating Large Graph Algorithms on the GPU Using CUDA
- HARISH, P., AND NARAYANAN, P. Accelerating Large Graph Algorithms on the GPU Using CUDA. In High Performance Computing HiPC 2007 (2007), pp. 197-208.
- (2007) High Performance Computing HiPC 2007 , pp. 197-208
- Harish, P.¹ Narayanan, P.²

11
- 84979025439
- Designing Practical Efficient Algorithms for Symmetric Multiprocessors
- HELMAN, D. R., AND JÀJÀ, J. Designing Practical Efficient Algorithms for Symmetric Multiprocessors. In Proc. ALENEX (1999), pp. 37-56.
- Proc. ALENEX (1999) , pp. 37-56
- Helman, D.R.¹ Jàjà, J.²

12
- 70450231944
- An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness
- HONG, S., AND KIM, H. An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness. In ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture (New York, NY, USA, 2009), ACM, pp. 152-163.
- ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture (New York, NY, USA, 2009) , pp. 152-163
- Hong, S.¹ Kim, H.²

13
- 10644274024
- Hardware Accelerated Wavelet Transformations
- HOPF, M., AND ERTL, T. Hardware Accelerated Wavelet Transformations. In Proc. EG Symposium on Visualization (2000), pp. 93-103.
- Proc. EG Symposium on Visualization (2000) , pp. 93-103
- Hopf, M.¹ Ertl, T.²

14
- 0003819667
- Introduction to Parallel Algorithms
- JÀJÀ, J. Introduction to Parallel Algorithms. Addison-Wesley, 1992.
- (1992) Introduction to Parallel Algorithms
- Jàjà, J.¹

15
- 51849108616
- Canny Edge Detection on Nvidia CUDA
- LUO, Y., AND DURAISWAMI, R. Canny Edge Detection on Nvidia CUDA. In Proc. of IEEE Computer Vision and Pattern Recognition (2008), pp. 1-8.
- Proc. of IEEE Computer Vision and Pattern Recognition (2008) , pp. 1-8
- Luo, Y.¹ Duraiswami, R.²

16
- 70449723385
- Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on GPUs
- MENG, J., AND SKADRON, K. Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on GPUs. In ICS '09: Proceedings of the 23rd International Conference on Supercomputing (New York, NY, USA, 2009), ACM, pp. 256-265.
- ICS '09: Proceedings of the 23rd International Conference on Supercomputing (New York, NY, USA, 2009) , pp. 256-265
- Meng, J.¹ Skadron, K.²

17
- 55649109070
- GPU Gems 3
- NGUYEN, H. GPU Gems 3. Addison-Wesley Professional, 2007.
- (2007) GPU Gems 3
- Nguyen, H.¹

18
- 78651550268
- Scalable Parallel Programming with CUDA
- NICKOLLS, J., BUCK, I., GARLAND, M., AND SKADRON, K. Scalable Parallel Programming with CUDA. ACM Queue 6, 2 (2008), 40-53.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

19
- 64449084366
- CUDA: Compute Unified Device Architecture Programming Guide
- NVIDIA CORPORATION. CUDA: Compute Unified Device Architecture Programming Guide. Tech. rep., 2007.
- (2007) CUDA: Compute Unified Device Architecture Programming Guide

20
- 70450200606
- Scalable Split and Gather Primitives for the GPU
- PATIDAR, S., AND NARAYANAN, P. J. Scalable Split and Gather Primitives for the GPU. Tech. Rep. IIIT/TR/2009/99, IIIT-Hyderabad, 2009.
- (2009) Scalable Split and Gather Primitives for the GPU
- Patidar, S.¹ Narayanan, P.J.²

21
- 70449700267
- Fast and Scalable List Ranking on the GPU
- REHMAN, M. S., KOTHAPALLI, K., AND NARAYANAN, P. J. Fast and Scalable List Ranking on the GPU. In ICS '09: Proceedings of the 23rd International Conference on Supercomputing (New York, NY, USA, 2009), ACM, pp. 235-243.
- ICS '09: Proceedings of the 23rd International Conference on Supercomputing (New York, NY, USA, 2009) , pp. 235-243
- Rehman, M.S.¹ Kothapalli, K.² Narayanan, P.J.³

22
- 43449094719
- Program Optimization Space Pruning for a Multithreaded GPU
- RYOO, S., RODRIGUES, C. I., STONE, S., BAGHSORKHI, S. S., UENG, S.-Z., STRATTON, J. A., AND HWU, W. W. Program Optimization Space Pruning for a Multithreaded GPU. In Proc. of the Intl. Symp. Code Gen. and Opt. (2008), pp. 195-204.
- Proc. of the Intl. Symp. Code Gen. and Opt. (2008) , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.W.⁷

23
- 70449793037
- Exploring the Multiple-GPU Design Space
- SCHAA, D., AND KAELI, D. Exploring the Multiple-GPU Design Space. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009), IEEE Computer Society, pp. 1-12.
- IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009) , pp. 1-12
- Schaa, D.¹ Kaeli, D.²

24
- 78651284120
- Scan Primitives for GPU Computing
- SENGUPTA, S., HARRIS, M., ZHANG, Y., AND OWENS, J. D. Scan Primitives for GPU Computing. In Proc. ACM Symp. Graphics Hardware (2007), pp. 97-106.
- Proc. ACM Symp. Graphics Hardware (2007) , pp. 97-106
- Sengupta, S.¹ Harris, M.² Zhang, Y.³ Owens, J.D.⁴

25
- 0025467711
- A Bridging Model for Parallel Computation
- VALIANT, L. G. A Bridging Model for Parallel Computation. Comm. ACM 33, 8 (1990), 103-111.
- (1990) Comm. ACM , vol.33 , Issue.8 , pp. 103-111
- Valiant, L.G.¹

26
- 51849086874
- CUDA Cuts: Fast Graph Cuts on the GPU
- VINEET, V., AND NARAYANAN, P. J. CUDA Cuts: Fast Graph Cuts on the GPU. In Proceedings of the CVPR Workshop on Visual Computer Vision on GPUs (2008).
- Proceedings of the CVPR Workshop on Visual Computer Vision on GPUs (2008)
- Vineet, V.¹ Narayanan, P.J.²

27
- 0242424254
- Hardware-Based Nonlinear Filtering and Segmentation using High-Level Shading Languages
- VIOLA, I., KANITSAR, A., AND GROLLER, E. Hardware-Based Nonlinear Filtering and Segmentation using High-Level Shading Languages. In Proc. IEEE Visualization (2003), pp. 309-316.
- Proc. IEEE Visualization (2003) , pp. 309-316
- Viola, I.¹ Kanitsar, A.² Groller, E.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.