SCOPUS Information Search Platform

Journal of Parallel and Distributed Computing

Volume 72, Issue 9, 2012, Pages 1117-1126

Performance models for asynchronous data transfers on consumer Graphics Processing Units

(4) Gómez Luna, Juan (a); González Linares, José María (b); Benavides, José Ignacio (a); Guil, Nicolás (b)

a UNIVERSITY OF CÓRDOBA (Spain)

b UNIVERSITY OF MÁLAGA (Spain)

Author keywords

Asynchronous transfers; CUDA; GPU; Overlapping of communication and computation; Streams

Indexed keywords

APPLICATION PROGRAMMING INTERFACES (API); APPLICATION PROGRAMS; COMPUTER GRAPHICS; COMPUTER GRAPHICS EQUIPMENT; DATA TRANSFER; MEMORY ARCHITECTURE; PROGRAM PROCESSORS; SOFTWARE DESIGN;

ASYNCHRONOUS DATA TRANSFERS; ASYNCHRONOUS TRANSFERS; COMPUTE UNIFIED DEVICE ARCHITECTURE(CUDA); CUDA; HIGH-PERFORMANCE COMPUTING APPLICATIONS; PERFORMANCE BOTTLENECKS; SOFTWARE DEVELOPMENT KIT; STREAMS;

GRAPHICS PROCESSING UNIT;

EID: 84865705401 PISSN: 07437315 EISSN: None Source Type: Journal
DOI: 10.1016/j.jpdc.2011.07.011 Document Type: Article

Times cited : (38)

References (20)

1
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- New York, NY, USA
- Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu, An adaptive performance modeling tool for GPU architectures, in: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, ACM, New York, NY, USA, 2010, pp. 105-114.
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10, ACM , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

2
- 70350647423
- Parallelization of a video segmentation algorithm on CUDA enabled graphics processing units
- Juan Gómez-Luna, José María González-Linares, José Ignacio Benavides, Nicolás Guil, Parallelization of a video segmentation algorithm on CUDA-enabled graphics processing units, in: Proc. of the Int'l Euro-Par Conference on Parallel Processing, EuroPar'09, 2009, pp. 924-935.
- (2009) Proc. of the Int'l Euro-Par Conference on Parallel Processing, EuroPar'09 , pp. 924-935
- Gómez-Luna, J.¹ María González-Linares, J.² Ignacio Benavides, J.³ Guil, N.⁴

3
- 85030495350
- Performance models for CUDA streams on NVIDIA GeForce series
- Juan Gómez-Luna, José María González-Linares, José Ignacio Benavides, Nicolás Guil, Performance models for CUDA streams on NVIDIA GeForce series, Technical Report, University of Málaga, 2011. http://www.ac.uma.es/~vip/publications/UMA-DAC-11-02.pdf.
- (2011) Technical Report, University of Málaga
- Gómez-Luna, J.¹ María González-Linares, J.² Ignacio Benavides, J.³ Guil, N.⁴

4
- 70449464746
- Using GPU to accelerate cache simulation
- Wan Han, Gao Xiaopeng, Wang Zhiqiang, Li Yi, Using GPU to accelerate cache simulation, in: IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009, pp. 565-570.
- (2009) IEEE International Symposium on Parallel and Distributed Processing with Applications , pp. 565-570
- Wan, H.¹ Gao, X.² Wang, Z.³ Yi, L.⁴

5
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- ACM, New York, NY, USA
- Sunpyo Hong, Hyesoon Kim, An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, in: Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA'09, ACM, New York, NY, USA, 2009, pp. 152-163.
- (2009) Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA'09 , pp. 152-163
- Hong, S.¹ Kim, H.²

6
- 79953071805
- Sponge: Portable stream programming on graphics engines
- ACM, New York, NY, USA
- Amir H. Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge, Scott Mahlke, Sponge: portable stream programming on graphics engines, in: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11, ACM, New York, NY, USA, 2011, pp. 381-392.
- (2011) Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11 , pp. 381-392
- Hormati, A.H.¹ Samadi, M.² Woh, M.³ Mudge, T.⁴ Mahlke, S.⁵

7
- 85030487244
- Khronos Group, OpenCL. http://www.khronos.org/opencl/.

8
- 77954574912
- Stream experiments: Toward latency hiding in GPGPU
- Supada Laosooksathit, Chokchai B. Leangsuksun, Abdelkader Baggag, Clayton F. Chandler, Stream experiments: toward latency hiding in GPGPU, in: Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN'10, 2010, pp. 240-248.
- (2010) Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN'10 , pp. 240-248
- Laosooksathit, S.¹ Leangsuksun, C.B.² Baggag, A.³ Chandler, C.F.⁴

9
- 77954725202
- Overlapping communication and computation by using a hybrid MPI/SMPSS approach
- ACM, New York, NY, USA
- Vladimir Marjanović, Jesús Labarta, Eduard Ayguadé, Mateo Valero, Overlapping communication and computation by using a hybrid MPI/SMPSS approach, in: Proceedings of the 24th ACM International Conference on Supercomputing, ICS'10, ACM, New York, NY, USA, 2010, pp. 5-16.
- (2010) Proceedings of the 24th ACM International Conference on Supercomputing, ICS'10 , pp. 5-16
- Marjanović, V.¹ Labarta, J.² Ayguadé, E.³ Valero, M.⁴

10
- 84870637076
- MPI Forum, The Message Passing Interface Standard. http://www.mpi-forum.org/.
- The Message Passing Interface Standard

11
- 79955066309
- August
- NVIDIA, CUDA C best practices guide 3.2, August 2010. http://developer.download.nvidia.com/compute/cuda/3-2/toolkit/docs/CUDA-C-Best-Practices-Guide.pdf.
- (2010) CUDA C Best Practices Guide 3.2

12
- 79955074605
- September
- NVIDIA, CUDA C programming guide 3.2, September 2010. http://developer.download.nvidia.com/compute/cuda/3-2/toolkit/docs/CUDA-C-Programming-Guide.pdf.
- (2010) CUDA C Programming Guide 3.2

13
- 85030498833
- NVIDIA, CUDA SDK code samples: matrix multiplication. http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html#matrixMul.
- CUDA SDK Code Samples: Matrix Multiplication

14
- 84873478761
- NVIDIA, CUDA Zone. http://www.nvidia.com/object/cuda-home-new.html.
- CUDA Zone

15
- 84870669626
- Peripheral Component Interconnect Special Interest Group, PCI Express. http://www.pcisig.com/.
- PCI Express

16
- 70350754499
- Adapting a message-driven parallel application to GPU-accelerated clusters
- IEEE Press, Piscataway, NJ, USA
- James C. Phillips, John E. Stone, Klaus Schulten, Adapting a message-driven parallel application to GPU-accelerated clusters, in: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC'08, IEEE Press, Piscataway, NJ, USA, 2008, pp. 8:1-8:9.
- (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC'08 , pp. 8:1-8:9
- Phillips, J.C.¹ Stone, J.E.² Schulten, K.³

17
- 57349086588
- White Paper
- V. Podlozhnyuk, Histogram calculation in CUDA, White Paper, 2007. http://developer.download.nvidia.com/compute/cuda/1-1/Website/projects/histogram256/doc/histogram.pdf.
- (2007) Histogram Calculation in CUDA
- Podlozhnyuk, V.¹

18
- 67650812067
- Synergistic execution of stream programs on multicores with accelerators
- ACM, New York, NY, USA
- Abhishek Udupa, R. Govindarajan, Matthew J. Thazhuthaveetil, Synergistic execution of stream programs on multicores with accelerators, in: Proceedings of the 2009 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES'09, ACM, New York, NY, USA, 2009, pp. 99-108.
- (2009) Proceedings of the 2009 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES'09 , pp. 99-108
- Udupa, A.¹ Govindarajan, R.² Thazhuthaveetil, M.J.³

19
- 77954735057
- Improving linpack performance on SMP clusters with asynchronous MPI programming
- Ta Quoc Viet, Tsutomu Yoshinaga, Improving linpack performance on SMP clusters with asynchronous MPI programming, IPSJ Digital Courier 2 (2006) 598-606.
- (2006) IPSJ Digital Courier , vol.2 , pp. 598-606
- Quoc Viet, T.¹ Yoshinaga, T.²

20
- 79955921273
- A quantitative performance analysis model for GPU architectures
- February
- Yao Zhang, John D. Owens, A quantitative performance analysis model for GPU architectures, in: Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture, HPCA 17, February 2011.
- (2011) Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture, HPCA 17
- Zhang, Y.¹ Owens, J.D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.