SCOPUS Information Retrieval Platform

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Volume , Issue , 2014, Pages 819-828

BigKernel - High performance CPU-GPU communication pipelining for big data-style applications

(2) Mokhtari, Reza (a); Stumm, Michael (a)

(a) UNIVERSITY OF TORONTO (Canada)

Author keywords

communication; CPU; GPU; optimization; stream processing

Indexed keywords

COMMUNICATION; DISTRIBUTED PARAMETER NETWORKS; OPTIMIZATION; PROGRAM COMPILERS;

COMPILER TRANSFORMATIONS; CPU; GPU; HARDWARE ARCHITECTURE; LIMITED BANDWIDTH; MEMORY BANDWIDTHS; PROGRAMMING MODELS; STREAM PROCESSING;

BIG DATA;

EID: 84906695225 PISSN: 15302075 EISSN: 23321237 Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2014.89 Document Type: Conference Paper

Times cited : (19)

References (20)

1
- 83155188972
- CudaDMA: Optimizing GPU memory bandwidth via warp specialization
- M. Bauer, H. Cook, and B. Khailany. cudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization. In Proc. 2011 Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), pages 12:1-12:11, 2011.
- (2011) Proc. 2011 Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SC) , pp. 12:1-12:11
- Bauer, M.¹ Cook, H.² Khailany, B.³

2
- 80051839847
- Meraculous: De novo genome assembly with short paired-end reads
- J. Chapman, I. Ho, S. Sunkara, S. Luo, G. Schroth, and D. Rokhsar. Meraculous: De Novo Genome Assembly with Short Paired-End Reads. PLoS ONE, (8):e23501, 2011.
- (2011) PLoS ONE , Issue.8
- Chapman, J.¹ Ho, I.² Sunkara, S.³ Luo, S.⁴ Schroth, G.⁵ Rokhsar, D.⁶

3
- 70350667043
- Map-reduce meets wider varieties of applications
- S. Chen and S. Schlosser. Map-reduce Meets Wider Varieties of Applications. Intel Research Pittsburgh, Tech. Rep. IRP-TR-08-05, 2008.
- (2008) Intel Research Pittsburgh, Tech. Rep. IRP-TR-08-05
- Chen, S.¹ Schlosser, S.²

4
- 0001483604
- Communication optimizations for irregular scientific computations on distributed memory architectures
- R. Das, M. Uysal, J. Saltz, and Y. Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, pages 462-478, 1994.
- (1994) Journal of Parallel and Distributed Computing , pp. 462-478
- Das, R.¹ Uysal, M.² Saltz, J.³ Hwang, Y.⁴

5
- 77952251540
- An asymmetric distributed shared memory model for heterogeneous parallel systems
- I. Gelado, J. Stone, J. Cabezas, S. Patel, N. Navarro, and W. Hwu. An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems. In Proc. 15th Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 347-358, 2010.
- (2010) Proc. 15th Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 347-358
- Gelado, I.¹ Stone, J.² Cabezas, J.³ Patel, S.⁴ Navarro, N.⁵ Hwu, W.⁶

6
- 79957493885
- Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
- C. Gregg and K. Hazelwood. Where is the Data? Why You Cannot Debate CPU vs. GPU Performance Without the Answer. In Proc. IEEE Intl. Symp. on Performance Analysis of Systems and Software (ISPASS), pages 134-144, 2011.
- (2011) Proc. IEEE Intl. Symp. on Performance Analysis of Systems and Software (ISPASS) , pp. 134-144
- Gregg, C.¹ Hazelwood, K.²

7
- 80053240142
- Automated architecture-aware mapping of streaming applications onto GPUs
- A. Hagiescu, H. Huynh, W. Wong, and R. Goh. Automated Architecture-Aware Mapping of Streaming Applications onto GPUs. In Proc. 25th IEEE Intl. Parallel Distributed Processing Symp. (IPDPS), pages 467-478, 2011.
- (2011) Proc. 25th IEEE Intl. Parallel Distributed Processing Symp. (IPDPS) , pp. 467-478
- Hagiescu, A.¹ Huynh, H.² Wong, W.³ Goh, R.⁴

8
- 67650673468
- HiCUDA: A high-level directive-based language for GPU programming
- T. Han and T. Abdelrahman. hiCUDA: a High-level Directive-based Language for GPU Programming. In Proc. 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU), pages 52-61, 2009.
- (2009) Proc. 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU) , pp. 52-61
- Han, T.¹ Abdelrahman, T.²

9
- 84858381077
- Scalable framework for mapping streaming applications onto multi-GPU systems
- H. Huynh, A. Hagiescu, W. Wong, and R. Goh. Scalable Framework for Mapping Streaming Applications onto Multi-GPU Systems. In Proc. 17th Symp. on Principles and Practice of Parallel Programming (PPoPP), pages 1-10, 2012.
- (2012) Proc. 17th Symp. on Principles and Practice of Parallel Programming (PPoPP) , pp. 1-10
- Huynh, H.¹ Hagiescu, A.² Wong, W.³ Goh, R.⁴

10
- 84863423999
- Dynamically managed data for CPU-GPU architectures
- T. Jablin, J. Jablin, P. Prabhu, F. Liu, and D. August. Dynamically managed data for CPU-GPU architectures. In Proc. 10th Intl. Symp. on Code Generation and Optimization (CGO), pages 165-174, 2012.
- (2012) Proc. 10th Intl. Symp. on Code Generation and Optimization (CGO) , pp. 165-174
- Jablin, T.¹ Jablin, J.² Prabhu, P.³ Liu, F.⁴ August, D.⁵

11
- 79959904195
- Automatic CPU-GPU communication management and optimization
- T. Jablin, P. Prabhu, J. Jablin, N. Johnson, S. Beard, and D. August. Automatic CPU-GPU Communication Management and Optimization. In Proc. 32nd Conf. on Programming Language Design and Implementation (PLDI), pages 142-151, 2011.
- (2011) Proc. 32nd Conf. on Programming Language Design and Implementation (PLDI) , pp. 142-151
- Jablin, T.¹ Prabhu, P.² Jablin, J.³ Johnson, N.⁴ Beard, S.⁵ August, D.⁶

12
- 84867427249
- Communication library to overlap computation and communication for OpenCL application
- T. Komoda, S. Miwa, and H. Nakamura. Communication Library to Overlap Computation and Communication for OpenCL Application. In Proc. 26th IEEE Intl. Parallel and Distributed Processing Symp. Workshops PhD Forum (IPDPSW), pages 567-573, 2012.
- (2012) Proc. 26th IEEE Intl. Parallel and Distributed Processing Symp. Workshops PhD Forum (IPDPSW) , pp. 567-573
- Komoda, T.¹ Miwa, S.² Nakamura, H.³

13
- 67650081010
- OpenMP to GPGPU: A compiler framework for automatic translation and optimization
- S. Lee, S. Min, and R. Eigenmann. OpenMP to GPGPU: a Compiler Framework for Automatic Translation and Optimization. In Proc. 14th Symp. on Principles and Practice of Parallel Programming (PPoPP), pages 101-110, 2009.
- (2009) Proc. 14th Symp. on Principles and Practice of Parallel Programming (PPoPP) , pp. 101-110
- Lee, S.¹ Min, S.² Eigenmann, R.³

14
- 84867509022
- Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme
- S. Pai, R. Govindarajan, and M. Thazhuthaveetil. Fast and Efficient Automatic Memory Management for GPUs Using Compiler-assisted Runtime Coherence Scheme. In Proc. 21st Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT), pages 33-42, 2012.
- (2012) Proc. 21st Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT) , pp. 33-42
- Pai, S.¹ Govindarajan, R.² Thazhuthaveetil, M.³

15
- 0028741448
- Run-time and compile-time support for adaptive irregular problems
- S. Sharma, R. Ponnusamy, B. Moon, Y. Hwang, R. Das, and J. Saltz. Run-time and Compile-time Support for Adaptive Irregular Problems. In Proc. of the 1994 Conf. on Supercomputing, pages 97-106, 1994.
- (1994) Proc. of the 1994 Conf. on Supercomputing , pp. 97-106
- Sharma, S.¹ Ponnusamy, R.² Moon, B.³ Hwang, Y.⁴ Das, R.⁵ Saltz, J.⁶

16
- 58449127539
- CUDA-lite: Reducing GPU programming complexity
- S. Ueng, M. Lathara, S. Baghsorkhi, and W. Hwu. CUDA-Lite: Reducing GPU Programming Complexity. In Languages and Compilers for Parallel Computing, volume 5335, pages 1-15. 2008.
- (2008) Languages and Compilers for Parallel Computing , vol.5335 , pp. 1-15
- Ueng, S.¹ Lathara, M.² Baghsorkhi, S.³ Hwu, W.⁴

17
- 80053277662
- OpinionFinder: A system for subjectivity analysis
- T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. OpinionFinder: a System for Subjectivity Analysis. In Proc. HLT/EMNLP on Interactive Demonstrations, HLT-Demo '05, pages 34-35, 2005.
- (2005) Proc. HLT/EMNLP on Interactive Demonstrations , pp. 34-35
- Wilson, T.¹ Hoffmann, P.² Somasundaran, S.³ Kessler, J.⁴ Wiebe, J.⁵ Choi, Y.⁶ Cardie, C.⁷ Riloff, E.⁸ Patwardhan, S.⁹

18
- 70350678845
- JCUDA: A programmer-friendly interface for accelerating Java programs with CUDA
- Y. Yan, M. Grossman, and V. Sarkar. JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA. In Euro-Par 2009 Parallel Processing, volume 5704 of Lecture Notes in Computer Science, pages 887-899. 2009.
- (2009) Euro-Par 2009 Parallel Processing, Volume 5704 of Lecture Notes in Computer Science , pp. 887-899
- Yan, Y.¹ Grossman, M.² Sarkar, V.³

19
- 77954691442
- A GPGPU compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU Compiler for Memory Optimization and Parallelism Management. In Proc. 2010 Conf. on Programming Language Design and Implementation (PLDI), pages 86-97, 2010.
- (2010) Proc. 2010 Conf. on Programming Language Design and Implementation (PLDI) , pp. 86-97
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

20
- 79953126288
- On-the-fly elimination of dynamic irregularities for GPU computing
- E. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly Elimination of Dynamic Irregularities for GPU Computing. In Proc. 16th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 369-380, 2011.
- (2011) Proc. 16th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) , pp. 369-380
- Zhang, E.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.