2012, Pages 74-83

On improving the performance of multi-threaded CUDA applications with concurrent kernel execution by kernel reordering

Author keywords

Concurrent kernel execution; CUDA; GP-GPU; Multi-threaded applications

Indexed keywords

CONCURRENT KERNEL EXECUTION; CUDA; GP-GPU; GRAPHICS PROCESSING UNITS; MASSIVE PARALLELISM; MULTI-THREADED APPLICATION; MULTI-THREADED PROGRAMS; MULTITHREADED; NANODROPLET; NUMERICAL COMPUTATIONS; PERFORMANCE IMPROVEMENTS; PROBLEM SIZE; REAL-WORLD APPLICATION; SINGLE-THREADED; SMALL MOLECULES; VIABLE SOLUTIONS

EID: 84870690907     PISSN: 21665133     EISSN: 2166515X     Source Type: Conference Proceeding    
DOI: 10.1109/SAAHPC.2012.12     Document Type: Conference Paper
Times cited: 32

References (23)
  • 3
    • 79953001643
    • Astrophysical supercomputing with GPUs: Critical decisions for early adopters
    • C. J. Fluke, D. G. Barnes, B. R. Barsdell, and A. H. Hassan, "Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters," Publ. Astron. Soc. Austral., vol. 28, pp. 15-27, 2011.
    • (2011) Publ. Astron. Soc. Austral., vol. 28, pp. 15-27
    • Fluke, C.J.; Barnes, D.G.; Barsdell, B.R.; Hassan, A.H.
  • 4
    • 77951106340
    • CUDASW++2.0: Enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions
    • Y. Liu, B. Schmidt, and D. L. Maskell, "CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions," BMC Research Notes, vol. 3, p. 93, 2010.
    • (2010) BMC Research Notes, vol. 3, p. 93
    • Liu, Y.; Schmidt, B.; Maskell, D.L.
  • 5
    • 84863045359
    • Mapping of BLASTP Algorithm onto GPU Clusters
    • W. Liu, B. Schmidt, Y. Liu, G. Voss, and W. Müller-Wittig, "Mapping of BLASTP Algorithm onto GPU Clusters," in ICPADS. IEEE, 2011, pp. 236-243.
    • (2011) ICPADS, pp. 236-243
    • Liu, W.; Schmidt, B.; Liu, Y.; Voss, G.; Müller-Wittig, W.
  • 6
    • 80052890093
    • CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware
    • W. Liu, B. Schmidt, and W. Müller-Wittig, "CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware," IEEE/ACM Trans. Comput. Biology Bioinform., vol. 8, no. 6, pp. 1678-1684, 2011.
    • (2011) IEEE/ACM Trans. Comput. Biology Bioinform., vol. 8, no. 6, pp. 1678-1684
    • Liu, W.; Schmidt, B.; Müller-Wittig, W.
  • 7
    • 85054441478
    • GPU algorithms for molecular modeling
    • J. E. Stone, D. J. Hardy, B. Isralewitz, and K. Schulten, "GPU algorithms for molecular modeling," in Scientific Computing with Multicore and Accelerators, J. Dongarra, D. A. Bader, and J. Kurzak, Eds. Chapman & Hall/CRC Press, 2011, ch. 16, pp. 351-371.
    • (2011) Scientific Computing with Multicore and Accelerators, pp. 351-371
    • Stone, J.E.; Hardy, D.J.; Isralewitz, B.; Schulten, K.
  • 9
    • 85008042563
    • OpenMM: A hardware-independent framework for molecular simulations
    • P. Eastman and V. Pande, "OpenMM: A Hardware-Independent Framework for Molecular Simulations," Computing in Science and Engg., vol. 12, no. 4, pp. 34-39, Jul. 2010. [Online]. Available: http://dx.doi.org/10.1109/MCSE.2010.27
    • (2010) Computing in Science and Engg., vol. 12, no. 4, pp. 34-39
    • Eastman, P.; Pande, V.
  • 10
    • 79954507312
    • Dynamic precision for electron repulsion integral evaluation on graphical processing units (GPUs)
    • N. Luehr, I. S. Ufimtsev, and T. J. Martinez, "Dynamic Precision for Electron Repulsion Integral Evaluation on Graphical Processing Units (GPUs)," Journal of Chemical Theory and Computation, vol. 7, pp. 949-954, 2011.
    • (2011) Journal of Chemical Theory and Computation, vol. 7, pp. 949-954
    • Luehr, N.; Ufimtsev, I.S.; Martinez, T.J.
  • 11
    • 73949083571
    • Quantum chemistry on graphical processing units. 3. Analytical energy gradients and first principles molecular dynamics
    • I. S. Ufimtsev and T. J. Martinez, "Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients and First Principles Molecular Dynamics," Journal of Chemical Theory and Computation, vol. 5, pp. 2619-2628, 2009.
    • (2009) Journal of Chemical Theory and Computation, vol. 5, pp. 2619-2628
    • Ufimtsev, I.S.; Martinez, T.J.
  • 14
    • 84870706049
    • NVIDIA CUDA C programming guide, v4.0
    • NVIDIA CUDA C programming guide, v4.0, 2011.
    • (2011) NVIDIA CUDA C Programming Guide
  • 17
    • 80052554250
    • Scaling scientific applications on clusters of hybrid multicore/GPU nodes
    • L. Wang, M. Huang, V. K. Narayana, and T. El-Ghazawi, "Scaling scientific applications on clusters of hybrid multicore/GPU nodes," in Proceedings of the 8th ACM International Conference on Computing Frontiers, ser. CF '11. New York, NY, USA: ACM, 2011, pp. 6:1-6:10. [Online]. Available: http://doi.acm.org/10.1145/2016604.2016612
    • (2011) Proceedings of the 8th ACM International Conference on Computing Frontiers
    • Wang, L.; Huang, M.; Narayana, V.K.; El-Ghazawi, T.
  • 19
    • 70350754499
    • Adapting a message-driven parallel application to GPU-accelerated clusters
    • J. C. Phillips, J. E. Stone, and K. Schulten, "Adapting a message-driven parallel application to GPU-accelerated clusters," in Proceedings of the 2008 ACM/IEEE conference on Supercomputing, ser. SC '08. Piscataway, NJ, USA: IEEE Press, 2008, pp. 8:1-8:9. [Online]. Available: http://dl.acm.org/citation.cfm?id=1413370.1413379
    • (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
    • Phillips, J.C.; Stone, J.E.; Schulten, K.
  • 21
    • 79954481928
    • Adaptive spectral clustering for conformation analysis
    • F. Haack, S. Röblitz, O. Scharkoi, B. Schmidt, and M. Weber, "Adaptive Spectral Clustering for Conformation Analysis," in AIP Conference Proceedings, vol. 1281, no. 1, 2010, pp. 1585-1588, http://link.aip.org/link/doi/10.1063/1.3498116
    • (2010) AIP Conference Proceedings, vol. 1281, no. 1, pp. 1585-1588
    • Haack, F.; Röblitz, S.; Scharkoi, O.; Schmidt, B.; Weber, M.
  • 22
    • 0037571112
    • Merck molecular force field. I-V. Basis, form, scope, parameterization, and performance of MMFF94
    • T. A. Halgren, "Merck molecular force field. I-V. Basis, form, scope, parameterization, and performance of MMFF94," J. of Comp. Chem., vol. 17, no. 5-6, pp. 490-641, 1996.
    • (1996) J. of Comp. Chem., vol. 17, no. 5-6, pp. 490-641
    • Halgren, T.A.
  • 23
    • 46249092554
    • GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
    • B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl, "GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation," Journal of Chemical Theory and Computation, vol. 4, no. 3, pp. 435-447, 2008. [Online]. Available: http://pubs.acs.org/doi/abs/10.1021/ct700301q
    • (2008) Journal of Chemical Theory and Computation, vol. 4, no. 3, pp. 435-447
    • Hess, B.; Kutzner, C.; Van Der Spoel, D.; Lindahl, E.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.