Volume 5335 LNCS, 2008, Pages 16-30

MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs

Indexed keywords

COMPILER TRANSFORMATIONS; PARALLEL EXECUTIONS; PARALLEL-PROGRAMMING MODELS; PROGRAM SEMANTICS; RUN TIME SYSTEM (RTS);

EID: 58449109179     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-540-89740-8_2     Document Type: Conference Paper
Times cited: 124

References (20)
  • 1
    • 58449123679
    • NVIDIA: NVIDIA CUDA, http://www.nvidia.com/cuda
  • 2
    • 44849137198
    • Lindholm, E., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro 28(2) (in press, 2008)
  • 3
    • 30744459395
    • Woop, S., Schmittler, J., Slusallek, P.: RPU: A programmable ray processing unit for realtime ray tracing. ACM Trans. Graph. 24(3), 434-444 (2005)
  • 4
    • 58449094436
    • Intel: Intel 64 and IA-32 Architectures Software Developer's Manual (May 2007)
  • 5
    • 58449084520
    • Advanced Micro Devices: 3DNow! technology manual. Technical Report 21928, Advanced Micro Devices, Sunnyvale, CA (May 1998)
  • 8
    • 35048854568
    • Lee, S., Johnson, T., Eigenmann, R.: Cetus - An extensible compiler infrastructure for source-to-source transformation. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, Springer, Heidelberg (2004)
  • 9
    • 35248845008
    • Ayguadé, E., Blainey, B., Duran, A., Labarta, J., Martínez, F., Martorell, X., Silvera, R.: Is the schedule clause really necessary in OpenMP? In: Proceedings of the International Workshop on OpenMP Applications and Tools, June 2003, pp. 147-159 (2003)
  • 10
    • 84876909872
    • Markatos, E.P., LeBlanc, T.J.: Using processor affinity in loop scheduling on shared-memory multiprocessors. In: Proceedings of the 1992 International Conference on Supercomputing, July 1992, pp. 104-113 (1992)
  • 11
    • 0026264626
    • Hummel, S.F., Schonberg, E., Flynn, L.E.: Factoring: A practical and robust method for scheduling parallel loops. In: Proceedings of the 1991 International Conference of Supercomputing, June 1991, pp. 610-632 (1991)
  • 13
    • 79959466764
    • Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D., Hwu, W.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (February 2008)
  • 14
    • 43449094719
    • Ryoo, S., Rodrigues, C.I., Stone, S.S., Baghsorkhi, S.S., Ueng, S.Z., Stratton, J.A., Hwu, W.W.: Program optimization space pruning for a multithreaded GPU. In: Proceedings of the 2008 International Symposium on Code Generation and Optimization (April 2008)
  • 15
    • 58449113459
    • Volkov, V., Demmel, J.W.: LU, QR and Cholesky factorizations using vector capabilities of GPUs. Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley, CA (May 2008)
  • 19
    • 0003487728
    • High Performance Fortran Forum: High Performance Fortran language specification, version 1.0. Technical Report CRPC-TR92225, Rice University (May 1993)
  • 20
    • 57349101237
    • Liao, S.W., Du, Z., Wu, G., Lueh, G.Y.: Data and computation transformations for Brook streaming applications on multiprocessors. In: Proceedings of the 4th International Symposium on Code Generation and Optimization, March 2006, pp. 196-207 (2006)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.