



Volume 230, Issue 13, 2011, Pages 5383-5398

Importance of explicit vectorization for CPU and GPU software performance

Author keywords

GPU; Ising model; Monte Carlo; Optimization; Performance; Vectorization

Indexed keywords

COMPUTER GRAPHICS; COMPUTER GRAPHICS EQUIPMENT; FLOCCULATION; ISING MODEL; MONTE CARLO METHODS; PROGRAM PROCESSORS

EID: 79955479184     PISSN: 00219991     EISSN: 10902716     Source Type: Journal    
DOI: 10.1016/j.jcp.2011.03.041     Document Type: Article
Times cited: 22

References (25)
  • 4
    • 48349093945
    • High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy
    • Samant S.S., Xia J., Muyan-Özçelik P., Owens J.D. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy. Medical Physics 2008, 35(8):3546-3553.
    • (2008) Medical Physics , vol.35 , Issue.8 , pp. 3546-3553
    • Samant, S.S.1    Xia, J.2    Muyan-Özçelik, P.3    Owens, J.D.4
  • 5
    • 67349267818
    • GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model
    • Preis T., Virnau P., et al. GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model. Journal of Computational Physics 2009, 228:4468-4477.
    • (2009) Journal of Computational Physics , vol.228 , pp. 4468-4477
    • Preis, T.1    Virnau, P.2
  • 6
    • 10644279605
    • Benchmarking and implementation of probability-based simulations on programmable graphics cards
    • Tomov S., McGuigan M., et al. Benchmarking and implementation of probability-based simulations on programmable graphics cards. Computers & Graphics 2005, 29:71-80.
    • (2005) Computers & Graphics , vol.29 , pp. 71-80
    • Tomov, S.1    McGuigan, M.2
  • 7
    • 79959466764
    • Optimization principles and application performance of a multithreaded GPU using CUDA, in: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
    • S. Ryoo, C. Rodrigues, et al., Optimization principles and application performance of a multithreaded GPU using CUDA, in: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, 2008, pp. 73-82.
    • (2008) , pp. 73-82
    • Ryoo, S.1    Rodrigues, C.2
  • 9
    • 33646558229
    • Using advanced compiler technology to exploit the performance of the cell broadband engine architecture
    • Eichenberger A.E., O'Brien J.K., et al. Using advanced compiler technology to exploit the performance of the cell broadband engine architecture. IBM Systems Journal 2006, 45(1):59-84.
    • (2006) IBM Systems Journal , vol.45 , Issue.1 , pp. 59-84
    • Eichenberger, A.E.1    O'Brien, J.K.2
  • 10
    • 85190229132
    • Intel 64 and IA-32 Architectures Software Developer's Manual, Intel Corporation Order Numbers 253665 through 253669
    • Intel 64 and IA-32 Architectures Software Developer's Manual, Intel Corporation Order Numbers 253665 through 253669, 2011.
    • (2011)
  • 11
    • 0031599142
    • Mersenne twister: A 623-Dimensionally equidistributed uniform pseudo-random number generator
    • Matsumoto M., Nishimura T. Mersenne twister: A 623-Dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 1998, 8(1):3-30.
    • (1998) ACM Transactions on Modeling and Computer Simulation , vol.8 , Issue.1 , pp. 3-30
    • Matsumoto, M.1    Nishimura, T.2
  • 12
    • 0003657590
    • The art of computer programming
    • Addison Wesley Longman Publishing
    • Knuth D.E. The art of computer programming. Sorting and Searching 1998, vol. 3. Addison Wesley Longman Publishing.
    • (1998) Sorting and Searching , vol.3
    • Knuth, D.E.1
  • 13
    • 5744249209
    • Equation of state calculations by fast computing machines
    • Metropolis N., Rosenbluth A.W., et al. Equation of state calculations by fast computing machines. Journal of Chemical Physics 1953, 21(6).
    • (1953) Journal of Chemical Physics , vol.21 , Issue.6
    • Metropolis, N.1    Rosenbluth, A.W.2
  • 16
    • 79951638366
    • High-performance physics simulations using multi-core CPUs and GPGPUs in a volunteer computing context
    • Karimi K., Dickson N., Hamze F. High-performance physics simulations using multi-core CPUs and GPGPUs in a volunteer computing context. International Journal of High Performance Computing Applications 2011, 25(1).
    • (2011) International Journal of High Performance Computing Applications , vol.25 , Issue.1
    • Karimi, K.1    Dickson, N.2    Hamze, F.3
  • 18
    • 85190223871
    • Investigating the performance of an adiabatic quantum optimization processor
    • Karimi K., Dickson N., Hamze F., et al. Investigating the performance of an adiabatic quantum optimization processor. Quantum Information Processing 2011.
    • (2011) Quantum Information Processing
    • Karimi, K.1    Dickson, N.2    Hamze, F.3
  • 19
    • 0001064590
    • Generalized Trotter's formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems
    • Suzuki M. Generalized Trotter's formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems. Communications in Mathematical Physics 1976, 51(2):183-190.
    • (1976) Communications in Mathematical Physics , vol.51 , Issue.2 , pp. 183-190
    • Suzuki, M.1
  • 20
    • 85190199130
    • Improving Particle Filter Performance Using SSE Instructions, in: International Conference on Intelligent Robotics and Systems, USA
    • P. Djeu, M. Quinlan, P. Stone, Improving Particle Filter Performance Using SSE Instructions, in: International Conference on Intelligent Robotics and Systems, USA, 2009.
    • (2009)
    • Djeu, P.1    Quinlan, M.2    Stone, P.3
  • 21
    • 85190179298
    • Efficient SIMD numerical interpolation, in: International Conference on High Performance Computing and Communications (HPCC), Italy
    • H. Ahmadi, M. Moslemi-Naeini, H. Sarbazi-Azad, Efficient SIMD numerical interpolation, in: International Conference on High Performance Computing and Communications (HPCC), Italy, 2005.
    • (2005)
    • Ahmadi, H.1    Moslemi-Naeini, M.2    Sarbazi-Azad, H.3
  • 22
    • 85190200116
    • GPU-CPU multi-core for real-time signal processing, in: 2009 Digest of Technical Papers International Conference on Consumer Electronics
    • S.P. Mohanty, GPU-CPU multi-core for real-time signal processing, in: 2009 Digest of Technical Papers International Conference on Consumer Electronics, 2009, pp. 1-2.
    • (2009) , pp. 1-2
    • Mohanty, S.P.1
  • 23
    • 85190209175
    • Intel 64 and IA-32 Architectures Optimization Reference Manual, Intel Corporation Order Number 248966, November
    • Intel 64 and IA-32 Architectures Optimization Reference Manual, Intel Corporation Order Number 248966, November 2007.
    • (2007)
  • 25
    • 85190203290
    • NVIDIA OpenCL Best Practices Guide, Version 1.0, NVIDIA Corporation
    • NVIDIA OpenCL Best Practices Guide, Version 1.0, NVIDIA Corporation, 2009.
    • (2009)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.