SCOPUS Information Retrieval Platform

Computer Journal

Volume 55, Issue 2, 2012, Pages 138-153

On the acceleration of wavefront applications using distributed many-core architectures

(5) Pennycook, S J a; Hammond, S D a; Mudalige, G R b; Wright, S A a; Jarvis, S A a

a UNIVERSITY OF WARWICK (United Kingdom)

b UNIVERSITY OF OXFORD (United Kingdom)

Author keywords

CUDA; GPU; many core computing; optimization; performance modelling; wavefront

Indexed keywords

APPLICATION PERFORMANCE; CUDA; DISTRIBUTED GRAPHICS; FUTURE PERFORMANCE; GPU; HIGH-PERFORMANCE COMPUTING; MANY-CORE ARCHITECTURE; MANY-CORE COMPUTING; NAS PARALLEL BENCHMARKS; PERFORMANCE MODEL; PERFORMANCE MODELLING; SCIENTIFIC AND ENGINEERING APPLICATIONS; THEORETICAL PERFORMANCE;

ALGORITHMS; COMPUTATION THEORY; COMPUTER GRAPHICS EQUIPMENT; COMPUTER SOFTWARE SELECTION AND EVALUATION; OPTIMIZATION; PROGRAM PROCESSORS; TEACHING; WAVEFRONTS;

BENCHMARKING;

EID: 84856898868 PISSN: 00104620 EISSN: 14602067 Source Type: Journal
DOI: 10.1093/comjnl/bxr073 Document Type: Article

Times cited : (18)

References (27)

1
- 0003605996
- RNR-94-007. NASA Ames Research Center, Moffett Field, CA
- RNR-94-007. (1994) The NAS Parallel Benchmarks. NASA Ames Research Center, Moffett Field, CA.
- (1994) The NAS Parallel Benchmarks

2
- 51049124075
- A plug-and-play model for evaluating wavefront computations on parallel architectures
- Miami, FL, April. IEEE Computer Society, Los Alamitos, CA
- Mudalige, G.R., Vernon, M.K. and Jarvis, S.A. (2008) A Plug-and-Play Model for Evaluating Wavefront Computations on Parallel Architectures. Proc. IEEE Int. Parallel and Distributed Processing Symp., Miami, FL, April 14-18. IEEE Computer Society, Los Alamitos, CA.
- (2008) Proc. IEEE Int. Parallel and Distributed Processing Symp. , April 14-18
- Mudalige, G.R.¹ Vernon, M.K.² Jarvis, S.A.³

3
- 84922896495
- WARPP: A toolkit for simulating high-performance parallel scientific codes
- Rome, Italy, March 2-6. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, Brussels, Belgium
- Hammond, S.D., Mudalige, G.R., Smith, J.A., Jarvis, S.A., Herdman, J.A. and Vadgama, A. (2009) WARPP: A Toolkit for Simulating High-Performance Parallel Scientific Codes. Proc. Int. Conf. Simulation Tools and Techniques, Rome, Italy, March 2-6, pp. 19:1-19:10. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, Brussels, Belgium.
- (2009) Proc. Int. Conf. Simulation Tools and Techniques , pp. 19:1-19:10
- Hammond, S.D.¹ Mudalige, G.R.² Smith, J.A.³ Jarvis, S.A.⁴ Herdman, J.A.⁵ Vadgama, A.⁶

4
- 84949489562
- A general predictive performance model for wavefront algorithms on clusters of SMPs
- Toronto, Canada, August 21-24, IEEE Computer Society, Los Alamitos, CA
- Hoisie, A., Lubeck, O., Wasserman, H., Petrini, F. and Alme, H. (2000) A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs. Proc. Int. Conf. Parallel Processing, Toronto, Canada, August 21-24, pp. 219-228. IEEE Computer Society, Los Alamitos, CA.
- (2000) Proc. Int. Conf. Parallel Processing , pp. 219-228
- Hoisie, A.¹ Lubeck, O.² Wasserman, H.³ Petrini, F.⁴ Alme, H.⁵

5
- 77950972570
- TR-08-24. Computer Science, Virginia Tech. Blacksburg, VA
- TR-08-24. (2008) Accelerating Data-Serial Applications on GPGPUs: A Systems Approach. Computer Science, Virginia Tech. Blacksburg, VA.
- (2008) Accelerating Data-Serial Applications on GPGPUs: A Systems Approach

6
- 67549093800
- Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU
- Athens, Greece, October 8-10, IEEE Computer Society, Los Alamitos, CA
- Munekawa, Y., Ino, F. and Hagihara, K. (2008) Design and Implementation of the Smith-Waterman Algorithm on the CUDA-Compatible GPU. Proc. IEEE Int. Conf. Bioinformatics and Bioengineering, Athens, Greece, October 8-10, pp. 1-6. IEEE Computer Society, Los Alamitos, CA.
- (2008) Proc. IEEE Int. Conf. Bioinformatics and Bioengineering , pp. 1-6
- Munekawa, Y.¹ Ino, F.² Hagihara, K.³

7
- 43349092363
- CUDA Compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
- Manavski, S. and Valle, G. (2008) CUDA Compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinf., 9, S10.
- (2008) BMC Bioinf. , vol.9
- Manavski, S.¹ Valle, G.²

8
- 34548757858
- Multicore surprises: Lessons learned from optimizing Sweep3D on the Cell Broadband Engine
- Long Beach, CA, March, IEEE Computer Society, Los Alamitos, CA
- Petrini, F., Fossum, G., Fernandez, J., Varbanescu, A.L., Kistler, N. and Perrone, M. (2007) Multicore Surprises: Lessons Learned from Optimizing Sweep 3D on the Cell Broadband Engine. Proc. IEEE Int. Parallel and Distributed Processing Symp., Long Beach, CA, March 26-30. IEEE Computer Society, Los Alamitos, CA.
- (2007) Proc. IEEE Int. Parallel and Distributed Processing Symp. , March 26-30
- Petrini, F.¹ Fossum, G.² Fernandez, J.³ Varbanescu, A.L.⁴ Kistler, N.⁵ Perrone, M.⁶

9
- 79956151846
- Optimizing Sweep3D for graphic processor unit
- Busan, Korea, May 21-23, Springer, Berlin
- Gong, C., Liu, J., Gong, Z., Qin, J. and Xie, J. (2010) Optimizing Sweep 3D for Graphic Processor Unit. Proc. Int. Conf. Algorithms and Architectures for Parallel Processing, Busan, Korea, May 21-23, pp. 416-426. Springer, Berlin.
- (2010) Proc. Int. Conf. Algorithms and Architectures for Parallel Processing , pp. 416-426
- Gong, C.¹ Liu, J.² Gong, Z.³ Qin, J.⁴ Xie, J.⁵

10
- 84856917433
- Los Alamos National Laboratory. (accessed May 12, 2011)
- (1995) The ASCI Sweep 3D Benchmark. Los Alamos National Laboratory. http://www.c3.lanl.gov/pal/software/sweep3d/sweep3d-readme.html (accessed May 12, 2011).
- (1995) The ASCI Sweep 3D Benchmark

11
- 70450059008
- Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors
- Rome, Italy, May. IEEE Computer Society, Los Alamitos, CA
- Boyer, M., Tarjan, D., Acton, S.T. and Skadron, K. (2009) Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors. Proc. IEEE Int. Parallel and Distributed Processing Symp., Rome, Italy, May 23-29. IEEE Computer Society, Los Alamitos, CA.
- (2009) Proc. IEEE Int. Parallel and Distributed Processing Symp. , May 23-29
- Boyer, M.¹ Tarjan, D.² Acton, S.T.³ Skadron, K.⁴

12
- 70350754502
- High performance discrete Fourier transforms on graphics processors
- Austin, TX, November 15-21, IEEE Press, Piscataway, NJ
- Govindaraju, N.K., Lloyd, B., Dotsenko, Y., Smith, B. and Manferdelli, J. (2008) High Performance Discrete Fourier Transforms on Graphics Processors. Proc. ACM/IEEE Conf. Supercomputing, Austin, TX, November 15-21, pp. 2:1-2:12. IEEE Press, Piscataway, NJ.
- (2008) Proc. ACM/IEEE Conf. Supercomputing , pp. 2:1-2:12
- Govindaraju, N.K.¹ Lloyd, B.² Dotsenko, Y.³ Smith, B.⁴ Manferdelli, J.⁵

13
- 78650819651
- An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code
- New Orleans, LA, November, IEEE Computer Society, Washington, DC
- Shimokawabe, T., Aoki, T., Muroi, C., Ishida, J., Kawano, K., Endo, T., Nukada, A., Maruyama, N. and Matsuoka, S. (2010) An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 13-19. IEEE Computer Society, Washington, DC.
- (2010) Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis , November 13-19
- Shimokawabe, T.¹ Aoki, T.² Muroi, C.³ Ishida, J.⁴ Kawano, K.⁵ Endo, T.⁶ Nukada, A.⁷ Maruyama, N.⁸ Matsuoka, S.⁹

14
- 78649859889
- An MPI-CUDA implementation for massively parallel incompressible flow computations on multi-GPU clusters
- Orlando, FL, January. American Institute of Aeronautics and Astronautics, Reston, VA
- Jacobsen, D.A., Thibault, J.C. and Senocak, I. (2010) An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters. Proc. 48th AIAA Aerospace Sciences Meeting, Orlando, FL, January 4-7. American Institute of Aeronautics and Astronautics, Reston, VA.
- (2010) Proc. 48th AIAA Aerospace Sciences Meeting , January 4-7
- Jacobsen, D.A.¹ Thibault, J.C.² Senocak, I.³

15
- 77954995885
- Debunking the 100X GPU vs. CPU Myth: An evaluation of throughput computing on CPU and GPU
- Saint-Malo, France, June 21-23. ACM, New York, NY
- Lee, V.W. et al. (2010) Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. Proc. ACM/IEEE Int. Symp. Computer Architecture, Saint-Malo, France, June 21-23, pp. 451-460. ACM, New York, NY.
- (2010) Proc. ACM/IEEE Int. Symp. Computer Architecture , pp. 451-460
- Lee, V.W.¹

16
- 85092761228
- On the limits of GPU acceleration
- Berkeley, CA, June. USENIX Association, Berkeley, CA
- Vuduc, R., Chandramowlishwaran, A., Choi, J., Guney, M.E. and Shringarpure, A. (2010) On the Limits of GPU Acceleration. Proc. USENIX Workshop on Hot Topics in Parallelism, Berkeley, CA, June 14-15. USENIX Association, Berkeley, CA.
- (2010) Proc. USENIX Workshop on Hot Topics in Parallelism , June 14-15
- Vuduc, R.¹ Chandramowlishwaran, A.² Choi, J.³ Guney, M.E.⁴ Shringarpure, A.⁵

17
- 78951473320
- RC24982. IBM Research Division, Thomas J. Watson Research Center. Yorktown Heights, NY
- RC24982. (2010) Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-Intensive Application!. IBM Research Division, Thomas J. Watson Research Center. Yorktown Heights, NY.
- (2010) Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-Intensive Application!

18
- 84856908021
- RC25033. IBM Research Division, Thomas J. Watson Research Center. Yorktown Heights, NY
- RC25033. (2010) Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-Intensive Application on CPUs and GPU. IBM Research Division, Thomas J. Watson Research Center. Yorktown Heights, NY.
- (2010) Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-Intensive Application on CPUs and GPU

19
- 84856841346
- Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
- Pennycook, S.J., Hammond, S.D., Mudalige, G.R. and Jarvis, S.A. (2011) Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark. SIGMETRICS Perform. Eval. Rev., 38, 23-29.
- (2011) SIGMETRICS Perform. Eval. Rev. , vol.38 , pp. 23-29
- Pennycook, S.J.¹ Hammond, S.D.² Mudalige, G.R.³ Jarvis, S.A.⁴

20
- 84856919010
- HPC Wire. (accessed November 4, 2010)
- Lazou, C. (2010) Should I Buy GPGPUs or Blue Gene? HPC Wire. http://www.hpcwire.com/hpcwire/2010-11-04/should-i-buy-gpgpus-or-blue-gene.html (accessed November 4, 2010).
- (2010) Should I buy GPGPUs or Blue Gene?
- Lazou, C.¹

21
- 0003605992
- NAS-96-18. NASA Ames Research Center, Moffett Field, CA
- NAS-96-18. (1996) NAS Parallel Benchmark (Version 1.0) Results 11-96. NASA Ames Research Center, Moffett Field, CA.
- (1996) NAS Parallel Benchmark (Version 1.0) Results 11-96

22
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- Salt Lake City, UT, February 20-23. ACM, New York, NY
- Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B. and Hwu, W.W. (2008) Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA. Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Salt Lake City, UT, February 20-23, pp. 73-82. ACM, New York, NY.
- (2008) Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

23
- 0016026944
- The parallel execution of DO loops
- Lamport, L. (1974) The parallel execution of DO loops. Commun. ACM, 17, 83-93.
- (1974) Commun. ACM , vol.17 , pp. 83-93
- Lamport, L.¹

24
- 78650817529
- Size matters: Space/time tradeoffs to improve GPGPU applications performance
- New Orleans, LA, November, IEEE Computer Society, Washington, DC
- Gharaibeh, A. and Ripeanu, M. (2010) Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 13-19. IEEE Computer Society, Washington, DC.
- (2010) Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis , November 13-19
- Gharaibeh, A.¹ Ripeanu, M.²

25
- 84957882532
- SKaMPI: A detailed, accurate MPI benchmark
- Reussner, R., Sanders, P., Prechelt, L. and Müller, M. (1998) SKaMPI: A detailed, accurate MPI benchmark. Recent Adv. Parallel Virtual Mach. Message Passing Interface, 1497, 52-59.
- (1998) Recent Adv. Parallel Virtual Mach. Message Passing Interface , vol.1497 , pp. 52-59
- Reussner, R.¹ Sanders, P.² Prechelt, L.³ Müller, M.⁴

26
- 84856919926
- Lawrence Livermore National Laboratory. (accessed May 12, 2011)
- (2010) Livermore Computing Systems Summary. Lawrence Livermore National Laboratory. https://computing.llnl.gov/resources/systems-summary.pdf (accessed May 12, 2011).
- (2010) Livermore Computing Systems Summary

27
- 23244465694
- A performance comparison between the Earth Simulator and other terascale systems on a characteristic ASCI workload
- DOI 10.1002/cpe.891
- Kerbyson, D.J., Hoisie, A. and Wasserman, H. (2005) A performance comparison between the earth simulator and other terascale systems on a characteristic ASCI workload. Concurrency Comput.: Pract. Exp., 17, 1219-1238. (Pubitemid 41092969)
- (2005) Concurrency Computation Practice and Experience , vol.17 , Issue.10 , pp. 1219-1238
- Kerbyson, D.J.¹ Hoisie, A.² Wasserman, H.J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.