-
1
-
-
0003605996
-
-
RNR-94-007. NASA Ames Research Center, Moffett Field, CA
-
RNR-94-007. (1994) The NAS Parallel Benchmarks. NASA Ames Research Center, Moffett Field, CA.
-
(1994)
The NAS Parallel Benchmarks
-
-
-
2
-
-
51049124075
-
A plug-and-play model for evaluating wavefront computations on parallel architectures
-
Miami, FL, April. IEEE Computer Society, Los Alamitos, CA
-
Mudalige, G.R., Vernon, M.K. and Jarvis, S.A. (2008) A Plug-and-Play Model for Evaluating Wavefront Computations on Parallel Architectures. Proc. IEEE Int. Parallel and Distributed Processing Symp., Miami, FL, April 14-18. IEEE Computer Society, Los Alamitos, CA.
-
(2008)
Proc. IEEE Int. Parallel and Distributed Processing Symp.
-
-
Mudalige, G.R.1
Vernon, M.K.2
Jarvis, S.A.3
-
3
-
-
84922896495
-
WARPP: A toolkit for simulating high-performance parallel scientific codes
-
Rome, Italy, March 2-6. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, Brussels, Belgium
-
Hammond, S.D., Mudalige, G.R., Smith, J.A., Jarvis, S.A., Herdman, J.A. and Vadgama, A. (2009) WARPP: A Toolkit for Simulating High-Performance Parallel Scientific Codes. Proc. Int. Conf. Simulation Tools and Techniques, Rome, Italy, March 2-6, pp. 19:1-19:10. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, Brussels, Belgium.
-
(2009)
Proc. Int. Conf. Simulation Tools and Techniques
, pp. 19:1-19:10
-
-
Hammond, S.D.1
Mudalige, G.R.2
Smith, J.A.3
Jarvis, S.A.4
Herdman, J.A.5
Vadgama, A.6
-
4
-
-
84949489562
-
A general predictive performance model for wavefront algorithms on clusters of SMPs
-
Toronto, Canada, August 21-24, IEEE Computer Society, Los Alamitos, CA
-
Hoisie, A., Lubeck, O., Wasserman, H., Petrini, F. and Alme, H. (2000) A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs. Proc. Int. Conf. Parallel Processing, Toronto, Canada, August 21-24, pp. 219-228. IEEE Computer Society, Los Alamitos, CA.
-
(2000)
Proc. Int. Conf. Parallel Processing
, pp. 219-228
-
-
Hoisie, A.1
Lubeck, O.2
Wasserman, H.3
Petrini, F.4
Alme, H.5
-
6
-
-
67549093800
-
Design and implementation of the Smith-Waterman algorithm on the CUDA-compatible GPU
-
Athens, Greece, October 8-10, IEEE Computer Society, Los Alamitos, CA
-
Munekawa, Y., Ino, F. and Hagihara, K. (2008) Design and Implementation of the Smith-Waterman Algorithm on the CUDA-Compatible GPU. Proc. IEEE Int. Conf. Bioinformatics and Bioengineering, Athens, Greece, October 8-10, pp. 1-6. IEEE Computer Society, Los Alamitos, CA.
-
(2008)
Proc. IEEE Int. Conf. Bioinformatics and Bioengineering
, pp. 1-6
-
-
Munekawa, Y.1
Ino, F.2
Hagihara, K.3
-
7
-
-
43349092363
-
CUDA Compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
-
Manavski, S. and Valle, G. (2008) CUDA Compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinf., 9, S10.
-
(2008)
BMC Bioinf.
, vol.9
-
-
Manavski, S.1
Valle, G.2
-
8
-
-
34548757858
-
Multicore surprises: Lessons learned from optimizing Sweep3D on the Cell Broadband Engine
-
Long Beach, CA, March, IEEE Computer Society, Los Alamitos, CA
-
Petrini, F., Fossum, G., Fernandez, J., Varbanescu, A.L., Kistler, N. and Perrone, M. (2007) Multicore Surprises: Lessons Learned from Optimizing Sweep 3D on the Cell Broadband Engine. Proc. IEEE Int. Parallel and Distributed Processing Symp., Long Beach, CA, March 26-30. IEEE Computer Society, Los Alamitos, CA.
-
(2007)
Proc. IEEE Int. Parallel and Distributed Processing Symp.
-
-
Petrini, F.1
Fossum, G.2
Fernandez, J.3
Varbanescu, A.L.4
Kistler, N.5
Perrone, M.6
-
9
-
-
79956151846
-
Optimizing Sweep3D for graphic processor unit
-
Busan, Korea, May 21-23, Springer, Berlin
-
Gong, C., Liu, J., Gong, Z., Qin, J. and Xie, J. (2010) Optimizing Sweep 3D for Graphic Processor Unit. Proc. Int. Conf. Algorithms and Architectures for Parallel Processing, Busan, Korea, May 21-23, pp. 416-426. Springer, Berlin.
-
(2010)
Proc. Int. Conf. Algorithms and Architectures for Parallel Processing
, pp. 416-426
-
-
Gong, C.1
Liu, J.2
Gong, Z.3
Qin, J.4
Xie, J.5
-
10
-
-
84856917433
-
-
Los Alamos National Laboratory. (accessed May 12, 2011)
-
(1995) The ASCI Sweep 3D Benchmark. Los Alamos National Laboratory. http://www.c3.lanl.gov/pal/software/sweep3d/sweep3d-readme.html (accessed May 12, 2011).
-
(1995)
The ASCI Sweep 3D Benchmark
-
-
-
11
-
-
70450059008
-
Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors
-
Rome, Italy, May. IEEE Computer Society, Los Alamitos, CA
-
Boyer, M., Tarjan, D., Acton, S.T. and Skadron, K. (2009) Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors. Proc. IEEE Int. Parallel and Distributed Processing Symp., Rome, Italy, May 23-29. IEEE Computer Society, Los Alamitos, CA.
-
(2009)
Proc. IEEE Int. Parallel and Distributed Processing Symp.
-
-
Boyer, M.1
Tarjan, D.2
Acton, S.T.3
Skadron, K.4
-
12
-
-
70350754502
-
High performance discrete Fourier transforms on graphics processors
-
Austin, TX, November 15-21, IEEE Press, Piscataway, NJ
-
Govindaraju, N.K., Lloyd, B., Dotsenko, Y., Smith, B. and Manferdelli, J. (2008) High Performance Discrete Fourier Transforms on Graphics Processors. Proc. ACM/IEEE Conf. Supercomputing, Austin, TX, November 15-21, pp. 2:1-2:12. IEEE Press, Piscataway, NJ.
-
(2008)
Proc. ACM/IEEE Conf. Supercomputing
, pp. 2:1-2:12
-
-
Govindaraju, N.K.1
Lloyd, B.2
Dotsenko, Y.3
Smith, B.4
Manferdelli, J.5
-
13
-
-
78650819651
-
An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code
-
New Orleans, LA, November, IEEE Computer Society, Washington, DC
-
Shimokawabe, T., Aoki, T., Muroi, C., Ishida, J., Kawano, K., Endo, T., Nukada, A., Maruyama, N. and Matsuoka, S. (2010) An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code. Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 13-19. IEEE Computer Society, Washington, DC.
-
(2010)
Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis
-
-
Shimokawabe, T.1
Aoki, T.2
Muroi, C.3
Ishida, J.4
Kawano, K.5
Endo, T.6
Nukada, A.7
Maruyama, N.8
Matsuoka, S.9
-
14
-
-
78649859889
-
An MPI-CUDA implementation for massively parallel incompressible flow computations on multi-GPU clusters
-
Orlando, FL, January. American Institute of Aeronautics and Astronautics, Reston, VA
-
Jacobsen, D.A., Thibault, J.C. and Senocak, I. (2010) An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters. Proc. 48th AIAA Aerospace Sciences Meeting, Orlando, FL, January 4-7. American Institute of Aeronautics and Astronautics, Reston, VA.
-
(2010)
Proc. 48th AIAA Aerospace Sciences Meeting
-
-
Jacobsen, D.A.1
Thibault, J.C.2
Senocak, I.3
-
15
-
-
77954995885
-
Debunking the 100X GPU vs. CPU Myth: An evaluation of throughput computing on CPU and GPU
-
Saint-Malo, France, June 21-23. ACM, New York, NY
-
Lee, V.W. et al. (2010) Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. Proc. ACM/IEEE Int. Symp. Computer Architecture, Saint-Malo, France, June 21-23, pp. 451-460. ACM, New York, NY.
-
(2010)
Proc. ACM/IEEE Int. Symp. Computer Architecture
, pp. 451-460
-
-
Lee, V.W.1
-
16
-
-
85092761228
-
On the limits of GPU acceleration
-
Berkeley, CA, June. USENIX Association, Berkeley, CA
-
Vuduc, R., Chandramowlishwaran, A., Choi, J., Guney, M.E. and Shringarpure, A. (2010) On the Limits of GPU Acceleration. Proc. USENIX Workshop on Hot Topics in Parallelism, Berkeley, CA, June 14-15. USENIX Association, Berkeley, CA.
-
(2010)
Proc. USENIX Workshop on Hot Topics in Parallelism
-
-
Vuduc, R.1
Chandramowlishwaran, A.2
Choi, J.3
Guney, M.E.4
Shringarpure, A.5
-
19
-
-
84856841346
-
Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
-
Pennycook, S.J., Hammond, S.D., Mudalige, G.R. and Jarvis, S.A. (2011) Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark. SIGMETRICS Perform. Eval. Rev., 38, 23-29.
-
(2011)
SIGMETRICS Perform. Eval. Rev.
, vol.38
, pp. 23-29
-
-
Pennycook, S.J.1
Hammond, S.D.2
Mudalige, G.R.3
Jarvis, S.A.4
-
20
-
-
84856919010
-
-
HPC Wire. (accessed November 4, 2010)
-
Lazou, C. (2010) Should I Buy GPGPUs or Blue Gene? HPC Wire. http://www.hpcwire.com/hpcwire/2010-11-04/should-i-buy-gpgpus-or-blue-gene.html (accessed November 4, 2010).
-
(2010)
Should I buy GPGPUs or Blue Gene?
-
-
Lazou, C.1
-
22
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
Salt Lake City, UT, February 20-23. ACM, New York, NY
-
Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B. and Hwu, W.W. (2008) Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA. Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Salt Lake City, UT, February 20-23, pp. 73-82. ACM, New York, NY.
-
(2008)
Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.W.6
-
23
-
-
0016026944
-
The parallel execution of DO loops
-
Lamport, L. (1974) The parallel execution of DO loops. Commun. ACM, 17, 83-93.
-
(1974)
Commun. ACM
, vol.17
, pp. 83-93
-
-
Lamport, L.1
-
24
-
-
78650817529
-
Size matters: Space/time tradeoffs to improve GPGPU applications performance
-
New Orleans, LA, November, IEEE Computer Society, Washington, DC
-
Gharaibeh, A. and Ripeanu, M. (2010) Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance. Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 13-19. IEEE Computer Society, Washington, DC.
-
(2010)
Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis
-
-
Gharaibeh, A.1
Ripeanu, M.2
-
25
-
-
84957882532
-
SKaMPI: A detailed, accurate MPI benchmark
-
Reussner, R., Sanders, P., Prechelt, L. and Müller, M. (1998) SKaMPI: A detailed, accurate MPI benchmark. Recent Adv. Parallel Virtual Mach. Message Passing Interface, 1497, 52-59.
-
(1998)
Recent Adv. Parallel Virtual Mach. Message Passing Interface
, vol.1497
, pp. 52-59
-
-
Reussner, R.1
Sanders, P.2
Prechelt, L.3
Müller, M.4
-
26
-
-
84856919926
-
-
Lawrence Livermore National Laboratory. (accessed May 12, 2011)
-
(2010) Livermore Computing Systems Summary. Lawrence Livermore National Laboratory. https://computing.llnl.gov/resources/systems-summary.pdf (accessed May 12, 2011).
-
(2010)
Livermore Computing Systems Summary
-
-
-
27
-
-
23244465694
-
A performance comparison between the Earth Simulator and other terascale systems on a characteristic ASCI workload
-
DOI 10.1002/cpe.891
-
Kerbyson, D.J., Hoisie, A. and Wasserman, H. (2005) A performance comparison between the Earth Simulator and other terascale systems on a characteristic ASCI workload. Concurrency Comput.: Pract. Exp., 17, 1219-1238.
-
(2005)
Concurrency Computation Practice and Experience
, vol.17
, Issue.10
, pp. 1219-1238
-
-
Kerbyson, D.J.1
Hoisie, A.2
Wasserman, H.J.3