-
1
-
-
84906715281
-
-
TOP 500, http://www.top500.org/
-
-
-
-
2
-
-
78650835532
-
190 TFlops astrophysical N-body simulation on a cluster of GPUs
-
Washington, DC, USA: IEEE Computer Society
-
T. Hamada and K. Nitadori, "190 TFlops astrophysical N-body simulation on a cluster of GPUs," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10). Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-9.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10)
, pp. 1-9
-
-
Hamada, T.1
Nitadori, K.2
-
3
-
-
83155160941
-
Scalable fast multipole methods on distributed heterogeneous architectures
-
New York, NY, USA: ACM
-
Q. Hu, N. A. Gumerov, and R. Duraiswami, "Scalable fast multipole methods on distributed heterogeneous architectures," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). New York, NY, USA: ACM, 2011, pp. 36:1-36:12.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11)
, pp. 36:1-36:12
-
-
Hu, Q.1
Gumerov, N.A.2
Duraiswami, R.3
-
4
-
-
83155160985
-
Petaflop biofluidics simulations on a two million-core system
-
New York, NY, USA: ACM
-
M. Bernaschi, M. Bisson, T. Endo, S. Matsuoka, M. Fatica, and S. Melchionna, "Petaflop biofluidics simulations on a two million-core system," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). New York, NY, USA: ACM, 2011, pp. 4:1-4:12.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11)
, pp. 4:1-4:12
-
-
Bernaschi, M.1
Bisson, M.2
Endo, T.3
Matsuoka, S.4
Fatica, M.5
Melchionna, S.6
-
5
-
-
83155190228
-
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer
-
New York, NY, USA: ACM
-
T. Shimokawabe, T. Aoki, T. Takaki, T. Endo, A. Yamanaka, N. Maruyama, A. Nukada, and S. Matsuoka, "Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). New York, NY, USA: ACM, 2011, pp. 3:1-3:11.
-
(2011)
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11)
, pp. 3:1-3:11
-
-
Shimokawabe, T.1
Aoki, T.2
Takaki, T.3
Endo, T.4
Yamanaka, A.5
Maruyama, N.6
Nukada, A.7
Matsuoka, S.8
-
6
-
-
84877690500
-
Hybridizing S3D into an exascale application using OpenACC: An approach for moving to multipetaflops and beyond
-
Los Alamitos, CA, USA: ACM
-
J. M. Levesque, R. Sankaran, and R. Grout, "Hybridizing S3D into an exascale application using OpenACC: an approach for moving to multipetaflops and beyond," in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). Los Alamitos, CA, USA: ACM, 2012, pp. 15:1-15:11.
-
(2012)
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12)
, pp. 15:1-15:11
-
-
Levesque, J.M.1
Sankaran, R.2
Grout, R.3
-
7
-
-
84877693414
-
High performance radiation transport simulations: Preparing for Titan
-
Los Alamitos, CA, USA
-
C. Baker, G. Davidson, T. M. Evans, S. Hamilton, J. Jarrell, and W. Joubert, "High performance radiation transport simulations: preparing for Titan," In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). Los Alamitos, CA, USA, pp. 47:1-47:10.
-
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12)
, pp. 47:1-47:10
-
-
Baker, C.1
Davidson, G.2
Evans, T.M.3
Hamilton, S.4
Jarrell, J.5
Joubert, W.6
-
8
-
-
37249063076
-
A Madden-Julian Oscillation event realistically simulated by a global cloud-resolving model
-
H. Miura, M. Satoh, T. Nasuno, A. T. Noda, and K. Oouchi, "A Madden-Julian Oscillation event realistically simulated by a global cloud-resolving model," Science, vol. 318, pp. 1763-1765, 2007.
-
(2007)
Science
, vol.318
, pp. 1763-1765
-
-
Miura, H.1
Satoh, M.2
Nasuno, T.3
Noda, A.T.4
Oouchi, K.5
-
9
-
-
39549096002
-
Nonhydrostatic icosahedral atmospheric model (NICAM) for global cloud resolving simulations
-
M. Satoh, T. Matsuno, H. Tomita, H. Miura, T. Nasuno, and S. Iga, "Nonhydrostatic icosahedral atmospheric model (NICAM) for global cloud resolving simulations," J. Comput. Physics, vol. 227, no. 7, pp. 3486-3514, 2008.
-
(2008)
J. Comput. Physics
, vol.227
, Issue.7
, pp. 3486-3514
-
-
Satoh, M.1
Matsuno, T.2
Tomita, H.3
Miura, H.4
Nasuno, T.5
Iga, S.6
-
11
-
-
78650819651
-
An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code
-
Washington, DC, USA: IEEE Computer Society
-
T. Shimokawabe, T. Aoki, C. Muroi, J. Ishida, K. Kawano, T. Endo, A. Nukada, N. Maruyama, and S. Matsuoka, "An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-hydrostatic weather model ASUCA production code," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10). Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-11.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10)
, pp. 1-11
-
-
Shimokawabe, T.1
Aoki, T.2
Muroi, C.3
Ishida, J.4
Kawano, K.5
Endo, T.6
Nukada, A.7
Maruyama, N.8
Matsuoka, S.9
-
12
-
-
77954909361
-
Running the NIM next-generation weather model on GPUs
-
M. W. Govett, J. Middlecoff, and T. Henderson, "Running the NIM next-generation weather model on GPUs," in Proceedings of 10th IEEE/ACM Int. Conf. Cluster, Cloud and Grid Computing (CCGrid), 2010, pp. 792-796.
-
(2010)
Proceedings of 10th IEEE/ACM Int. Conf. Cluster, Cloud and Grid Computing (CCGrid)
, pp. 792-796
-
-
Govett, M.W.1
Middlecoff, J.2
Henderson, T.3
-
13
-
-
84906655622
-
Graphics processing unit (GPU) acceleration of the Goddard Earth Observing System atmospheric model
-
Goddard Space Flight Center
-
W. Putman, "Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model," NASA Technical Report, Goddard Space Flight Center, 2011, http://ntrs.nasa.gov/search.jsp?R=20120009084/
-
(2011)
NASA Technical Report
-
-
Putman, W.1
-
14
-
-
79958268442
-
145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction
-
proceedings of the International Conference on Computational Science (ICCS 2011)
-
T. Shimokawabe, T. Aoki, J. Ishida, K. Kawano, and C. Muroi, "145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction," Procedia Computer Science, vol. 4, pp. 1535-1544, 2011, proceedings of the International Conference on Computational Science (ICCS 2011).
-
(2011)
Procedia Computer Science
, vol.4
, pp. 1535-1544
-
-
Shimokawabe, T.1
Aoki, T.2
Ishida, J.3
Kawano, K.4
Muroi, C.5
-
15
-
-
84874427444
-
Multi-GPU implementation of the NICAM atmospheric model
-
Rhodes Island, Greece
-
I. Demeshko, N. Maruyama, H. Tomita, and S. Matsuoka, "Multi-GPU Implementation of the NICAM Atmospheric Model," In Proceedings of Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar'2012). Rhodes Island, Greece, pp. 175-184
-
Proceedings of Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar'2012)
, pp. 175-184
-
-
Demeshko, I.1
Maruyama, N.2
Tomita, H.3
Matsuoka, S.4
-
16
-
-
84874804003
-
Progress towards accelerating HOMME on hybrid multi-core systems
-
I. Carpenter, R. K. Archibald, K. J. Evans, J. Larkin, P. Micikevicius, M. Norman, J. Rosinski, J. Schwarzmeier, and M. A. Taylor, "Progress towards accelerating HOMME on hybrid multi-core systems," The International Journal of High Performance Computing Applications, vol. 27, no. 3, pp. 335-347.
-
The International Journal of High Performance Computing Applications
, vol.27
, Issue.3
, pp. 335-347
-
-
Carpenter, I.1
Archibald, R.K.2
Evans, K.J.3
Larkin, J.4
Micikevicius, P.5
Norman, M.6
Rosinski, J.7
Schwarzmeier, J.8
Taylor, M.A.9
-
17
-
-
84875185250
-
A peta-scalable CPU-GPU algorithm for global atmospheric simulations
-
ACM, New York, NY, USA
-
C. Yang, W. Xue, H. Fu, L. Gan, L. Li, Y. Xu, Y. Lu, J. Sun, G. Yang, and W. Zheng, "A peta-scalable CPU-GPU algorithm for global atmospheric simulations," In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13), ACM, New York, NY, USA, pp. 1-12.
-
Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13)
, pp. 1-12
-
-
Yang, C.1
Xue, W.2
Fu, H.3
Gan, L.4
Li, L.5
Xu, Y.6
Lu, Y.7
Sun, J.8
Yang, G.9
Zheng, W.10
-
18
-
-
79951514320
-
Parallel multilevel methods for implicit solution of shallow water equations with nonsmooth topography on the cubed-sphere
-
C. Yang and X.-C. Cai, "Parallel multilevel methods for implicit solution of shallow water equations with nonsmooth topography on the cubed-sphere," J. Comput. Phys., vol. 230, pp. 2523-2539, 2011.
-
(2011)
J. Comput. Phys.
, vol.230
, pp. 2523-2539
-
-
Yang, C.1
Cai, X.-C.2
-
19
-
-
0035273564
-
Strong stability-preserving high-order time discretization methods
-
S. Gottlieb, C.-W. Shu, and E. Tadmor, "Strong stability-preserving high-order time discretization methods," SIAM Review, vol. 43, pp. 89-112, 2001.
-
(2001)
SIAM Review
, vol.43
, pp. 89-112
-
-
Gottlieb, S.1
Shu, C.-W.2
Tadmor, E.3
-
20
-
-
0001440358
-
A standard test set for numerical approximations to the shallow water equations in spherical geometry
-
D. L. Williamson, J. B. Drake, J. J. Hack, R. Jakob, and P. N. Swarztrauber, "A standard test set for numerical approximations to the shallow water equations in spherical geometry," J. Comput. Phys., 102: 211-224, 1992.
-
(1992)
J. Comput. Phys.
, vol.102
, pp. 211-224
-
-
Williamson, D.L.1
Drake, J.B.2
Hack, J.J.3
Jakob, R.4
Swarztrauber, P.N.5
-
21
-
-
0001178530
-
Spectral transform solutions to the shallow water test set
-
R. Jakob-Chien, J. J. Hack, and D. L. Williamson, "Spectral transform solutions to the shallow water test set," J. Comput. Phys., 119:164-187, 1995.
-
(1995)
J. Comput. Phys.
, vol.119
, pp. 164-187
-
-
Jakob-Chien, R.1
Hack, J.J.2
Williamson, D.L.3
-
22
-
-
84877702106
-
A scalable, numerically stable, high-performance tridiagonal solver using GPUs
-
IEEE Computer Society Press, Los Alamitos, CA, USA, 1-11
-
L.-W. Chang, J. A. Stratton, H.-S. Kim, and W.-M. Hwu, "A scalable, numerically stable, high-performance tridiagonal solver using GPUs," In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 27:1-11.
-
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12)
, pp. 27:1-11
-
-
Chang, L.-W.1
Stratton, J.A.2
Kim, H.-S.3
Hwu, W.-M.4
-
23
-
-
84884825242
-
High performance FFT based Poisson solver on a CPU-GPU heterogeneous platform
-
IEEE Computer Society, Washington, DC, USA
-
J. Wu and J. Jaja, "High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform," In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13), IEEE Computer Society, Washington, DC, USA, 115-125.
-
Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13)
, pp. 115-125
-
-
Wu, J.1
Jaja, J.2
-
24
-
-
70350771127
-
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
-
Piscataway, NJ, USA: IEEE Press
-
K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures," in Proceedings of the 2008 ACM/IEEE conference on Supercomputing (SC '08). Piscataway, NJ, USA: IEEE Press, 2008, pp. 4:1-4:12.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC '08)
, pp. 4:1-4:12
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
25
-
-
34547500808
-
Implicit and explicit optimizations for stencil computations
-
S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Implicit and explicit optimizations for stencil computations," In Proceedings of the 2006 workshop on Memory system performance and correctness (MSPC '06), pp. 51-60.
-
Proceedings of the 2006 Workshop on Memory System Performance and Correctness (MSPC '06)
, pp. 51-60
-
-
Kamil, S.1
Datta, K.2
Williams, S.3
Oliker, L.4
Shalf, J.5
Yelick, K.6
-
26
-
-
33947307610
-
The memory behavior of cache oblivious stencil computations
-
M. Frigo and V. Strumpen, "The memory behavior of cache oblivious stencil computations," J. Supercomput., vol. 39, no. 2, pp. 93-112, 2007.
-
(2007)
J. Supercomput.
, vol.39
, Issue.2
, pp. 93-112
-
-
Frigo, M.1
Strumpen, V.2
-
27
-
-
84899705665
-
A multi-level optimization method for stencil computation on the domain that is bigger than memory capacity of GPU
-
IEEE Computer Society, Washington, DC, USA
-
G. Jin, T. Endo, and S. Matsuoka, "A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU," In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW '13), IEEE Computer Society, Washington, DC, USA, 1080-1087.
-
Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW '13)
, pp. 1080-1087
-
-
Jin, G.1
Endo, T.2
Matsuoka, S.3
-
28
-
-
84884845560
-
Optimizing and auto-tuning iterative stencil loops for GPUs with the in-plane method
-
IEEE Computer Society, Washington, DC, USA
-
WT Tang, WJ Tan, R. Krishnamoorthy, YW Wong, S.-h. Kuo, RSM Goh, S. J. Turner, and W.-F. Wong, "Optimizing and Auto-Tuning Iterative Stencil Loops for GPUs with the In-Plane Method," In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13), IEEE Computer Society, Washington, DC, USA.
-
Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13)
-
-
Tang, W.T.1
Tan, W.J.2
Krishnamoorthy, R.3
Wong, Y.W.4
Kuo, S.-H.5
Goh, R.6
Turner, S.J.7
Wong, W.-F.8
-
29
-
-
78650806116
-
3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
-
A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey, "3.5-D blocking optimization for stencil computations on modern CPUs and GPUs," In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10), pp.1-13.
-
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10)
, pp. 1-13
-
-
Nguyen, A.1
Satish, N.2
Chhugani, J.3
Kim, C.4
Dubey, P.5
-
30
-
-
84877717516
-
Patus for convenient high-performance stencils: Evaluation in earthquake simulations
-
IEEE Computer Society Press, Los Alamitos, CA, USA, 1-10
-
M. Christen, O. Schenk, and Y. Cui, "Patus for convenient high-performance stencils: evaluation in earthquake simulations," In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 11:1-10.
-
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12)
, pp. 11:1-10
-
-
Christen, M.1
Schenk, O.2
Cui, Y.3
-
31
-
-
84877693508
-
Efficient backprojection-based synthetic aperture radar computation with manycore processors
-
IEEE Computer Society Press, Los Alamitos, CA, USA, 1-11
-
J. Park, PTP Tang, M. Smelyanskiy, D. Kim, and T. Benson, "Efficient backprojection-based synthetic aperture radar computation with manycore processors," In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 28:1-11.
-
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12)
, pp. 28:1-11
-
-
Park, J.1
Tang, P.2
Smelyanskiy, M.3
Kim, D.4
Benson, T.5
-
32
-
-
84884840397
-
Exploring SIMD for molecular dynamics, using Intel Xeon processors and Intel Xeon Phi coprocessors
-
IEEE Computer Society, Washington, DC, USA
-
S. J. Pennycook, C. J. Hughes, M. Smelyanskiy, and S. A. Jarvis, "Exploring SIMD for Molecular Dynamics, Using Intel Xeon Processors and Intel Xeon Phi Coprocessors." In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13), IEEE Computer Society, Washington, DC, USA.
-
Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13)
-
-
Pennycook, S.J.1
Hughes, C.J.2
Smelyanskiy, M.3
Jarvis, S.A.4
-
33
-
-
84879835573
-
Efficient sparse matrix-vector multiplication on x86-based many-core processors
-
ACM, New York, NY, USA
-
X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey, "Efficient sparse matrix-vector multiplication on x86-based many-core processors," In Proceedings of the 27th international ACM conference on International conference on supercomputing (ICS '13), ACM, New York, NY, USA, pp. 273-282.
-
Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (ICS '13)
, pp. 273-282
-
-
Liu, X.1
Smelyanskiy, M.2
Chow, E.3
Dubey, P.4
-
34
-
-
84884866137
-
Design and implementation of the Linpack benchmark for single and multinode systems based on Intel Xeon Phi coprocessor
-
IEEE Computer Society, Washington, DC, USA
-
A. Heinecke, K. Vaidyanathan, M. Smelyanskiy, A. Kobotov, R. Dubtsov, G. Henry, A. G. Shet, G. Chrysos, and P. Dubey, "Design and Implementation of the Linpack Benchmark for Single and Multinode Systems Based on Intel Xeon Phi Coprocessor." In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13), IEEE Computer Society, Washington, DC, USA.
-
Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS '13)
-
-
Heinecke, A.1
Vaidyanathan, K.2
Smelyanskiy, M.3
Kobotov, A.4
Dubtsov, R.5
Henry, G.6
Shet, A.G.7
Chrysos, G.8
Dubey, P.9
-
35
-
-
84942448628
-
Assessing the performance of OpenMP programs on the Intel Xeon Phi
-
Aachen, Germany
-
D. Schmidl, T. Cramer, S. Wienke, C. Terboven, and M. S. Müller, "Assessing the Performance of OpenMP Programs on the Intel Xeon Phi," In Proceedings of the Euro-Par 2013, Aachen, Germany, 2013.
-
(2013)
Proceedings of the Euro-Par 2013
-
-
Schmidl, D.1
Cramer, T.2
Wienke, S.3
Terboven, C.4
Müller, M.S.5
-
36
-
-
84880053798
-
Modeling communication in cache-coherent SMP systems: A case-study with Xeon Phi
-
ACM, New York, NY, USA
-
S. Ramos and T. Hoefler, "Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi," In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC '13). ACM, New York, NY, USA, pp. 97-108.
-
Proceedings of the 22nd International Symposium on High-performance Parallel and Distributed Computing (HPDC '13)
, pp. 97-108
-
-
Ramos, S.1
Hoefler, T.2