-
3
-
-
0036105874
-
Cellular supercomputing with system-on-a-chip
-
Digest of Technical Papers (Cat. No.02CH37315) IEEE
-
G. Almasi et al. Cellular supercomputing with system-on-a-chip. In 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315), pages 196-197. IEEE, 2002.
-
(2002)
2002 IEEE International Solid-state Circuits Conference
, pp. 196-197
-
-
Almasi, G.1
-
4
-
-
79959926022
-
FAWN: A fast array of wimpy nodes
-
July
-
D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A fast array of wimpy nodes. Communications of the ACM, 54(7):101-109, July 2011.
-
(2011)
Communications of the ACM
, vol.54
, Issue.7
, pp. 101-109
-
-
Andersen, D.G.1
Franklin, J.2
Kaminsky, M.3
Phanishayee, A.4
Tan, L.5
Vasudevan, V.6
-
5
-
-
33847103649
-
Optimizing bandwidth limited problems using one-sided communication and overlap
-
IEEE
-
C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Optimizing Bandwidth Limited Problems Using One-Sided Communication and Overlap. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pages 1-10. IEEE, 2006.
-
(2006)
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
, pp. 1-10
-
-
Bell, C.1
Bonachea, D.2
Nishtala, R.3
Yelick, K.4
-
6
-
-
84858657417
-
The hidden cost of low bandwidth communication
-
U. Vishkin, editor ACM, New York, NY, USA
-
G. E. Blelloch, B. M. Maggs, and G. L. Miller. The hidden cost of low bandwidth communication. In U. Vishkin, editor, Developing a Computer Science Agenda for High-Performance Computing, pages 22-25. ACM, New York, NY, USA, 1994.
-
(1994)
Developing a Computer Science Agenda for High-performance Computing
, pp. 22-25
-
-
Blelloch, G.E.1
Maggs, B.M.2
Miller, G.L.3
-
7
-
-
33746887354
-
SeaStar interconnect: Balanced bandwidth for scalable performance
-
DOI 10.1109/MM.2006.65
-
R. Brightwell, K. T. Pedretti, K. D. Underwood, and T. Hudson. SeaStar interconnect: Balanced bandwidth for scalable performance. IEEE Micro, 26(3):41-57, May 2006. (Pubitemid 44194067)
-
(2006)
IEEE Micro
, vol.26
, Issue.3
, pp. 41-57
-
-
Brightwell, R.1
Pedretti, K.T.2
Underwood, K.D.3
Hudson, T.4
-
8
-
-
0000493064
-
Estimating interlock and improving balance for pipelined architectures
-
Aug.
-
D. Callahan, J. Cocke, and K. Kennedy. Estimating interlock and improving balance for pipelined architectures. Journal of Parallel and Distributed Computing, 5(4):334-358, Aug. 1988.
-
(1988)
Journal of Parallel and Distributed Computing
, vol.5
, Issue.4
, pp. 334-358
-
-
Callahan, D.1
Cocke, J.2
Kennedy, K.3
-
9
-
-
58449124711
-
Communication analysis of parallel 3D FFT for flat Cartesian meshes on large Blue Gene systems
-
Springer-Verlag
-
A. Chan, P. Balaji, W. Gropp, and R. Thakur. Communication analysis of parallel 3D FFT for flat Cartesian meshes on large Blue Gene systems. In Proceedings of the 15th International Conference on High Performance Computing, pages 350-364. Springer-Verlag, 2008.
-
(2008)
Proceedings of the 15th International Conference on High Performance Computing
, pp. 350-364
-
-
Chan, A.1
Balaji, P.2
Gropp, W.3
Thakur, R.4
-
10
-
-
19344375178
-
The development and integration of a distributed 3D FFT for a cluster of workstations
-
Atlanta, GA, USA
-
C. E. Cramer and J. Board. The development and integration of a distributed 3D FFT for a cluster of workstations. In Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta, GA, USA, 2000.
-
(2000)
Proceedings of the 4th Annual Linux Showcase & Conference
-
-
Cramer, C.E.1
Board, J.2
-
11
-
-
84867435449
-
Balance principles for algorithm-architecture co-design
-
Berkeley, CA, USA USENIX Association
-
K. Czechowski, C. Battaglino, C. Mcclanahan, A. Chandramowlishwaran, and R. Vuduc. Balance principles for algorithm-architecture co-design. In USENIX Wkshp. Hot Topics in Parallelism (HotPar), pages 1-5, Berkeley, CA, USA, 2011. USENIX Association.
-
(2011)
USENIX Wkshp. Hot Topics in Parallelism (HotPar)
, pp. 1-5
-
-
Czechowski, K.1
Battaglino, C.2
Mcclanahan, C.3
Chandramowlishwaran, A.4
Vuduc, R.5
-
12
-
-
43949090517
-
Titanium performance and potential: An NPB experimental study
-
DOI 10.1007/978-3-540-69330-7-14, Languages and Compilers for Parallel Computing - 18th International Workshop, LCPC 2005, Revised Selected Papers
-
K. Datta, D. Bonachea, and K. Yelick. Titanium Performance and Potential: An NPB Experimental Study. In Proceedings of the Languages and Compilers for Parallel Computing (LCPC) Workshop, volume LNCS 4339, pages 200-214, 2006. (Pubitemid 351702211)
-
(2006)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
, vol.LNCS4339
, pp. 200-214
-
-
Datta, K.1
Bonachea, D.2
Yelick, K.3
-
14
-
-
0035980881
-
Scalable parallel FFT for spectral simulations on a Beowulf cluster
-
DOI 10.1016/S0167-8191(01)00120-X, PII S016781910100120X
-
P. Dmitruk, L.-P. Wang, W. H. Matthaeus, R. Zhang, and D. Seckel. Scalable parallel FFT for spectral simulations on a Beowulf cluster. Parallel Computing, 27(14):1921-1936, Dec. 2001. (Pubitemid 32997727)
-
(2001)
Parallel Computing
, vol.27
, Issue.14
, pp. 1921-1936
-
-
Dmitruk, P.1
Wang, L.-P.2
Matthaeus, W.H.3
Zhang, R.4
Seckel, D.5
-
15
-
-
78650819877
-
Overlapping methods of all-to-all communication and FFT algorithms for torus-connected massively parallel supercomputers
-
IEEE, Nov.
-
J. Doi and Y. Negishi. Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers. In 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-9. IEEE, Nov. 2010.
-
(2010)
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
, pp. 1-9
-
-
Doi, J.1
Negishi, Y.2
-
16
-
-
79951595196
-
The international exascale software project roadmap
-
J. Dongarra et al. The international exascale software project roadmap. IJHPCA, 25(1):3-60, 2011.
-
(2011)
IJHPCA
, vol.25
, Issue.1
, pp. 3-60
-
-
Dongarra, J.1
-
18
-
-
0032226441
-
The future fast Fourier transform?
-
A. Edelman, P. McCorquodale, and S. Toledo. The Future Fast Fourier Transform? SIAM Journal on Scientific Computing, 20(3):1094, 1998.
-
(1998)
SIAM Journal on Scientific Computing
, vol.20
, Issue.3
, pp. 1094
-
-
Edelman, A.1
McCorquodale, P.2
Toledo, S.3
-
19
-
-
19344378421
-
Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements
-
M. Eleftheriou, B. Fitch, A. Rayshubskiy, T. Ward, and R. Germain. Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements. IBM Journal of Research and Development, 49(2.3):457-464, 2005. (Pubitemid 40718146)
-
(2005)
IBM Journal of Research and Development
, vol.49
, Issue.2-3
, pp. 457-464
-
-
Eleftheriou, M.1
Fitch, B.G.2
Rayshubskiy, A.3
Ward, T.J.C.4
Germain, R.S.5
-
20
-
-
33947229391
-
Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer
-
DOI 10.1016/j.cpc.2006.12.006, PII S0010465507000276
-
B. Fang, Y. Deng, and G. Martyna. Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer. Computer Physics Communications, 176(8):531-538, Apr. 2007. (Pubitemid 46435804)
-
(2007)
Computer Physics Communications
, vol.176
, Issue.8
, pp. 531-538
-
-
Fang, B.1
Deng, Y.2
Martyna, G.3
-
21
-
-
0033350255
-
Cache-oblivious algorithms
-
FOCS '99 Washington, DC, USA IEEE Computer Society
-
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS '99, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
-
(1999)
Proceedings of the 40th Annual Symposium on Foundations of Computer Science
, pp. 285
-
-
Frigo, M.1
Leiserson, C.E.2
Prokop, H.3
Ramachandran, S.4
-
22
-
-
77953976700
-
An introductory exascale feasibility study for FFTs and multigrid
-
Atlanta, GA, USA, Apr. IEEE
-
H. Gahvari and W. Gropp. An introductory exascale feasibility study for FFTs and multigrid. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pages 1-9, Atlanta, GA, USA, Apr. 2010. IEEE.
-
(2010)
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
, pp. 1-9
-
-
Gahvari, H.1
Gropp, W.2
-
23
-
-
78149258346
-
Understanding throughput-oriented architectures
-
Nov.
-
M. Garland and D. B. Kirk. Understanding throughput-oriented architectures. Communications of the ACM, 53(11):58, Nov. 2010.
-
(2010)
Communications of the ACM
, vol.53
, Issue.11
, pp. 58
-
-
Garland, M.1
Kirk, D.B.2
-
24
-
-
0035280950
-
Parallel distributed FFT-based solvers for 3-D Poisson problems in meso-scale atmospheric simulations
-
DOI 10.1177/109434200101500104
-
L. Giraud, R. Guivarch, and J. Stein. Parallel Distributed FFT-Based Solvers for 3-D Poisson Problems in Meso-Scale Atmospheric Simulations. International Journal of High Performance Computing Applications, 15(1):36-46, Feb. 2001. (Pubitemid 32252488)
-
(2001)
International Journal of High Performance Computing Applications
, vol.15
, Issue.1
, pp. 36-46
-
-
Giraud, L.1
Guivarch, R.2
Stein, J.3
-
25
-
-
77954713684
-
An empirically tuned 2D and 3D FFT library on CUDA GPU
-
Tsukuba, Japan ACM Press
-
L. Gu, X. Li, and J. Siegel. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10, page 305, Tsukuba, Japan, 2010. ACM Press.
-
(2010)
Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
, pp. 305
-
-
Gu, L.1
Li, X.2
Siegel, J.3
-
26
-
-
67650635164
-
Many-core vs. many-thread machines: Stay away from the valley
-
Z. Guz, E. Bolotin, I. Keidar, A. Kolodny, A. Mendelson, and U. C. Weiser. Many-core vs. many-thread machines: Stay away from the valley. IEEE Computer Architecture Letters, 8:25-28, 2009.
-
(2009)
IEEE Computer Architecture Letters
, vol.8
, pp. 25-28
-
-
Guz, Z.1
Bolotin, E.2
Keidar, I.3
Kolodny, A.4
Mendelson, A.5
Weiser, U.C.6
-
27
-
-
60649098706
-
Parallel 3D-FFTs for multi-core nodes on a mesh communication network
-
Helsinki, Finland
-
J. Hein, H. Jagode, U. Sigrist, A. Simpson, and A. Trew. Parallel 3D-FFTs for multi-core nodes on a mesh communication network. In Proceedings of the Cray User's Group (CUG) Meeting, pages 1-15, Helsinki, Finland, 2008.
-
(2008)
Proceedings of the Cray User's Group (CUG) Meeting
, pp. 1-15
-
-
Hein, J.1
Jagode, H.2
Sigrist, U.3
Simpson, A.4
Trew, A.5
-
29
-
-
84971853043
-
I/O complexity: The red-blue pebble game
-
New York, New York, USA, May ACM Press
-
H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In Proceedings of the thirteenth annual ACM symposium on Theory of computing-STOC '81, pages 326-333, New York, New York, USA, May 1981. ACM Press.
-
(1981)
Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing-STOC '81
, pp. 326-333
-
-
Jia-Wei, H.1
Kung, H.T.2
-
32
-
-
55849105626
-
Optimization of all-to-all communication on the Blue Gene/L supercomputer
-
IEEE, Sept.
-
S. Kumar, Y. Sabharwal, R. Garg, and P. Heidelberger. Optimization of All-to-All Communication on the Blue Gene/L Supercomputer. In 2008 37th International Conference on Parallel Processing, pages 320-329. IEEE, Sept. 2008.
-
(2008)
2008 37th International Conference on Parallel Processing
, pp. 320-329
-
-
Kumar, S.1
Sabharwal, Y.2
Garg, R.3
Heidelberger, P.4
-
34
-
-
52649125840
-
3D-stacked memory architectures for multi-core processors
-
IEEE, June
-
G. H. Loh. 3D-Stacked Memory Architectures for Multi-core Processors. In 2008 International Symposium on Computer Architecture, pages 453-464. IEEE, June 2008.
-
(2008)
2008 International Symposium on Computer Architecture
, pp. 453-464
-
-
Loh, G.H.1
-
37
-
-
84856841346
-
Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
-
New Orleans, LA, USA, Nov.
-
S. J. Pennycook, S. D. Hammond, S. A. Jarvis, and G. R. Mudalige. Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark. In Proceedings of the International Workshop on Performance Modeling, Benchmarking and Simulation (PMBS), New Orleans, LA, USA, Nov. 2010.
-
(2010)
Proceedings of the International Workshop on Performance Modeling, Benchmarking and Simulation (PMBS)
-
-
Pennycook, S.J.1
Hammond, S.D.2
Jarvis, S.A.3
Mudalige, G.R.4
-
38
-
-
79961071291
-
Web search using mobile cores: Quantifying and mitigating the price of efficiency
-
June
-
V. J. Reddi, B. C. Lee, T. Chilimbi, and K. Vaid. Web search using mobile cores: Quantifying and mitigating the price of efficiency. ACM SIGARCH Computer Architecture News, 38(3):215-314, June 2010.
-
(2010)
ACM SIGARCH Computer Architecture News
, vol.38
, Issue.3
, pp. 215-314
-
-
Reddi, V.J.1
Lee, B.C.2
Chilimbi, T.3
Vaid, K.4
-
40
-
-
0012776293
-
A parallel 3-D FFT algorithm on clusters of vector SMPs
-
Applied Parallel Computing New Paradigms for HPC in Industry and Academia 5th International Workshop, PARA 2000 Bergen, Norway, June 18-20, 2000 Proceedings
-
D. Takahashi. A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs. In Proceedings of Applied Parallel Computing: New Paradigms for HPC in Industry and Academia, volume LNCS 1947, pages 316-323, 2001. (Pubitemid 33239312)
-
(2001)
Lecture Notes in Computer Science
, Issue.1947
, pp. 316-323
-
-
Takahashi, D.1
-
41
-
-
80052312080
-
Keeneland: Bringing heterogeneous GPU computing to the computational science community
-
J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, et al. Keeneland: Bringing heterogeneous GPU computing to the computational science community. IEEE Computing in Science and Engineering, 13(5):90-95, 2011.
-
(2011)
IEEE Computing in Science and Engineering
, vol.13
, Issue.5
, pp. 90-95
-
-
Vetter, J.1
Glassbrook, R.2
Dongarra, J.3
Schwan, K.4
Loftis, B.5
McNally, S.6
Meredith, J.7
Rogers, J.8
Roth, P.9
Spafford, K.10
-
42
-
-
84864034959
-
MVAPICH2-GPU: Optimized GPU to GPU communication for InfiniBand clusters
-
H. Wang, S. Potluri, M. Luo, A. Singh, S. Sur, and D. Panda. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Computer Science-Research and Development, pages 1-10.
-
Computer Science-research and Development
, pp. 1-10
-
-
Wang, H.1
Potluri, S.2
Luo, M.3
Singh, A.4
Sur, S.5
Panda, D.6
-
43
-
-
74049089074
-
A 32×32×32, spatially distributed 3D FFT in four microseconds on Anton
-
ACM
-
C. Young, J. Bank, R. Dror, J. Grossman, J. Salmon, and D. Shaw. A 32×32×32, spatially distributed 3D FFT in four microseconds on Anton. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, page 23. ACM, 2009.
-
(2009)
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
, pp. 23
-
-
Young, C.1
Bank, J.2
Dror, R.3
Grossman, J.4
Salmon, J.5
Shaw, D.6