-
1
-
-
32844464238
-
Optimization of MPI collective communication on BlueGene/L systems
-
ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
-
G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway, J. E. Moreira, B. Steinmacher-Burow, and Y. Zheng. Optimization of MPI collective communication on BlueGene/L systems. In Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pages 253-262, 2005. (Pubitemid 43251330)
-
(2005)
Proceedings of the International Conference on Supercomputing
, pp. 253-262
-
-
Almási, G.1
Heidelberger, P.2
Archer, C.J.3
Martorell, X.4
Erway, C.C.5
Moreira, J.E.6
Steinmacher-Burow, B.7
Zheng, Y.8
-
3
-
-
72249097688
-
The multikernel: A new OS architecture for scalable multicore systems
-
A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pages 29-44, 2009.
-
(2009)
Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09
, pp. 29-44
-
-
Baumann, A.1
Barham, P.2
Dagand, P.-E.3
Harris, T.4
Isaacs, R.5
Peter, S.6
Roscoe, T.7
Schüpbach, A.8
Singhania, A.9
-
4
-
-
33746284933
-
Broadcast trees for heterogeneous platforms
-
O. Beaumont, L. Marchal, and Y. Robert. Broadcast Trees for Heterogeneous Platforms. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 80-92, 2005.
-
(2005)
Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05
, pp. 80-92
-
-
Beaumont, O.1
Marchal, L.2
Robert, Y.3
-
5
-
-
34547261834
-
Thousand core chips - A technology perspective
-
DOI 10.1109/DAC.2007.375263, 4261282, 2007 44th ACM/IEEE Design Automation Conference, DAC'07
-
S. Borkar. Thousand core chips: a technology perspective. In Proceedings of the 44th annual Design Automation Conference, DAC '07, pages 746-749, 2007. (Pubitemid 47130064)
-
(2007)
Proceedings - Design Automation Conference
, pp. 746-749
-
-
Borkar, S.1
-
6
-
-
0031269329
-
Efficient algorithms for all-to-all communications in multiport message-passing systems
-
J. Bruck, C.-T. Ho, E. Upfal, S. Kipnis, and D. Weathersby. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems. IEEE Transactions on Parallel and Distributed Systems, 8:1143-1156, November 1997. (Pubitemid 127763326)
-
(1997)
IEEE Transactions on Parallel and Distributed Systems
, vol.8
, Issue.11
, pp. 1143-1156
-
-
Bruck, J.1
Ho, C.-T.2
Kipnis, S.3
Upfal, E.4
Weathersby, D.5
-
7
-
-
84864147959
-
-
A Collective Communication Library for the Intel Single-chip Cloud Computer
-
E. Chan. RCCE comm: A Collective Communication Library for the Intel Single-chip Cloud Computer. http://communities.intel.com/docs/DOC-5663, 2010.
-
(2010)
-
-
Chan, E.1
-
9
-
-
0009346826
-
LogP: Towards a realistic model of parallel computation
-
D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a Realistic Model of Parallel Computation. In Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '93, pages 1-12, 1993.
-
(1993)
Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '93
, pp. 1-12
-
-
Culler, D.1
Karp, R.2
Patterson, D.3
Sahay, A.4
Schauser, K.E.5
Santos, E.6
Subramonian, R.7
Von Eicken, T.8
-
10
-
-
3643067761
-
Assessing fast network interfaces
-
D. E. Culler, L. T. Liu, R. P. Martin, and C. O. Yoshikawa. Assessing Fast Network Interfaces. In IEEE Micro, pages 35-43, Feb. 1996. (Pubitemid 126530205)
-
(1996)
IEEE Micro
, vol.16
, Issue.1
, pp. 35-43
-
-
Culler, D.E.1
Liu, L.T.2
Martin, R.P.3
Yoshikawa, C.O.4
-
11
-
-
35048884271
-
Open MPI: Goals, concept, and design of a next generation MPI implementation
-
E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting, pages 97-104, Budapest, Hungary, September 2004.
-
(2004)
Proceedings, 11th European PVM/MPI Users' Group Meeting
, pp. 97-104
-
-
Gabriel, E.1
Fagg, G.E.2
Bosilca, G.3
Angskun, T.4
Dongarra, J.J.5
Squyres, J.M.6
Sahay, V.7
Kambadur, P.8
Barrett, B.9
Lumsdaine, A.10
Castain, R.H.11
Daniel, D.J.12
Graham, R.L.13
Woodall, T.S.14
-
12
-
-
84947273700
-
Efficient collective operations using remote memory operations on VIA-based clusters
-
R. Gupta, P. Balaji, D. K. Panda, and J. Nieplocha. Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters. In Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IPDPS '03, pages 46-62, 2003.
-
(2003)
Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IPDPS '03
, pp. 46-62
-
-
Gupta, R.1
Balaji, P.2
Panda, D.K.3
Nieplocha, J.4
-
13
-
-
34548793392
-
A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast
-
T. Hoefler, C. Siebert, and W. Rehm. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, IPDPS '07, page 232, 2007.
-
(2007)
Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, IPDPS '07
, p. 232
-
-
Hoefler, T.1
Siebert, C.2
Rehm, W.3
-
14
-
-
77952123736
-
A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS
-
J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS. In 2010 IEEE International Solid-State Circuits Conference, pages 108-109. IEEE, 2010.
-
(2010)
2010 IEEE International Solid-State Circuits Conference
, pp. 108-109
-
-
Howard, J.1
Dighe, S.2
Hoskote, Y.3
Vangal, S.4
Finan, D.5
Ruhl, G.6
Jenkins, D.7
Wilson, H.8
Borkar, N.9
Schrom, G.10
-
15
-
-
0018518295
-
Virtual cut-through: A new computer communication switching technique
-
DOI 10.1016/0376-5075(79)90032-1
-
P. Kermani and L. Kleinrock. Virtual cut-through: A new computer communication switching technique. Computer Networks, 3(4):267-286, 1979. (Pubitemid 10422271)
-
(1979)
Computer networks
, vol.3
, Issue.4
, pp. 267-286
-
-
Kermani, P.1
Kleinrock, L.2
-
16
-
-
66749092384
-
Exascale computing study: Technology challenges in achieving exascale systems
-
P. Kogge et al. Exascale Computing Study: Technology Challenges in Achieving Exascale Systems. Technical report, DARPA, 2008.
-
(2008)
Technical Report DARPA
-
-
Kogge, P.1
-
17
-
-
12444269036
-
Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support
-
J. Liu, A. R. Mamidala, and D. K. Panda. Fast and Scalable MPI-Level Broadcast Using InfiniBand's Hardware Multicast Support. In Proceedings of the 18th International Symposium on Parallel and Distributed Processing, IPDPS '04, page 10, 2004.
-
(2004)
Proceedings of the 18th International Symposium on Parallel and Distributed Processing, IPDPS '04
, p. 10
-
-
Liu, J.1
Mamidala, A.R.2
Panda, D.K.3
-
18
-
-
1142305191
-
High performance RDMA-based MPI implementation over InfiniBand
-
J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda. High performance RDMA-based MPI implementation over InfiniBand. In Proceedings of the 17th annual international conference on Supercomputing, ICS '03, pages 295-304, 2003.
-
(2003)
Proceedings of the 17th Annual International Conference on Supercomputing, ICS '03
, pp. 295-304
-
-
Liu, J.1
Wu, J.2
Kini, S.P.3
Wyckoff, P.4
Panda, D.K.5
-
19
-
-
84864147958
-
-
RCCE: a Small Library for Many-Core Communication
-
T. Mattson and R. Van Der Wijngaart. RCCE: a Small Library for Many-Core Communication. http://techresearch.intel.com, 2010.
-
(2010)
-
-
Mattson, T.1
Van Der Wijngaart, R.2
-
20
-
-
70350754500
-
Programming the Intel 80-core network-on-a-chip terascale processor
-
T. G. Mattson, R. Van der Wijngaart, and M. Frumkin. Programming the Intel 80-core network-on-a-chip terascale processor. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pages 38:1-38:11, 2008.
-
(2008)
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08
, pp. 38:1-38:11
-
-
Mattson, T.G.1
Van Der Wijngaart, R.2
Frumkin, M.3
-
22
-
-
78650735454
-
BatchQueue: Fast and memory-thrifty core to core communication
-
T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot. BatchQueue: Fast and Memory-Thrifty Core to Core Communication. In Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD '10, pages 215-222, 2010.
-
(2010)
Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD '10
, pp. 215-222
-
-
Preud'homme, T.1
Sopena, J.2
Thomas, G.3
Folliot, B.4
-
23
-
-
84863890827
-
On efficient message passing on the Intel SCC
-
R. Rotta. On efficient message passing on the Intel SCC. In Proceedings of the 3rd MARC Symposium, pages 53-58, 2011.
-
(2011)
Proceedings of the 3rd MARC Symposium
, pp. 53-58
-
-
Rotta, R.1
-
25
-
-
33646719765
-
High performance RDMA based all-to-all broadcast for InfiniBand clusters
-
S. Sur, U. K. R. Bondhugula, A. Mamidala, H. W. Jin, and D. K. Panda. High performance RDMA based all-to-all broadcast for InfiniBand clusters. In Proceedings of the 12th international conference on High Performance Computing, HiPC'05, pages 148-157, 2005.
-
(2005)
Proceedings of the 12th International Conference on High Performance Computing, HiPC'05
, pp. 148-157
-
-
Sur, S.1
Bondhugula, U.K.R.2
Mamidala, A.3
Jin, H.W.4
Panda, D.K.5
-
26
-
-
14744288044
-
Optimization of collective communication operations in MPICH
-
DOI 10.1177/1094342005051521
-
R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of Collective Communication Operations in MPICH. IJHPCA, 19(1):49-66, 2005. (Pubitemid 40329106)
-
(2005)
International Journal of High Performance Computing Applications
, vol.19
, Issue.1
, pp. 49-66
-
-
Thakur, R.1
Rabenseifner, R.2
Gropp, W.3
-
27
-
-
70450209566
-
Architectures for extreme-scale computing
-
J. Torrellas. Architectures for Extreme-Scale Computing. Computer, 42(11):28-35, Nov. 2009.
-
(2009)
Computer
, vol.42
, Issue.11
, pp. 28-35
-
-
Torrellas, J.1
-
28
-
-
80053027876
-
RCKMPI - Lightweight MPI implementation for Intel's single-chip cloud computer (SCC)
-
I. A. C. Ureña, M. Riepen, and M. Konow. RCKMPI - lightweight MPI implementation for Intel's single-chip cloud computer (SCC). In Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface, EuroMPI'11, pages 208-217, 2011.
-
(2011)
Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface, EuroMPI'11
, pp. 208-217
-
-
Ureña, I.A.C.1
Riepen, M.2
Konow, M.3
-
29
-
-
84856529095
-
Light-weight communications on Intel's single-chip cloud computer processor
-
R. F. van der Wijngaart, T. G. Mattson, and W. Haas. Light-weight communications on Intel's single-chip cloud computer processor. ACM SIGOPS Operating Systems Review, 45(1):73-83, Feb. 2011.
-
(2011)
ACM SIGOPS Operating Systems Review
, vol.45
, Issue.1
, pp. 73-83
-
-
Van Der Wijngaart, R.F.1
Mattson, T.G.2
Haas, W.3
-
30
-
-
84870534520
-
Efficient memory copy operations on the 48-core Intel SCC processor
-
M. W. van Tol, R. Bakker, M. Verstraaten, C. Grelck, and C. R. Jesshope. Efficient Memory Copy Operations on the 48-core Intel SCC Processor. In Proceedings of the 3rd MARC Symposium, pages 13-18, 2011.
-
(2011)
Proceedings of the 3rd MARC Symposium
, pp. 13-18
-
-
Van Tol, M.W.1
Bakker, R.2
Verstraaten, M.3
Grelck, C.4
Jesshope, C.R.5