-
1
-
-
4444241348
-
The Spread toolkit: Architecture and performance
-
Technical Report CNDS-2004-1, Johns Hopkins University, Center for Networking and Distributed Systems, Baltimore, MD, USA
-
Y. Amir, C. Danilov, M. Miskin-Amir, J. Schultz, and J. Stanton. The Spread toolkit: Architecture and performance. Technical Report CNDS-2004-1, Johns Hopkins University, Center for Networking and Distributed Systems, Baltimore, MD, USA, 2004.
-
(2004)
-
-
Amir, Y.1
Danilov, C.2
Miskin-Amir, M.3
Schultz, J.4
Stanton, J.5
-
2
-
-
0001776520
-
Group communication specifications: A comprehensive study
-
ACM Press, New York, NY, USA
-
G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: A comprehensive study. ACM Computing Surveys (CSUR), 33(4):427-469, 2001. ACM Press, New York, NY, USA.
-
(2001)
ACM Computing Surveys (CSUR)
, vol.33
, Issue.4
, pp. 427-469
-
-
Chockler, G.V.1
Keidar, I.2
Vitenberg, R.3
-
3
-
-
49049116823
-
-
-
Cluster File Systems, Inc., Boulder, CO, USA. Lustre Cluster File System, 2007. http://www.lustre.org.
-
-
-
-
4
-
-
49049120622
-
-
-
Cluster File Systems, Inc., Boulder, CO, USA. Lustre Cluster File System Architecture Whitepaper, 2007. http://www.lustre.org/docs/whitepaper.pdf.
-
-
-
-
6
-
-
13644278157
-
Total order broadcast and multicast algorithms: Taxonomy and survey
-
ACM Press, New York, NY, USA
-
X. Défago, A. Schiper, and P. Urbán. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys (CSUR), 36(4):372-421, 2004. ACM Press, New York, NY, USA.
-
(2004)
ACM Computing Surveys (CSUR)
, vol.36
, Issue.4
, pp. 372-421
-
-
Défago, X.1
Schiper, A.2
Urbán, P.3
-
7
-
-
0030129232
-
The Transis approach to high availability cluster communication
-
ACM Press, New York, NY, USA
-
D. Dolev and D. Malki. The Transis approach to high availability cluster communication. Communications of the ACM, 39(4):64-70, 1996. ACM Press, New York, NY, USA.
-
(1996)
Communications of the ACM
, vol.39
, Issue.4
, pp. 64-70
-
-
Dolev, D.1
Malki, D.2
-
8
-
-
38049153567
-
-
th International Conference on Computational Science (ICCS) 2007, Part II, 4488, pages 784-791, Beijing, China, May 27-30, 2007. Springer Verlag, Berlin, Germany.
-
th International Conference on Computational Science (ICCS) 2007, Part II, volume 4488, pages 784-791, Beijing, China, May 27-30, 2007. Springer Verlag, Berlin, Germany.
-
-
-
-
9
-
-
34548163601
-
Concepts for high availability in scientific high-end computing
-
Santa Fe, NM, USA, Oct. 11
-
C. Engelmann and S. L. Scott. Concepts for high availability in scientific high-end computing. In Proceedings of the High Availability and Performance Workshop (HAPCW) 2005, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2005, Santa Fe, NM, USA, Oct. 11, 2005.
-
(2005)
Proceedings of the High Availability and Performance Workshop (HAPCW) 2005, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2005
-
-
Engelmann, C.1
Scott, S.L.2
-
10
-
-
49049117939
-
High availability for ultra-scale high-end scientific computing
-
Cambridge, MA, USA, June 19
-
th ACM International Conference on Supercomputing (ICS) 2005, Cambridge, MA, USA, June 19, 2005.
-
(2005)
th ACM International Conference on Supercomputing (ICS) 2005
-
-
Engelmann, C.1
Scott, S.L.2
-
11
-
-
33646428515
-
High availability through distributed control
-
Santa Fe, NM, USA, Oct. 12
-
C. Engelmann, S. L. Scott, and G. A. Geist. High availability through distributed control. In Proceedings of the High Availability and Performance Workshop (HAPCW) 2004, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2004, Santa Fe, NM, USA, Oct. 12, 2004.
-
(2004)
Proceedings of the High Availability and Performance Workshop (HAPCW) 2004, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2004
-
-
Engelmann, C.1
Scott, S.L.2
Geist, G.A.3
-
12
-
-
33750954729
-
Active/active replication for highly available HPC system services
-
Vienna, Austria, Apr. 20-22, IEEE Computer Society
-
st International Conference on Availability, Reliability and Security (ARES) 2006, pages 639-645, Vienna, Austria, Apr. 20-22, 2006. IEEE Computer Society.
-
(2006)
st International Conference on Availability, Reliability and Security (ARES)
, pp. 639-645
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
13
-
-
34548190800
-
Symmetric active/active high availability for high-performance computing system services
-
Academy Publisher, Oulu, Finland
-
C. Engelmann, S. L. Scott, C. Leangsuksun, and X. He. Symmetric active/active high availability for high-performance computing system services. Journal of Computers (JCP), 1(8):43-54, 2006. Academy Publisher, Oulu, Finland.
-
(2006)
Journal of Computers (JCP)
, vol.1
, Issue.8
, pp. 43-54
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
14
-
-
34548183060
-
Towards high availability for high-performance computing system services: Accomplishments and limitations
-
Santa Fe, NM, USA, Oct. 17
-
C. Engelmann, S. L. Scott, C. Leangsuksun, and X. He. Towards high availability for high-performance computing system services: Accomplishments and limitations. In Proceedings of the High Availability and Performance Workshop (HAPCW) 2006, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2006, Santa Fe, NM, USA, Oct. 17, 2006.
-
(2006)
Proceedings of the High Availability and Performance Workshop (HAPCW) 2006, in conjunction with the Los Alamos Computer Science Institute (LACSI) Symposium 2006
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
15
-
-
34548190322
-
On programming models for service-level high availability
-
Vienna, Austria, Apr. 10-13, IEEE Computer Society
-
nd International Conference on Availability, Reliability and Security (ARES) 2007, pages 999-1006, Vienna, Austria, Apr. 10-13, 2007. IEEE Computer Society.
-
(2007)
nd International Conference on Availability, Reliability and Security (ARES)
, pp. 999-1006
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
16
-
-
34548305034
-
Transparent symmetric active/active replication for service-level high availability
-
Rio de Janeiro, Brazil, May 14-17, IEEE Computer Society
-
th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2007, pages 755-760, Rio de Janeiro, Brazil, May 14-17, 2007. IEEE Computer Society.
-
(2007)
th IEEE International Symposium on Cluster Computing and the Grid (CCGrid)
, pp. 755-760
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
17
-
-
4644300495
-
-
Prentice Hall PTR, Upper Saddle River, NJ, USA, Aug
-
T. Erl. Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall PTR, Upper Saddle River, NJ, USA, Aug. 2005.
-
(2005)
Service-Oriented Architecture: Concepts, Technology, and Design
-
-
Erl, T.1
-
20
-
-
49049113576
-
-
-
IBM Corporation, Armonk, NY, USA. MareNostrum eServer Computing Platform Documentation, 2007. http://www.ibm.com/servers/eserver/linux/power/marenostrum.
-
-
-
-
21
-
-
33746921679
-
Software architecture of the light weight kernel, Catamount
-
Albuquerque, NM, USA, May 16-19
-
th Cray User Group (CUG) Conference 2005, Albuquerque, NM, USA, May 16-19, 2005.
-
(2005)
th Cray User Group (CUG) Conference 2005
-
-
Kelly, S.M.1
Brightwell, R.2
-
22
-
-
84976782029
-
Using time instead of timeout for fault-tolerant distributed systems
-
ACM Press, New York, NY, USA
-
L. Lamport. Using time instead of timeout for fault-tolerant distributed systems. ACM Transactions on Programming Languages and Systems (TOPLAS), 6(2):254-280, 1984. ACM Press, New York, NY, USA.
-
(1984)
ACM Transactions on Programming Languages and Systems (TOPLAS)
, vol.6
, Issue.2
, pp. 254-280
-
-
Lamport, L.1
-
23
-
-
0030652761
-
Building reliable distributed systems with CORBA
-
Wiley InterScience, John Wiley & Sons, Inc, Hoboken, NJ, USA
-
S. Landis and S. Maffeis. Building reliable distributed systems with CORBA. Theory and Practice of Object Systems, 3(1):31-43, 1997. Wiley InterScience, John Wiley & Sons, Inc., Hoboken, NJ, USA.
-
(1997)
Theory and Practice of Object Systems
, vol.3
, Issue.1
, pp. 31-43
-
-
Landis, S.1
Maffeis, S.2
-
24
-
-
0004918487
-
The object group design pattern
-
Toronto, ON, Canada, June 17-21, USENIX Association, Berkeley, CA, USA
-
nd USENIX Conference on Object-Oriented Technologies (COOTS) 1996, page 12, Toronto, ON, Canada, June 17-21, 1996. USENIX Association, Berkeley, CA, USA.
-
(1996)
nd USENIX Conference on Object-Oriented Technologies (COOTS) 1996
, pp. 12
-
-
Maffeis, S.1
-
25
-
-
33749404596
-
Thema: Byzantine-fault-tolerant middleware for web-service applications
-
Orlando, FL, USA, Oct. 26-28, IEEE Computer Society
-
th IEEE Symposium on Reliable Distributed Systems (SRDS) 2005, pages 131-142, Orlando, FL, USA, Oct. 26-28, 2005. IEEE Computer Society.
-
(2005)
th IEEE Symposium on Reliable Distributed Systems (SRDS)
, pp. 131-142
-
-
Merideth, M.G.1
Iyengar, A.2
Mikalsen, T.3
Tai, S.4
Rouvellou, I.5
Narasimhan, P.6
-
26
-
-
34548236975
-
-
-
J. Moreira, M. Brutman, J. Castaños, T. Engelsiepen, M. Giampapa, T. Gooding, R. Haskin, T. Inglett, D. Lieber, P. McCarthy, M. Mundy, J. Parker, and B. Wallenfelt. Designing a highly-scalable operating system: The Blue Gene/L story. In Proceedings of International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2006, page 118, Tampa, FL, USA, Nov. 11-17, 2006. ACM Press, New York, NY, USA.
-
-
-
-
27
-
-
0028576754
-
Extended virtual synchrony
-
Poznan, Poland, June 21-24, IEEE Computer Society
-
th IEEE International Conference on Distributed Computing Systems (ICDCS) 1994, pages 56-65, Poznan, Poland, June 21-24, 1994. IEEE Computer Society.
-
(1994)
th IEEE International Conference on Distributed Computing Systems (ICDCS)
, pp. 56-65
-
-
Moser, L.E.1
Amir, Y.2
Melliar-Smith, P.M.3
Agarwal, D.A.4
-
28
-
-
49049093933
-
Exploring process groups for reliability, availability and serviceability of terascale computing systems
-
Athens, Greece, June 19-21
-
nd International Conference on Computer Science and Information Systems 2006, Athens, Greece, June 19-21, 2006.
-
(2006)
nd International Conference on Computer Science and Information Systems 2006
-
-
Okunbor, D.I.1
Engelmann, C.2
Scott, S.L.3
-
29
-
-
50649096937
-
Symmetric active/active metadata service for highly available cluster storage systems
-
Cambridge, MA, USA, Nov. 19-21, ACTA Press, Calgary, AB, Canada
-
th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS) 2007, Cambridge, MA, USA, Nov. 19-21, 2007. ACTA Press, Calgary, AB, Canada.
-
(2007)
th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS) 2007
-
-
Ou, L.1
Engelmann, C.2
He, X.3
Chen, X.4
Scott, S.L.5
-
30
-
-
40949146242
-
A fast delivery protocol for total order broadcasting
-
Honolulu, HI, USA, Aug. 13-16, IEEE Computer Society
-
th IEEE International Conference on Computer Communications and Networks (ICCCN) 2007, Honolulu, HI, USA, Aug. 13-16, 2007. IEEE Computer Society.
-
(2007)
th IEEE International Conference on Computer Communications and Networks (ICCCN) 2007
-
-
Ou, L.1
He, X.2
Engelmann, C.3
Scott, S.L.4
-
31
-
-
0036036816
-
-
ACM Press, New York, NY, USA
-
R. Rodrigues, M. Castro, and B. Liskov. BASE: Using abstraction to improve fault tolerance. volume 35, pages 15-28, 2001. ACM Press, New York, NY, USA.
-
(2001)
BASE: Using abstraction to improve fault tolerance
, vol.35
, pp. 15-28
-
-
Rodrigues, R.1
Castro, M.2
Liskov, B.3
-
32
-
-
0025564050
-
Implementing fault-tolerant services using the state machine approach: A tutorial
-
ACM Press, New York, NY, USA
-
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR), 22(4):299-319, 1990. ACM Press, New York, NY, USA.
-
(1990)
ACM Computing Surveys (CSUR)
, vol.22
, Issue.4
, pp. 299-319
-
-
Schneider, F.B.1
-
33
-
-
46049083585
-
JOSHUA: Symmetric active/active replication for highly available HPC job and resource management
-
Barcelona, Spain, Sept. 25-28, IEEE Computer Society
-
th IEEE International Conference on Cluster Computing (Cluster) 2006, Barcelona, Spain, Sept. 25-28, 2006. IEEE Computer Society.
-
(2006)
th IEEE International Conference on Cluster Computing (Cluster) 2006
-
-
Uhlemann, K.1
Engelmann, C.2
Scott, S.L.3