-
1
-
-
50649103838
-
-
Altair Engineering, Troy, MI, USA. OpenPBS, 2007. http://www.openpbs.org.
-
-
-
-
-
2
-
-
0029408206
-
The Totem single-ring ordering and membership protocol
-
Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, and P.W. Ciarfella. The Totem single-ring ordering and membership protocol. ACM Transactions on Computer Systems, 13(4):311-342, 1995.
-
(1995)
ACM Transactions on Computer Systems
, vol.13
, Issue.4
, pp. 311-342
-
-
Amir, Y.1
Moser, L.E.2
Melliar-Smith, P.M.3
Agarwal, D.A.4
Ciarfella, P.W.5
-
4
-
-
50649083212
-
-
Cluster File Systems, Inc., Boulder, CO, USA. Lustre Cluster File System Architecture, 2007. http://www.lustre.org/docs/whitepaper.pdf.
-
-
-
-
-
5
-
-
50649117047
-
-
Cluster Resources, Inc, Salt Lake City, UT, USA. Moab Workload Manager Administrator's Guide, 2007. http://www.clusterresources.com/products/mwm/docs.
-
-
-
-
-
6
-
-
50649084698
-
-
Cluster Resources, Inc, Salt Lake City, UT, USA. TORQUE Resource Manager, 2007. http://www.clusterresources.com/torque.
-
-
-
-
-
7
-
-
13644278157
-
Total order broadcast and multicast algorithms: Taxonomy and survey
-
X. Défago, A. Schiper, and P. Urbán. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys, 36(4):372-421, 2004.
-
(2004)
ACM Computing Surveys
, vol.36
, Issue.4
, pp. 372-421
-
-
Défago, X.1
Schiper, A.2
Urbán, P.3
-
8
-
-
0030129232
-
The Transis approach to high availability cluster communication
-
D. Dolev and D. Malki. The Transis approach to high availability cluster communication. Communications of the ACM, 39(4):64-70, 1996.
-
(1996)
Communications of the ACM
, vol.39
, Issue.4
, pp. 64-70
-
-
Dolev, D.1
Malki, D.2
-
9
-
-
38049153567
-
-
-
th International Conference on Computational Science, Part II, volume 4488, pages 784-791, Beijing, China, May 27-30, 2007.
-
-
-
-
10
-
-
34548163601
-
Concepts for high availability in scientific high-end computing
-
Santa Fe, NM, USA, Oct. 11
-
C. Engelmann and S. L. Scott. Concepts for high availability in scientific high-end computing. In Proceedings of the High Availability and Performance Workshop, Santa Fe, NM, USA, Oct. 11, 2005.
-
(2005)
Proceedings of the High Availability and Performance Workshop
-
-
Engelmann, C.1
Scott, S.L.2
-
11
-
-
34548182817
-
High availability for ultra-scale high-end scientific computing
-
Cambridge, MA, USA, June 19
-
nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters, Cambridge, MA, USA, June 19, 2005.
-
(2005)
nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters
-
-
Engelmann, C.1
Scott, S.L.2
-
12
-
-
33750954729
-
Active/active replication for highly available HPC system services
-
Vienna, Austria, Apr. 20-22
-
st International Conference on Availability, Reliability and Security, pages 639-645, Vienna, Austria, Apr. 20-22, 2006.
-
(2006)
st International Conference on Availability, Reliability and Security
, pp. 639-645
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
13
-
-
34548190800
-
Symmetric active/active high availability for high-performance computing system services
-
C. Engelmann, S. L. Scott, C. Leangsuksun, and X. He. Symmetric active/active high availability for high-performance computing system services. Journal of Computers, 1(8):43-54, 2006.
-
(2006)
Journal of Computers
, vol.1
, Issue.8
, pp. 43-54
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
14
-
-
34548190322
-
On programming models for service-level high availability
-
Vienna, Austria, Apr. 10-13
-
nd International Conference on Availability, Reliability and Security, pages 999-1006, Vienna, Austria, Apr. 10-13, 2007.
-
(2007)
nd International Conference on Availability, Reliability and Security
, pp. 999-1006
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
15
-
-
34548305034
-
Transparent symmetric active/active replication for service-level high availability
-
Rio de Janeiro, Brazil, May 14-17
-
th IEEE International Symposium on Cluster Computing and the Grid, pages 755-760, Rio de Janeiro, Brazil, May 14-17, 2007.
-
(2007)
th IEEE International Symposium on Cluster Computing and the Grid
, pp. 755-760
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
16
-
-
49049107395
-
Symmetric active/active replication for dependent services
-
Barcelona, Spain, Mar. 4-7, To appear
-
rd International Conference on Availability, Reliability and Security, Barcelona, Spain, Mar. 4-7, 2008. To appear.
-
(2008)
rd International Conference on Availability, Reliability and Security
-
-
Engelmann, C.1
Scott, S.L.2
Leangsuksun, C.3
He, X.4
-
17
-
-
33646389112
-
Asymmetric active-active high availability for high-end computing
-
Cambridge, MA, USA, June 19
-
nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters, Cambridge, MA, USA, June 19, 2005.
-
(2005)
nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters
-
-
Leangsuksun, C.1
Munganuru, V.K.2
Liu, T.3
Scott, S.L.4
Engelmann, C.5
-
18
-
-
50249144002
-
Job-site level fault tolerance for cluster and grid environments
-
Boston, MA, USA, Sept. 26-30
-
th IEEE International Conference on Cluster Computing, pages 1-9, Boston, MA, USA, Sept. 26-30, 2005.
-
(2005)
th IEEE International Conference on Cluster Computing
, pp. 1-9
-
-
Limaye, K.1
Leangsuksun, C.2
Greenwood, Z.3
Scott, S.L.4
Engelmann, C.5
Libby, R.M.6
Chanchio, K.7
-
19
-
-
0028576754
-
Extended virtual synchrony
-
Poznan, Poland, June 21-24
-
th IEEE International Conference on Distributed Computing Systems, pages 56-65, Poznan, Poland, June 21-24, 1994.
-
(1994)
th IEEE International Conference on Distributed Computing Systems
, pp. 56-65
-
-
Moser, L.E.1
Amir, Y.2
Melliar-Smith, P.M.3
Agarwal, D.A.4
-
20
-
-
50649096937
-
Symmetric active/active metadata service for highly available cluster storage systems
-
Cambridge, MA, USA, Nov. 19-21
-
th IASTED International Conference on Parallel and Distributed Computing and Systems, Cambridge, MA, USA, Nov. 19-21, 2007.
-
(2007)
th IASTED International Conference on Parallel and Distributed Computing and Systems
-
-
Ou, L.1
Engelmann, C.2
He, X.3
Chen, X.4
Scott, S.L.5
-
21
-
-
40949146242
-
A fast delivery protocol for total order broadcasting
-
Honolulu, HI, USA, Aug. 13-16
-
th IEEE International Conference on Computer Communications and Networks, Honolulu, HI, USA, Aug. 13-16, 2007.
-
(2007)
th IEEE International Conference on Computer Communications and Networks
-
-
Ou, L.1
He, X.2
Engelmann, C.3
Scott, S.L.4
-
25
-
-
0036036816
-
BASE: Using abstraction to improve fault tolerance
-
R. Rodrigues, M. Castro, and B. Liskov. BASE: Using abstraction to improve fault tolerance. ACM SIGOPS Operating Systems Review, 35(5):15-28, 2001.
-
(2001)
ACM SIGOPS Operating Systems Review
, vol.35
, Issue.5
, pp. 15-28
-
-
Rodrigues, R.1
Castro, M.2
Liskov, B.3
-
26
-
-
0025564050
-
Implementing fault-tolerant services using the state machine approach: A tutorial
-
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299-319, 1990.
-
(1990)
ACM Computing Surveys
, vol.22
, Issue.4
, pp. 299-319
-
-
Schneider, F.B.1
-
27
-
-
50649103654
-
-
Sun Microsystems, Inc, Santa Clara, CA, USA. Sun Grid Engine Documentation, 2007. http://gridengine.sunsource.net.
-
-
-
-
-
29
-
-
33746591369
-
High availability fundamentals
-
Nov
-
E. Vargas. High availability fundamentals. Sun Blueprints, Nov. 2000.
-
(2000)
Sun Blueprints
-
-
Vargas, E.1
-
30
-
-
0242571753
-
-
-
th International Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862, pages 44-60, Seattle, WA, USA, June 24, 2003.
-
-
-