-
1
-
-
28044457320
-
Monitoring Hard Disk with SMART
-
January
-
B. Allen, "Monitoring Hard Disk with SMART", Linux Journal, January, 2004.
-
(2004)
Linux Journal
-
-
Allen, B.1
-
2
-
-
4544337911
-
Automatic Methods for Predicting Machine Availability in Desktop Grid and Peer-to-Peer Systems
-
IEEE Computer Society, Chicago, IL
-
J. Brevik, D. Nurmi, and R. Wolski, "Automatic Methods for Predicting Machine Availability in Desktop Grid and Peer-to-Peer Systems", Proc. of IEEE CCGrid, IEEE Computer Society, Chicago, IL, 2004, pp. 190-199.
-
(2004)
Proc. of IEEE CCGrid
, pp. 190-199
-
-
Brevik, J.1
Nurmi, D.2
Wolski, R.3
-
3
-
-
23944436115
-
New Grid Scheduling and Rescheduling Methods in the GrADS Project
-
F. Berman, H. Casanova, et al., "New Grid Scheduling and Rescheduling Methods in the GrADS Project", Intl. Journal of Parallel Programming, 2005, pp. 209-229
-
(2005)
Intl. Journal of Parallel Programming
, pp. 209-229
-
-
Berman, F.1
Casanova, H.2
-
4
-
-
1542383568
-
Reliable matching and scheduling of precedence-constrained tasks in heterogeneous distributed computing
-
IEEE Computer Society, Toronto, Canada
-
A. Dogan and F. Ozguner, "Reliable matching and scheduling of precedence-constrained tasks in heterogeneous distributed computing," in Proc. of the ICPP, IEEE Computer Society, Toronto, Canada, 2000, pp. 307
-
(2000)
Proc. of the ICPP
, pp. 307
-
-
Dogan, A.1
Ozguner, F.2
-
5
-
-
33751107476
-
MPI-Mitten: Enabling Migration Technology in MPI
-
IEEE Computer Society, Singapore
-
Cong Du and Xian-He Sun, "MPI-Mitten: Enabling Migration Technology in MPI", in Proc. of CCGRID, IEEE Computer Society, Singapore, 2006, pp. 11-18
-
(2006)
Proc. of CCGRID
, pp. 11-18
-
-
Du, C.1
Sun, X.-H.2
-
6
-
-
9144223280
-
Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery
-
Elmootazbellah N. Elnozahy and James S. Plank, "Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery", IEEE Transactions on Dependable and Secure Computing, Volume 1, No 2, 2004, pp. 97-108.
-
(2004)
IEEE Transactions on Dependable and Secure Computing
, vol.1
, Issue.2
, pp. 97-108
-
-
Elnozahy, E.N.1
Plank, J.S.2
-
8
-
-
47249123819
-
Exploring Meta-learning to Improve Failure Prediction in Supercomputing Clusters
-
P. Gujrati, Y. Li, Z. Lan, R. Thakur, and J. White, "Exploring Meta-learning to Improve Failure Prediction in Supercomputing Clusters", in Proc. of ICPP07, 2007
-
(2007)
Proc. of ICPP07
-
-
Gujrati, P.1
Li, Y.2
Lan, Z.3
Thakur, R.4
White, J.5
-
9
-
-
0037342701
-
A fault-tolerant scheduling algorithm for real-time periodic tasks with possible software faults
-
C.-C. Han, K.G. Shin, and J. Wu, "A fault-tolerant scheduling algorithm for real-time periodic tasks with possible software faults," IEEE Trans. Computers, Vol. 52, No. 3, 2003, pp. 362-372
-
(2003)
IEEE Trans. Computers
, vol.52
, Issue.3
, pp. 362-372
-
-
Han, C.-C.1
Shin, K.G.2
Wu, J.3
-
10
-
-
47249157799
-
Advanced Failure Prediction in Complex Software Systems
-
G. Hoffmann, F. Salfner, M. Malek, "Advanced Failure Prediction in Complex Software Systems", in Proc. of SRDS, 2004
-
(2004)
Proc. of SRDS
-
-
Hoffmann, G.1
Salfner, F.2
Malek, M.3
-
11
-
-
1242329663
-
Application Of A Model-Based Fault Detection System To Nuclear Plant Signals
-
Seoul, Korea
-
K. C. Gross, R. M. Singer, S. W. Wegerich, J. P. Herzog, R. VanAlstine, and F. Bockhorst, "Application Of A Model-Based Fault Detection System To Nuclear Plant Signals", in Proc. of ISAP, Seoul, Korea, 1997, pp. 66-70
-
(1997)
Proc. of ISAP
, pp. 66-70
-
-
Gross, K.C.1
Singer, R.M.2
Wegerich, S.W.3
Herzog, J.P.4
VanAlstine, R.5
Bockhorst, F.6
-
14
-
-
47249121426
-
-
IBM LoadLeveler for AIX 5L, available at http://publib.boulder.ibm.com
-
IBM LoadLeveler for AIX 5L, available at http://publib.boulder.ibm.com
-
-
-
-
15
-
-
33751082401
-
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
-
Singapore
-
Yawei Li and Zhiling Lan, "Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing", in Proc. of IEEE CCGrid'06, Singapore, 2006, pp. 531-538
-
(2006)
Proc. of IEEE CCGrid'06
, pp. 531-538
-
-
Li, Y.1
Lan, Z.2
-
16
-
-
47249160413
-
-
Hardware monitoring by lm_sensors, available at http://secure.netroedge.com/~lm78/info.html.
-
Hardware monitoring by lm_sensors, available at http://secure.netroedge.com/~lm78/info.html.
-
-
-
-
18
-
-
0003912256
-
Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System
-
University of Wisconsin-Madison Computer Science Technical Report #1346
-
M. Litzkow, T. Tannenbaum, et al., "Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System", University of Wisconsin-Madison Computer Science Technical Report #1346, 1997.
-
(1997)
-
-
Litzkow, M.1
Tannenbaum, T.2
-
19
-
-
36949009638
-
Scalable Diskless Checkpointing for Large Parallel Systems
-
Ph.D. thesis, University of Illinois at Urbana-Champaign
-
Charng-Da Lu, "Scalable Diskless Checkpointing for Large Parallel Systems", Ph.D. thesis, University of Illinois at Urbana-Champaign, 2005
-
(2005)
-
-
Lu, C.-D.1
-
20
-
-
84872514589
-
-
available at
-
Moab Workload Manager, available at http://www.clusterresources.com
-
Moab Workload Manager
-
-
-
21
-
-
0035363047
-
-
A. Mu'alem and D. Feitelson, Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling, in IEEE Trans. Parallel and Distributed Systems, 12(6), 2001, pp. 529-543
-
A. Mu'alem and D. Feitelson, "Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling", in IEEE Trans. Parallel and Distributed Systems, Vol. 12(6), 2001, pp. 529-543
-
-
-
-
22
-
-
12444257746
-
-
A. Oliner, Ramendra K. Sahoo, José E. Moreira, Manish Gupta, Anand Sivasubramaniam, Fault-Aware Job Scheduling for BlueGene/L Systems, in Proc. of IPDPS, 2004.
-
A. Oliner, Ramendra K. Sahoo, José E. Moreira, Manish Gupta, Anand Sivasubramaniam, "Fault-Aware Job Scheduling for BlueGene/L Systems", in Proc. of IPDPS, 2004.
-
-
-
-
23
-
-
0343644421
-
-
available at
-
Parallel Workloads Archive, available at http://www.cs.huji.ac.il/labs/parallel/workload/
-
Parallel Workloads Archive
-
-
-
24
-
-
47249098059
-
System-level fault-tolerance in large-scale parallel machines with buffered coscheduling
-
F. Petrini, K. Davis, and J.C. Sancho, "System-level fault-tolerance in large-scale parallel machines with buffered coscheduling", in Proc. of IPDPS, 2004, pp. 209
-
(2004)
Proc. of IPDPS
, pp. 209
-
-
Petrini, F.1
Davis, K.2
Sancho, J.C.3
-
25
-
-
0032683084
-
Safety and Reliability Driven Task Allocation in Distributed Systems
-
S. Srinivasan and N.K. Jha, "Safety and Reliability Driven Task Allocation in Distributed Systems," in IEEE Trans. Parallel and Distributed Systems, Vol. 10(3), 1999, pp. 238-251
-
(1999)
IEEE Trans. Parallel and Distributed Systems
, vol.10
, Issue.3
, pp. 238-251
-
-
Srinivasan, S.1
Jha, N.K.2
-
26
-
-
20444463471
-
A Dynamic and Reliability-driven Scheduling Algorithm for Parallel Real-time Jobs on Heterogeneous Clusters
-
X. Qin and H. Jiang, "A Dynamic and Reliability-driven Scheduling Algorithm for Parallel Real-time Jobs on Heterogeneous Clusters," in Journal of Parallel and Distributed Computing, vol. 65, no. 8, 2005, pp. 885-900.
-
(2005)
Journal of Parallel and Distributed Computing
, vol.65
, Issue.8
, pp. 885-900
-
-
Qin, X.1
Jiang, H.2
-
27
-
-
77952378080
-
Critical Event Prediction for Proactive Management in Large-scale Computer Clusters
-
Washington DC, USA
-
Ramendra K. Sahoo, A. Oliner, et al., "Critical Event Prediction for Proactive Management in Large-scale Computer Clusters", in Proc. of KDD, Washington DC, USA, 2003, pp. 426-435
-
(2003)
Proc. of KDD
, pp. 426-435
-
-
Sahoo, R.K.1
Oliner, A.2
-
28
-
-
0026923304
-
Task Allocation for Maximizing Reliability of Distributed Computer Systems
-
S. Shatz, J. Wang, and M. Goto, "Task Allocation for Maximizing Reliability of Distributed Computer Systems", in IEEE Trans. on Computers, Vol. 41(9), 1992, pp. 1156-1168
-
(1992)
IEEE Trans. on Computers
, vol.41
, Issue.9
, pp. 1156-1168
-
-
Shatz, S.1
Wang, J.2
Goto, M.3
-
29
-
-
78149354391
-
Predicting Rare Events in Temporal Domains
-
R. Vilalta and S. Ma, "Predicting Rare Events in Temporal Domains", in Proc. of IEEE ICDM, 2002, pp. 474-481
-
(2002)
Proc. of IEEE ICDM
, pp. 474-481
-
-
Vilalta, R.1
Ma, S.2