[1] T. Heath, R. P. Martin, and T. D. Nguyen, "Improving cluster availability using workstation validation," SIGMETRICS Perform. Eval. Rev., vol. 30, no. 1, pp. 217-227, Jun. 2002.
[2] B. Schroeder and G. A. Gibson, "A large-scale study of failures in high-performance computing systems," in Proceedings of the International Conference on Dependable Systems and Networks, June 2006, pp. 249-258.
[3] J. Xu, Z. Kalbarczyk, and R. K. Iyer, "Networked Windows NT system field failure data analysis," in Proceedings of the 1999 Pacific Rim International Symposium on Dependable Computing, 1999, pp. 178-185.
[4] R. K. Sahoo, M. S. Squillante, A. Sivasubramaniam, and Y. Zhang, "Failure data analysis of a large-scale heterogeneous server environment," in International Conference on Dependable Systems and Networks, July 2004, pp. 772-781.
[5] N. R. Gottumukkala, C. Leangsuksun, Y. Liu, R. Nassar, and S. L. Scott, "Reliability analysis in HPC clusters," in Proceedings of High Availability and Performance Workshop (HAPCW), in Conjunction With Los Alamos Computer Science Institute (LACSI) Symposium 2006, Santa Fe, NM, USA, October 17, 2006.
[6] R. K. Iyer, D. J. Rossetti, and M. C. Hsueh, "Measurement and modeling of computer reliability as affected by system activity," ACM Trans. Computer Systems, vol. 4, no. 3, pp. 214-237, 1986.
[7] T. Lin and D. P. Siewiorek, "Error log analysis: Statistical modeling and heuristic trend analysis," IEEE Trans. Reliability, vol. 39, no. 4, pp. 419-432, Oct. 1990.
[8] D. K. Dey and L. R. Jaisingh, "Estimation of system reliability for independent series components with Weibull life distributions," IEEE Trans. Reliability, vol. 37, no. 4, pp. 401-405, Oct. 1988.
[9] K. G. Popstojanova and K. S. Trivedi, "Failure correlation in software reliability models," IEEE Trans. Reliability, vol. 49, no. 1, pp. 37-48, Mar. 2000.
[10] T. F. Hassett, D. L. Dietrich, and F. Szidarovszky, "Time-varying failure rates in the availability & reliability analysis of repairable systems," IEEE Trans. Reliability, vol. 44, no. 1, pp. 155-160, Mar. 1995.
[13] N. R. Gottumukkala, C. Leangsuksun, R. Nassar, and S. L. Scott, "Reliability-aware resource allocation in HPC systems," in IEEE International Conference on Cluster Computing, Austin, TX, 2007, pp. 312-321.
[14] Y. Ling, J. Mi, and X. Lin, "A variational calculus approach to optimal checkpoint placement," IEEE Trans. Computers, vol. 50, no. 7, pp. 699-708, July 2001.