-
1
-
-
83155186191
-
-
Personal communication, May
-
William D. Gropp. Personal communication, May 2010.
-
(2010)
-
-
Gropp, W.D.1
-
2
-
-
70450206305
-
Toward exascale resilience
-
November
-
Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, and Marc Snir. Toward exascale resilience. Int. J. High Perform. Comput. Appl., 23:374-388, November 2009.
-
(2009)
Int. J. High Perform. Comput. Appl.
, vol.23
, pp. 374-388
-
-
Cappello, F.1
Geist, A.2
Gropp, B.3
Kale, L.4
Kramer, B.5
Snir, M.6
-
3
-
-
83155195268
-
Hierarchical event log organizer
-
Sep
-
Ana Gainaru, Franck Cappello, Stefan Trausan-Matu, and Bill Kramer. Hierarchical event log organizer. Technical Report of the INRIA-Illinois Joint Laboratory on PetaScale Computing, pages 1-24, Sep 2010.
-
(2010)
Technical Report of the INRIA-Illinois Joint Laboratory on PetaScale Computing
, pp. 1-24
-
-
Gainaru, A.1
Cappello, F.2
Trausan-Matu, S.3
Kramer, B.4
-
4
-
-
85076902294
-
Availability in globally distributed storage systems
-
Daniel Ford, Francois Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. Availability in globally distributed storage systems. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, 2010.
-
(2010)
Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation
-
-
Ford, D.1
Labelle, F.2
Popovici, F.3
Stokely, M.4
Truong, V.-A.5
Barroso, L.6
Grimes, C.7
Quinlan, S.8
-
5
-
-
36049041275
-
Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?
-
Oct
-
Bianca Schroeder and Garth Gibson. Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you? Transactions on Storage (TOS), 3(3), Oct 2007.
-
(2007)
Transactions on Storage (TOS)
, vol.3
, Issue.3
-
-
Schroeder, B.1
Gibson, G.2
-
6
-
-
33845593340
-
A large-scale study of failures in high-performance computing systems
-
DOI 10.1109/DSN.2006.5, 1633514, Proceedings - DSN 2006: 2006 International Conference on Dependable Systems and Networks
-
Bianca Schroeder and Garth A. Gibson. A large-scale study of failures in high-performance computing systems. In Proceedings of the International Conference on Dependable Systems and Networks, pages 249-258, Washington, DC, USA, 2006. IEEE Computer Society. (Pubitemid 44930426)
-
(2006)
Proceedings of the International Conference on Dependable Systems and Networks
, vol.2006
, pp. 249-258
-
-
Schroeder, B.1
Gibson, G.A.2
-
8
-
-
38049182471
-
How are real grids used? The analysis of four grid traces and its implications
-
A. Iosup, C. Dumitrescu, D. H. J. Epema, H. Li, and L. Wolters. How are real grids used? The analysis of four grid traces and its implications. In GRID, pages 262-269, 2006.
-
(2006)
GRID
, pp. 262-269
-
-
Iosup, A.1
Dumitrescu, C.2
Epema, D.H.J.3
Li, H.4
Wolters, L.5
-
9
-
-
38049172300
-
-
Catalog of BOINC projects. http://www.boinc-wiki.info/Catalog-of-BOINC-Powered-Projects.
-
Catalog of BOINC Projects
-
-
-
11
-
-
84900592671
-
-
Einstein@Home. http://einstein.phys.uwm.edu.
-
Einstein@Home
-
-
-
12
-
-
27344436659
-
Scalable molecular dynamics with NAMD
-
DOI 10.1002/jcc.20289
-
James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant V. Kalé, and Klaus Schulten. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26(16):1781-1802, 2005. (Pubitemid 43078511)
-
(2005)
Journal of Computational Chemistry
, vol.26
, Issue.16
, pp. 1781-1802
-
-
Phillips, J.C.1
Braun, R.2
Wang, W.3
Gumbart, J.4
Tajkhorshid, E.5
Villa, E.6
Chipot, C.7
Skeel, R.D.8
Kale, L.9
Schulten, K.10
-
13
-
-
84976846528
-
A first order approximation to the optimum checkpoint interval
-
September
-
John W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17:530-531, September 1974.
-
(1974)
Commun. ACM
, vol.17
, pp. 530-531
-
-
Young, J.W.1
-
15
-
-
67049096648
-
Alert detection in system logs
-
A. Oliner, A. Aiken, and J. Stearley. Alert detection in system logs. Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, pages 959-964, 2008.
-
(2008)
Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on
, pp. 959-964
-
-
Oliner, A.1
Aiken, A.2
Stearley, J.3
-
16
-
-
20444471122
-
Towards informatic analysis of syslogs
-
2004 IEEE International Conference on Cluster Computing, ICCC 2004
-
J. Stearley. Towards informatic analysis of syslogs. In Proceedings of the 2004 IEEE International Conference on Cluster Computing, pages 309-318, Washington, DC, USA, 2004. IEEE Computer Society. (Pubitemid 40822381)
-
(2004)
Proceedings - IEEE International Conference on Cluster Computing, ICCC
, pp. 309-318
-
-
Stearley, J.1
-
17
-
-
77954903245
-
The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems
-
Derrick Kondo, Bahman Javadi, Alexandru Iosup, and Dick Epema. The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems. Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 398-407, 2010.
-
(2010)
Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on
, pp. 398-407
-
-
Kondo, D.1
Javadi, B.2
Iosup, A.3
Epema, D.4
-
19
-
-
76349120592
-
Mining for statistical availability models in large-scale distributed systems: An empirical study of SETI@home
-
September
-
B. Javadi, D. Kondo, J.M. Vincent, and D.P. Anderson. Mining for statistical availability models in large-scale distributed systems: An empirical study of SETI@home. In 17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), September 2009.
-
(2009)
17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)
-
-
Javadi, B.1
Kondo, D.2
Vincent, J.M.3
Anderson, D.P.4
-
20
-
-
33244467640
-
Is remote host availability governed by a universal law?
-
John R. Douceur. Is remote host availability governed by a universal law? SIGMETRICS Performance Evaluation Review, 31(3):25-29, 2003.
-
(2003)
SIGMETRICS Performance Evaluation Review
, vol.31
, Issue.3
, pp. 25-29
-
-
Douceur, J.R.1
-
21
-
-
38449113154
-
Quantifying machine availability in networked and desktop grid systems
-
University of California at Santa Barbara, November
-
J. Brevik, D. Nurmi, and R. Wolski. Quantifying Machine Availability in Networked and Desktop Grid Systems. Technical Report CS2003-37, Dept. of Computer Science and Engineering, University of California at Santa Barbara, November 2003.
-
(2003)
Technical Report CS2003-37, Dept. of Computer Science and Engineering
-
-
Brevik, J.1
Nurmi, D.2
Wolski, R.3
-
22
-
-
21844470195
-
On correlated failures in survivable storage systems
-
Mehmet Bakkaloglu, Jay J. Wylie, Chenxi Wang, and Gregory R. Ganger. On correlated failures in survivable storage systems. Technical Report CMU-CS-02-129, Carnegie Mellon University, 2002.
-
(2002)
Technical Report CMU-CS-02-129, Carnegie Mellon University
-
-
Bakkaloglu, M.1
Wylie, J.J.2
Wang, C.3
Ganger, G.R.4
-
23
-
-
38049145912
-
Characterizing result errors in internet desktop grids
-
D. Kondo, F. Araujo, P. Malecot, P. Domingues, L.M. Silva, G. Fedak, and F. Cappello. Characterizing result errors in internet desktop grids. Lecture Notes in Computer Science, 4641:361, 2007.
-
(2007)
Lecture Notes in Computer Science
, vol.4641
, pp. 361
-
-
Kondo, D.1
Araujo, F.2
Malecot, P.3
Domingues, P.4
Silva, L.M.5
Fedak, G.6
Cappello, F.7
-
26
-
-
70449844295
-
DMTCP: Transparent checkpointing for cluster computations and the desktop
-
0
-
Jason Ansel, Kapil Arya, and Gene Cooperman. DMTCP: Transparent checkpointing for cluster computations and the desktop. Parallel and Distributed Processing Symposium, International, 0:1-12, 2009.
-
(2009)
Parallel and Distributed Processing Symposium, International
, pp. 1-12
-
-
Ansel, J.1
Arya, K.2
Cooperman, G.3
-
27
-
-
34548282622
-
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI
-
Camille Coti, Thomas Herault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, and Franck Cappello. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI. In SC 2006 Conference, Proceedings of the ACM/IEEE, page 18, 2006.
-
(2006)
SC 2006 Conference, Proceedings of the ACM/IEEE
, pp. 18
-
-
Coti, C.1
Herault, T.2
Lemarinier, P.3
Pilard, L.4
Rezmerita, A.5
Rodriguez, E.6
Cappello, F.7