[1] Top 500 most powerful supercomputers. http://www.top500.org/, 2012. [Online; accessed 1-October-2012].
[2] Guillaume Aupy, Yves Robert, Frédéric Vivien, and Dounia Zaidouni. Impact of fault prediction on checkpointing strategies. Rapport de recherche RR-8023, INRIA, July 2012.
[3] Leonardo Bautista-Gomez, Seiji Tsuboi, Dimitri Komatitsch, Franck Cappello, Naoya Maruyama, and Satoshi Matsuoka. FTI: High performance fault tolerance interface for hybrid systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 32:1-32:32. ACM, 2011.
[5] J. T. Daly. A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems, pages 303-312, 2006.
[7] Agertz et al. Fundamental differences between SPH and grid methods. Monthly Notices of the Royal Astronomical Society, pages 963-978, 2007.
[9] J. Gu et al. Dynamic meta-learning for failure prediction in large-scale systems: A case study. In International Conference on Parallel Processing, pages 157-164. IEEE Press, 2008.
[11] R. Ren et al. LogMaster: Mining event correlations in logs of large-scale cluster systems. CoRR abs/1003.0951, 2010.
[13] Ana Gainaru, Franck Cappello, Joshi Fullop, Stefan Trausan-Matu, and William Kramer. Adaptive event prediction strategy with dynamic time window for large-scale HPC systems. In Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques, pages 4:1-4:8. ACM, 2011.
[14] Ana Gainaru, Franck Cappello, and William Kramer. Taming of the shrew: Modeling the normal and faulty behavior of large-scale HPC systems. In Proceedings of IEEE IPDPS 2012. IEEE Press, 2012.
[15] Ana Gainaru, Franck Cappello, Marc Snir, and William Kramer. Fault prediction under the microscope: A closer look into HPC systems. In Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 2012.
[16] Ana Gainaru, Franck Cappello, Stefan Trausan-Matu, and Bill Kramer. Event log mining tool for large scale HPC systems. In Proceedings of the 17th International Conference on Parallel Processing - Volume Part I, Euro-Par'11, pages 52-64, Berlin, Heidelberg, 2011. Springer-Verlag.
[17] L.B. Gomez, A. Nukada, N. Maruyama, F. Cappello, and S. Matsuoka. Low-overhead diskless checkpoint for hybrid computing systems. In High Performance Computing (HiPC), 2010 International Conference on, pages 1-10, December 2010.
[18] Leonardo Arturo Bautista Gomez, Naoya Maruyama, Franck Cappello, and Satoshi Matsuoka. Distributed diskless checkpoint for large scale systems. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 63-72, May 2010.
[20] A. Guermouche, T. Ropars, M. Snir, and F. Cappello. HydEE: Failure containment without event logging for large scale send-deterministic MPI applications. In Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pages 1216-1227. IEEE, 2012.
[21] Eric Heien, Derrick Kondo, Ana Gainaru, Dan LaPine, Bill Kramer, and Franck Cappello. Modeling and tolerating heterogeneous failures in large parallel systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 45:1-45:11, New York, NY, USA, 2011. ACM.
[23] T. Jin and L. Gonigunta. Weibull and gamma renewal approximation using generalized exponential functions. Communications in Statistics - Simulation and Computation, 38(1):154-171, 2008.
[25] Adam Moody, Greg Bronevetsky, Kathryn Mohror, and Bronis R. de Supinski. Design, Modeling, and Evaluation of a Scalable Multilevel Checkpointing System. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-11, November 2010.
[26] Jutaek Oh, Simon P. Washington, and Doohee Nam. Accident prediction model for railway-highway interfaces. Accident Analysis and Prevention, 38(2):346-356, 2006.
[27] R.A. Oldfield, S. Arunagiri, P.J. Teller, S. Seelam, M.R. Varela, R. Riesen, and P.C. Roth. Modeling the impact of checkpoints on next-generation systems. In 24th IEEE Conference on Mass Storage Systems and Technologies, pages 30-46, 2007.
[28] Daniel J. Price. Modelling discontinuities and Kelvin-Helmholtz instabilities in SPH. Journal of Computational Physics, pages 10040-10057, 2008.
[29] Thomas Ropars, Amina Guermouche, Bora Uçar, Esteban Meneses, Laxmikant Kalé, and Franck Cappello. On the use of cluster-based partial message logging to improve fault tolerance for MPI HPC applications. In Euro-Par 2011 Parallel Processing, volume 6852, pages 567-578. Springer Berlin / Heidelberg, 2011.
[30] Felix Salfner, Maren Lenk, and Miroslaw Malek. A survey of online failure prediction methods. ACM Computing Surveys, 42:1-42, 2010.
[31] Satoshi Matsuoka. Making TSUBAME2.0, the world's greenest production supercomputer, even greener: challenges to the architects. In International Symposium on Low Power Electronics and Design, pages 367-368. IEEE Press, Piscataway, 2011.
[32] Volker Springel. The cosmological simulation code GADGET-2. In Monthly Notices of the Royal Astronomical Society, volume 364, pages 1105-1134. Blackwell Science Ltd, 2005.
[33] Y. Ling, J. Mi, and X. Lin. A variational calculus approach to optimal checkpoint placement. IEEE Transactions on Computers, 50(7):699, July 2001.
[34] J. W. Young. A first order approximation to the optimum checkpoint interval. Commun. ACM, 17(9):530-531, 1974.
[36] Enrico Zio, Francesco Di Maio, and Marco Stasi. A data-driven approach for predicting failure scenarios in nuclear systems. Annals of Nuclear Energy, 37:482-491, 2010.