SCOPUS Information Retrieval Platform

Simulation Series

Volume 45, Issue 6, 2013, Pages 1-9

Exploring reliability of exascale systems through simulations

Authors (4): Zhao, Dongfang (a); Zhang, Da (a); Wang, Ke (a); Raicu, Ioan (a, b)

a Illinois Institute of Technology (United States)

b Argonne National Laboratory (United States)

Author keywords

Checkpointing; Distributed filesystems; Exascale computing; Fault tolerance; Parallel filesystems

Indexed keywords

CHECK POINTING; CRITICAL CHALLENGES; DISTRIBUTED FILE-SYSTEM; DISTRIBUTED FILESYSTEMS; EXASCALE COMPUTING; FAULT TOLERANCE; FAULT TOLERANT COMPUTER SYSTEMS; PARALLEL FILESYSTEMS; PERSISTENT STORAGE; STATE-OF-THE-ART TECHNIQUES

EID: 84876817972
PISSN: 07359276
EISSN: None
DOI: None
Source Type: Conference Proceeding
Document Type: Conference Paper

Times cited: 3

References (17)

1
- 70450159168
- Software challenges for extreme scale computing: Going from petascale to exascale systems
- Michael A. Heroux. Software challenges for extreme scale computing: Going from petascale to exascale systems. Int. J. High Perform. Comput. Appl., 23(4):437-439, November 2009.
- (2009) Int. J. High Perform. Comput. Appl. , vol.23 , Issue.4 , pp. 437-439
- Heroux, M.A.¹

2
- 70450209566
- Architectures for extreme-scale computing
- Josep Torrellas. Architectures for extreme-scale computing. Computer, 42(11):28-35, November 2009.
- (2009) Computer , vol.42 , Issue.11 , pp. 28-35
- Torrellas, J.¹

3
- 77951294992
- Barack Obama. A strategy for American innovation: Driving towards sustainable growth and quality jobs. National Economic Council, 2009.
- (2009) A Strategy for American Innovation: Driving Towards Sustainable Growth and Quality Jobs
- Obama, B.¹

4
- 85084163004
- GPFS: A Shared-Disk File System for Large Computing Clusters
- Frank Schmuck and Roger Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. FAST '02, Berkeley, CA, USA, 2002. USENIX Association.
- FAST '02, Berkeley, CA, USA, 2002
- Schmuck, F.¹ Haskin, R.²

5
- 84876857792
- Distributed File Systems for Exascale Computing
- Dongfang Zhao and Ioan Raicu. Distributed File Systems for Exascale Computing. In Doctoral Research, Supercomputing '12, Salt Lake City, UT, 2012.
- (2012) Supercomputing '12
- Zhao, D.¹ Raicu, I.²

6
- 28044438299
- A model for predicting the optimum checkpoint interval for restart dumps
- John Daly. A model for predicting the optimum checkpoint interval for restart dumps. In ICCS, pages 3-12, Berlin, Heidelberg, 2003.
- (2003) ICCS , pp. 3-12
- Daly, J.¹

7
- 79961097605
- Making a case for distributed file systems at exascale
- Ioan Raicu, Ian T. Foster, and Pete Beckman. Making a case for distributed file systems at exascale. In Proceedings of the third international workshop on Large-scale system and application performance, LSAP '11, pages 11-18, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of the Third International Workshop on Large-scale System and Application Performance, LSAP '11 , pp. 11-18
- Raicu, I.¹ Foster, I.T.² Beckman, P.³

8
- 0032314841
- On coordinated checkpointing in distributed systems
- Guohong Cao and M. Singhal. On coordinated checkpointing in distributed systems. Parallel and Distributed Systems, IEEE Transactions on, 9(12):1213-1225, December 1998.
- (1998) Parallel and Distributed Systems, IEEE Transactions on , vol.9 , Issue.12 , pp. 1213-1225
- Cao, G.¹ Singhal, M.²

9
- 0032597670
- An analysis of communication induced checkpointing
- L. Alvisi, E. Elnozahy, S. Rao, S.A. Husain, and A. de Mel. An analysis of communication induced checkpointing. In Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on, pages 242-249, 1999.
- (1999) Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on , pp. 242-249
- Alvisi, L.¹ Elnozahy, E.² Rao, S.³ Husain, S.A.⁴ De Mel, A.⁵

10
- 77952666560
- Pageserver: High-performance SSD-based checkpointing of transactional distributed memory
- S. Gerhold, N. Kaemmer, A. Weggerle, C. Himpel, and P. Schulthess. Pageserver: High-performance SSD-based checkpointing of transactional distributed memory. In Computer Engineering and Applications (ICCEA), 2010 Second International Conference on, volume 1, pages 235-239, March 2010.
- (2010) Computer Engineering and Applications (ICCEA), 2010 Second International Conference on , vol.1 , pp. 235-239
- Gerhold, S.¹ Kaemmer, N.² Weggerle, A.³ Himpel, C.⁴ Schulthess, P.⁵

11
- 77958107571
- Enhancing Checkpoint Performance with Staging IO and SSD
- Xiangyong Ouyang, Sonya Marcarelli, and Dhabaleswar K. Panda. Enhancing Checkpoint Performance with Staging IO and SSD. In Proceedings of the 2010 International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI '10, pages 13-20, Washington, DC, USA, 2010. IEEE Computer Society.
- (2010) Proceedings of the 2010 International Workshop on Storage Network Architecture and Parallel I/Os, SNAPI '10 , pp. 13-20
- Ouyang, X.¹ Marcarelli, S.² Panda, D.K.³

12
- 53349098075
- Evaluation of fault-tolerant policies using simulation
- Anand Tikotekar, Geoffroy Vallee, Thomas Naughton, Stephen L. Scott, and Chokchai Leangsuksun. Evaluation of fault-tolerant policies using simulation. In Proceedings of the 2007 IEEE International Conference on Cluster Computing, CLUSTER '07, pages 303-311, Washington, DC, USA, 2007. IEEE Computer Society.
- (2007) Proceedings of the 2007 IEEE International Conference on Cluster Computing, CLUSTER '07 , pp. 303-311
- Tikotekar, A.¹ Vallee, G.² Naughton, T.³ Scott, S.L.⁴ Leangsuksun, C.⁵

13
- 0041347621
- Nonblocking checkpointing for optimistic parallel simulation: Description and an implementation
- Francesco Quaglia and Andrea Santoro. Nonblocking checkpointing for optimistic parallel simulation: Description and an implementation. IEEE Trans. Parallel Distrib. Syst., 14(6):593-610, June 2003.
- (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.6 , pp. 593-610
- Quaglia, F.¹ Santoro, A.²

14
- 50649087527
- Reliability-aware approach: An incremental checkpoint/restart model in HPC environments
- Nichamon Naksinehaboon, Yudan Liu, Chokchai (Box) Leangsuksun, Raja Nassar, Mihaela Paun, and Stephen L. Scott. Reliability-aware approach: An incremental checkpoint/restart model in HPC environments. In Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid, CCGRID '08, pages 783-788, Washington, DC, USA, 2008. IEEE Computer Society.
- (2008) Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid, CCGRID '08 , pp. 783-788
- Naksinehaboon, N.¹ Liu, Y.² Leangsuksun, C.³ Nassar, R.⁴ Paun, M.⁵ Scott, S.L.⁶

15
- 70350448198
- In-memory checkpointing for MPI programs by XOR-based double-erasure codes
- Gang Wang, Xiaoguang Liu, Ang Li, and Fan Zhang. In-memory checkpointing for MPI programs by XOR-based double-erasure codes. In Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pages 84-93, Berlin, Heidelberg, 2009. Springer-Verlag.
- (2009) Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface , pp. 84-93
- Wang, G.¹ Liu, X.² Li, A.³ Zhang, F.⁴

16
- 79957586365
- A new diskless checkpointing approach for multiple processor failures
- Ge-Ming Chiu and Jane-Ferng Chiu. A new diskless checkpointing approach for multiple processor failures. IEEE Trans. Dependable Secur. Comput., 8(4):481-493, July 2011.
- (2011) IEEE Trans. Dependable Secur. Comput. , vol.8 , Issue.4 , pp. 481-493
- Chiu, G.-M.¹ Chiu, J.-F.²

17
- 79960768327
- Hybrid checkpointing using emerging non-volatile memories for future exascale systems
- Xiangyu Dong, Yuan Xie, Naveen Muralimanohar, and Norman P. Jouppi. Hybrid checkpointing using emerging non-volatile memories for future exascale systems. ACM Trans. Archit. Code Optim., 8(2):6:1-6:29, June 2011.
- (2011) ACM Trans. Archit. Code Optim. , vol.8 , Issue.2
- Dong, X.¹ Xie, Y.² Muralimanohar, N.³ Jouppi, N.P.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.