SCOPUS Information Retrieval Platform

IEEE Transactions on Computers

Volume 57, Issue 12, 2008, Pages 1647-1660

Adaptive fault management of parallel applications for high-performance computing

Authors (2): Lan, Zhiling (a); Li, Yawei (a)

(a) Illinois Institute of Technology, United States

Author keywords

Adaptive fault management; High performance computing; Large scale systems; Parallel applications

Indexed keywords

LARGE SCALE SYSTEMS;

ADAPTIVE FAULT MANAGEMENT; ADAPTIVE FAULT MANAGEMENTS; CASE STUDIES; CHECKPOINTING; COMPLETION TIMES; FAILURE PREDICTIONS; FAULT RESILIENCES; HIGH-PERFORMANCE COMPUTING; PARALLEL APPLICATIONS; PERIODIC CHECKPOINTING; REAL APPLICATIONS; RESOURCE UTILIZATIONS; STOCHASTIC MODELING;

APPLICATIONS;

EID: 57049111494 PISSN: 00189340 EISSN: None Source Type: Journal
DOI: 10.1109/TC.2008.90 Document Type: Article

Times cited: 38

References (59)

1
- The Top500 Supercomputer Site, http://www.top500.org, 2007.

2
- D. Reed, C. Lu, and C. Mendes, "Big Systems and Big Reliability Challenges," Proc. Int'l Conf. Parallel Computing (ParCo), 2003.

3
- B. Schroeder and G. Gibson, "A Large Scale Study of Failures in High-Performance-Computing Systems," Proc. Int'l Conf. Dependable Systems and Networks (DSN), 2006.

4
- E. Elnozahy, L. Alvisi, Y. Wang, and D. Johnson, "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, vol. 34, no. 3, 2002.

5
- E. Elnozahy and J. Plank, "Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery," IEEE Trans. Dependable and Secure Computing, vol. 1, no. 2, Apr.-June 2004.

6
- V. Castelli, R. Harper, P. Heidelberger, S. Hunter, K. Trivedi, K. Vaidyanathan, and W. Zeggert, "Proactive Management of Software Aging," IBM J. Research and Development, vol. 45, no. 2, 2001.

7
- S. Chakravorty, C. Mendes, and L. Kale, "Proactive Fault Tolerance in Large Systems," Proc. First Workshop High Performance Computing Reliability Issues (HPCRI), 2005.

8
- R. Vilalta and S. Ma, "Predicting Rare Events in Temporal Domains," Proc. IEEE Int'l Conf. Data Mining (ICDM), 2002.

9
- R. Sahoo, A. Oliner, I. Rish, M. Gupta, J. Moreira, and S. Ma, "Critical Event Prediction for Proactive Management in Large-Scale Computer Clusters," Proc. ACM SIGKDD, 2003.

10
- Y. Liang, Y. Zhang, A. Sivasubramaniam, M. Jette, and R. Sahoo, "Blue Gene/L Failure Analysis and Prediction Models," Proc. Int'l Conf. Dependable Systems and Networks (DSN), 2006.

11
- P. Gujrati, Y. Li, Z. Lan, R. Thakur, and J. White, "A Meta-Learning Failure Predictor for Blue Gene/L Systems," Proc. Int'l Conf. Parallel Processing (ICPP), 2007.

12
- A. Oliner and J. Stearley, "What Supercomputers Say: A Study of Five System Logs," Proc. Int'l Conf. Dependable Systems and Networks (DSN), 2007.

13
- A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, and F. Cappello, "MPICH-V: A Multiprotocol Automatic Fault Tolerant MPI," Int'l J. High Performance Computing and Applications, 2005.

14
- J. Squyres and A. Lumsdaine, "A Component Architecture for LAM/MPI," Proc. 10th European PVM/MPI Users' Group Meeting, 2003.

15
- M. Schulz, G. Bronevetsky, R. Fernandes, D. Marques, K. Pingali, and P. Stodghill, "Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs," Proc. ACM/IEEE Conf. Supercomputing (SC), 2004.

16
- J. Plank, M. Beck, G. Kingsley, and K. Li, "Libckpt: Transparent Checkpointing under Unix," Proc. Usenix Winter Technical Conf., 1995.

17
- J. Duell, P. Hargrove, and E. Roman, "Requirements for Linux Checkpoint/Restart," Technical Report LBNL-49659, Berkeley Lab, May 2002.

18
- E. Gabriel et al., "Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation," Proc. 11th European PVM/MPI Users' Group Meeting, 2004.

19
- C. Du and X. Sun, "MPI-Mitten: Enabling Migration Technology in MPI," Proc. Sixth IEEE Int'l Symp. Cluster Computing and the Grid (CCGrid), 2006.

20
- C. Wang, F. Mueller, C. Engelmann, and S. Scott, "A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance," Proc. 21st Int'l Parallel and Distributed Processing Symp. (IPDPS), 2007.

21
- J. Young, "A First Order Approximation to the Optimal Checkpoint Interval," Comm. ACM, vol. 17, no. 9, 1974.

22
- T. Ozaki, T. Dohi, H. Okamura, and N. Kaio, "Min-Max Checkpoint Placement under Incomplete Failure Information," Proc. Int'l Conf. Dependable Systems and Networks (DSN), 2004.

23
- S. Toueg and O. Babaoglu, "On the Optimum Checkpoint Selection Problem," SIAM J. Computing, vol. 13, no. 3, 1984.

24
- O. Babaoglu and W. Joy, "Converting a Swap-Based System to Do Paging in an Architecture Lacking Page Reference Bits," Proc. Eighth Symp. Operating Systems Principles (SOSP), 1981.

25
- J. Sancho, F. Petrini, G. Johnson, J. Fernandez, and E. Frachtenberg, "On the Feasibility of Incremental Checkpointing for Scientific Computing," Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS), 2004.

26
- J. Plank, K. Li, and M. Puening, "Diskless Checkpointing," IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 10, Oct. 1998.

27
- C.-D. Lu, "Scalable Diskless Checkpointing for Large Parallel Systems," PhD dissertation, Univ. of Illinois at Urbana-Champaign, 2005.

28
- G. Zheng, L. Shi, and L. Kale, "FTC-Charm++: An In-Memory Checkpoint-Based Fault Tolerant Runtime for Charm++ and MPI," Proc. IEEE Int'l Conf. Cluster Computing (Cluster), 2004.

29
- B. Allen, "Monitoring Hard Disks with SMART," Linux J., Jan. 2004.

30
- Hardware Monitoring by LM Sensors, http://secure.netroedge.com/~lm78/info.html, 2007.

31
- Health Application Programming Interface, http://www.renci.org, 2007.

32
- Intelligent Platform Management Interface, http://www.intel.com/design/servers/ipmi, 2007.

33
- K. Trivedi and K. Vaidyanathan, "A Measurement-Based Model for Estimation of Resource Exhaustion in Operational Software Systems," Proc. 10th Int'l Symp. Software Reliability Eng. (ISSRE), 1999.

34
- G. Weiss and H. Hirsh, "Learning to Predict Rare Events in Event Sequences," Proc. ACM SIGKDD, 1998.

35
- G. Hoffmann, F. Salfner, and M. Malek, "Advanced Failure Prediction in Complex Software Systems," Proc. 23rd Int'l Symp. Reliable Distributed Systems (SRDS), 2004.

36
- G. Hamerly and C. Elkan, "Bayesian Approaches to Failure Prediction for Disk Drives," Proc. 18th Int'l Conf. Machine Learning (ICML), 2001.

37
- J. Hellerstein, F. Zhang, and P. Shahabuddin, "A Statistical Approach to Predictive Detection," Computer Networks: The Int'l J. Computer and Telecommunications Networking, 2001.

38
- A. Gara et al., "Overview of the Blue Gene/L System Architecture," IBM J. Research and Development, vol. 49, nos. 2/3, 2005.

39
- Cray, Cray XT Series System Management, http://docs.cray.com/books/S-2393-15/S-2393-15.pdf, 2005.

40
- C. Leangsuksun, T. Liu, T. Raol, S. Scott, and R. Libby, "A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster," Proc. Fifth LCI Int'l Conf. Linux Clusters, 2004.

41
- A. Oliner, R. Sahoo, J. Moreira, M. Gupta, and A. Sivasubramaniam, "Fault-Aware Job Scheduling for Blue Gene/L Systems," Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS), 2004.

42
- Y. Zhang, M. Squillante, A. Sivasubramaniam, and R. Sahoo, "Performance Implications of Failures in Large-Scale Cluster Scheduling," Proc. 10th Workshop Job Scheduling Strategies for Parallel Processing (JSSPP), 2004.

43
- T. Tannenbaum and M. Litzkow, "Checkpointing and Migration of Unix Processes in the Condor Distributed Processing System," Dr. Dobb's J., Feb. 1995.

44
- C. Clark et al., "Live Migration of Virtual Machines," Proc. Second Symp. Networked Systems Design and Implementation (NSDI), 2005.

45
- A. Oliner, L. Rudolph, and R. Sahoo, "Cooperative Checkpointing: A Robust Approach to Large-Scale Systems Reliability," Proc. 20th Ann. Int'l Conf. Supercomputing (ICS), 2006.

46
- G. Brown, D. Bernard, and R. Rasmussen, "Attitude and Articulation Control for the Cassini Spacecraft: A Fault Tolerance Overview," Jet Propulsion Laboratory technical report, 1997.

47
- R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. Voelker, "Total Recall: System Support for Automated Availability Management," Proc. First Symp. Networked Systems Design and Implementation (NSDI), 2004.

48
- Z. Lan, P. Gujrati, Y. Li, Z. Zheng, R. Thakur, and J. White, "A Fault Diagnosis and Prognosis Service for TeraGrid Clusters," Proc. Second TeraGrid Conf., 2007.

49
- Z. Zheng, Y. Li, and Z. Lan, "Anomaly Localization in Large-Scale Clusters," Proc. IEEE Int'l Conf. Cluster Computing (Cluster), 2007.

50
- Y. Li and Z. Lan, "Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing," Proc. Sixth IEEE Int'l Symp. Cluster Computing and the Grid (CCGrid), 2006.

51
- J. Plank and M. Thomason, "Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems," J. Parallel and Distributed Computing, vol. 61, no. 11, 2001.

52
- A. Oliner, L. Rudolph, and R. Sahoo, "Cooperative Checkpointing Theory," Proc. 20th Int'l Parallel and Distributed Processing Symp. (IPDPS), 2006.

53
- G. Ciardo, J. Muppala, and K. Trivedi, "SPNP: Stochastic Petri Net Package," Proc. Third Int'l Workshop Petri Nets and Performance Models (PNPM), 1989.

54
- L. Wang, K. Pattabiraman, Z. Kalbarczyk, and R. Iyer, "Modeling Coordinated Checkpointing for Large-Scale Supercomputers," Proc. Int'l Conf. Dependable Systems and Networks (DSN), 2005.

55
- NASA NAS Parallel Benchmarks, http://www.nas.nasa.gov/Resources/Software/npb.html, 2007.

56
- G. Bryan, T. Abel, and M. Norman, "Achieving Extreme Resolution in Numerical Cosmology Using Adaptive Mesh Refinement: Resolving Primordial Star Formation," Proc. ACM/IEEE Conf. Supercomputing (SC), 2001.

57
- H. Berendsen, D. van der Spoel, and R. van Drunen, "GROMACS: A Message-Passing Parallel Molecular Dynamics Implementation," Computer Physics Comm., vol. 91, pp. 43-56, 1995.

58
- Z. Lan, V. Taylor, and G. Bryan, "Dynamic Load Balancing for Structured Adaptive Mesh Refinement Applications," Proc. ACM/IEEE Conf. Supercomputing (SC), 2001.

59
- Y. Li and Z. Lan, "Using Adaptive Fault Tolerance to Improve Application Robustness on the TeraGrid," Proc. Second TeraGrid Conf., 2007.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.