-
1
-
-
84878011546
-
The design and implementation of Berkeley Lab's Linux checkpoint/restart
-
Jason Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart. Lawrence Berkeley National Laboratory, 2005.
-
(2005)
Lawrence Berkeley National Laboratory
-
-
Duell, J.1
-
2
-
-
84906771448
-
On the combination of silent error detection and checkpointing
-
IEEE
-
Aupy et al. On the combination of silent error detection and checkpointing. In PRDC-19. IEEE, 2013.
-
(2013)
PRDC-19
-
-
Aupy1
-
4
-
-
3042658703
-
LLVM: A compilation framework for lifelong program analysis & transformation
-
Lattner et al. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, 2004.
-
(2004)
CGO
-
-
Lattner1
-
5
-
-
84977104070
-
nZDC: A compiler technique for near zero silent data corruption
-
ACM
-
Didehban et al. nZDC: A compiler technique for near Zero silent Data Corruption. In DAC-53. ACM, 2016.
-
(2016)
DAC-53
-
-
Didehban1
-
6
-
-
9144223280
-
Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
-
Elnozahy et al. Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery. TDSC, 2004.
-
(2004)
TDSC
-
-
Elnozahy1
-
7
-
-
8344255328
-
MiBench: A free, commercially representative embedded benchmark suite
-
IEEE
-
Guthaus et al. MiBench: A free, commercially representative embedded benchmark suite. In WWC-4. IEEE, 2001.
-
(2001)
WWC-4
-
-
Guthaus1
-
8
-
-
85009418556
-
Clover: Compiler directed lightweight soft error resilience
-
ACM
-
Liu et al. Clover: Compiler directed lightweight soft error resilience. In LCTES. ACM, 2015.
-
(2015)
LCTES
-
-
Liu1
-
9
-
-
84880064456
-
When is multi-version checkpointing needed?
-
ACM
-
Lu et al. When is multi-version checkpointing needed? In FTXS-3. ACM, 2013.
-
(2013)
FTXS-3
-
-
Lu1
-
10
-
-
84977173962
-
DRIFT: Decoupled compiler-based instruction-level fault-tolerance
-
Springer
-
Mitropoulou et al. DRIFT: Decoupled CompileR-Based Instruction-Level Fault-Tolerance. In LCPC. Springer, 2014.
-
(2014)
LCPC
-
-
Mitropoulou1
-
11
-
-
0036472442
-
ED4I: Error detection by diverse data and duplicated instructions
-
Oh et al. ED4I: Error Detection by Diverse Data and Duplicated Instructions. TC, 2002.
-
(2002)
TC
-
-
Oh1
-
12
-
-
84961792784
-
Software resilience and the effectiveness of software mitigation in microcontrollers
-
Quinn et al. Software resilience and the effectiveness of software mitigation in microcontrollers. TNS, 2015.
-
(2015)
TNS
-
-
Quinn1
-
13
-
-
78649382366
-
SWIFT: Software implemented fault tolerance
-
IEEE
-
Reis et al. SWIFT: Software implemented fault tolerance. In CGO. IEEE, 2005.
-
(2005)
CGO
-
-
Reis1
-
14
-
-
34249775197
-
Automatic instruction-level software-only recovery
-
Reis et al. Automatic instruction-level software-only recovery. MICRO, 2007.
-
(2007)
MICRO
-
-
Reis1
-
15
-
-
85023621347
-
Avoiding pitfalls in fault-injection based comparison of program
-
Schirmeier et al. Avoiding pitfalls in fault-injection based comparison of program .. In DSN, 2015.
-
(2015)
DSN
-
-
Schirmeier1
-
16
-
-
60649118603
-
Understanding failures in petascale computers
-
IOP Publishing
-
Schroeder et al. Understanding failures in petascale computers. In SciDAC. IOP Publishing, 2007.
-
(2007)
SciDAC
-
-
Schroeder1
-
17
-
-
85023613092
-
Quantitative analysis of control flow checking mechanisms for soft errors
-
IEEE
-
Shrivastava et al. Quantitative analysis of control flow checking mechanisms for soft errors. In DAC-51. IEEE, 2014.
-
(2014)
DAC-51
-
-
Shrivastava1
-
18
-
-
85023643991
-
Encore: Low-cost, fine-grained transient fault recovery
-
Feng et al. Encore: Low-cost, fine-grained transient fault recovery. In MICRO-44. ACM, 2011.
-
(2011)
MICRO-44
-
-
Feng1
-
19
-
-
67649255075
-
PLR: A software approach to transient fault tolerance for multicore architectures
-
Shye et al. PLR: A software approach to transient fault tolerance for multicore architectures. TDSC, 2009.
-
(2009)
TDSC
-
-
Shye1
-
21
-
-
0021582517
-
Neutron generated single-event upsets in the atmosphere
-
Silberberg et al. Neutron generated single-event upsets in the atmosphere. TNS, 1984.
-
(1984)
TNS
-
-
Silberberg1
-
22
-
-
0027576605
-
Single event upset in avionics
-
Taber et al. Single event upset in avionics. TNS, 1993.
-
(1993)
TNS
-
-
Taber1
-
23
-
-
84864204782
-
Eliminating single points of failure in software-based redundancy
-
IEEE
-
Ulbrich et al. Eliminating single points of failure in software-based redundancy. In EDCC-9. IEEE, 2012.
-
(2012)
EDCC-9
-
-
Ulbrich1
-
24
-
-
85055780139
-
Compiler-managed software-based redundant multi-threading for transient fault detection
-
Wang et al. Compiler-managed software-based redundant multi-threading for transient fault detection. In CGO, 2007.
-
(2007)
CGO
-
-
Wang1
-
25
-
-
85023645677
-
An instruction-level fine-grained recovery approach for soft errors
-
ACM
-
Xu et al. An instruction-level fine-grained recovery approach for soft errors. In SAC-28. ACM, 2013.
-
(2013)
SAC-28
-
-
Xu1
-
26
-
-
85023610877
-
A fault-tolerant programmable voter for software-based N-modular redundancy
-
IEEE
-
Yim et al. A fault-tolerant programmable voter for software-based N-modular redundancy. In AeroConf. IEEE, 2012.
-
(2012)
AeroConf
-
-
Yim1
-
27
-
-
85023631679
-
ESoftCheck: Removal of non-vital checks for fault tolerance
-
IEEE Computer Society
-
Yu et al. ESoftCheck: Removal of non-vital checks for fault tolerance. In CGO-7. IEEE Computer Society, 2009.
-
(2009)
CGO-7
-
-
Yu1
-
28
-
-
85023608268
-
Fault recovery based on checkpointing for hard real-time embedded systems
-
IEEE
-
Zhang et al. Fault recovery based on checkpointing for hard real-time embedded systems. In DFT-18. IEEE, 2003.
-
(2003)
DFT-18
-
-
Zhang1
-
29
-
-
85023602827
-
Runtime asynchronous fault tolerance via speculation
-
ACM
-
Zhang et al. Runtime asynchronous fault tolerance via speculation. In CGO-10. ACM, 2012.
IRC. International Technology Roadmap For Semiconductors 2.0-Executive Summary. http://www.itrs2.net/itrs-reports.html, 2015. [accessed 19-November-2016].
-
(2012)
CGO-10
-
-
Zhang1
-
30
-
-
0033314330
-
IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective
-
Spainhower et al. IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective. IBM J RES DEV, 1999.
-
(1999)
IBM J RES DEV
-
-
Spainhower1