2014, Pages 467-478

Characterizing application memory error vulnerability to optimize datacenter cost via heterogeneous-reliability memory

Author keywords

datacenter cost; DRAM; hard errors; memory architectures; memory errors; soft errors; software reliability

Indexed keywords

COST REDUCTION; DYNAMIC RANDOM ACCESS STORAGE; ERRORS; RADIATION HARDENING; SOFTWARE RELIABILITY

EID: 84912138346; PISSN: None; EISSN: None; Source type: Conference Proceeding
DOI: 10.1109/DSN.2014.50; Document type: Conference Paper
Times cited: 141

References (70)
  • 2
    • E. M. Elnozahy et al., "Energy-Efficient Server Clusters," in PACS, 2003.
    • Scopus EID: 2342577992
  • 5
    • A. C. Riekstin et al., "No More Electrical Infrastructure: Towards Fuel Cell Powered Data Centers," in HotPower, 2013.
    • Scopus EID: 84889669277
  • 6
    • C. Kozyrakis et al., "Server Engineering Insights for Large-Scale Online Services," IEEE Micro, 2010.
    • Scopus EID: 77956082324
  • 7
    • R. Nishtala et al., "Scaling Memcache at Facebook," in NSDI, 2013.
    • Scopus EID: 85074591232
  • 10
    • T. J. Dell, "A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory," IBM Microelectronics Division, 1997.
    • Scopus EID: 6344250139
  • 11
    • P. J. Meaney et al., "IBM zEnterprise Redundant Array of Independent Memory Subsystem," IBM JRD, 2012.
    • Scopus EID: 84912068232
  • 13
    • B. Schroeder et al., "DRAM Errors in the Wild: A Large-Scale Field Study," in SIGMETRICS Performance, 2009.
    • Scopus EID: 70449657893
  • 14
    • V. Sridharan et al., "A Study of DRAM Failures in the Field," in SC, 2012.
    • Scopus EID: 84877721508
  • 15
    • A. A. Hwang et al., "Cosmic Rays Don't Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design," in ASPLOS, 2012.
    • Scopus EID: 84858781341
  • 16
    • X. Li et al., "A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility," in USENIX ATC, 2010.
    • Scopus EID: 83155174060
  • 17
    • J. Stuecheli et al., "Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory," in MICRO, 2010.
    • Scopus EID: 79951712962
  • 18
    • JEDEC Solid State Technology Association, "JEDEC Standard: DDR3 SDRAM, JESD79-3C," 2008.
    • Scopus EID: 84912074380
  • 19
    • D. Fiala et al., "Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing," in SC, 2012.
    • Scopus EID: 84877705582
  • 22
    • D. Tang et al., "Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults," in DSN, 2006.
    • Scopus EID: 33845564156
  • 23
    • Y. Kim et al., "A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM," in ISCA, 2012.
    • Scopus EID: 84864850807
  • 24
    • D. Lee et al., "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," in HPCA, 2013.
    • Scopus EID: 84880276949
  • 25
    • V. Seshadri et al., "RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data," in MICRO, 2013.
    • Scopus EID: 84944567950
  • 26
    • S. Khan et al., "The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study," in SIGMETRICS, 2014.
    • Scopus EID: 84904320452
  • 27
    • Y. Kim et al., "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," in ISCA, 2014.
    • Scopus EID: 84905484287
  • 28
    • T. C. May et al., "Alpha-Particle-Induced Soft Errors in Dynamic Memories," IEEE T-ED, 1979.
    • Scopus EID: 0018331014
  • 29
    • V. Sridharan et al., "Feng Shui of Supercomputer Memory: Positional Effects in DRAM and SRAM Faults," in SC, 2013.
    • Scopus EID: 84899689608
  • 30
    • T. Siddiqua et al., "Analysis and Modeling of Memory Errors from Large-Scale Field Data Collection," in SELSE, 2013.
    • Scopus EID: 84940424704
  • 31
    • A. Messer et al., "Susceptibility of Commodity Systems and Software to Memory Soft Errors," IEEE TC, 2004.
    • Scopus EID: 84912112129
  • 32
    • D. Li et al., "Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool," in SC, 2012.
    • Scopus EID: 84877692741
  • 33
    • X. Li et al., "Application-Level Correctness and Its Impact on Fault Tolerance," in HPCA, 2007.
    • Scopus EID: 34547697289
  • 34
    • H. Esmaeilzadeh et al., "Architecture Support for Disciplined Approximate Programming," in ASPLOS, 2012.
    • Scopus EID: 84858790858
  • 35
    • J. Bornholt et al., "Uncertain: A First-Order Type for Uncertain Data," in ASPLOS, 2014.
    • Scopus EID: 84897803531
  • 36
    • "Intel iACT," http://www.github.com/IntelLabs/iACT.
    • Scopus EID: 84912068229
  • 37
    • D. H. Yoon et al., "Virtualized and Flexible ECC for Main Memory," in ASPLOS, 2010.
    • Scopus EID: 77952257218
  • 38
    • H. Schirmeier et al., "RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers," in PRDC, 2011.
    • Scopus EID: 84857724913
  • 39
    • C. Borchert et al., "Generative Software-Based Memory Error Detection and Correction for Operating System Data Structures," in DSN, 2013.
    • Scopus EID: 84883441643
  • 40
    • K. Pattabiraman et al., "Samurai: Protecting Critical Data in Unsafe Languages," in EuroSys, 2008.
    • Scopus EID: 59249096511
  • 41
    • J. Chang et al., "Automatic Instruction-Level Software-Only Recovery," in DSN, 2006.
    • Scopus EID: 33750415121
  • 42
    • A. Benso et al., "A C/C++ Source-to-Source Compiler for Dependable Applications," in DSN, 2000.
    • Scopus EID: 0034590511
  • 43
    • L. Leem et al., "ERSA: Error Resilient System Architecture for Probabilistic Applications," in DATE, 2010.
    • Scopus EID: 77953110390
  • 44
    • M.-L. Li et al., "Trace-Based Microarchitecture-Level Diagnosis of Permanent Hardware Faults," in DSN, 2008.
    • Scopus EID: 53349142162
  • 45
    • X. Xu et al., "Understanding Soft Error Propagation Using Efficient Vulnerability-Driven Fault Injection," in DSN, 2012.
    • Scopus EID: 84866649524
  • 46
    • M.-L. Li et al., "Understanding the Propagation of Hard Errors to Software and Implications for Resilient System Design," in ASPLOS, 2008.
    • Scopus EID: 53349140999
  • 47
    • S. Liu et al., "Flikker: Saving DRAM Refresh-Power Through Critical Data Partitioning," in ASPLOS, 2011.
    • Scopus EID: 79953075520
  • 48
    • D. H. Yoon et al., "BOOM: Enabling Mobile Memory Based Low-Power Server DIMMs," in ISCA, 2012.
    • Scopus EID: 84864829982
  • 49
    • K. T. Malladi et al., "Towards Energy-Proportional Datacenter Memory with Mobile DRAM," in ISCA, 2012.
    • Scopus EID: 84864850882
  • 50
    • M. K. Qureshi et al., "Scalable High Performance Main Memory System Using Phase-Change Memory Technology," in ISCA, 2009.
    • Scopus EID: 70450273507
  • 51
    • H. Yoon et al., "Row Buffer Locality Aware Caching Policies for Hybrid Memories," in ICCD, 2012.
    • Scopus EID: 84872056636
  • 52
    • S. Phadke et al., "MLP Aware Heterogeneous Memory System," in DATE, 2011.
    • Scopus EID: 79957551382
  • 53
    • N. Chatterjee et al., "Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access," in MICRO, 2012.
    • Scopus EID: 84876591680
  • 54
    • J. Meza et al., "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE Computer Architecture Letters, 2012.
    • Scopus EID: 84870990173
  • 57
    • J. Liu et al., "An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms," in ISCA, 2013.
    • Scopus EID: 84881189389
  • 58
    • V. J. Reddi et al., "Web Search Using Mobile Cores: Quantifying and Mitigating the Price of Efficiency," in ISCA, 2010.
    • Scopus EID: 77954977639
  • 59
    • "Memcached," http://memcached.org/.
    • Scopus EID: 84889578845
  • 60
    • Y. Low et al., "Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud," PVLDB, 2012.
    • Scopus EID: 84863735533
  • 62
    • S. K. S. Hari et al., "Relyzer: Exploiting Application-Level Fault Equivalence to Analyze Application Resiliency to Transient Faults," in ASPLOS, 2012.
    • Scopus EID: 84858759524
  • 63
    • N. J. Wang et al., "Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline," in DSN, 2004.
    • Scopus EID: 4544282186
  • 64
    • E. B. Nightingale et al., "Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs," in EuroSys, 2011.
    • Scopus EID: 79955966760
  • 65
    • M. C. Rinard et al., "Enhancing Server Availability and Security Through Failure-Oblivious Computing," in OSDI, 2004.
    • Scopus EID: 84906487819
  • 66
    • K. Lim et al., "Disaggregated Memory for Expansion and Sharing in Blade Servers," in ISCA, 2009.
    • Scopus EID: 70450227674
  • 67
    • A. Sampson et al., "EnerJ: Approximate Data Types for Safe and General Low-Power Computation," in PLDI, 2011.
    • Scopus EID: 79959878920
  • 68
    • Y. Du et al., "A Rising Tide Lifts All Boats: How Memory Error Prediction and Prevention Can Help with Virtualized System Longevity," in HotDep, 2010.
    • Scopus EID: 84912068228
  • 69
    • "Memtest86+," http://www.memtest.org/.
    • Scopus EID: 84912103691
  • 70
    • O. Mutlu, "Memory Scaling: A Systems Architecture Perspective," in MEMCON, 2013.
    • Scopus EID: 84912112126


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.