2012, Pages 368-379

A case for exploiting subarray-level parallelism (SALP) in DRAM

Author keywords

[No Author keywords available]

Indexed keywords

ACCESS LATENCY; AREA OVERHEAD; LOW COST APPROACH; MULTI-CORE SYSTEMS; NEW MECHANISMS; OFF-CHIP MEMORIES; REQUEST SCHEDULING; SUB-ARRAYS; SYSTEM COSTS; TIMING PARAMETERS;

EID: 84864850807     PISSN: 10636897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ISCA.2012.6237032     Document Type: Conference Paper
Times cited: 353

References (71)
  • 1. J. H. Ahn et al. Multicore DIMM: An energy efficient memory module with independently controlled DRAMs. IEEE CAL, Jan. 2009.
  • 2. J. H. Ahn et al. Improving system energy efficiency with memory rank subsetting. ACM TACO, Mar. 2012.
  • 3. N. Chatterjee et al. Staged reads: Mitigating the impact of DRAM writes on DRAM reads. In HPCA, 2012.
  • 4. Y. Chou et al. Microarchitecture optimizations for exploiting memory-level parallelism. In ISCA, 2004.
  • 5. J. Dundas and T. Mudge. Improving data cache performance by preexecuting instructions under a cache miss. In ICS, 1997.
  • 6. E. Ebrahimi et al. Parallel application memory scheduling. In MICRO, 2011.
  • 7. Enhanced Memory Systems. Enhanced SDRAM SM2604, 2002.
  • 9. B. Ganesh et al. Fully-buffered DIMM memory architectures: Understanding mechanisms, overheads and scaling. In HPCA, 2007.
  • 10. C. A. Hart. CDRAM in a unified memory architecture. In Compcon, 1994.
  • 11. H. Hidaka et al. The cache DRAM architecture: A DRAM with an on-chip cache memory. IEEE Micro, Mar. 1990.
  • 12. HPCC. RandomAccess. http://icl.cs.utk.edu/hpcc/.
  • 13. W.-C. Hsu and J. E. Smith. Performance of cached DRAM organizations in vector supercomputers. In ISCA, 1993.
  • 16. E. Ipek et al. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.
  • 18. JEDEC. Standard No. 79-3E. DDR3 SDRAM Specification, 2010.
  • 22. R. Kho et al. 75nm 7Gb/s/pin 1Gb GDDR5 graphics memory device with bandwidth-improvement techniques. In ISSCC, 2009.
  • 23. Y. Kim et al. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In HPCA, 2010.
  • 24. Y. Kim et al. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In MICRO, 2010.
  • 25. T. Kirihata. Latched row decoder for a random access memory. U.S. patent number 5615164, 1997.
  • 26. B.-S. Kong et al. Conditional-capture flip-flop for statistical power reduction. IEEE JSSC, 2001.
  • 27. D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In ISCA, 1981.
  • 28. B. C. Lee et al. Architecting phase change memory as a scalable DRAM alternative. In ISCA, 2009.
  • 29. C. J. Lee et al. DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems. TR-HPS-2010-002, UT Austin, 2010.
  • 30. C.-K. Luk et al. Pin: Building customized program analysis tools with dynamic instrumentation. In PLDI, 2005.
  • 35. M. J. Miller. Bandwidth engine serial memory chip breaks 2 billion accesses/sec. In HotChips, 2011.
  • 36. Y. Moon et al. 1.2V 1.6Gb/s 56nm 6F² 4Gb DDR3 SDRAM with hybrid-I/O sense amplifier and segmented sub-array architecture. In ISSCC, 2009.
  • 37. T. Moscibroda and O. Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In USENIX SS, 2007.
  • 38. S. P. Muralidhara et al. Reducing memory interference in multicore systems via application-aware memory channel partitioning. In MICRO, 2011.
  • 39. O. Mutlu et al. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In HPCA, 2003.
  • 40. O. Mutlu and T. Moscibroda. Stall-time fair memory access scheduling for chip multiprocessors. In MICRO, 2007.
  • 41. O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In ISCA, 2008.
  • 43. K. J. Nesbit et al. Fair queuing memory systems. In MICRO, 2006.
  • 44. J.-H. Oh. Semiconductor memory having a bank with sub-banks. U.S. patent number 7782703, 2010.
  • 45. M. K. Qureshi et al. A case for MLP-aware cache replacement. In ISCA, 2006.
  • 47. S. Rixner et al. Memory access scheduling. In ISCA, 2000.
  • 48. P. Rosenfeld et al. DRAMSim2: A cycle accurate memory system simulator. IEEE CAL, Jan. 2011.
  • 50. Y. Sato et al. Fast Cycle RAM (FCRAM): A 20-ns random row access, pipelined operating DRAM. In Symposium on VLSI Circuits, 1998.
  • 51. B. Sinharoy et al. IBM POWER7 multicore server processor. IBM Journal Res. Dev., May 2011.
  • 52. B. J. Smith. A pipelined shared resource MIMD computer. In ICPP, 1978.
  • 53. A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS, 2000.
  • 54. STREAM Benchmark. http://www.streambench.org/.
  • 55. J. Stuecheli et al. The virtual write queue: Coordinating DRAM and last-level cache policies. In ISCA, 2010.
  • 56. K. Sudan et al. Micro-pages: Increasing DRAM efficiency with locality-aware data placement. In ASPLOS, 2010.
  • 58. J. E. Thornton. Parallel operation in the Control Data 6600. In AFIPS, 1965.
  • 59. S. Thoziyoor et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. In ISCA, 2008.
  • 60. R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal Res. Dev., Jan. 1967.
  • 61. TPC. http://www.tpc.org/.
  • 62. A. N. Udipi et al. Rethinking DRAM design and organization for energy-constrained multi-cores. In ISCA, 2010.
  • 63. T. Vogelsang. Understanding the energy consumption of dynamic random access memories. In MICRO, 2010.
  • 64. F. Ware and C. Hampel. Improving power and data efficiency with threaded memory modules. In ICCD, 2006.
  • 66. T. Yamauchi et al. The hierarchical multi-bank DRAM: A high-performance architecture for memory integrated with processors. In Advanced Research in VLSI, 1997.
  • 67. G. L. Yuan et al. Complexity effective memory access scheduling for many-core accelerator architectures. In MICRO, 2009.
  • 68. Z. Zhang et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality. In MICRO, 2000.
  • 69. Z. Zhang et al. Cached DRAM for ILP processor memory access latency reduction. IEEE Micro, Jul. 2001.
  • 70. H. Zheng et al. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency. In MICRO, 2008.
  • 71. W. K. Zuravleff and T. Robinson. Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. U.S. patent number 5630096, 1997.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.