2015, Pages 283-295

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules

Author keywords

[No Author keywords available]

Indexed keywords

ACCELERATION; BANDWIDTH; DATA TRANSFER; ELECTRONICS PACKAGING; ENERGY UTILIZATION; INTEGRATED CIRCUIT INTERCONNECTS; MEMORY ARCHITECTURE; THREE DIMENSIONAL INTEGRATED CIRCUITS;

EID: 84934280905     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/HPCA.2015.7056040     Document Type: Conference Paper
Times cited : (271)

References (73)
  • 1
    • 80054875176
    • GPUs and the future of parallel computing
    • Sep.
    • S. W. Keckler et al., "GPUs and the Future of Parallel Computing," IEEE Micro, vol. 31, no. 5, pp. 7-17, Sep. 2011.
    • (2011) IEEE Micro , vol.31 , Issue.5 , pp. 7-17
    • Keckler, S.W.1
  • 2
    • 84890064931
    • Role of interconnects in the future of computing
    • Dec.
    • S. Borkar, "Role of Interconnects in the Future of Computing," J. Light. Technol., vol. 31, no. 24, pp. 3927-3933, Dec. 2013.
    • (2013) J. Light. Technol. , vol.31 , Issue.24 , pp. 3927-3933
    • Borkar, S.1
  • 3
    • 84995478726
    • FBRAM: A new form of memory optimized for 3D graphics
    • M. F. Deering et al., "FBRAM: a new form of memory optimized for 3D graphics," in Computer Graphics and Interactive Techniques, 1994, pp. 167-174.
    • (1994) Computer Graphics and Interactive Techniques , pp. 167-174
    • Deering, M.F.1
  • 4
    • 0036374270
    • The architecture of the DIVA processing-in-memory chip
    • J. Draper et al., "The architecture of the DIVA processing-in-memory chip," in Intl. Conf. on Supercomputing, 2002, pp. 14-25.
    • (2002) Intl. Conf. on Supercomputing , pp. 14-25
    • Draper, J.1
  • 5
    • 11644259287
    • Computational RAM: A memory-SIMD hybrid and its application to DSP
    • D. G. Elliott et al., "Computational RAM: A memory-SIMD hybrid and its application to DSP," in IEEE Custom Integrated Circuits Conference, 1992, pp. 30.6.1-30.6.4.
    • (1992) IEEE Custom Integrated Circuits Conference , pp. 30.6.1-30.6.4
    • Elliott, D.G.1
  • 7
    • 0033688597
    • Smart Memories: A modular reconfigurable architecture
    • K. Mai et al., "Smart Memories: a modular reconfigurable architecture," in Intl. Symp. on Computer Architecture, 2000, pp. 161-171.
    • (2000) Intl. Symp. on Computer Architecture , pp. 161-171
    • Mai, K.1
  • 8
    • 0031594009
    • Active Pages: A computation model for intelligent memory
    • M. Oskin et al., "Active Pages: A computation model for intelligent memory," in Intl. Symp. on Computer Architecture, 1998, pp. 192-203.
    • (1998) Intl. Symp. on Computer Architecture , pp. 192-203
    • Oskin, M.1
  • 9
    • 0031096193
    • A case for intelligent RAM
    • D. Patterson et al., "A case for intelligent RAM," IEEE Micro, vol. 17, no. 2, pp. 34-44, 1997.
    • (1997) IEEE Micro , vol.17 , Issue.2 , pp. 34-44
    • Patterson, D.1
  • 10
    • 84945924942
    • A processing in memory taxonomy and a case for studying fixed-function PIM
    • G. H. Loh et al., "A processing in memory taxonomy and a case for studying fixed-function PIM," in Workshop on Near-Data Processing, 2013.
    • (2013) Workshop on Near-Data Processing
    • Loh, G.H.1
  • 11
    • 0031383426
    • Intelligent RAM (IRAM): The industrial setting, applications, and architectures
    • D. Patterson et al., "Intelligent RAM (IRAM): the industrial setting, applications, and architectures," in Intl. Conf. on Computer Design, 1997, pp. 2-7.
    • (1997) Intl. Conf. on Computer Design , pp. 2-7
    • Patterson, D.1
  • 12
    • 84934332496
    • High-level programming model abstractions for processing in memory
    • M. Chu et al., "High-level programming model abstractions for processing in memory," in Workshop on Near-Data Processing, 2013.
    • (2013) Workshop on Near-Data Processing
    • Chu, M.1
  • 13
    • 84876043213
    • Centip3De: A 64-Core, 3D Stacked Near-Threshold System
    • Mar.
    • R. G. Dreslinski et al., "Centip3De: A 64-Core, 3D Stacked Near-Threshold System," IEEE Micro, vol. 33, no. 2, pp. 8-16, Mar. 2013.
    • (2013) IEEE Micro , vol.33 , Issue.2 , pp. 8-16
    • Dreslinski, R.G.1
  • 15
    • 84860655377
    • 3D-MAPS: 3D massively parallel processor with stacked memory
    • D. H. Kim et al., "3D-MAPS: 3D massively parallel processor with stacked memory," in IEEE Intl. Solid-State Circuits Conference, 2012, pp. 188-190.
    • (2012) IEEE Intl. Solid-State Circuits Conference , pp. 188-190
    • Kim, D.H.1
  • 16
    • 53749097461
    • POD: A 3D-Integrated Broad-Purpose Acceleration Layer
    • Jul.
    • D. H. Woo et al., "POD: A 3D-Integrated Broad-Purpose Acceleration Layer," IEEE Micro, vol. 28, no. 4, pp. 28-40, Jul. 2008.
    • (2008) IEEE Micro , vol.28 , Issue.4 , pp. 28-40
    • Woo, D.H.1
  • 18
    • 84893898462
    • A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing
    • Q. Zhu et al., "A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing," in IEEE Intl. 3D Systems Integration Conf., 2013, pp. 1-7.
    • (2013) IEEE Intl. 3D Systems Integration Conf. , pp. 1-7
    • Zhu, Q.1
  • 20
    • 84919356481
    • DRAMA: An architecture for accelerated processing near memory
    • A. Farmahini-Farahani et al., "DRAMA: An Architecture for Accelerated Processing near Memory," IEEE Comput. Archit. Lett., 2014.
    • (2014) IEEE Comput. Archit. Lett.
    • Farmahini-Farahani, A.1
  • 21
    • 33645656262
    • A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques
    • Apr.
    • C. Park et al., "A 512-Mb DDR3 SDRAM Prototype With CIO Minimization and Self-Calibration Techniques," IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
    • (2006) IEEE J. Solid-State Circuits , vol.41 , Issue.4 , pp. 831-838
    • Park, C.1
  • 22
    • 84881125193
    • Reducing memory access latency with asymmetric DRAM bank organizations
    • Y. H. Son et al., "Reducing memory access latency with asymmetric DRAM bank organizations," in Intl. Symp. on Computer Architecture, 2013, vol. 41, no. 3, pp. 380-391.
    • (2013) Intl. Symp. on Computer Architecture , vol.41 , Issue.3 , pp. 380-391
    • Son, Y.H.1
  • 23
    • 84869168810
    • DySER: Unifying functionality and parallelism specialization for energy-efficient computing
    • Sep.
    • V. Govindaraju et al., "DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing," IEEE Micro, vol. 32, no. 5, pp. 38-51, Sep. 2012.
    • (2012) IEEE Micro , vol.32 , Issue.5 , pp. 38-51
    • Govindaraju, V.1
  • 24
    • 84883088830
    • A general constraint-centric scheduling framework for spatial architectures
    • T. Nowatzki et al., "A general constraint-centric scheduling framework for spatial architectures," in Conf. on Programming Language Design and Implementation, 2013, pp. 495-506.
    • (2013) Conf. on Programming Language Design and Implementation , pp. 495-506
    • Nowatzki, T.1
  • 25
    • 0000227930
    • Reconfigurable computing: A survey of systems and software
    • Jun.
    • K. Compton and S. Hauck, "Reconfigurable computing: a survey of systems and software," ACM Comput. Surv., vol. 34, no. 2, pp. 171-210, Jun. 2002.
    • (2002) ACM Comput. Surv. , vol.34 , Issue.2 , pp. 171-210
    • Compton, K.1    Hauck, S.2
  • 28
    • 79955890625
    • Dynamically specialized datapaths for energy efficient computing
    • V. Govindaraju et al., "Dynamically specialized datapaths for energy efficient computing," in High Performance Computer Architecture, 2011, pp. 503-514.
    • (2011) High Performance Computer Architecture , pp. 503-514
    • Govindaraju, V.1
  • 29
    • 84905455447
    • Single-graph multiple flows: Energy efficient design alternative for GPGPUs
    • D. Voitsechov and Y. Etsion, "Single-graph multiple flows: Energy efficient design alternative for GPGPUs," in Intl. Symp. on Computer Architecture, 2014, pp. 205-216.
    • (2014) Intl. Symp. on Computer Architecture , pp. 205-216
    • Voitsechov, D.1    Etsion, Y.2
  • 30
    • 84881163269
    • Triggered instructions: A control paradigm for spatially-programmed architectures
    • A. Parashar et al., "Triggered instructions: a control paradigm for spatially-programmed architectures," in Intl. Symp. on Computer Architecture, 2013, pp. 142-153.
    • (2013) Intl. Symp. on Computer Architecture , pp. 142-153
    • Parashar, A.1
  • 33
    • 84874086986
    • ULP-SRP: Ultra low power Samsung Reconfigurable Processor for biomedical applications
    • C. Kim et al., "ULP-SRP: Ultra low power Samsung Reconfigurable Processor for biomedical applications," in Intl. Conf. on Field-Programmable Technology, 2012, pp. 329-334.
    • (2012) Intl. Conf. on Field-Programmable Technology , pp. 329-334
    • Kim, C.1
  • 34
    • 84894213024
    • A scalable GPU architecture based on dynamically reconfigurable embedded processor
    • W.-J. Lee et al., "A scalable GPU architecture based on dynamically reconfigurable embedded processor," in High Performance Graphics, Posters, 2011.
    • (2011) High Performance Graphics, Posters
    • Lee, W.-J.1
  • 36
    • 84655163339
    • A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM with 4x128 I/Os Using TSV Based Stacking
    • Jan.
    • J.-S. Kim et al., "A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4x128 I/Os Using TSV Based Stacking," IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 107-116, Jan. 2012.
    • (2012) IEEE J. Solid-State Circuits , vol.47 , Issue.1 , pp. 107-116
    • Kim, J.-S.1
  • 37
    • 79953177459
    • 1-Tbyte/s 1-Gbit DRAM Architecture Using 3-D Interconnect for High-Throughput Computing
    • Apr.
    • T. Sekiguchi et al., "1-Tbyte/s 1-Gbit DRAM Architecture Using 3-D Interconnect for High-Throughput Computing," IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 828-837, Apr. 2011.
    • (2011) IEEE J. Solid-State Circuits , vol.46 , Issue.4 , pp. 828-837
    • Sekiguchi, T.1
  • 39
    • 84860674864
    • A 1.2V 23nm 6F2 4Gb DDR3 SDRAM with local-bitline sense amplifier, hybrid LIO sense amplifier and dummy-less array architecture
    • K.-N. Lim et al., "A 1.2V 23nm 6F2 4Gb DDR3 SDRAM with local-bitline sense amplifier, hybrid LIO sense amplifier and dummy-less array architecture," in IEEE Intl. Solid-State Circuits Conference, 2012, pp. 42-44.
    • (2012) IEEE Intl. Solid-State Circuits Conference , pp. 42-44
    • Lim, K.-N.1
  • 40
    • 84860652243
    • A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme
    • K. Sohn et al., "A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme," in IEEE Intl. Solid-State Circuits Conference, 2012, pp. 38-40.
    • (2012) IEEE Intl. Solid-State Circuits Conference , pp. 38-40
    • Sohn, K.1
  • 41
    • 84892554902
    • Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device
    • M. Shevgoor et al., "Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device," in Intl. Symp. on Microarchitecture, 2013, pp. 198-209.
    • (2013) Intl. Symp. on Microarchitecture , pp. 198-209
    • Shevgoor, M.1
  • 43
    • 84881138473
    • Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems
    • J. Mukundan et al., "Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems," in Intl. Symp. on Computer Architecture, 2013, vol. 41, no. 3, pp. 48-59.
    • (2013) Intl. Symp. on Computer Architecture , vol.41 , Issue.3 , pp. 48-59
    • Mukundan, J.1
  • 44
    • 84881179047
    • Efficient virtual memory for big memory servers
    • A. Basu et al., "Efficient virtual memory for big memory servers," in Intl. Symp. on Computer Architecture, 2013, pp. 237-248.
    • (2013) Intl. Symp. on Computer Architecture , pp. 237-248
    • Basu, A.1
  • 45
    • 79951706157
    • Flexible and efficient instruction-grained run-time monitoring using on-chip reconfigurable fabric
    • D. Y. Deng et al., "Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric," in Intl. Symp. on Microarchitecture, 2010, pp. 137-148.
    • (2010) Intl. Symp. on Microarchitecture , pp. 137-148
    • Deng, D.Y.1
  • 46
    • 84921266818
    • Comparing different implementations of near data computing with in-memory mapreduce workloads
    • S. Pugsley et al., "Comparing Different Implementations of Near Data Computing with In-Memory MapReduce Workloads," IEEE Micro, vol. 34, no. 4, pp. 44-52, 2014.
    • (2014) IEEE Micro , vol.34 , Issue.4 , pp. 44-52
    • Pugsley, S.1
  • 47
    • 62749127314
    • MapReduce for data intensive scientific analyses
    • J. Ekanayake et al., "MapReduce for Data Intensive Scientific Analyses," in IEEE Intl. Conf. on eScience, 2008, pp. 277-284.
    • (2008) IEEE Intl. Conf. on EScience , pp. 277-284
    • Ekanayake, J.1
  • 50
    • 84934332501
    • "CORAL benchmark codes," 2014. [Online]. Available: https://asc.llnl.gov/CORAL-benchmarks/.
    • (2014) CORAL Benchmark Codes
  • 51
    • 0029179077
    • The SPLASH-2 programs: Characterization and methodological considerations
    • S. C. Woo et al., "The SPLASH-2 programs: characterization and methodological considerations," in Intl. Symp. on Computer Architecture, 1995, pp. 24-36.
    • (1995) Intl. Symp. on Computer Architecture , pp. 24-36
    • Woo, S.C.1
  • 52
    • 70649092154
    • Rodinia: A benchmark suite for heterogeneous computing
    • S. Che et al., "Rodinia: A benchmark suite for heterogeneous computing," in Intl. Symp. on Workload Characterization, 2009, pp. 44-54.
    • (2009) Intl. Symp. on Workload Characterization , pp. 44-54
    • Che, S.1
  • 53
    • 84878608239
    • The McPAT framework for multicore and manycore architectures
    • Apr.
    • S. Li et al., "The McPAT Framework for Multicore and Manycore Architectures," ACM Trans. Archit. Code Optim., vol. 10, no. 1, pp. 1-29, Apr. 2013.
    • (2013) ACM Trans. Archit. Code Optim. , vol.10 , Issue.1 , pp. 1-29
    • Li, S.1
  • 54
    • 35248884474
    • ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix
    • B. Mei et al., "ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix," in Field Programmable Logic and Application, 2003, pp. 61-70.
    • (2003) Field Programmable Logic and Application , pp. 61-70
    • Mei, B.1
  • 56
    • 79951712762
    • ReMAP: A reconfigurable heterogeneous multicore architecture
    • M. A. Watkins and D. H. Albonesi, "ReMAP: A Reconfigurable Heterogeneous Multicore Architecture," in Intl. Symp. on Microarchitecture, 2010, pp. 497-508.
    • (2010) Intl. Symp. on Microarchitecture , pp. 497-508
    • Watkins, M.A.1    Albonesi, D.H.2
  • 57
    • 84859464490
    • The gem5 simulator
    • Aug.
    • N. Binkert et al., "The gem5 simulator," SIGARCH Comput. Arch. News, vol. 39, no. 2, pp. 1-7, Aug. 2011.
    • (2011) SIGARCH Comput. Arch. News , vol.39 , Issue.2 , pp. 1-7
    • Binkert, N.1
  • 59
    • 77954995378
    • Understanding sources of inefficiency in general-purpose chips
    • R. Hameed et al., "Understanding sources of inefficiency in general-purpose chips," in Intl. Symp. on Computer Architecture, 2010, pp. 37-47.
    • (2010) Intl. Symp. on Computer Architecture , pp. 37-47
    • Hameed, R.1
  • 60
    • 79955908630
    • Efficient data streaming with on-chip accelerators: Opportunities and challenges
    • R. Hou et al., "Efficient data streaming with on-chip accelerators: Opportunities and challenges," in High Performance Computer Architecture, 2011, pp. 312-320.
    • (2011) High Performance Computer Architecture , pp. 312-320
    • Hou, R.1
  • 61
    • 84904470871
    • Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems
    • A. Farmahini-Farahani et al., "Energy-Efficient Reconfigurable Cache Architectures for Accelerator-Enabled Embedded Systems," in Intl. Symp. on Performance Analysis of Systems and Software, 2014, pp. 211-220.
    • (2014) Intl. Symp. on Performance Analysis of Systems and Software , pp. 211-220
    • Farmahini-Farahani, A.1
  • 62
    • 84893641728
    • A decade of reconfigurable computing: A visionary retrospective
    • R. Hartenstein, "A decade of reconfigurable computing: a visionary retrospective," in Design, Automation and Test in Europe, 2001, pp. 642-649.
    • (2001) Design, Automation and Test in Europe , pp. 642-649
    • Hartenstein, R.1
  • 63
    • 0033703884
    • CHIMAERA: A high-performance architecture with a tightly-coupled reconfigurable functional unit
    • Z. A. Ye et al., "CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit," in Intl. Symp. on Computer Architecture, 2000, pp. 225-235.
    • (2000) Intl. Symp. on Computer Architecture , pp. 225-235
    • Ye, Z.A.1
  • 65
    • 0034174174
    • The Garp architecture and C compiler
    • T. Callahan et al., "The Garp architecture and C compiler," Computer, vol. 33, no. 4, pp. 62-69, 2000.
    • (2000) Computer , vol.33 , Issue.4 , pp. 62-69
    • Callahan, T.1
  • 66
    • 84858781934
    • A resistive TCAM accelerator for data-intensive computing
    • Q. Guo et al., "A resistive TCAM accelerator for data-intensive computing," in Intl. Symp. on Microarchitecture, 2011, pp. 339-350.
    • (2011) Intl. Symp. on Microarchitecture , pp. 339-350
    • Guo, Q.1
  • 67
    • 84881119037
    • AC-DIMM: Associative computing with STT-MRAM
    • Q. Guo et al., "AC-DIMM: associative computing with STT-MRAM," in Intl. Symp. on Computer Architecture, 2013, vol. 41, no. 3, pp. 189-200.
    • (2013) Intl. Symp. on Computer Architecture , vol.41 , Issue.3 , pp. 189-200
    • Guo, Q.1
  • 69
    • 84934332502
    • A near-memory processor for vector, streaming and bit manipulation workloads
    • M. Wei, M. Snir, J. Torrellas, and R. B. Tremaine, "A near-memory processor for vector, streaming and bit manipulation workloads," in UIUC Tech. Report, 2005.
    • (2005) UIUC Tech. Report
    • Wei, M.1    Snir, M.2    Torrellas, J.3    Tremaine, R.B.4
  • 70
    • 84865647673
    • Active memory controller
    • Jan.
    • Z. Fang et al., "Active memory controller," J. Supercomput., vol. 62, no. 1, pp. 510-549, Jan. 2012.
    • (2012) J. Supercomput. , vol.62 , Issue.1 , pp. 510-549
    • Fang, Z.1
  • 71
    • 84876588873
    • Hybrid memory cube (HMC)
    • J. T. Pawlowski, "Hybrid memory cube (HMC)," in Hot Chips 23, 2011.
    • (2011) Hot Chips , vol.23
    • Pawlowski, J.T.1
  • 72
    • 52649125840
    • 3D-stacked memory architectures for multi-core processors
    • G. H. Loh, "3D-Stacked Memory Architectures for Multi-core Processors," in Intl. Symp. on Computer Architecture, 2008, pp. 453-464.
    • (2008) Intl. Symp. on Computer Architecture , pp. 453-464
    • Loh, G.H.1
  • 73
    • 84905460430
    • Row-buffer decoupling: A case for low-latency DRAM microarchitecture
    • S. O et al., "Row-Buffer Decoupling: A Case for Low-Latency DRAM Microarchitecture," in Intl. Symp. on Computer Architecture, 2014.
    • (2014) Intl. Symp. on Computer Architecture
    • O, S.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.