Volume 29, Issue 2, 2018, Pages 420-434

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

Author keywords

convolutional neural networks; Hybrid memory cube; large scale deep learning; streaming floating point

Indexed keywords

BANDWIDTH; BUDGET CONTROL; COMPUTATION THEORY; CONVOLUTION; COST EFFECTIVENESS; DEEP LEARNING; DIGITAL ARITHMETIC; DYNAMIC RANDOM ACCESS STORAGE; EMBEDDED SYSTEMS; GEOMETRY; INTEGRATED CIRCUIT DESIGN; LEARNING SYSTEMS; NEURAL NETWORKS; RANDOM ACCESS STORAGE; STANDARDS; STORAGE ALLOCATION (COMPUTER);

EID: 85030640525     PISSN: 10459219     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPDS.2017.2752706     Document Type: Article
Times cited: 80

References (61)
  • 4
    • 85034742569
    • A taxonomy of deep convolutional neural nets for computer vision
    • S. Srinivas, et al., "A taxonomy of deep convolutional neural nets for computer vision," Frontiers Robot. AI, vol. 2, 2016, Art. no. 36.
    • (2016) Frontiers Robot. AI , vol.2
    • Srinivas, S.1
  • 5
    • 84933585162
    • Very deep convolutional networks for large-scale image recognition
    • K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
    • (2014) CoRR
    • Simonyan, K.1    Zisserman, A.2
  • 6
    • 84892582758
    • Combining modality specific deep neural networks for emotion recognition in video
    • S. E. Kahou, et al., "Combining modality specific deep neural networks for emotion recognition in video," in Proc. 15th ACM Int. Conf. Multimodal Interaction, 2013, pp. 543-550.
    • (2013) Proc. 15th ACM Int. Conf. Multimodal Interaction , pp. 543-550
    • Kahou, S.E.1
  • 8
    • 85031010255
    • Fathom: Reference workloads for modern deep learning methods
    • R. Adolf, S. Rama, B. Reagen, G.-Y. Wei, D. Brooks, "Fathom: Reference workloads for modern deep learning methods," CoRR, vol. abs/1608.06581, 2016. [Online]. Available: http://arxiv.org/abs/1608.06581
    • (2016) CoRR
    • Adolf, R.1    Rama, S.2    Reagen, B.3    Wei, G.-Y.4    Brooks, D.5
  • 9
    • 84887917957
    • Scalable hierarchical network-on-chip architecture for spiking neural network hardware implementations
    • S. Carrillo, et al., "Scalable hierarchical network-on-chip architecture for spiking neural network hardware implementations," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 12, pp. 2451-2461, Dec. 2013.
    • (2013) IEEE Trans. Parallel Distrib. Syst. , vol.24 , Issue.12 , pp. 2451-2461
    • Carrillo, S.1
  • 10
    • 84939220433
    • Parallel architectures for learning the RTRN and Elman dynamic neural networks
    • J. Bilski and J. Smolag, "Parallel architectures for learning the RTRN and Elman dynamic neural networks," IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 9, pp. 2561-2570, Sep. 2015.
    • (2015) IEEE Trans. Parallel Distrib. Syst. , vol.26 , Issue.9 , pp. 2561-2570
    • Bilski, J.1    Smolag, J.2
  • 11
    • 84913580146
    • Caffe: Convolutional architecture for fast feature embedding
    • Y. Jia, et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675-678.
    • (2014) Proc. 22nd ACM Int. Conf. Multimedia , pp. 675-678
    • Jia, Y.1
  • 15
    • 85001132445
    • Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks
    • C. Zhang, Z. Fang, P. Zhou, P. Pan, J. Cong, "Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2016, pp. 1-8.
    • (2016) Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. , pp. 1-8
    • Zhang, C.1    Fang, Z.2    Zhou, P.3    Pan, P.4    Cong, J.5
  • 16
    • 84988443578
    • EIE: Efficient inference engine on compressed deep neural network
    • S. Han, et al., "EIE: Efficient inference engine on compressed deep neural network," in Proc. 43rd Annu. Int. Symp. Comput. Archit., 2016, pp. 243-254.
    • (2016) Proc. 43rd Annu. Int. Symp. Comput. Archit. , pp. 243-254
    • Han, S.1
  • 17
    • 84933037461
    • A high-throughput neural network accelerator
    • T. Chen, et al., "A high-throughput neural network accelerator," IEEE Micro, vol. 35, no. 3, pp. 24-32, May 2015.
    • (2015) IEEE Micro , vol.35 , Issue.3 , pp. 24-32
    • Chen, T.1
  • 18
    • 85034951063
    • A 803 GOp/s/W convolutional network accelerator
    • L. Cavigelli and L. Benini, "A 803 GOp/s/W convolutional network accelerator," IEEE Trans. Circuits Syst. Video Technol., vol. PP, no. 99, p. 1, 2016, doi: 10.1109/TCSVT.2016.2592330.
    • (2016) IEEE Trans. Circuits Syst. Video Technol. , Issue.99 , pp. 1
    • Cavigelli, L.1    Benini, L.2
  • 19
    • 84994841295
    • ShiDianNao: Shifting vision processing closer to the sensor
    • Z. Du, et al., "ShiDianNao: Shifting vision processing closer to the sensor," SIGARCH Comput. Archit. News, vol. 43, no. 3, pp. 92-104, Jun. 2015.
    • (2015) SIGARCH Comput. Archit. News , vol.43 , Issue.3 , pp. 92-104
    • Du, Z.1
  • 20
    • 85050556811
    • Scaling deep learning on multiple in-memory processors
    • L. Xu, D. P. Zhang, N. Jayasena, "Scaling deep learning on multiple in-memory processors," in 3rd Workshop Near-Data Process., 2015, http://www.cs.utah.edu/wondp/tentative.html
    • (2015) 3rd Workshop Near-Data Process.
    • Xu, L.1    Zhang, D.P.2    Jayasena, N.3
  • 22
    • 84988345727
    • PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
    • P. Chi, et al., "PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory," in Proc. 43rd Annu. Int. Symp. Comput. Architecture, 2016, pp. 27-39.
    • (2016) Proc. 43rd Annu. Int. Symp. Comput. Architecture , pp. 27-39
    • Chi, P.1
  • 23
    • 85040685601
    • TETRIS: Scalable and efficient neural network acceleration with 3D memory
    • M. Gao, J. Pu, X. Yang, M. Horowitz, C. Kozyrakis, "TETRIS: Scalable and efficient neural network acceleration with 3D memory," SIGARCH Comput. Archit. News, vol. 45, no. 1, pp. 751-764, Apr. 2017.
    • (2017) SIGARCH Comput. Archit. News , vol.45 , Issue.1 , pp. 751-764
    • Gao, M.1    Pu, J.2    Yang, X.3    Horowitz, M.4    Kozyrakis, C.5
  • 25
  • 26
    • 85040696322
    • Hybrid Memory Cube Specification 2.1
    • Hybrid Memory Cube Specification 2.1, Hybrid Memory Cube Consortium Std., 2015, http://hybridmemorycube.org/files/SiteDownloads/HMC-30GVSR-HMCC-Specification-Rev2.1-20151105.pdf
    • (2015) Hybrid Memory Cube Consortium Std.
  • 27
    • 84962791896
    • 256Gb 3b/cell V-NAND flash memory with 48 stacked WL layers
    • D. Kang, et al., "256Gb 3b/cell V-NAND flash memory with 48 stacked WL layers," in Proc. IEEE Int. Solid-State Circuits Conf., Jan. 2016, pp. 130-131.
    • (2016) Proc. IEEE Int. Solid-State Circuits Conf. , pp. 130-131
    • Kang, D.1
  • 28
    • 65949107549
    • Roofline: An insightful visual performance model for multicore architectures
    • S. Williams, A. Waterman, D. Patterson, "Roofline: An insightful visual performance model for multicore architectures," Commun. ACM, vol. 52, no. 4, pp. 65-76, Apr. 2009.
    • (2009) Commun. ACM , vol.52 , Issue.4 , pp. 65-76
    • Williams, S.1    Waterman, A.2    Patterson, D.3
  • 29
    • 84944735469
    • I. Goodfellow, Y. Bengio, A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
    • (2016) Deep Learning
    • Goodfellow, I.1    Bengio, Y.2    Courville, A.3
  • 31
    • 85013813121
    • Deep residual learning for image recognition
    • K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015, http://arxiv.org/abs/1512.03385
    • (2015) CoRR
    • He, K.1    Zhang, X.2    Ren, S.3    Sun, J.4
  • 32
    • 85017428320
    • Identity mappings in deep residual networks
    • K. He, X. Zhang, S. Ren, J. Sun, "Identity mappings in deep residual networks," CoRR, vol. abs/1603.05027, 2016, http://arxiv.org/abs/1603.05027
    • (2016) CoRR
    • He, K.1    Zhang, X.2    Ren, S.3    Sun, J.4
  • 37
    • 84962603967
    • In-place matrix transposition on GPUs
    • J. Gómez-Luna, et al., "In-place matrix transposition on GPUs," IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 3, pp. 776-788, Mar. 2016.
    • (2016) IEEE Trans. Parallel Distrib. Syst. , vol.27 , Issue.3 , pp. 776-788
    • Gómez-Luna, J.1
  • 38
    • 84995478886
    • Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
    • Y. H. Chen, T. Krishna, J. S. Emer, V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127-138, Jan. 2017.
    • (2017) IEEE J. Solid-State Circuits , vol.52 , Issue.1 , pp. 127-138
    • Chen, Y.H.1    Krishna, T.2    Emer, J.S.3    Sze, V.4
  • 40
    • 84988335986
    • Cambricon: An instruction set architecture for neural network
    • S. Liu, et al., "Cambricon: An instruction set architecture for neural network," in Proc. 43rd Annu. Int. Symp. Comput. Archit., 2016, pp. 393-405.
    • (2016) Proc. 43rd Annu. Int. Symp. Comput. Archit. , pp. 393-405
    • Liu, S.1
  • 42
    • 84978518772
    • TensorFlow: Large-scale machine learning on heterogeneous distributed systems
    • M. Abadi, et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," CoRR, vol. abs/1603.04467, 2016, http://arxiv.org/abs/1603.04467
    • (2016) CoRR
    • Abadi, M.1
  • 44
    • 84979900694
    • MLlib: Machine learning in Apache Spark
    • X. Meng, et al., "MLlib: Machine learning in Apache Spark," J. Mach. Learn. Res., vol. 17, no. 1, pp. 1235-1241, Jan. 2016.
    • (2016) J. Mach. Learn. Res. , vol.17 , Issue.1 , pp. 1235-1241
    • Meng, X.1
  • 45
    • 84962532980
    • HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications
    • M. Grossman, M. Breternitz, V. Sarkar, "HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications," IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 3, pp. 762-775, Mar. 2016.
    • (2016) IEEE Trans. Parallel Distrib. Syst. , vol.27 , Issue.3 , pp. 762-775
    • Grossman, M.1    Breternitz, M.2    Sarkar, V.3
  • 46
    • 84963787521
    • A survey of software techniques for using non-volatile memories for storage and main memory systems
    • S. Mittal and J. S. Vetter, "A survey of software techniques for using non-volatile memories for storage and main memory systems," IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 5, pp. 1537-1550, May 2016.
    • (2016) IEEE Trans. Parallel Distrib. Syst. , vol.27 , Issue.5 , pp. 1537-1550
    • Mittal, S.1    Vetter, J.S.2
  • 47
    • 85014217863
    • A near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices
    • M. Gautschi, et al., "A near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices," CoRR, vol. abs/1608.08376, 2016. [Online]. Available: http://arxiv.org/abs/1608.08376
    • (2016) CoRR
    • Gautschi, M.1
  • 48
    • 84954026518
    • A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28nm UTBB FD-SOI technology
    • D. Rossi, et al., "A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28nm UTBB FD-SOI technology," Solid-State Electron., vol. 117, pp. 170-184, 2016.
    • (2016) Solid-State Electron. , vol.117 , pp. 170-184
    • Rossi, D.1
  • 53
    • 84937886247
    • Fast image scanning with deep max-pooling convolutional neural networks
    • A. Giusti, D. C. Ciresan, J. Masci, L. M. Gambardella, J. Schmidhuber, "Fast image scanning with deep max-pooling convolutional neural networks," CoRR, vol. abs/1302.1700, 2013. [Online]. Available: http://arxiv.org/abs/1302.1700
    • (2013) CoRR
    • Giusti, A.1    Ciresan, D.C.2    Masci, J.3    Gambardella, L.M.4    Schmidhuber, J.5
  • 54
    • 85040656538
    • Berkeley SoftFloat library. (2017). [Online]. Available: http://www.jhauser.us/arithmetic/SoftFloat.html
    • (2017) Berkeley SoftFloat Library
  • 56
    • 84866544858
    • Hybrid memory cube new DRAM architecture increases density and performance
    • J. Jeddeloh and B. Keeth, "Hybrid memory cube new DRAM architecture increases density and performance," in Proc. Symp. VLSI Technol., Jun. 2012, pp. 87-88.
    • (2012) Proc. Symp. VLSI Technol. , pp. 87-88
    • Jeddeloh, J.1    Keeth, B.2
  • 57
    • 84883288792
    • A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects
    • E. Azarkhish, I. Loi, L. Benini, "A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects," IET Comput. Digital Techn., vol. 7, no. 5, pp. 191-199, Sep. 2013.
    • (2013) IET Comput. Digital Techn. , vol.7 , Issue.5 , pp. 191-199
    • Azarkhish, E.1    Loi, I.2    Benini, L.3
  • 58
    • 84898078721
    • 28Gb/s 560mW multi-standard SerDes with single-stage analog front-end and 14-tap decision-feedback equalizer in 28nm CMOS
    • H. Kimura, et al., "28Gb/s 560mW multi-standard SerDes with single-stage analog front-end and 14-tap decision-feedback equalizer in 28nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. Digest Tech. Papers, Feb. 2014, pp. 38-39.
    • (2014) Proc. IEEE Int. Solid-State Circuits Conf. Digest Tech. Papers , pp. 38-39
    • Kimura, H.1
  • 60
    • 84962878072
    • A 1.2V 20nm 307GB/s HBM DRAM with at-speed wafer-level I/O test scheme and adaptive refresh considering temperature distribution
    • K. Sohn, et al., "A 1.2V 20nm 307GB/s HBM DRAM with at-speed wafer-level I/O test scheme and adaptive refresh considering temperature distribution," in 2016 Proc. IEEE Int. Solid-State Circuits Conf., Jan. 2016, pp. 316-317.
    • (2016) 2016 Proc. IEEE Int. Solid-State Circuits Conf. , pp. 316-317
    • Sohn, K.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.