



Volume 2016-April, 2016, Pages 14-26

TABLA: A unified template-based framework for accelerating statistical machine learning

Author keywords

[No Author keywords available]

Indexed keywords

ACCELERATION; ALGORITHMS; ARM PROCESSORS; ARTIFICIAL INTELLIGENCE; COMPUTER ARCHITECTURE; COMPUTER HARDWARE; COMPUTER PROGRAMMING LANGUAGES; FIELD PROGRAMMABLE GATE ARRAYS (FPGA); GENERAL PURPOSE COMPUTERS; HARDWARE; HIGH LEVEL LANGUAGES; INTEGRATED CIRCUIT DESIGN; LEARNING SYSTEMS; LOGIC SYNTHESIS; OPTIMIZATION; PROGRAM PROCESSORS; RECONFIGURABLE HARDWARE; SUPERCOMPUTERS;

EID: 84965008656     PISSN: 15300897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/HPCA.2016.7446050     Document Type: Conference Paper
Times cited : (163)

References (58)
  • 5
    • 84860270793
    • CPU DB: Recording microprocessor history
    • April
    • Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, and Mark Horowitz. CPU DB: Recording microprocessor history. ACM Queue, 10(4):10:10-10:27, April 2012.
    • (2012) ACM Queue , vol.10 , Issue.4 , pp. 10:10-10:27
    • Danowitz, A.1    Kelley, K.2    Mao, J.3    Stevenson, J.P.4    Horowitz, M.5
  • 8
    • 79955890625
    • Dynamically specialized datapaths for energy efficient computing
    • Venkatraman Govindaraju, Chen-Han Ho, and Karthikeyan Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.
    • (2011) HPCA
    • Govindaraju, V.1    Ho, C.-H.2    Sankaralingam, K.3
  • 9
    • 84858776502
    • QsCores: Trading dark silicon for scalable energy efficiency with quasi-specific cores
    • Ganesh Venkatesh, John Sampson, Nathan Goulding, Sravanthi Kota Venkata, Steven Swanson, and Michael Taylor. QsCores: Trading dark silicon for scalable energy efficiency with quasi-specific cores. In MICRO, 2011.
    • (2011) MICRO
    • Venkatesh, G.1    Sampson, J.2    Goulding, N.3    Venkata, S.K.4    Swanson, S.5    Taylor, M.6
  • 10
    • 84863374615
    • Bundled execution of recurring traces for energy-efficient general purpose processing
    • Shantanu Gupta, Shuguang Feng, Amin Ansari, Scott Mahlke, and David August. Bundled execution of recurring traces for energy-efficient general purpose processing. In MICRO, 2011.
    • (2011) MICRO
    • Gupta, S.1    Feng, S.2    Ansari, A.3    Mahlke, S.4    August, D.5
  • 22
    • 0010615068
    • Neurocomputing using the MasPar MP-1
    • K. W. Przytula and V. K. Prasanna, editors, chapter 2, Prentice-Hall
    • Kamil A. Grajski. Neurocomputing using the MasPar MP-1. In K. W. Przytula and V. K. Prasanna, editors, Parallel Digital Implementations of Neural Networks, chapter 2, pages 51-76. Prentice-Hall, 1993.
    • (1993) Parallel Digital Implementations of Neural Networks , pp. 51-76
    • Grajski, K.A.1
  • 23
    • 84876591853
    • Neural acceleration for general-purpose approximate programs
    • Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Neural acceleration for general-purpose approximate programs. In MICRO, 2012.
    • (2012) MICRO
    • Esmaeilzadeh, H.1    Sampson, A.2    Ceze, L.3    Burger, D.4
  • 24
    • 0032203257
    • Gradient-based learning applied to document recognition
    • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278-2324, 1998.
    • (1998) Proceedings of the IEEE , pp. 2278-2324
    • LeCun, Y.1    Bottou, L.2    Bengio, Y.3    Haffner, P.4
  • 25
    • 84965003162
    • Nvidia. Jetson
    • Nvidia. Jetson. http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html, 2015.
    • (2015)
  • 28
    • 84874092321
    • Model-driven level 3 BLAS performance optimization on loongson 3A processor
    • Zhang Xianyi, Wang Qian, and Zhang Yunquan. Model-driven level 3 BLAS performance optimization on loongson 3A processor. In ICPADS, 2012.
    • (2012) ICPADS
    • Xianyi, Z.1    Qian, W.2    Yunquan, Z.3
  • 29
    • 84863614151
    • Factorization machines with libFM
    • May
    • Steffen Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012.
    • (2012) ACM Trans. Intell. Syst. Technol. , vol.3 , Issue.3 , pp. 57:1-57:22
    • Rendle, S.1
  • 30
    • 79955702502
    • Libsvm: A library for support vector machines
    • May
    • Chih-Chung Chang and Chih-Jen Lin. Libsvm: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1-27:27, May 2011.
    • (2011) ACM Trans. Intell. Syst. Technol. , vol.2 , Issue.3 , pp. 27:1-27:27
    • Chang, C.-C.1    Lin, C.-J.2
  • 35
    • 84885624310
    • Parallel architectures for the knn classifier-design of soft IP cores and FPGA implementations
    • September
    • Ioannis Stamoulias and Elias S. Manolakos. Parallel architectures for the knn classifier-design of soft IP cores and FPGA implementations. ACM Trans. Embed. Comput. Syst., 13(2):22:1-22:21, September 2013.
    • (2013) ACM Trans. Embed. Comput. Syst. , vol.13 , Issue.2 , pp. 22:1-22:21
    • Stamoulias, I.1    Manolakos, E.S.2
  • 36
    • 77955985658
    • IP-cores design for the knn classifier
    • May
    • E.S. Manolakos and I. Stamoulias. IP-cores design for the knn classifier. In ISCAS, May 2010.
    • (2010) ISCAS
    • Manolakos, E.S.1    Stamoulias, I.2
  • 38
    • 34047242620
    • Real-time K-Means clustering for color images on reconfigurable hardware
    • Tsutomu Maruyama. Real-time K-Means clustering for color images on reconfigurable hardware. In ICPR, pages 816-819, 2006.
    • (2006) ICPR , pp. 816-819
    • Maruyama, T.1
  • 40
    • 77954269943
    • A heterogeneous FPGA architecture for support vector machine training
    • May
    • M. Papadonikolakis and C. Bouganis. A heterogeneous FPGA architecture for support vector machine training. In FCCM, May 2010.
    • (2010) FCCM
    • Papadonikolakis, M.1    Bouganis, C.2
  • 42
    • 79953123438
    • An energy-efficient heterogeneous system for embedded learning and classification
    • March
    • A. Majumdar, S. Cadambi, and S.T. Chakradhar. An energy-efficient heterogeneous system for embedded learning and classification. Embedded Systems Letters, IEEE, 3(1):42-45, March 2011.
    • (2011) Embedded Systems Letters IEEE , vol.3 , Issue.1 , pp. 42-45
    • Majumdar, A.1    Cadambi, S.2    Chakradhar, S.T.3
  • 43
    • 84859452113
    • A massively parallel, energy efficient programmable accelerator for learning and classification
    • March
    • Abhinandan Majumdar, Srihari Cadambi, Michela Becchi, Srimat T. Chakradhar, and Hans Peter Graf. A massively parallel, energy efficient programmable accelerator for learning and classification. ACM Trans. Archit. Code Optim., 9(1):6:1-6:30, March 2012.
    • (2012) ACM Trans. Archit. Code Optim. , vol.9 , Issue.1 , pp. 6:1-6:30
    • Majumdar, A.1    Cadambi, S.2    Becchi, M.3    Chakradhar, S.T.4    Graf, H.P.5
  • 45
    • 84897780584
    • DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning
    • Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In ASPLOS, 2014.
    • (2014) ASPLOS
    • Chen, T.1    Du, Z.2    Sun, N.3    Wang, J.4    Wu, C.5    Chen, Y.6    Temam, O.7
  • 48
    • 79961187689
    • A hardware acceleration technique for gradient descent and conjugate gradient
    • June
    • D. Kesler, B. Deka, and R. Kumar. A hardware acceleration technique for gradient descent and conjugate gradient. In SASP, June 2011.
    • (2011) SASP
    • Kesler, D.1    Deka, B.2    Kumar, R.3
  • 49
    • 79953230605
    • A high throughput FPGA-based floating point conjugate gradient implementation for dense matrices
    • January
    • Antonio Roldao and George A. Constantinides. A high throughput FPGA-based floating point conjugate gradient implementation for dense matrices. ACM Trans. Reconfigurable Technol. Syst., 3(1):1:1-1:19, January 2010.
    • (2010) ACM Trans. Reconfigurable Technol. Syst. , vol.3 , Issue.1 , pp. 1:1-1:19
    • Roldao, A.1    Constantinides, G.A.2
  • 50
    • 34147131364
    • A hybrid approach for mapping conjugate gradient onto an fpga-augmented reconfigurable supercomputer
    • April
    • G.R. Morris, V.K. Prasanna, and R.D. Anderson. A hybrid approach for mapping conjugate gradient onto an fpga-augmented reconfigurable supercomputer. In FCCM, April 2006.
    • (2006) FCCM
    • Morris, G.R.1    Prasanna, V.K.2    Anderson, R.D.3
  • 51
    • 60349119698
    • An implementation of the conjugate gradient algorithm on fpgas
    • April
    • D. DuBois, A. DuBois, T. Boorman, C. Connor, and S. Poole. An implementation of the conjugate gradient algorithm on fpgas. In FCCM, April 2008.
    • (2008) FCCM
    • DuBois, D.1    DuBois, A.2    Boorman, T.3    Connor, C.4    Poole, S.5
  • 52
    • 77954069604
    • FPGA implementation of kNN classifier based on wavelet transform and partial distance search
    • Yao-Jung Yeh, Hui-Ya Li, Wen-Jyi Hwang, and Chiung-Yao Fang. FPGA implementation of kNN classifier based on wavelet transform and partial distance search. In SCIA, 2007.
    • (2007) SCIA
    • Yeh, Y.-J.1    Li, H.-Y.2    Hwang, W.-J.3    Fang, C.-Y.4
  • 53
    • 54949115901
    • CHiMPS: A high-level compilation flow for hybrid CPU-FPGA architectures
    • Andrew R. Putnam, Dave Bennett, Eric Dellinger, Jeff Mason, and Prasanna Sundararajan. CHiMPS: A high-level compilation flow for hybrid CPU-FPGA architectures. In FPGA, 2008.
    • (2008) FPGA
    • Putnam, A.R.1    Bennett, D.2    Dellinger, E.3    Mason, J.4    Sundararajan, P.5
  • 55
    • 84912524416
    • A high memory bandwidth fpga accelerator for sparse matrix-vector multiplication
    • IEEE, May
    • Jeremy Fowers, Kalin Ovtcharov, Karin Strauss, Eric Chung, and Greg Stitt. A high memory bandwidth fpga accelerator for sparse matrix-vector multiplication. In FCCM. IEEE, May 2014.
    • (2014) FCCM
    • Fowers, J.1    Ovtcharov, K.2    Strauss, K.3    Chung, E.4    Stitt, G.5
  • 56
    • 79952918458
    • CoRAM: An in-fabric memory architecture for fpga-based computing
    • Eric S. Chung, James C. Hoe, and Ken Mai. CoRAM: An in-fabric memory architecture for fpga-based computing. In FPGA, 2011.
    • (2011) FPGA
    • Chung, E.S.1    Hoe, J.C.2    Mai, K.3
  • 57
    • 84881142714
    • LINQits: Big data on little clients
    • Eric S. Chung, John D. Davis, and Jaewon Lee. LINQits: Big data on little clients. In ISCA, 2013.
    • (2013) ISCA
    • Chung, E.S.1    Davis, J.D.2    Lee, J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.