SCOPUS Information Search Platform

Proceedings of the International Conference on Supercomputing

Volume , Issue , 2008, Pages 309-318

Efficient computation of sum-products on GPUs through software-managed cache

(5) Silberstein, Markᵃ; Schuster, Assafᵃ; Geiger, Danᵃ; Patney, Anjulᵇ; Owens, John D.ᵇ

ᵃ TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

ᵇ UNIVERSITY OF CALIFORNIA (United States)

Author keywords

CUDA; GPGPU; Software managed cache; Sum product

Indexed keywords

ALGORITHMS; ARTIFICIAL INTELLIGENCE; COMPUTER NETWORKS; COMPUTERS; DATA STORAGE EQUIPMENT; DIGITAL COMMUNICATION SYSTEMS; IMAGE PROCESSING; INTELLIGENT CONTROL; PROGRAM PROCESSORS;

ACCESS PATTERNS; ANALYTICAL MODELS; BOUND ALGORITHMS; COMPLEX DATUMS; CUDA; DATA REUSES; DATA SETS; DIGITAL COMMUNICATIONS; EFFICIENT COMPUTATIONS; GENETIC ANALYSES; GENETIC DISEASES; GPGPU; GRAPHICS PROCESSING UNITS; L2 CACHES; MATRIX PRODUCTS; MEMORY ACCESSES; MULTI-DIMENSIONAL; PERFORMANCE ANALYSES; RANDOM DATUMS; SOFTWARE-MANAGED CACHE; SPEED-UP; SUM-PRODUCT;

COMPUTER SOFTWARE REUSABILITY;

EID: 56849102474 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1375527.1375572 Document Type: Conference Paper

Times cited : (76)

References (16)

1
- 56849104962
- A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor
- J. Balart, M. Gonzalez, X. Martorell, E. Ayguade, Z. Sura, T. Chen, T. Zhang, K. O'Brien, and K. O'Brien. A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor. In LCPC'07: Proceedings of the 2007 Workshop on Languages and Compilers for Parallel Computing, 2007.
- (2007) LCPC'07: Proceedings of the 2007 Workshop on Languages and Compilers for Parallel Computing
- Balart, J.¹ Gonzalez, M.² Martorell, X.³ Ayguade, E.⁴ Sura, Z.⁵ Chen, T.⁶ Zhang, T.⁷ O'Brien, K.⁸ O'Brien, K.⁹

2
- 42649120679
- Ray Tracing on the Cell Processor
- Sept
- C. Benthin, I. Wald, M. Scherbaum, and H. Friedrich. Ray Tracing on the Cell Processor. IEEE Symposium on Interactive Ray Tracing 2006, pages 15-23, Sept. 2006.
- (2006) IEEE Symposium on Interactive Ray Tracing 2006 , pp. 15-23
- Benthin, C.¹ Wald, I.² Scherbaum, M.³ Friedrich, H.⁴

3
- 0033683314
- Application-specific memory management for embedded systems using software-controlled caches
- New York, NY, USA, ACM
- D. Chiou, P. Jain, L. Rudolph, and S. Devadas. Application-specific memory management for embedded systems using software-controlled caches. In DAC'00: Proceedings of the 37th Conference on Design Automation, pages 416-419, New York, NY, USA, 2000. ACM.
- (2000) DAC'00: Proceedings of the 37th Conference on Design Automation , pp. 416-419
- Chiou, D.¹ Jain, P.² Rudolph, L.³ Devadas, S.⁴

4
- 33646009337
- Optimizing Compiler for the CELL Processor
- Washington, DC, USA, IEEE Computer Society
- A. E. Eichenberger, K. O'Brien, K. O'Brien, P. Wu, T. Chen, P. H. Oden, D. A. Prener, J. C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, and M. Gschwind. Optimizing Compiler for the CELL Processor. In PACT'05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 161-172, Washington, DC, USA, 2005. IEEE Computer Society.
- (2005) PACT'05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques , pp. 161-172
- Eichenberger, A.E.¹ O'Brien, K.² O'Brien, K.³ Wu, P.⁴ Chen, T.⁵ Oden, P.H.⁶ Prener, D.A.⁷ Shepherd, J.C.⁸ So, B.⁹ Sura, Z.¹⁰ Wang, A.¹¹ Zhang, T.¹² Zhao, P.¹³ Gschwind, M.¹⁴

5
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the memory hierarchy. In SC'06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 83, 2006.
- (2006) SC'06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 83
- Fatahalian, K.¹ Horn, D.R.² Knight, T.J.³ Leem, L.⁴ Houston, M.⁵ Park, J.Y.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

6
- 78651269052
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
- Aug
- K. Fatahalian, J. Sugerman, and P. Hanrahan. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication. In Graphics Hardware 2004, pages 133-138, Aug. 2004.
- (2004) Graphics Hardware 2004 , pp. 133-138
- Fatahalian, K.¹ Sugerman, J.² Hanrahan, P.³

7
- 0038558013
- Exact genetic linkage computations for general pedigrees
- M. Fishelson and D. Geiger. Exact genetic linkage computations for general pedigrees. Bioinformatics, 18(Suppl. 1):S189-S198, 2002.
- (2002) Bioinformatics , vol.18 , Issue.SUPPL. 1
- Fishelson, M.¹ Geiger, D.²

8
- 34548292052
- A memory model for scientific algorithms on graphics processors
- Nov
- N. K. Govindaraju, S. Larsen, J. Gray, and D. Manocha. A memory model for scientific algorithms on graphics processors. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, page 89, Nov. 2006.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing , pp. 89
- Govindaraju, N.K.¹ Larsen, S.² Gray, J.³ Manocha, D.⁴

9
- 46449123181
- IBM Corporation. Cell Broadband Engine Architecture. http://www.ibm.com/techlib/techlib.nsf/techdocs.
- Cell Broadband Engine Architecture

10
- 34547500808
- Implicit and explicit optimizations for stencil computations
- New York, NY, USA, ACM
- S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yelick. Implicit and explicit optimizations for stencil computations. In MSPC'06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness, pages 51-60, New York, NY, USA, 2006. ACM.
- (2006) MSPC'06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness , pp. 51-60
- Kamil, S.¹ Datta, K.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

11
- 57349190835
- Fast and small short vector SIMD matrix multiplication kernel for the synergistic processing element of the CELL processor
- University of Tennessee
- J. Kurzak, W. Alvaro, and J. Dongarra. Fast and small short vector SIMD matrix multiplication kernel for the synergistic processing element of the CELL processor. Technical Report LAPACK Working Note 189, University of Tennessee. 2007.
- (2007) Technical Report LAPACK Working Note , vol.189
- Kurzak, J.¹ Alvaro, W.² Dongarra, J.³

12
- 62949190469
- Jan. 2007
- NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide. http://developer.nvidia.com/cuda, Jan. 2007.
- NVIDIA CUDA compute unified device architecture programming guide

13
- 33947588048
- A survey of general-purpose computation on graphics hardware
- J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, 2007.
- (2007) Computer Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.D.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Krüger, J.⁵ Lefohn, A.E.⁶ Purcell, T.J.⁷

14
- 2942666109
- A new look at the generalized distributive law
- June
- P. Pakzad and V. Anantharam. A new look at the generalized distributive law. IEEE Transactions on Information Theory, 50(6):1132-1155, June 2004.
- (2004) IEEE Transactions on Information Theory , vol.50 , Issue.6 , pp. 1132-1155
- Pakzad, P.¹ Anantharam, V.²

15
- 77951558943
- A performance-oriented data parallel virtual machine for GPUs
- Aug
- M. Peercy, M. Segal, and D. Gerstmann. A performance-oriented data parallel virtual machine for GPUs. In ACM SIGGRAPH 2006 Conference Abstracts and Applications, Aug. 2006.
- (2006) ACM SIGGRAPH 2006 Conference Abstracts and Applications
- Peercy, M.¹ Segal, M.² Gerstmann, D.³

16
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- R. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27:3-35, 2001.
- (2001) Parallel Computing , vol.27 , pp. 3-35
- Whaley, R.¹ Petitet, A.² Dongarra, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.