SCOPUS Information Search Platform

2012 Innovative Parallel Computing, InPar 2012

Volume , Issue , 2012, Pages

Policy-based tuning for performance portability and library co-optimization

(3) Merrill, Duane (a); Garland, Michael (a); Grimshaw, Andrew (b)

(a) NVIDIA (United States)

(b) University of Virginia (United States)

Author keywords

auto tuning; library design; metaprogramming; Performance; performance portability; policy; software reuse

Indexed keywords

AUTOTUNING; LIBRARY DESIGNS; META PROGRAMMING; PERFORMANCE; PERFORMANCE PORTABILITY;

COMPUTER ARCHITECTURE; PARALLEL ARCHITECTURES; PUBLIC POLICY;

COMPUTER SOFTWARE REUSABILITY;

EID: 84870725376 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/InPar.2012.6339597 Document Type: Conference Paper

Times cited : (17)

References (32)

1
- 70450183916
- Understanding the efficiency of ray traversal on GPUs
- Aila, T. and Laine, S. 2009. Understanding the efficiency of ray traversal on GPUs. Proceedings of the Conference on High Performance Graphics 2009 (New York, NY, USA, 2009), 145-149.
- (2009) Proceedings of the Conference on High Performance Graphics 2009 (New York, NY, USA, 2009) , pp. 145-149
- Aila, T.¹ Laine, S.²

2
- 67650786281
- PetaBricks: A language and compiler for algorithmic choice
- Ansel, J. et al. 2009. PetaBricks: a language and compiler for algorithmic choice. Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation (New York, NY, USA, 2009), 38-49.
- (2009) Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2009) , pp. 38-49
- Ansel, J.¹

3
- 84887875867
- Accessed: 2011-08-25
- Back40 Computing: Fast and efficient software primitives for GPU computing: http://code.google.com/p/back40computing/. Accessed: 2011-08-25.
- Back40 Computing: Fast and Efficient Software Primitives for GPU Computing

4
- 84870697021
- Blelloch, G.E. et al. Solving linear recurrences with loop raking. 416-424.
- Solving Linear Recurrences with Loop Raking , pp. 416-424
- Blelloch, G.E.¹

5
- 84870721658
- Accessed: 2011-08-25
- CUDA: http://www.nvidia.com/object/cuda-home-new.html. Accessed: 2011-08-25.
- CUDA

6
- 70449710961
- Google Project Hosting: Accessed: 2011-07-12
- cudpp - CUDA Data Parallel Primitives Library - Google Project Hosting: http://code.google.com/p/cudpp/. Accessed: 2011-07-12.
- Cudpp - CUDA Data Parallel Primitives Library

7
- 0002806690
- OpenMP: An industry standard API for shared-memory programming
- Mar. 1998
- Dagum, L. and Menon, R. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering. 5, (Mar. 1998), 46-55.
- (1998) IEEE Computational Science and Engineering , vol.5 , pp. 46-55
- Dagum, L.¹ Menon, R.²

8
- 37549003336
- MapReduce: Simplified data processing on large clusters
- Jan. 2008
- Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM. 51, 1 (Jan. 2008), 107-113.
- (2008) Commun. ACM. , vol.51 , Issue.1 , pp. 107-113
- Dean, J.¹ Ghemawat, S.²

9
- 20744452904
- Self-Adapting Linear Algebra Algorithms and Software
- Feb. 2005
- Demmel, J. et al. 2005. Self-Adapting Linear Algebra Algorithms and Software. Proceedings of the IEEE. 93, 2 (Feb. 2005), 293-312.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 293-312
- Demmel, J.¹

10
- 0004045054
- Duxbury Press
- Devore, J. 1999. Applied statistics for engineers and scientists. Duxbury Press.
- (1999) Applied Statistics for Engineers and Scientists
- Devore, J.¹

11
- 34548207355
- Sequoia: Programming the memory hierarchy
- New York, NY, USA, 2006
- Fatahalian, K. et al. 2006. Sequoia: programming the memory hierarchy. Proceedings of the 2006 ACM/IEEE conference on Supercomputing (New York, NY, USA, 2006).
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
- Fatahalian, K.¹

12
- 0348209599
- A fast Fourier transform compiler
- Frigo, M. 1999. A fast Fourier transform compiler. Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation (New York, NY, USA, 1999), 169-180.
- (1999) Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation (New York, NY, USA, 1999) , pp. 169-180
- Frigo, M.¹

13
- 84976721284
- MULTILISP: A language for concurrent symbolic computation
- Oct. 1985
- Halstead, Jr., R.H. 1985. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst. 7, 4 (Oct. 1985), 501-538.
- (1985) ACM Trans. Program. Lang. Syst. , vol.7 , Issue.4 , pp. 501-538
- Halstead Jr., R.H.¹

14
- 0003568839
- IEEE Computer Society 2009. IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002)
- IEEE Computer Society 2009. IEEE Standard VHDL Language Reference Manual. IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002). (2009), c1-626.
- (2009) IEEE Standard VHDL Language Reference Manual

15
- 84870669933
- PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation
- Sep. 2011
- Klöckner, A. et al. 2011. PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation. Parallel Computing. (Sep. 2011).
- (2011) Parallel Computing
- Klöckner, A.¹

16
- 84866532295
- Technical Report #245. LAPACK Working Note
- Kurzak, J. et al. 2011. Autotuning GEMMs for Fermi. Technical Report #245. LAPACK Working Note.
- (2011) Autotuning GEMMs for Fermi
- Kurzak, J.¹

17
- 0003775174
- version 1.2. Lawrence Livermore National Laboratory
- Mcgraw, J. et al. 1985. SISAL: Streams and iteration in a single assignment language, language reference manual version 1.2. Lawrence Livermore National Laboratory.
- (1985) SISAL: Streams and Iteration in A Single Assignment Language, Language Reference Manual
- Mcgraw, J.¹

18
- 84870684894
- University of Virginia
- Merrill, D. 2011. Allocation-oriented Algorithm Design with Application to GPU Computing. University of Virginia.
- (2011) Allocation-oriented Algorithm Design with Application to GPU Computing
- Merrill, D.¹

19
- 79959718248
- High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing
- 2011
- Merrill, D. and Grimshaw, A. 2011. High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing. Parallel Processing Letters. 21, 02 (2011), 245-272.
- (2011) Parallel Processing Letters , vol.21 , Issue.2 , pp. 245-272
- Merrill, D.¹ Grimshaw, A.²

20
- 78149268496
- Technical Report #CS2009-14. Department of Computer Science, University of Virginia
- Merrill, D. and Grimshaw, A. 2009. Parallel Scan for Stream Architectures. Technical Report #CS2009-14. Department of Computer Science, University of Virginia.
- (2009) Parallel Scan for Stream Architectures
- Merrill, D.¹ Grimshaw, A.²

21
- 67650661447
- Accessed: 2009-12-12
- Optimizing parallel reduction in CUDA: 2007. http://developer.download.nvidia.com/compute/cuda/1-1/Website/projects/reduction/doc/reduction.pdf. Accessed: 2009-12-12.
- (2007) Optimizing Parallel Reduction in CUDA

22
- 49049088756
- GPU Computing
- May 2008
- Owens, J.D. et al. 2008. GPU Computing. Proceedings of the IEEE. 96, 5 (May 2008), 879-899.
- (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
- Owens, J.D.¹

23
- 19344368072
- SPIRAL: Code Generation for DSP Transforms
- Feb. 2005
- Puschel, M. et al. 2005. SPIRAL: Code Generation for DSP Transforms. Proceedings of the IEEE. 93, 2 (Feb. 2005), 232-275.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 232-275
- Puschel, M.¹

24
- 33749908081
- Classes of Recursively Enumerable Sets and Their Decision Problems
- 1953
- Rice, H.G. 1953. Classes of Recursively Enumerable Sets and Their Decision Problems. Transactions of the American Mathematical Society. 74, 2 (1953), pp. 358-366.
- (1953) Transactions of the American Mathematical Society , vol.74 , Issue.2 , pp. 358-366
- Rice, H.G.¹

25
- 0004138172
- MIT Press
- Rogers, H. 1987. Theory of recursive functions and effective computability. MIT Press.
- (1987) Theory of Recursive Functions and Effective Computability
- Rogers, H.¹

26
- 79952576869
- A programming language interface to describe transformations and code generation
- Rudy, G. et al. 2011. A programming language interface to describe transformations and code generation. Proceedings of the 23rd international conference on Languages and compilers for parallel computing (Berlin, Heidelberg, 2011), 136-150.
- (2011) Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing (Berlin, Heidelberg, 2011) , pp. 136-150
- Rudy, G.¹

27
- 84870714547
- Google Project Hosting: Accessed: 2011-08-25
- Thrust - Code at the speed of light - Google Project Hosting: http://code.google.com/p/thrust/. Accessed: 2011-08-25.
- Thrust - Code at the Speed of Light

28
- 70449844310
- A scalable auto-tuning framework for compiler optimization
- Tiwari, A. et al. 2009. A scalable auto-tuning framework for compiler optimization. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009), 1-12.
- (2009) Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (Washington, DC, USA, 2009) , pp. 1-12
- Tiwari, A.¹

29
- 84934313374
- Task management for irregular-parallel workloads on the GPU
- Tzeng, S. et al. 2010. Task management for irregular-parallel workloads on the GPU. Proceedings of the Conference on High Performance Graphics (Aire-la-Ville, Switzerland, 2010), 29-37.
- (2010) Proceedings of the Conference on High Performance Graphics (Aire-la-Ville, Switzerland, 2010) , pp. 29-37
- Tzeng, S.¹

30
- 0025467711
- A bridging model for parallel computation
- Aug. 1990
- Valiant, L.G. 1990. A bridging model for parallel computation. Commun. ACM. 33, 8 (Aug. 1990), 103-111.
- (1990) Commun. ACM. , vol.33 , Issue.8 , pp. 103-111
- Valiant, L.G.¹

31
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Volkov, V. and Demmel, J.W. 2008. Benchmarking GPUs to tune dense linear algebra. Proceedings of the 2008 ACM/IEEE conference on Supercomputing (Piscataway, NJ, USA, 2008), 31:1-31:11.
- (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (Piscataway, NJ, USA, 2008)
- Volkov, V.¹ Demmel, J.W.²

32
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- Jan. 2005
- Vuduc, R. et al. 2005. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series. 16, (Jan. 2005), 521-530.
- (2005) Journal of Physics: Conference Series , vol.16 , pp. 521-530
- Vuduc, R.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.