SCOPUS Information Search Platform

IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium

Volume , Issue , 2009, Pages

A Cross-Input Adaptive Framework for GPU Program Optimizations

(3) Liu, Yixun (a); Zhang, Eddy Z. (a); Shen, Xipeng (a)

(a) William and Mary (United States)

Author keywords

Cross-input adaptation; CUDA; Empirical search; G-ADAPT; GPU; Program optimizations

Indexed keywords

ADAPTIVE FRAMEWORK; ADAPTIVE OPTIMIZATION; GENERAL-PURPOSE COMPUTING; GPU PROGRAMMING; GPU PROGRAMS; GRAPHIC PROCESSING UNITS; HIGH QUALITY; INPUT ADAPTATION; NEW DIMENSIONS; NUMERICAL APPLICATIONS; OPTIMAL CONFIGURATIONS; PREDICTIVE MODELS; SINGLE-CHIP;

DISTRIBUTED PARAMETER NETWORKS; OPTIMIZATION; PARALLEL ARCHITECTURES;

COMPUTER GRAPHICS EQUIPMENT;

EID: 70450103746 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2009.5160988 Document Type: Conference Paper

Times cited : (108)

References (25)

1
- 84869690427
- NVIDIA CUDA
- NVIDIA CUDA. http://www.nvidia.com/cuda.

2
- 0037810283
- Online feedback-directed optimization of Java
- M. Arnold, M. Hind, and B. G. Ryder. Online feedback-directed optimization of Java. In Proceedings of ACM Conference on Object-Oriented Programming Systems, Languages and Applications, pages 111-129, 2002.
- (2002) Proceedings of ACM Conference on Object-Oriented Programming Systems, Languages and Applications , pp. 111-129
- Arnold, M.¹ Hind, M.² Ryder, B.G.³

3
- 57349180412
- M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 225-234, 2008.

4
- 0030661485
- Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
- J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In Proceedings of the ACM International Conference on Supercomputing, pages 340-347, 1997.
- (1997) Proceedings of the ACM International Conference on Supercomputing , pp. 340-347
- Bilmes, J.¹ Asanovic, K.² Chin, C.-W.³ Demmel, J.⁴

5
- 0027041691
- Procedure cloning
- K. D. Cooper, M. W. Hall, and K. Kennedy. Procedure cloning. In Computer Languages, pages 96-105, 1992.
- (1992) Computer Languages , pp. 96-105
- Cooper, K.D.¹ Hall, M.W.² Kennedy, K.³

6
- 0030706481
- Dynamic feedback: An effective technique for adaptive computing
- Las Vegas, May
- P. Diniz and M. Rinard. Dynamic feedback: an effective technique for adaptive computing. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 71-84, Las Vegas, May 1997.
- (1997) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 71-84
- Diniz, P.¹ Rinard, M.²

7
- 57349184047
- Fast scan algorithms on graphics processors
- Y. Dotsenko, N. K. Govindaraju, P. Sloan, C. Boyd, and J. Manferdelli. Fast scan algorithms on graphics processors. In ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing, pages 205-213, 2008.
- (2008) ICS'08: Proceedings of the 22nd Annual International Conference on Supercomputing , pp. 205-213
- Dotsenko, Y.¹ Govindaraju, N.K.² Sloan, P.³ Boyd, C.⁴ Manferdelli, J.⁵

8
- 20744449792
- The design and implementation of FFTW3
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216-231, 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

9
- 51049101693
- Faster matrix-vector multiplication on GeForce 8800GTX
- N. Fujimoto. Faster matrix-vector multiplication on GeForce 8800GTX. In Proceedings of the Workshop on Large-Scale Parallel Processing (colocated with IPDPS), pages 1-8, 2008.
- (2008) Proceedings of the Workshop on Large-Scale Parallel Processing (colocated with IPDPS) , pp. 1-8
- Fujimoto, N.¹

10
- 70449894697
- High performance computing with CUDA
- M. Harris. High performance computing with CUDA. In Tutorial in IEEE SuperComputing, 2007.
- (2007) Tutorial in IEEE SuperComputing
- Harris, M.¹

11
- 0003684449
- Springer
- T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2001.
- (2001) The elements of statistical learning
- Hastie, T.¹ Tibshirani, R.² Friedman, J.³

12
- 1542501019
- Sparsity: Optimization framework for sparse matrix kernels
- Eun-Jin Im, Katherine Yelick, and Richard Vuduc. Sparsity: Optimization framework for sparse matrix kernels. Int. J. High Perform. Comput. Appl., 18(1):135-158, 2004.
- (2004) Int. J. High Perform. Comput. Appl , vol.18 , Issue.1 , pp. 135-158
- Im, E.-J.¹ Yelick, K.² Vuduc, R.³

13
- 20744444866
- Telescoping languages: A system for automatic generation of domain languages
- Ken Kennedy, Bradley Broom, Arun Chauhan, Rob Fowler, John Garvin, Charles Koelbel, Cheryl McCosh, and John Mellor-Crummey. Telescoping languages: A system for automatic generation of domain languages. Proceedings of the IEEE, 93(2):387-408, 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 387-408
- Kennedy, K.¹ Broom, B.² Chauhan, A.³ Fowler, R.⁴ Garvin, J.⁵ Koelbel, C.⁶ McCosh, C.⁷ Mellor-Crummey, J.⁸

14
- 35048854568
- S. Lee, T. Johnson, and R. Eigenmann. Cetus - an extensible compiler infrastructure for source-to-source transformation. In Proceedings of the 16th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC), pages 539-553, 2003.

15
- 67650568217
- Cross-input learning and discriminative prediction in evolvable virtual machine
- F. Mao and X. Shen. Cross-input learning and discriminative prediction in evolvable virtual machine. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2009.
- (2009) Proceedings of the International Symposium on Code Generation and Optimization (CGO)
- Mao, F.¹ Shen, X.²

16
- 0032676575
- Efficient incremental run-time specialization for free
- Atlanta, GA, May
- R. Marlet, C. Consel, and P. Boinot. Efficient incremental run-time specialization for free. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 281-292, Atlanta, GA, May 1999.
- (1999) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 281-292
- Marlet, R.¹ Consel, C.² Boinot, P.³

17
- 78651550268
- Scalable parallel programming with CUDA
- March/April
- John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA. ACM Queue, pages 40-53, March/April 2008.
- (2008) ACM Queue , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

18
- 70350759823
- Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
- A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA. In SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1-11, 2008.
- (2008) SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing , pp. 1-11
- Nukada, A.¹ Ogata, Y.² Endo, T.³ Matsuoka, S.⁴

19
- 19344368072
- SPIRAL: Code generation for DSP transforms
- M. Puschel, J.M.F. Moura, J.R. Johnson, D. Padua, M.M. Veloso, B.W. Singer, Jianxin Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R.W. Johnson, and N. Rizzolo. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232-275, 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 232-275
- Puschel, M.¹ Moura, J.M.F.² Johnson, J.R.³ Padua, D.⁴ Veloso, M.M.⁵ Singer, B.W.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gacic, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

20
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 73-82, 2008.
- (2008) PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

21
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S. Ueng, J. A. Stratton, and W. W. Hwu. Program optimization space pruning for a multithreaded GPU. In CGO'08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 195-204, 2008.
- (2008) CGO'08: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.⁵ Stratton, J.A.⁶ Hwu, W.W.⁷

22
- 56849102474
- Efficient computation of sum-products on GPUs through software-managed cache
- June
- Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, and John D. Owens. Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd ACM International Conference on Supercomputing, pages 309-318, June 2008.
- (2008) Proceedings of the 22nd ACM International Conference on Supercomputing , pp. 309-318
- Silberstein, M.¹ Schuster, A.² Geiger, D.³ Patney, A.⁴ Owens, J.D.⁵

23
- 31844454218
- A framework for adaptive algorithm selection in STAPL
- N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 277-288, 2005.
- (2005) Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 277-288
- Thomas, N.¹ Tanase, G.² Tkachyshyn, O.³ Perdue, J.⁴ Amato, N.M.⁵ Rauchwerger, L.⁶

24
- 0034819518
- High-level adaptive program optimization with ADAPT
- Snowbird, Utah, June
- M. Voss and R. Eigenmann. High-level adaptive program optimization with ADAPT. In Proceedings of ACM Symposium on Principles and Practice of Parallel Programming, pages 93-102, Snowbird, Utah, June 2001.
- (2001) Proceedings of ACM Symposium on Principles and Practice of Parallel Programming , pp. 93-102
- Voss, M.¹ Eigenmann, R.²

25
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3-35, 2001.
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.