SCOPUS Information Search Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 5898 LNCS, 2010, Pages 50-64

Loop transformation recipes for code generation and auto-tuning

Authors (6): Hall, Mary (a); Chame, Jacqueline (b); Chen, Chun (a); Shin, Jaewook (c); Rudy, Gabe (a); Khan, Malik Murtaza (b)

a Department of Electrical and Computer Engineering (United States)

b UNIVERSITY OF SOUTHERN CALIFORNIA (United States)

c ARGONNE NATIONAL LABORATORY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ABSTRACT INTERFACES; AUTO-TUNE; AUTOTUNING; CODE GENERATION; CODE TRANSFORMATION; CORRECT CODE; DECISION ALGORITHMS; HIGH LEVEL SPECIFICATION; HIGH-LEVEL INTERFACES; LOOP TRANSFORMATION; POLYHEDRAL FRAMEWORK; SOFTWARE DEVELOPER; TRANSFORMATION ALGORITHM;

COSINE TRANSFORMS; LINGUISTICS; NETWORK COMPONENTS; PARALLEL ARCHITECTURES; TUNING;

PROGRAM COMPILERS;

EID: 77954412565 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-13374-9_4 Document Type: Conference Paper

Times cited : (46)

References (49)

1
- 77954416366
- http://www.peri-scidac.org/wiki/index.php/Main-Page

2
- 77954398863
- http://rosecompiler.org/

3
- 77954416181
- http://www.gnu.org/prep/standards/html-node/Errors.html

4
- 77954390318
- http://nek5000.mcs.anl.gov/index.php/Main-Page

5
- 0033700781
- Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
- Ahmed, N., Mateev, N., Pingali, K.: Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In: Proceedings of the 2000 ACM International Conference on Supercomputing (May 2000)
- Proceedings of the 2000 ACM International Conference on Supercomputing (May 2000)
- Ahmed, N.¹ Mateev, N.² Pingali, K.³

6
- 4544380943
- Finding effective compilation sequences
- Almagor, L., Cooper, K.D., Grosul, A., Harvey, T.J., Reeves, S.W., Subramanian, D., Torczon, L., Waterman, T.: Finding effective compilation sequences. In: Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, LCTES 2004 (June 2004)
- Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, LCTES 2004 (June 2004)
- Almagor, L.¹ Cooper, K.D.² Grosul, A.³ Harvey, T.J.⁴ Reeves, S.W.⁵ Subramanian, D.⁶ Torczon, L.⁷ Waterman, T.⁸

7
- 77954415625
- LAPACK: A portable linear algebra library for high-performance computers
- Anderson, E., Sorensen, D., Bai, Z., Dongarra, J., Greenbaum, A., McKenney, A., Croz, J.D., Hammarling, S., Demmel, J., Bischof, C.H.: LAPACK: A portable linear algebra library for high-performance computers. In: Proceedings of Supercomputing 1990 (November 1990)
- Proceedings of Supercomputing 1990 (November 1990)
- Anderson, E.¹ Sorensen, D.² Bai, Z.³ Dongarra, J.⁴ Greenbaum, A.⁵ McKenney, A.⁶ Croz, J.D.⁷ Hammarling, S.⁸ Demmel, J.⁹ Bischof, C.H.¹⁰

8
- 0028549474
- Improving the ratio of memory operations to floating-point operations in loops
- Carr, S., Kennedy, K.: Improving the ratio of memory operations to floating-point operations in loops. ACM Transactions on Programming Languages and Systems 16(6), 1768-1810 (1994)
- (1994) ACM Transactions on Programming Languages and Systems , vol.16 , Issue.6 , pp. 1768-1810
- Carr, S.¹ Kennedy, K.²

9
- 36048968626
- PhD thesis, University of Southern California May
- Chen, C.: Model-Guided Empirical Optimization for Memory Hierarchy. PhD thesis, University of Southern California (May 2007)
- (2007) Model-Guided Empirical Optimization for Memory Hierarchy
- Chen, C.¹

10
- 70449959487
- Technical Report 08-897, University of Southern California June
- Chen, C., Chame, J., Hall, M.: CHiLL: A framework for composing high-level loop transformations. Technical Report 08-897, University of Southern California (June 2008)
- (2008) CHiLL: A Framework for Composing High-level Loop Transformations
- Chen, C.¹ Chame, J.² Hall, M.³

11
- 33646828918
- Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
- Chen, C., Chame, J., Hall, M.W.: Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In: Proceedings of the International Symposium on Code Generation and Optimization (March 2005)
- Proceedings of the International Symposium on Code Generation and Optimization (March 2005)
- Chen, C.¹ Chame, J.² Hall, M.W.³

12
- 0036679993
- Adaptive optimizing compilers for the 21st century
- DOI 10.1023/A:1015729001611
- Cooper, K.D., Subramanian, D., Torczon, L.: Adaptive optimizing compilers for the 21st century. The Journal of Supercomputing 23(1), 7-22 (2002) (Pubitemid 34772138)
- (2002) Journal of Supercomputing , vol.23 , Issue.1 , pp. 7-22
- Cooper, K.D.¹ Subramanian, D.² Torczon, L.³

13
- 43949129775
- A language for the compact representation of multiple program versions
- Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. Springer, Heidelberg
- Donadio, S., Brodman, J., Roeder, T., Yotov, K., Barthou, D., Cohen, A., Garzarán, M.J., Padua, D., Pingali, K.: A language for the compact representation of multiple program versions. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339, pp. 136-151. Springer, Heidelberg (2006)
- (2006) LNCS , vol.4339 , pp. 136-151
- Donadio, S.¹ Brodman, J.² Roeder, T.³ Yotov, K.⁴ Barthou, D.⁵ Cohen, A.⁶ Garzarán, M.J.⁷ Padua, D.⁸ Pingali, K.⁹

14
- 20744449792
- The design and implementation of FFTW3
- Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proceedings of the IEEE: Special Issue on Program Generation, Optimization, and Platform Adaptation 93(2), 216-231 (2005)
- (2005) Proceedings of the IEEE: Special Issue on Program Generation, Optimization, and Platform Adaptation , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

15
- 33746593747
- Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
- Girbal, S., Vasilache, N., Bastoul, C., Cohen, A., Parello, D., Sigler, M., Temam, O.: Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. International Journal of Parallel Programming 34(3), 261-317 (2006)
- (2006) International Journal of Parallel Programming , vol.34 , Issue.3 , pp. 261-317
- Girbal, S.¹ Vasilache, N.² Bastoul, C.³ Cohen, A.⁴ Parello, D.⁵ Sigler, M.⁶ Temam, O.⁷

16
- 70449793159
- Annotation-based empirical performance tuning using Orio
- Hartono, A., Norris, B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio. In: Proceedings of the 23rd International Parallel and Distributed Processing Symposium (May 2009)
- Proceedings of the 23rd International Parallel and Distributed Processing Symposium (May 2009)
- Hartono, A.¹ Norris, B.² Sadayappan, P.³

17
- 35048886594
- Improving performance of hypermatrix cholesky factorization
- Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. Springer, Heidelberg
- Herrero, J.R., Navarro, J.J.: Improving performance of hypermatrix cholesky factorization. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 461-469. Springer, Heidelberg (2003)
- (2003) LNCS , vol.2790 , pp. 461-469
- Herrero, J.R.¹ Navarro, J.J.²

18
- 0038895757
- Register tiling in nonrectangular iteration spaces
- Jiménez, M., Llabería, J.M., Fernández, A.: Register tiling in nonrectangular iteration spaces. ACM Transactions on Programming Languages and Systems 24(4), 409-453 (2002)
- (2002) ACM Transactions on Programming Languages and Systems , vol.24 , Issue.4 , pp. 409-453
- Jiménez, M.¹ Llabería, J.M.² Fernández, A.³

19
- 58449097645
- Improving the performance of tensor matrix vector multiplication in cumulative reaction probability based quantum chemistry codes
- Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2008. Springer, Heidelberg
- Kaushik, D.K., Gropp, W., Minkoff, M., Smith, B.: Improving the performance of tensor matrix vector multiplication in cumulative reaction probability based quantum chemistry codes. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2008. LNCS, vol. 5374, pp. 120-130. Springer, Heidelberg (2008)
- (2008) LNCS , vol.5374 , pp. 120-130
- Kaushik, D.K.¹ Gropp, W.² Minkoff, M.³ Smith, B.⁴

20
- 0004261309
- Technical Report CS-TR-3193, Department of Computer Science, University of Maryland
- Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Technical Report CS-TR-3193, Department of Computer Science, University of Maryland (1993)
- (1993) A Framework for Unifying Reordering Transformations
- Kelly, W.¹ Pugh, W.²

21
- 0034512401
- Combined selection of tile sizes and unroll factors using iterative compilation
- Kisuki, T., Knijnenburg, P.M.W., O'Boyle, M.F.P.: Combined selection of tile sizes and unroll factors using iterative compilation. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (October 2000)
- Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (October 2000)
- Kisuki, T.¹ Knijnenburg, P.M.W.² O'Boyle, M.F.P.³

22
- 0442295621
- The effect of cache models on iterative compilation for combined tiling and unrolling
- Knijnenburg, P.M.W., Kisuki, T., Gallivan, K., O'Boyle, M.F.P.: The effect of cache models on iterative compilation for combined tiling and unrolling. Concurrency and Computation: Practice and Experience 16(2-3), 247-270 (2004)
- (2004) Concurrency and Computation: Practice and Experience , vol.16 , Issue.2-3 , pp. 247-270
- Knijnenburg, P.M.W.¹ Kisuki, T.² Gallivan, K.³ O'Boyle, M.F.P.⁴

23
- 0030685988
- Data-centric multi-level blocking
- Kodukula, I., Ahmed, N., Pingali, K.: Data-centric multi-level blocking. In: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1997)
- Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1997)
- Kodukula, I.¹ Ahmed, N.² Pingali, K.³

24
- 23944512382
- Empirical optimization for a sparse linear solver: A case study
- Lee, Y., Diniz, P., Hall, M., Lucas, R.: Empirical optimization for a sparse linear solver: A case study. International Journal of Parallel Programming 33 (2005)
- (2005) International Journal of Parallel Programming , vol.33
- Lee, Y.¹ Diniz, P.² Hall, M.³ Lucas, R.⁴

25
- 85088330717
- Maximizing parallelism and minimizing synchronization with affine partitioning
- Lim, A.W., Lam, M.S.: Maximizing parallelism and minimizing synchronization with affine partitioning. In: Proceedings of ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 1997) (January 1997)
- Proceedings of ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 1997) (January 1997)
- Lim, A.W.¹ Lam, M.S.²

26
- 0034823777
- Blocking and array contraction across arbitrarily nested loops using affine partitioning
- Lim, A.W., Liao, S.-W., Lam, M.S.: Blocking and array contraction across arbitrarily nested loops using affine partitioning. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (June 2001)
- ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (June 2001)
- Lim, A.W.¹ Liao, S.-W.² Lam, M.S.³

27
- 34247114368
- Combining analytical and empirical approaches in tuning matrix transposition
- Lu, Q., Krishnamoorthy, S., Sadayappan, P.: Combining analytical and empirical approaches in tuning matrix transposition. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (September 2006)
- Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (September 2006)
- Lu, Q.¹ Krishnamoorthy, S.² Sadayappan, P.³

28
- 0030190854
- Improving Data Locality with Loop Transformations
- McKinley, K.S., Carr, S., Tseng, C.-W.: Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems 18(4), 424-453 (1996) (Pubitemid 126422522)
- (1996) ACM Transactions on Programming Languages and Systems , vol.18 , Issue.4 , pp. 424-453
- McKinley, K.S.¹ Carr, S.² Tseng, C.-W.³

29
- 68849096760
- Generating empirically optimized composed matrix kernels from matlab prototypes
- Allen, G., Nabrzyski, J., Seidel, E., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) Computational Science - ICCS 2009. Springer, Heidelberg
- Norris, B., Hartono, A., Jessup, E., Siek, J.: Generating empirically optimized composed matrix kernels from matlab prototypes. In: Allen, G., Nabrzyski, J., Seidel, E., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) Computational Science - ICCS 2009. LNCS, vol. 5544, pp. 248-258. Springer, Heidelberg (2009)
- (2009) LNCS , vol.5544 , pp. 248-258
- Norris, B.¹ Hartono, A.² Jessup, E.³ Siek, J.⁴

30
- 84871295761
- GRAPHITE: Polyhedral analyses and optimizations for GCC
- Pop, S., Cohen, A., Bastoul, C., Girbal, S., Silber, G.-A., Vasilache, N.: GRAPHITE: Polyhedral analyses and optimizations for GCC. In: Proceedings of the 4th GCC Developers' Summit (June 2006)
- Proceedings of the 4th GCC Developers' Summit (June 2006)
- Pop, S.¹ Cohen, A.² Bastoul, C.³ Girbal, S.⁴ Silber, G.-A.⁵ Vasilache, N.⁶

31
- 34547683700
- Iterative optimization in the polyhedral model: Part I, one-dimensional time
- Pouchet, L.-N., Bastoul, C., Cohen, A., Cavazos, J.: Iterative optimization in the polyhedral model: Part I, one-dimensional time. In: Proceedings of the International Symposium on Code Generation and Optimization (March 2007)
- Proceedings of the International Symposium on Code Generation and Optimization (March 2007)
- Pouchet, L.-N.¹ Bastoul, C.² Cohen, A.³ Cavazos, J.⁴

32
- 57349167317
- Iterative optimization in the polyhedral model: Part II, multi-dimensional time
- Pouchet, L.-N., Bastoul, C., Cohen, A., Vasilache, N.: Iterative optimization in the polyhedral model: Part II, multi-dimensional time. In: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 2008)
- Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 2008)
- Pouchet, L.-N.¹ Bastoul, C.² Cohen, A.³ Vasilache, N.⁴

33
- 84948990001
- Iteration space slicing for locality
- Carter, L., Ferrante, J. (eds.) LCPC 1999. Springer, Heidelberg
- Pugh, B., Rosser, E.: Iteration space slicing for locality. In: Carter, L., Ferrante, J. (eds.) LCPC 1999. LNCS, vol. 1863, p. 164. Springer, Heidelberg (1999)
- (1999) LNCS , vol.1863 , pp. 164
- Pugh, B.¹ Rosser, E.²

34
- 14744298722
- Technical Report TR03-419, Rice University October
- Qasem, A., Jin, G., Mellor-Crummey, J.: Improving performance with integrated program transformations. Technical Report TR03-419, Rice University (October 2003)
- (2003) Improving Performance with Integrated Program Transformations
- Qasem, A.¹ Jin, G.² Mellor-Crummey, J.³

35
- 34547401051
- Profitable loop fusion and tiling using model-driven empirical search
- Qasem, A., Kennedy, K.: Profitable loop fusion and tiling using model-driven empirical search. In: Proceedings of the 2006 ACM International Conference on Supercomputing (June 2006)
- Proceedings of the 2006 ACM International Conference on Supercomputing (June 2006)
- Qasem, A.¹ Kennedy, K.²

36
- 63549093766
- A tuning framework for software-managed memory hierarchies
- Ren, M., Park, J.Y., Houston, M., Aiken, A., Dally, W.J.: A tuning framework for software-managed memory hierarchies. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (October 2008)
- Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (October 2008)
- Ren, M.¹ Park, J.Y.² Houston, M.³ Aiken, A.⁴ Dally, W.J.⁵

37
- 0031622954
- Data transformations for eliminating conflict misses
- Rivera, G., Tseng, C.-W.: Data transformations for eliminating conflict misses. In: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1998)
- Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1998)
- Rivera, G.¹ Tseng, C.-W.²

38
- 0026991030
- A general framework for iteration-reordering loop transformations
- Sarkar, V., Thekkath, R.: A general framework for iteration-reordering loop transformations. In: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1992)
- Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1992)
- Sarkar, V.¹ Thekkath, R.²

39
- 84891431362
- Autotuning and specialization: Speeding up matrix multiply for small matrices with compiler technology
- Shin, J., Hall, M.W., Chame, J., Chen, C., Hovland, P.D.: Autotuning and specialization: Speeding up matrix multiply for small matrices with compiler technology. In: The Fourth International Workshop on Automatic Performance Tuning (October 2009)
- The Fourth International Workshop on Automatic Performance Tuning (October 2009)
- Shin, J.¹ Hall, M.W.² Chame, J.³ Chen, C.⁴ Hovland, P.D.⁵

40
- 0027764718
- To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts
- Temam, O., Granston, E.D., Jalby, W.: To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In: Proceedings of Supercomputing 1993 (November 1993)
- Proceedings of Supercomputing 1993 (November 1993)
- Temam, O.¹ Granston, E.D.² Jalby, W.³

41
- 70449844310
- A scalable auto-tuning framework for compiler optimization
- Tiwari, A., Chen, C., Chame, J., Hall, M., Hollingsworth, J.K.: A scalable auto-tuning framework for compiler optimization. In: Proceedings of the 24th International Parallel and Distributed Processing Symposium (April 2009)
- Proceedings of the 24th International Parallel and Distributed Processing Symposium (April 2009)
- Tiwari, A.¹ Chen, C.² Chame, J.³ Hall, M.⁴ Hollingsworth, J.K.⁵

42
- 84938142043
- Terascale spectral element algorithms and implementations
- Tufo, H.M., Fischer, P.F.: Terascale spectral element algorithms and implementations. In: ACM/IEEE Conference on Supercomputing, Portland, OR (1999)
- ACM/IEEE Conference on Supercomputing, Portland, OR (1999)
- Tufo, H.M.¹ Fischer, P.F.²

43
- 0343462141
- Automated empirical optimizations of software and the ATLAS project
- DOI 10.1016/S0167-8191(00)00087-9
- Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated empirical optimization of software and the ATLAS project. Parallel Computing 27(1-2), 3-35 (2001) (Pubitemid 32264775)
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Clint Whaley, R.¹ Petitet, A.² Dongarra, J.J.³

44
- 33745158666
- Tuning high performance kernels through empirical compilation
- Clint Whaley, R., Whaley, D.B.: Tuning high performance kernels through empirical compilation. In: Proceedings of the 34th International Conference on Parallel Processing (June 2005)
- Proceedings of the 34th International Conference on Parallel Processing (June 2005)
- Clint Whaley, R.¹ Whaley, D.B.²

45
- 3142767727
- A data locality optimizing algorithm
- Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1991)
- Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (June 1991)
- Wolf, M.E.¹ Lam, M.S.²

46
- 0026232450
- A loop transformation theory and an algorithm to maximize parallelism
- Wolf, M.E., Lam, M.S.: A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems 2(4), 452-471 (1991)
- (1991) IEEE Transactions on Parallel and Distributed Systems , vol.2 , Issue.4 , pp. 452-471
- Wolf, M.E.¹ Lam, M.S.²

47
- 0026082301
- Data dependence and program restructuring
- Wolfe, M.: Data dependence and program restructuring. The Journal of Supercomputing 4(4), 321-344 (1991)
- (1991) The Journal of Supercomputing , vol.4 , Issue.4 , pp. 321-344
- Wolfe, M.¹

48
- 77954420946
- October
- Wolfe, M.: Compilers and more: Optimizing gpu kernels (October 2008), http://www.hpcwire.com/features/Compilers and More Optimizing GPU Kernels.html
- (2008) Compilers and More: Optimizing Gpu Kernels
- Wolfe, M.¹

49
- 34548765138
- POET: Parameterized optimizations for empirical tuning
- Yi, Q., Seymour, K., You, H., Vuduc, R., Quinlan, D.: POET: parameterized optimizations for empirical tuning. In: Proceedings of the 21st International Parallel and Distributed Processing Symposium (March 2007)
- Proceedings of the 21st International Parallel and Distributed Processing Symposium (March 2007)
- Yi, Q.¹ Seymour, K.² You, H.³ Vuduc, R.⁴ Quinlan, D.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.