Volume 24, Issue 4, 2002, Pages 409-453

Register tiling in nonrectangular iteration spaces

Author keywords

D.3.4 Programming Languages Processors; Data reuse; Locality; Loop optimization; Loop tiling; Measurement; Performance; Register level

Indexed keywords

ALGORITHMS; CODES (SYMBOLS); COMPUTER PROGRAMMING LANGUAGES; COMPUTER SOFTWARE REUSABILITY; DATA PROCESSING; HEURISTIC METHODS; ITERATIVE METHODS; LINEAR ALGEBRA; MICROPROCESSOR CHIPS; PROGRAM COMPILERS;

EID: 0038895757     PISSN: 01640925     EISSN: None     Source Type: Journal    
DOI: 10.1145/567097.567101     Document Type: Article
Times cited : (28)

References (55)
  • 1
    • 0004318530
    • Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana
    • ABU-SUFAH, W. 1979. Improving the performance of virtual memory computers. Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana.
    • (1979) Improving the Performance of Virtual Memory Computers
    • Abu-Sufah, W.1
  • 6
    • 0038835469
    • Technical report TR-94-42, Leiden University, Department of Mathematics & Computer Science, Leiden, The Netherlands
    • BIK, A. AND WIJSHOFF, H. 1994. Implementation of Fourier-Motzkin elimination. Technical report TR-94-42, Leiden University, Department of Mathematics & Computer Science, Leiden, The Netherlands.
    • (1994) Implementation of Fourier-Motzkin Elimination
    • Bik, A.1    Wijshoff, H.2
  • 9
    • 0000493064
    • Estimating interlock and improving balance for pipelined architectures
    • Aug.
    • CALLAHAN, D., COCKE, J., AND KENNEDY, K. 1988. Estimating interlock and improving balance for pipelined architectures. J. Parallel Distrib. Comput. 5, 4 (Aug.), 334-358.
    • (1988) J. Parallel Distrib. Comput. , vol.5 , Issue.4 , pp. 334-358
    • Callahan, D.1    Cocke, J.2    Kennedy, K.3
  • 10
    • 0012951882
    • Ph.D. thesis, Dept. of Computer Science, Rice University, Houston, TX
    • CARR, S. 1992. Memory-hierarchy management. Ph.D. thesis, Dept. of Computer Science, Rice University, Houston, TX.
    • (1992) Memory-hierarchy Management
    • Carr, S.1
  • 11
  • 13
    • 0028549474
    • Improving the ratio of memory operations to floating-point operations in loops
    • Nov.
    • CARR, S. AND KENNEDY, K. 1994a. Improving the ratio of memory operations to floating-point operations in loops. ACM Trans. Program. Lang. Syst. 16, 6 (Nov.), 1768-1810.
    • (1994) ACM Trans. Program. Lang. Syst. , vol.16 , Issue.6 , pp. 1768-1810
    • Carr, S.1    Kennedy, K.2
  • 14
    • 0028277074
    • Scalar replacement in the presence of conditional control flow
    • Jan.
    • CARR, S. AND KENNEDY, K. 1994b. Scalar replacement in the presence of conditional control flow. Software Pract. Exp. 24, 1 (Jan.), 51-77.
    • (1994) Software Pract. Exp. , vol.24 , Issue.1 , pp. 51-77
    • Carr, S.1    Kennedy, K.2
  • 19
    • 0003929457
    • Technical report UT-CS-90-108, Department of Computer Science, University of Tennessee, Knoxville
    • DONGARRA, J. AND SCHREIBER, R. 1990. Automatic blocking of nested loops. Technical report UT-CS-90-108, Department of Computer Science, University of Tennessee, Knoxville.
    • (1990) Automatic Blocking of Nested Loops
    • Dongarra, J.1    Schreiber, R.2
  • 21
    • 0003455775
    • M.S. thesis, Dept. of Computer Science, Rice University, Houston, TX
    • ESSEGHIR, K. 1993. Improving data locality for caches. M.S. thesis, Dept. of Computer Science, Rice University, Houston, TX.
    • (1993) Improving Data Locality for Caches
    • Esseghir, K.1
  • 22
    • 85015240805
    • On estimating and enhancing cache effectiveness
    • Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Springer-Verlag, Santa Clara, CA
    • FERRANTE, J., SARKAR, V., AND THRASH, W. 1991. On estimating and enhancing cache effectiveness. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Lecture Notes in Computer Science, vol. 589. Springer-Verlag, Santa Clara, CA, 328-343.
    • (1991) Lecture Notes in Computer Science , vol.589 , pp. 328-343
    • Ferrante, J.1    Sarkar, V.2    Thrash, W.3
  • 23
    • 84972622535
    • Impact of hierarchical memory systems on linear algebra algorithm design
    • Spring
    • GALLIVAN, K., JALBY, W., AND MEIER, U. 1988. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2, 1 (Spring), 12-48.
    • (1988) Int. J. Supercomput. Appl. , vol.2 , Issue.1 , pp. 12-48
    • Gallivan, K.1    Jalby, W.2    Meier, U.3
  • 24
    • 84862940593
    • Strategies for cache and local memory management by global program transformations
    • ICS-87, Athens, Greece. Springer-Verlag, Berlin, Germany
    • GANNON, D., JALBY, W., AND GALLIVAN, K. 1987. Strategies for cache and local memory management by global program transformations. In Proceedings of the 1st International Conference on Supercomputing (ICS-87, Athens, Greece). Springer-Verlag, Berlin, Germany.
    • (1987) Proceedings of the 1st International Conference on Supercomputing
    • Gannon, D.1    Jalby, W.2    Gallivan, K.3
  • 25
    • 0003783762
    • Ph.D. thesis, Dept. of Computer Science, Universitat Politècnica de Catalunya, Barcelona, Spain.
    • JIMÉNEZ, M. 1999. Multilevel tiling for non-rectangular iteration spaces. Ph.D. thesis, Dept. of Computer Science, Universitat Politècnica de Catalunya, Barcelona, Spain. (Available online at http://www.ac.upc.es/recerca/reports.)
    • (1999) Multilevel Tiling for Non-rectangular Iteration Spaces
    • Jiménez, M.1
  • 28
    • 0028459839
    • DXML: A high-performance scientific subroutine library
    • Summer
    • KAMATH, C., HO, R., AND MANLEY, D. P. 1994. DXML: A high-performance scientific subroutine library. Dig. Tech. J. 6, 3 (Summer), 44-56.
    • (1994) Dig. Tech. J. , vol.6 , Issue.3 , pp. 44-56
    • Kamath, C.1    Ho, R.2    Manley, D.P.3
  • 29
    • 0032025292
    • Locality optimization algorithms for compilation of out-of-core codes
    • Mar.
    • KANDEMIR, M., CHOUDHARY, A., RAMANUJAM, J., AND KANDASWAMY, M. 1998. Locality optimization algorithms for compilation of out-of-core codes. J. Inf. Sci. Eng. 14, 1 (Mar.), 107-138.
    • (1998) J. Inf. Sci. Eng. , vol.14 , Issue.1 , pp. 107-138
    • Kandemir, M.1    Choudhary, A.2    Ramanujam, J.3    Kandaswamy, M.4
  • 31
    • 0003363567
    • The 21264: A superscalar alpha processor with out-of-order execution
    • KELLER, J. 1996. The 21264: a superscalar alpha processor with out-of-order execution. Presentation at 1996 IEEE Microprocessor Forum. Slides available online at www.microprocessor.sscc.ru/alpha-21264/a264up1.html.
    • (1996) 1996 IEEE Microprocessor Forum
    • Keller, J.1
  • 36
    • 0026137116
    • The cache performance and optimizations of blocked algorithms
    • Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)
    • LAM, M., ROTHBERG, E., AND WOLF, M. 1991. The cache performance and optimizations of blocked algorithms. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV). Comput. Architect. News 19, 2, 63-74.
    • (1991) Comput. Architect. News , vol.19 , Issue.2 , pp. 63-74
    • Lam, M.1    Rothberg, E.2    Wolf, M.3
  • 39
    • 0027694019
    • Access normalization: Loop restructuring for NUMA computers
    • Nov.
    • LI, W. AND PINGALI, K. 1993. Access normalization: loop restructuring for NUMA computers. ACM Trans. Comput. Syst. 11, 4 (Nov.), 353-375.
    • (1993) ACM Trans. Comput. Syst. , vol.11 , Issue.4 , pp. 353-375
    • Li, W.1    Pingali, K.2
  • 41
    • 0030190854
    • Improving data locality with loop transformations
    • July
    • MCKINLEY, K. S., CARR, S., AND TSENG, C.-W. 1996. Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. 18, 4 (July), 424-453.
    • (1996) ACM Trans. Program. Lang. Syst. , vol.18 , Issue.4 , pp. 424-453
    • McKinley, K.S.1    Carr, S.2    Tseng, C.-W.3
  • 42
    • 0032308685
    • Quantifying the multi-level nature of tiling interactions
    • MITCHELL, N., HOGSTEDT, K., CARTER, L., AND FERRANTE, J. 1998. Quantifying the multi-level nature of tiling interactions. J. Parallel Program. 26, 6, 641-670.
    • (1998) J. Parallel Program. , vol.26 , Issue.6 , pp. 641-670
    • Mitchell, N.1    Hogstedt, K.2    Carter, L.3    Ferrante, J.4
  • 47
    • 0027764718
    • To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts
    • IEEE Computer Society Press, Silver Spring, MD
    • TEMAM, O., GRANSTON, E. D., AND JALBY, W. 1993. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing '93. IEEE Computer Society Press, Silver Spring, MD, 410-419.
    • (1993) Proceedings of Supercomputing '93 , pp. 410-419
    • Temam, O.1    Granston, E.D.2    Jalby, W.3
  • 48
  • 50
    • 0026232450
    • A loop transformation theory and an algorithm to maximize parallelism
    • Oct.
    • WOLF, M. E. AND LAM, M. S. 1991b. A loop transformation theory and an algorithm to maximize parallelism. Trans. Parallel Distrib. Syst. 2, 4 (Oct.), 452-471.
    • (1991) Trans. Parallel Distrib. Syst. , vol.2 , Issue.4 , pp. 452-471
    • Wolf, M.E.1    Lam, M.S.2
  • 51
    • 0030379246
    • Combining loop transformations considering caches and scheduling
    • MICRO-96, Paris, France. IEEE Computer Society Press, Los Alamitos, CA
    • WOLF, M. E., MAYDAN, D. E., AND CHEN, D.-K. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO-96, Paris, France). IEEE Computer Society Press, Los Alamitos, CA, 274-286.
    • (1996) Proceedings of the 29th Annual International Symposium on Microarchitecture , pp. 274-286
    • Wolf, M.E.1    Maydan, D.E.2    Chen, D.-K.3
  • 53
    • 0024935630
    • More iteration space tiling
    • ACM Press, New York, NY
    • WOLFE, M. 1989b. More iteration space tiling. In Proceedings of Supercomputing '89. ACM Press, New York, NY, 655-664.
    • (1989) Proceedings of Supercomputing '89 , pp. 655-664
    • Wolfe, M.1
  • 55
    • 0030129806
    • The MIPS R10000 superscalar microprocessor: Emphasizing concurrency and latency-hiding techniques to efficiently run large, real-world applications
    • Apr.
    • YEAGER, K. C. 1996. The MIPS R10000 superscalar microprocessor: emphasizing concurrency and latency-hiding techniques to efficiently run large, real-world applications. IEEE Micro 16, 2 (Apr.), 28-40.
    • (1996) IEEE Micro , vol.16 , Issue.2 , pp. 28-40
    • Yeager, K.C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.