-
1
-
-
0004318530
-
-
Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana
-
ABU-SUFAH, W. 1979. Improving the performance of virtual memory computers. Ph.D. thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana.
-
(1979)
Improving the Performance of Virtual Memory Computers
-
-
Abu-Sufah, W.1
-
2
-
-
84976766536
-
Scanning polyhedra with DO loops
-
Williamsburg, VA ACM Press, New York, NY
-
ANCOURT, C. AND IRIGOIN, F. 1991. Scanning polyhedra with DO loops. In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (Williamsburg, VA). ACM Press, New York, NY, 39-50.
-
(1991)
Proceedings of the 3rd ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming
, pp. 39-50
-
-
Ancourt, C.1
Irigoin, F.2
-
3
-
-
0003706460
-
-
SIAM Press, Philadelphia, PA
-
ANDERSON, E., BAI, Z., BISCHOF, C., DEMMEL, J., DONGARRA, J., DUCROZ, J., GREENBAUM, A., HAMMARLING, S., MCKENNEY, A., OSTROUCHOV, S., AND SORENSEN, D. 1988. LAPACK Users' Guide, 2nd ed. SIAM Press, Philadelphia, PA.
-
(1988)
LAPACK Users' Guide, 2nd Ed.
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Demmel, J.4
Dongarra, J.5
Ducroz, J.6
Greenbaum, A.7
Hammarling, S.8
McKenney, A.9
Ostrouchov, S.10
Sorensen, D.11
-
6
-
-
0038835469
-
-
Technical report TR-94-42, Leiden University, Department of Mathematics & Computer Science, Leiden, The Netherlands
-
BIK, A. AND WIJSHOFF, H. 1994. Implementation of Fourier-Motzkin elimination. Technical report TR-94-42, Leiden University, Department of Mathematics & Computer Science, Leiden, The Netherlands.
-
(1994)
Implementation of Fourier-Motzkin Elimination
-
-
Bik, A.1
Wijshoff, H.2
-
7
-
-
0028591436
-
(pen)-ultimate tiling?
-
IEEE Computer Society Press, Silver Spring, MD
-
BOULET, P., DARTE, A., RISSET, T., AND ROBERT, Y. 1994. (pen)-ultimate tiling? In Proceedings of the Scalable High-Performance Computing Conference. IEEE Computer Society Press, Silver Spring, MD, 568-576.
-
(1994)
Proceedings of the Scalable High-Performance Computing Conference
, pp. 568-576
-
-
Boulet, P.1
Darte, A.2
Risset, T.3
Robert, Y.4
-
8
-
-
0025447908
-
Improving register allocation for subscripted variables
-
PLDI-90. ACM Press, New York, NY
-
CALLAHAN, D., CARR, S., AND KENNEDY, K. 1990. Improving register allocation for subscripted variables. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-90). ACM Press, New York, NY, 53-65.
-
(1990)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 53-65
-
-
Callahan, D.1
Carr, S.2
Kennedy, K.3
-
9
-
-
0000493064
-
Estimating interlock and improving balance for pipelined architectures
-
Aug.
-
CALLAHAN, D., COCKE, J., AND KENNEDY, K. 1988. Estimating interlock and improving balance for pipelined architectures. J. Parallel Distrib. Comput. 5, 4 (Aug.), 334-358.
-
(1988)
J. Parallel Distrib. Comput.
, vol.5
, Issue.4
, pp. 334-358
-
-
Callahan, D.1
Cocke, J.2
Kennedy, K.3
-
10
-
-
0012951882
-
-
Ph.D. thesis, Dept. of Computer Science, Rice University, Houston, TX
-
CARR, S. 1992. Memory-hierarchy management. Ph.D. thesis, Dept. of Computer Science, Rice University, Houston, TX.
-
(1992)
Memory-hierarchy Management
-
-
Carr, S.1
-
11
-
-
0029749714
-
Combining optimization for cache and instruction-level parallelism
-
PACT'96. IEEE Computer Society Press, Boston, MA
-
CARR, S. 1996. Combining optimization for cache and instruction-level parallelism. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT'96). IEEE Computer Society Press, Boston, MA, 238-247.
-
(1996)
Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
, pp. 238-247
-
-
Carr, S.1
-
12
-
-
0031380928
-
Unroll-and-jam using uniformly generated sets
-
MICRO-97. IEEE Computer Society Press, Los Alamitos, CA
-
CARR, S. AND GUAN, Y. 1997. Unroll-and-jam using uniformly generated sets. In Proceedings of the 30th Annual IEEE / ACM International Symposium on Microarchitecture (MICRO-97). IEEE Computer Society Press, Los Alamitos, CA, 349-357.
-
(1997)
Proceedings of the 30th Annual IEEE / ACM International Symposium on Microarchitecture
, pp. 349-357
-
-
Carr, S.1
Guan, Y.2
-
13
-
-
0028549474
-
Improving the ratio of memory operations to floating-point operations in loops
-
Nov.
-
CARR, S. AND KENNEDY, K. 1994a. Improving the ratio of memory operations to floating-point operations in loops. ACM Trans. Program. Lang. Syst. 16, 6 (Nov.), 1768-1810.
-
(1994)
ACM Trans. Program. Lang. Syst.
, vol.16
, Issue.6
, pp. 1768-1810
-
-
Carr, S.1
Kennedy, K.2
-
14
-
-
0028277074
-
Scalar replacement in the presence of conditional control flow
-
Jan.
-
CARR, S. AND KENNEDY, K. 1994b. Scalar replacement in the presence of conditional control flow. Software Pract. Exp. 24, 1 (Jan.), 51-77.
-
(1994)
Software Pract. Exp.
, vol.24
, Issue.1
, pp. 51-77
-
-
Carr, S.1
Kennedy, K.2
-
15
-
-
84976831704
-
Compiler optimizations for improving data locality
-
ASPLOS-VI, San Jose, CA.
-
CARR, S., MCKINLEY, K., AND TSENG, C.-W. 1994. Compiler optimizations for improving data locality. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI, San Jose, CA). 252-262.
-
(1994)
Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 252-262
-
-
Carr, S.1
McKinley, K.2
Tseng, C.-W.3
-
16
-
-
0029235623
-
Hierarchical tiling for improved superscalar performance
-
IPPS'95. IEEE Computer Society Press, Los Alamitos, CA
-
CARTER, L., FERRANTE, J., AND HUMMEL, S. F. 1995. Hierarchical tiling for improved superscalar performance. In Proceedings of the 9th International Symposium on Parallel Processing (IPPS'95). IEEE Computer Society Press, Los Alamitos, CA, 239-245.
-
(1995)
Proceedings of the 9th International Symposium on Parallel Processing
, pp. 239-245
-
-
Carter, L.1
Ferrante, J.2
Hummel, S.F.3
-
17
-
-
84976745804
-
Tile size selection using cache organization and data layout
-
PLDI-95. ACM Press, New York, NY
-
COLEMAN, S. AND MCKINLEY, K. S. 1995. Tile size selection using cache organization and data layout. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-95). ACM Press, New York, NY, 279-290.
-
(1995)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 279-290
-
-
Coleman, S.1
McKinley, K.S.2
-
18
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
DONGARRA, J., CROZ, J. D., HAMMARLING, S., AND DUFF, I. 1990. A set of level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Soft. 16, 1, 1-17.
-
(1990)
ACM Trans. Math. Soft.
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.1
Croz, J.D.2
Hammarling, S.3
Duff, I.4
-
19
-
-
0003929457
-
-
Technical report UT-CS-90-108, Department of Computer Science, University of Tennessee, Knoxville
-
DONGARRA, J. AND SCHREIBER, R. 1990. Automatic blocking of nested loops. Technical report UT-CS-90-108, Department of Computer Science, University of Tennessee, Knoxville.
-
(1990)
Automatic Blocking of Nested Loops
-
-
Dongarra, J.1
Schreiber, R.2
-
20
-
-
0027837036
-
A practical data flow framework for array reference analysis and its use in optimizations
-
PLDI-93. ACM Press, New York, NY
-
DUESTERWALD, E., GUPTA, R., AND SOFFA, M. L. 1993. A practical data flow framework for array reference analysis and its use in optimizations. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-93). ACM Press, New York, NY, 68-77.
-
(1993)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 68-77
-
-
Duesterwald, E.1
Gupta, R.2
Soffa, M.L.3
-
21
-
-
0003455775
-
-
M.S. thesis, Dept. of Computer Science, Rice University, Houston, TX
-
ESSEGHIR, K. 1993. Improving data locality for caches. M.S. thesis, Dept. of Computer Science, Rice University, Houston, TX.
-
(1993)
Improving Data Locality for Caches
-
-
Esseghir, K.1
-
22
-
-
85015240805
-
On estimating and enhancing cache effectiveness
-
Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Springer-Verlag, Santa Clara, CA
-
FERRANTE, J., SARKAR, V., AND THRASH, W. 1991. On estimating and enhancing cache effectiveness. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Lecture Notes in Computer Science, vol. 589. Springer-Verlag, Santa Clara, CA, 328-343.
-
(1991)
Lecture Notes in Computer Science
, vol.589
, pp. 328-343
-
-
Ferrante, J.1
Sarkar, V.2
Thrash, W.3
-
23
-
-
84972622535
-
Impact of hierarchical memory systems on linear algebra algorithm design
-
Spring
-
GALLIVAN, K., JALBY, W., AND MEIER, U. 1988. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2, 1 (Spring), 12-48.
-
(1988)
Int. J. Supercomput. Appl.
, vol.2
, Issue.1
, pp. 12-48
-
-
Gallivan, K.1
Jalby, W.2
Meier, U.3
-
24
-
-
84862940593
-
Strategies for cache and local memory management by global program transformations
-
ICS-87, Athens, Greece. Springer-Verlag, Berlin, Germany
-
GANNON, D., JALBY, W., AND GALLIVAN, K. 1987. Strategies for cache and local memory management by global program transformations. In Proceedings of the 1st International Conference on Supercomputing (ICS-87, Athens, Greece). Springer-Verlag, Berlin, Germany.
-
(1987)
Proceedings of the 1st International Conference on Supercomputing
-
-
Gannon, D.1
Jalby, W.2
Gallivan, K.3
-
25
-
-
0003783762
-
-
Ph.D. thesis, Dept. of Computer Science, Universitat Politècnica de Catalunya, Barcelona, Spain.
-
JIMÉNEZ, M. 1999. Multilevel tiling for non-rectangular iteration spaces. Ph.D. thesis, Dept. of Computer Science, Universitat Politècnica de Catalunya, Barcelona, Spain. (Available online at http://www.ac.upc.es/recerca/reports.)
-
(1999)
Multilevel Tiling for Non-rectangular Iteration Spaces
-
-
Jiménez, M.1
-
26
-
-
0034581377
-
On the performance of hand vs. automatically optimized numerical codes
-
HPCA-6. IEEE Computer Society Press, Los Alamitos, CA
-
JIMÉNEZ, M., LLABERÍA, J., AND FERNÁNDEZ, A. 2000. On the performance of hand vs. automatically optimized numerical codes. In Proceedings of the 6th International Symposium on High-Performance Computer Architecture (HPCA-6). IEEE Computer Society Press, Los Alamitos, CA, 183-194.
-
(2000)
Proceedings of the 6th International Symposium on High-Performance Computer Architecture
, pp. 183-194
-
-
Jiménez, M.1
Llabería, J.2
Fernández, A.3
-
27
-
-
0031605328
-
Performance evaluation of tiling for the register level
-
HPCA-4. IEEE Computer Society Press, Los Alamitos, CA
-
JIMÉNEZ, M., LLABERÍA, J. M., AND FERNÁNDEZ, A. 1998. Performance evaluation of tiling for the register level. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA-4). IEEE Computer Society Press, Los Alamitos, CA, 254-265.
-
(1998)
Proceedings of the 4th International Symposium on High-Performance Computer Architecture
, pp. 254-265
-
-
Jiménez, M.1
Llabería, J.M.2
Fernández, A.3
-
28
-
-
0028459839
-
DXML: A high-performance scientific subroutine library
-
Summer
-
KAMATH, C., HO, R., AND MANLEY, D. P. 1994. DXML: A high-performance scientific subroutine library. Dig. Tech. J. 6, 3 (Summer), 44-56.
-
(1994)
Dig. Tech. J.
, vol.6
, Issue.3
, pp. 44-56
-
-
Kamath, C.1
Ho, R.2
Manley, D.P.3
-
29
-
-
0032025292
-
Locality optimization algorithms for compilation of out-of-core codes
-
Mar.
-
KANDEMIR, M., CHOUDHARY, A., RAMANUJAM, J., AND KANDASWAMY, M. 1998. Locality optimization algorithms for compilation of out-of-core codes. J. Inf. Sci. Eng. 14, 1 (Mar.), 107-138.
-
(1998)
J. Inf. Sci. Eng.
, vol.14
, Issue.1
, pp. 107-138
-
-
Kandemir, M.1
Choudhary, A.2
Ramanujam, J.3
Kandaswamy, M.4
-
30
-
-
0030662867
-
A compiler algorithm for optimizing locality in loop nests
-
ACM Press, New York, NY
-
KANDEMIR, M., RAMANUJAM, J., AND CHOUDHARY, A. 1997. A compiler algorithm for optimizing locality in loop nests. In Proceedings of the 11th International Conference on Supercomputing (ICS-97). ACM Press, New York, NY, 269-276.
-
(1997)
Proceedings of the 11th International Conference on Supercomputing (ICS-97)
, pp. 269-276
-
-
Kandemir, M.1
Ramanujam, J.2
Choudhary, A.3
-
31
-
-
0003363567
-
The 21264: A superscalar alpha processor with out-of-order execution
-
KELLER, J. 1996. The 21264: a superscalar alpha processor with out-of-order execution. Presentation at 1996 IEEE Microprocessor Forum. Slides available online at www.microprocessor.sscc.ru/alpha-21264/a264up1.html.
-
(1996)
1996 IEEE Microprocessor Forum
-
-
Keller, J.1
-
33
-
-
0030685988
-
Data-centric multi-level blocking
-
PLDI-97. ACM Press, New York, NY
-
KODUKULA, I., AHMED, N., AND PINGALI, K. 1997. Data-centric multi-level blocking. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-97). ACM Press, New York, NY, 346-357.
-
(1997)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 346-357
-
-
Kodukula, I.1
Ahmed, N.2
Pingali, K.3
-
34
-
-
85027612984
-
Dependence graphs and compiler optimization
-
POPL'81.
-
KUCK, D. J., KUHN, R. H., PADUA, D. A., LEASURE, B., AND WOLFE, M. 1981. Dependence graphs and compiler optimization. In Proceedings of the 8th Symposium on the Principles of Programming Languages (POPL'81). 207-218.
-
(1981)
Proceedings of the 8th Symposium on the Principles of Programming Languages
, pp. 207-218
-
-
Kuck, D.J.1
Kuhn, R.H.2
Padua, D.A.3
Leasure, B.4
Wolfe, M.5
-
36
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV)
-
LAM, M., ROTHBERG, E., AND WOLF, M. 1991. The cache performance and optimizations of blocked algorithms. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV). Comput. Architect. News 19, 2, 63-74.
-
(1991)
Comput. Architect. News
, vol.19
, Issue.2
, pp. 63-74
-
-
Lam, M.1
Rothberg, E.2
Wolf, M.3
-
39
-
-
0027694019
-
Access normalization: Loop restructuring for NUMA computers
-
Nov.
-
LI, W. AND PINGALI, K. 1993. Access normalization: loop restructuring for NUMA computers. ACM Trans. Comput. Syst. 11, 4 (Nov.), 353-375.
-
(1993)
ACM Trans. Comput. Syst.
, vol.11
, Issue.4
, pp. 353-375
-
-
Li, W.1
Pingali, K.2
-
40
-
-
84976845278
-
Efficient and exact data dependence analysis
-
PLDI-91, Toronto, Ontario, Canada. ACM Press, New York, NY
-
MAYDAN, D. E., HENNESSY, J. L., AND LAM, M. S. 1991. Efficient and exact data dependence analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-91, Toronto, Ontario, Canada). ACM Press, New York, NY, 1-14.
-
(1991)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 1-14
-
-
Maydan, D.E.1
Hennessy, J.L.2
Lam, M.S.3
-
41
-
-
0030190854
-
Improving data locality with loop transformations
-
July
-
MCKINLEY, K. S., CARR, S., AND TSENG, C.-W. 1996. Improving data locality with loop transformations. ACM Trans. Program. Lang. Syst. 18, 4 (July), 424-453.
-
(1996)
ACM Trans. Program. Lang. Syst.
, vol.18
, Issue.4
, pp. 424-453
-
-
McKinley, K.S.1
Carr, S.2
Tseng, C.-W.3
-
42
-
-
0032308685
-
Quantifying the multi-level nature of tiling interactions
-
MITCHELL, N., HOGSTEDT, K., CARTER, L., AND FERRANTE, J. 1998. Quantifying the multi-level nature of tiling interactions. J. Parallel Program. 26, 6, 641-670.
-
(1998)
J. Parallel Program.
, vol.26
, Issue.6
, pp. 641-670
-
-
Mitchell, N.1
Hogstedt, K.2
Carter, L.3
Ferrante, J.4
-
43
-
-
0040613487
-
-
Technical report TR-98-671, Computer Science Department, University of Southern California, Los Angeles
-
MOON, S. AND SAAVEDRA, R. 1998. Hyperblocking: a data reorganization method to eliminate cache conflicts in tiled loop nests. Technical report TR-98-671, Computer Science Department, University of Southern California, Los Angeles.
-
(1998)
Hyperblocking: A Data Reorganization Method to Eliminate Cache Conflicts in Tiled Loop Nests
-
-
Moon, S.1
Saavedra, R.2
-
44
-
-
0008568517
-
Multilevel orthogonal blocking for dense linear algebra computations
-
Jan. 10-14
-
NAVARRO, J., JUAN, A., VALERO, M., LLABERÍA, J., AND LANG, T. 1993. Multilevel orthogonal blocking for dense linear algebra computations. IEEE Comput. Soc. TC Comput. Arch. Newsl. Jan. 10-14.
-
(1993)
IEEE Comput. Soc. TC Comput. Arch. Newsl.
-
-
Navarro, J.1
Juan, A.2
Valero, M.3
Llabería, J.4
Lang, T.5
-
47
-
-
0027764718
-
To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts
-
IEEE Computer Society Press, Silver Spring, MD
-
TEMAM, O., GRANSTON, E. D., AND JALBY, W. 1993. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing '93. IEEE Computer Society Press, Silver Spring, MD, 410-419.
-
(1993)
Proceedings of Supercomputing '93
, pp. 410-419
-
-
Temam, O.1
Granston, E.D.2
Jalby, W.3
-
48
-
-
0003553286
-
-
Ph.D. thesis, Computer Systems Laboratory, Stanford University, Stanford, CA
-
WOLF, M. E. 1992. Improving locality and parallelism in nested loops. Ph.D. thesis, Computer Systems Laboratory, Stanford University, Stanford, CA.
-
(1992)
Improving Locality and Parallelism in Nested Loops
-
-
Wolf, M.E.1
-
49
-
-
84976827033
-
A data locality optimizing algorithm
-
PLDI-91, Toronto, Ontario, Canada. ACM Press, New York, NY
-
WOLF, M. E. AND LAM, M. S. 1991a. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI-91, Toronto, Ontario, Canada). ACM Press, New York, NY, 30-44.
-
(1991)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 30-44
-
-
Wolf, M.E.1
Lam, M.S.2
-
50
-
-
0026232450
-
A loop transformation theory and an algorithm to maximize parallelism
-
Oct.
-
WOLF, M. E. AND LAM, M. S. 1991b. A loop transformation theory and an algorithm to maximize parallelism. Trans. Parallel Distrib. Syst. 2, 4 (Oct.), 452-471.
-
(1991)
Trans. Parallel Distrib. Syst.
, vol.2
, Issue.4
, pp. 452-471
-
-
Wolf, M.E.1
Lam, M.S.2
-
51
-
-
0030379246
-
Combining loop transformations considering caches and scheduling
-
MICRO-96, Paris, France. IEEE Computer Society Press, Los Alamitos, CA
-
WOLF, M. E., MAYDAN, D. E., AND CHEN, D.-K. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual International Symposium on Microarchitecture (MICRO-96, Paris, France). IEEE Computer Society Press, Los Alamitos, CA, 274-286.
-
(1996)
Proceedings of the 29th Annual International Symposium on Microarchitecture
, pp. 274-286
-
-
Wolf, M.E.1
Maydan, D.E.2
Chen, D.-K.3
-
53
-
-
0024935630
-
More iteration space tiling
-
ACM Press, New York, NY
-
WOLFE, M. 1989b. More iteration space tiling. In Proceedings of Supercomputing '89. ACM Press, New York, NY, 655-664.
-
(1989)
Proceedings of Supercomputing '89
, pp. 655-664
-
-
Wolfe, M.1
-
55
-
-
0030129806
-
The MIPS R10000 superscalar microprocessor: Emphasizing concurrency and latency-hiding techniques to efficiently run large, real-world applications
-
Apr.
-
YEAGER, K. C. 1996. The MIPS R10000 superscalar microprocessor: emphasizing concurrency and latency-hiding techniques to efficiently run large, real-world applications. IEEE Micro 16, 2 (Apr.), 28-40.
-
(1996)
IEEE Micro
, vol.16
, Issue.2
, pp. 28-40
-
-
Yeager, K.C.1