[1] M. S. Lam, E. E. Rothberg, and M. E. Wolf, "The cache performance and optimizations of blocked algorithms", in Proc. Fourth Int'l Conf. on Architectural Support for Prog. Lang. and Operating Systems, pp. 63-74, Apr. 1991.
[2] O. Temam, E. D. Granston, and W. Jalby, "To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts", in Proceedings of Supercomputing '93, (Los Alamitos, California), pp. 410-419, IEEE Computer Society Press, Nov. 1993.
[3] D. Gannon and W. Jalby, "The influence of memory hierarchy on algorithm organization: Programming FFTs on a vector multiprocessor", in The Characteristics of Parallel Programs, MIT Press, 1987.
[4] K. Gallivan, W. Jalby, U. Meier, and A. Sameh, "The impact of hierarchical memory systems on linear algebra design", Tech. Rep. CSRD-625, Center for Supercomputing Research and Development, University of Illinois, Urbana, IL, 1987.
[5] J. W. C. Fu and J. H. Patel, "Data prefetching in multiprocessor vector cache memories", in Proc. 18th Ann. Int'l Symp. Computer Architecture, (Toronto, Canada), pp. 54-63, June 1991.
[6] S. G. Abraham and D. E. Hudak, "Compile-time partitioning of iterative parallel loops to reduce cache coherence traffic", J. Parallel and Distributed Computing, vol. 2, pp. 318-328, 1991.
[7] J.-L. Baer and T.-F. Chen, "An effective on-chip preloading scheme to reduce data access penalty", in Proceedings of Supercomputing '91, pp. 176-186, Nov. 1991.
[8] T. C. Mowry, M. S. Lam, and A. Gupta, "Design and evaluation of a compiler algorithm for prefetching", in Proc. Fifth Int'l Conf. on Architectural Support for Prog. Lang. and Operating Systems, pp. 62-73, Oct. 1992.
[9] W. Y. Chen, S. A. Mahlke, P. P. Chang, and W. W. Hwu, "Data access microarchitectures for superscalar processors with compiler-assisted data prefetching", in Proc. 24th Ann. Workshop on Microprogramming and Microarchitectures, (Albuquerque, NM), Nov. 1991.
[10] W. Y. Chen, S. A. Mahlke, W. W. Hwu, T. Kiyohara, and P. P. Chang, "Tolerating data access latency with register preloading", in Proceedings of the 6th International Conference on Supercomputing, July 1992.
[11] P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter, and W. W. Hwu, "IMPACT: An architectural framework for multiple-instruction-issue processors", in Proc. 18th Ann. Int'l Symp. Computer Architecture, (Toronto, Canada), pp. 266-275, June 1991.
[12] W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, "The superblock: An effective technique for VLIW and superscalar compilation", Journal of Supercomputing, vol. 7, pp. 229-248, Jan. 1992.
[13] W. Pugh, "A practical algorithm for exact array dependence analysis", Communications of the ACM, vol. 35, pp. 102-114, Aug. 1992.
[15] M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, K. Lue, S. Orszag, F. Seidl, O. Johnson, G. Swanson, R. Goodrum, and J. Martin, "The PERFECT club benchmarks: Effective performance evaluation of supercomputers", Tech. Rep. CSRD-827, Center for Supercomputing Research and Development, University of Illinois, Urbana, IL, May 1989.
[16] J. W. C. Fu and J. H. Patel, "How to simulate 100 billion references cheaply", Tech. Rep. CRHC-91-30, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, 1991.
[17] R. M. Russell, "The Cray-1 computer system", Communications of the ACM, vol. 21, pp. 63-72, Jan. 1978.