SCOPUS Information Retrieval Platform

Volume 16, Issue 1, 1998, Pages 55-92

Tolerating Latency in Multiprocessors through Compiler-Inserted Prefetching

a Carnegie Mellon University (United States)

Author keywords

B.3.2 Memory Structures : Design Styles cache memories; Compiler optimization; D.3.4 Programming Languages : Processors compilers; Design; Experimentation; Optimization; Performance; Prefetching

Indexed keywords

ALGORITHMS; BUFFER STORAGE; COMPUTER ARCHITECTURE; COMPUTER SOFTWARE; FAULT TOLERANT COMPUTER SYSTEMS; MULTIPROCESSING SYSTEMS; OPTIMIZATION; PROGRAM PROCESSORS;

COMPILER INSERTED PREFETCHING; TOLERATING LATENCY;

PROGRAM COMPILERS;

EID: 0031988272 PISSN: 07342071 EISSN: None Source Type: Journal
DOI: 10.1145/273011.273021 Document Type: Article

Times cited : (52)

References (43)

1
- 85072516160
- Automatic program transformations for virtual memory computers
- ABU-SUFAH, W., KUCK, D. J., AND LAWRIE, D. H. 1979. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference. 969-974.
- (1979) Proceedings of the 1979 National Computer Conference , pp. 969-974
- Abu-Sufah, W.¹ Kuck, D.J.² Lawrie, D.H.³

2
- 0025433676
- Weak ordering - A new definition
- May
- ADVE, S. AND HILL, M. 1990. Weak ordering - A new definition. In Proceedings of the 17th Annual International Symposium on Computer Architecture (May). 2-14.
- (1990) Proceedings of the 17th Annual International Symposium on Computer Architecture , pp. 2-14
- Adve, S.¹ Hill, M.²

3
- 0027802136
- Communication optimization and code generation for distributed memory machines
- June
- AMARASINGHE, S. P. AND LAM, M. S. 1993. Communication optimization and code generation for distributed memory machines. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation (June). 126-138.
- (1993) Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation , pp. 126-138
- Amarasinghe, S.P.¹ Lam, M.S.²

4
- 0027870804
- Global optimizations for parallelism and locality on scalable parallel machines
- June. ACM, New York
- ANDERSON, J. M. AND LAM, M. S. 1993. Global optimizations for parallelism and locality on scalable parallel machines. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation (June). ACM, New York, 112-125.
- (1993) Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation , pp. 112-125
- Anderson, J.M.¹ Lam, M.S.²

5
- 0026267802
- An effective on-chip preloading scheme to reduce data access penalty
- BAER, J.-L. AND CHEN, T.-F. 1991. An effective on-chip preloading scheme to reduce data access penalty. In Proceedings of Supercomputing '91.
- (1991) Proceedings of Supercomputing '91
- Baer, J.-L.¹ Chen, T.-F.²

6
- 0025548641
- Data cache performance of supercomputer applications
- CALLAHAN, D. AND PORTERFIELD, A. 1990. Data cache performance of supercomputer applications. In Proceedings of Supercomputing '90. 564-572.
- (1990) Proceedings of Supercomputing '90 , pp. 564-572
- Callahan, D.¹ Porterfield, A.²

7
- 0026138044
- Software prefetching
- April. ACM, New York
- CALLAHAN, D., KENNEDY, K., AND PORTERFIELD, A. 1991. Software prefetching. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (April). ACM, New York, 40-52.
- (1991) Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 40-52
- Callahan, D.¹ Kennedy, K.² Porterfield, A.³

8
- 0023531324
- A VLIW architecture for a trace scheduling compiler
- Oct. ACM, New York
- COLWELL, R. P., NIX, R. P., O'DONNELL, J. J., PAPWORTH, D. B., AND RODMAN, P. K. 1987. A VLIW architecture for a trace scheduling compiler. In Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems (Oct.). ACM, New York, 180-192.
- (1987) Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 180-192
- Colwell, R.P.¹ Nix, R.P.² O'Donnell, J.J.³ Papworth, D.B.⁴ Rodman, P.K.⁵

9
- 0004269807
- CONVEX COMPUTER Convex Computer Corp.
- CONVEX COMPUTER. 1994. Convex Exemplar Architecture. Convex Computer Corp.
- (1994) Convex Exemplar Architecture

10
- 0027574855
- A methodology for procedure cloning
- April
- COOPER, K., HALL, M., AND KENNEDY, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (April).
- (1993) Comput. Lang. , vol.19 , Issue.2
- Cooper, K.¹ Hall, M.² Kennedy, K.³

11
- 0024868691
- Overlapped loop support in the Cydra 5
- April. ACM, New York
- DEHNERT, J. C., HSU, P. Y.-T., AND BRATT, J. P. 1989. Overlapped loop support in the Cydra 5. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III) (April). ACM, New York, 26-38.
- (1989) Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III) , pp. 26-38
- Dehnert, J.C.¹ Hsu, P.Y.-T.² Bratt, J.P.³

12
- 0023963509
- Synchronization, coherence, and event ordering in multiprocessors
- Feb.
- DUBOIS, M., SCHEURICH, C., AND BRIGGS, F. A. 1988. Synchronization, coherence, and event ordering in multiprocessors. Computer 21, 2 (Feb.), 9-21.
- (1988) Computer , vol.21 , Issue.2 , pp. 9-21
- Dubois, M.¹ Scheurich, C.² Briggs, F.A.³

13
- 0001577034
- Eliminating false sharing
- Aug.
- EGGERS, S. J. AND JEREMIASSEN, T. E. 1991. Eliminating false sharing. In Proceedings of the 1991 International Conference on Parallel Processing. Vol. 1 (Aug.). 377-381.
- (1991) Proceedings of the 1991 International Conference on Parallel Processing , vol.1 , pp. 377-381
- Eggers, S.J.¹ Jeremiassen, T.E.²

14
- 0347662803
- The impact of hierarchical memory systems on linear algebra algorithm design
- Univ. of Illinois, Urbana-Champaign, Ill.
- GALLIVAN, K., JALBY, W., MEIER, U., AND SAMEH, A. 1987. The impact of hierarchical memory systems on linear algebra algorithm design. Tech. Rep. UIUCSRD 625, Univ. of Illinois, Urbana-Champaign, Ill.
- (1987) Tech. Rep. UIUCSRD 625
- Gallivan, K.¹ Jalby, W.² Meier, U.³ Sameh, A.⁴

15
- 0347662804
- The influence of memory hierarchy on algorithm organization: Programming FFTs on a vector multiprocessor
- MIT Press, Cambridge, Mass.
- GANNON, D. AND JALBY, W. 1987. The influence of memory hierarchy on algorithm organization: Programming FFTs on a vector multiprocessor. In The Characteristics of Parallel Algorithms. MIT Press, Cambridge, Mass.
- (1987) The Characteristics of Parallel Algorithms
- Gannon, D.¹ Jalby, W.²

16
- 0026137114
- Performance evaluation of memory consistency models for shared-memory multiprocessors
- April. ACM, New York
- GHARACHORLOO, K., GUPTA, A., AND HENNESSY, J. 1991. Performance evaluation of memory consistency models for shared-memory multiprocessors. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (April). ACM, New York, 245-257.
- (1991) Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 245-257
- Gharachorloo, K.¹ Gupta, A.² Hennessy, J.³

17
- 0025433762
- Memory consistency and event ordering in scalable shared-memory multiprocessors
- May
- GHARACHORLOO, K., LENOSKI, D., LAUDON, J., GIBBONS, P., GUPTA, A., AND HENNESSY, J. 1990. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the 17th Annual International Symposium on Computer Architecture (May). 15-26.
- (1990) Proceedings of the 17th Annual International Symposium on Computer Architecture , pp. 15-26
- Gharachorloo, K.¹ Lenoski, D.² Laudon, J.³ Gibbons, P.⁴ Gupta, A.⁵ Hennessy, J.⁶

18
- 0011603313
- Tango introduction and tutorial
- Stanford Univ., Palo Alto, Calif.
- GOLDSCHMIDT, S. R. AND DAVIS, H. 1990. Tango introduction and tutorial. Tech. Rep. CSL-TR-90-410, Stanford Univ., Palo Alto, Calif.
- (1990) Tech. Rep. CSL-TR-90-410
- Goldschmidt, S.R.¹ Davis, H.²

19
- 0004236492
- Johns Hopkins University Press, Baltimore, Md.
- GOLUB, G. H. AND VAN LOAN, C. F. 1989. Matrix Computations. Johns Hopkins University Press, Baltimore, Md.
- (1989) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

20
- 0025146693
- Compiler-directed data prefetching in multiprocessors with memory hierarchies
- GORNISH, E., GRANSTON, E., AND VEIDENBAUM, A. 1990. Compiler-directed data prefetching in multiprocessors with memory hierarchies. In Proceedings of the International Conference on Supercomputing.
- (1990) Proceedings of the International Conference on Supercomputing
- Gornish, E.¹ Granston, E.² Veidenbaum, A.³

21
- 0008160312
- Master's thesis, Univ. of Illinois, Urbana-Champaign, Ill.
- GORNISH, E. H. 1989. Compile time analysis for data prefetching. Master's thesis, Univ. of Illinois, Urbana-Champaign, Ill.
- (1989) Compile Time Analysis for Data Prefetching
- Gornish, E.H.¹

22
- 0026158290
- Comparative evaluation of latency reducing and tolerating techniques
- May
- GUPTA, A., HENNESSY, J., GHARACHORLOO, K., MOWRY, T., AND WEBER, W.-D. 1991. Comparative evaluation of latency reducing and tolerating techniques. In Proceedings of the 18th Annual International Symposium on Computer Architecture (May). 254-263.
- (1991) Proceedings of the 18th Annual International Symposium on Computer Architecture , pp. 254-263
- Gupta, A.¹ Hennessy, J.² Gharachorloo, K.³ Mowry, T.⁴ Weber, W.-D.⁵

23
- 0026153646
- Architecture for software-controlled data prefetching
- May
- KLAIBER, A. C. AND LEVY, H. M. 1991. Architecture for software-controlled data prefetching. In Proceedings of the 18th Annual International Symposium on Computer Architecture (May). 43-63.
- (1991) Proceedings of the 18th Annual International Symposium on Computer Architecture , pp. 43-63
- Klaiber, A.C.¹ Levy, H.M.²

24
- 0028732616
- Cache performance in vector supercomputers
- KONTOTHANASSIS, L., SUGUMAR, R., FAANES, G., SMITH, J., AND SCOTT, M. 1994. Cache performance in vector supercomputers. In Proceedings of Supercomputing '94. 255-264.
- (1994) Proceedings of Supercomputing '94 , pp. 255-264
- Kontothanassis, L.¹ Sugumar, R.² Faanes, G.³ Smith, J.⁴ Scott, M.⁵

25
- 0019892368
- Lockup-free instruction fetch/prefetch cache organization
- KROFT, D. 1981. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture. 81-85.
- (1981) Proceedings of the 8th Annual International Symposium on Computer Architecture , pp. 81-85
- Kroft, D.¹

26
- 0018518477
- How to make a multiprocessor computer that correctly executes multiprocess programs
- Sept.
- LAMPORT, L. 1979. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28, 9 (Sept.), 241-248.
- (1979) IEEE Trans. Comput. , vol.C-28 , Issue.9 , pp. 241-248
- Lamport, L.¹

27
- 0030685588
- The SGI Origin: A ccNUMA highly scalable server
- June
- LAUDON, J. AND LENOSKI, D. 1997. The SGI Origin: A ccNUMA highly scalable server. In Proceedings of the 24th International Symposium on Computer Architecture (June). 241-251.
- (1997) Proceedings of the 24th International Symposium on Computer Architecture , pp. 241-251
- Laudon, J.¹ Lenoski, D.²

28
- 0347031952
- Ph.D. thesis, Dept. of Computer Science, Univ. of Illinois, Urbana-Champaign, Ill.
- LEE, R. L. 1987. The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors. Ph.D. thesis, Dept. of Computer Science, Univ. of Illinois, Urbana-Champaign, Ill.
- (1987) The Effectiveness of Caches and Data Prefetch Buffers in Large-scale Shared Memory Multiprocessors
- Lee, R.L.¹

29
- 0026839484
- The Stanford DASH multiprocessor
- Mar.
- LENOSKI, D., GHARACHORLOO, K., LAUDON, J., GUPTA, A., HENNESSY, J., HOROWITZ, M., AND LAM, M. 1992. The Stanford DASH multiprocessor. IEEE Comput. 25, 3 (Mar.), 63-79.
- (1992) IEEE Comput. , vol.25 , Issue.3 , pp. 63-79
- Lenoski, D.¹ Gharachorloo, K.² Laudon, J.³ Gupta, A.⁴ Hennessy, J.⁵ Horowitz, M.⁶ Lam, M.⁷

30
- 0030259355
- Compiler-based prefetching for recursive data structures
- Oct. ACM, New York
- LUK, C.-K. AND MOWRY, T. C. 1996. Compiler-based prefetching for recursive data structures. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (Oct.). ACM, New York, 222-233.
- (1996) Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 222-233
- Luk, C.-K.¹ Mowry, T.C.²

31
- 0003979521
- Holt, Rinehart and Winston, Inc.
- LUSK, E., OVERBEEK, R., ET AL. 1987. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc.
- (1987) Portable Programs for Parallel Processors
- Lusk, E.¹ Overbeek, R.²

32
- 84945709131
- The organization of matrices and matrix operations in a paged multiprogramming environment
- MCKELLAR, A. C. AND COFFMAN, E. G. 1969. The organization of matrices and matrix operations in a paged multiprogramming environment. Commun. ACM 12, 3, 153-165.
- (1969) Commun. ACM , vol.12 , Issue.3 , pp. 153-165
- McKellar, A.C.¹ Coffman, E.G.²

33
- 0002031606
- Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
- MOWRY, T. AND GUPTA, A. 1991. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parallel Distrib. Comput. 12, 2, 87-106.
- (1991) J. Parallel Distrib. Comput. , vol.12 , Issue.2 , pp. 87-106
- Mowry, T.¹ Gupta, A.²

34
- 0004033521
- Ph.D. thesis, Stanford Univ., Palo Alto, Calif.
- MOWRY, T. C. 1994. Tolerating latency through software-controlled data prefetching. Ph.D. thesis, Stanford Univ., Palo Alto, Calif.
- (1994) Tolerating Latency Through Software-controlled Data Prefetching
- Mowry, T.C.¹

35
- 0026918402
- Design and evaluation of a compiler algorithm for prefetching
- Oct.
- MOWRY, T. C., LAM, M. S., AND GUPTA, A. 1992. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. Vol. 27 (Oct.). 62-73.
- (1992) Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems , vol.27 , pp. 62-73
- Mowry, T.C.¹ Lam, M.S.² Gupta, A.³

36
- 0003690936
- Ph.D. thesis, Dept. of Computer Science, Rice Univ., Houston, Tex.
- PORTERFIELD, A. K. 1989. Software methods for improvement of cache performance on supercomputer applications. Ph.D. thesis, Dept. of Computer Science, Rice Univ., Houston, Tex.
- (1989) Software Methods for Improvement of Cache Performance on Supercomputer Applications
- Porterfield, A.K.¹

37
- 0003897840
- Splash: Stanford parallel applications for shared memory
- Stanford Univ., Palo Alto, Calif.
- SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1991. Splash: Stanford parallel applications for shared memory. Tech. Rep. CSL-TR-91-469, Stanford Univ., Palo Alto, Calif.
- (1991) Tech. Rep. CSL-TR-91-469
- Singh, J.P.¹ Weber, W.-D.² Gupta, A.³

38
- 0025440459
- A survey of cache coherence schemes for multiprocessors
- June
- STENSTROM, P. 1990. A survey of cache coherence schemes for multiprocessors. IEEE Comput. 23, 6 (June), 12-24.
- (1990) IEEE Comput. , vol.23 , Issue.6 , pp. 12-24
- Stenstrom, P.¹

39
- 0026987137
- Sharlit: A tool for building optimizers
- ACM, New York
- TJIANG, S. W. K. AND HENNESSY, J. L. 1992. Sharlit: A tool for building optimizers. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York.
- (1992) Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation
- Tjiang, S.W.K.¹ Hennessy, J.L.²

40
- 0003333239
- Shared data placement optimizations to reduce multiprocessor cache miss rates
- Aug.
- TORRELLAS, J., LAM, M. S., AND HENNESSY, J. L. 1990. Shared data placement optimizations to reduce multiprocessor cache miss rates. In Proceedings of the 1990 International Conference on Parallel Processing. Vol. 2 (Aug.). 266-270.
- (1990) Proceedings of the 1990 International Conference on Parallel Processing , vol.2 , pp. 266-270
- Torrellas, J.¹ Lam, M.S.² Hennessy, J.L.³

41
- 0027316717
- Limitations of cache prefetching on a bus-based multiprocessor
- May
- TULLSEN, D. M. AND EGGERS, S. J. 1993. Limitations of cache prefetching on a bus-based multiprocessor. In Proceedings of the 20th Annual International Symposium on Computer Architecture (May). 278-288.
- (1993) Proceedings of the 20th Annual International Symposium on Computer Architecture , pp. 278-288
- Tullsen, D.M.¹ Eggers, S.J.²

42
- 0004376335
- Ph.D. thesis, Stanford Univ., Palo Alto, Calif.
- WEBER, W.-D. 1993. Scalable directories for cache-coherent shared-memory multiprocessors. Ph.D. thesis, Stanford Univ., Palo Alto, Calif.
- (1993) Scalable Directories for Cache-coherent Shared-memory Multiprocessors
- Weber, W.-D.¹

43
- 84976827033
- A data locality optimizing algorithm
- June. ACM, New York
- WOLF, M. E. AND LAM, M. S. 1991. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation (June). ACM, New York, 30-44.
- (1991) Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation , pp. 30-44
- Wolf, M.E.¹ Lam, M.S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.