SCOPUS Information Search Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 14, Issue 3, 2003, Pages 307-321

On the parallel execution time of tiled loops

Högstedt, Karin (a); Carter, Larry (b); Ferrante, Jeanne (b)

a AT&T LABS RESEARCH (United States)

b UNIVERSITY OF CALIFORNIA, SAN DIEGO (United States)

Author keywords

Blocking; Compiler optimization; Parallel compilers; Tiling

Indexed keywords

COMPUTER SIMULATION; DIFFERENTIAL EQUATIONS; DYNAMIC PROGRAMMING; INTERPOLATION; LINEAR PROGRAMMING; PROGRAM COMPILERS; RESPONSE TIME (COMPUTER SYSTEMS);

COMPILER OPTIMIZATION; EXECUTION TIME; PARALLEL COMPILERS; TILED LOOPS; TILING;

PARALLEL PROCESSING SYSTEMS;

EID: 0037962984; PISSN: 10459219; EISSN: None; Source Type: Journal
DOI: 10.1109/TPDS.2003.1189587; Document Type: Article

Times cited: 30

References (47)

1
- 0038823316
- A case-study in performance programming: Seismic migration
- G. Almasi, B. Alpern, L. Berman, L. Carter, and D. Hale, "A Case-Study in Performance Programming: Seismic Migration," Proc. Symp. High Performance Computing, Sept. 1991.
- Proc. Symp. High Performance Computing, Sept. 1991
- Almasi, G.¹ Alpern, B.² Berman, L.³ Carter, L.⁴ Hale, D.⁵

2
- 0038146616
- Automatic code distribution
- C. Ancourt and F. Irigoin, "Automatic Code Distribution," Proc. Third Workshop Compilers for Parallel Computers (CPC '92), July 1992.
- Proc. Third Workshop Compilers for Parallel Computers (CPC '92), July 1992
- Ancourt, C.¹ Irigoin, F.²

3
- 70350749986
- Optimal orthogonal tiling
- R. Andonov, S. Rajopadhye, and N. Yanev, "Optimal Orthogonal Tiling," Proc. Europar '98, pp. 480-490, Sept. 1998.
- (1998) Proc. Europar '98 , pp. 480-490
- Andonov, R.¹ Rajopadhye, S.² Yanev, N.³

4
- 0003207812
- Unimodular transformations of double loops
- U. Banerjee, "Unimodular Transformations of Double Loops," Proc. Workshop Programming Languages and Compilers for Parallel Computing, Aug. 1990.
- Proc. Workshop Programming Languages and Compilers for Parallel Computing, Aug. 1990
- Banerjee, U.¹

5
- 4243923745
- Matrix multiply benchmarks
- technical report, Center for Scientific Computing, Dept. of Math., Univ. of Utah; This report is updated frequently
- N.H.F. Beebe, "Matrix Multiply Benchmarks," technical report, Center for Scientific Computing, Dept. of Math., Univ. of Utah, 1990, This report is updated frequently.
- (1990)
- Beebe, N.H.F.¹

6
- 0030661485
- Optimizing matrix multiply using PhiPAC: A portable, high performance, ANSI C coding methodology
- J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel, "Optimizing Matrix Multiply Using PhiPAC: A Portable, High Performance, ANSI C Coding Methodology," Proc. 11th Int'l Conf. Supercomputing (ICS '97), pp. 340-347, July 1997.
- (1997) Proc. 11th Int'l Conf. Supercomputing (ICS '97) , pp. 340-347
- Bilmes, J.¹ Asanovic, K.² Chin, C.-W.³ Demmel, J.⁴

7
- 0028482686
- (Pen)-ultimate tiling?
- P. Boulet, A. Darte, T. Risset, and Y. Robert, "(Pen)-Ultimate Tiling?" INTEGRATION, the Very Large Scale Integration J., vol. 17, pp. 33-51, 1994.
- (1994) INTEGRATION, the Very Large Scale Integration J. , vol.17 , pp. 33-51
- Boulet, P.¹ Darte, A.² Risset, T.³ Robert, Y.⁴

8
- 0000493064
- Estimating interlock and improving balance for pipelined machines
- D. Callahan, J. Cocke, and K. Kennedy, "Estimating Interlock and Improving Balance for Pipelined Machines," J. Parallel and Distributed Computing, vol. 5, no. 4, pp. 334-358, Aug. 1988.
- (1988) J. Parallel and Distributed Computing , vol.5 , Issue.4 , pp. 334-358
- Callahan, D.¹ Cocke, J.² Kennedy, K.³

9
- 0029749714
- Combining optimization for cache and instruction-level parallelism
- S. Carr, "Combining Optimization for Cache and Instruction-Level Parallelism," Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '96), pp. 238-247, 1996.
- (1996) Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '96) , pp. 238-247
- Carr, S.¹

10
- 84964748976
- Compiler blockability of numerical algorithms
- S. Carr and K. Kennedy, "Compiler Blockability of Numerical Algorithms," J. Supercomputing, pp. 114-124, Nov. 1992.
- (1992) J. Supercomputing , pp. 114-124
- Carr, S.¹ Kennedy, K.²

11
- 85009364061
- Compiler optimizations for improving data locality
- S. Carr, K.S. McKinley, and C.-W. Tseng, "Compiler Optimizations for Improving Data Locality," Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, Oct. 1994.
- Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, Oct. 1994
- Carr, S.¹ McKinley, K.S.² Tseng, C.-W.³

12
- 0347151907
- Efficient parallelism via hierarchical tiling
- L. Carter, J. Ferrante, and S.F. Hummel, "Efficient Parallelism via Hierarchical Tiling," Proc. SIAM Conf. Parallel Processing for Scientific Computing, Feb. 1995.
- Proc. SIAM Conf. Parallel Processing for Scientific Computing, Feb. 1995
- Carter, L.¹ Ferrante, J.² Hummel, S.F.³

13
- 0029235623
- Hierarchical tiling for improved superscalar performance
- L. Carter, J. Ferrante, and S.F. Hummel, "Hierarchical Tiling for Improved Superscalar Performance," Proc. Int'l Parallel Processing Symp., Apr. 1995.
- Proc. Int'l Parallel Processing Symp., Apr. 1995
- Carter, L.¹ Ferrante, J.² Hummel, S.F.³

14
- 85009352487
- Tile size selection using cache organization and data layout
- S. Coleman and K.S. McKinley, "Tile Size Selection Using Cache Organization and Data Layout," Programming Language Design and Implementation, June 1995.
- (1995) Programming Language Design and Implementation
- Coleman, S.¹ McKinley, K.S.²

15
- 0004116989
- MIT Press and McGraw-Hill
- T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, sixth ed., MIT Press and McGraw-Hill, 1992.
- (1992) Introduction to Algorithms, Sixth Ed.
- Cormen, T.H.¹ Leiserson, C.E.² Rivest, R.L.³

16
- 0030287932
- LogP: A practical model of parallel computation
- D. Culler, R. Karp, D. Patterson, A. Sahay, E. Santos, K.E. Schauser, R. Subramonian, and T. von Eicken, "LogP: A Practical Model of Parallel Computation," Comm. ACM, vol. 39, no. 11, pp. 78-85, Nov. 1996.
- (1996) Comm. ACM , vol.39 , Issue.11 , pp. 78-85
- Culler, D.¹ Karp, R.² Patterson, D.³ Sahay, A.⁴ Santos, E.⁵ Schauser, K.E.⁶ Subramonian, R.⁷ Von Eicken, T.⁸

17
- 0002352131
- Linear scheduling is nearly optimal
- A. Darte, L. Khachiyan, and Y. Robert, "Linear Scheduling is Nearly Optimal," Parallel Processing Letters, vol. 1, no. 2, pp. 73-81, 1991.
- (1991) Parallel Processing Letters , vol.1 , Issue.2 , pp. 73-81
- Darte, A.¹ Khachiyan, L.² Robert, Y.³

18
- 0031335231
- Determining the idle time of a tiling: New results
- F. Desprez, J. Dongarra, F. Rastello, and Y. Robert, "Determining the Idle Time of a Tiling: New Results," Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '97), Nov. 1997.
- Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '97), Nov. 1997
- Desprez, F.¹ Dongarra, J.² Rastello, F.³ Robert, Y.⁴

19
- 0034299275
- Generation of efficient nested loops from polyhedra
- S.V. Rajopadhye, F. Quilleré, and D. Wilde, "Generation of Efficient Nested Loops from Polyhedra," Int'l J. Parallel Programming, vol. 28, no. 5, pp. 469-498, 2000.
- (2000) Int'l J. Parallel Programming , vol.28 , Issue.5 , pp. 469-498
- Rajopadhye, S.V.¹ Quilleré, F.² Wilde, D.³

20
- 0023385308
- The program dependence graph and its use in optimization
- J. Ferrante, K.J. Ottenstein, and J.D. Warren, "The Program Dependence Graph and Its Use in Optimization," ACM Trans. Programming Languages and Systems, vol. 9, no. 3, pp. 319-349, July 1987.
- (1987) ACM Trans. Programming Languages and Systems , vol.9 , Issue.3 , pp. 319-349
- Ferrante, J.¹ Ottenstein, K.J.² Warren, J.D.³

21
- 0003638028
- Predicting performance for tiled perfectly nested loops
- PhD thesis, Univ. of California, San Diego, Dept. of Computer Science and Eng., Dec.
- K. Högstedt, "Predicting Performance for Tiled Perfectly Nested Loops," PhD thesis, Univ. of California, San Diego, Dept. of Computer Science and Eng., Dec. 1999.
- (1999)
- Högstedt, K.¹

22
- 0030651937
- Determining the idle time of a tiling
- K. Högstedt, L. Carter, and J. Ferrante, "Determining the Idle Time of a Tiling," Proc. Symp. Principles of Programming Languages, Jan. 1997.
- Proc. Symp. Principles of Programming Languages, Jan. 1997
- Högstedt, K.¹ Carter, L.² Ferrante, J.³

23
- 0032642196
- Selecting tile shape for minimal execution time
- K. Högstedt, L. Carter, and J. Ferrante, "Selecting Tile Shape for Minimal Execution Time," Proc. 11th ACM Symp. Parallel Algorithms and Architectures, June 1999.
- Proc. 11th ACM Symp. Parallel Algorithms and Architectures, June 1999
- Högstedt, K.¹ Carter, L.² Ferrante, J.³

24
- 85026986651
- Supernode partitioning
- F. Irigoin and R. Triolet, "Supernode Partitioning," Proc. Symp. Principles of Programming Languages, pp. 319-328, Jan. 1988.
- (1988) Proc. Symp. Principles of Programming Languages , pp. 319-328
- Irigoin, F.¹ Triolet, R.²

25
- 85027602455
- Optimizing for parallelism and data locality
- K. Kennedy and K.S. McKinley, "Optimizing for Parallelism and Data Locality," Proc. Int'l Conf. Supercomputing, July 1992.
- Proc. Int'l Conf. Supercomputing, July 1992
- Kennedy, K.¹ McKinley, K.S.²

26
- 0001465739
- Maximizing loop parallelism and improving data locality via loop fusion and distribution
- K. Kennedy and K.S. McKinley, "Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution," Languages and Compilers for Parallel Computing, 1993.
- Languages and Compilers for Parallel Computing, 1993
- Kennedy, K.¹ McKinley, K.S.²

27
- 0030685988
- Data-centric multilevel blocking
- I. Kodukula, N. Ahmed, and K. Pingali, "Data-Centric Multilevel Blocking," Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 346-357, 1997.
- (1997) Proc. SIGPLAN Conf. Programming Language Design and Implementation , pp. 346-357
- Kodukula, I.¹ Ahmed, N.² Pingali, K.³

28
- 0032308685
- Quantifying the multilevel nature of tiling interactions
- N. Mitchell, K. Högstedt, L. Carter, and J. Ferrante, "Quantifying the Multilevel Nature of Tiling Interactions," Int'l J. Parallel Programming, vol. 26, no. 6, pp. 641-670, 1998.
- (1998) Int'l J. Parallel Programming , vol.26 , Issue.6 , pp. 641-670
- Mitchell, N.¹ Högstedt, K.² Carter, L.³ Ferrante, J.⁴

29
- 57649182551
- Quantifying the multilevel nature of tiling interactions
- N. Mitchell, L. Carter, J. Ferrante, and K. Högstedt, "Quantifying the Multilevel Nature of Tiling Interactions," Proc. Workshop Languages and Compilers for Parallel Computing, 1997.
- Proc. Workshop Languages and Compilers for Parallel Computing, 1997
- Mitchell, N.¹ Carter, L.² Ferrante, J.³ Högstedt, K.⁴

30
- 0029728673
- Automatic partitioning of signal processing programs for symmetric multiprocessors
- C.J. Newburn and J.P. Shen, "Automatic Partitioning of Signal Processing Programs for Symmetric Multiprocessors," Proc. 1996 Conf. Parallel Architectures and Compilation Techniques (PACT '96), pp. 269-280, Oct. 1996.
- (1996) Proc. 1996 Conf. Parallel Architectures and Compilation Techniques (PACT '96) , pp. 269-280
- Newburn, C.J.¹ Shen, J.P.²

31
- 0038485313
- Optimizing memory usage in the polyhedral model
- F. Quilleré and S.V. Rajopadhye, "Optimizing Memory Usage in the Polyhedral Model," ACM Trans. Programming Languages and Systems (TOPLAS), vol. 22, no. 5, pp. 773-815, 2000.
- (2000) ACM Trans. Programming Languages and Systems (TOPLAS) , vol.22 , Issue.5 , pp. 773-815
- Quilleré, F.¹ Rajopadhye, S.V.²

32
- 0002238004
- Tiling multidimensional iteration spaces for nonshared memory machines
- J. Ramanujam and P. Sadayappan, "Tiling Multidimensional Iteration Spaces for Nonshared Memory Machines," Supercomputing, Nov. 1991.
- (1991) Supercomputing
- Ramanujam, J.¹ Sadayappan, P.²

33
- 0023384075
- Stencils and problem partitionings: Their influence on the performance of multiple processor systems
- D.A. Reed, L.M. Adams, and M.L. Patrick, "Stencils and Problem Partitionings: Their Influence on the Performance of Multiple Processor Systems," IEEE Trans. Computers, vol. 36, no. 7, pp. 845-858, July 1987.
- (1987) IEEE Trans. Computers , vol.36 , Issue.7 , pp. 845-858
- Reed, D.A.¹ Adams, L.M.² Patrick, M.L.³

34
- 0031140581
- Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
- V. Sarkar, "Automatic Selection of High-Order Transformations in the IBM XL FORTRAN Compilers," IBM J. Research and Development, vol. 41, no. 3, pp. 233-264, 1997.
- (1997) IBM J. Research and Development , vol.41 , Issue.3 , pp. 233-264
- Sarkar, V.¹

35
- 17244374581
- New tiling techniques to improve cache temporal locality
- Y. Song and Z. Li, "New Tiling Techniques to Improve Cache Temporal Locality," Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 215-228, 1999.
- (1999) Proc. SIGPLAN Conf. Programming Language Design and Implementation , pp. 215-228
- Song, Y.¹ Li, Z.²

36
- 0037808951
- Stanford SUIF Compiler System
- Stanford SUIF Compiler System, http://suif.stanford.edu/, 2002.
- (2002)

37
- 0038485309
- Sweep3D Benchmark
- Sweep3D Benchmark, www.llnl.gov/asci.benchmarks/asci/limtited/sweep3d/asci_sweep3d.html, 1995.
- (1995)

38
- 0003278639
- Automatically tuned linear algebra software
- R.C. Whaley and J.J. Dongarra, "Automatically Tuned Linear Algebra Software," Supercomputer, 1998.
- (1998) Supercomputer
- Whaley, R.C.¹ Dongarra, J.J.²

39
- 0003553286
- Improving locality and parallelism in nested loops
- PhD thesis, Stanford Univ., Computer Systems Laboratory, Aug.
- M.E. Wolf, "Improving Locality and Parallelism in Nested Loops," PhD thesis, Stanford Univ., Computer Systems Laboratory, Aug. 1992.
- (1992)
- Wolf, M.E.¹

40
- 85013942562
- A data locality optimizing algorithm
- M.E. Wolf and M.S. Lam, "A Data Locality Optimizing Algorithm," Proc. Symp. Programming Language Design and Implementation, Apr. 1991.
- Proc. Symp. Programming Language Design and Implementation, Apr. 1991
- Wolf, M.E.¹ Lam, M.S.²

41
- 0026232450
- A loop transformation theory and an algorithm to maximize parallelism
- M.E. Wolf and M.S. Lam, "A Loop Transformation Theory and an Algorithm to Maximize Parallelism," IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, 1991.
- (1991) IEEE Trans. Parallel and Distributed Systems , vol.2 , Issue.4 , pp. 452-471
- Wolf, M.E.¹ Lam, M.S.²

42
- 0030379246
- Combining loop transformations considering caches and scheduling
- M.E. Wolf, D. Maydan, and D.-K. Chen, "Combining Loop Transformations Considering Caches and Scheduling," Proc. Ninth Int'l Symp. Microarchitecture, Dec. 1996.
- Proc. Ninth Int'l Symp. Microarchitecture, Dec. 1996
- Wolf, M.E.¹ Maydan, D.² Chen, D.-K.³

43
- 0002433589
- Iteration space tiling for memory hierarchies
- M.J. Wolfe, "Iteration Space Tiling for Memory Hierarchies," Parallel Processing for Scientific Computing, pp. 357-361, 1987.
- (1987) Parallel Processing for Scientific Computing , pp. 357-361
- Wolfe, M.J.¹

44
- 0024935630
- More iteration space tiling
- M.J. Wolfe, "More Iteration Space Tiling," Supercomputing, pp. 655-664, 1989.
- (1989) Supercomputing , pp. 655-664
- Wolfe, M.J.¹

45
- 0003927035
- Addison-Wesley
- M.J. Wolfe, High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.
- (1996) High Performance Compilers for Parallel Computing
- Wolfe, M.J.¹

46
- 4243740738
- Time skewing for parallel computers
- Springer-Verlag, Aug.
- D. Wonnacott, "Time Skewing for Parallel Computers," Languages and Compilers for Parallel Computing, Springer-Verlag, Aug. 1999.
- (1999) Languages and Compilers for Parallel Computing
- Wonnacott, D.¹

47
- 0032315190
- Reuse-driven tiling for improving data locality
- J. Xue and C.-H. Huang, "Reuse-Driven Tiling for Improving Data Locality," Int'l J. Parallel Programming, vol. 26, no. 6, pp. 671-696, 1998.
- (1998) Int'l J. Parallel Programming , vol.26 , Issue.6 , pp. 671-696
- Xue, J.¹ Huang, C.-H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.