SCOPUS Information Retrieval Platform

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Volume , Issue , 2008, Pages 111-122

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

(6) Baskaran, Muthu Manikandan (a); Ramanujam, J (b); Bondhugula, Uday (a); Rountev, Atanas (a); Krishnamoorthy, Sriram (a); Sadayappan, P (a)

a Ohio State University (United States)

b Louisiana State University (United States)

Author keywords

Data movement; Graphics processor unit; Multi level tiling; Scratchpad memory

Indexed keywords

ARRAY ACCESS FUNCTIONS; AUTOMATIC DETERMINATION; CELL PROCESSOR; COMPUTATIONAL POWER; DATA MOVEMENTS; FAST MEMORY; GRAPHICS PROCESSOR UNITS; LOCAL MEMORIES; MULTI-LEVEL; MULTI-LEVEL TILING; MULTIPLE LEVELS; OFF-CHIP MEMORIES; ON CHIP MEMORY; ON CHIPS; SCRATCH PAD MEMORY; TILE SIZE;

AUTOMATIC PROGRAMMING; COMPUTER PROGRAMMING LANGUAGES; IMAGE CODING; INFORMATION MANAGEMENT; MEMORY ARCHITECTURE; PARALLEL ARCHITECTURES; PARALLEL PROCESSING SYSTEMS; PROGRAM PROCESSORS;

PARALLEL PROGRAMMING;

EID: 79959456077 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (57)

References (40)

1
- 0032305438
- Compiler optimizations for real time execution of loops on limited memory embedded systems
- S. Anantharaman and S. Pande. Compiler optimizations for real time execution of loops on limited memory embedded systems. In IEEE Real-Time Systems Symposium, pages 154-164, 1998.
- (1998) IEEE Real-Time Systems Symposium , pp. 154-164
- Anantharaman, S.¹ Pande, S.²

2
- 0003418094
- http://math-atlas.sourceforge.net/
- Automatically Tuned Linear Algebra Software (ATLAS). http://math-atlas.sourceforge.net/.
- Automatically Tuned Linear Algebra Software (ATLAS)

3
- 34547453716
- Loop transformation methodologies for array-oriented memory management
- F. Balasa, P. Kjeldsberg, M. Palkovic, A. Vandecappelle, and F. Catthoor. Loop transformation methodologies for array-oriented memory management. In 17th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP'06), pages 205-212, 2006.
- (2006) 17th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP'06) , pp. 205-212
- Balasa, F.¹ Kjeldsberg, P.² Palkovic, M.³ Vandecappelle, A.⁴ Catthoor, F.⁵

4
- 0003713964
- 2nd Edition. Athena Scientific. ISBN 1-886529-00-0
- D. P. Bertsekas. Nonlinear Programming: 2nd Edition. Athena Scientific. ISBN 1-886529-00-0.
- Nonlinear Programming
- Bertsekas, D.P.¹

5
- 33751022080
- Programming for parallelism and locality with hierarchically tiled arrays
- G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela, M. J. Garzaran, D. Padua, and C. von Praun. Programming for parallelism and locality with hierarchically tiled arrays. In PPoPP, pages 48-57, 2006.
- (2006) PPoPP , pp. 48-57
- Bikshandi, G.¹ Guo, J.² Hoeflinger, D.³ Almasi, G.⁴ Fraguela, B.B.⁵ Garzaran, M.J.⁶ Padua, D.⁷ Von Praun, C.⁸

6
- 0030661485
- Optimizing matrix multiply using PHiPAC
- J. Bilmes, K. Asanovic, C. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC. In Proc. ACM International Conference on Supercomputing, pages 340-347, 1997.
- (1997) Proc. ACM International Conference on Supercomputing , pp. 340-347
- Bilmes, J.¹ Asanovic, K.² Chin, C.³ Demmel, J.⁴

7
- 57349110181
- Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
- Ohio State University, May
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences. Technical Report OSU-CISRC5/07-TR43, Ohio State University, May 2007.
- (2007) Technical Report OSU-CISRC5/07-TR43
- Bondhugula, U.¹ Baskaran, M.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

8
- 0003502725
- Kluwer Academic Publishers
- F. Catthoor, K. Danckaert, C. Kulkarni, E. Brockmeyer, P. Kjeldsberg, T. V. Achteren, and T. Omnes. Data Access and Storage Management for Embedded Programmable Processors. Kluwer Academic Publishers, 2002.
- (2002) Data Access and Storage Management for Embedded Programmable Processors
- Catthoor, F.¹ Danckaert, K.² Kulkarni, C.³ Brockmeyer, E.⁴ Kjeldsberg, P.⁵ Achteren, T.⁶ Omnes, T.⁷

9
- 0029717349
- Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs
- P. Clauss. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs. In ICS '96: Proceedings of the 10th international conference on Supercomputing, pages 278-285, 1996.
- (1996) ICS '96: Proceedings of the 10th International Conference on Supercomputing , pp. 278-285
- Clauss, P.¹

10
- 79959483988
- CLooG: The Chunky Loop Generator, http://www.cloog.org.

11
- 0031358458
- Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs
- Dec.
- A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. IJPP, 25(6):447-496, Dec. 1997.
- (1997) IJPP , vol.25 , Issue.6 , pp. 447-496
- Darte, A.¹ Vivien, F.²

12
- 0346757617
- A strategy for array management in local memory
- Irvine, Calif., Cambridge, Mass.: MIT Press
- C. Eisenbeis, W. Jalby, D. Windheiser, and F. Bodin. A strategy for array management in local memory. In Advances in Languages and Compilers for Parallel Computing, 1990 Workshop, pages 130-151, Irvine, Calif., 1990. Cambridge, Mass.: MIT Press.
- (1990) Advances in Languages and Compilers for Parallel Computing, 1990 Workshop
- Eisenbeis, C.¹ Jalby, W.² Windheiser, D.³ Bodin, F.⁴

13
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn, L. Leem, J. Y. Park, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, 2006.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
- Fatahalian, K.¹ Knight, T.J.² Houston, M.³ Erez, M.⁴ Horn, D.R.⁵ Leem, L.⁶ Park, J.Y.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

14
- 0001023389
- Parametric integer programming
- P. Feautrier. Parametric integer programming. Operationnelle/Operations Research, 22(3):243-268, 1988.
- (1988) Operationnelle/Operations Research , vol.22 , Issue.3 , pp. 243-268
- Feautrier, P.¹

15
- 0026109335
- Dataflow analysis of array and scalar references
- P. Feautrier. Dataflow analysis of array and scalar references. IJPP, 20(1):23-53, 1991.
- (1991) IJPP , vol.20 , Issue.1 , pp. 23-53
- Feautrier, P.¹

16
- 0026933251
- Some efficient solutions to the affine scheduling problem: I. one-dimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem: I. one-dimensional time. IJPP, 21(5):313-348, 1992.
- (1992) IJPP , vol.21 , Issue.5 , pp. 313-348
- Feautrier, P.¹

17
- 0001448065
- Some efficient solutions to the affine scheduling problem. Part II, multidimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem. Part II, multidimensional time. IJPP, 21(6):389-420, 1992.
- (1992) IJPP , vol.21 , Issue.6 , pp. 389-420
- Feautrier, P.¹

18
- 84957027384
- Automatic parallelization in the polytope model
- P. Feautrier. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, pages 79-103, 1996.
- (1996) The Data Parallel Programming Model , pp. 79-103
- Feautrier, P.¹

19
- 85015240805
- On estimating and enhancing cache effectiveness
- London, UK, Springer-Verlag
- J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, pages 328-343, London, UK, 1992. Springer-Verlag.
- (1992) Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing , pp. 328-343
- Ferrante, J.¹ Sarkar, V.² Thrash, W.³

20
- 84862940593
- Strategies for cache and local memory management by global program transformation
- New York, NY, USA, Springer-Verlag New York, Inc.
- D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. In Proceedings of the 1st International Conference on Supercomputing, pages 229-254, New York, NY, USA, 1988. Springer-Verlag New York, Inc.
- (1988) Proceedings of the 1st International Conference on Supercomputing , pp. 229-254
- Gannon, D.¹ Jalby, W.² Gallivan, K.³

21
- 33646559059
- FMI, University of Passau, Habilitation Thesis
- M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004. Habilitation Thesis.
- (2004) Automatic Parallelization of Loop Programs for Distributed Memory Architectures
- Griebl, M.¹

22
- 34547227870
- Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies
- I. Issenin, E. Brockmeyer, B. Durinck, and N. Dutt. Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies. In DAC '06: Proceedings of the 43rd annual conference on Design automation, pages 49-52, 2006.
- (2006) DAC '06: Proceedings of the 43rd Annual Conference on Design Automation , pp. 49-52
- Issenin, I.¹ Brockmeyer, E.² Durinck, B.³ Dutt, N.⁴

23
- 0242578180
- A cost-effective implementation of multilevel tiling
- M. Jiménez, J. M. Llabería, and A. Fernández. A cost-effective implementation of multilevel tiling. IEEE Trans. Parallel Distrib. Syst., 14(10):1006-1020, 2003.
- (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.10 , pp. 1006-1020
- Jiménez, M.¹ Llabería, J.M.² Fernández, A.³

24
- 2142707258
- Compiler-directed scratch pad memory optimization for embedded multiprocessors
- M. Kandemir, I. Kadayif, A. Choudhary, J. Ramanujam, and I. Kolcu. Compiler-directed scratch pad memory optimization for embedded multiprocessors. IEEE Transactions on VLSI (TVLSI), 12(3):281-287, 2004.
- (2004) IEEE Transactions on VLSI (TVLSI) , vol.12 , Issue.3 , pp. 281-287
- Kandemir, M.¹ Kadayif, I.² Choudhary, A.³ Ramanujam, J.⁴ Kolcu, I.⁵

25
- 1242286076
- A compiler based approach for dynamically managing scratch-pad memories in embedded systems
- M. Kandemir, J. Ramanujam, M. Irwin, V. Narayanan, I. Kadayif, and A. Parikh. A compiler based approach for dynamically managing scratch-pad memories in embedded systems. IEEE Transactions on Computer-Aided Design, 23(2):243-260, 2004.
- (2004) IEEE Transactions on Computer-Aided Design , vol.23 , Issue.2 , pp. 243-260
- Kandemir, M.¹ Ramanujam, J.² Irwin, M.³ Narayanan, V.⁴ Kadayif, I.⁵ Parikh, A.⁶

26
- 56749175334
- Multi-level tiling: M for the price of one
- November
- D. Kim, L. Renganarayana, D. Rostron, S. Rajopadhye, and M. M. Strout. Multi-level tiling: M for the price of one. In SC, November 2007.
- (2007) SC
- Kim, D.¹ Renganarayana, L.² Rostron, D.³ Rajopadhye, S.⁴ Strout, M.M.⁵

27
- 35448944792
- Effective automatic parallelization of stencil computations
- July
- S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. Effective Automatic Parallelization of Stencil Computations. In ACM SIGPLAN PLDI 2007, July 2007.
- (2007) ACM SIGPLAN PLDI 2007
- Krishnamoorthy, S.¹ Baskaran, M.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

28
- 4243731804
- PhD thesis, Stanford University, Aug.
- A. Lim. Improving Parallelism And Data Locality With Affine Partitioning. PhD thesis, Stanford University, Aug. 2001.
- (2001) Improving Parallelism and Data Locality with Affine Partitioning
- Lim, A.¹

29
- 0030645995
- Maximizing parallelism and minimizing synchronization with affine transforms
- A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In POPL '97, pages 201-214, 1997.
- (1997) POPL '97 , pp. 201-214
- Lim, A.W.¹ Lam, M.S.²

30
- 79959401728
- NVIDIA CUDA. http://developer.nvidia.com/object/cuda.html.

31
- 33746967016
- Data and memory optimization techniques for embedded systems
- P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandecappelle, and P. G. Kjeldsberg. Data and memory optimization techniques for embedded systems. ACM Trans. Design Autom. Electr. Syst., 6(2):149-206, 2001.
- (2001) ACM Trans. Design Autom. Electr. Syst. , vol.6 , Issue.2 , pp. 149-206
- Panda, P.R.¹ Catthoor, F.² Dutt, N.D.³ Danckaert, K.⁴ Brockmeyer, E.⁵ Kulkarni, C.⁶ Vandecappelle, A.⁷ Kjeldsberg, P.G.⁸

32
- 84887458374
- PolyLib - A library of polyhedral functions. http://icps.u-strasbg.fr/polylib/.
- PolyLib - A Library of Polyhedral Functions

33
- 34547683700
- Iterative optimization in the polyhedral model: Part I, one-dimensional time
- L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time. In CGO '07, pages 144-156, 2007.
- (2007) CGO '07 , pp. 144-156
- Pouchet, L.-N.¹ Bastoul, C.² Cohen, A.³ Vasilache, N.⁴

34
- 84976676720
- The omega test: A fast and practical integer programming algorithm for dependence analysis
- Aug
- W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 8:102-114, Aug. 1992.
- (1992) Communications of the ACM , vol.8 , pp. 102-114
- Pugh, W.¹

35
- 0028132512
- Counting solutions to Presburger formulas: How and why
- W. Pugh. Counting solutions to Presburger formulas: how and why. In PLDI '94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 121-134, 1994.
- (1994) PLDI '94: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation , pp. 121-134
- Pugh, W.¹

36
- 0034857668
- Reducing memory requirements of nested loops for embedded systems
- J. Ramanujam, J. Hong, M. Kandemir, and A. Narayan. Reducing memory requirements of nested loops for embedded systems. In DAC '01: Proceedings of the 38th conference on Design automation, pages 359-364, 2001.
- (2001) DAC '01: Proceedings of the 38th Conference on Design Automation , pp. 359-364
- Ramanujam, J.¹ Hong, J.² Kandemir, M.³ Narayan, A.⁴

37
- 34548752231
- Towards optimal multi-level tiling for stencil computations
- IEEE
- L. Renganarayanan, M. Harthikote-Matha, R. Dewri, and S. V. Rajopadhye. Towards optimal multi-level tiling for stencil computations. In IPDPS, pages 1-10. IEEE, 2007.
- (2007) IPDPS , pp. 1-10
- Renganarayanan, L.¹ Harthikote-Matha, M.² Dewri, R.³ Rajopadhye, S.⁴

38
- 78650907365
- Near-Optimal allocation of local memory arrays
- HP Laboratories Palo Alto
- R. Schreiber and D. C. Cronquist. Near-Optimal Allocation of Local Memory Arrays. Technical Report HPL-2004-24, HP Laboratories Palo Alto, 2004.
- (2004) Technical Report HPL-2004-24
- Schreiber, R.¹ Cronquist, D.C.²

39
- 67650016545
- Violated dependence analysis
- June
- N. Vasilache, C. Bastoul, S. Girbal, and A. Cohen. Violated dependence analysis. In ACM ICS, June 2006.
- (2006) ACM ICS
- Vasilache, N.¹ Bastoul, C.² Girbal, S.³ Cohen, A.⁴

40
- 0032656355
- Exact memory size estimation for array computations without loop unrolling
- Y. Zhao and S. Malik. Exact memory size estimation for array computations without loop unrolling. In DAC '99: Proceedings of the 36th ACM/IEEE conference on Design automation, pages 811-816, 1999.
- (1999) DAC '99: Proceedings of the 36th ACM/IEEE Conference on Design Automation , pp. 811-816
- Zhao, Y.¹ Malik, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.