SCOPUS Information Search Platform

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Volume , Issue , 2009, Pages 219-228

Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors

(6) Baskaran, Muthu Manikandan (a); Vydyanathan, Nagavijayalakshmi (a); Bondhugula, Uday Kumar (a); Ramanujam, J (b); Rountev, Atanas (a); Sadayappan, P (a)

a Ohio State University (United States)

b Louisiana State University (United States)

Author keywords

Compile time optimization; Dynamic scheduling; Runtime optimization

Indexed keywords

AUTOMATIC PARALLELIZATION; CHOLESKY; CHOLESKY DECOMPOSITION; COMPILATION TECHNOLOGY; COMPILE TIME; COMPILE-TIME OPTIMIZATION; COMPILER-ASSISTED; DYNAMIC EXTRACTION; DYNAMIC SCHEDULING; INPUT PROGRAMS; INPUT-AFFINE; LOAD IMBALANCE; LOAD-BALANCED; LOOP NESTS; LU DECOMPOSITION; MULTI-CORE PROCESSOR; MULTI-CORE SYSTEMS; PARALLEL CODE; PARALLEL EXECUTIONS; PARALLELIZATION; PROCESSOR CORES; RUNTIME; RUNTIME OPTIMIZATION;

BUILDING MATERIALS; EMBEDDED SYSTEMS; MICROPROCESSOR CHIPS; OCCUPATIONAL RISKS; OPTIMIZATION; PARALLEL PROGRAMMING; PROGRAM COMPILERS;

PARALLEL ALGORITHMS;

EID: 67650069905 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1504176.1504209 Document Type: Conference Paper

Times cited : (35)

References (48)

1
- 0016313256
- A comparison of list schedules for parallel processing systems
- T. L. Adam, K. M. Chandy, and J. R. Dickson. A comparison of list schedules for parallel processing systems. Commun. ACM, 17(12):685-690, 1974.
- (1974) Commun. ACM , vol.17 , Issue.12 , pp. 685-690
- Adam, T.L.¹ Chandy, K.M.² Dickson, J.R.³

2
- 0023438847
- AUTOMATIC TRANSLATION OF FORTRAN PROGRAMS TO VECTOR FORM.
- DOI 10.1145/29873.29875
- R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. on Programming Languages and Systems, 9(4):491-542, 1987. (Pubitemid 18531687)
- (1987) ACM Transactions on Programming Languages and Systems , vol.9 , Issue.4 , pp. 491-542
- Allen Randy¹ Kennedy Ken²

3
- 84976766536
- Scanning polyhedra with do loops
- C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPoPP'91, pages 39-50, 1991.
- (1991) PPoPP'91 , pp. 39-50
- Ancourt, C.¹ Irigoin, F.²

4
- 10444289646
- Code generation in the polyhedral model is easier than you think
- C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT'04, pages 7-16, 2004.
- (2004) PACT'04 , pp. 7-16
- Bastoul, C.¹

5
- 10444255848
- Putting polyhedral loop transformations to work
- C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam. Putting polyhedral loop transformations to work. In Workshop on Languages and Compilers for Parallel Computing (LCPC'03), pages 23-30, 2003.
- (2003) Workshop on Languages and Compilers for Parallel Computing (LCPC'03) , pp. 23-30
- Bastoul, C.¹ Cohen, A.² Girbal, S.³ Sharma, S.⁴ Temam, O.⁵

6
- 57349110181
- Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
- May
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences. Technical Report OSU-CISRC-5/07-TR43, Ohio State University, May 2007.
- (2007) Technical Report OSU-CISRC-5/07-TR43 Ohio State University
- Bondhugula, U.¹ Baskaran, M.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

7
- 57349145904
- Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
- Apr.
- U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In International Conference on Compiler Construction (ETAPS CC), Apr. 2008.
- (2008) International Conference on Compiler Construction (ETAPS CC)
- Bondhugula, U.¹ Baskaran, M.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

8
- 57349139452
- A practical automatic polyhedral parallelizer and locality optimizer
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In ACM SIGPLAN Programming Languages Design and Implementation (PLDI'08), 2008.
- (2008) ACM SIGPLAN Programming Languages Design and Implementation (PLDI'08)
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

9
- 47249137843
- Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer
- Oct.
- U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
- (2007) Technical Report OSU-CISRC-10/07-TR70, The Ohio State University
- Bondhugula, U.¹ Ramanujam, J.² Sadayappan, P.³

10
- 0032066690
- Loop parallelization algorithms: From parallelism extraction to code generation
- PII S0167819198000209
- P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: From parallelism extraction to code generation. Parallel Computing, 24(3-4):421-444, 1998. (Pubitemid 128413646)
- (1998) Parallel Computing , vol.24 , Issue.3-4 , pp. 421-444
- Boulet, P.¹ Darte, A.² Silber, G.-A.³ Vivien, F.⁴

11
- 36048997493
- Multithreading for synchronization tolerance in matrix factorization
- Proceedings of the SciDAC 2007 Conference
- A. Buttari, J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick. Multithreading for synchronization tolerance in matrix factorization. In Proceedings of the SciDAC 2007 Conference. Journal of Physics: Conference Series, 2007.
- (2007) Journal of Physics: Conference Series
- Buttari, A.¹ Dongarra, J.² Husbands, P.³ Kurzak, J.⁴ Yelick, K.⁵

12
- 51049101584
- A class of parallel tiled linear algebra algorithms for multicore architectures
- September, Submitted to Parallel Computing. LAPACK Working Note 191
- A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Technical Report UT-CS-07-600, Innovative Computing Laboratory, University of Tennessee Knoxville, September 2007. Submitted to Parallel Computing. LAPACK Working Note 191.
- (2007) Technical Report UT-CS-07-600, Innovative Computing Laboratory, University of Tennessee Knoxville
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

13
- 0028744946
- An efficient algorithm for the run-time parallelization of doacross loops
- D.-K. Chen, J. Torrellas, and P.-C. Yew. An efficient algorithm for the run-time parallelization of doacross loops. In Supercomputing'94: Proceedings of the 1994 conference on Supercomputing, pages 518-527, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.
- (1994) Supercomputing'94: Proceedings of the 1994 conference on Supercomputing , pp. 518-527
- Chen, D.-K.¹ Torrellas, J.² Yew, P.-C.³

14
- 0038378430
- Toward efficient and robust software speculative parallelization on multiprocessors
- New York, NY, USA, ACM.
- M. Cintra and D. R. Llanos. Toward efficient and robust software speculative parallelization on multiprocessors. In PPoPP'03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 13-24, New York, NY, USA, 2003. ACM.
- (2003) PPoPP'03: Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of parallel programming , pp. 13-24
- Cintra, M.¹ Llanos, D.R.²

15
- 84877711343
- CLooG: The Chunky Loop Generator. http://www.cloog.org.
- CLooG: The Chunky Loop Generator

16
- 0342782260
- Combining retiming and scheduling techniques for loop parallelization and loop tiling
- A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997. (Pubitemid 127732656)
- (1997) Parallel Processing Letters , vol.7 , Issue.4 , pp. 379-392
- Darte, A.¹ Silber, G.-A.² Vivien, F.³

17
- 0031358458
- Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs
- A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. IJPP, 25(6):447-496, Dec. 1997. (Pubitemid 127507526)
- (1997) International Journal of Parallel Programming , vol.25 , Issue.6 , pp. 447-496
- Darte, A.¹ Vivien, F.²

18
- 70350588404
- Four important concepts that will effect math software
- J. Dongarra. Four important concepts that will effect math software. In 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08), 2008.
- (2008) 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA'08)
- Dongarra, J.¹

19
- 0026109335
- Dataflow analysis of array and scalar references
- P. Feautrier. Dataflow analysis of array and scalar references. IJPP, 20(1):23-53, 1991.
- (1991) IJPP , vol.20 , Issue.1 , pp. 23-53
- Feautrier, P.¹

20
- 0026933251
- Some efficient solutions to the affine scheduling problem. I. One-dimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem, part I: one-dimensional time. IJPP, 21(5):313-348, 1992. (Pubitemid 23705312)
- (1992) International Journal of Parallel Programming , vol.21 , Issue.5 , pp. 313-347
- Feautrier Paul¹

21
- 0001448065
- Some efficient solutions to the affine scheduling problem, part II: Multidimensional time
- P. Feautrier. Some efficient solutions to the affine scheduling problem, part II: multidimensional time. IJPP, 21(6):389-420, 1992.
- (1992) IJPP , vol.21 , Issue.6 , pp. 389-420
- Feautrier, P.¹

22
- 84957027384
- Automatic parallelization in the polytope model
- P. Feautrier. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, pages 79-103, 1996.
- (1996) The Data Parallel Programming Model , pp. 79-103
- Feautrier, P.¹

23
- 0027606922
- On the granularity and clustering of directed acyclic task graphs
- DOI 10.1109/71.242154
- A. Gerasoulis and T. Yang. On the granularity and clustering of directed acyclic task graphs. IEEE Trans. Parallel Distrib. Syst., 4(6):686-701, 1993. (Pubitemid 23709227)
- (1993) IEEE Transactions on Parallel and Distributed Systems , vol.4 , Issue.6 , pp. 686-701
- Gerasoulis Apostolos¹ Yang Tao²

24
- 33746593747
- Semi-automatic composition of loop transformations
- June
- S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations. IJPP, 34(3):261-317, June 2006.
- (2006) IJPP , vol.34 , Issue.3 , pp. 261-317
- Girbal, S.¹ Vasilache, N.² Bastoul, C.³ Cohen, A.⁴ Parello, D.⁵ Sigler, M.⁶ Temam, O.⁷

25
- 33646559059
- FMI, University of Passau, Habilitation Thesis
- M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004. Habilitation Thesis.
- (2004) Automatic Parallelization of Loop Programs for Distributed Memory Architectures
- Griebl, M.¹

26
- 0025539983
- Parallel processing of near fine grain tasks using static scheduling on OSCAR (Optimally Scheduled Advanced Multiprocessor)
- Proc Supercomput 90
- H. Kasahara, H. Honda, and S. Narita. Parallel processing of near fine grain tasks using static scheduling on OSCAR (Optimally Scheduled Advanced Multiprocessor). In Supercomputing'90: Proceedings of the 1990 ACM/IEEE conference on Supercomputing, pages 856-864, Washington, DC, USA, 1990. IEEE Computer Society. (Pubitemid 21675205)
- (1990) Supercomputing'90: Proceedings of the 1990 ACM/IEEE conference on Supercomputing , pp. 856-864
- Kasahara Hironori¹ Honda Hiroki² Narita Seinosuke³

27
- 0002050141
- Static scheduling algorithms for allocating directed task graphs to multiprocessors
- Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 31(4):406-471, 1999.
- (1999) ACM Comput. Surv. , vol.31 , Issue.4 , pp. 406-471
- Kwok, Y.-K.¹ Ahmad, I.²

28
- 0027829921
- Improving the performance of runtime parallelization
- S.-T. Leung and J. Zahorjan. Improving the performance of runtime parallelization. SIGPLAN Not., 28(7):83-91, 1993.
- (1993) SIGPLAN Not. , vol.28 , Issue.7 , pp. 83-91
- Leung, S.-T.¹ Zahorjan, J.²

29
- 4243731804
- PhD thesis, Stanford University, Aug.
- A. Lim. Improving Parallelism And Data Locality With affine Partitioning. PhD thesis, Stanford University, Aug. 2001.
- (2001) Improving Parallelism And Data Locality With affine Partitioning
- Lim, A.¹

30
- 17644395320
- Blocking and array contraction across arbitrarily nested loops using affine partitioning
- A. Lim, S. Liao, and M. Lam. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In ACM SIGPLAN PPoPP, pages 103-112, 2001. (Pubitemid 33720383)
- (2001) SIGPLAN Notices (ACM Special Interest Group on Programming Languages) , vol.36 , Issue.7 , pp. 103-112
- Lim, A.W.¹ Liao, S.-W.² Lam, M.S.³

31
- 0032662841
- An affine partitioning algorithm to maximize parallelism and minimize communication
- A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In ACM Intl. Conf. on Supercomputing, pages 228-237, 1999.
- (1999) ACM Intl. Conf. on Supercomputing , pp. 228-237
- Lim, A.W.¹ Cheong, G.I.² Lam, M.S.³

32
- 0032067773
- Maximizing parallelism and minimizing synchronization with affine partitions
- PII S0167819198000210
- A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445-475, 1998. (Pubitemid 128413647)
- (1998) Parallel Computing , vol.24 , Issue.3-4 , pp. 445-475
- Lim, A.W.¹ Lam, M.S.²

33
- 84870501166
- Parallel linear algebra for scalable multi-core architectures (PLASMA) project. http://icl.cs.utk.edu/plasma.
- Parallel linear algebra for scalable multi-core architectures (PLASMA) project

34
- 84877715579
- PLUTO: A polyhedral automatic parallelizer and locality optimizer for multicores. http://pluto-compiler.sourceforge.net.
- PLUTO: A polyhedral automatic parallelizer and locality optimizer for multicores

35
- 0027735065
- Runtime compilation techniques for data partitioning and communication schedule reuse
- New York, NY, USA, ACM.
- R. Ponnusamy, J. Saltz, and A. Choudhary. Runtime compilation techniques for data partitioning and communication schedule reuse. In Supercomputing'93: Proceedings of the 1993 ACM/IEEE conference on Supercomputing, pages 361-370, New York, NY, USA, 1993. ACM.
- (1993) Supercomputing'93: Proceedings of the 1993 ACM/IEEE conference on Supercomputing , pp. 361-370
- Ponnusamy, R.¹ Saltz, J.² Choudhary, A.³

36
- 84976676720
- The Omega test: A fast and practical integer programming algorithm for dependence analysis
- Aug.
- W. Pugh. The Omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 8:102-114, Aug. 1992.
- (1992) Communications of the ACM , vol.8 , pp. 102-114
- Pugh, W.¹

37
- 0034299275
- Generation of efficient nested loops from polyhedra
- DOI 10.1023/A:1007554627716
- F. Quilleré, S. V. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. IJPP, 28(5):469-498, 2000. (Pubitemid 30959586)
- (2000) International Journal of Parallel Programming , vol.28 , Issue.5 , pp. 469-498
- Quillere, F.¹ Rajopadhye, S.² Wilde, D.³

38
- 31844447800
- Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices
- Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 05
- C. G. Quinones, C. Madriles, J. Sánchez, P. Marcuello, A. González, and D. M. Tullsen. Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices. In PLDI 05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 269-279, 2005. (Pubitemid 43182906)
- (2005) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) , pp. 269-279
- Quinones, C.G.¹ Madriles, C.² Sanchez, J.³ Marcuello, P.⁴ Gonzalez, A.⁵ Tullsen, D.M.⁶

39
- 84976823223
- The LRPD test: Speculative runtime parallelization of loops with privatization and reduction parallelization
- L. Rauchwerger and D. Padua. The LRPD test: speculative runtime parallelization of loops with privatization and reduction parallelization. SIGPLAN Not., 30(6):218-232, 1995.
- (1995) SIGPLAN Not. , vol.30 , Issue.6 , pp. 218-232
- Rauchwerger, L.¹ Padua, D.²

40
- 33745216602
- Low-cost thread-level data dependence speculation on multiprocessors
- P. Rundberg and P. Stenström. Low-cost thread-level data dependence speculation on multiprocessors. In Fourth Workshop on Multithreaded Execution, Architecture and Compilation, pages 1-9, 2000.
- (2000) Fourth Workshop on Multithreaded Execution, Architecture and Compilation , pp. 1-9
- Rundberg, P.¹ Stenström, P.²

41
- 34548045548
- Sensitivity analysis for automatic parallelization on multi-cores
- DOI 10.1145/1274971.1275008, Proceedings of ICS07: 21st ACM International Conference on Supercomputing
- S. Rus, M. Pennings, and L. Rauchwerger. Sensitivity analysis for automatic parallelization on multi-cores. In ICS'07: Proceedings of the 21st annual international conference on Supercomputing, pages 263-273, New York, NY, USA, 2007. ACM. (Pubitemid 47281623)
- (2007) Proceedings of the International Conference on Supercomputing , pp. 263-273
- Rus, S.¹ Pennings, M.² Rauchwerger, L.³

42
- 45149139984
- Run-time scheduling and execution of loops on message passing machines
- DOI 10.1016/0743-7315(90)90129-D
- J. Saltz, K. Crowley, R. Mirchandaney, and H. Berryman. Run-time scheduling and execution of loops on message passing machines. J. Parallel Distrib. Comput., 8(4):303-312, 1990. (Pubitemid 20682861)
- (1990) Journal of Parallel and Distributed Computing , vol.8 , Issue.4 , pp. 303-312
- Saltz Joel¹ Crowley Kathleen² Mirchandaney Ravi³ Berryman Harry⁴

43
- 0026152428
- Run-time parallelization and scheduling of loops
- DOI 10.1109/12.88484
- J. H. Saltz, R. Mirchandaney, and K. Crowley. Run-time parallelization and scheduling of loops. IEEE Trans. Comput., 40(5):603-612, 1991. (Pubitemid 21675674)
- (1991) IEEE Transactions on Computers , vol.40 , Issue.5 , pp. 603-612
- Saltz Joel, H.¹ Mirchandaney Ravi² Crowley Kay³

44
- 0003493010
- MIT Press, Cambridge, MA, USA
- V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Cambridge, MA, USA, 1989.
- (1989) Partitioning and Scheduling Parallel Programs for Multiprocessors
- Sarkar, V.¹

45
- 84976746768
- Compile-time partitioning and scheduling of parallel programs
- New York, NY, USA, ACM.
- V. Sarkar and J. Hennessy. Compile-time partitioning and scheduling of parallel programs. In SIGPLAN'86: Proceedings of the 1986 SIGPLAN symposium on Compiler construction, pages 17-26, New York, NY, USA, 1986. ACM.
- (1986) SIGPLAN'86: Proceedings of the 1986 SIGPLAN symposium on Compiler construction , pp. 17-26
- Sarkar, V.¹ Hennessy, J.²

46
- 33745804733
- Polyhedral code generation in the real world
- DOI 10.1007/11688839-16, Compiler Construction - 15th International Conference, CC 2006, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2006, Proceedings
- N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In International Conference on Compiler Construction (ETAPS CC'06), pages 185-201, Mar. 2006. (Pubitemid 44019652)
- (2006) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.3923 , pp. 185-201
- Vasilache, N.¹ Bastoul, C.² Cohen, A.³

47
- 67650016545
- Violated dependence analysis
- June
- N. Vasilache, C. Bastoul, S. Girbal, and A. Cohen. Violated dependence analysis. In ACM ICS, June 2006.
- (2006) ACM ICS
- Vasilache, N.¹ Bastoul, C.² Girbal, S.³ Cohen, A.⁴

48
- 0026232450
- Loop transformation theory and an algorithm to maximize parallelism
- DOI 10.1109/71.97902
- M. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst., 2(4):452-471, 1991. (Pubitemid 23624757)
- (1991) IEEE Transactions on Parallel and Distributed Systems , vol.2 , Issue.4 , pp. 452-471
- Wolf Michael, E.¹ Lam Monica, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.