SCOPUS Information Search Platform

Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011

Volume , Issue , 2011, Pages 676-687

PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures

(3) Christen, Matthias a; Schenk, Olaf a; Burkhart, Helmar a

a UNIVERSITY OF BASEL (Switzerland)

Author keywords

autotuning; code generation; high performance computing; stencil computations

Indexed keywords

AUTOTUNING; CODE GENERATION; COMPLEX HARDWARE; GRAPHICS PROCESSING UNIT; HARDWARE ARCHITECTURE; HIGH PERFORMANCE COMPUTING; IMAGE PROCESSING APPLICATIONS; MANY CORE; MICRO ARCHITECTURES; MULTI CORE; MULTIGRID METHODS; OPTIMIZATION STRATEGY; PARALLEL IMPLEMENTATIONS; PARALLELIZATIONS; PDE SOLVERS; SCIENTIFIC COMPUTING APPLICATIONS; STENCIL COMPUTATIONS; TIME-TO-SOLUTION;

ARCHITECTURE; COMPUTER ARCHITECTURE; COMPUTER SOFTWARE SELECTION AND EVALUATION; DISTRIBUTED PARAMETER NETWORKS; IMAGE PROCESSING; OPTIMIZATION; PROGRAM PROCESSORS;

NETWORK COMPONENTS;

EID: 80053238973 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2011.70 Document Type: Conference Paper

Times cited : (273)

References (31)

1
- 77954022347
- An Auto-tuning Framework for Parallel Multicore Stencil Computations
- S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams, "An Auto-tuning Framework For Parallel Multicore Stencil Computations," in IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2010, pp. 1-12.
- IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2010 , pp. 1-12
- Kamil, S.¹ Chan, C.² Oliker, L.³ Shalf, J.⁴ Williams, S.⁵

2
- 84943297310
- Automatically Tuned Linear Algebra Software
- R. C. Whaley and J. Dongarra, "Automatically Tuned Linear Algebra Software," in SuperComputing 1998: High Performance Networking and Computing, 1998.
- (1998) SuperComputing 1998: High Performance Networking and Computing
- Whaley, R.C.¹ Dongarra, J.²

3
- 70349742199
- R. A. van de Geijn and E. S. Quintana-Ortí, The Science of Programming Matrix Computations. www.lulu.com, 2008.
- (2008) The Science of Programming Matrix Computations
- Van De Geijn, R.A.¹ Quintana-Ortí, E.S.²

4
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- R. Vuduc, J. W. Demmel, and K. A. Yelick, "OSKI: A library of automatically tuned sparse matrix kernels," Journal of Physics: Conference Series, vol. 16, no. 1, p. 521, 2005.
- (2005) Journal of Physics: Conference Series , vol.16 , Issue.1 , pp. 521
- Vuduc, R.¹ Demmel, J.W.² Yelick, K.A.³

5
- 20744449792
- The Design and Implementation of FFTW3
- special issue on "Program Generation, Optimization, and Platform Adaptation"
- M. Frigo and S. G. Johnson, "The Design and Implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216-231, 2005, special issue on "Program Generation, Optimization, and Platform Adaptation".
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

6
- 19344368072
- SPIRAL: Code generation for DSP transforms
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232-275, 2005.
- (2005) Proceedings of the IEEE, Special Issue on "Program Generation, Optimization, and Adaptation" , vol.93 , Issue.2 , pp. 232-275
- Püschel, M.¹ Moura, J.M.F.² Johnson, J.³ Padua, D.⁴ Veloso, M.⁵ Singer, B.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gacic, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

7
- 0242578173
- An Efficient Code Generation Technique for Tiled Iteration Spaces
- G. Goumas, M. Athanasaki, and N. Koziris, "An Efficient Code Generation Technique for Tiled Iteration Spaces," IEEE Transactions on Parallel and Distributed Systems, vol. 14, pp. 1021-1034, 2003.
- (2003) IEEE Transactions on Parallel and Distributed Systems , vol.14 , pp. 1021-1034
- Goumas, G.¹ Athanasaki, M.² Koziris, N.³

8
- 77954412565
- Loop Transformation Recipes for Code Generation and Auto-Tuning
- Languages and Compilers for Parallel Computing, ser. G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., Springer Berlin / Heidelberg
- M. Hall, J. Chame, C. Chen, J. Shin, G. Rudy, and M. Khan, "Loop Transformation Recipes for Code Generation and Auto-Tuning," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., vol. 5898. Springer Berlin / Heidelberg, 2010, pp. 50-64.
- (2010) Lecture Notes in Computer Science , vol.5898 , pp. 50-64
- Hall, M.¹ Chame, J.² Chen, C.³ Shin, J.⁴ Rudy, G.⁵ Khan, M.⁶

9
- 24644456455
- Automatic tiling of iterative stencil loops
- DOI 10.1145/1034774.1034777
- Z. Li and Y. Song, "Automatic tiling of iterative stencil loops," ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028, 2004. (Pubitemid 41270296)
- (2004) ACM Transactions on Programming Languages and Systems , vol.26 , Issue.6 , pp. 975-1028
- Li, Z.¹ Song, Y.²

10
- 35448985754
- Parameterized Tiled Loops for Free
- L. Renganarayanan, D. Kim, S. Rajopadhye, and M. M. Strout, "Parameterized Tiled Loops for Free," SIGPLAN Not., vol. 42, pp. 405-414, June 2007. [Online]. Available: http://doi.acm.org/10.1145/1273442.1250780
- (2007) SIGPLAN Not. , vol.42 , pp. 405-414
- Renganarayanan, L.¹ Kim, D.² Rajopadhye, S.³ Strout, M.M.⁴

11
- 78649765479
- Tiling optimizations for 3D scientific computations
- G. Rivera and C. Tseng, "Tiling optimizations for 3D scientific computations," in Supercomputing, ACM/IEEE 2000 Conference, 2000.
- Supercomputing, ACM/IEEE 2000 Conference, 2000
- Rivera, G.¹ Tseng, C.²

12
- 57349139452
- A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer," in Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008.
- Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

13
- 80053283808
- Scientific Computing with Multicore and Accelerators. CRC Press ch.
- K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick, Scientific Computing with Multicore and Accelerators. CRC Press, 2010, ch. Auto-tuning Stencil Computations on Multicore and Accelerators, pp. 219-253.
- (2010) Auto-tuning Stencil Computations on Multicore and Accelerators , pp. 219-253
- Datta, K.¹ Williams, S.² Volkov, V.³ Carter, J.⁴ Oliker, L.⁵ Shalf, J.⁶ Yelick, K.⁷

14
- 84983141180
- Scientific Computing with Multicore and Accelerators. CRC Press, ch.
- M. Christen, O. Schenk, E. Neufeld, M. Paulides, and H. Burkhart, Scientific Computing with Multicore and Accelerators. CRC Press, 2010, ch. Manycore Stencil Computations in Hyperthermia Applications, pp. 255-277.
- (2010) Manycore Stencil Computations in Hyperthermia Applications , pp. 255-277
- Christen, M.¹ Schenk, O.² Neufeld, E.³ Paulides, M.⁴ Burkhart, H.⁵

15
- 79551491518
- A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
- DOI 10.1007/s10766-010-0142-5
- J. Meng and K. Skadron, "A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations," International Journal of Parallel Programming, vol. 39, pp. 115-142, 2011, 10.1007/s10766-010-0142-5. [Online]. Available: http://dx.doi.org/10.1007/s10766-010-0142-5
- (2011) International Journal of Parallel Programming , vol.39 , pp. 115-142
- Meng, J.¹ Skadron, K.²

16
- 70449657442
- Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
- G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske, "Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization," in COMPSAC (1), 2009, pp. 579-586.
- (2009) COMPSAC (1) , pp. 579-586
- Wellein, G.¹ Hager, G.² Zeiser, T.³ Wittmann, M.⁴ Fehske, H.⁵

17
- 32844463802
- Cache oblivious stencil computations
- New York, NY, USA: ACM
- M. Frigo and V. Strumpen, "Cache oblivious stencil computations," in ICS '05: Proceedings of the 19th annual international conference on Supercomputing. New York, NY, USA: ACM, 2005, pp. 361-366.
- (2005) ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing , pp. 361-366
- Frigo, M.¹ Strumpen, V.²

18
- 77954709215
- Cache oblivious parallelograms in iterative stencil computations
- R. Strzodka, M. Shaheen, D. Pajak, and H.-P. Seidel, "Cache oblivious parallelograms in iterative stencil computations," ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing, pp. 49-59, 2010.
- (2010) ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing , pp. 49-59
- Strzodka, R.¹ Shaheen, M.² Pajak, D.³ Seidel, H.-P.⁴

19
- 35648995516
- Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech. Rep. UCB/EECS-2006-183, December
- K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick, "The landscape of parallel computing research: a view from Berkeley," Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech. Rep. UCB/EECS-2006-183, December 2006.
- (2006) The Landscape of Parallel Computing Research: A View from Berkeley
- Asanovic, K.¹ Bodik, R.² Catanzaro, B.C.³ Gebis, J.J.⁴ Husbands, P.⁵ Keutzer, K.⁶ Patterson, D.A.⁷ Plishker, W.L.⁸ Shalf, J.⁹ Williams, S.W.¹⁰ Yelick, K.A.¹¹

20
- 70450077422
- Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures
- M. Christen, O. Schenk, E. Neufeld, P. Messmer, and H. Burkhart, "Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures," in IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2009, pp. 1-10.
- IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2009 , pp. 1-10
- Christen, M.¹ Schenk, O.² Neufeld, E.³ Messmer, P.⁴ Burkhart, H.⁵

21
- 84888360034
- Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm
- H. H. Pennes, "Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm," J Appl Physiol, vol. 1, no. 2, pp. 93-122, 1948.
- (1948) J Appl Physiol , vol.1 , Issue.2 , pp. 93-122
- Pennes, H.H.¹

22
- 80053294634
- Core documentation of the COSMO-model. [Online]. Available: http://cosmo-model.cscs.ch/content/model/documentation/core/default.htm
- Core Documentation of the COSMO-model. [Online]

23
- 70449997300
- Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
- to appear
- K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors," SIAM Review, 2008, to appear.
- (2008) SIAM Review
- Datta, K.¹ Kamil, S.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

24
- 70349100958
- Khronos OpenCL Working Group 8 December
- Khronos OpenCL Working Group, The OpenCL Specification, 8 December 2008.
- (2008) The OpenCL Specification

25
- 38149089799
- H. Mössenböck, M. Löberbauer, and A. Wöß. The Compiler Generator Coco/R. [Online]. Available: http://www.ssw.uni-linz.ac.at/coco
- The Compiler Generator Coco/R. [Online]
- Mössenböck, H.¹ Löberbauer, M.² Wöß, A.³

26
- 80053290660
- Cetus - A Source-to-Source Compiler Infrastructure for C Programs. [Online]. Available: http://cetus.ecn.purdue.edu/
- Cetus - A Source-to-Source Compiler Infrastructure for C Programs. [Online]

27
- 80053278305
- Cetus: A Source-to-Source Compiler Infrastructure for Multicores
- H. Bae, L. Bachega, C. Dave, S.-I. Lee, S. Lee, S.-J. Min, R. Eigenmann, and S. Midkiff, "Cetus: A Source-to-Source Compiler Infrastructure for Multicores," in Proceedings of the 14th Int'l Workshop on Compilers for Parallel Computing, 2009.
- Proceedings of the 14th Int'l Workshop on Compilers for Parallel Computing, 2009
- Bae, H.¹ Bachega, L.² Dave, C.³ Lee, S.-I.⁴ Lee, S.⁵ Min, S.-J.⁶ Eigenmann, R.⁷ Midkiff, S.⁸

28
- 84871749450
- Maxima, a Computer Algebra System. [Online]. Available: http://maxima.sourceforge.net/
- Maxima, a Computer Algebra System. [Online]

29
- 0000238336
- A simplex method for function minimization
- J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, vol. 7, pp. 308-313, 1965.
- (1965) Computer Journal , vol.7 , pp. 308-313
- Nelder, J.A.¹ Mead, R.²

30
- 80053267454
- A Case for Machine Learning to Optimize Multicore Performance
- A. Ganapathi, K. Datta, O. Fox, and D. Patterson, "A Case for Machine Learning to Optimize Multicore Performance," in First USENIX Workshop on Hot Topics in Parallelism (HotPar '09), 2009.
- First USENIX Workshop on Hot Topics in Parallelism (HotPar '09), 2009
- Ganapathi, A.¹ Datta, K.² Fox, O.³ Patterson, D.⁴

31
- 77951200277
- Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
- P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes, "Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor," IEEE Micro, vol. 30, pp. 16-29, 2010.
- (2010) IEEE Micro , vol.30 , pp. 16-29
- Conway, P.¹ Kalyanasundharam, N.² Donley, G.³ Lepak, K.⁴ Hughes, B.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.