-
1
-
-
77954022347
-
An Auto-tuning Framework for Parallel Multicore Stencil Computations
-
S. Kamil, C. Chan, L. Oliker, J. Shalf, and S. Williams, "An Auto-tuning Framework For Parallel Multicore Stencil Computations," in IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2010, pp. 1-12.
-
IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2010
, pp. 1-12
-
-
Kamil, S.1
Chan, C.2
Oliker, L.3
Shalf, J.4
Williams, S.5
-
4
-
-
24344485098
-
OSKI: A library of automatically tuned sparse matrix kernels
-
R. Vuduc, J. W. Demmel, and K. A. Yelick, "OSKI: A library of automatically tuned sparse matrix kernels," Journal of Physics: Conference Series, vol. 16, no. 1, p. 521, 2005.
-
(2005)
Journal of Physics: Conference Series
, vol.16
, Issue.1
, pp. 521
-
-
Vuduc, R.1
Demmel, J.W.2
Yelick, K.A.3
-
5
-
-
20744449792
-
The Design and Implementation of FFTW3
-
special issue on "Program Generation, Optimization, and Platform Adaptation"
-
M. Frigo and S. G. Johnson, "The Design and Implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216-231, 2005, special issue on "Program Generation, Optimization, and Platform Adaptation".
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
Frigo, M.1
Johnson, S.G.2
-
6
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232-275, 2005.
-
(2005)
Proceedings of the IEEE, Special Issue on "Program Generation, Optimization, and Adaptation"
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.4
Veloso, M.5
Singer, B.6
Xiong, J.7
Franchetti, F.8
Gacic, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
7
-
-
0242578173
-
An Efficient Code Generation Technique for Tiled Iteration Spaces
-
G. Goumas, M. Athanasaki, and N. Koziris, "An Efficient Code Generation Technique for Tiled Iteration Spaces," IEEE Transactions on Parallel and Distributed Systems, vol. 14, pp. 1021-1034, 2003.
-
(2003)
IEEE Transactions on Parallel and Distributed Systems
, vol.14
, pp. 1021-1034
-
-
Goumas, G.1
Athanasaki, M.2
Koziris, N.3
-
8
-
-
77954412565
-
Loop Transformation Recipes for Code Generation and Auto-Tuning
-
Languages and Compilers for Parallel Computing, ser. G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., Springer Berlin / Heidelberg
-
M. Hall, J. Chame, C. Chen, J. Shin, G. Rudy, and M. Khan, "Loop Transformation Recipes for Code Generation and Auto-Tuning," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, G. Gao, L. Pollock, J. Cavazos, and X. Li, Eds., vol. 5898. Springer Berlin / Heidelberg, 2010, pp. 50-64.
-
(2010)
Lecture Notes in Computer Science
, vol.5898
, pp. 50-64
-
-
Hall, M.1
Chame, J.2
Chen, C.3
Shin, J.4
Rudy, G.5
Khan, M.6
-
9
-
-
24644456455
-
Automatic tiling of iterative stencil loops
-
DOI 10.1145/1034774.1034777
-
Z. Li and Y. Song, "Automatic tiling of iterative stencil loops," ACM Trans. Program. Lang. Syst., vol. 26, no. 6, pp. 975-1028, 2004. (Pubitemid 41270296)
-
(2004)
ACM Transactions on Programming Languages and Systems
, vol.26
, Issue.6
, pp. 975-1028
-
-
Li, Z.1
Song, Y.2
-
10
-
-
35448985754
-
Parameterized Tiled Loops for Free
-
June [Online]. Available
-
L. Renganarayanan, D. Kim, S. Rajopadhye, and M. M. Strout, "Parameterized Tiled Loops for Free," SIGPLAN Not., vol. 42, pp. 405-414, June 2007. [Online]. Available: http://doi.acm.org/10.1145/1273442.1250780
-
(2007)
SIGPLAN Not.
, vol.42
, pp. 405-414
-
-
Renganarayanan, L.1
Kim, D.2
Rajopadhye, S.3
Strout, M.M.4
-
12
-
-
57349139452
-
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
-
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer," in Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008.
-
Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008
-
-
Bondhugula, U.1
Hartono, A.2
Ramanujam, J.3
Sadayappan, P.4
-
13
-
-
80053283808
-
-
Scientific Computing with Multicore and Accelerators. CRC Press ch.
-
K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick, Scientific Computing with Multicore and Accelerators. CRC Press, 2010, ch. Auto-tuning Stencil Computations on Multicore and Accelerators, pp. 219-253.
-
(2010)
Auto-tuning Stencil Computations on Multicore and Accelerators
, pp. 219-253
-
-
Datta, K.1
Williams, S.2
Volkov, V.3
Carter, J.4
Oliker, L.5
Shalf, J.6
Yelick, K.7
-
14
-
-
84983141180
-
-
Scientific Computing with Multicore and Accelerators. CRC Press, ch.
-
M. Christen, O. Schenk, E. Neufeld, M. Paulides, and H. Burkhart, Scientific Computing with Multicore and Accelerators. CRC Press, 2010, ch. Manycore Stencil Computations in Hyperthermia Applications, pp. 255-277.
-
(2010)
Manycore Stencil Computations in Hyperthermia Applications
, pp. 255-277
-
-
Christen, M.1
Schenk, O.2
Neufeld, E.3
Paulides, M.4
Burkhart, H.5
-
15
-
-
79551491518
-
A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations
-
10.1007/s10766-010-0142-5. [Online]. Available
-
J. Meng and K. Skadron, "A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations," International Journal of Parallel Programming, vol. 39, pp. 115-142, 2011, 10.1007/s10766-010-0142-5. [Online]. Available: http://dx.doi.org/10.1007/s10766-010-0142-5
-
(2011)
International Journal of Parallel Programming
, vol.39
, pp. 115-142
-
-
Meng, J.1
Skadron, K.2
-
16
-
-
70449657442
-
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
-
G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske, "Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization," in COMPSAC (1), 2009, pp. 579-586.
-
(2009)
COMPSAC (1)
, pp. 579-586
-
-
Wellein, G.1
Hager, G.2
Zeiser, T.3
Wittmann, M.4
Fehske, H.5
-
17
-
-
32844463802
-
Cache oblivious stencil computations
-
New York, NY, USA: ACM
-
M. Frigo and V. Strumpen, "Cache oblivious stencil computations," in ICS '05: Proceedings of the 19th annual international conference on Supercomputing. New York, NY, USA: ACM, 2005, pp. 361-366.
-
(2005)
ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing
, pp. 361-366
-
-
Frigo, M.1
Strumpen, V.2
-
18
-
-
77954709215
-
Cache oblivious parallelograms in iterative stencil computations
-
R. Strzodka, M. Shaheen, D. Pajak, and H.-P. Seidel, "Cache oblivious parallelograms in iterative stencil computations," ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing, pp. 49-59, 2010.
-
(2010)
ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
, pp. 49-59
-
-
Strzodka, R.1
Shaheen, M.2
Pajak, D.3
Seidel, H.-P.4
-
19
-
-
35648995516
-
-
Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech. Rep. UCB/EECS-2006-183, December
-
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick, "The landscape of parallel computing research: a view from Berkeley," Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech. Rep. UCB/EECS-2006-183, December 2006.
-
(2006)
The Landscape of Parallel Computing Research: A View from Berkeley
-
-
Asanovic, K.1
Bodik, R.2
Catanzaro, B.C.3
Gebis, J.J.4
Husbands, P.5
Keutzer, K.6
Patterson, D.A.7
Plishker, W.L.8
Shalf, J.9
Williams, S.W.10
Yelick, K.A.11
-
20
-
-
70450077422
-
Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures
-
M. Christen, O. Schenk, E. Neufeld, P. Messmer, and H. Burkhart, "Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures," in IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2009, pp. 1-10.
-
IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2009
, pp. 1-10
-
-
Christen, M.1
Schenk, O.2
Neufeld, E.3
Messmer, P.4
Burkhart, H.5
-
21
-
-
84888360034
-
Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm
-
H. H. Pennes, "Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm," J Appl Physiol, vol. 1, no. 2, pp. 93-122, 1948.
-
(1948)
J Appl Physiol
, vol.1
, Issue.2
, pp. 93-122
-
-
Pennes, H.H.1
-
23
-
-
70449997300
-
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
-
to appear
-
K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick, "Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors," SIAM Review, 2008, to appear.
-
(2008)
SIAM Review
-
-
Datta, K.1
Kamil, S.2
Williams, S.3
Oliker, L.4
Shalf, J.5
Yelick, K.6
-
24
-
-
70349100958
-
-
Khronos OpenCL Working Group 8 December
-
Khronos OpenCL Working Group, The OpenCL Specification, 8 December 2008.
-
(2008)
The OpenCL Specification
-
-
-
27
-
-
80053278305
-
Cetus: A Source-to-Source Compiler Infrastructure for Multicores
-
H. Bae, L. Bachega, C. Dave, S.-I. Lee, S. Lee, S.-J. Min, R. Eigenmann, and S. Midkiff, "Cetus: A Source-to-Source Compiler Infrastructure for Multicores," in Proceedings of the 14th Int'l Workshop on Compilers for Parallel Computing, 2009.
-
Proceedings of the 14th Int'l Workshop on Compilers for Parallel Computing, 2009
-
-
Bae, H.1
Bachega, L.2
Dave, C.3
Lee, S.-I.4
Lee, S.5
Min, S.-J.6
Eigenmann, R.7
Midkiff, S.8
-
29
-
-
0000238336
-
A simplex method for function minimization
-
J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, vol. 7, pp. 308-313, 1965.
-
(1965)
Computer Journal
, vol.7
, pp. 308-313
-
-
Nelder, J.A.1
Mead, R.2
-
30
-
-
80053267454
-
A Case for Machine Learning to Optimize Multicore Performance
-
A. Ganapathi, K. Datta, O. Fox, and D. Patterson, "A Case for Machine Learning to Optimize Multicore Performance," in First USENIX Workshop on Hot Topics in Parallelism (HotPar '09), 2009.
-
First USENIX Workshop on Hot Topics in Parallelism (HotPar '09), 2009
-
-
Ganapathi, A.1
Datta, K.2
Fox, O.3
Patterson, D.4
-
31
-
-
77951200277
-
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor
-
P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes, "Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor," IEEE Micro, vol. 30, pp. 16-29, 2010.
-
(2010)
IEEE Micro
, vol.30
, pp. 16-29
-
-
Conway, P.1
Kalyanasundharam, N.2
Donley, G.3
Lepak, K.4
Hughes, B.5