[1] Wonnacott D. Achieving scalable locality with time skewing. Int. J. Parallel Program, 2002, 30(3): 181-221.
[2] McCalpin J, Wonnacott D. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, Department of Computer Science, Rutgers University, 1999.
[3] Strzodka R, Shaheen M, Pajak D et al. Cache oblivious parallelograms in iterative stencil computations. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.49-59.
[4] Song Y, Li Z. New tiling techniques to improve cache temporal locality. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, USA, May 1-4, 1999, pp.215-228.
[5] Jin G, Mellor-Crummey J, Fowler R. Increasing temporal locality with skewing and recursive blocking. In Proc. ACM/IEEE Conference on Supercomputing, Denver, USA, Nov. 10-16, 2001, pp.43-43.
[6] Datta K, Murphy M, Volkov V et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proc. ACM/IEEE Conference on Supercomputing, Austin, USA, Nov. 15-21, 2008, Article 4.
[7] Williams S, Shalf J, Oliker L et al. Scientific computing kernels on the cell processor. Int. J. Parallel Program, 2007, 35(3): 263-298.
[8] Meng J, Skadron K. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In Proc. The 23rd International Conference on Supercomputing, Yorktown Heights, USA, Jun. 8-12, 2009, pp.256-265.
[9] NVIDIA. NVIDIA CUDA programming guide 3.0, http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf, 2010.
[11] Hong S, Kim H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proc. The 36th Annual Int. Symp. Computer Architecture, Austin, USA, Jun. 20-24, 2009, pp.152-163.
[12] Baghsorkhi S S, Delahaye M, Patel S J, Gropp W D, Hwu W W. An adaptive performance modeling tool for GPU architectures. In Proc. The 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, Jan. 9-14, 2010, pp.105-114.
[13] van der Laan W J. Decuda. http://wiki.github.com/laanwj/decuda/, Sept., 2010.
[14] Moazeni M, Bui A, Sarrafzadeh M. A memory optimization technique for software-managed scratchpad memory in GPUs. In Proc. The 7th IEEE Symposium on Application Specific Processors, San Francisco, USA, Jul. 27-28, 2009, pp.43-49.
[15] Carrillo S, Siegel J, Li X. A control-structure splitting optimization for GPGPU. In Proc. The 6th ACM Conf. Computing Frontiers, Ischia, Italy, May 18-20, 2009, pp.147-150.
[16] Park S J, Ross J, Shires D, Richie D, Henz B, Nguyen L. Hybrid core acceleration of UWB SIRE radar signal processing. IEEE Trans. Parallel Distrib. Syst, 2011, 22(1): 46-57.
[17] Gu L, Li X, Siegel J. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.305-314.
[18] Goorts P, Rogmans S, Bekaert P. Optimal data distribution for versatile finite impulse response filtering on next-generation graphics hardware using CUDA. In Proc. The 15th International Conference on Parallel and Distributed Systems, Shenzhen, China, Dec. 9-11, 2009, pp.300-307.
[19] Bailey P, Myre J, Walsh S D C, Lilja D J, Saar M O. Accelerating lattice Boltzmann fluid flow simulations using graphics processors. In Proc. International Conference on Parallel Processing, Vienna, Austria, Sep. 22-25, 2009, pp.550-557.
[20] Venkatasubramanian S, Vuduc R W. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In Proc. The 23rd International Conference on Supercomputing, Yorktown Heights, USA, Jun. 8-12, 2009, pp.244-255.
[21] Christen M, Schenk O, Neufeld E et al. Parallel data-locality aware stencil computations on modern micro-architectures. In Proc. IEEE Int. Symp. Parallel & Distributed Processing, Rome, Italy, May 23-29, 2009, pp.1-10.
[23] Di P, Wan Q, Zhang X et al. Toward harnessing DOACROSS parallelism for multi-GPGPUs. In Proc. The 39th Int. Conf. Parallel Processing, San Diego, USA, Sep. 13-16, 2010, pp.40-50.
[24] Christen M, Schenk O, Burkhart H. Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In Proc. IEEE International Parallel & Distributed Processing Symposium, Anchorage, USA, May 16-20, 2011, pp.676-687.
[25] Nguyen A, Satish N, Chhugani J, Kim C, Dubey P. 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, USA, Nov. 13-19, 2010, pp.1-13.
[26] Qasem A, Kennedy K. Profitable loop fusion and tiling using model-driven empirical search. In Proc. The 20th Annual International Conference on Supercomputing, Cairns, Australia, Jun. 28-Jul. 1, 2006, pp.249-258.
[27] Knijnenburg P M W, Kisuki T, Gallivan K et al. The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles. Concurrency and Computation: Practice & Experience, 2004, 16(2-3): 247-270.
[28] Yotov K, Li X, Ren G et al. A comparison of empirical and model-driven optimization. In Proc. The ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, San Diego, USA, Jun. 8-11, 2003, pp.63-76.
[29] Chen C, Chame J, Hall M. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In Proc. Int. Symp. Code Generation and Optimization, San Jose, USA, Mar. 20-23, 2005, pp.111-122.
[30] Epshteyn A, Garzaran M, DeJong G et al. Analytical models and empirical search: A hybrid approach to code optimization. In Proc. The 18th International Workshop on Languages and Compilers for Parallel Computing, Hawthorne, USA, Oct. 20-22, 2005, pp.259-273.
[31] Agakov F, Bonilla E, Cavazos J et al. Using machine learning to focus iterative optimization. In Proc. Int. Symp. Code Generation and Optimization, New York, USA, Mar. 26-29, 2006, pp.295-305.
[32] Almagor L, Cooper K D, Grosul A, Harvey T J, Reeves S W, Subramanian D, Torczon L, Waterman T. Finding effective compilation sequences. In Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, Washington, USA, Jun. 11-13, 2004, pp.231-239.
[33] Vaswani K, Thazhuthaveetil M J, Srikant Y N et al. Microarchitecture sensitive empirical models for compiler optimizations. In Proc. Int. Symp. Code Generation and Optimization, San Jose, USA, Mar. 11-14, 2007, pp.131-143.
[34] Ryoo S, Rodrigues C I, Stone S S et al. Program optimization space pruning for a multithreaded GPU. In Proc. The 6th Annual IEEE/ACM Int. Symp. Code Generation and Optimization, Boston, USA, Apr. 6-9, 2008, pp.195-204.
[35] Choi J W, Singh A, Vuduc R W. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proc. The 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Bangalore, India, Jan. 9-14, 2010, pp.115-126.
[37] Callahan D, Carr S, Kennedy K. Improving register allocation for subscripted variables. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, White Plains, USA, Jun. 20-22, 1990, pp.53-65.