-
1
-
-
0023438847
-
Automatic translation of Fortran programs to vector form
-
Allen, R., Kennedy, K.: Automatic translation of Fortran programs to vector form. ACM TOPLAS 9(4) (1987)
-
(1987)
ACM TOPLAS
, vol.9
, Issue.4
-
-
Allen, R.1
Kennedy, K.2
-
2
-
-
0027802136
-
Communication optimization and code generation for distributed memory machines
-
Amarasinghe, S., Lam, M.: Communication optimization and code generation for distributed memory machines. In: PLDI (1993)
-
(1993)
PLDI
-
-
Amarasinghe, S.1
Lam, M.2
-
3
-
-
0029181140
-
Data and computation transformations for multiprocessors
-
Anderson, J., Amarasinghe, S., Lam, M.: Data and computation transformations for multiprocessors. In: PPoPP (1995)
-
(1995)
PPoPP
-
-
Anderson, J.1
Amarasinghe, S.2
Lam, M.3
-
4
-
-
70350676807
-
Optimized stencil computation using in-place calculation on modern multicore systems
-
Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
-
Augustin, W., Heuveline, V., Weiss, J.-P.: Optimized stencil computation using in-place calculation on modern multicore systems. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 772-784. Springer, Heidelberg (2009)
-
(2009)
LNCS
, vol.5704
, pp. 772-784
-
-
Augustin, W.1
Heuveline, V.2
Weiss, J.-P.3
-
5
-
-
0027311338
-
Automatic array alignment in data-parallel programs
-
Chatterjee, S., Gilbert, J., Schreiber, R., Teng, S.: Automatic array alignment in data-parallel programs. In: POPL (1993)
-
(1993)
POPL
-
-
Chatterjee, S.1
Gilbert, J.2
Schreiber, R.3
Teng, S.4
-
6
-
-
59749100826
-
Optimization and performance modeling of stencil computations on modern microprocessors
-
Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review 51(1) (2009)
-
(2009)
SIAM Review
, vol.51
, Issue.1
-
-
Datta, K.1
Kamil, S.2
Williams, S.3
Oliker, L.4
Shalf, J.5
Yelick, K.6
-
7
-
-
70350771127
-
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
-
Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: SC 2008, pp. 1-12 (2008)
-
(2008)
SC 2008
, pp. 1-12
-
-
Datta, K.1
Murphy, M.2
Volkov, V.3
Williams, S.4
Carter, J.5
Oliker, L.6
Patterson, D.7
Shalf, J.8
Yelick, K.9
-
8
-
-
84971423310
-
Auto-tuning the 27-point stencil for multicore
-
Datta, K., Williams, S., Volkov, V., Carter, J., Oliker, L., Shalf, J., Yelick, K.: Auto-tuning the 27-point stencil for multicore. In: iWAPT 2009 (2009)
-
(2009)
iWAPT 2009
-
-
Datta, K.1
Williams, S.2
Volkov, V.3
Carter, J.4
Oliker, L.5
Shalf, J.6
Yelick, K.7
-
10
-
-
79953283169
-
In-core optimization of high-order stencil computations
-
Dursun, H., Nomura, K., Wang, W., Kunaseth, M., Peng, L., Seymour, R., Kalia, R., Nakano, A., Vashishta, P.: In-core optimization of high-order stencil computations. In: PDPTA (2009)
-
(2009)
PDPTA
-
-
Dursun, H.1
Nomura, K.2
Wang, W.3
Kunaseth, M.4
Peng, L.5
Seymour, R.6
Kalia, R.7
Nakano, A.8
Vashishta, P.9
-
11
-
-
70350630432
-
A multilevel parallelization framework for high-order stencil computations
-
Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
-
Dursun, H., Nomura, K.-i., Peng, L., Seymour, R., Wang, W., Kalia, R.K., Nakano, A., Vashishta, P.: A multilevel parallelization framework for high-order stencil computations. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 642-653. Springer, Heidelberg (2009)
-
(2009)
LNCS
, vol.5704
, pp. 642-653
-
-
Dursun, H.1
Nomura, K.-I.2
Peng, L.3
Seymour, R.4
Wang, W.5
Kalia, R.K.6
Nakano, A.7
Vashishta, P.8
-
12
-
-
8344245462
-
Vectorization for SIMD architectures with alignment constraints
-
Eichenberger, A., Wu, P., O'Brien, K.: Vectorization for SIMD architectures with alignment constraints. In: PLDI (2004)
-
(2004)
PLDI
-
-
Eichenberger, A.1
Wu, P.2
O'Brien, K.3
-
13
-
-
37149053855
-
New algorithms for SIMD alignment
-
Adsul, B., Vetta, A. (eds.) CC 2007. Springer, Heidelberg
-
Fireman, L., Petrank, E., Zaks, A.: New algorithms for SIMD alignment. In: Adsul, B., Vetta, A. (eds.) CC 2007. LNCS, vol. 4420, pp. 1-15. Springer, Heidelberg (2007)
-
(2007)
LNCS
, vol.4420
, pp. 1-15
-
-
Fireman, L.1
Petrank, E.2
Zaks, A.3
-
14
-
-
67149109696
-
A SIMD optimization framework for retargetable compilers
-
Hohenauer, M., Engel, F., Leupers, R., Ascheid, G., Meyr, H.: A SIMD optimization framework for retargetable compilers. ACM TACO 6(1) (2009)
-
(2009)
ACM TACO
, vol.6
, Issue.1
-
-
Hohenauer, M.1
Engel, F.2
Leupers, R.3
Ascheid, G.4
Meyr, H.5
-
15
-
-
77749243464
-
Data transformations enabling loop vectorization on multithreaded data parallel architectures
-
Jang, B., Mistry, P., Schaa, D., Dominguez, R., Kaeli, D.R.: Data transformations enabling loop vectorization on multithreaded data parallel architectures. In: PPoPP (2010)
-
(2010)
PPoPP
-
-
Jang, B.1
Mistry, P.2
Schaa, D.3
Dominguez, R.4
Kaeli, D.R.5
-
16
-
-
34547500808
-
Implicit and explicit optimizations for stencil computations
-
Kamil, S., Datta, K., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Implicit and explicit optimizations for stencil computations. In: MSPC 2006 (2006)
-
(2006)
MSPC 2006
-
-
Kamil, S.1
Datta, K.2
Williams, S.3
Oliker, L.4
Shalf, J.5
Yelick, K.6
-
17
-
-
84958661690
-
Impact of modern memory subsystems on cache optimizations for stencil computations
-
Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP 2005 (2005)
-
(2005)
MSP 2005
-
-
Kamil, S.1
Husbands, P.2
Oliker, L.3
Shalf, J.4
Yelick, K.5
-
18
-
-
0033077834
-
A linear algebra framework for automatic determination of optimal data layouts
-
Kandemir, M., Choudhary, A., Shenoy, N., Banerjee, P., Ramanujam, J.: A linear algebra framework for automatic determination of optimal data layouts. IEEE TPDS 10(2) (1999)
-
(1999)
IEEE TPDS
, vol.10
, Issue.2
-
-
Kandemir, M.1
Choudhary, A.2
Shenoy, N.3
Banerjee, P.4
Ramanujam, J.5
-
20
-
-
0032108102
-
Automatic data layout for distributed-memory machines
-
Kennedy, K., Kremer, U.: Automatic data layout for distributed-memory machines. ACM TOPLAS 20(4) (1998)
-
(1998)
ACM TOPLAS
, vol.20
, Issue.4
-
-
Kennedy, K.1
Kremer, U.2
-
21
-
-
35448944792
-
Effective automatic parallelization of stencil computations
-
Krishnamoorthy, S., Baskaran, M., Bondhugula, U., Ramanujam, J., Rountev, A., Sadayappan, P.: Effective automatic parallelization of stencil computations. In: PLDI (2007)
-
(2007)
PLDI
-
-
Krishnamoorthy, S.1
Baskaran, M.2
Bondhugula, U.3
Ramanujam, J.4
Rountev, A.5
Sadayappan, P.6
-
22
-
-
0034446825
-
Exploiting superword level parallelism with multimedia instruction sets
-
Larsen, S., Amarasinghe, S.P.: Exploiting superword level parallelism with multimedia instruction sets. In: PLDI (2000)
-
(2000)
PLDI
-
-
Larsen, S.1
Amarasinghe, S.P.2
-
24
-
-
24644456455
-
Automatic tiling of iterative stencil loops
-
Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM TOPLAS 26(6) (2004)
-
(2004)
ACM TOPLAS
, vol.26
, Issue.6
-
-
Li, Z.1
Song, Y.2
-
25
-
-
70449723385
-
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
-
Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS (2009)
-
(2009)
ICS
-
-
Meng, J.1
Skadron, K.2
-
26
-
-
67650671606
-
3D finite difference computation on GPUs using CUDA
-
Micikevicius, P.: 3D finite difference computation on GPUs using CUDA. In: GPGPU-2 (2009)
-
(2009)
GPGPU-2
-
-
Micikevicius, P.1
-
27
-
-
79953275887
-
Multi-platform auto-vectorization
-
Nuzman, D., Henderson, R.: Multi-platform auto-vectorization. In: CGO (2006)
-
(2006)
CGO
-
-
Nuzman, D.1
Henderson, R.2
-
28
-
-
33746034953
-
Auto-vectorization of interleaved data for SIMD
-
Nuzman, D., Rosen, I., Zaks, A.: Auto-vectorization of interleaved data for SIMD. In: PLDI (2006)
-
(2006)
PLDI
-
-
Nuzman, D.1
Rosen, I.2
Zaks, A.3
-
29
-
-
63549093768
-
Outer-loop vectorization: revisited for short SIMD architectures
-
Nuzman, D., Zaks, A.: Outer-loop vectorization: revisited for short SIMD architectures. In: PACT (2008)
-
(2008)
PACT
-
-
Nuzman, D.1
Zaks, A.2
-
30
-
-
0032658236
-
Nonsingular data transformations: Definition, validity, and applications
-
O'Boyle, M., Knijnenburg, P.: Nonsingular data transformations: Definition, validity, and applications. IJPP 27(3) (1999)
-
(1999)
IJPP
, vol.27
, Issue.3
-
-
O'Boyle, M.1
Knijnenburg, P.2
-
31
-
-
77951447129
-
Mapping the FDTD Application to Many-Core Chip Architectures
-
Orozco, D., Gao, G.R.: Mapping the FDTD Application to Many-Core Chip Architectures. In: ICPP (2009)
-
(2009)
ICPP
-
-
Orozco, D.1
Gao, G.R.2
-
32
-
-
0031622954
-
Data transformations for eliminating conflict misses
-
Rivera, G., Tseng, C.-W.: Data transformations for eliminating conflict misses. In: PLDI (1998)
-
(1998)
PLDI
-
-
Rivera, G.1
Tseng, C.-W.2
-
33
-
-
77949404004
-
Exploiting memory customization in FPGA for 3D stencil computations
-
Shafiq, M., Pericas, M., de la Cruz, R., Araya-Polo, M., Navarro, N., Ayguade, E.: Exploiting memory customization in FPGA for 3D stencil computations. In: FPT (2009)
-
(2009)
FPT
-
-
Shafiq, M.1
Pericas, M.2
De La Cruz, R.3
Araya-Polo, M.4
Navarro, N.5
Ayguade, E.6
-
34
-
-
35449003235
-
Sketching stencils
-
Solar-Lezama, A., Arnold, G., Tancau, L., Bodik, R., Saraswat, V., Seshia, S.: Sketching stencils. In: PLDI (2007)
-
(2007)
PLDI
-
-
Solar-Lezama, A.1
Arnold, G.2
Tancau, L.3
Bodik, R.4
Saraswat, V.5
Seshia, S.6
-
35
-
-
79953269601
-
Efficient multicore-aware parallelization strategies for iterative stencil computations
-
abs/1004.1741
-
Treibig, J., Wellein, G., Hager, G.: Efficient multicore-aware parallelization strategies for iterative stencil computations. CoRR, abs/1004.1741 (2010)
-
(2010)
CoRR
-
-
Treibig, J.1
Wellein, G.2
Hager, G.3
-
36
-
-
70449723384
-
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
-
Venkatasubramanian, S., Vuduc, R.: Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In: ICS (2009)
-
(2009)
ICS
-
-
Venkatasubramanian, S.1
Vuduc, R.2
-
37
-
-
70449657442
-
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
-
Wellein, G., Hager, G., Zeiser, T., Wittmann, M., Fehske, H.: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: COMPSAC (2009)
-
(2009)
COMPSAC
-
-
Wellein, G.1
Hager, G.2
Zeiser, T.3
Wittmann, M.4
Fehske, H.5
-
38
-
-
78650871519
-
Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
-
abs/1006.3148
-
Wittmann, M., Hager, G., Treibig, J., Wellein, G.: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. CoRR, abs/1006.3148 (2010)
-
(2010)
CoRR
-
-
Wittmann, M.1
Hager, G.2
Treibig, J.3
Wellein, G.4
-
40
-
-
1542392248
-
Achieving scalable locality with time skewing
-
Wonnacott, D.: Achieving scalable locality with time skewing. IJPP 30(3) (2002)
-
(2002)
IJPP
, vol.30
, Issue.3
-
-
Wonnacott, D.1
-
41
-
-
33646833599
-
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
-
Wu, P., Eichenberger, A.E., Wang, A.: Efficient SIMD Code Generation for Runtime Alignment and Length Conversion. In: CGO (2005)
-
(2005)
CGO
-
-
Wu, P.1
Eichenberger, A.E.2
Wang, A.3