Volume 6601 LNCS, 2011, Pages 225-245

Data layout transformation for stencil computations on short-vector SIMD architectures

Author keywords

[No Author keywords available]

Indexed keywords

ANALYSIS TECHNIQUES; DATA LAYOUTS; MODERN PROCESSORS; PARTIAL DIFFERENTIAL; SCIENTIFIC AND ENGINEERING APPLICATIONS; SIMD ARCHITECTURE; SIMD INSTRUCTIONS; STENCIL COMPUTATIONS;

EID: 79953274591     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-19861-8_13     Document Type: Conference Paper
Times cited: 99

References (41)
  • 1
    • 0023438847
    • Automatic translation of Fortran programs to vector form
    • Allen, R., Kennedy, K.: Automatic translation of Fortran programs to vector form. ACM TOPLAS 9(4) (1987)
    • (1987) ACM TOPLAS , vol.9 , Issue.4
    • Allen, R.1    Kennedy, K.2
  • 2
    • 0027802136
    • Communication optimization and code generation for distributed memory machines
    • Amarasinghe, S., Lam, M.: Communication optimization and code generation for distributed memory machines. In: PLDI (1993)
    • (1993) PLDI
    • Amarasinghe, S.1    Lam, M.2
  • 3
    • 0029181140
    • Data and computation transformations for multiprocessors
    • Anderson, J., Amarasinghe, S., Lam, M.: Data and computation transformations for multiprocessors. In: PPoPP (1995)
    • (1995) PPoPP
    • Anderson, J.1    Amarasinghe, S.2    Lam, M.3
  • 4
    • 70350676807
    • Optimized stencil computation using in-place calculation on modern multicore systems
    • Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
    • Augustin, W., Heuveline, V., Weiss, J.-P.: Optimized stencil computation using in-place calculation on modern multicore systems. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 772-784. Springer, Heidelberg (2009)
    • (2009) LNCS , vol.5704 , pp. 772-784
    • Augustin, W.1    Heuveline, V.2    Weiss, J.-P.3
  • 6
    • 59749100826
    • Optimization and performance modeling of stencil computations on modern microprocessors
    • Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review 51(1) (2009)
    • (2009) SIAM Review , vol.51 , Issue.1
    • Datta, K.1    Kamil, S.2    Williams, S.3    Oliker, L.4    Shalf, J.5    Yelick, K.6
  • 9
    • 79953287400
    • Introducing the semi-stencil algorithm
    • de la Cruz, R., Araya-Polo, M., Cela, J.M.: Introducing the semi-stencil algorithm. In: PPAM (1) (2009)
    • (2009) PPAM , Issue.1
    • De La Cruz, R.1    Araya-Polo, M.2    Cela, J.M.3
  • 11
    • 70350630432
    • A multilevel parallelization framework for high-order stencil computations
    • Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
    • Dursun, H., Nomura, K.-i., Peng, L., Seymour, R., Wang, W., Kalia, R.K., Nakano, A., Vashishta, P.: A multilevel parallelization framework for high-order stencil computations. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 642-653. Springer, Heidelberg (2009)
    • (2009) LNCS , vol.5704 , pp. 642-653
    • Dursun, H.1    Nomura, K.-I.2    Peng, L.3    Seymour, R.4    Wang, W.5    Kalia, R.K.6    Nakano, A.7    Vashishta, P.8
  • 12
    • 8344245462
    • Vectorization for SIMD architectures with alignment constraints
    • Eichenberger, A., Wu, P., O'Brien, K.: Vectorization for SIMD architectures with alignment constraints. In: PLDI (2004)
    • (2004) PLDI
    • Eichenberger, A.1    Wu, P.2    O'Brien, K.3
  • 13
    • 37149053855
    • New algorithms for SIMD alignment
    • Adsul, B., Vetta, A. (eds.) CC 2007. Springer, Heidelberg
    • Fireman, L., Petrank, E., Zaks, A.: New algorithms for SIMD alignment. In: Adsul, B., Vetta, A. (eds.) CC 2007. LNCS, vol. 4420, pp. 1-15. Springer, Heidelberg (2007)
    • (2007) LNCS , vol.4420 , pp. 1-15
    • Fireman, L.1    Petrank, E.2    Zaks, A.3
  • 14
  • 15
    • 77749243464
    • Data transformations enabling loop vectorization on multithreaded data parallel architectures
    • Jang, B., Mistry, P., Schaa, D., Dominguez, R., Kaeli, D.R.: Data transformations enabling loop vectorization on multithreaded data parallel architectures. In: PPoPP (2010)
    • (2010) PPoPP
    • Jang, B.1    Mistry, P.2    Schaa, D.3    Dominguez, R.4    Kaeli, D.R.5
  • 17
    • 84958661690
    • Impact of modern memory subsystems on cache optimizations for stencil computations
    • Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP 2005 (2005)
    • (2005) MSP 2005
    • Kamil, S.1    Husbands, P.2    Oliker, L.3    Shalf, J.4    Yelick, K.5
  • 18
    • 0033077834
    • A linear algebra framework for automatic determination of optimal data layouts
    • Kandemir, M., Choudhary, A., Shenoy, N., Banerjee, P., Ramanujam, J.: A linear algebra framework for automatic determination of optimal data layouts. IEEE TPDS 10(2) (1999)
    • (1999) IEEE TPDS , vol.10 , Issue.2
    • Kandemir, M.1    Choudhary, A.2    Shenoy, N.3    Banerjee, P.4    Ramanujam, J.5
  • 20
    • 0032108102
    • Automatic data layout for distributed-memory machines
    • Kennedy, K., Kremer, U.: Automatic data layout for distributed-memory machines. ACM TOPLAS 20(4) (1998)
    • (1998) ACM TOPLAS , vol.20 , Issue.4
    • Kennedy, K.1    Kremer, U.2
  • 22
    • 0034446825
    • Exploiting superword level parallelism with multimedia instruction sets
    • Larsen, S., Amarasinghe, S.P.: Exploiting superword level parallelism with multimedia instruction sets. In: PLDI (2000)
    • (2000) PLDI
    • Larsen, S.1    Amarasinghe, S.P.2
  • 23
  • 24
    • 24644456455
    • Automatic tiling of iterative stencil loops
    • Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM TOPLAS 26(6) (2004)
    • (2004) ACM TOPLAS , vol.26 , Issue.6
    • Li, Z.1    Song, Y.2
  • 25
    • 70449723385
    • Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
    • Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS (2009)
    • (2009) ICS
    • Meng, J.1    Skadron, K.2
  • 26
    • 67650671606
    • 3D finite difference computation on GPUs using CUDA
    • Micikevicius, P.: 3D finite difference computation on GPUs using CUDA. In: GPGPU-2 (2009)
    • (2009) GPGPU-2
    • Micikevicius, P.1
  • 27
    • 79953275887
    • Multi-platform auto-vectorization
    • Nuzman, D., Henderson, R.: Multi-platform auto-vectorization. In: CGO (2006)
    • (2006) CGO
    • Nuzman, D.1    Henderson, R.2
  • 28
    • 33746034953
    • Auto-vectorization of interleaved data for SIMD
    • Nuzman, D., Rosen, I., Zaks, A.: Auto-vectorization of interleaved data for SIMD. In: PLDI (2006)
    • (2006) PLDI
    • Nuzman, D.1    Rosen, I.2    Zaks, A.3
  • 29
    • 63549093768
    • Outer-loop vectorization: revisited for short SIMD architectures
    • Nuzman, D., Zaks, A.: Outer-loop vectorization: revisited for short SIMD architectures. In: PACT (2008)
    • (2008) PACT
    • Nuzman, D.1    Zaks, A.2
  • 30
    • 0032658236
    • Nonsingular data transformations: Definition, validity, and applications
    • O'Boyle, M., Knijnenburg, P.: Nonsingular data transformations: Definition, validity, and applications. IJPP 27(3) (1999)
    • (1999) IJPP , vol.27 , Issue.3
    • O'Boyle, M.1    Knijnenburg, P.2
  • 31
    • 77951447129
    • Mapping the FDTD Application to Many-Core Chip Architectures
    • Orozco, D., Gao, G.R.: Mapping the FDTD Application to Many-Core Chip Architectures. In: ICPP (2009)
    • (2009) ICPP
    • Orozco, D.1    Gao, G.R.2
  • 32
    • 0031622954
    • Data transformations for eliminating conflict misses
    • Rivera, G., Tseng, C.-W.: Data transformations for eliminating conflict misses. In: PLDI (1998)
    • (1998) PLDI
    • Rivera, G.1    Tseng, C.-W.2
  • 35
    • 79953269601
    • Efficient multicore-aware parallelization strategies for iterative stencil computations
    • abs/1004.1741
    • Treibig, J., Wellein, G., Hager, G.: Efficient multicore-aware parallelization strategies for iterative stencil computations. CoRR, abs/1004.1741 (2010)
    • (2010) CoRR
    • Treibig, J.1    Wellein, G.2    Hager, G.3
  • 36
    • 70449723384
    • Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
    • Venkatasubramanian, S., Vuduc, R.: Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In: ICS (2009)
    • (2009) ICS
    • Venkatasubramanian, S.1    Vuduc, R.2
  • 37
    • 70449657442
    • Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
    • Wellein, G., Hager, G., Zeiser, T., Wittmann, M., Fehske, H.: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: COMPSAC (2009)
    • (2009) COMPSAC
    • Wellein, G.1    Hager, G.2    Zeiser, T.3    Wittmann, M.4    Fehske, H.5
  • 38
    • 78650871519
    • Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
    • abs/1006.3148
    • Wittmann, M., Hager, G., Treibig, J., Wellein, G.: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. CoRR, abs/1006.3148 (2010)
    • (2010) CoRR
    • Wittmann, M.1    Hager, G.2    Treibig, J.3    Wellein, G.4
  • 40
    • 1542392248
    • Achieving scalable locality with time skewing
    • Wonnacott, D.: Achieving scalable locality with time skewing. IJPP 30(3) (2002)
    • (2002) IJPP , vol.30 , Issue.3
    • Wonnacott, D.1
  • 41
    • 33646833599
    • Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
    • Wu, P., Eichenberger, A.E., Wang, A.: Efficient SIMD Code Generation for Runtime Alignment and Length Conversion. In: CGO (2005)
    • (2005) CGO
    • Wu, P.1    Eichenberger, A.E.2    Wang, A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.