SCOPUS Information Search Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 6601 LNCS, 2011, Pages 225-245

Data layout transformation for stencil computations on short-vector SIMD architectures

Authors (6): Henretty, Tom (a); Stock, Kevin (a); Pouchet, Louis Noël (a); Franchetti, Franz (b); Ramanujam, J (c); Sadayappan, P (a)

a OHIO STATE UNIVERSITY (United States)

b CARNEGIE MELLON UNIVERSITY (United States)

c LOUISIANA STATE UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ANALYSIS TECHNIQUES; DATA LAYOUTS; MODERN PROCESSORS; PARTIAL DIFFERENTIAL; SCIENTIFIC AND ENGINEERING APPLICATIONS; SIMD ARCHITECTURE; SIMD INSTRUCTIONS; STENCIL COMPUTATIONS;

ALIGNMENT; COMPUTATIONAL ELECTROMAGNETICS; HYDRAULICS; IMAGE PROCESSING; MEMORY ARCHITECTURE; PARTIAL DIFFERENTIAL EQUATIONS; PROGRAM COMPILERS; STATIC ANALYSIS;

METADATA;

EID: 79953274591 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-19861-8_13 Document Type: Conference Paper

Times cited: 99

References (41)

1
- 0023438847
- Automatic translation of Fortran programs to vector form
- Allen, R., Kennedy, K.: Automatic translation of Fortran programs to vector form. ACM TOPLAS 9(4) (1987)
- (1987) ACM TOPLAS , vol.9 , Issue.4
- Allen, R.¹ Kennedy, K.²

2
- 0027802136
- Communication optimization and code generation for distributed memory machines
- Amarasinghe, S., Lam, M.: Communication optimization and code generation for distributed memory machines. In: PLDI (1993)
- (1993) PLDI
- Amarasinghe, S.¹ Lam, M.²

3
- 0029181140
- Data and computation transformations for multiprocessors
- Anderson, J., Amarasinghe, S., Lam, M.: Data and computation transformations for multiprocessors. In: PPoPP (1995)
- (1995) PPoPP
- Anderson, J.¹ Amarasinghe, S.² Lam, M.³

4
- 70350676807
- Optimized stencil computation using in-place calculation on modern multicore systems
- Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
- Augustin, W., Heuveline, V., Weiss, J.-P.: Optimized stencil computation using in-place calculation on modern multicore systems. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 772-784. Springer, Heidelberg (2009)
- (2009) LNCS , vol.5704 , pp. 772-784
- Augustin, W.¹ Heuveline, V.² Weiss, J.-P.³

5
- 0027311338
- Automatic array alignment in data-parallel programs
- Chatterjee, S., Gilbert, J., Schreiber, R., Teng, S.: Automatic array alignment in data-parallel programs. In: POPL (1993)
- (1993) POPL
- Chatterjee, S.¹ Gilbert, J.² Schreiber, R.³ Teng, S.⁴

6
- 59749100826
- Optimization and performance modeling of stencil computations on modern microprocessors
- Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Review 51(1) (2009)
- (2009) SIAM Review , vol.51 , Issue.1
- Datta, K.¹ Kamil, S.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

7
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., Yelick, K.: Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: SC 2008, pp. 1-12 (2008)
- (2008) SC 2008 , pp. 1-12
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

8
- 84971423310
- Auto-tuning the 27-point stencil for multicore
- Datta, K., Williams, S., Volkov, V., Carter, J., Oliker, L., Shalf, J., Yelick, K.: Auto-tuning the 27-point stencil for multicore. In: iWAPT 2009 (2009)
- (2009) iWAPT 2009
- Datta, K.¹ Williams, S.² Volkov, V.³ Carter, J.⁴ Oliker, L.⁵ Shalf, J.⁶ Yelick, K.⁷

9
- 79953287400
- Introducing the semi-stencil algorithm
- de la Cruz, R., Araya-Polo, M., Cela, J.M.: Introducing the semi-stencil algorithm. In: PPAM (1) (2009)
- (2009) PPAM , Issue.1
- De La Cruz, R.¹ Araya-Polo, M.² Cela, J.M.³

10
- 79953283169
- In-core optimization of high-order stencil computations
- Dursun, H., Nomura, K., Wang, W., Kunaseth, M., Peng, L., Seymour, R., Kalia, R., Nakano, A., Vashishta, P.: In-core optimization of high-order stencil computations. In: PDPTA (2009)
- (2009) PDPTA
- Dursun, H.¹ Nomura, K.² Wang, W.³ Kunaseth, M.⁴ Peng, L.⁵ Seymour, R.⁶ Kalia, R.⁷ Nakano, A.⁸ Vashishta, P.⁹

11
- 70350630432
- A multilevel parallelization framework for high-order stencil computations
- Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. Springer, Heidelberg
- Dursun, H., Nomura, K.-i., Peng, L., Seymour, R., Wang, W., Kalia, R.K., Nakano, A., Vashishta, P.: A multilevel parallelization framework for high-order stencil computations. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 642-653. Springer, Heidelberg (2009)
- (2009) LNCS , vol.5704 , pp. 642-653
- Dursun, H.¹ Nomura, K.-I.² Peng, L.³ Seymour, R.⁴ Wang, W.⁵ Kalia, R.K.⁶ Nakano, A.⁷ Vashishta, P.⁸

12
- 8344245462
- Vectorization for SIMD architectures with alignment constraints
- Eichenberger, A., Wu, P., O'Brien, K.: Vectorization for SIMD architectures with alignment constraints. In: PLDI (2004)
- (2004) PLDI
- Eichenberger, A.¹ Wu, P.² O'Brien, K.³

13
- 37149053855
- New algorithms for SIMD alignment
- Adsul, B., Vetta, A. (eds.) CC 2007. Springer, Heidelberg
- Fireman, L., Petrank, E., Zaks, A.: New algorithms for SIMD alignment. In: Adsul, B., Vetta, A. (eds.) CC 2007. LNCS, vol. 4420, pp. 1-15. Springer, Heidelberg (2007)
- (2007) LNCS , vol.4420 , pp. 1-15
- Fireman, L.¹ Petrank, E.² Zaks, A.³

14
- 67149109696
- A SIMD optimization framework for retargetable compilers
- Hohenauer, M., Engel, F., Leupers, R., Ascheid, G., Meyr, H.: A SIMD optimization framework for retargetable compilers. ACM TACO 6(1) (2009)
- (2009) ACM TACO , vol.6 , Issue.1
- Hohenauer, M.¹ Engel, F.² Leupers, R.³ Ascheid, G.⁴ Meyr, H.⁵

15
- 77749243464
- Data transformations enabling loop vectorization on multithreaded data parallel architectures
- Jang, B., Mistry, P., Schaa, D., Dominguez, R., Kaeli, D.R.: Data transformations enabling loop vectorization on multithreaded data parallel architectures. In: PPoPP (2010)
- (2010) PPoPP
- Jang, B.¹ Mistry, P.² Schaa, D.³ Dominguez, R.⁴ Kaeli, D.R.⁵

16
- 34547500808
- Implicit and explicit optimizations for stencil computations
- Kamil, S., Datta, K., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Implicit and explicit optimizations for stencil computations. In: MSPC 2006 (2006)
- (2006) MSPC 2006
- Kamil, S.¹ Datta, K.² Williams, S.³ Oliker, L.⁴ Shalf, J.⁵ Yelick, K.⁶

17
- 84958661690
- Impact of modern memory subsystems on cache optimizations for stencil computations
- Kamil, S., Husbands, P., Oliker, L., Shalf, J., Yelick, K.: Impact of modern memory subsystems on cache optimizations for stencil computations. In: MSP 2005 (2005)
- (2005) MSP 2005
- Kamil, S.¹ Husbands, P.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

18
- 0033077834
- A linear algebra framework for automatic determination of optimal data layouts
- Kandemir, M., Choudhary, A., Shenoy, N., Banerjee, P., Ramanujam, J.: A linear algebra framework for automatic determination of optimal data layouts. IEEE TPDS 10(2) (1999)
- (1999) IEEE TPDS , vol.10 , Issue.2
- Kandemir, M.¹ Choudhary, A.² Shenoy, N.³ Banerjee, P.⁴ Ramanujam, J.⁵

19
- 0037952146
- Morgan Kaufmann, San Francisco
- Kennedy, K., Allen, J.: Optimizing compilers for modern architectures: A dependence-based approach. Morgan Kaufmann, San Francisco (2002)
- (2002) Optimizing Compilers for Modern Architectures: A Dependence-based Approach
- Kennedy, K.¹ Allen, J.²

20
- 0032108102
- Automatic data layout for distributed-memory machines
- Kennedy, K., Kremer, U.: Automatic data layout for distributed-memory machines. ACM TOPLAS 20(4) (1998)
- (1998) ACM TOPLAS , vol.20 , Issue.4
- Kennedy, K.¹ Kremer, U.²

21
- 35448944792
- Effective automatic parallelization of stencil computations
- Krishnamoorthy, S., Baskaran, M., Bondhugula, U., Ramanujam, J., Rountev, A., Sadayappan, P.: Effective automatic parallelization of stencil computations. In: PLDI (2007)
- (2007) PLDI
- Krishnamoorthy, S.¹ Baskaran, M.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

22
- 0034446825
- Exploiting superword level parallelism with multimedia instruction sets
- Larsen, S., Amarasinghe, S.P.: Exploiting superword level parallelism with multimedia instruction sets. In: PLDI (2000)
- (2000) PLDI
- Larsen, S.¹ Amarasinghe, S.P.²

23
- 77950328026
- Increasing and detecting memory address congruence
- Larsen, S., Witchel, E., Amarasinghe, S.P.: Increasing and detecting memory address congruence. In: IEEE PACT (2002)
- (2002) IEEE PACT
- Larsen, S.¹ Witchel, E.² Amarasinghe, S.P.³

24
- 24644456455
- Automatic tiling of iterative stencil loops
- Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM TOPLAS 26(6) (2004)
- (2004) ACM TOPLAS , vol.26 , Issue.6
- Li, Z.¹ Song, Y.²

25
- 70449723385
- Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
- Meng, J., Skadron, K.: Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In: ICS (2009)
- (2009) ICS
- Meng, J.¹ Skadron, K.²

26
- 67650671606
- 3D finite difference computation on GPUs using CUDA
- Micikevicius, P.: 3D finite difference computation on GPUs using CUDA. In: GPGPU-2 (2009)
- (2009) GPGPU-2
- Micikevicius, P.¹

27
- 79953275887
- Multi-platform auto-vectorization
- Nuzman, D., Henderson, R.: Multi-platform auto-vectorization. In: CGO (2006)
- (2006) CGO
- Nuzman, D.¹ Henderson, R.²

28
- 33746034953
- Auto-vectorization of interleaved data for SIMD
- Nuzman, D., Rosen, I., Zaks, A.: Auto-vectorization of interleaved data for SIMD. In: PLDI (2006)
- (2006) PLDI
- Nuzman, D.¹ Rosen, I.² Zaks, A.³

29
- 63549093768
- Outer-loop vectorization: Revisited for short SIMD architectures
- Nuzman, D., Zaks, A.: Outer-loop vectorization: Revisited for short SIMD architectures. In: PACT (2008)
- (2008) PACT
- Nuzman, D.¹ Zaks, A.²

30
- 0032658236
- Nonsingular data transformations: Definition, validity, and applications
- O'Boyle, M., Knijnenburg, P.: Nonsingular data transformations: Definition, validity, and applications. IJPP 27(3) (1999)
- (1999) IJPP , vol.27 , Issue.3
- O'Boyle, M.¹ Knijnenburg, P.²

31
- 77951447129
- Mapping the FDTD Application to Many-Core Chip Architectures
- Orozco, D., Gao, G.R.: Mapping the FDTD Application to Many-Core Chip Architectures. In: ICPP (2009)
- (2009) ICPP
- Orozco, D.¹ Gao, G.R.²

32
- 0031622954
- Data transformations for eliminating conflict misses
- Rivera, G., Tseng, C.-W.: Data transformations for eliminating conflict misses. In: PLDI (1998)
- (1998) PLDI
- Rivera, G.¹ Tseng, C.-W.²

33
- 77949404004
- Exploiting memory customization in FPGA for 3D stencil computations
- Shafiq, M., Pericas, M., de la Cruz, R., Araya-Polo, M., Navarro, N., Ayguade, E.: Exploiting memory customization in FPGA for 3D stencil computations. In: FPT (2009)
- (2009) FPT
- Shafiq, M.¹ Pericas, M.² De La Cruz, R.³ Araya-Polo, M.⁴ Navarro, N.⁵ Ayguade, E.⁶

34
- 35449003235
- Sketching stencils
- Solar-Lezama, A., Arnold, G., Tancau, L., Bodik, R., Saraswat, V., Seshia, S.: Sketching stencils. In: PLDI (2007)
- (2007) PLDI
- Solar-Lezama, A.¹ Arnold, G.² Tancau, L.³ Bodik, R.⁴ Saraswat, V.⁵ Seshia, S.⁶

35
- 79953269601
- Efficient multicore-aware parallelization strategies for iterative stencil computations
- abs/1004.1741
- Treibig, J., Wellein, G., Hager, G.: Efficient multicore-aware parallelization strategies for iterative stencil computations. CoRR, abs/1004.1741 (2010)
- (2010) CoRR
- Treibig, J.¹ Wellein, G.² Hager, G.³

36
- 70449723384
- Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
- Venkatasubramanian, S., Vuduc, R.: Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In: ICS (2009)
- (2009) ICS
- Venkatasubramanian, S.¹ Vuduc, R.²

37
- 70449657442
- Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
- Wellein, G., Hager, G., Zeiser, T., Wittmann, M., Fehske, H.: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: COMPSAC (2009)
- (2009) COMPSAC
- Wellein, G.¹ Hager, G.² Zeiser, T.³ Wittmann, M.⁴ Fehske, H.⁵

38
- 78650871519
- Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
- abs/1006.3148
- Wittmann, M., Hager, G., Treibig, J., Wellein, G.: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. CoRR, abs/1006.3148 (2010)
- (2010) CoRR
- Wittmann, M.¹ Hager, G.² Treibig, J.³ Wellein, G.⁴

39
- 0003927035
- Addison-Wesley, Reading
- Wolfe, M.J.: High Performance Compilers For Parallel Computing. Addison-Wesley, Reading (1996)
- (1996) High Performance Compilers for Parallel Computing
- Wolfe, M.J.¹

40
- 1542392248
- Achieving scalable locality with time skewing
- Wonnacott, D.: Achieving scalable locality with time skewing. IJPP 30(3) (2002)
- (2002) IJPP , vol.30 , Issue.3
- Wonnacott, D.¹

41
- 33646833599
- Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
- Wu, P., Eichenberger, A.E., Wang, A.: Efficient SIMD Code Generation for Runtime Alignment and Length Conversion. In: CGO (2005)
- (2005) CGO
- Wu, P.¹ Eichenberger, A.E.² Wang, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.