2008, Pages 2-11

Outer-loop vectorization - revisited for short SIMD architectures

Author keywords

Data reuse; SIMD; Subword parallelism; Vectorization

Indexed keywords

DATA REUSE; DATA-LEVEL PARALLELISMS; EMBEDDED APPLICATIONS; LOOP INTERCHANGES; LOOP NESTS; LOOP VECTORIZATION; OPTIMIZING COMPILERS; OUTER LOOPS; PERFORMANCE IMPROVEMENTS; SIMD; SIMD ARCHITECTURES; SPEED-UP FACTORS; SUBWORD PARALLELISM; VECTOR MACHINES; VECTORIZATION;

EID: 63549093768     PISSN: 1089795X     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1454115.1454119     Document Type: Conference Paper
Times cited: 138

References (35)
  • 1
    • 0141513025
    • Pfc: A program to convert fortran to parallel form
    • Rice University
    • R. Allen and K. Kennedy. Pfc: A program to convert fortran to parallel form. Dept. of Math. Sciences, Rice University, 1982.
    • (1982) Dept. of Math. Sciences
    • Allen, R.1    Kennedy, K.2
  • 2
    • 0023438847
    • Automatic translation of fortran programs to vector form
    • R. Allen and K. Kennedy. Automatic translation of fortran programs to vector form. ACM Tr. on Prog. Lang. and Systems, 9(4):491-542, 1987.
    • (1987) ACM Tr. on Prog. Lang. and Systems , vol.9 , Issue.4 , pp. 491-542
    • Allen, R.1    Kennedy, K.2
  • 5
    • 0002921197
    • Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems
    • February
    • A. J. C. Bik, M. Girkar, P. M. Grey, and X. Tian. Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems. Intel Technology J., February 2001.
    • (2001) Intel Technology J
    • Bik, A.J.C.1    Girkar, M.2    Grey, P.M.3    Tian, X.4
  • 6
    • 0344908850
    • Automatic intra-register vectorization for the intel architecture
    • A. J. C. Bik, M. Girkar, P. M. Grey, and X. Tian. Automatic intra-register vectorization for the intel architecture. Int. J. Parallel Program., 30(2):65-98, 2002.
    • (2002) Int. J. Parallel Program , vol.30 , Issue.2 , pp. 65-98
    • Bik, A.J.C.1    Girkar, M.2    Grey, P.M.3    Tian, X.4
  • 7
    • 0003261178
    • Exploiting a new level of dlp in multimedia applications
    • J. Corbal, R. Espasa, and M. Valero. Exploiting a new level of dlp in multimedia applications. In Micro, 1999.
    • (1999) Micro
    • Corbal, J.1    Espasa, R.2    Valero, M.3
  • 8
    • 8344245462
    • Vectorization for simd architectures with alignment constraints
    • A. E. Eichenberger, P. Wu, and K. O'brien. Vectorization for simd architectures with alignment constraints. In PLDI, 2004.
    • (2004) PLDI
    • Eichenberger, A.E.1    Wu, P.2    O'brien, K.3
  • 9
    • 63549125298
    • Free Software Foundation. GCC
    • Free Software Foundation. GCC, http://gcc.gnu.org.
  • 10
    • 63549117796
    • Free Software Foundation. gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 11
    • 43449100381
    • Compiling for vector-thread architectures
    • To appear, April
    • M. Hampton and K. Asanovic. Compiling for vector-thread architectures. In CGO, To appear, April 2008.
    • (2008) CGO
    • Hampton, M.1    Asanovic, K.2
  • 12
    • 25844503119
    • Introduction to the cell multiprocessor
    • July
    • J. A. Kahle and et al. Introduction to the cell multiprocessor. IBM J. of R & D, 49(4):589-604, July 2005.
    • (2005) IBM J. of R & D , vol.49 , Issue.4 , pp. 589-604
    • Kahle, J.A.1    et al.2
  • 13
    • 63549131451
    • Hardware/compiler co-development for an embedded media processor
    • November
    • C. Kozyrakis, D. Judd, J. Gebis, S. Williams, D. Patterson, and K. Yelick. Hardware/compiler co-development for an embedded media processor. IEEE, 89(11):694-709, November 2001.
    • (2001) IEEE , vol.89 , Issue.11 , pp. 694-709
    • Kozyrakis, C.1    Judd, D.2    Gebis, J.3    Williams, S.4    Patterson, D.5    Yelick, K.6
  • 14
    • 84948978169
    • Vector vs. superscalar and vliw architectures for embedded multimedia benchmarks
    • C. Kozyrakis and D. Patterson. Vector vs. superscalar and vliw architectures for embedded multimedia benchmarks. Micro, 2002.
    • (2002) Micro
    • Kozyrakis, C.1    Patterson, D.2
  • 15
    • 0034446825
    • Exploiting superword level parallelism with multimedia instruction sets
    • S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. PLDI, 2000.
    • (2000) PLDI
    • Larsen, S.1    Amarasinghe, S.2
  • 16
    • 84948766393
    • Increasing and detecting memory address congruence
    • S. Larsen, E. Witchel, and S. Amarasinghe. Increasing and detecting memory address congruence. In PACT, 2002.
    • (2002) PACT
    • Larsen, S.1    Witchel, E.2    Amarasinghe, S.3
  • 17
    • 63549147948
    • C. G. Lee. Utdsp benchmarks. http://www.eecg.toronto.edu/corinna/DSP/infrastructure/UTDSP.html, 1998.
    • (1998) Utdsp benchmarks
    • Lee, C.G.1
  • 20
    • 79953275887
    • Multi-platform auto-vectorization
    • D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In CGO, 2006.
    • (2006) CGO
    • Nuzman, D.1    Henderson, R.2
  • 22
    • 33746034953
    • Auto-vectorization of interleaved data for simd
    • D. Nuzman, I. Rosen, and A. Zaks. Auto-vectorization of interleaved data for simd. In PLDI, 2006.
    • (2006) PLDI
    • Nuzman, D.1    Rosen, I.2    Zaks, A.3
  • 24
    • 0022676201
    • A vectorizing fortran compiler
    • March
    • R. G. Scarborough and H. G. Kolsky. A vectorizing fortran compiler. IBM J. of R & D, 30(2):163-171, March 1986.
    • (1986) IBM J. of R & D , vol.30 , Issue.2 , pp. 163-171
    • Scarborough, R.G.1    Kolsky, H.G.2
  • 25
    • 84976682018
    • Automatic recognition of vector and parallel operations in a higher level language
    • P. B. Schneck. Automatic recognition of vector and parallel operations in a higher level language. SIGPLAN Not., 7(11):45-52, 1972.
    • (1972) SIGPLAN Not , vol.7 , Issue.11 , pp. 45-52
    • Schneck, P.B.1
  • 27
    • 84948740064
    • Compiler-controlled caching in superword register files for multimedia extension architectures
    • September
    • J. Shin, J. Chame, and M. W. Hall. Compiler-controlled caching in superword register files for multimedia extension architectures. In PACT, September 2002.
    • (2002) PACT
    • Shin, J.1    Chame, J.2    Hall, M.W.3
  • 28
    • 33646554301
    • J. Shin, M. Hall, and J. Chame. Superword-level parallelism in the presence of control flow. In CGO, March 2005.
  • 29
    • 33745206426
    • Support for the intel pentium 4 processor with hyper-threading technology in intel 8.0 compilers
    • February
    • K. B. Smith, A. J.C. Bik, and X. Tian. Support for the intel pentium 4 processor with hyper-threading technology in intel 8.0 compilers. Intel Tech. J., 8(1):19-31, February 2004.
    • (2004) Intel Tech. J , vol.8 , Issue.1 , pp. 19-31
    • Smith, K.B.1    Bik, A.J.C.2    Tian, X.3
  • 30
    • 27644589117
    • Improving superword level parallelism support in modern compilers
    • C. Tenllado and et al. Improving superword level parallelism support in modern compilers. In CODES+ISSS, 2005.
    • (2005) CODES+ISSS
    • Tenllado, C.1    et al.2
  • 31
    • 56749149080
    • Pack transposition: Enhancing superword level parallelism exploitation
    • C. Tenllado, L. Piñuel, M. Prieto, and F. Catthoor. Pack transposition: Enhancing superword level parallelism exploitation. In ParCo, 2005.
    • (2005) ParCo
    • Tenllado, C.1    Piñuel, L.2    Prieto, M.3    Catthoor, F.4
  • 32
    • 0024168743
    • V-pascal: An automatic vectorizing compiler for pascal with no language extensions
    • T. Tsuda and Y. Kunieda. V-pascal: an automatic vectorizing compiler for pascal with no language extensions. In Supercomputing, 1988.
    • (1988) Supercomputing
    • Tsuda, T.1    Kunieda, Y.2
  • 34
    • 63549125297
    • P. Wu, A. E. Eichenberger, and A. Wang. Efficient simd code generation for runtime alignment. In CGO, March 2005.
  • 35
    • 32844466554
    • P. Wu, A. E. Eichenberger, A. Wang, and P. Zhao. An integrated simdization framework using virtual vectors. In ICS, 2005.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.