[4] A. Bik, M. Girkar, P. M. Grey, and X. Tian. Automatic intra-register vectorization for the Intel architecture. International J. of Parallel Programming, 2:65-98, April 2002.
[6] S. Chatterjee, J. R. Gilbert, R. Schreiber, and S.-H. Teng. Automatic array alignment in data-parallel programs. In Proceedings of POPL, pages 16-28. ACM Press, 1993.
[7] S. Chatterjee, J. R. Gilbert, R. Schreiber, and S.-H. Teng. Optimal evaluation of array expressions on massively parallel machines. ACM Trans. Program. Lang. Syst., 17(1):123-156, 1995.
[10] Motorola Corporation. AltiVec technology programming interface manual. June 1999.
[11] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts (extended abstract). In Proceedings of the 24th ACM Symposium on Theory of Computing, pages 241-251, New York, NY, USA, 1992. ACM Press.
[13] C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. J. Parallel Distrib. Comput., 64:108-134, 2004.
[14] A. E. Eichenberger, P. Wu, and K. O'Brien. Vectorization for SIMD architectures with alignment constraints. In Proceedings of PLDI, June 2004.
[15] L. Fireman. The complexity of SIMD alignment. M.Sc. thesis, Technion - Israel Institute of Technology, Department of Computer Science, June 2006. http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2006/MSC/MSC-2006-17.
[16] G. R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Workshop on Languages and Compilers for Parallel Computing, pages 281-295, 1992.
[17] J. R. Gilbert and R. Schreiber. Optimal expression evaluation for data parallel architectures. J. Parallel Distrib. Comput., 13(1):58-64, 1991.
[19] S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. In Proceedings of PLDI, pages 145-156, 2000.
[21] D. Naishlos. Autovectorization in GCC. In Proceedings of the GCC Developers Summit, pages 105-118, 2004.
[22] D. Naishlos, M. Biberstein, S. Ben-David, and A. Zaks. Vectorizing for a SIMdD DSP Architecture. In Proceedings of CASES, pages 2-11, 2003.
[23] D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In Proceedings of CGO, pages 281-294, 2006.
[26] G. Ren, P. Wu, and D. A. Padua. Optimizing data permutations for SIMD devices. In Proceedings of PLDI, pages 118-131, 2006.
[27] J. Shin, M. Hall, and J. Chame. Superword-level parallelism in the presence of control flow. In Proceedings of CGO, pages 165-175, Washington, DC, USA, 2005. IEEE Computer Society.
[28] Crescent Bay Software. VAST-F/AltiVec: Automatic Fortran Vectorizer for PowerPC Vector Unit. http://www.psrv.com/vastaltivec.html, 2004.
[30] P. Wu, A. E. Eichenberger, and A. Wang. Efficient SIMD code generation for runtime alignment and length conversion. In Proceedings of CGO, pages 153-164, Washington, DC, USA, 2005. IEEE Computer Society.