-
6
-
-
34547185000
-
Scalable subgraph mapping for acyclic computation accelerators
-
N. Clark, A. Hormati, S. Mahlke, and S. Yehia, "Scalable subgraph mapping for acyclic computation accelerators," in CASES'06.
-
CASES'06
-
-
Clark, N.1
Hormati, A.2
Mahlke, S.3
Yehia, S.4
-
8
-
-
77954724842
-
SAMS multi-layout memory: Providing multiple views of data to boost SIMD performance
-
C. Gou, G. Kuzmanov, and G. Gaydadjiev, "SAMS multi-layout memory: providing multiple views of data to boost SIMD performance," in ICS'10.
-
ICS'10
-
-
Gou, C.1
Kuzmanov, G.2
Gaydadjiev, G.3
-
9
-
-
84869168810
-
DySER: Unifying functionality and parallelism specialization for energy efficient computing
-
V. Govindaraju, C.-H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim, "DySER: Unifying functionality and parallelism specialization for energy efficient computing," IEEE Micro, vol. 33, no. 5, 2012.
-
(2012)
IEEE Micro
, vol.33
, Issue.5
-
-
Govindaraju, V.1
Ho, C.-H.2
Nowatzki, T.3
Chhugani, J.4
Satish, N.5
Sankaralingam, K.6
Kim, C.7
-
10
-
-
79955890625
-
Dynamically specialized datapaths for energy efficient computing
-
V. Govindaraju, C.-H. Ho, and K. Sankaralingam, "Dynamically specialized datapaths for energy efficient computing," in HPCA 2011.
-
(2011)
HPCA
-
-
Govindaraju, V.1
Ho, C.-H.2
Sankaralingam, K.3
-
11
-
-
84871291822
-
Bundled execution of recurring traces for energy-efficient general purpose processing
-
S. Gupta, S. Feng, A. Ansari, S. Mahlke, and D. August, "Bundled execution of recurring traces for energy-efficient general purpose processing," in MICRO-44.
-
MICRO-44
-
-
Gupta, S.1
Feng, S.2
Ansari, A.3
Mahlke, S.4
August, D.5
-
12
-
-
0031360911
-
Garp: A MIPS processor with a reconfigurable coprocessor
-
J. R. Hauser and J. Wawrzynek, "Garp: A MIPS Processor with a Reconfigurable Coprocessor," in FCCM'97.
-
FCCM'97
-
-
Hauser, J.R.1
Wawrzynek, J.2
-
13
-
-
84863451245
-
Dynamic trace-based analysis of vectorization potential of applications
-
J. Holewinski, R. Ramamurthi, M. Ravishankar, N. Fauzia, L.-N. Pouchet, A. Rountev, and P. Sadayappan, "Dynamic trace-based analysis of vectorization potential of applications," SIGPLAN Not., 2012.
-
(2012)
SIGPLAN Not
-
-
Holewinski, J.1
Ramamurthi, R.2
Ravishankar, M.3
Fauzia, N.4
Pouchet, L.-N.5
Rountev, A.6
Sadayappan, P.7
-
15
-
-
0034446825
-
Exploiting superword level parallelism with multimedia instruction sets
-
S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," in PLDI'00.
-
PLDI'00
-
-
Larsen, S.1
Amarasinghe, S.2
-
16
-
-
80052543989
-
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
-
Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart, C. Batten, and K. Asanović, "Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators," in ISCA'11.
-
ISCA'11
-
-
Lee, Y.1
Avizienis, R.2
Bishara, A.3
Xia, R.4
Lockhart, D.5
Batten, C.6
Asanović, K.7
-
17
-
-
84863012838
-
An evaluation of vectorizing compilers
-
S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua, "An evaluation of vectorizing compilers," in PACT'11.
-
PACT'11
-
-
Maleki, S.1
Gao, Y.2
Garzarán, M.J.3
Wong, T.4
Padua, D.A.5
-
18
-
-
84887477162
-
Tartan: Evaluating spatial computation for whole program execution
-
M. Mishra, T. J. Callahan, T. Chelcea, G. Venkataramani, S. C. Goldstein, and M. Budiu, "Tartan: evaluating spatial computation for whole program execution," in ASPLOS-XII.
-
ASPLOS-XII
-
-
Mishra, M.1
Callahan, T.J.2
Chelcea, T.3
Venkataramani, G.4
Goldstein, S.C.5
Budiu, M.6
-
19
-
-
77951154340
-
The GPU computing era
-
Mar.
-
J. Nickolls and W. J. Dally, "The GPU computing era," IEEE Micro, vol. 30, no. 2, Mar. 2010.
-
(2010)
IEEE Micro
, vol.30
, Issue.2
-
-
Nickolls, J.1
Dally, W.J.2
-
20
-
-
84883088830
-
A general constraint-centric scheduling framework for spatial architectures
-
T. Nowatzki, M. Sartin-Tarm, L. De Carli, K. Sankaralingam, C. Estan, and B. Robatmili, "A general constraint-centric scheduling framework for spatial architectures," in PLDI 2013.
-
(2013)
PLDI
-
-
Nowatzki, T.1
Sartin-Tarm, M.2
De Carli, L.3
Sankaralingam, K.4
Estan, C.5
Robatmili, B.6
-
21
-
-
79953275887
-
Multi-platform auto-vectorization
-
D. Nuzman and R. Henderson, "Multi-platform auto-vectorization," in CGO'06.
-
CGO'06
-
-
Nuzman, D.1
Henderson, R.2
-
22
-
-
33746034953
-
Auto-vectorization of interleaved data for SIMD
-
D. Nuzman, I. Rosen, and A. Zaks, "Auto-vectorization of interleaved data for SIMD," in PLDI'06.
-
PLDI'06
-
-
Nuzman, D.1
Rosen, I.2
Zaks, A.3
-
23
-
-
0022874874
-
Advanced compiler optimizations for supercomputers
-
D. A. Padua and M. J. Wolfe, "Advanced compiler optimizations for supercomputers," Commun. ACM, 1986.
-
(1986)
Commun. ACM
-
-
Padua, D.A.1
Wolfe, M.J.2
-
25
-
-
84876586321
-
Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability
-
Y. Park, J. J. K. Park, H. Park, and S. Mahlke, "Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability," in MICRO'12.
-
MICRO'12
-
-
Park, Y.1
Park, J.J.K.2
Park, H.3
Mahlke, S.4
-
26
-
-
84863353689
-
SIMD defragmenter: Efficient ILP realization on data-parallel architectures
-
Y. Park, S. Seo, H. Park, H. K. Cho, and S. Mahlke, "SIMD defragmenter: efficient ILP realization on data-parallel architectures," in ASPLOS'12.
-
ASPLOS'12
-
-
Park, Y.1
Seo, S.2
Park, H.3
Cho, H.K.4
Mahlke, S.5
-
27
-
-
84870653904
-
ispc: A SPMD compiler for high-performance CPU programming
-
M. Pharr and W. R. Mark, "ispc: A SPMD compiler for high-performance CPU programming," in InPar 2012.
-
(2012)
InPar
-
-
Pharr, M.1
Mark, W.R.2
-
28
-
-
33745222449
-
Optimizing data permutations for SIMD devices
-
G. Ren, P. Wu, and D. Padua, "Optimizing data permutations for SIMD devices," in PLDI'06.
-
PLDI'06
-
-
Ren, G.1
Wu, P.2
Padua, D.3
-
29
-
-
84944402628
-
Universal mechanisms for data-parallel architectures
-
December
-
K. Sankaralingam, S. W. Keckler, W. R. Mark, and D. Burger, "Universal Mechanisms for Data-Parallel Architectures," in MICRO'03: Proceedings of the 36th Annual International Symposium on Microarchitecture, December 2003, pp. 303-314.
-
(2003)
MICRO'03: Proceedings of the 36th Annual International Symposium on Microarchitecture
, pp. 303-314
-
-
Sankaralingam, K.1
Keckler, S.W.2
Mark, W.R.3
Burger, D.4
-
30
-
-
84887486539
-
Constraint centric scheduling guide
-
May
-
M. Sartin-Tarm, T. Nowatzki, L. De Carli, K. Sankaralingam, and C. Estan, "Constraint centric scheduling guide," SIGARCH Comput. Archit. News, vol. 41, no. 2, pp. 17-21, May 2013.
-
(2013)
SIGARCH Comput. Archit. News
, vol.41
, Issue.2
, pp. 17-21
-
-
Sartin-Tarm, M.1
Nowatzki, T.2
De Carli, L.3
Sankaralingam, K.4
Estan, C.5
-
31
-
-
84864831385
-
Can traditional programming bridge the ninja performance gap for parallel computing applications
-
N. Satish, C. Kim, J. Chhugani, H. Saito, R. Krishnaiyer, M. Smelyanskiy, M. Girkar, and P. Dubey, "Can traditional programming bridge the ninja performance gap for parallel computing applications?" in ISCA 2012.
-
(2012)
ISCA
-
-
Satish, N.1
Kim, C.2
Chhugani, J.3
Saito, H.4
Krishnaiyer, R.5
Smelyanskiy, M.6
Girkar, M.7
Dubey, P.8
-
32
-
-
57649106258
-
Larrabee: A many-core x86 architecture for visual computing
-
L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: a many-core x86 architecture for visual computing," in SIGGRAPH 2008.
-
(2008)
SIGGRAPH
-
-
Seiler, L.1
Carmean, D.2
Sprangle, E.3
Forsyth, T.4
Abrash, M.5
Dubey, P.6
Junkins, S.7
Lake, A.8
Sugerman, J.9
Cavin, R.10
Espasa, R.11
Grochowski, E.12
Juan, T.13
Hanrahan, P.14
-
33
-
-
47849103500
-
Introducing control flow into vectorized code
-
J. Shin, "Introducing control flow into vectorized code," in PACT'07.
-
PACT'07
-
-
Shin, J.1
-
34
-
-
85088882721
-
Vector instruction set support for conditional operations
-
J. E. Smith, G. Faanes, and R. Sugumar, "Vector instruction set support for conditional operations," in ISCA'00.
-
ISCA'00
-
-
Smith, J.E.1
Faanes, G.2
Sugumar, R.3
-
35
-
-
84857819522
-
Using machine learning to improve automatic vectorization
-
K. Stock, L.-N. Pouchet, and P. Sadayappan, "Using machine learning to improve automatic vectorization," TACO 2012.
-
(2012)
TACO
-
-
Stock, K.1
Pouchet, L.-N.2
Sadayappan, P.3
-
36
-
-
70449626135
-
Polyhedral-model guided loop-nest auto-vectorization
-
K. Trifunović, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, "Polyhedral-model guided loop-nest auto-vectorization," in PACT'09.
-
PACT'09
-
-
Trifunović, K.1
Nuzman, D.2
Cohen, A.3
Zaks, A.4
Rosen, I.5
-
37
-
-
84887444440
-
Relaxing SIMD control flow constraints using loop transformations
-
R. v. Hanxleden and K. Kennedy, "Relaxing SIMD control flow constraints using loop transformations," in PLDI'92.
-
PLDI'92
-
-
Hanxleden, R.V.1
Kennedy, K.2
-
38
-
-
77952256041
-
Conservation cores: Reducing the energy of mature computations
-
G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor, "Conservation cores: reducing the energy of mature computations," in ASPLOS'10.
-
ASPLOS'10
-
-
Venkatesh, G.1
Sampson, J.2
Goulding, N.3
Garcia, S.4
Bryksin, V.5
Lugo-Martinez, J.6
Swanson, S.7
Taylor, M.B.8