-
2
-
-
67650525989
-
Shared memory consistency models: A tutorial
-
S. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 1995.
-
(1995)
IEEE Computer
-
-
Adve, S.1
Gharachorloo, K.2
-
3
-
-
0010355322
-
Perfect pipelining: A new loop parallelization technique
-
A. Aiken and A. Nicolau. Perfect pipelining: A new loop parallelization technique. ESOP, 1988.
-
(1988)
ESOP
-
-
Aiken, A.1
Nicolau, A.2
-
5
-
-
85060036181
-
Validity of the single processor approach to achieving large scale computing capabilities
-
G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. Proc. Spring Joint Computer Conference, 1967.
-
(1967)
Proc. Spring Joint Computer Conference
-
-
Amdahl, G.1
-
7
-
-
41349123319
-
Revisiting the sequential programming model for the multicore era
-
M. Bridges et al. Revisiting the sequential programming model for the multicore era. IEEE Micro, 2008.
-
(2008)
IEEE Micro
-
-
Bridges, M.1
-
8
-
-
76949106140
-
A highly flexible, parallel virtual machine: Design and experience of ILDJIT
-
S. Campanoni et al. A highly flexible, parallel virtual machine: Design and experience of ILDJIT. Softw. Pract. Exper., 2010.
-
(2010)
Softw. Pract. Exper.
-
-
Campanoni, S.1
-
9
-
-
0032662989
-
Simultaneous subordinate microthreading (SSMT)
-
R. Chappell et al. Simultaneous subordinate microthreading (SSMT). ISCA, 1999.
-
(1999)
ISCA
-
-
Chappell, R.1
-
11
-
-
0012526362
-
Statement re-ordering for DOACROSS loops
-
D-K. Chen and P-C. Yew. Statement re-ordering for DOACROSS loops. ICPP, 1994.
-
(1994)
ICPP
-
-
Chen, D.-K.1
Yew, P.-C.2
-
14
-
-
84863453906
-
-
R. Costa et al. Gcc4cli. http://gcc.gnu.org/projects/cli.html.
-
Gcc4cli
-
-
Costa, R.1
-
15
-
-
0022893044
-
DOACROSS: Beyond vectorization for multiprocessors
-
R. Cytron. DOACROSS: Beyond vectorization for multiprocessors. ICPP, 1986.
-
(1986)
ICPP
-
-
Cytron, R.1
-
18
-
-
0024012163
-
Reevaluating Amdahl's law
-
31, May
-
J. Gustafson. Reevaluating Amdahl's law. Commun. ACM, 31, May 1988.
-
(1988)
Commun. ACM
-
-
Gustafson, J.1
-
19
-
-
77954006048
-
Decoupled software pipelining creates parallelization opportunities
-
J. Huang et al. Decoupled software pipelining creates parallelization opportunities. CGO, 2010.
-
(2010)
CGO
-
-
Huang, J.1
-
20
-
-
74349096277
-
Parallelization of DOALL and DOACROSS loops - A survey
-
A. Hurson et al. Parallelization of DOALL and DOACROSS loops - a survey. Advances in Computers, 1997.
-
(1997)
Advances in Computers
-
-
Hurson, A.1
-
21
-
-
84863484683
-
-
VTune. http://software.intel.com/en-us/intel-vtune.
-
VTune
-
-
-
22
-
-
3042569221
-
Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors
-
D. Kim et al. Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors. CGO, 2004.
-
(2004)
CGO
-
-
Kim, D.1
-
23
-
-
79951708803
-
Scalable speculative parallelization on commodity clusters
-
H. Kim et al. Scalable speculative parallelization on commodity clusters. MICRO, 2010.
-
(2010)
MICRO
-
-
Kim, H.1
-
24
-
-
0030400452
-
A loop allocation policy for DOACROSS loops
-
J. Lim et al. A loop allocation policy for DOACROSS loops. SPDP, 1996.
-
(1996)
SPDP
-
-
Lim, J.1
-
25
-
-
72049107238
-
Optimal loop parallelization for maximizing iteration-level parallelism
-
D. Liu et al. Optimal loop parallelization for maximizing iteration-level parallelism. CASES, 2009.
-
(2009)
CASES
-
-
Liu, D.1
-
26
-
-
0031199614
-
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
-
J. Lo et al. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. TCS, 1997.
-
(1997)
TCS
-
-
Lo, J.1
-
27
-
-
81455150594
-
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
-
C-K. Luk. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. SIGARCH Comp. Arch. News, 2001.
-
(2001)
SIGARCH Comp. Arch. News
-
-
Luk, C.-K.1
-
28
-
-
70449709551
-
Synchronization optimizations for efficient execution on multi-cores
-
A. Nicolau et al. Synchronization optimizations for efficient execution on multi-cores. ICS, 2009.
-
(2009)
ICS
-
-
Nicolau, A.1
-
29
-
-
67650096789
-
Techniques for efficient placement of synchronization primitives
-
A. Nicolau et al. Techniques for efficient placement of synchronization primitives. PPoPP, 2009.
-
(2009)
PPoPP
-
-
Nicolau, A.1
-
30
-
-
33749375700
-
Automatic thread extraction with decoupled software pipelining
-
G. Ottoni et al. Automatic thread extraction with decoupled software pipelining. MICRO, 2005.
-
(2005)
MICRO
-
-
Ottoni, G.1
-
31
-
-
84863453911
-
Exposing speculative thread parallelism in SPEC2000
-
M. Prabhu and K. Olukotun. Exposing speculative thread parallelism in SPEC2000. PPoPP, 2000.
-
(2000)
PPoPP
-
-
Prabhu, M.1
Olukotun, K.2
-
32
-
-
77952281906
-
Speculative parallelization using software multi-threaded transactions
-
A. Raman et al. Speculative parallelization using software multi-threaded transactions. ASPLOS, 2010.
-
(2010)
ASPLOS
-
-
Raman, A.1
-
33
-
-
43449113286
-
Parallel-Stage decoupled software pipelining
-
E. Raman et al. Parallel-Stage decoupled software pipelining. CGO, 2008.
-
(2008)
CGO
-
-
Raman, E.1
-
34
-
-
51149117060
-
Performance scalability of decoupled software pipelining
-
R. Rangan et al. Performance scalability of decoupled software pipelining. TACO, 2008.
-
(2008)
TACO
-
-
Rangan, R.1
-
35
-
-
84863463725
-
Spin-block synchronization algorithm in the shared memory multiprocessor system
-
J. Seung-Ju and K. Gil-Yong. Spin-block synchronization algorithm in the shared memory multiprocessor system. SIGOPS Oper. Syst. Rev., 1994.
-
(1994)
SIGOPS Oper. Syst. Rev.
-
-
Seung-Ju, J.1
Gil-Yong, K.2
-
36
-
-
84863431682
-
Efficient DOACROSS execution on distributed shared-memory multiprocessors
-
H-M. Su and P-C. Yew. Efficient DOACROSS execution on distributed shared-memory multiprocessors. ACM/IEEE Conference on Supercomputing, 1991.
-
(1991)
ACM/IEEE Conference on Supercomputing
-
-
Su, H.-M.1
Yew, P.-C.2
-
37
-
-
47349118686
-
A practical approach to exploiting coarse-grained pipeline parallelism in C programs
-
W. Thies et al. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. MICRO, 2007.
-
(2007)
MICRO
-
-
Thies, W.1
-
38
-
-
78650659831
-
Towards a holistic approach to auto-parallelization
-
G. Tournavitis et al. Towards a holistic approach to auto-parallelization. PLDI, 2009.
-
(2009)
PLDI
-
-
Tournavitis, G.1
-
39
-
-
41349089872
-
Speculative decoupled software pipelining
-
N. Vachharajani et al. Speculative decoupled software pipelining. PACT, 2007.
-
(2007)
PACT
-
-
Vachharajani, N.1
-
40
-
-
0035335764
-
Time stamp algorithms for runtime parallelization of DOACROSS loops with dynamic dependences
-
C.-Z. Xu and V. Chaudhary. Time stamp algorithms for runtime parallelization of DOACROSS loops with dynamic dependences. TPDS, 2001.
-
(2001)
TPDS
-
-
Xu, C.-Z.1
Chaudhary, V.2
-
41
-
-
57749168614
-
Uncovering hidden loop level parallelism in sequential applications
-
H. Zhong et al. Uncovering hidden loop level parallelism in sequential applications. HPCA, 2008.
-
(2008)
HPCA
-
-
Zhong, H.1
-
42
-
-
70449652981
-
Exploiting parallelism with dependence-aware scheduling
-
X. Zhuang et al. Exploiting parallelism with dependence-aware scheduling. In PACT, pages 193-202, 2009.
-
(2009)
PACT
, pp. 193-202
-
-
Zhuang, X.1