-
1
-
-
3042669130
-
IBM POWER5 chip: A dual-core multithreaded processor
-
R. N. Kalla, B. Sinharoy, and J. M. Tendler, "IBM POWER5 chip: A dual-core multithreaded processor," IEEE Micro, vol. 24, no. 2, pp. 40-47, 2004.
-
(2004)
IEEE Micro
, vol.24
, Issue.2
, pp. 40-47
-
-
Kalla, R.N.1
Sinharoy, B.2
Tendler, J.M.3
-
2
-
-
20344374162
-
Niagara: A 32-way multithreaded SPARC processor
-
P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way multithreaded SPARC processor," IEEE Micro, vol. 25, no. 2, pp. 21-29, 2005.
-
(2005)
IEEE Micro
, vol.25
, Issue.2
, pp. 21-29
-
-
Kongetira, P.1
Aingaran, K.2
Olukotun, K.3
-
3
-
-
0036949388
-
An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
-
C. Kim, D. Burger, and S. W. Keckler, "An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches," in ASPLOS'02.
-
ASPLOS'02
-
-
Kim, C.1
Burger, D.2
Keckler, S.W.3
-
4
-
-
21644472427
-
Managing wire delay in large chip-multiprocessor caches
-
B. M. Beckmann and D. A. Wood, "Managing wire delay in large chip-multiprocessor caches," in MICRO'04.
-
MICRO'04
-
-
Beckmann, B.M.1
Wood, D.A.2
-
5
-
-
84956979498
-
LAWRA: Linear algebra with recursive algorithms
-
London, UK: Springer-Verlag
-
B. S. Andersen, F. G. Gustavson, A. Karaivanov, M. Marinova, J. Waniewski, and P. Y. Yalamov, "LAWRA: Linear algebra with recursive algorithms," in PARA '00. London, UK: Springer-Verlag, 2001, pp. 38-51.
-
(2001)
PARA '00
, pp. 38-51
-
-
Andersen, B.S.1
Gustavson, F.G.2
Karaivanov, A.3
Marinova, M.4
Waniewski, J.5
Yalamov, P.Y.6
-
6
-
-
0026933251
-
Some efficient solutions to the affine scheduling problem: I. one-dimensional time
-
P. Feautrier, "Some efficient solutions to the affine scheduling problem: I. one-dimensional time," IJPP, vol. 21, no. 5, pp. 313-348, 1992.
-
(1992)
IJPP
, vol.21
, Issue.5
, pp. 313-348
-
-
Feautrier, P.1
-
7
-
-
0032058019
-
Constraint-based array dependence analysis
-
W. Pugh and D. Wonnacott, "Constraint-based array dependence analysis," ACM Trans. Program. Lang. Syst., vol. 20, no. 3, pp. 635-678, 1998.
-
(1998)
ACM Trans. Program. Lang. Syst.
, vol.20
, Issue.3
, pp. 635-678
-
-
Pugh, W.1
Wonnacott, D.2
-
8
-
-
0029711429
-
Minimizing communication while preserving parallelism
-
New York, NY, USA: ACM
-
W. Kelly and W. Pugh, "Minimizing communication while preserving parallelism," in ICS '96. New York, NY, USA: ACM, 1996, pp. 52-60.
-
(1996)
ICS '96
, pp. 52-60
-
-
Kelly, W.1
Pugh, W.2
-
9
-
-
0034299275
-
Generation of efficient nested loops from polyhedra
-
F. Quilleré, S. V. Rajopadhye, and D. Wilde, "Generation of efficient nested loops from polyhedra," Intl. J. of Parallel Programming, vol. 28, no. 5, pp. 469-498, 2000.
-
(2000)
Intl. J. of Parallel Programming
, vol.28
, Issue.5
, pp. 469-498
-
-
Quilleré, F.1
Rajopadhye, S.V.2
Wilde, D.3
-
11
-
-
57349167317
-
Iterative optimization in the polyhedral model: Part II, multidimensional time
-
Tucson, Arizona, June
-
L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen, "Iterative optimization in the polyhedral model: Part II, multidimensional time," in PLDI'08, Tucson, Arizona, June 2008.
-
(2008)
PLDI'08
-
-
Pouchet, L.-N.1
Bastoul, C.2
Cavazos, J.3
Cohen, A.4
-
12
-
-
57349139452
-
A practical automatic polyhedral parallelizer and locality optimizer
-
New York, NY, USA: ACM
-
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A practical automatic polyhedral parallelizer and locality optimizer," in PLDI '08. New York, NY, USA: ACM, 2008, pp. 101-113.
-
(2008)
PLDI '08
, pp. 101-113
-
-
Bondhugula, U.1
Hartono, A.2
Ramanujam, J.3
Sadayappan, P.4
-
14
-
-
33646559059
-
Automatic parallelization of loop programs for distributed memory architectures
-
University of Passau, habilitation Thesis. [Online]. Available
-
M. Griebl, Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004, habilitation Thesis. [Online]. Available: http://www.uni-passau.de/~griebl/habilitation.html
-
(2004)
FMI
-
-
Griebl, M.1
-
16
-
-
0031622954
-
Data transformations for eliminating conflict misses
-
G. Rivera and C.-W. Tseng, "Data transformations for eliminating conflict misses," in PLDI'98.
-
PLDI'98
-
-
Rivera, G.1
Tseng, C.-W.2
-
19
-
-
33748870886
-
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
-
M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 92-99, 2005.
-
(2005)
SIGARCH Comput. Archit. News
, vol.33
, Issue.4
, pp. 92-99
-
-
Martin, M.1
Sorin, D.2
Beckmann, B.3
Marty, M.4
Xu, M.5
Alameldeen, A.6
Moore, K.7
Hill, M.8
Wood, D.9
-
20
-
-
35348900723
-
Virtual hierarchies to support server consolidation
-
M. R. Marty and M. D. Hill, "Virtual hierarchies to support server consolidation," in ISCA'07.
-
ISCA'07
-
-
Marty, M.R.1
Hill, M.D.2
-
21
-
-
33947595619
-
Accelerator: Using data parallelism to program GPUs for general-purpose uses
-
D. Tarditi, S. Puri, and J. Oglesby, "Accelerator: Using data parallelism to program GPUs for general-purpose uses," in ASPLOS'06.
-
ASPLOS'06
-
-
Tarditi, D.1
Puri, S.2
Oglesby, J.3
-
22
-
-
0009930394
-
ZPL: A machine independent programming language for parallel computers
-
B. L. Chamberlain, S.-E. Choi, E. C. Lewis, C. Lin, L. Snyder, and W. D. Weathersby, "ZPL: A machine independent programming language for parallel computers," IEEE TSE, vol. 26, no. 3, pp. 197-211, 2000.
-
(2000)
IEEE TSE
, vol.26
, Issue.3
, pp. 197-211
-
-
Chamberlain, B.L.1
Choi, S.-E.2
Lewis, E.C.3
Lin, C.4
Snyder, L.5
Weathersby, W.D.6
-
24
-
-
16244422171
-
Interconnect-power dissipation in a microprocessor
-
N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect-power dissipation in a microprocessor," in SLIP'04, 2004.
-
(2004)
SLIP'04
-
-
Magen, N.1
Kolodny, A.2
Weiser, U.3
Shamir, N.4
-
25
-
-
84955452760
-
Dynamic voltage scaling with links for power optimization of interconnection networks
-
L. Shang, L.-S. Peh, and N. K. Jha, "Dynamic voltage scaling with links for power optimization of interconnection networks," in HPCA'03.
-
HPCA'03
-
-
Shang, L.1
Peh, L.-S.2
Jha, N.K.3
-
26
-
-
33746085616
-
Reducing NoC energy consumption through compiler-directed channel voltage scaling
-
G. Chen, F. Li, M. Kandemir, and M. J. Irwin, "Reducing NoC energy consumption through compiler-directed channel voltage scaling," in PLDI'06.
-
PLDI'06
-
-
Chen, G.1
Li, F.2
Kandemir, M.3
Irwin, M.J.4
-
28
-
-
0033700063
-
A case for user-level dynamic page migration
-
D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé, "A case for user-level dynamic page migration," in ICS'00.
-
ICS'00
-
-
Nikolopoulos, D.S.1
Papatheodorou, T.S.2
Polychronopoulos, C.D.3
Labarta, J.4
Ayguadé, E.5
-
29
-
-
84989342078
-
Scheduling and page migration for multiprocessor compute servers
-
R. Chandra, S. Devine, B. Verghese, A. Gupta, and M. Rosenblum, "Scheduling and page migration for multiprocessor compute servers," in ASPLOS'94.
-
ASPLOS'94
-
-
Chandra, R.1
Devine, S.2
Verghese, B.3
Gupta, A.4
Rosenblum, M.5
-
30
-
-
0003582055
-
-
Dept. Computer Science, University of Washington, Seattle, WA, Tech. Rep. TR-95-09-01
-
S. Leung and J. Zahorjan, "Optimizing data locality by array restructuring," Dept. Computer Science, University of Washington, Seattle, WA, Tech. Rep. TR-95-09-01, 1995.
-
(1995)
Optimizing Data Locality by Array Restructuring
-
-
Leung, S.1
Zahorjan, J.2
-
31
-
-
70449655268
-
Non-singular data transformations: Definition, validity, applications
-
M. F. P. O'Boyle and P. M. W. Knijnenburg, "Non-singular data transformations: definition, validity, applications," in CPC'96.
-
CPC'96
-
-
O'Boyle, M.F.P.1
Knijnenburg, P.M.W.2
-
32
-
-
0033077834
-
A linear algebra framework for automatic determination of optimal data layouts
-
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A linear algebra framework for automatic determination of optimal data layouts," IEEE TPDS, vol. 10, no. 2, pp. 115-135, 1999.
-
(1999)
IEEE TPDS
, vol.10
, Issue.2
, pp. 115-135
-
-
Kandemir, M.1
Choudhary, A.2
Shenoy, N.3
Banerjee, P.4
Ramanujam, J.5
-
33
-
-
0035439109
-
Static and dynamic locality optimizations using integer linear programming
-
M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguadé, "Static and dynamic locality optimizations using integer linear programming," IEEE TPDS, vol. 12, no. 9, pp. 922-941, 2001.
-
(2001)
IEEE TPDS
, vol.12
, Issue.9
, pp. 922-941
-
-
Kandemir, M.1
Banerjee, P.2
Choudhary, A.3
Ramanujam, J.4
Ayguadé, E.5
-
34
-
-
70449666072
-
Improving locality using loop and data transformations in an integrated framework
-
M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in MICRO'98.
-
MICRO'98
-
-
Kandemir, M.1
Choudhary, A.2
Ramanujam, J.3
Banerjee, P.4
-
35
-
-
0027311338
-
Automatic array alignment in data-parallel programs
-
New York, NY, USA: ACM Press
-
S. Chatterjee, J. R. Gilbert, R. Schreiber, and S.-H. Teng, "Automatic array alignment in data-parallel programs," in POPL'93. New York, NY, USA: ACM Press, 1993, pp. 16-28.
-
(1993)
POPL'93
, pp. 16-28
-
-
Chatterjee, S.1
Gilbert, J.R.2
Schreiber, R.3
Teng, S.-H.4
-
39
-
-
85088332028
-
Nonlinear array layouts for hierarchical memory systems
-
S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi, "Nonlinear array layouts for hierarchical memory systems," in ICS'99.
-
ICS'99
-
-
Chatterjee, S.1
Jain, V.V.2
Lebeck, A.R.3
Mundhra, S.4
Thottethodi, M.5
-
40
-
-
0033342448
-
Cache-efficient matrix transposition
-
S. Chatterjee and S. Sen, "Cache-efficient matrix transposition," in HPCA'00.
-
HPCA'00
-
-
Chatterjee, S.1
Sen, S.2
-
41
-
-
0032067773
-
Maximizing parallelism and minimizing synchronization with affine partitions
-
A. W. Lim and M. S. Lam, "Maximizing parallelism and minimizing synchronization with affine partitions," Parallel Computing, vol. 24, no. 3-4, pp. 445-475, 1998.
-
(1998)
Parallel Computing
, vol.24
, Issue.3-4
, pp. 445-475
-
-
Lim, A.W.1
Lam, M.S.2
-
42
-
-
0032662841
-
An affine partitioning algorithm to maximize parallelism and minimize communication
-
A. W. Lim, G. I. Cheong, and M. S. Lam, "An affine partitioning algorithm to maximize parallelism and minimize communication," in ICS'99, 1999, pp. 228-237.
-
(1999)
ICS'99
, pp. 228-237
-
-
Lim, A.W.1
Cheong, G.I.2
Lam, M.S.3