-
1
-
-
80053933239
-
-
NAS parallel benchmarks in OpenMP
-
NAS parallel benchmarks in OpenMP. URL http://phase.hpcc.jp/Omni/benchmarks/NPB/index.html.
-
-
-
-
2
-
-
0025536635
-
LAPACK: A portable linear algebra library for high-performance computers
-
E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney, J. Du Croz, S. Hammarling, J. Demmel, C. Bischof, and D. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. In SC '90, pages 2-11, 1990. (Pubitemid 21675291)
-
(1990)
Proc Supercomput 90
, pp. 2-11
-
-
Anderson, E.1
Bai, Z.2
Dongarra, J.3
Greenbaum, A.4
McKenney, A.5
Du Croz, J.6
Hammarling, S.7
Demmel, J.8
Bischof, C.9
Sorensen, D.10
-
3
-
-
19044386208
-
An Updated Set of Basic Linear Algebra Subprograms (BLAS)
-
DOI 10.1145/567806.567807
-
L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley. An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw., 28(2):135-151, 2002. (Pubitemid 135701673)
-
(2002)
ACM Transactions on Mathematical Software
, vol.28
, Issue.2
, pp. 135-151
-
-
Blackford, L.S.1
Demmel, J.2
Dongarra, J.3
Duff, I.4
Hammarling, S.5
Henry, G.6
Heroux, M.7
Kaufman, L.8
Lumsdaine, A.9
Petitet, A.10
Pozo, R.11
Remington, K.12
Whaley, R.C.13
-
4
-
-
35248852476
-
Scheduling threads for constructive cache sharing on CMPs
-
DOI 10.1145/1248377.1248396, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
-
S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. In SPAA'07, pages 105-115, 2007. (Pubitemid 47568559)
-
(2007)
Annual ACM Symposium on Parallelism in Algorithms and Architectures
, pp. 105-115
-
-
Chen, S.1
Gibbons, P.B.2
Kozuch, M.3
Liaskovitis, V.4
Ailamaki, A.5
Blelloch, G.E.6
Falsafi, B.7
Fix, L.8
Hardavellas, N.9
Mowry, T.C.10
Wilkerson, C.11
-
5
-
-
17244375796
-
Cache-conscious structure layout
-
T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In PLDI '99, pages 1-12, 1999. (Pubitemid 129686073)
-
(1999)
SIGPLAN Notices (ACM Special Interest Group on Programming Languages)
, vol.34
, Issue.5
, pp. 1-12
-
-
Chilimbi, T.M.1
Hill, M.D.2
Larus, J.R.3
-
6
-
-
80053954369
-
-
HP Corp. Perfmon project
-
HP Corp. Perfmon project. URL http://www.hpl.hp.com/research/linux/perfmon.
-
-
-
-
7
-
-
63549085110
-
Analysis and approximation of optimal co-scheduling on chip multiprocessors
-
Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT'08, pages 220-229, 2008.
-
(2008)
PACT'08
, pp. 220-229
-
-
Jiang, Y.1
Shen, X.2
Chen, J.3
Tripathi, R.4
-
8
-
-
0033894726
-
Dynamic data layouts for cache-conscious factorization of DFT
-
D. Kang. Dynamic data layouts for cache-conscious factorization of DFT. In IPDPS '00, page 693, 2000.
-
(2000)
IPDPS
, pp. 693
-
-
Kang, D.1
-
9
-
-
84976736383
-
Page placement algorithms for large real-indexed caches
-
R. E. Kessler and M. D. Hill. Page placement algorithms for large real-indexed caches. ACM Trans. Comput. Syst., 10(4), 1992.
-
(1992)
ACM Trans. Comput. Syst.
, vol.10
, Issue.4
-
-
Kessler, R.E.1
Hill, M.D.2
-
10
-
-
10444238444
-
Fair cache sharing and partitioning in a chip multiprocessor architecture
-
S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT'04, pages 111-122, 2004.
-
(2004)
PACT'04
, pp. 111-122
-
-
Kim, S.1
Chandra, D.2
Solihin, Y.3
-
11
-
-
47249103334
-
Using OS observations to improve performance in multicore systems
-
R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. Using OS observations to improve performance in multicore systems. IEEE Micro, 28(3):54-66, 2008.
-
(2008)
IEEE Micro
, vol.28
, Issue.3
, pp. 54-66
-
-
Knauerhase, R.1
Brett, P.2
Hohlt, B.3
Li, T.4
Hahn, S.5
-
12
-
-
0026137116
-
The cache performance and optimizations of blocked algorithms
-
M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In ASPLOS '91, pages 63-74, 1991.
-
(1991)
ASPLOS '91
, pp. 63-74
-
-
Lam, M.D.1
Rothberg, E.E.2
Wolf, M.E.3
-
13
-
-
0030733703
-
The influence of caches on the performance of sorting
-
A. LaMarca and R. E. Ladner. The influence of caches on the performance of sorting. In SODA '97, pages 370-379.
-
SODA '97
, pp. 370-379
-
-
LaMarca, A.1
Ladner, R.E.2
-
14
-
-
77955032509
-
MCC-DB: Minimizing cache conflicts in multi-core processors for databases
-
R. Lee, X. Ding, F. Chen, Q. Lu, and X. Zhang. MCC-DB: Minimizing cache conflicts in multi-core processors for databases. In VLDB'09.
-
VLDB'09
-
-
Lee, R.1
Ding, X.2
Chen, F.3
Lu, Q.4
Zhang, X.5
-
15
-
-
57749186047
-
Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
-
Salt Lake City, UT
-
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In HPCA '08, pages 367-378, Salt Lake City, UT, 2008.
-
(2008)
HPCA '08
, pp. 367-378
-
-
Lin, J.1
Lu, Q.2
Ding, X.3
Zhang, Z.4
Zhang, X.5
Sadayappan, P.6
-
16
-
-
79952791256
-
Enabling software multicore cache management with lightweight hardware support
-
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Enabling software multicore cache management with lightweight hardware support. In SC'09, 2009.
-
(2009)
SC'09
-
-
Lin, J.1
Lu, Q.2
Ding, X.3
Zhang, Z.4
Zhang, X.5
Sadayappan, P.6
-
17
-
-
2342468635
-
Organizing the last line of defense before hitting the memory wall for CMPs
-
C. Liu, A. Sivasubramaniam, and M. Kandemir. Organizing the last line of defense before hitting the memory wall for CMPs. In HPCA'04, pages 176-185, 2004.
-
(2004)
HPCA'04
, pp. 176-185
-
-
Liu, C.1
Sivasubramaniam, A.2
Kandemir, M.3
-
18
-
-
70449652924
-
Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning
-
Q. Lu, J. Lin, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Soft-OLP: Improving hardware cache performance through software-controlled object-level partitioning. In PACT '09, pages 246-257, 2009.
-
(2009)
PACT '09
, pp. 246-257
-
-
Lu, Q.1
Lin, J.2
Ding, X.3
Zhang, Z.4
Zhang, X.5
Sadayappan, P.6
-
20
-
-
0035177611
-
Cache-friendly implementations of transitive closure
-
Barcelona, Spain
-
M. Penner and V. K. Prasanna. Cache-friendly implementations of transitive closure. In PACT '01, page 185, Barcelona, Spain, 2001.
-
(2001)
PACT '01
, pp. 185
-
-
Penner, M.1
Prasanna, V.K.2
-
21
-
-
34548042910
-
Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches
-
DOI 10.1109/MICRO.2006.49, 4041865, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-39
-
M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO'06, pages 423-432, 2006. (Pubitemid 351337015)
-
(2006)
Proceedings of the Annual International Symposium on Microarchitecture, MICRO
, pp. 423-432
-
-
Qureshi, M.K.1
Patt, Y.N.2
-
22
-
-
0036038691
-
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
-
A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In SIGMETRICS '02, pages 66-76.
-
SIGMETRICS '02
, pp. 66-76
-
-
Snavely, A.1
Tullsen, D.M.2
Voelker, G.3
-
23
-
-
66749168716
-
Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
-
L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer. In MICRO '08, pages 258-269, 2008.
-
(2008)
MICRO '08
, pp. 258-269
-
-
Soares, L.1
Tam, D.2
Stumm, M.3
-
25
-
-
57749176037
-
Managing shared L2 caches on multicore systems in software
-
D. Tam, R. Azimi, L. Soares, and M. Stumm. Managing shared L2 caches on multicore systems in software. In WIOSCA '07, 2007.
-
(2007)
WIOSCA '07
-
-
Tam, D.1
Azimi, R.2
Soares, L.3
Stumm, M.4
-
26
-
-
34548030923
-
Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors
-
DOI 10.1145/1272996.1273004, Operating Systems Review - Proceedings of the 2007 EuroSys Conference
-
D. Tam, R. Azimi, and M. Stumm. Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys'07, pages 47-58, 2007. (Pubitemid 47281574)
-
(2007)
Operating Systems Review (ACM)
, pp. 47-58
-
-
Tam, D.1
Azimi, R.2
Stumm, M.3
-
27
-
-
80054007458
-
-
TOP500.Org
-
TOP500.Org. URL http://www.top500.org/lists/2010/06.
-
-
-
-
28
-
-
84863652586
-
A new approach to array redistribution: Strip mining redistribution
-
A. Wakatani and M. Wolfe. A new approach to array redistribution: Strip mining redistribution. In PARLE '94, pages 323-335, 1994.
-
(1994)
PARLE '94
, pp. 323-335
-
-
Wakatani, A.1
Wolfe, M.2
-
29
-
-
0003278639
-
Automatically tuned linear algebra software
-
R. C. Whaley and J. Dongarra. Automatically tuned linear algebra software. In SC '98, 1998.
-
(1998)
SC '98
-
-
Whaley, R.C.1
Dongarra, J.2
-
30
-
-
0002433589
-
Iteration space tiling for memory hierarchies
-
Philadelphia, PA
-
M. Wolfe. Iteration space tiling for memory hierarchies. In PP '89, pages 357-361, Philadelphia, PA, 1989.
-
(1989)
PP '89
, pp. 357-361
-
-
Wolfe, M.1
-
31
-
-
0024935630
-
More iteration space tiling
-
M. Wolfe. More iteration space tiling. In SC '89, pages 655-664, 1989. (Pubitemid 20665965)
-
(1989)
Proc Supercomput 89
, pp. 655-664
-
-
Wolfe, M.1
-
33
-
-
35248846531
-
An experimental comparison of cache-oblivious and cache-conscious programs
-
DOI 10.1145/1248377.1248394, SPAA'07: Proceedings of the Nineteenth Annual Symposium on Parallelism in Algorithms and Architectures
-
K. Yotov, T. Roeder, K. Pingali, J. Gunnels, and F. Gustavson. An experimental comparison of cache-oblivious and cache-conscious programs. In SPAA '07, pages 93-104, 2007. (Pubitemid 47568558)
-
(2007)
Annual ACM Symposium on Parallelism in Algorithms and Architectures
, pp. 93-104
-
-
Yotov, K.1
Roeder, T.2
Pingali, K.3
Gunnels, J.4
Gustavson, F.5
-
34
-
-
70349111334
-
Towards practical page coloring-based multicore cache management
-
X. Zhang, S. Dwarkadas, and K. Shen. Towards practical page coloring-based multicore cache management. In EuroSys'09, pages 89-102, 2009.
-
(2009)
EuroSys'09
, pp. 89-102
-
-
Zhang, X.1
Dwarkadas, S.2
Shen, K.3
-
35
-
-
77952248898
-
Addressing shared resource contention in multicore processors via scheduling
-
S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In ASPLOS '10, pages 129-142, 2010.
-
ASPLOS '10
, pp. 129-142
-
-
Zhuravlev, S.1
Blagodurov, S.2
Fedorova, A.3