-
1
-
-
33751022080
-
Programming for parallelism and locality with hierarchically tiled arrays
-
Proceedings of the 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'06
-
G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela, M. J. Garzarán, D. Padua, and C. von Praun. Programming for parallelism and locality with hierarchically tiled arrays. In PPoPP'06, pages 48-57, New York, NY, USA, 2006. ACM. (Pubitemid 44758674)
-
(2006)
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
, vol.2006
, pp. 48-57
-
-
Bikshandi, G.1
Guo, J.2
Hoeflinger, D.3
Almasi, G.4
Fraguela, B.B.5
Garzaran, M.J.6
Padua, D.7
Von Praun, C.8
-
2
-
-
84863430235
-
A case for NUMA-aware contention management on multicore systems
-
Berkeley, CA, USA, USENIX Association
-
S. Blagodurov, S. Zhuravlev, M. Dashti, and A. Fedorova. A case for NUMA-aware contention management on multicore systems. In USENIX ATC'11, Berkeley, CA, USA, 2011. USENIX Association.
-
(2011)
USENIX ATC'11
-
-
Blagodurov, S.1
Zhuravlev, S.2
Dashti, M.3
Fedorova, A.4
-
3
-
-
0346043334
-
Data distribution support on distributed shared memory multiprocessors
-
R. Chandra, D.-K. Chen, R. Cox, D. E. Maydan, N. Nedeljkovic, and J. M. Anderson. Data distribution support on distributed shared memory multiprocessors. In PLDI'97, pages 334-345, New York, NY, USA, 1997. ACM. (Pubitemid 127453709)
-
(1997)
SIGPLAN Notices (ACM Special Interest Group on Programming Languages)
, vol.32
, Issue.5
, pp. 334-345
-
-
Chandra, R.1
Chen, D.-K.2
Cox, R.3
Maydan, D.E.4
Nedeljkovic, N.5
Anderson, J.M.6
-
4
-
-
0242276320
-
Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations
-
DOI 10.1016/S0743-7315(03)00103-5
-
A. Darte, J. Mellor-Crummey, R. Fowler, and D. Chavarría-Miranda. Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations. J. Parallel Distrib. Comput., 63:887-911, September 2003. (Pubitemid 37364493)
-
(2003)
Journal of Parallel and Distributed Computing
, vol.63
, Issue.9
, pp. 887-911
-
-
Darte, A.1
Mellor-Crummey, J.2
Fowler, R.3
Chavarria-Miranda, D.4
-
5
-
-
77953995394
-
What can performance counters do for memory subsystem analysis?
-
New York, NY, USA, ACM
-
S. Eranian. What can performance counters do for memory subsystem analysis? In MSPC'08, pages 26-30, New York, NY, USA, 2008. ACM.
-
(2008)
MSPC'08
, pp. 26-30
-
-
Eranian, S.1
-
7
-
-
77954696758
-
Cache topology aware computation mapping for multicores
-
New York, NY, USA, ACM
-
M. Kandemir, T. Yemliha, S. Muralidhara, S. Srikantaiah, M. J. Irwin, and Y. Zhang. Cache topology aware computation mapping for multicores. In PLDI'10, pages 74-85, New York, NY, USA, 2010. ACM.
-
(2010)
PLDI'10
, pp. 74-85
-
-
Kandemir, M.1
Yemliha, T.2
Muralidhara, S.3
Srikantaiah, S.4
Irwin, M.J.5
Zhang, Y.6
-
8
-
-
85023182347
-
Locality and loop scheduling on NUMA multiprocessors
-
CRC Press, Inc.
-
H. Li, H. L. Sudarsan, M. Stumm, and K. C. Sevcik. Locality and loop scheduling on NUMA multiprocessors. In ICPP'93, pages 140-147. CRC Press, Inc, 1993.
-
(1993)
ICPP'93
, pp. 140-147
-
-
Li, H.1
Sudarsan, H.L.2
Stumm, M.3
Sevcik, K.C.4
-
9
-
-
79959898692
-
Memory management in NUMA multicore systems: Trapped between cache contention and interconnect overhead
-
New York, NY, USA, ACM
-
Z. Majo and T. R. Gross. Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead. In ISMM'11, pages 11-20, New York, NY, USA, 2011. ACM.
-
(2011)
ISMM'11
, pp. 11-20
-
-
Majo, Z.1
Gross, T.R.2
-
10
-
-
78650178980
-
Feedback-directed page placement for cc-NUMA via hardware-generated memory traces
-
December
-
J. Marathe, V. Thakkar, and F. Mueller. Feedback-directed page placement for cc-NUMA via hardware-generated memory traces. J. Par. Distrib. Comput., 70:1204-1219, December 2010.
-
(2010)
J. Par. Distrib. Comput.
, vol.70
, pp. 1204-1219
-
-
Marathe, J.1
Thakkar, V.2
Mueller, F.3
-
11
-
-
77952562600
-
Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms
-
C. McCurdy and J. S. Vetter. Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms. In ISPASS'10, pages 87-96, 2010.
-
(2010)
ISPASS'10
, pp. 87-96
-
-
McCurdy, C.1
Vetter, J.S.2
-
12
-
-
0033700063
-
A case for user-level dynamic page migration
-
New York, NY, USA, ACM
-
D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé. A case for user-level dynamic page migration. In ICS'00, pages 119-130, New York, NY, USA, 2000. ACM.
-
(2000)
ICS'00
, pp. 119-130
-
-
Nikolopoulos, D.S.1
Papatheodorou, T.S.2
Polychronopoulos, C.D.3
Labarta, J.4
Ayguadé, E.5
-
13
-
-
79959204964
-
Is data distribution necessary in OpenMP?
-
Washington, DC, USA, IEEE Computer Society
-
D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé. Is data distribution necessary in OpenMP? In Supercomputing'00, Washington, DC, USA, 2000. IEEE Computer Society.
-
(2000)
Supercomputing'00
-
-
Nikolopoulos, D.S.1
Papatheodorou, T.S.2
Polychronopoulos, C.D.3
Labarta, J.4
Ayguadé, E.5
-
14
-
-
34548030923
-
Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors
-
DOI 10.1145/1272996.1273004, Operating Systems Review - Proceedings of the 2007 EuroSys Conference
-
D. Tam, R. Azimi, and M. Stumm. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys'07, pages 47-58, New York, NY, USA, 2007. ACM. (Pubitemid 47281574)
-
(2007)
Operating Systems Review (ACM)
, pp. 47-58
-
-
Tam, D.1
Azimi, R.2
Stumm, M.3
-
15
-
-
0030652844
-
Automatic partitioning of data and computations on scalable shared memory multiprocessors
-
August
-
S. Tandri and T. Abdelrahman. Automatic partitioning of data and computations on scalable shared memory multiprocessors. In ICPP'97, pages 64-73, August 1997.
-
(1997)
ICPP'97
, pp. 64-73
-
-
Tandri, S.1
Abdelrahman, T.2
-
16
-
-
0028346514
-
Impact of sharing-based thread placement on multithreaded architectures
-
Los Alamitos, CA, USA, IEEE Computer Society Press
-
R. Thekkath and S. J. Eggers. Impact of sharing-based thread placement on multithreaded architectures. In ISCA'94, pages 176-186, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.
-
(1994)
ISCA'94
, pp. 176-186
-
-
Thekkath, R.1
Eggers, S.J.2
-
17
-
-
84934274832
-
Using hardware counters to automatically improve memory performance
-
Washington, DC, USA, IEEE Computer Society
-
M. M. Tikir and J. K. Hollingsworth. Using hardware counters to automatically improve memory performance. In Supercomputing'04, Washington, DC, USA, 2004. IEEE Computer Society.
-
(2004)
Supercomputing'04
-
-
Tikir, M.M.1
Hollingsworth, J.K.2
-
18
-
-
2842582520
-
Operating system support for improving data locality on CC-NUMA compute servers
-
New York, NY, USA, ACM
-
B. Verghese, S. Devine, A. Gupta, and M. Rosenblum. Operating system support for improving data locality on CC-NUMA compute servers. In ASPLOS'96, pages 279-289, New York, NY, USA, 1996. ACM.
-
(1996)
ASPLOS'96
, pp. 279-289
-
-
Verghese, B.1
Devine, S.2
Gupta, A.3
Rosenblum, M.4
-
19
-
-
77749340037
-
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
-
New York, NY, USA, ACM
-
E. Z. Zhang, Y. Jiang, and X. Shen. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? In PPoPP'10, pages 203-212, New York, NY, USA, 2010. ACM.
-
(2010)
PPoPP'10
, pp. 203-212
-
-
Zhang, E.Z.1
Jiang, Y.2
Shen, X.3