[2] H. Abdel-Shafi et al. An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors. In HPCA, 1997.
[3] D. Abts et al. So Many States, So Little Time: Verifying Memory Coherence in the Cray X1. In IPDPS, 2003.
[4] S. V. Adve and H.-J. Boehm. Memory Models: A Case for Rethinking Parallel Languages and Hardware. CACM, August 2010.
[5] S. V. Adve et al. A Comparison of Entry Consistency and Lazy Release Consistency. In HPCA, pages 26-37, February 1996.
[8] N. Agarwal et al. Garnet: A Detailed Interconnection Network Model inside a Full-System Simulation Framework. Technical Report CE-P08-001, Princeton University, 2008.
[9] M. D. Allen, S. Sridharan, and G. S. Sohi. Serialization Sets: A Dynamic Dependence-Based Parallel Execution Model. In PPoPP, pages 85-96, 2009.
[10] Z. Anderson et al. SharC: Checking Data Sharing Strategies for Multithreaded C. In PLDI, pages 149-158, 2008.
[11] R. H. Arpaci et al. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In ISCA, pages 320-331, June 1995.
[12] A. Basu et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In MICRO, 2007.
[13] A. Baumann et al. The Multikernel: A New OS Architecture for Scalable Multicore Systems. In SOSP, 2009.
[14] E. D. Berger et al. Grace: Safe Multithreaded Programming for C/C++. In OOPSLA, pages 81-96, 2009.
[15] B. N. Bershad and M. J. Zekauskas. Midway: Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors. Technical Report CMU-CS-91-170, Carnegie Mellon University, 1991.
[18] R. D. Blumofe et al. Cilk: An Efficient Multithreaded Runtime System. In PPoPP, pages 207-216, 1995.
[19] M. A. Blumrich et al. Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer. In ISCA, pages 142-153, 1994.
[20] R. Bocchino et al. Safe Nondeterminism in a Deterministic-by-Default Parallel Language. In POPL, 2011. To appear.
[21] R. L. Bocchino, Jr. et al. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, pages 97-116, 2009.
[22] Z. Budimlic et al. Multi-Core Implementations of the Concurrent Collections Programming Model. In IWCPC, 2009.
[24] M. Castro et al. Efficient and Flexible Object Sharing. Technical Report, IST-INESC, Portugal, July 1995.
[25] K. Chakraborty et al. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-Fly. In ASPLOS-XII, pages 283-292, 2006.
[26] R. Chandra et al. Performance Evaluation of Hybrid Hardware and Software Distributed Shared Memory Protocols. In ICS, 1994.
[28] S. Curial et al. MPADS: Memory-Pooling-Assisted Data Splitting. In ISMM, pages 101-110, 2008.
[29] D. L. Dill et al. Protocol Verification as a Hardware Design Aid. In ICCD, pages 522-525, 1992.
[30] M. Dubois et al. Delayed Consistency and Its Effects on the Miss Rate of Parallel Programs. In SC, pages 197-206, 1991.
[31] C. Fensch and M. Cintra. An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs. In HPCA, 2008.
[32] K. Gharachorloo et al. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, pages 15-26, May 1990.
[33] A. Ghuloum et al. Ct: A Flexible Parallel Programming Model for Tera-Scale Architectures. Intel White Paper, 2007.
[36] D. Hackenberg et al. Comparing Cache Architectures and Coherency Protocols on x86-64 Multicore SMP Systems. In MICRO, pages 413-422, 2009.
[37] N. Hardavellas et al. Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. In ISCA, pages 184-195, 2009.
[38] K. Hayashi et al. AP1000+: Architectural Support of PUT/GET Interface for Parallelizing Compiler. In ASPLOS, pages 196-207, 1994.
[39] J. Heinlein et al. Coherent Block Data Transfer in the FLASH Multiprocessor. In ISPP, pages 18-27, 1997.
[41] J. Howard et al. A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS. In ISSCC, pages 108-109, 2010.
[43] Intel. The SCC Platform Overview. http://techresearch.intel.com/spaw2/ uploads/files/SCC-Platform Overview.pdf.
[44] T. E. Jeremiassen and S. J. Eggers. Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations. In PPoPP, pages 179-188, 1995.
[45] S. Kaxiras and G. Keramidas. SARC Coherence: Scaling Directory Cache Coherence in Performance and Power. IEEE Micro, 30(5):54-65, September-October 2010.
[47] J. H. Kelm et al. Rigel: An Architecture and Scalable Programming Interface for a 1000-Core Accelerator. In ISCA, 2009.
[48] D. A. Koufaty et al. Data Forwarding in Scalable Shared-Memory Multiprocessors. In SC, pages 255-264, 1995.
[49] M. Kulkarni et al. Optimistic Parallelism Requires Abstractions. In PLDI, pages 211-222, 2007. DOI 10.1145/1250734.1250759.
[50] A. Kumar et al. Efficient and Scalable Cache Coherence Schemes for Shared Memory Hypercube Multiprocessors. In SC, 1994.
[51] A. R. Lebeck and D. A. Wood. Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors. In ISCA, pages 48-59, June 1995.
[52] E. A. Lee. The Problem with Threads. IEEE Computer, 39(5):33-42, May 2006. DOI 10.1109/MC.2006.180.
[53] B. Lucia et al. Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races. In ISCA, 2010.
[54] M. M. Martin et al. Token Coherence: Decoupling Performance and Correctness. In ISCA, 2003.
[55] M. M. Martin et al. Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors. In ISCA, 2003.
[56] M. M. K. Martin et al. Multifacet's General Execution-Driven Multiprocessor Simulator (GEMS) Toolset. SIGARCH Computer Architecture News, 33(4):92-99, 2005.
[57] M. R. Marty et al. Improving Multiple-CMP Systems Using Token Coherence. In HPCA, pages 328-339, 2005.
[58] S. L. Min and J.-L. Baer. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps. IEEE Transactions on Parallel and Distributed Systems, 3(2):25-44, January 1992.
[59] A. Moshovos. RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence. In ISCA, 2005.
[60] A. Nanda and L. Bhuyan. A Formal Specification and Verification Technique for Cache Coherence Protocols. In ICPP, pages I22-I26, 1992.
[61] M. Olszewski et al. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, pages 97-108, 2009.
[62] S. H. Pugsley et al. SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches. In PACT, 2010.
[63] A. Raghavan et al. Token Tenure: PATCHing Token Counting Using Directory-Based Cache Coherence. In MICRO, 2008.
[64] S. Somogyi et al. Spatial Memory Streaming. In ISCA, pages 252-263, 2006. DOI 10.1109/ISCA.2006.38.
[65] D. J. Sorin et al. Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol. IEEE Transactions on Parallel and Distributed Systems, 13(6):556-578, 2002. DOI 10.1109/TPDS.2002.1011412.
[66] K. Strauss et al. Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors. In ISCA, pages 327-338, 2006. DOI 10.1109/ISCA.2006.21.
[67] D. Vantrease et al. Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. In HPCA, 2011.
[68] T. F. Wenisch et al. Temporal Streaming of Shared Memory. In ISCA, pages 222-233, 2005.
[69] S. C. Woo et al. The SPLASH-2 Programs: Characterization and Methodological Considerations. In ISCA, 1995.
[70] D. A. Wood et al. Verifying a Multiprocessor Cache Controller Using Random Case Generation. IEEE Design & Test of Computers, 7(4), 1990.
[71] J. Zebchuk et al. A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy. In MICRO, pages 314-327, 2007.
[72] J. Zebchuk et al. A Tagless Coherence Directory. In MICRO, 2009.