2011, Pages 155-166

DeNovo: Rethinking the memory hierarchy for disciplined parallelism

Author keywords

[No Author keywords available]

Indexed keywords

ADDITIONAL PROTOCOL; ADDRESS SPACE; CACHE ARCHITECTURE; CACHE COHERENCE; CACHE HIT RATES; COHERENCE PROTOCOL; DATA RACES; DESIGN COMPLEXITY; FLEXIBLE COMMUNICATION; HARDWARE ARCHITECTURE; HARDWARE DESIGNERS; MANY-CORE; MEMORY HIERARCHY; NETWORK TRAFFIC; NON-DETERMINISM; PARALLEL PROGRAMMING MODEL; SHARED MEMORIES; SHARED-MEMORY PROGRAMMING MODEL; SOFTWARE EVOLUTION; SOFTWARE MODEL; TRANSIENT STATE

EID: 84856527825     PISSN: 1089795X     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/PACT.2011.21     Document Type: Conference Paper
Times cited: 136

References (72)
  • [2] H. Abdel-Shafi et al. An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors. In HPCA, 1997. (EID: 0030783438)
  • [3] D. Abts et al. So Many States, So Little Time: Verifying Memory Coherence in the Cray X1. In IPDPS, 2003. (EID: 84947253476)
  • [4] S. V. Adve and H.-J. Boehm. Memory Models: A Case for Rethinking Parallel Languages and Hardware. CACM, Aug. 2010. (EID: 77955253149)
  • [5] S. V. Adve et al. A Comparison of Entry Consistency and Lazy Release Consistency. In HPCA, pages 26-37, February 1996. (EID: 0029694996)
  • [8] N. Agarwal et al. Garnet: A Detailed Interconnection Network Model inside a Full-System Simulation Framework. Technical Report CE-P08-001, Princeton University, 2008. (EID: 66749099556)
  • [9] M. D. Allen, S. Sridharan, and G. S. Sohi. Serialization Sets: A Dynamic Dependence-based Parallel Execution Model. In PPoPP, pages 85-96, 2009. (EID: 70350589478)
  • [10] Z. Anderson et al. SharC: Checking Data Sharing Strategies for Multithreaded C. In PLDI, pages 149-158, 2008. (EID: 57349105680)
  • [11] R. H. Arpaci et al. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In ISCA, pages 320-331, June 1995. (EID: 0029202432)
  • [12] A. Basu et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In MICRO, 2007. (EID: 47349112480)
  • [13] A. Baumann et al. The Multikernel: A New OS Architecture for Scalable Multicore Systems. In SOSP, 2009. (EID: 72249097688)
  • [14] E. D. Berger et al. Grace: Safe Multithreaded Programming for C/C++. In OOPSLA, pages 81-96, 2009. (EID: 70350676927)
  • [15] B. N. Bershad and M. J. Zekauskas. Midway: Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors. Technical Report TR CMU-CS-91-170, CMU, 1991. (EID: 0003456195)
  • [18] R. D. Blumofe et al. Cilk: An Efficient Multithreaded Runtime System. In PPoPP, pages 207-216, 1995. (EID: 0029191296)
  • [19] M. A. Blumrich et al. Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer. In ISCA, pages 142-153, 1994. (EID: 0028202414)
  • [20] R. Bocchino et al. Safe Nondeterminism in a Deterministic-by-Default Parallel Language. In POPL, 2011. To appear. (EID: 79961140975)
  • [21] R. L. Bocchino, Jr. et al. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, pages 97-116, 2009. (EID: 72249108375)
  • [22] Z. Budimlic et al. Multi-core Implementations of the Concurrent Collections Programming Model. In IWCPC, 2009. (EID: 84856556802)
  • [24] M. Castro et al. Efficient and Flexible Object Sharing. Technical report, IST - INESC, Portugal, July 1995. (EID: 84856534645)
  • [26] R. Chandra et al. Performance Evaluation of Hybrid Hardware and Software Distributed Shared Memory Protocols. In ICS, 1994. (EID: 84983641412)
  • [28] S. Curial et al. MPADS: Memory-Pooling-Assisted Data Splitting. In ISMM, pages 101-110, 2008. (EID: 57349151501)
  • [29] D. L. Dill et al. Protocol Verification as a Hardware Design Aid. In ICCD '92, pages 522-525, Washington, DC, USA, 1992. IEEE Computer Society. (EID: 0001801746)
  • [30] M. Dubois et al. Delayed Consistency and its Effects on the Miss Rate of Parallel Programs. In SC, pages 197-206, 1991. (EID: 0026299679)
  • [31] C. Fensch and M. Cintra. An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs. In HPCA, 2008. (EID: 57749194890)
  • [32] K. Gharachorloo et al. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, pages 15-26, May 1990. (EID: 0025433762)
  • [33] A. Ghuloum et al. Ct: A Flexible Parallel Programming Model for Tera-Scale Architectures. Intel White Paper, 2007. (EID: 79251578911)
  • [36] D. Hackenberg et al. Comparing Cache Architectures and Coherency Protocols on x86-64 Multicore SMP Systems. In MICRO, pages 413-422. IEEE, 2009. (EID: 76749126627)
  • [37] N. Hardavellas et al. Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. In ISCA, pages 184-195, 2009. (EID: 70350601187)
  • [38] K. Hayashi et al. AP1000+: Architectural Support of PUT/GET Interface for Parallelizing Compiler. In ASPLOS, pages 196-207, 1994. (EID: 84976814565)
  • [39] J. Heinlein et al. Coherent Block Data Transfer in the FLASH Multiprocessor. In ISPP, pages 18-27, 1997. (EID: 0030646290)
  • [41] J. Howard et al. A 48-core IA-32 Message-Passing Processor with DVFS in 45nm CMOS. In ISSCC, pages 108-109, 2010. (EID: 77952123736)
  • [43] Intel. The SCC Platform Overview. http://techresearch.intel.com/spaw2/ uploads/files/SCC-Platform Overview.pdf. (EID: 79960488261)
  • [44] T. E. Jeremiassen and S. J. Eggers. Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations. In PPoPP, pages 179-188, 1995. (EID: 0029192199)
  • [45] S. Kaxiras and G. Keramidas. SARC Coherence: Scaling Directory Cache Coherence in Performance and Power. IEEE Micro, 30(5):54-65, Sept.-Oct. 2010. (EID: 78649527148)
  • [47] J. H. Kelm et al. Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. In ISCA, 2009. (EID: 70450237431)
  • [48] D. A. Koufaty et al. Data Forwarding in Scalable Shared-Memory Multiprocessors. In SC, pages 255-264, 1995. (EID: 0029180738)
  • [50] A. Kumar et al. Efficient and Scalable Cache Coherence Schemes for Shared Memory Hypercube Multiprocessors. In SC, New York, NY, USA, 1994. ACM. (EID: 84856518284)
  • [51] A. R. Lebeck and D. A. Wood. Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors. In ISCA, pages 48-59, June 1995. (EID: 0029202473)
  • [52] E. A. Lee. The Problem with Threads. IEEE Computer, 39(5):33-42, May 2006. DOI: 10.1109/MC.2006.180 (EID: 33646892173)
  • [53] B. Lucia et al. Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races. In ISCA, 2010. (EID: 77955008711)
  • [54] M. M. Martin et al. Token Coherence: Decoupling Performance and Correctness. In ISCA, 2003. (EID: 0038346234)
  • [55] M. M. Martin et al. Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors. In ISCA, 2003. (EID: 0038684776)
  • [56] M. M. K. Martin et al. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. SIGARCH Computer Architecture News, 33(4):92-99, 2005. (EID: 33748870886)
  • [58] S. L. Min and J.-L. Baer. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps. IEEE Trans. on Parallel and Distributed Systems, 3(2):25-44, January 1992. (EID: 0026712573)
  • [59] A. Moshovos. RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence. In ISCA, 2005. (EID: 27544455733)
  • [60] A. Nanda and L. Bhuyan. A Formal Specification and Verification Technique for Cache Coherence Protocols. In ICPP, pages I22-I26, 1992. (EID: 84856542182)
  • [61] M. Olszewski et al. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, pages 97-108, 2009. (EID: 67650834931)
  • [62] S. H. Pugsley et al. SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches. In PACT, 2010. (EID: 78149276281)
  • [63] A. Raghavan et al. Token Tenure: PATCHing Token Counting using Directory-Based Cache Coherence. In MICRO, 2008. (EID: 66749116576)
  • [66] K. Strauss et al. Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors. In ISCA, pages 327-338, 2006. DOI: 10.1109/ISCA.2006.21 (EID: 33845886092)
  • [67] D. Vantrease et al. Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. In HPCA, 2011. (EID: 79955923055)
  • [69] S. C. Woo et al. The SPLASH-2 Programs: Characterization and Methodological Considerations. In ISCA, 1995. (EID: 0029179077)
  • [70] D. A. Wood et al. Verifying a Multiprocessor Cache Controller Using Random Case Generation. IEEE DToC, 7(4), 1990. (EID: 0025470393)
  • [71] J. Zebchuk et al. A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy. In MICRO, pages 314-327, 2007. (EID: 47349115313)
  • [72] J. Zebchuk et al. A Tagless Coherence Directory. In MICRO, 2009. (EID: 76749145126)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.