



2010, Pages 441-450

Data marshaling for multi-core architectures

Author keywords

CMP; Critical sections; Pipelining; Staged execution

Indexed keywords

CACHE MISS; CMP; CRITICAL SECTIONS; EXECUTION MODEL; HETEROGENEOUS MULTICORE; MULTICORE ARCHITECTURES; PERFORMANCE BENEFITS; STORAGE OVERHEAD

EID: 77954973999     PISSN: 10636897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1815961.1816020     Document Type: Conference Paper
Times cited: 33

References (46)
  • 4
    • 27544493676
    • Mitigating Amdahl's law through EPI throttling
    • M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl's law through EPI throttling. In ISCA-32, 2005.
    • (2005) ISCA-32
    • Annavaram, M.; Grochowski, E.; Shen, J.
  • 6
    • 0003605996
    • NAS parallel benchmarks
    • D. H. Bailey et al. NAS parallel benchmarks. Technical Report RNR-94-1007, NASA Ames Research Center, 1994.
    • (1994) Technical Report RNR-94-1007, NASA Ames Research Center
    • Bailey, D.H.
  • 7
    • 70450245578
    • Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors
    • A. Bhattacharjee and M. Martonosi. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. In ISCA, 2009.
    • (2009) ISCA
    • Bhattacharjee, A.; Martonosi, M.
  • 8
    • 63549095070
    • The PARSEC benchmark suite: Characterization and architectural implications
    • C. Bienia et al. The PARSEC benchmark suite: Characterization and architectural implications. In PACT, 2008.
    • (2008) PACT
    • Bienia, C.
  • 9
    • 84976783312
    • Implementing remote procedure calls
    • A. D. Birrell and B. J. Nelson. Implementing remote procedure calls. ACM TOCS, 2(1):39-59, 1984.
    • (1984) ACM TOCS, vol. 2, no. 1, pp. 39-59
    • Birrell, A.D.; Nelson, B.J.
  • 10
    • 0029191296
    • Cilk: An efficient multithreaded runtime system
    • R. D. Blumofe et al. Cilk: an efficient multithreaded runtime system. In PPoPP, 1995.
    • (1995) PPoPP
    • Blumofe, R.D.
  • 11
    • 84872973735
    • Reinventing scheduling for multicore systems
    • S. Boyd-Wickizer et al. Reinventing scheduling for multicore systems. In HotOS-XII, 2009.
    • (2009) HotOS-XII
    • Boyd-Wickizer, S.
  • 12
    • 57549118941
    • The shared-thread multiprocessor
    • J. A. Brown and D. M. Tullsen. The shared-thread multiprocessor. In ICS, 2008.
    • (2008) ICS
    • Brown, J.A.; Tullsen, D.M.
  • 13
    • 34547473118
    • Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly
    • K. Chakraborty, P. M. Wells, and G. S. Sohi. Computation spreading: Employing hardware migration to specialize CMP cores on-the-fly. In ASPLOS-XII, 2006.
    • (2006) ASPLOS-XII
    • Chakraborty, K.; Wells, P.M.; Sohi, G.S.
  • 14
    • 0036949391
    • A stateless, content-directed data prefetching mechanism
    • R. Cooksey et al. A stateless, content-directed data prefetching mechanism. In ASPLOS, 2002.
    • (2002) ASPLOS
    • Cooksey, R.
  • 15
    • 33646144623
    • The OpenMP source code repository
    • A. J. Dorta et al. The OpenMP source code repository. In Euromicro, 2005.
    • (2005) Euromicro
    • Dorta, A.J.
  • 16
    • 64949179220
    • Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
    • E. Ebrahimi et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. HPCA, 2009.
    • (2009) HPCA
    • Ebrahimi, E.
  • 17
    • 34547423880
    • Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
    • M. Gordon et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
    • (2006) ASPLOS
    • Gordon, M.
  • 19
    • 48249118853
    • Amdahl's law in the multicore era
    • M. Hill and M. Marty. Amdahl's law in the multicore era. IEEE Computer, 41(7), 2008.
    • (2008) IEEE Computer, vol. 41, no. 7
    • Hill, M.; Marty, M.
  • 20
    • 70449669476
    • DDCache: Decoupled and delegable cache data and metadata
    • H. Hossain et al. DDCache: Decoupled and delegable cache data and metadata. In PACT, 2009.
    • (2009) PACT
    • Hossain, H.
  • 23
    • 0030677583
    • Prefetching using Markov predictors
    • D. Joseph and D. Grunwald. Prefetching using Markov predictors. In ISCA, 1997.
    • (1997) ISCA
    • Joseph, D.; Grunwald, D.
  • 24
    • 0025429331
    • Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
    • N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA-17, 1990.
    • (1990) ISCA-17
    • Jouppi, N.P.
  • 25
    • 0036949388
    • An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
    • C. Kim et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In ASPLOS, 2002.
    • (2002) ASPLOS
    • Kim, C.
  • 27
    • 85084163748
    • Using cohort scheduling to enhance server performance
    • J. R. Larus and M. Parkes. Using cohort scheduling to enhance server performance. In USENIX, 2002.
    • (2002) USENIX
    • Larus, J.R.; Parkes, M.
  • 28
    • 0030685588
    • The SGI Origin: A ccNUMA highly scalable server
    • J. Laudon and D. Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. In ISCA, 1997.
    • (1997) ISCA
    • Laudon, J.; Lenoski, D.
  • 29
    • 0033705677
    • Push vs. pull: Data movement for linked data structures
    • C.-L. Yang and A. R. Lebeck. Push vs. pull: Data movement for linked data structures. In ICS, 2000.
    • (2000) ICS
    • Yang, C.-L.; Lebeck, A.R.
  • 31
    • 33947328378
    • Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors
    • T. Morad et al. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. Comp Arch Letters, 2006.
    • (2006) Comp Arch Letters
    • Morad, T.
  • 32
    • 47349098275
    • MineBench: A benchmark suite for data mining workloads
    • R. Narayanan et al. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.
    • (2006) IISWC
    • Narayanan, R.
  • 33
    • 77954988455
    • CUDA SDK code samples
    • NVIDIA Corporation. CUDA SDK code samples, 2009.
    • (2009) NVIDIA Corporation
  • 34
    • 64949187933
    • Adaptive spill-receive for robust high-performance caching in CMPs
    • M. K. Qureshi. Adaptive spill-receive for robust high-performance caching in CMPs. HPCA, 2009.
    • (2009) HPCA
    • Qureshi, M.K.
  • 35
    • 70450253535
    • Thread motion: Fine-grained power management for multi-core systems
    • K. K. Rangan et al. Thread motion: Fine-grained power management for multi-core systems. In ISCA, 2009.
    • (2009) ISCA
    • Rangan, K.K.
  • 36
    • 0030672607
    • The interaction of software prefetching with ILP processors in shared-memory systems
    • P. Ranganathan et al. The interaction of software prefetching with ILP processors in shared-memory systems. ISCA, 1997.
    • (1997) ISCA
    • Ranganathan, P.
  • 37
    • 70450279104
    • Spatio-temporal memory streaming
    • S. Somogyi et al. Spatio-temporal memory streaming. ISCA, 2009.
    • (2009) ISCA
    • Somogyi, S.
  • 38
    • 77952284721
    • Fast switching of threads between cores
    • R. Strong et al. Fast switching of threads between cores. SIGOPS Oper. Syst. Rev., 43(2), 2009.
    • (2009) SIGOPS Oper. Syst. Rev., vol. 43, no. 2
    • Strong, R.
  • 41
    • 67650033098
    • Accelerating critical section execution with asymmetric multi-core architectures
    • M. A. Suleman et al. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS, 2009.
    • (2009) ASPLOS
    • Suleman, M.A.
  • 44
    • 84957872108
    • The impact of speeding up critical sections with data prefetching and forwarding
    • P. Trancoso and J. Torrellas. The impact of speeding up critical sections with data prefetching and forwarding. In ICPP, 1996.
    • (1996) ICPP
    • Trancoso, P.; Torrellas, J.
  • 46
    • 27544495466
    • Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
    • M. Zhang and K. Asanovic. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA, 2005.
    • (2005) ISCA
    • Zhang, M.; Asanovic, K.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.