



Volume , Issue , 2007, Pages 105-115

Scheduling threads for constructive cache sharing on CMPs

Author keywords

Chip multiprocessors; Constructive cache sharing; Parallel depth first; Scheduling algorithms; Thread granularity; Work stealing; Working set profiling

Indexed keywords

CONSTRUCTIVE CACHE SHARING; MULTITHREADED PROGRAMS; PARALLEL DEPTH FIRST; THREAD GRANULARITY; WORKING SET PROFILING;

EID: 35248852476     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1248377.1248396     Document Type: Conference Paper
Times cited: 122

References (42)
  • 3
    • 38949154099
    • Parallel real-time task scheduling on multicore platforms
    • J. Anderson and J. Calandrino. Parallel real-time task scheduling on multicore platforms. In RTSS, 2006.
    • (2006) RTSS
    • Anderson, J.; Calandrino, J.
  • 5
    • 8344240379
    • Effectively sharing a cache among threads
    • G. E. Blelloch and P. B. Gibbons. Effectively sharing a cache among threads. In SPAA, 2004.
    • (2004) SPAA
    • Blelloch, G.E.; Gibbons, P.B.
  • 6
    • 0003575841
    • Provably efficient scheduling for languages with fine-grained parallelism
    • G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably efficient scheduling for languages with fine-grained parallelism. J. of the ACM, 46(2), 1999.
    • (1999) J. of the ACM, vol. 46, issue 2
    • Blelloch, G.E.; Gibbons, P.B.; Matias, Y.
  • 7
    • 0030707347
    • Space-efficient scheduling of parallelism with synchronization variables
    • G. E. Blelloch, P. B. Gibbons, Y. Matias, and G. J. Narlikar. Space-efficient scheduling of parallelism with synchronization variables. In SPAA, 1997.
    • (1997) SPAA
    • Blelloch, G.E.; Gibbons, P.B.; Matias, Y.; Narlikar, G.J.
  • 10
    • 0000269759
    • Scheduling multithreaded computations by work stealing
    • R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. of the ACM, 46(5), 1999.
    • (1999) J. of the ACM, vol. 46, issue 5
    • Blumofe, R.D.; Leiserson, C.E.
  • 11
    • 0032592096
    • Design challenges of technology scaling
    • S. Borkar. Design challenges of technology scaling. IEEE Micro, 19(4), 1999.
    • (1999) IEEE Micro, vol. 19, issue 4
    • Borkar, S.
  • 13
    • 21244474546
    • Predicting inter-thread cache contention on a chip multi-processor architecture
    • D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA, 2005.
    • (2005) HPCA
    • Chandra, D.; Guo, F.; Kim, S.; Solihin, Y.
  • 18
    • 27544432313
    • Optimizing replication, communication, and capacity allocation in CMPs
    • Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing replication, communication, and capacity allocation in CMPs. In ISCA, 2005.
    • (2005) ISCA
    • Chishti, Z.; Powell, M.D.; Vijaykumar, T.N.
  • 20
    • 33746683732
    • Maximizing CMP throughput with mediocre cores
    • J. D. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT, 2005.
    • (2005) PACT
    • Davis, J.D.; Laudon, J.; Olukotun, K.
  • 21
    • 35248879016
    • S. Eddy. HMMER: profile HMMs for protein sequence analysis, http://hmmer.wustl.edu/.
  • 22
    • 34548334096
    • Performance of multithreaded chip multiprocessors and implications for operating system design
    • A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In USENIX ATC, 2005.
    • (2005) USENIX ATC
    • Fedorova, A.; Seltzer, M.; Small, C.; Nussbaum, D.
  • 23
    • 0036949388
    • An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
    • C. Kim, D. Burger, and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In ASPLOS-X, 2002.
    • (2002) ASPLOS-X
    • Kim, C.; Burger, D.; Keckler, S.W.
  • 24
    • 10444238444
    • Fair cache sharing and partitioning in a chip multiprocessor architecture
    • S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT, 2004.
    • (2004) PACT
    • Kim, S.; Chandra, D.; Solihin, Y.
  • 26
    • 77957948824
    • Energy-aware microprocessor synchronization: Transactional memory vs. locks
    • T. Moreshet, R. I. Bahar, and M. Herlihy. Energy-aware microprocessor synchronization: Transactional memory vs. locks. In WMPI, 2006.
    • (2006) WMPI
    • Moreshet, T.; Bahar, R.I.; Herlihy, M.
  • 27
    • 4544290262
    • A parallel, multithreaded decision tree builder
    • Technical Report CMU-CS-98-184, Carnegie Mellon University
    • G. J. Narlikar. A parallel, multithreaded decision tree builder. Technical Report CMU-CS-98-184, Carnegie Mellon University, 1998.
    • (1998)
    • Narlikar, G.J.
  • 29
    • 35248822476
    • S. Parekh, S. Eggers, and H. Levy. Thread-sensitive scheduling for SMT processors. Technical report, U. Washington, 2000.
  • 31
    • 0025629433
    • Analysis of multithreaded architectures for parallel computing
    • R. H. Saavedra-Barrera, D. E. Culler, and T. von Eicken. Analysis of multithreaded architectures for parallel computing. In SPAA, 1990.
    • (1990) SPAA
    • Saavedra-Barrera, R.H.; Culler, D.E.; von Eicken, T.
  • 34
    • 0003450887
    • Cacti 3.0: An integrated cache timing, power and area model
    • Technical Report WRL 2001/2, Compaq Computer Corporation
    • P. Shivakumar and N. P. Jouppi. Cacti 3.0: An integrated cache timing, power and area model. Technical Report WRL 2001/2, Compaq Computer Corporation, 2001.
    • (2001)
    • Shivakumar, P.; Jouppi, N.P.
  • 35
    • 0034443570
    • Symbiotic job scheduling for a simultaneous multithreading processor
    • A. Snavely and D. M. Tullsen. Symbiotic job scheduling for a simultaneous multithreading processor. In ASPLOS, 2000.
    • (2000) ASPLOS
    • Snavely, A.; Tullsen, D.M.
  • 37
    • 84949769332
    • A new memory monitoring scheme for memory-aware scheduling and partitioning
    • G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA, 2002.
    • (2002) HPCA
    • Suh, G.E.; Devadas, S.; Rudolph, L.
  • 41
    • 84949817426
    • Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
    • S.-H. Yang, B. Falsafi, M. D. Powell, and T. N. Vijaykumar. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In HPCA, 2002.
    • (2002) HPCA
    • Yang, S.-H.; Falsafi, B.; Powell, M.D.; Vijaykumar, T.N.
  • 42
    • 27544495466
    • Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
    • M. Zhang and K. Asanovic. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA, 2005.
    • (2005) ISCA
    • Zhang, M.; Asanovic, K.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.