SCOPUS Information Search Platform

Annual ACM Symposium on Parallelism in Algorithms and Architectures

Volume , Issue , 2007, Pages 105-115

Scheduling threads for constructive cache sharing on CMPs

(11) Chen, Shimin (b); Gibbons, Phillip B (b); Kozuch, Michael (b); Liaskovitis, Vasileios (a); Ailamaki, Anastassia (a); Blelloch, Guy E (a); Falsafi, Babak (a); Fix, Limor (b); Hardavellas, Nikos (a); Mowry, Todd C (a,b); Wilkerson, Chris (b)

a CARNEGIE MELLON UNIVERSITY (United States)

b INTEL CORPORATION (United States)

Author keywords

Chip multiprocessors; Constructive cache sharing; Parallel depth first; Scheduling algorithms; Thread granularity; Work stealing; Working set profiling

Indexed keywords

CONSTRUCTIVE CACHE SHARING; MULTITHREADED PROGRAMS; PARALLEL DEPTH FIRST; THREAD GRANULARITY; WORKING SET PROFILING;

BENCHMARKING; CACHE MEMORY; PARALLEL PROGRAMMING; SCHEDULING ALGORITHMS;

MICROPROCESSOR CHIPS;

EID: 35248852476 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1248377.1248396 Document Type: Conference Paper

Times cited : (122)

References (42)

1
- 0036590708
- The data locality of work stealing
- U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002.
- (2002) Theory of Computing Systems , vol.35 , Issue.3
- Acar, U.A.¹ Blelloch, G.E.² Blumofe, R.D.³

2
- 0024656760
- An analytical cache model
- A. Agarwal, M. Horowitz, and J. L. Hennessy. An analytical cache model. ACM Trans. on Computer Systems, 7(2), 1989.
- (1989) ACM Trans. on Computer Systems , vol.7 , Issue.2
- Agarwal, A.¹ Horowitz, M.² Hennessy, J.L.³

3
- 38949154099
- Parallel real-time task scheduling on multicore platforms
- J. Anderson and J. Calandrino. Parallel real-time task scheduling on multicore platforms. In RTSS, 2006.
- (2006) RTSS
- Anderson, J.¹ Calandrino, J.²

4
- 0142134997
- A dynamically tunable memory hierarchy
- R. Balasubramonian, D. H. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. A dynamically tunable memory hierarchy. IEEE Trans. on Computers, 52(10), 2003.
- (2003) IEEE Trans. on Computers , vol.52 , Issue.10
- Balasubramonian, R.¹ Albonesi, D.H.² Buyuktosunoglu, A.³ Dwarkadas, S.⁴

5
- 8344240379
- Effectively sharing a cache among threads
- G. E. Blelloch and P. B. Gibbons. Effectively sharing a cache among threads. In SPAA, 2004.
- (2004) SPAA
- Blelloch, G.E.¹ Gibbons, P.B.²

6
- 0003575841
- Provably efficient scheduling for languages with fine-grained parallelism
- G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably efficient scheduling for languages with fine-grained parallelism. J. of the ACM, 46(2), 1999.
- (1999) J. of the ACM , vol.46 , Issue.2
- Blelloch, G.E.¹ Gibbons, P.B.² Matias, Y.³

7
- 0030707347
- Space-efficient scheduling of parallelism with synchronization variables
- G. E. Blelloch, P. B. Gibbons, Y. Matias, and G. J. Narlikar. Space-efficient scheduling of parallelism with synchronization variables. In SPAA, 1997.
- (1997) SPAA
- Blelloch, G.E.¹ Gibbons, P.B.² Matias, Y.³ Narlikar, G.J.⁴

8
- 0030387154
- An analysis of dag-consistent distributed shared-memory algorithms
- R. D. Blumofe, M. Frigo, C. F. Joerg, C. E. Leiserson, and K. H. Randall. An analysis of dag-consistent distributed shared-memory algorithms. In SPAA, 1996.
- (1996) SPAA
- Blumofe, R.D.¹ Frigo, M.² Joerg, C.F.³ Leiserson, C.E.⁴ Randall, K.H.⁵

9
- 0029191296
- CILK: An efficient multithreaded runtime system
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. CILK: An efficient multithreaded runtime system. In PPoPP, 1995.
- (1995) PPoPP
- Blumofe, R.D.¹ Joerg, C.F.² Kuszmaul, B.C.³ Leiserson, C.E.⁴ Randall, K.H.⁵ Zhou, Y.⁶

10
- 0000269759
- Scheduling multithreaded computations by work stealing
- R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. of the ACM, 46(5), 1999.
- (1999) J. of the ACM , vol.46 , Issue.5
- Blumofe, R.D.¹ Leiserson, C.E.²

11
- 0032592096
- Design challenges of technology scaling
- S. Borkar. Design challenges of technology scaling. IEEE Micro, 19(4), 1999.
- (1999) IEEE Micro , vol.19 , Issue.4
- Borkar, S.¹

12
- 0034312472
- A multithreaded PowerPC processor for commercial servers
- J. M. Borkenhagen, R. J. Eickemeyer, R. N. Kalla, and S. R. Kunkel. A multithreaded PowerPC processor for commercial servers. IBM JRD, 44(6), 2000.
- (2000) IBM JRD , vol.44 , Issue.6
- Borkenhagen, J.M.¹ Eickemeyer, R.J.² Kalla, R.N.³ Kunkel, S.R.⁴

13
- 21244474546
- Predicting inter-thread cache contention on a chip multi-processor architecture
- D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA, 2005.
- (2005) HPCA
- Chandra, D.¹ Guo, F.² Kim, S.³ Solihin, Y.⁴

14
- 67649118314
- Electrical and optical on-chip interconnects in scaled microprocessors
- G. Chen, H. Chen, M. Haurylau, N. Nelson, D. Albonesi, P. M. Fauchet, and E. G. Friedman. Electrical and optical on-chip interconnects in scaled microprocessors. In International Symp. on Circuits and Systems, 2005.
- (2005) International Symp. on Circuits and Systems
- Chen, G.¹ Chen, H.² Haurylau, M.³ Nelson, N.⁴ Albonesi, D.⁵ Fauchet, P.M.⁶ Friedman, E.G.⁷

15
- 33745602003
- Inspector joins
- S. Chen, A. Ailamaki, P. B. Gibbons, and T. C. Mowry. Inspector joins. In VLDB, 2005.
- (2005) VLDB
- Chen, S.¹ Ailamaki, A.² Gibbons, P.B.³ Mowry, T.C.⁴

16
- 84989298447
- Scheduling threads for constructive cache sharing on CMPs
- Technical Report IRP-TR-07-01, Intel Research Pittsburgh
- S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. Technical Report IRP-TR-07-01, Intel Research Pittsburgh, 2007.
- (2007)
- Chen, S.¹ Gibbons, P.B.² Kozuch, M.³ Liaskovitis, V.⁴ Ailamaki, A.⁵ Blelloch, G.E.⁶ Falsafi, B.⁷ Fix, L.⁸ Hardavellas, N.⁹ Mowry, T.C.¹⁰ Wilkerson, C.¹¹

17
- 0032095557
- Performance of shared caches on multithreaded architectures
- Y.-Y. Chen, J.-K. Peir, and C.-T. King. Performance of shared caches on multithreaded architectures. J. of Information Science and Engineering, 14(2), 1998.
- (1998) J. of Information Science and Engineering , vol.14 , Issue.2
- Chen, Y.-Y.¹ Peir, J.-K.² King, C.-T.³

18
- 27544432313
- Optimizing replication, communication, and capacity allocation in CMPs
- Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing replication, communication, and capacity allocation in CMPs. In ISCA, 2005.
- (2005) ISCA
- Chishti, Z.¹ Powell, M.D.² Vijaykumar, T.N.³

19
- 2442653868
- Design and implementation of the POWER5 microprocessor
- J. Clabes, J. Friedrich, M. Sweet, and J. Dilullo. Design and implementation of the POWER5 microprocessor. In International Solid State Circuits Conf., 2004.
- (2004) International Solid State Circuits Conf
- Clabes, J.¹ Friedrich, J.² Sweet, M.³ Dilullo, J.⁴

20
- 33746683732
- Maximizing CMP throughput with mediocre cores
- J. D. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT, 2005.
- (2005) PACT
- Davis, J.D.¹ Laudon, J.² Olukotun, K.³

21
- 35248879016
- S. Eddy. HMMER: profile HMMs for protein sequence analysis, http://hmmer.wustl.edu/.

22
- 34548334096
- Performance of multithreaded chip multiprocessors and implications for operating system design
- A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In USENIX ATC, 2005.
- (2005) USENIX ATC
- Fedorova, A.¹ Seltzer, M.² Small, C.³ Nussbaum, D.⁴

23
- 0036949388
- An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
- C. Kim, D. Burger, and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In ASPLOS-X, 2002.
- (2002) ASPLOS-X
- Kim, C.¹ Burger, D.² Keckler, S.W.³

24
- 10444238444
- Fair cache sharing and partitioning in a chip multiprocessor architecture
- S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT, 2004.
- (2004) PACT
- Kim, S.¹ Chandra, D.² Solihin, Y.³

25
- 0033688597
- Smart memories: A modular reconfigurable architecture
- K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, and M. Horowitz. Smart memories: a modular reconfigurable architecture. In ISCA, 2000.
- (2000) ISCA
- Mai, K.¹ Paaske, T.² Jayasena, N.³ Ho, R.⁴ Dally, W.J.⁵ Horowitz, M.⁶

26
- 77957948824
- Energy-aware microprocessor synchronization: Transactional memory vs. locks
- T. Moreshet, R. I. Bahar, and M. Herlihy. Energy-aware microprocessor synchronization: Transactional memory vs. locks. In WMPI, 2006.
- (2006) WMPI
- Moreshet, T.¹ Bahar, R.I.² Herlihy, M.³

27
- 4544290262
- A parallel, multithreaded decision tree builder
- Technical Report CMU-CS-98-184, Carnegie Mellon University
- G. J. Narlikar. A parallel, multithreaded decision tree builder. Technical Report CMU-CS-98-184, Carnegie Mellon University, 1998.
- (1998)
- Narlikar, G.J.¹

28
- 0040362680
- Space-efficient scheduling of nested parallelism
- G. J. Narlikar and G. E. Blelloch. Space-efficient scheduling of nested parallelism. ACM Trans. on Programming Languages and Systems, 21(1), 1999.
- (1999) ACM Trans. on Programming Languages and Systems , vol.21 , Issue.1
- Narlikar, G.J.¹ Blelloch, G.E.²

29
- 35248822476
- S. Parekh, S. Eggers, and H. Levy. Thread-sensitive scheduling for SMT processors. Technical report, U. Washington, 2000.

30
- 35248819959
- Thread scheduling for cache locality
- J. Philbin, J. Edler, O. J. Anshus, C. C. Douglas, and K. Li. Thread scheduling for cache locality. In ASPLOS, 1996.
- (1996) ASPLOS
- Philbin, J.¹ Edler, J.² Anshus, O.J.³ Douglas, C.C.⁴ Li, K.⁵

31
- 0025629433
- Analysis of multithreaded architectures for parallel computing
- R. H. Saavedra-Barrera, D. E. Culler, and T. von Eicken. Analysis of multithreaded architectures for parallel computing. In SPAA, 1990.
- (1990) SPAA
- Saavedra-Barrera, R.H.¹ Culler, D.E.² von Eicken, T.³

32
- 0004245602
- Semiconductor Industry Association, ITRS 2005 Edition
- Semiconductor Industry Association. The International Technology Roadmap for Semiconductors (ITRS) 2005 Edition, 2005.
- (2005) The International Technology Roadmap for Semiconductors

33
- 84940605935
- Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator
- J. R. Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In Applied Computational Geometry: Towards Geometric Engineering, vol. 1148, 1996.
- (1996) Applied Computational Geometry: Towards Geometric Engineering , vol.1148
- Shewchuk, J.R.¹

34
- 0003450887
- Cacti 3.0: An integrated cache timing, power and area model
- Technical Report WRL 2001/2, Compaq Computer Corporation
- P. Shivakumar and N. P. Jouppi. Cacti 3.0: An integrated cache timing, power and area model. Technical Report WRL 2001/2, Compaq Computer Corporation, 2001.
- (2001)
- Shivakumar, P.¹ Jouppi, N.P.²

35
- 0034443570
- Symbiotic job scheduling for a simultaneous multithreading processor
- A. Snavely and D. M. Tullsen. Symbiotic job scheduling for a simultaneous multithreading processor. In ASPLOS, 2000.
- (2000) ASPLOS
- Snavely, A.¹ Tullsen, D.M.²

36
- 0034826142
- Analytical cache models with application to cache partitioning
- G. E. Suh, S. Devadas, and L. Rudolph. Analytical cache models with application to cache partitioning. In International Conf. on Supercomputing, 2001.
- (2001) International Conf. on Supercomputing
- Suh, G.E.¹ Devadas, S.² Rudolph, L.³

37
- 84949769332
- A new memory monitoring scheme for memory-aware scheduling and partitioning
- G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA, 2002.
- (2002) HPCA
- Suh, G.E.¹ Devadas, S.² Rudolph, L.³

38
- 1642371317
- Dynamic partitioning of shared cache memory
- G. E. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. J. of Supercomputing, 28(1), 2004.
- (2004) J. of Supercomputing , vol.28 , Issue.1
- Suh, G.E.¹ Rudolph, L.² Devadas, S.³

39
- 0023456387
- Footprints in the cache
- D. Thiebaut and H. S. Stone. Footprints in the cache. ACM Trans. on Computer Systems, 5(4), 1987.
- (1987) ACM Trans. on Computer Systems , vol.5 , Issue.4
- Thiebaut, D.¹ Stone, H.S.²

40
- 84858355384
- M. W. Weissmann. Libpmsort. http://freshmeat.net/projects/libpmsort.
- Libpmsort
- Weissmann, M.W.¹

41
- 84949817426
- Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
- S.-H. Yang, B. Falsafi, M. D. Powell, and T. N. Vijaykumar. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In HPCA, 2002.
- (2002) HPCA
- Yang, S.-H.¹ Falsafi, B.² Powell, M.D.³ Vijaykumar, T.N.⁴

42
- 27544495466
- Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
- M. Zhang and K. Asanovic. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In ISCA, 2005.
- (2005) ISCA
- Zhang, M.¹ Asanovic, K.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.