SCOPUS Information Search Platform

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010

Volume , Issue , 2010, Pages

Exploiting inter-thread temporal locality for chip multithreading

(3) Meng, Jiayuan (a); Sheaffer, Jeremy W. (a); Skadron, Kevin (a)

(a) University of Virginia (United States)

Author keywords

Chip multithreading; Data locality; Data parallelism; Fine grained parallelism; Task scheduling

Indexed keywords

CHIP MULTITHREADING; DATA LOCALITY; DATA PARALLELISM; FINE-GRAINED PARALLELISM; TASK-SCHEDULING;

CELLULAR ARRAYS; DISTRIBUTED PARAMETER NETWORKS; ENERGY CONSERVATION; MULTITASKING; SCHEDULING ALGORITHMS;

MICROPROCESSOR CHIPS;

EID: 77954020709 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2010.5470465 Document Type: Conference Paper

Times cited : (20)

References (57)

1
- 77954019149
- LEON2 Processor
- LEON2 Processor. http://vlsicad.eecs.umich.edu/BK/Slots/cache/www.gaisler.com/products/leon2/leon.html.

2
- 77952209580
- NVIDIA's next generation CUDA compute architecture: Fermi
- NVIDIA's next generation CUDA compute architecture: Fermi. NVIDIA Corporation, 2009.
- (2009) NVIDIA Corporation

3
- 33751205298
- Cactus grid computing: Review of current development
- London, UK. Springer-Verlag
- G. Allen, W. Benger, T. Dramlitsch, T. Goodale, H.-C. Hege, G. Lanfermann, A. Merzky, T. Radke, and E. Seidel. Cactus grid computing: Review of current development. In Euro-Par '01, pages 817-824, London, UK, 2001. Springer-Verlag.
- (2001) Euro-Par '01 , pp. 817-824
- Allen, G.¹ Benger, W.² Dramlitsch, T.³ Goodale, T.⁴ Hege, H.-C.⁵ Lanfermann, G.⁶ Merzky, A.⁷ Radke, T.⁸ Seidel, E.⁹

4
- 35648995516
- Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 18
- K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 18, 2006.
- (2006) The Landscape of Parallel Computing Research: A View from Berkeley
- Asanovic, K.¹ Bodik, R.² Catanzaro, B.C.³ Gebis, J.J.⁴ Husbands, P.⁵ Keutzer, K.⁶ Patterson, D.A.⁷ Plishker, W.L.⁸ Shalf, J.⁹ Williams, S.W.¹⁰ Yelick, K.A.¹¹

5
- 33846535493
- The M5 simulator: Modeling networked systems
- N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26(4), 2006.
- (2006) IEEE Micro , vol.26 , Issue.4
- Binkert, N.L.¹ Dreslinski, R.G.² Hsu, L.R.³ Lim, K.T.⁴ Saidi, A.G.⁵ Reinhardt, S.K.⁶

6
- 0029206424
- Provably efficient scheduling for languages with fine-grained parallelism
- G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably efficient scheduling for languages with fine-grained parallelism. In ACM Proc. of Annu. Symp. on Para. Alg. and Archi., pages 1-12, 1995.
- (1995) ACM Proc. of Annu. Symp. on Para. Alg. and Archi. , pp. 1-12
- Blelloch, G.E.¹ Gibbons, P.B.² Matias, Y.³

7
- 0003459808
- PhD thesis, Cambridge, MA, USA
- R. D. Blumofe. Executing multithreaded programs efficiently. PhD thesis, Cambridge, MA, USA, 1995.
- (1995) Executing Multithreaded Programs Efficiently
- Blumofe, R.D.¹

8
- 77953698031
- OpenMP application program interface, May
- OpenMP Architecture Review Board. OpenMP application program interface, May 2008.
- (2008) OpenMP Architecture Review Board

9
- 0033719421
- Wattch: A framework for architectural-level power analysis and optimizations
- June
- D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In ISCA 27, June 2000.
- (2000) ISCA 27
- Brooks, D.¹ Tiwari, V.² Martonosi, M.³

10
- 0029235623
- Hierarchical tiling for improved superscalar performance
- Washington, DC, USA
- L. Carter, J. Ferrante, and S. F. Hummel. Hierarchical tiling for improved superscalar performance. In IPPS '95, pages 239-245, Washington, DC, USA, 1995.
- (1995) IPPS '95 , pp. 239-245
- Carter, L.¹ Ferrante, J.² Hummel, S.F.³

11
- 21244474546
- Predicting interthread cache contention on a chip multi-processor architecture
- Washington, DC, USA
- D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting interthread cache contention on a chip multi-processor architecture. In HPCA '05, pages 340-351, Washington, DC, USA, 2005.
- (2005) HPCA '05 , pp. 340-351
- Chandra, D.¹ Guo, F.² Kim, S.³ Solihin, Y.⁴

12
- 64949190009
- PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches
- Feb.
- M. Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches. In HPCA, pages 227-238, Feb. 2009.
- (2009) HPCA , pp. 227-238
- Chaudhuri, M.¹

13
- 51449118065
- A performance study of general purpose applications on graphics processors using CUDA
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general purpose applications on graphics processors using CUDA. JPDC'08, 2008.
- (2008) JPDC'08
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

14
- 35248852476
- Scheduling threads for constructive cache sharing on CMPs
- New York, NY, USA
- S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. In SPAA '07, pages 105-115, New York, NY, USA, 2007.
- (2007) SPAA '07 , pp. 105-115
- Chen, S.¹ Gibbons, P.B.² Kozuch, M.³ Liaskovitis, V.⁴ Ailamaki, A.⁵ Blelloch, G.E.⁶ Falsafi, B.⁷ Fix, L.⁸ Hardavellas, N.⁹ Mowry, T.C.¹⁰ Wilkerson, C.¹¹

15
- 77953978022
- Intel Corporation. Intel threading building blocks
- Intel Corporation. Intel threading building blocks.

16
- 77953967507
- Intel Corporation. Picture the future now: Intel AVX
- Intel Corporation. Picture the future now: Intel AVX. http://software.intel.com/en-us/avx/.

17
- 77953976727
- NVIDIA Corporation. GeForce GTX 280 specifications. 2008
- NVIDIA Corporation. GeForce GTX 280 specifications. 2008.

18
- 70349937457
- October
- L. Dagum. OpenMP: A proposed industry standard API for shared memory programming, October 1997.
- (1997) OpenMP: A Proposed Industry Standard API for Shared Memory Programming
- Dagum, L.¹

19
- 84877083867
- Merrimac: Supercomputing with streams
- William J. Dally et al. Merrimac: Supercomputing with streams. In SC'03, 2003.
- (2003) SC'03
- Dally, W.J.¹

20
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In SC '08, pages 1-12, 2008.
- (2008) SC'08 , pp. 1-12
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

21
- 33746683732
- Maximizing CMP throughput with mediocre cores
- Washington, DC, USA
- J. D. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05, pages 51-62, Washington, DC, USA, 2005.
- (2005) PACT '05 , pp. 51-62
- Davis, J.D.¹ Laudon, J.² Olukotun, K.³

22
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the memory hierarchy. In SC'06, 2006.
- (2006) SC'06
- Fatahalian, K.¹ Horn, D.R.² Knight, T.J.³ Leem, L.⁴ Houston, M.⁵ Park, J.Y.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

23
- 32844463802
- Cache oblivious stencil computations
- New York, NY, USA
- M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS '05, pages 361-366, New York, NY, USA, 2005.
- (2005) ICS '05 , pp. 361-366
- Frigo, M.¹ Strumpen, V.²

24
- 34247376580
- Chip multiprocessing and the cell broadband engine
- New York, NY, USA
- M. Gschwind. Chip multiprocessing and the Cell Broadband Engine. In CF'06, New York, NY, USA, 2006.
- (2006) CF'06
- Gschwind, M.¹

25
- 4444374512
- Compact thermal modeling for temperature-aware design
- W. Huang, M. R. Stan, K. Skadron, S. Ghosh, K. Sankaranarayanan, and S. Velusamy. Compact thermal modeling for temperature-aware design. In DAC'04, 2004.
- (2004) DAC'04
- Huang, W.¹ Stan, M.R.² Skadron, K.³ Ghosh, S.⁴ Sankaranarayanan, K.⁵ Velusamy, S.⁶

26
- 57749175984
- A comprehensive approach to DRAM power management
- I. Hur and C. Lin. A comprehensive approach to DRAM power management. HPCA '08, pages 305-316, 2008.
- (2008) HPCA '08 , pp. 305-316
- Hur, I.¹ Lin, C.²

27
- 0022901352
- Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system
- W. Jalby and U. Meier. Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system. In Proc. Int. Conf. Parallel Processing, pages 429-432, 1986.
- (1986) Proc. Int. Conf. Parallel Processing , pp. 429-432
- Jalby, W.¹ Meier, U.²

28
- 84893483994
- An evaluation of thread migration for exploiting distributed array locality
- S. Jenks and J.-L. Gaudiot. An evaluation of thread migration for exploiting distributed array locality. In HPCA, pages 190-195, 2002.
- (2002) HPCA , pp. 190-195
- Jenks, S.¹ Gaudiot, J.-L.²

29
- 0347304618
- Data-centric multilevel blocking
- New York, NY, USA
- I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multilevel blocking. In PLDI '97, pages 346-357, New York, NY, USA, 1997.
- (1997) PLDI '97 , pp. 346-357
- Kodukula, I.¹ Ahmed, N.² Pingali, K.³

30
- 20344374162
- Niagara: A 32-way multithreaded SPARC processor
- P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 25(2):21-29, 2005.
- (2005) IEEE Micro , vol.25 , Issue.2 , pp. 21-29
- Kongetira, P.¹ Aingaran, K.² Olukotun, K.³

31
- 77957795684
- Optimistic parallelism benefits from data partitioning
- M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS 13, 2008.
- (2008) ASPLOS 13
- Kulkarni, M.¹ Pingali, K.² Ramanarayanan, G.³ Walter, B.⁴ Bala, K.⁵ Paul Chew, L.⁶

32
- 35348855586
- Carbon: Architectural support for fine-grained parallelism on chip multiprocessors
- S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. SIGARCH Comput. Archit. News, 35(2), 2007.
- (2007) SIGARCH Comput. Archit. News , vol.35 , Issue.2
- Kumar, S.¹ Hughes, C.J.² Nguyen, A.³

33
- 0031364101
- Tuning compiler optimizations for simultaneous multithreading
- Washington, DC, USA
- J. L. Lo, S. J. Eggers, H. M. Levy, S. S. Parekh, and D. M. Tullsen. Tuning compiler optimizations for simultaneous multithreading. In MICRO 30, pages 114-124, Washington, DC, USA, 1997.
- (1997) MICRO 30 , pp. 114-124
- Lo, J.L.¹ Eggers, S.J.² Levy, H.M.³ Parekh, S.S.⁴ Tullsen, D.M.⁵

34
- 0033688597
- Smart memories: A modular reconfigurable architecture
- New York, NY, USA
- K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, and M. Horowitz. Smart memories: a modular reconfigurable architecture. In ISCA '00, pages 161-171, New York, NY, USA, 2000.
- (2000) ISCA '00 , pp. 161-171
- Mai, K.¹ Paaske, T.² Jayasena, N.³ Ho, R.⁴ Dally, W.J.⁵ Horowitz, M.⁶

35
- 84876909872
- Using processor affinity in loop scheduling on shared-memory multiprocessors
- E. P. Markatos and T. J. LeBlanc. Using processor affinity in loop scheduling on shared-memory multiprocessors. In SC'92, pages 104-113, 1992.
- (1992) SC'92 , pp. 104-113
- Markatos, E.P.¹ Leblanc, T.J.²

36
- 0003665539
- Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
- K. S. McKinley and O. Temam. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks. ACM Trans. Comput. Syst., 17(4):288-336, 1999.
- (1999) ACM Trans. Comput. Syst. , vol.17 , Issue.4 , pp. 288-336
- McKinley, K.S.¹ Temam, O.²

37
- 77950987305
- Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
- Oct
- J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In ICCD, Oct 2009.
- (2009) ICCD
- Meng, J.¹ Skadron, K.²

38
- 47349098275
- Minebench: A benchmark suite for data mining workloads
- Oct.
- R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. Minebench: A benchmark suite for data mining workloads. IISWC '06, pages 182-188, Oct. 2006.
- (2006) IISWC '06 , pp. 182-188
- Narayanan, R.¹ Ozisikyilmaz, B.² Zambreno, J.³ Memik, G.⁴ Choudhary, A.⁵

39
- 77953964608
- NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide, 2007
- NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide, 2007.

40
- 0013229812
- Technical report
- S. Parekh, S. Eggers, and H. Levy. Thread-sensitive scheduling for SMT processors. Technical report, 2000.
- (2000) Thread-sensitive Scheduling for SMT Processors
- Parekh, S.¹ Eggers, S.² Levy, H.³

41
- 2842513495
- Thread scheduling for cache locality
- New York, NY, USA. ACM
- J. Philbin, J. Edler, O. J. Anshus, C. C. Douglas, and K. Li. Thread scheduling for cache locality. In ASPLOS-VII, pages 60-71, New York, NY, USA, 1996. ACM.
- (1996) ASPLOS-VII , pp. 60-71
- Philbin, J.¹ Edler, J.² Anshus, O.J.³ Douglas, C.C.⁴ Li, K.⁵

42
- 0036374188
- Computation regrouping: Restructuring programs for temporal data cache locality
- New York, NY, USA
- V. K. Pingali, S. A. McKee, W. C. Hsieh, and J. B. Carter. Computation regrouping: restructuring programs for temporal data cache locality. In ICS '02, pages 252-261, New York, NY, USA, 2002.
- (2002) ICS '02 , pp. 252-261
- Pingali, V.K.¹ McKee, S.A.² Hsieh, W.C.³ Carter, J.B.⁴

43
- 34248593308
- Three-dimensional multirelaxation time (MRT) lattice-Boltzmann models for multiphase flow
- K. N. Premnath and J. Abraham. Three-dimensional multirelaxation time (MRT) lattice-Boltzmann models for multiphase flow. J. Comput. Phys., 224(2):539-559, 2007.
- (2007) J. Comput. Phys. , vol.224 , Issue.2 , pp. 539-559
- Premnath, K.N.¹ Abraham, J.²

44
- 36849004429
- Bringing NoCs to 65 nm
- A. Pullini, F. Angiolini, S. Murali, D. Atienza, G. D. Micheli, and L. Benini. Bringing NoCs to 65 nm. IEEE Micro, 27(5), 2007.
- (2007) IEEE Micro , vol.27 , Issue.5
- Pullini, A.¹ Angiolini, F.² Murali, S.³ Atienza, D.⁴ Micheli, G.D.⁵ Benini, L.⁶

45
- 0006106643
- Tiling of iteration spaces for multicomputers
- J. Ramanujam. Tiling of iteration spaces for multicomputers. In Proc. 1990 Int. Conf. Parallel Processing, Vol, pages 179-186, 1990.
- (1990) Proc. 1990 Int. Conf. Parallel Processing , pp. 179-186
- Ramanujam, J.¹

46
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
- (2008) ACM Trans. Graph. , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

47
- 68949199685
- A dynamically reconfigurable cache for multithreaded processors
- A. Settle, D. Connors, E. Gibert, and A. Gonzalez. A dynamically reconfigurable cache for multithreaded processors. J. Embedded Comput., 2(2):221-233, 2006.
- (2006) J. Embedded Comput. , vol.2 , Issue.2 , pp. 221-233
- Settle, A.¹ Connors, D.² Gibert, E.³ Gonzalez, A.⁴

48
- 0039927463
- Symbiotic jobscheduling for a simultaneous multithreaded processor
- New York, NY, USA
- A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS '00, pages 234-244, New York, NY, USA, 2000.
- (2000) ASPLOS '00 , pp. 234-244
- Snavely, A.¹ Tullsen, D.M.²

49
- 0028754497
- Affinity scheduling of unbalanced workloads
- New York, NY, USA
- S. Subramaniam and D. L. Eager. Affinity scheduling of unbalanced workloads. In SC '94, pages 214-226, New York, NY, USA, 1994.
- (1994) SC '94 , pp. 214-226
- Subramaniam, S.¹ Eager, D.L.²

50
- 84949769332
- A new memory monitoring scheme for memory-aware scheduling and partitioning
- G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA '02, page 117, 2002.
- (2002) HPCA '02 , p. 117
- Suh, G.E.¹ Devadas, S.² Rudolph, L.³

51
- 77951226007
- HP Laboratories Palo Alto
- D. Tarjan, S. Thoziyoor, and N. P. Jouppi. Cacti 4.0. Technical Report HPL-2006-2086, HP Laboratories Palo Alto, 2006.
- (2006) Cacti 4.0 Technical Report HPL-2006-2086
- Tarjan, D.¹ Thoziyoor, S.² Jouppi, N.P.³

52
- 0000444590
- Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors
- J. Torrellas, A. Tucker, and A. Gupta. Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors. J. Parallel Distrib. Comput., 24(2):139-151, 1995.
- (1995) J. Parallel Distrib. Comput. , vol.24 , Issue.2 , pp. 139-151
- Torrellas, J.¹ Tucker, A.² Gupta, A.³

53
- 34247272420
- Thread-associative memory for multicore and multithreaded computing
- New York, NY, USA
- S. Wang and L. Wang. Thread-associative memory for multicore and multithreaded computing. In ISLPED '06, pages 139-142, New York, NY, USA, 2006.
- (2006) ISLPED '06 , pp. 139-142
- Wang, S.¹ Wang, L.²

54
- 0029179077
- The SPLASH-2 programs: Characterization and methodological considerations
- June
- S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. ISCA '95, pages 24-36, June 1995.
- (1995) ISCA '95 , pp. 24-36
- Woo, S.C.¹ Ohara, M.² Torrie, E.³ Singh, J.P.⁴ Gupta, A.⁵

55
- 77953992522
- Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X FPGA user guide
- Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X FPGA user guide.

56
- 84949817426
- Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
- S.-H. Yang, B. Falsafi, M. D. Powell, and T. N. Vijaykumar. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In HPCA '02, page 151, 2002.
- (2002) HPCA '02 , p. 151
- Yang, S.-H.¹ Falsafi, B.² Powell, M.D.³ Vijaykumar, T.N.⁴

57
- 79952570595
- An adaptive OpenMP loop scheduler for hyperthreaded SMPs
- Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss. An adaptive OpenMP loop scheduler for hyperthreaded SMPs. In PDCS '04, 2004.
- (2004) PDCS '04
- Zhang, Y.¹ Burcea, M.² Cheng, V.³ Ho, R.⁴ Voss, M.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.