Year: 2010

Exploiting inter-thread temporal locality for chip multithreading

Author keywords

Chip multithreading; Data locality; Data parallelism; Fine grained parallelism; Task scheduling

Indexed keywords

CHIP MULTITHREADING; DATA LOCALITY; DATA PARALLELISM; FINE-GRAINED PARALLELISM; TASK-SCHEDULING;

EID: 77954020709     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/IPDPS.2010.5470465     Document Type: Conference Paper
Times cited: 20

References (57)
  • 1
    • 77954019149
    • LEON2 Processor
    • LEON2 Processor. http://vlsicad.eecs.umich.edu/BK/Slots/cache/www. gaisler.com/products/leon2/leon.html.
  • 2
    • 77952209580
    • NVIDIA's next generation CUDA compute architecture: Fermi
    • NVIDIA's next generation CUDA compute architecture: Fermi. NVIDIA Corporation, 2009.
    • (2009) NVIDIA Corporation
  • 8
    • 77953698031
    • OpenMP application program interface, May
    • OpenMP Architecture Review Board. OpenMP application program interface, May 2008.
    • (2008) OpenMP Architecture Review Board
  • 9
    • 0033719421
    • Wattch: A framework for architectural-level power analysis and optimizations
    • June
    • D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In ISCA 27, June 2000.
    • (2000) ISCA 27
    • Brooks, D.1    Tiwari, V.2    Martonosi, M.3
  • 10
    • 0029235623
    • Hierarchical tiling for improved superscalar performance
    • Washington, DC, USA
    • L. Carter, J. Ferrante, and S. F. Hummel. Hierarchical tiling for improved superscalar performance. In IPPS '95, pages 239-245, Washington, DC, USA, 1995.
    • (1995) IPPS '95 , pp. 239-245
    • Carter, L.1    Ferrante, J.2    Hummel, S.F.3
  • 11
    • 21244474546
    • Predicting interthread cache contention on a chip multi-processor architecture
    • Washington, DC, USA
    • D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting interthread cache contention on a chip multi-processor architecture. In HPCA '05, pages 340-351, Washington, DC, USA, 2005.
    • (2005) HPCA '05 , pp. 340-351
    • Chandra, D.1    Guo, F.2    Kim, S.3    Solihin, Y.4
  • 12
    • 64949190009
    • PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches
    • Feb.
    • M. Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches. In HPCA, pages 227-238, Feb. 2009.
    • (2009) HPCA , pp. 227-238
    • Chaudhuri, M.1
  • 13
    • 51449118065
    • A performance study of general purpose applications on graphics processors using CUDA
    • S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general purpose applications on graphics processors using CUDA. JPDC'08, 2008.
    • (2008) JPDC'08
    • Che, S.1    Boyer, M.2    Meng, J.3    Tarjan, D.4    Sheaffer, J.W.5    Skadron, K.6
  • 15
    • 77953978022
    • Intel Corporation. Intel threading building blocks
    • Intel Corporation. Intel threading building blocks.
  • 16
    • 77953967507
    • Intel Corporation. Picture the future now: Intel AVX
    • Intel Corporation. Picture the future now: Intel AVX. http://software.intel.com/en-us/avx/.
  • 17
    • 77953976727
    • NVIDIA Corporation. GeForce GTX 280 specifications. 2008
    • NVIDIA Corporation. GeForce GTX 280 specifications. 2008.
  • 19
    • 84877083867
    • Merrimac: Supercomputing with streams
    • William J. Dally et al. Merrimac: Supercomputing with streams. In SC'03, 2003.
    • (2003) SC'03
    • Dally, W.J.1
  • 21
    • 33746683732
    • Maximizing cmp throughput with mediocre cores
    • Washington, DC, USA
    • J. D. Davis, J. Laudon, and K. Olukotun. Maximizing cmp throughput with mediocre cores. In PACT '05, pages 51-62, Washington, DC, USA, 2005.
    • (2005) PACT '05 , pp. 51-62
    • Davis, J.D.1    Laudon, J.2    Olukotun, K.3
  • 23
    • 32844463802
    • Cache oblivious stencil computations
    • New York, NY, USA
    • M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS '05, pages 361-366, New York, NY, USA, 2005.
    • (2005) ICS '05 , pp. 361-366
    • Frigo, M.1    Strumpen, V.2
  • 24
    • 34247376580
    • Chip multiprocessing and the cell broadband engine
    • New York, NY, USA
    • M. Gschwind. Chip multiprocessing and the Cell Broadband Engine. In CF'06, New York, NY, USA, 2006.
    • (2006) CF'06
    • Gschwind, M.1
  • 26
    • 57749175984
    • A comprehensive approach to dram power management
    • I. Hur and C. Lin. A comprehensive approach to dram power management. HPCA '08, pages 305-316, 2008.
    • (2008) HPCA '08 , pp. 305-316
    • Hur, I.1    Lin, C.2
  • 27
    • 0022901352
    • Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system
    • W. Jalby and U. Meier. Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system. In Proc. Int. Conf. Parallel Processing, pages 429-432, 1986.
    • (1986) Proc. Int. Conf. Parallel Processing , pp. 429-432
    • Jalby, W.1    Meier, U.2
  • 28
    • 84893483994
    • An evaluation of thread migration for exploiting distributed array locality
    • S. Jenks and J.-L. Gaudiot. An evaluation of thread migration for exploiting distributed array locality. In HPCA, pages 190-195, 2002.
    • (2002) HPCA , pp. 190-195
    • Jenks, S.1    Gaudiot, J.-L.2
  • 29
    • 0347304618
    • Data-centric multilevel blocking
    • New York, NY, USA
    • I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multilevel blocking. In PLDI '97, pages 346-357, New York, NY, USA, 1997.
    • (1997) PLDI '97 , pp. 346-357
    • Kodukula, I.1    Ahmed, N.2    Pingali, K.3
  • 30
    • 20344374162
    • Niagara: A 32-way multithreaded sparc processor
    • P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded sparc processor. IEEE Micro, 25(2):21-29, 2005.
    • (2005) IEEE Micro , vol.25 , Issue.2 , pp. 21-29
    • Kongetira, P.1    Aingaran, K.2    Olukotun, K.3
  • 32
    • 35348855586
    • Carbon: Architectural support for fine-grained parallelism on chip multiprocessors
    • S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. SIGARCH Comput. Archit. News, 35(2), 2007.
    • (2007) SIGARCH Comput. Archit. News , vol.35 , Issue.2
    • Kumar, S.1    Hughes, C.J.2    Nguyen, A.3
  • 33
    • 0031364101
    • Tuning compiler optimizations for simultaneous multithreading
    • Washington, DC, USA
    • J. L. Lo, S. J. Eggers, H. M. Levy, S. S. Parekh, and D. M. Tullsen. Tuning compiler optimizations for simultaneous multithreading. In MICRO 30, pages 114-124, Washington, DC, USA, 1997.
    • (1997) MICRO , vol.30 , pp. 114-124
    • Lo, J.L.1    Eggers, S.J.2    Levy, H.M.3    Parekh, S.S.4    Tullsen, D.M.5
  • 34
    • 0033688597
    • Smart memories: A modular reconfigurable architecture
    • New York, NY, USA
    • K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, and M. Horowitz. Smart memories: a modular reconfigurable architecture. In ISCA '00, pages 161-171, New York, NY, USA, 2000.
    • (2000) ISCA '00 , pp. 161-171
    • Mai, K.1    Paaske, T.2    Jayasena, N.3    Ho, R.4    Dally, W.J.5    Horowitz, M.6
  • 35
    • 84876909872
    • Using processor affinity in loop scheduling on shared-memory multiprocessors
    • E. P. Markatos and T. J. LeBlanc. Using processor affinity in loop scheduling on shared-memory multiprocessors. In SC'92, pages 104-113, 1992.
    • (1992) SC'92 , pp. 104-113
    • Markatos, E.P.1    Leblanc, T.J.2
  • 36
    • 0003665539
    • Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
    • K. S. McKinley and O. Temam. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks. ACM Trans. Comput. Syst., 17(4):288-336, 1999.
    • (1999) ACM Trans. Comput. Syst. , vol.17 , Issue.4 , pp. 288-336
    • McKinley, K.S.1    Temam, O.2
  • 37
    • 77950987305
    • Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
    • Oct
    • J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In ICCD, Oct 2009.
    • (2009) ICCD
    • Meng, J.1    Skadron, K.2
  • 39
    • 77953964608
    • NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide, 2007
    • NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide, 2007.
  • 41
    • 2842513495
    • Thread scheduling for cache locality
    • New York, NY, USA. ACM
    • J. Philbin, J. Edler, O. J. Anshus, C. C. Douglas, and K. Li. Thread scheduling for cache locality. In ASPLOS-VII, pages 60-71, New York, NY, USA, 1996. ACM.
    • (1996) ASPLOS-VII , pp. 60-71
    • Philbin, J.1    Edler, J.2    Anshus, O.J.3    Douglas, C.C.4    Li, K.5
  • 42
    • 0036374188
    • Computation regrouping: Restructuring programs for temporal data cache locality
    • New York, NY, USA
    • V. K. Pingali, S. A. McKee, W. C. Hsieh, and J. B. Carter. Computation regrouping: restructuring programs for temporal data cache locality. In ICS '02, pages 252-261, New York, NY, USA, 2002.
    • (2002) ICS '02 , pp. 252-261
    • Pingali, V.K.1    McKee, S.A.2    Hsieh, W.C.3    Carter, J.B.4
  • 43
    • 34248593308
    • Three-dimensional multirelaxation time (MRT) lattice-boltzmann models for multiphase flow
    • K. N. Premnath and J. Abraham. Three-dimensional multirelaxation time (MRT) lattice-boltzmann models for multiphase flow. J. Comput. Phys., 224(2):539-559, 2007.
    • (2007) J. Comput. Phys. , vol.224 , Issue.2 , pp. 539-559
    • Premnath, K.N.1    Abraham, J.2
  • 47
    • 68949199685
    • A dynamically reconfigurable cache for multithreaded processors
    • A. Settle, D. Connors, E. Gibert, and A. Gonzalez. A dynamically reconfigurable cache for multithreaded processors. J. Embedded Comput., 2(2):221-233, 2006.
    • (2006) J. Embedded Comput. , vol.2 , Issue.2 , pp. 221-233
    • Settle, A.1    Connors, D.2    Gibert, E.3    Gonzalez, A.4
  • 48
    • 0039927463
    • Symbiotic jobscheduling for a simultaneous multithreaded processor
    • New York, NY, USA
    • A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS '00, pages 234-244, New York, NY, USA, 2000.
    • (2000) ASPLOS '00 , pp. 234-244
    • Snavely, A.1    Tullsen, D.M.2
  • 49
    • 0028754497
    • Affinity scheduling of unbalanced workloads
    • New York, NY, USA
    • S. Subramaniam and D. L. Eager. Affinity scheduling of unbalanced workloads. In SC '94, pages 214-226, New York, NY, USA, 1994.
    • (1994) SC '94 , pp. 214-226
    • Subramaniam, S.1    Eager, D.L.2
  • 50
    • 84949769332
    • A new memory monitoring scheme for memory-aware scheduling and partitioning
    • G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA '02, page 117, 2002.
    • (2002) HPCA '02 , pp. 117
    • Suh, G.E.1    Devadas, S.2    Rudolph, L.3
  • 52
    • 0000444590
    • Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors
    • J. Torrellas, A. Tucker, and A. Gupta. Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors. J. Parallel Distrib. Comput., 24(2):139-151, 1995.
    • (1995) J. Parallel Distrib. Comput. , vol.24 , Issue.2 , pp. 139-151
    • Torrellas, J.1    Tucker, A.2    Gupta, A.3
  • 53
    • 34247272420
    • Thread-associative memory for multicore and multithreaded computing
    • New York, NY, USA
    • S. Wang and L. Wang. Thread-associative memory for multicore and multithreaded computing. In ISLPED '06, pages 139-142, New York, NY, USA, 2006.
    • (2006) ISLPED '06 , pp. 139-142
    • Wang, S.1    Wang, L.2
  • 54
    • 0029179077
    • The SPLASH-2 programs: Characterization and methodological considerations
    • June
    • S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. ISCA '95, pages 24-36, June 1995.
    • (1995) ISCA '95 , pp. 24-36
    • Woo, S.C.1    Ohara, M.2    Torrie, E.3    Singh, J.P.4    Gupta, A.5
  • 55
    • 77953992522
    • Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X FPGA user guide
    • Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X FPGA user guide.
  • 56
    • 84949817426
    • Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
    • S.-H. Yang, B. Falsafi, M. D. Powell, and T. N. Vijaykumar. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In HPCA '02, page 151, 2002.
    • (2002) HPCA '02 , pp. 151
    • Yang, S.-H.1    Falsafi, B.2    Powell, M.D.3    Vijaykumar, T.N.4
  • 57
    • 79952570595
    • An adaptive OpenMP loop scheduler for hyperthreaded SMPs
    • Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss. An adaptive OpenMP loop scheduler for hyperthreaded SMPs. In PDCS '04, 2004.
    • (2004) PDCS '04
    • Zhang, Y.1    Burcea, M.2    Cheng, V.3    Ho, R.4    Voss, M.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.