2013, Pages 289-298

Meeting midway: Improving CMP performance with memory-side prefetching

Author keywords

CMP; DRAM; memory prefetching; NOC

Indexed keywords

CHIP MULTIPROCESSOR; CMP; MEMORY CHANNELS; MEMORY CONTROLLER; MULTITHREADED; NOC; PREFETCHING; RESOURCE CONTENTION

EID: 84887501415     PISSN: 1089795X     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/PACT.2013.6618825     Document Type: Conference Paper
Times cited: 33

References (37)
  • 1
    • 70449562109
    • Balancing locality and parallelism on shared-cache multicore systems
    • M. Cade and A. Qasem, "Balancing locality and parallelism on shared-cache multicore systems," in HPCC, 2009.
    • (2009) HPCC
    • Cade, M.1    Qasem, A.2
  • 2
    • 0032761638
    • Impulse: Building a smarter memory controller
    • J. Carter et al., "Impulse: Building a smarter memory controller," in HPCA, 1999.
    • (1999) HPCA
    • Carter, J.1
  • 3
    • 0028202735
    • A performance study of software and hardware data prefetching schemes
    • T.-F. Chen and J.-L. Baer, "A performance study of software and hardware data prefetching schemes," in ISCA, 1994.
    • (1994) ISCA
    • Chen, T.-F.1    Baer, J.-L.2
  • 5
    • 80052522711
    • Prefetch-aware shared resource management for multicore systems
    • E. Ebrahimi et al., "Prefetch-aware shared resource management for multicore systems," in ISCA, 2011.
    • (2011) ISCA
    • Ebrahimi, E.1
  • 6
    • 76749142994
    • Coordinated control of multiple prefetchers in multicore systems
    • E. Ebrahimi et al., "Coordinated control of multiple prefetchers in multicore systems," in MICRO, 2009.
    • (2009) MICRO
    • Ebrahimi, E.1
  • 7
    • 77958076693
    • Optimizing application performance on Intel Core microarchitecture using hardware-implemented prefetchers
    • R. Hegde, "Optimizing application performance on Intel Core microarchitecture using hardware-implemented prefetchers," Intel, 2008.
    • (2008) Intel
    • Hegde, R.1
  • 8
    • 14944355925
    • Memory-side prefetching for linked data structures for processor-in-memory systems
    • C. J. Hughes and S. V. Adve, "Memory-side prefetching for linked data structures for processor-in-memory systems," Journal of PDC, 2005.
    • (2005) Journal of PDC
    • Hughes, C.J.1    Adve, S.V.2
  • 9
    • 40349103955
    • Memory prefetching using adaptive stream detection
    • I. Hur and C. Lin, "Memory prefetching using adaptive stream detection," in MICRO, 2006.
    • (2006) MICRO
    • Hur, I.1    Lin, C.2
  • 10
    • 8344236686
    • Effective stream-based and execution-based data prefetching
    • S. Iacobovici et al., "Effective stream-based and execution-based data prefetching," in ICS, 2004.
    • (2004) ICS
    • Iacobovici, S.1
  • 12
    • 0030677583
    • Prefetching using Markov predictors
    • D. Joseph and D. Grunwald, "Prefetching using Markov predictors," in ISCA, 1997.
    • (1997) ISCA
    • Joseph, D.1    Grunwald, D.2
  • 13
    • 0034581346
    • A prefetching technique for irregular accesses to linked data structures
    • M. Karlsson et al., "A prefetching technique for irregular accesses to linked data structures," in HPCA, 2000.
    • (2000) HPCA
    • Karlsson, M.1
  • 14
    • 10744231529
    • Nonuniform cache architectures for wire-delay dominated on-chip caches
    • C. Kim et al., "Nonuniform cache architectures for wire-delay dominated on-chip caches," Micro, IEEE, vol. 23, no. 6, Nov.-Dec. 2003.
    • (2003) Micro, IEEE , vol.23 , Issue.6
    • Kim, C.1
  • 15
    • 77952558442
    • Atlas: A scalable and high-performance scheduling algorithm for multiple memory controllers
    • Y. Kim et al., "Atlas: A scalable and high-performance scheduling algorithm for multiple memory controllers," in HPCA, 2010.
    • (2010) HPCA
    • Kim, Y.1
  • 16
    • 66749189125
    • Prefetch-aware DRAM controllers
    • C. J. Lee et al., "Prefetch-aware DRAM controllers," in MICRO, 2008.
    • (2008) MICRO
    • Lee, C.J.1
  • 17
    • 0034818343
    • Reducing DRAM latencies with an integrated memory hierarchy design
    • W.-f. Lin, "Reducing DRAM latencies with an integrated memory hierarchy design," in HPCA, 2001.
    • (2001) HPCA
    • Lin, W.-F.1
  • 18
    • 79551706790
    • Enhancements for accurate and timely streaming prefetcher
    • G. Liu et al., "Enhancements for accurate and timely streaming prefetcher," The Journal of ILP, vol. 13, Jan. 2011.
    • (2011) The Journal of ILP , vol.13
    • Liu, G.1
  • 19
    • 0036375948
    • Profile-guided post-link stride prefetching
    • C.-K. Luk et al., "Profile-guided post-link stride prefetching," in ICS, 2002.
    • (2002) ICS
    • Luk, C.-K.1
  • 20
    • 84962144701
    • Balancing thoughput and fairness in SMT processors
    • K. Luo et al., "Balancing thoughput and fairness in SMT processors," in ISPASS, 2001.
    • (2001) ISPASS
    • Luo, K.1
  • 21
    • 0036469676
    • SIMICS: A full system simulation platform
    • P. S. Magnusson et al., "SIMICS: A full system simulation platform," Computer, vol. 35, no. 2, Feb. 2002.
    • (2002) Computer , vol.35 , Issue.2
    • Magnusson, P.S.1
  • 22
    • 33748870886
    • Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
    • M. M. Martin et al., "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," SIGARCH Comput. Archit. News, 2005.
    • (2005) SIGARCH Comput. Archit. News
    • Martin, M.M.1
  • 25
    • 47349084021
    • Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0
    • N. Muralimanohar et al., "Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0," in MICRO, 2007.
    • (2007) MICRO
    • Muralimanohar, N.1
  • 26
    • 52649119398
    • Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems
    • O. Mutlu and T. Moscibroda, "Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems," in ISCA, 2008.
    • (2008) ISCA
    • Mutlu, O.1    Moscibroda, T.2
  • 27
    • 33644639438
    • Cost-effective compiler directed memory prefetching and bypassing
    • D. Ortega et al., "Cost-effective compiler directed memory prefetching and bypassing," in PACT, 2002.
    • (2002) PACT
    • Ortega, D.1
  • 28
    • 77954460854
    • Data prefetching and data forwarding in shared memory multiprocessors
    • D. K. Poulsen and P.-C. Yew, "Data prefetching and data forwarding in shared memory multiprocessors," in ICPP, 1994.
    • (1994) ICPP
    • Poulsen, D.K.1    Yew, P.-C.2
  • 29
    • 0018106484
    • Sequential program prefetching in memory hierarchies
    • A. J. Smith, "Sequential program prefetching in memory hierarchies," Computer, vol. 11, no. 12, Dec. 1978.
    • (1978) Computer , vol.11 , Issue.12
    • Smith, A.J.1
  • 30
    • 0042850375
    • Correlation prefetching with a user-level memory thread
    • Y. Solihin et al., "Correlation prefetching with a user-level memory thread," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 6, Jun. 2003.
    • (2003) IEEE Trans. Parallel Distrib. Syst. , vol.14 , Issue.6
    • Solihin, Y.1
  • 31
    • 84887479111
    • Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers
    • S. Srinath et al., "Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers," in HPCA'07.
    • (2007) HPCA
    • Srinath, S.1
  • 32
    • 77952283542
    • Micro-pages: Increasing DRAM efficiency with locality-aware data placement
    • K. Sudan et al., "Micro-pages: increasing DRAM efficiency with locality-aware data placement," in ASPLOS, 2010.
    • (2010) ASPLOS
    • Sudan, K.1
  • 33
    • 84863379287
    • Pacman: Prefetch-aware cache management for high performance caching
    • C.-J. Wu et al., "Pacman: prefetch-aware cache management for high performance caching," in MICRO, 2011.
    • (2011) MICRO
    • Wu, C.-J.1
  • 34
    • 0036036096
    • Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching
    • Y. Wu, "Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching," in PLDI, 2002.
    • (2002) PLDI
    • Wu, Y.1
  • 35
    • 0033705677
    • Push vs. pull: Data movement for linked data structures
    • C.-L. Yang and A. R. Lebeck, "Push vs. pull: data movement for linked data structures," in ICS, 2000.
    • (2000) ICS
    • Yang, C.-L.1    Lebeck, A.R.2
  • 36
    • 0034460897
    • A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality
    • Z. Zhang et al., "A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality," in MICRO, 2000.
    • (2000) MICRO
    • Zhang, Z.1
  • 37
    • 84944748972
    • A hardware-based cache pollution filtering mechanism for aggressive prefetches
    • X. Zhuang and H.-H. S. Lee, "A hardware-based cache pollution filtering mechanism for aggressive prefetches," in ICPP, 2003.
    • (2003) ICPP
    • Zhuang, X.1    Lee, H.-H.S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.