2009, Pages 209-218

A compiler-directed data prefetching scheme for chip multiprocessors

Author keywords

Chip multiprocessors; Compiler; Helper thread; Prefetching

Indexed keywords

BENCHMARK SUITES; CHIP MULTIPROCESSOR; CHIP MULTIPROCESSORS; COMPILER; DATA-PREFETCHING; DEGRADED PERFORMANCE; EXPERIMENTAL DATA; HELPER THREAD; ITS DATA; MEMORY ACCESS LATENCY; MULTI-THREADED APPLICATION; NEGATIVE INTERACTION; ON-CHIP CACHE; ON-CHIP MULTIPROCESSOR; OPTIMAL SCHEME; PARALLEL EXECUTIONS; PERFORMANCE IMPROVEMENTS; PREFETCH; PREFETCHES; PREFETCHING; PROGRAM PHASES; SHARED CACHE; SIMULATION PARAMETERS; STATIC COMPILER

EID: 67650091160     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1504176.1504208     Document Type: Conference Paper
Times cited: 25

References (56)
  • 4
    • 84966509749
    • Taming the memory hogs: Using compiler-inserted releases to manage physical memory intelligently
    • A. D. Brown and T. C. Mowry. Taming the Memory Hogs: Using Compiler-Inserted Releases to Manage Physical Memory Intelligently. In OSDI, pages 31-44, 2000.
    • (2000) OSDI, pp. 31-44
    • Brown, A.D.; Mowry, T.C.
  • 5
    • 34548020997
    • Competitive prefetching for concurrent sequential I/O
    • DOI 10.1145/1272996.1273017, Operating Systems Review - Proceedings of the 2007 EuroSys Conference
    • C. Li et al. Competitive Prefetching for Concurrent Sequential I/O. In EuroSys, pages 189-202, 2007.
    • (2007) Operating Systems Review (ACM), pp. 189-202
    • Li, C.; Shen, K.; Papathanasiou, A.E.
  • 6
    • 0028202735
    • A performance study of software and hardware data prefetching schemes
    • T.-F. Chen and J.-L. Baer. A performance study of software and hardware data prefetching schemes. In ISCA, pages 223-232, 1994.
    • (1994) ISCA, pp. 223-232
    • Chen, T.-F.; Baer, J.-L.
  • 7
    • 0036949391
    • A stateless, content-directed data prefetching mechanism
    • DOI 10.1145/635508.605427
    • Cooksey et al. A stateless, content-directed data prefetching mechanism. In ASPLOS, pages 279-290, 2002.
    • (2002) Operating Systems Review (ACM), vol. 36, Issue 5, pp. 279-290
    • Cooksey, R.; Jourdan, S.; Grunwald, D.
  • 8
    • 84965078406
    • Fixed and adaptive sequential prefetching in shared memory multiprocessors
    • Dahlgren et al. Fixed and adaptive sequential prefetching in shared memory multiprocessors. In ICPP, pages 56-63, 1993.
    • (1993) ICPP, pp. 56-63
    • Dahlgren
  • 10
    • 84944415710
    • Comparing program phase detection techniques
    • A. S. Dhodapkar and J. E. Smith. Comparing Program Phase Detection Techniques. In MICRO, pages 217-227, 2003.
    • (2003) MICRO, pp. 217-227
    • Dhodapkar, A.S.; Smith, J.E.
  • 11
    • 70350214299
    • DiskSeen: Exploiting disk layout and access history to enhance I/O prefetch
    • Ding et al. DiskSeen: Exploiting Disk Layout and Access History to Enhance I/O Prefetch. In USENIX, pages 261-274, 2007.
    • (2007) USENIX, pp. 261-274
    • Ding
  • 13
    • 84883502375
    • Informed prefetching and caching
    • et al. Informed Prefetching and Caching. In SOSP, pages 79-95, 1995.
    • (1995) SOSP, pp. 79-95
  • 14
    • 34247120722
    • Efficient emulation of hardware prefetchers via event-driven helper threading
    • DOI 10.1145/1152154.1152178, PACT 2006 - Proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques
    • I. Ganusov and M. Burtscher. Efficient Emulation of Hardware Prefetchers via Event-Driven Helper Threading. In PACT, pages 144-153, 2006.
    • (2006) Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, vol. 2006, pp. 144-153
    • Ganusov, I.; Burtscher, M.
  • 15
    • 85060311108
    • AMP: Adaptive multi-stream prefetching in a shared cache
    • B. S. Gill and L. A. D. Bathen. AMP: Adaptive Multi-Stream Prefetching in a Shared Cache. In USENIX FAST, pages 185-198, 2007.
    • (2007) USENIX FAST, pp. 185-198
    • Gill, B.S.; Bathen, L.A.D.
  • 16
    • 2342510413
    • An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
    • E. H. Gornish and A. Veidenbaum. An integrated hardware/software data prefetching scheme for shared-memory multiprocessors. Int. J. Parallel Program., 27(1):35-70, 1999.
    • (1999) International Journal of Parallel Programming, vol. 27, Issue 1, pp. 35-70
    • Gornish, E.H.; Veidenbaum, A.
  • 17
    • 0031235242
    • A single-chip multiprocessor
    • Hammond et al. A Single-Chip Multiprocessor. Computer, 30(9):79-85, 1997.
    • (1997) Computer, vol. 30, Issue 9, pp. 79-85
    • Hammond, L.; Nayfeh, B.A.; Olukotun, K.
  • 18
    • 42549168687
    • Exploring the cache design space for large scale CMPs
    • Hsu et al. Exploring the cache design space for large scale CMPs. SIGARCH Comput. Archit. News, 33(4):24-33, 2005.
    • (2005) SIGARCH Comput. Archit. News, vol. 33, Issue 4, pp. 24-33
    • Hsu
  • 19
    • 0038346237
    • Positional adaptation of processors: Application to energy reduction
    • Huang et al. Positional Adaptation of Processors: Application to Energy Reduction. In ISCA, pages 157-168, 2003.
    • (2003) ISCA, pp. 157-168
    • Huang
  • 22
    • 33847158857
    • Helper thread prefetching for loosely-coupled multiprocessor systems
    • Jung et al. Helper Thread Prefetching for Loosely-Coupled Multiprocessor Systems. In IPDPS, 2006.
    • (2006) IPDPS
    • Jung
  • 23
    • 3042669130
    • IBM Power5 chip: A dual-core multithreaded processor
    • Kalla et al. IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro, 24(2):40-47, 2004.
    • (2004) IEEE Micro, vol. 24, Issue 2, pp. 40-47
    • Kalla
  • 24
    • 0030652599
    • Adaptive data prefetching using cache information
    • A. Ki and A. E. Knowles. Adaptive data prefetching using cache information. In ICS, pages 204-212, 1997.
    • (1997) ICS, pp. 204-212
    • Ki, A.; Knowles, A.E.
  • 25
    • 0036949290
    • Design and evaluation of compiler algorithms for pre-execution
    • DOI 10.1145/635508.605415
    • D. Kim and D. Yeung. Design and Evaluation of Compiler Algorithms for Pre-Execution. In ASPLOS, pages 159-170, 2002.
    • (2002) Operating Systems Review (ACM), vol. 36, Issue 5, pp. 159-170
    • Kim, D.; Yeung, D.
  • 26
    • 20344374162
    • Niagara: A 32-way multithreaded Sparc processor
    • DOI 10.1109/MM.2005.35
    • Kongetira et al. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21-29, 2005.
    • (2005) IEEE Micro, vol. 25, Issue 2, pp. 21-29
    • Kongetira, P.; Aingaran, K.; Olukotun, K.
  • 27
    • 85077324913
    • Managing prefetch memory for data-intensive online servers
    • C. Li and K. Shen. Managing prefetch memory for data-intensive online servers. In USENIX FAST, pages 253-266, 2005.
    • (2005) USENIX FAST, pp. 253-266
    • Li, C.; Shen, K.
  • 29
    • 67650020024
    • The performance of runtime data cache prefetching in a dynamic optimization system
    • Lu et al. The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System. In MICRO, page 180, 2003.
    • (2003) MICRO, p. 180
    • Lu
  • 31
    • 0034839064
    • Tolerating memory latency through software-controlled preexecution in simultaneous multithreading processors
    • C.-K. Luk. Tolerating Memory Latency through Software-controlled preexecution in Simultaneous Multithreading Processors. In ISCA, pages 40-51, 2001.
    • (2001) ISCA, pp. 40-51
    • Luk, C.-K.
  • 32
    • 0042366306
    • Architectural and compiler support for effective instruction prefetching: A cooperative approach
    • DOI 10.1145/367742.367786
    • C.-K. Luk and T. C. Mowry. Architectural and compiler support for effective instruction prefetching: a cooperative approach. ACM Trans. Comput. Syst., 19(1):71-109, 2001.
    • (2001) ACM Transactions on Computer Systems, vol. 19, Issue 1, pp. 71-109
    • Luk, C.-K.; Mowry, T.C.
  • 35
    • 84869376619
    • Montecito - The next product in the Itanium(R) processor family
    • C. McNairy and R. Bhatia. Montecito - The next product in the Itanium(R) Processor Family, 2004. In Hot Chips 16, http://www.hotchips.org/archives/.
    • (2004) Hot Chips 16
    • McNairy, C.; Bhatia, R.
  • 36
    • 84869354578
    • Microsoft
    • Microsoft. Phoenix as a Tool in Research and Instruction. http://research.microsoft.com/phoenix/.
  • 38
    • 85088074507
    • Automatic compiler-inserted I/O prefetching for out-of-core applications
    • Mowry et al. Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications. In OSDI, pages 3-17, 1996.
    • (1996) OSDI, pp. 3-17
    • Mowry
  • 39
    • 0029251307
    • Going beyond integer programming with the omega test to eliminate false data dependences
    • W. Pugh and D. Wonnacott. Going Beyond Integer Programming with the Omega Test to Eliminate False Data Dependences. IEEE Trans. Parallel Distrib. Syst., 6(2):204-211, 1995.
    • (1995) IEEE Trans. Parallel Distrib. Syst., vol. 6, Issue 2, pp. 204-211
    • Pugh, W.; Wonnacott, D.
  • 42
    • 0038345698
    • Phase tracking and prediction
    • T. Sherwood, S. Sair, and B. Calder. Phase Tracking and Prediction. In ISCA, pages 336-349, 2003.
    • (2003) ISCA, pp. 336-349
    • Sherwood, T.; Sair, S.; Calder, B.
  • 43
    • 33847108092
    • Coterminous locality and coterminous group data prefetching on chip multiprocessors
    • Shi et al. Coterminous locality and coterminous group data prefetching on chip multiprocessors. In IPDPS, 2006.
    • (2006) IPDPS
    • Shi
  • 46
    • 28444486909
    • Effective instruction prefetching in chip multiprocessors for modern commercial applications
    • Spracklen et al. Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications. In HPCA, pages 225-236, 2005.
    • (2005) HPCA, pp. 225-236
    • Spracklen
  • 48
    • 67650060169
    • UltraSPARC-II enhancements: Support for software controlled prefetch
    • Sun Microsystems
    • Sun Microsystems. UltraSPARC-II Enhancements: Support for Software Controlled Prefetch, 1997. White Paper WPR-0002.
    • (1997) White paper WPR-0002
  • 49
    • 33746291130
    • Impact of compiler-based data-prefetching techniques on SPEC OMP application performance
    • Tian et al. Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance. In IPDPS, page 53.1, 2005.
    • (2005) IPDPS
    • Tian
  • 50
    • 0031164230
    • Informed multi-process prefetching and caching
    • Tomkins et al. Informed Multi-Process Prefetching and Caching. In SIGMETRICS, pages 100-114, 1997.
    • (1997) SIGMETRICS, pp. 100-114
    • Tomkins
  • 51
  • 52
    • 0038345683
    • Guided region prefetching: A cooperative hardware/software approach
    • Wang et al. Guided Region Prefetching: A Cooperative Hardware/Software Approach. In ISCA, pages 388-398, 2003.
    • (2003) ISCA, pp. 388-398
    • Wang
  • 54
    • 84976827033
    • A data locality optimizing algorithm
    • M. E. Wolf and M. S. Lam. A Data Locality Optimizing Algorithm. In PLDI, pages 30-44, 1991.
    • (1991) PLDI, pp. 30-44
    • Wolf, M.E.; Lam, M.S.
  • 55
    • 0030379246
    • Combining loop transformations considering caches and scheduling
    • Wolf et al. Combining Loop Transformations Considering Caches and Scheduling. In MICRO, pages 274-286, 1996.
    • (1996) MICRO, pp. 274-286
    • Wolf


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.