



Volume 28, 2014, Pages 1-69

A primer on hardware prefetching

Author keywords

address correlated prefetching; branch directed prefetching; discontinuity prefetching; execution based prefetching; global history buffer; hardware prefetching; Markov prefetcher; next line prefetching; spatial memory streaming; stride prefetching; temporal memory streaming

Indexed keywords

COMPUTER ARCHITECTURE; DISTRIBUTED COMPUTER SYSTEMS; HARDWARE; MICROPROCESSOR CHIPS; WALLS (STRUCTURAL PARTITIONS); MEMORY ARCHITECTURE; RECONFIGURABLE HARDWARE;

EID: 84902193542     PISSN: 19353235     EISSN: 19353243     Source Type: Book Series    
DOI: 10.2200/S00581ED1V01Y201405CAC028     Document Type: Article
Times cited : (36)

References (131)
  • 1
    • 0003158656
    • Hitting the memory wall: Implications of the obvious
    • DOI: 10.1145/216585.216588.
    • W. A. Wulf and S. A. McKee. "Hitting the Memory Wall: Implications of the Obvious." ACM SIGARCH Computer Architecture News, v. 23, no. 1, 1995. DOI: 10.1145/216585.216588.
    • (1995) ACM SIGARCH Computer Architecture News , vol.23 , Issue.1
    • Wulf, W.A.1    McKee, S.A.2
  • 2
    • 84878619560
    • TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs
    • DOI: 10.1145/2445572.2445574.
    • D. Lustig, A. Bhattacharjee, and M. Martonosi. "TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs." ACM Transactions on Architecture and Code Optimization, v. 10, no. 1, 2013. DOI: 10.1145/2445572.2445574.
    • (2013) ACM Transactions on Architecture and Code Optimization , vol.10 , Issue.1
    • Lustig, D.1    Bhattacharjee, A.2    Martonosi, M.3
  • 4
    • 67651111633
    • The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It
    • DOI: 10.2200/S00201ED1V01Y200907CAC007.
    • B. Jacob. "The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It." Synthesis Lectures on Computer Architecture, v. 4, no. 1, 2009. DOI: 10.2200/S00201ED1V01Y200907CAC007.
    • (2009) Synthesis Lectures on Computer Architecture , vol.4 , Issue.1
    • Jacob, B.1
  • 5
    • 0018106484
    • Sequential Program Prefetching in Memory Hierarchies
    • DOI: 10.1109/C-M.1978.218016.
    • A. J. Smith. "Sequential Program Prefetching in Memory Hierarchies." Computer, v. 11, no. 12, 1978. DOI: 10.1109/C-M.1978.218016.
    • (1978) Computer , vol.11 , Issue.12
    • Smith, A.J.1
  • 6
    • 2842517957
    • The IBM System/360 Model 91: Machine Philosophy and Instruction-Handling
    • DOI: 10.1147/rd.111.0008.
    • D. W. Anderson, F. J. Sparacio, and R. M. Tomasulo. "The IBM System/360 Model 91: Machine Philosophy and Instruction-Handling." IBM Journal of Research and Development, v. 11, no. 1, 1967. DOI: 10.1147/rd.111.0008.
    • (1967) IBM Journal of Research and Development , vol.11 , Issue.1
    • Anderson, D.W.1    Sparacio, F.J.2    Tomasulo, R.M.3
  • 10
    • 34548767664
    • Enlarging Instruction Streams
    • DOI: 10.1109/TC.2007.70742.
    • O. J. Santana, A. Ramirez, and M. Valero. "Enlarging Instruction Streams." IEEE Transactions on Computers, v. 56, no. 10, 2007. DOI: 10.1109/TC.2007.70742.
    • (2007) IEEE Transactions on Computers , vol.56 , Issue.10
    • Santana, O.J.1    Ramirez, A.2    Valero, M.3
  • 13
    • 8344281427
    • Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors
    • DOI: 10.1142/S0129053399000065.
    • A. V. Veidenbaum, Q. Zhao, and A. Shameer. "Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors." International Journal of High Speed Computing, v. 10, no. 1, 1999. DOI: 10.1142/S0129053399000065.
    • (1999) International Journal of High Speed Computing , vol.10 , Issue.1
    • Veidenbaum, A.V.1    Zhao, Q.2    Shameer, A.3
  • 20
    • 2442585659
    • Call Graph Prefetching for Database Applications
    • DOI: 10.1145/945506.945509.
    • M. Annavaram, J. M. Patel, and E. S. Davidson. "Call Graph Prefetching for Database Applications." ACM Transactions on Computer Systems, v. 21, no. 4, 2003. DOI: 10.1145/945506.945509.
    • (2003) ACM Transactions on Computer Systems , vol.21 , Issue.4
    • Annavaram, M.1    Patel, J.M.2    Davidson, E.S.3
  • 23
    • 0032308865
    • Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors
    • DOI: 10.1109/MICRO.1998.742780.
    • C.-K. Luk and T. C. Mowry. "Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors." In Proc. of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, 1998. DOI: 10.1109/MICRO.1998.742780.
    • (1998) Proc. of the 31st Annual ACM/IEEE International Symposium on Microarchitecture
    • Luk, C.-K.1    Mowry, T.C.2
  • 29
    • 0026267802
    • An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty
    • DOI: 10.1145/125826.125932.
    • J.-L. Baer and T.-F. Chen. "An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty." In Proc. of Supercomputing, 1991. DOI: 10.1145/125826.125932.
    • (1991) Proc. of Supercomputing
    • Baer, J.-L.1    Chen, T.-F.2
  • 30
    • 0038702612
    • Effectiveness of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
    • DOI: 10.1109/HPCA.1995.386554.
    • F. Dahlgren and P. Stenstrom. "Effectiveness of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors." In Proc. of the 1st IEEE Symposium on High-Performance Computer Architecture, 1995. DOI: 10.1109/HPCA.1995.386554.
    • (1995) Proc. of the 1st IEEE Symposium on High-Performance Computer Architecture
    • Dahlgren, F.1    Stenstrom, P.2
  • 31
    • 79551718643
    • Access Map Pattern Matching for High Performance Data Cache Prefetch
    • Y. Ishii, M. Inaba, and K. Hiraki. "Access Map Pattern Matching for High Performance Data Cache Prefetch." Journal of Instruction-Level Parallelism, v. 13, 2011.
    • (2011) Journal of Instruction-Level Parallelism , vol.13
    • Ishii, Y.1    Inaba, M.2    Hiraki, K.3
  • 32
    • 0037340044
    • A Decoupled Predictor-Directed Stream Prefetching Architecture
    • DOI: 10.1109/TC.2003.1183943.
    • S. Sair, T. Sherwood, and B. Calder. "A Decoupled Predictor-Directed Stream Prefetching Architecture." IEEE Transactions on Computers, v. 52, no. 3, 2003. DOI: 10.1109/TC.2003.1183943.
    • (2003) IEEE Transactions on Computers , vol.52 , Issue.3
    • Sair, S.1    Sherwood, T.2    Calder, B.3
  • 34
    • 0025429331
    • Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
    • DOI: 10.1145/325164.325162.
    • N. P. Jouppi. "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers." In Proc. of the 17th Annual International Symposium on Computer Architecture, 1990. DOI: 10.1145/325164.325162.
    • (1990) Proc. of the 17th Annual International Symposium on Computer Architecture
    • Jouppi, N.P.1
  • 38
    • 0016930686
    • Dynamic Improvement of Locality in Virtual Memory Systems
    • DOI: 10.1109/TSE.1976.233801.
    • J.-L. Baer and G. R. Sager. "Dynamic Improvement of Locality in Virtual Memory Systems." IEEE Transactions on Software Engineering, v. 1, 1976. DOI: 10.1109/TSE.1976.233801.
    • (1976) IEEE Transactions on Software Engineering , vol.1
    • Baer, J.-L.1    Sager, G.R.2
  • 50
    • 0033075109
    • Prefetching Using Markov Predictors
    • DOI: 10.1109/12.752653.
    • D. Joseph and D. Grunwald. "Prefetching Using Markov Predictors." IEEE Transactions on Computers, v. 48, no. 2, 1999. DOI: 10.1109/12.752653.
    • (1999) IEEE Transactions on Computers , vol.48 , Issue.2
    • Joseph, D.1    Grunwald, D.2
  • 64
    • 0027149156
    • Modeling Live and Dead Lines in Cache Memory Systems
    • DOI: 10.1109/12.192209.
    • A. Mendelson, D. Thiebaut, and D. K. Pradhan. "Modeling Live and Dead Lines in Cache Memory Systems." IEEE Transactions on Computers, v. 42, no. 1. DOI: 10.1109/12.192209.
    • IEEE Transactions on Computers , vol.42 , Issue.1
    • Mendelson, A.1    Thiebaut, D.2    Pradhan, D.K.3
  • 70
    • 77949614728
    • Making Address-Correlated Prefetching Practical
    • DOI: 10.1109/MM.2010.21.
    • T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, and A. Moshovos. "Making Address-Correlated Prefetching Practical." IEEE Micro, v. 30, no. 1, 2010. DOI: 10.1109/MM.2010.21.
    • (2010) IEEE Micro , vol.30 , Issue.1
    • Wenisch, T.F.1    Ferdman, M.2    Ailamaki, A.3    Falsafi, B.4    Moshovos, A.5
  • 74
    • 79551697130
    • Storage Efficient Hardware Prefetching Using Delta Correlating Prediction Tables
    • M. Grannaes, M. Jahre, and L. Natvig. "Storage Efficient Hardware Prefetching Using Delta Correlating Prediction Tables." Journal of Instruction-Level Parallelism, v. 13, 2011.
    • (2011) Journal of Instruction-Level Parallelism , vol.13
    • Grannaes, M.1    Jahre, M.2    Natvig, L.3
  • 75
    • 79551702603
    • Combining Local and Global History for High Performance Data Prefetching
    • M. Dimitrov and H. Zhou. "Combining Local and Global History for High Performance Data Prefetching." Journal of Instruction-Level Parallelism, v. 13, 2011.
    • (2011) Journal of Instruction-Level Parallelism , vol.13
    • Dimitrov, M.1    Zhou, H.2
  • 78
    • 79551700079
    • Data Prefetching by Exploiting Global and Local Access Patterns
    • A. Sharif and H.-H. Lee. "Data Prefetching by Exploiting Global and Local Access Patterns." Journal of Instruction-Level Parallelism, v. 13, 2011.
    • (2011) Journal of Instruction-Level Parallelism , vol.13
    • Sharif, A.1    Lee, H.-H.2
  • 86
    • 0028324009
    • Decoupled Sectored Caches: Conciliating Low Tag Implementation Cost and Low Miss Ratio
    • DOI: 10.1145/191995.192072.
    • A. Seznec. "Decoupled Sectored Caches: Conciliating Low Tag Implementation Cost and Low Miss Ratio." In Proc. of the 21st Annual International Symposium on Computer Architecture, 1994. DOI: 10.1145/191995.192072.
    • (1994) Proc. of the 21st Annual International Symposium on Computer Architecture
    • Seznec, A.1
  • 96
    • 47349095223
    • Future Execution: A Prefetching Mechanism that Uses Multiple Cores to Speed Up Single Threads
    • DOI: 10.1145/1187976.1187979.
    • I. Ganusov and M. Burtscher. "Future Execution: A Prefetching Mechanism that Uses Multiple Cores to Speed Up Single Threads." ACM Transactions on Architecture and Code Optimization, v. 3, no. 4, 2006. DOI: 10.1145/1187976.1187979.
    • (2006) ACM Transactions on Architecture and Code Optimization , vol.3 , Issue.4
    • Ganusov, I.1    Burtscher, M.2
  • 97
    • 68849120952
    • Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems
    • DOI: 10.1109/TPDS.2008.224.
    • J. Lee, C. Jung, D. Lim, and Y. Solihin. "Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems." IEEE Transactions on Parallel and Distributed Systems, v. 20, no. 9, 2009. DOI: 10.1109/TPDS.2008.224.
    • (2009) IEEE Transactions on Parallel and Distributed Systems , vol.20 , Issue.9
    • Lee, J.1    Jung, C.2    Lim, D.3    Solihin, Y.4
  • 102
  • 104
    • 1342282617
    • Runahead Execution: An Effective Alternative to Large Instruction Windows
    • DOI: 10.1109/MM.2003.1261383.
    • O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. "Runahead Execution: An Effective Alternative to Large Instruction Windows." IEEE Micro, v. 23, no. 6, 2003. DOI: 10.1109/MM.2003.1261383.
    • (2003) IEEE Micro , vol.23 , Issue.6
    • Mutlu, O.1    Stark, J.2    Wilkerson, C.3    Patt, Y.N.4
  • 106
    • 33644903196
    • Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance
    • DOI: 10.1109/MM.2006.10.
    • O. Mutlu, H. Kim, and Y. N. Patt. "Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance." IEEE Micro, v. 26, no. 1, 2006. DOI: 10.1109/MM.2006.10.
    • (2006) IEEE Micro , vol.26 , Issue.1
    • Mutlu, O.1    Kim, H.2    Patt, Y.N.3
  • 109
    • 77951007282
    • Extending Data Prefetching to Cope with Context Switch Misses
    • DOI: 10.1109/ICCD.2009.5413144.
    • H. Cui and S. Sair. "Extending Data Prefetching to Cope with Context Switch Misses." In Proc. of the International Conference on Computer Design, 2009. DOI: 10.1109/ICCD.2009.5413144.
    • (2009) Proc. of the International Conference on Computer Design
    • Cui, H.1    Sair, S.2
  • 119
    • 79551699363
    • Efficient Prefetching with Hybrid Schemes and Use of Program Feedback to Adjust Prefetcher Aggressiveness
    • S. Verma, D. M. Koppelman, and L. Peng. "Efficient Prefetching with Hybrid Schemes and Use of Program Feedback to Adjust Prefetcher Aggressiveness." Journal of Instruction-Level Parallelism, v. 13, 2011.
    • (2011) Journal of Instruction-Level Parallelism , vol.13
    • Verma, S.1    Koppelman, D.M.2    Peng, L.3
  • 125
    • 0032650093
    • Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation
    • DOI: 10.1145/300979.300987.
    • C.-K. Luk and T. C. Mowry. "Memory Forwarding: Enabling Aggressive Layout Optimizations by Guaranteeing the Safety of Data Relocation." In Proc. of the 26th Annual International Symposium on Computer Architecture, 1999. DOI: 10.1145/300979.300987.
    • (1999) Proc. of the 26th Annual International Symposium on Computer Architecture
    • Luk, C.-K.1    Mowry, T.C.2
  • 131
    • 79961040286
    • Toward Dark Silicon in Servers
    • DOI: 10.1109/MM.2011.77.
    • N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. "Toward Dark Silicon in Servers." IEEE Micro, v. 31, no. 4, 2011. DOI: 10.1109/MM.2011.77.
    • (2011) IEEE Micro , vol.31 , Issue.4
    • Hardavellas, N.1    Ferdman, M.2    Falsafi, B.3    Ailamaki, A.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.