



Volume , Issue , 2004, Pages 326-335

Cluster prefetch: Tolerating on-chip wire delays in clustered microarchitectures

Author keywords

Clustered microarchitectures; Communication bound processors; Data prefetch; Distributed caches; Effective address and memory dependence prediction

Indexed keywords

CLUSTERED MICROARCHITECTURES; COMMUNICATION-BOUND PROCESSOR; DATA PREFETCH; DISTRIBUTED CACHES;

EID: 8344258257     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 4

References (38)
  • 1
    • 0033717865
    • Clock rate versus IPC: The end of the road for conventional microarchitectures
    • June
    • V. Agarwal, M. Hrishikesh, S. Keckler, and D. Burger. Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures. In Proceedings of ISCA-27, pages 248-259, June 2000.
    • (2000) Proceedings of ISCA-27 , pp. 248-259
    • Agarwal, V.1    Hrishikesh, M.2    Keckler, S.3    Burger, D.4
  • 2
    • 84962184017
    • An empirical study of the scalability aspects of instruction distribution algorithms for clustered processors
    • A. Aggarwal and M. Franklin. An Empirical Study of the Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors. In Proceedings of ISPASS, 2001.
    • (2001) Proceedings of ISPASS
    • Aggarwal, A.1    Franklin, M.2
  • 3
    • 33745778388
    • Hierarchical interconnects for on-chip clustering
    • April
    • A. Aggarwal and M. Franklin. Hierarchical Interconnects for On-Chip Clustering. In Proceedings of IPDPS, April 2002.
    • (2002) Proceedings of IPDPS
    • Aggarwal, A.1    Franklin, M.2
  • 5
    • 0038346226
    • Dynamically managing the communication-parallelism trade-off in future clustered processors
    • June
    • R. Balasubramonian, S. Dwarkadas, and D. Albonesi. Dynamically Managing the Communication-Parallelism Trade-Off in Future Clustered Processors. In Proceedings of ISCA-30, pages 275-286, June 2003.
    • (2003) Proceedings of ISCA-30 , pp. 275-286
    • Balasubramonian, R.1    Dwarkadas, S.2    Albonesi, D.3
  • 6
    • 0034462014
    • Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
    • December
    • A. Baniasadi and A. Moshovos. Instruction Distribution Heuristics for Quad-Cluster, Dynamically-Scheduled, Superscalar Processors. In Proceedings of MICRO-33, pages 337-347, December 2000.
    • (2000) Proceedings of MICRO-33 , pp. 337-347
    • Baniasadi, A.1    Moshovos, A.2
  • 10
    • 0003465202
    • The simplescalar toolset, version 2.0
    • University of Wisconsin-Madison, June
    • D. Burger and T. Austin. The Simplescalar Toolset, Version 2.0. Technical Report TR-97-1342, University of Wisconsin-Madison, June 1997.
    • (1997) Technical Report TR-97-1342
    • Burger, D.1    Austin, T.2
  • 12
    • 0029308368
    • Effective hardware based data prefetching for high performance processors
    • May
    • T. Chen and J. Baer. Effective Hardware Based Data Prefetching for High Performance Processors. IEEE Transactions on Computers, 44(5):609-623, May 1995.
    • (1995) IEEE Transactions on Computers , vol.44 , Issue.5 , pp. 609-623
    • Chen, T.1    Baer, J.2
  • 13
    • 0031594025
    • Memory dependence prediction using store sets
    • June
    • G. Chrysos and J. Emer. Memory Dependence Prediction Using Store Sets. In Proceedings of ISCA-25, June 1998.
    • (1998) Proceedings of ISCA-25
    • Chrysos, G.1    Emer, J.2
  • 14
    • 0031374601
    • The multicluster architecture: Reducing cycle time through partitioning
    • December
    • K. Farkas, P. Chow, N. Jouppi, and Z. Vranesic. The Multicluster Architecture: Reducing Cycle Time through Partitioning. In Proceedings of MICRO-30, pages 149-159, December 1997.
    • (1997) Proceedings of MICRO-30 , pp. 149-159
    • Farkas, K.1    Chow, P.2    Jouppi, N.3    Vranesic, Z.4
  • 15
    • 57649085955
    • Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
    • November
    • E. Gibert, J. Sanchez, and A. Gonzalez. Effective Instruction Scheduling Techniques for an Interleaved Cache Clustered VLIW Processor. In Proceedings of MICRO-35, pages 123-133, November 2002.
    • (2002) Proceedings of MICRO-35 , pp. 123-133
    • Gibert, E.1    Sanchez, J.2    Gonzalez, A.3
  • 16
    • 84944397775
    • Flexible compiler-managed L0 buffers for clustered VLIW processors
    • December
    • E. Gibert, J. Sanchez, and A. Gonzalez. Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors. In Proceedings of MICRO-36, December 2003.
    • (2003) Proceedings of MICRO-36
    • Gibert, E.1    Sanchez, J.2    Gonzalez, A.3
  • 17
    • 0030721866
    • Speculative execution via address prediction and data prefetching
    • July
    • J. Gonzalez and A. Gonzalez. Speculative Execution via Address Prediction and Data Prefetching. In Proceedings of the 11th ICS, pages 196-203, July 1997.
    • (1997) Proceedings of the 11th ICS , pp. 196-203
    • Gonzalez, J.1    Gonzalez, A.2
  • 20
    • 0026865602
    • Processor coupling: Integrating compile time and runtime scheduling for parallelism
    • May
    • S. Keckler and W. Dally. Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism. In Proceedings of ISCA-19, pages 202-213, May 1992.
    • (1992) Proceedings of ISCA-19 , pp. 202-213
    • Keckler, S.1    Dally, W.2
  • 21
    • 0032639289
    • The alpha 21264 microprocessor
    • March/April
    • R. Kessler. The Alpha 21264 Microprocessor. IEEE Micro, 19(2):24-36, March/April 1999.
    • (1999) IEEE Micro , vol.19 , Issue.2 , pp. 24-36
    • Kessler, R.1
  • 27
  • 28
    • 0034461108
    • Reducing wire delay penalty through value prediction
    • December
    • J.-M. Parcerisa and A. Gonzalez. Reducing Wire Delay Penalty through Value Prediction. In Proceedings of MICRO-33, pages 317-326, December 2000.
    • (2000) Proceedings of MICRO-33 , pp. 317-326
    • Parcerisa, J.-M.1    Gonzalez, A.2
  • 30
    • 1142280992
    • Partitioned first-level cache design for clustered microarchitectures
    • June
    • P. Racunas and Y. Patt. Partitioned First-Level Cache Design for Clustered Microarchitectures. In Proceedings of ICS-17, June 2003.
    • (2003) Proceedings of ICS-17
    • Racunas, P.1    Patt, Y.2
  • 31
    • 0031605773
    • An empirical study of decentralized ILP execution models
    • October
    • N. Ranganathan and M. Franklin. An Empirical Study of Decentralized ILP Execution Models. In Proceedings of ASPLOS-VIII, pages 272-281, October 1998.
    • (1998) Proceedings of ASPLOS-VIII , pp. 272-281
    • Ranganathan, N.1    Franklin, M.2
  • 32
    • 0032315195
    • Predictive techniques for aggressive load speculation
    • December
    • G. Reinman and B. Calder. Predictive Techniques for Aggressive Load Speculation. In Proceedings of MICRO-31, December 1998.
    • (1998) Proceedings of MICRO-31
    • Reinman, G.1    Calder, B.2
  • 33
    • 0034459218
    • Modulo scheduling for a fully-distributed clustered VLIW architecture
    • December
    • J. Sanchez and A. Gonzalez. Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture. In Proceedings of MICRO-33, pages 124-133, December 2000.
    • (2000) Proceedings of MICRO-33 , pp. 124-133
    • Sanchez, J.1    Gonzalez, A.2
  • 34
    • 0030409867
    • The performance potential of data dependence speculation and collapsing
    • Dec
    • Y. Sazeides, S. Vassiliadis, and J. Smith. The Performance Potential of Data Dependence Speculation and Collapsing. In Proceedings of MICRO-29, pages 238-247, Dec 1996.
    • (1996) Proceedings of MICRO-29 , pp. 238-247
    • Sazeides, Y.1    Vassiliadis, S.2    Smith, J.3
  • 35
    • 0003450887
    • CACTI 3.0: An integrated cache timing, power, and area model
    • Compaq Western Research Laboratory, August
    • P. Shivakumar and N. P. Jouppi. CACTI 3.0: An Integrated Cache Timing, Power, and Area Model. Technical Report TN-2001/2, Compaq Western Research Laboratory, August 2001.
    • (2001) Technical Report TN-2001/2
    • Shivakumar, P.1    Jouppi, N.P.2
  • 37
    • 0034817930
    • Dynamic prediction of critical path instructions
    • January
    • E. Tune, D. Liang, D. Tullsen, and B. Calder. Dynamic Prediction of Critical Path Instructions. In Proceedings of HPCA-7, pages 185-196, January 2001.
    • (2001) Proceedings of HPCA-7 , pp. 185-196
    • Tune, E.1    Liang, D.2    Tullsen, D.3    Calder, B.4
  • 38
    • 0035273395
    • Inherently lower-power high-performance superscalar architectures
    • March
    • V. Zyuban and P. Kogge. Inherently Lower-Power High-Performance Superscalar Architectures. IEEE Transactions on Computers, March 2001.
    • (2001) IEEE Transactions on Computers
    • Zyuban, V.1    Kogge, P.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.