SCOPUS Information Retrieval Platform

IEEE Transactions on Computers

Volume 53, Issue 7, 2004, Pages 843-855

Design and optimization of large size and low overhead off-chip caches

(3) Zhang, Zhao a Zhu, Zhichun b Zhang, Xiaodong b

a Iowa State University (United States)

b IEEE (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SIMULATION; DYNAMIC RANDOM ACCESS STORAGE; MICROPROCESSOR CHIPS; MULTIPROCESSING SYSTEMS; OPTIMIZATION; STATIC RANDOM ACCESS STORAGE;

MEMORY HIERARCHY; MEMORY INTENSIVE APPLICATIONS; OFF CHIP CACHES;

CACHE MEMORY;

EID: 3242710575 PISSN: 00189340 EISSN: None Source Type: Journal
DOI: 10.1109/TC.2004.27 Document Type: Article

Times cited : (24)

References (51)

1
- 0034825713
- Performance of hardware compressed main memory
- B. Abali, H. Franke, X. Shen, D.E. Poff, and T.B. Smith, "Performance of Hardware Compressed Main Memory," Proc. Seventh Int'l Symp. High-Performance Computer Architecture, pp. 73-81, 2001.
- (2001) Proc. Seventh Int'l Symp. High-Performance Computer Architecture , pp. 73-81
- Abali, B.¹ Franke, H.² Shen, X.³ Poff, D.E.⁴ Smith, T.B.⁵

2
- 0034844454
- Data prefetching by dependence graph precomputation
- M.M. Annavaram, J.M. Patel, and E.S. Davidson, "Data Prefetching by Dependence Graph Precomputation," Proc. 28th Ann. Int'l Symp. Computer Architecture, pp. 52-61, 2001.
- (2001) Proc. 28th Ann. Int'l Symp. Computer Architecture , pp. 52-61
- Annavaram, M.M.¹ Patel, J.M.² Davidson, E.S.³

3
- 0023672138
- On the inclusion properties for multi-level cache hierarchies
- J.-L. Baer and W.-H. Wang, "On the Inclusion Properties for Multi-Level Cache Hierarchies," Proc. 15th Ann. Int'l Symp. Computer Architecture, pp. 73-80, 1988.
- (1988) Proc. 15th Ann. Int'l Symp. Computer Architecture , pp. 73-80
- Baer, J.-L.¹ Wang, W.-H.²

4
- 0034856729
- Dynamically allocating processor resources between nearby and distant ILP
- R. Balasubramonian, S. Dwarkadas, and D.H. Albonesi, "Dynamically Allocating Processor Resources between Nearby and Distant ILP," Proc. 28th Ann. Int'l Symp. Computer Architecture, pp. 26-37, 2001.
- (2001) Proc. 28th Ann. Int'l Symp. Computer Architecture , pp. 26-37
- Balasubramonian, R.¹ Dwarkadas, S.² Albonesi, D.H.³

5
- 3242671405
- Technical Report CS-TR-1997-1349, Univ. of Wisconsin, Madison, June
- D. Burger, "System-Level Implications of Processor-Memory Integration," Technical Report CS-TR-1997-1349, Univ. of Wisconsin, Madison, June 1997.
- (1997) System-Level Implications of Processor-Memory Integration
- Burger, D.¹

6
- 0031594024
- Multi-level texture caching for 3D graphics hardware
- M. Cox, N. Bhandari, and M. Shantz, "Multi-Level Texture Caching for 3D Graphics Hardware," Proc. 25th Ann. Int'l Symp. Computer Architecture, pp. 86-97, 1998.
- (1998) Proc. 25th Ann. Int'l Symp. Computer Architecture , pp. 86-97
- Cox, M.¹ Bhandari, N.² Shantz, M.³

7
- 0003662159
- San Mateo, Calif.: Morgan Kaufmann
- D. Culler, J.P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach. San Mateo, Calif.: Morgan Kaufmann, 1999.
- (1999) Parallel Computer Architecture: A Hardware/Software Approach
- Culler, D.¹ Singh, J.P.² Gupta, A.³

8
- 0032687058
- A performance comparison of contemporary DRAM architectures
- V. Cuppu, B. Jacob, B. Davis, and T. Mudge, "A Performance Comparison of Contemporary DRAM Architectures," Proc. 26th Ann. Int'l Symp. Computer Architecture, pp. 222-233, 1999.
- (1999) Proc. 26th Ann. Int'l Symp. Computer Architecture , pp. 222-233
- Cuppu, V.¹ Jacob, B.² Davis, B.³ Mudge, T.⁴

9
- 0030348712
- AlphaServer 4100 performance characterization
- Z. Cvetanovic and D.D. Donaldson, "AlphaServer 4100 Performance Characterization," Digital Technical J., vol. 8, no. 4, pp. 3-20, 1996.
- (1996) Digital Technical J. , vol.8 , Issue.4 , pp. 3-20
- Cvetanovic, Z.¹ Donaldson, D.D.²

10
- 0036374270
- The architecture of the DIVA processing-in-memory chip
- J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C.W. Kang, I. Kim, and G. Daglikoca, "The Architecture of the DIVA Processing-in-Memory Chip," Proc. 16th Int'l Conf. Supercomputing, pp. 14-25, 2002.
- (2002) Proc. 16th Int'l Conf. Supercomputing , pp. 14-25
- Draper, J.¹ Chame, J.² Hall, M.³ Steele, C.⁴ Barrett, T.⁵ LaCoss, J.⁶ Granacki, J.⁷ Shin, J.⁸ Chen, C.⁹ Kang, C.W.¹⁰ Kim, I.¹¹ Daglikoca, G.¹²

11
- 3242686851
- Enhanced Memory Systems Inc
- Enhanced Memory Systems Inc., 64 Mbit ESDRAM Components, Product Brief r1.8, 2000.
- (2000) 64 Mbit ESDRAM Components, Product Brief R1.8

12
- 78650753155
- Memory-intensive benchmarks: IRAM vs. cache-based machines
- B. Gaeke, P. Husbands, X. Li, L. Oliker, K. Yelick, and R. Biswas, "Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines," Proc. 16th Int'l Parallel and Distributed Processing Symp., pp. 30-30, 2002.
- (2002) Proc. 16th Int'l Parallel and Distributed Processing Symp. , pp. 30-30
- Gaeke, B.¹ Husbands, P.² Li, X.³ Oliker, L.⁴ Yelick, K.⁵ Biswas, R.⁶

13
- 0030677581
- The design and analysis of a cache architecture for texture mapping
- Z.S. Hakura and A. Gupta, "The Design and Analysis of a Cache Architecture for Texture Mapping," Proc. 24th Ann. Int'l Symp. Computer Architecture, pp. 108-120, 1997.
- (1997) Proc. 24th Ann. Int'l Symp. Computer Architecture , pp. 108-120
- Hakura, Z.S.¹ Gupta, A.²

14
- 0003997750
- CDRAM in a unified memory architecture
- C.A. Hart, "CDRAM in a Unified Memory Architecture," Proc. CompCon '94, pp. 261-266, 1994.
- (1994) Proc. CompCon '94 , pp. 261-266
- Hart, C.A.¹

15
- 0025419834
- The cache DRAM architecture: A DRAM with an on-chip cache memory
- Apr
- H. Hidaka, Y. Matsuda, M. Asakura, and K. Fujishima, "The Cache DRAM Architecture: A DRAM with an On-Chip Cache Memory," IEEE Micro, vol. 10, no. 2, pp. 14-25, Apr. 1990.
- (1990) IEEE Micro , vol.10 , Issue.2 , pp. 14-25
- Hidaka, H.¹ Matsuda, Y.² Asakura, M.³ Fujishima, K.⁴

16
- 0027191655
- Performance of cached DRAM organizations in vector supercomputers
- W.-C. Hsu and J.E. Smith, "Performance of Cached DRAM Organizations in Vector Supercomputers," Proc. 20th Ann. Int'l Symp. Computer Architecture, pp. 327-336, 1993.
- (1993) Proc. 20th Ann. Int'l Symp. Computer Architecture , pp. 327-336
- Hsu, W.-C.¹ Smith, J.E.²

17
- 0029666640
- DCD-Disk caching disk: A new approach for boosting I/O performance
- Y. Hu and Q. Yang, "DCD-Disk Caching Disk: A New Approach for Boosting I/O Performance," Proc. 23rd Ann. Int'l Symp. Computer Architecture, pp. 169-178, 1996.
- (1996) Proc. 23rd Ann. Int'l Symp. Computer Architecture , pp. 169-178
- Hu, Y.¹ Yang, Q.²

18
- 0003535436
- white paper, IBM, Oct
- "POWER4 System Architecture," white paper, IBM, Oct. 2001.
- (2001) POWER4 System Architecture

19
- 0026938770
- A new era of fast dynamic RAMs
- Oct
- F. Jones et al., "A New Era of Fast Dynamic RAMs," IEEE Spectrum, pp. 43-49, Oct. 1992.
- (1992) IEEE Spectrum , pp. 43-49
- Jones, F.¹

20
- 0025429331
- Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
- N.P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," Proc. 17th Ann. Int'l Symp. Computer Architecture, pp. 364-373, 1990.
- (1990) Proc. 17th Ann. Int'l Symp. Computer Architecture , pp. 364-373
- Jouppi, N.P.¹

21
- 84862448424
- Technical Report HPL-2000-53, HP Laboratories, Palo Alto, Calif., Apr
- P. Keltcher, S. Richardson, and S. Siu, "An Equal Area Comparison of Embedded DRAM and SRAM Memory Architectures for a Chip Multiprocessor," Technical Report HPL-2000-53, HP Laboratories, Palo Alto, Calif., Apr. 2000.
- (2000) An Equal Area Comparison of Embedded DRAM and SRAM Memory Architectures for a Chip Multiprocessor
- Keltcher, P.¹ Richardson, S.² Siu, S.³

22
- 84947250925
- Active memory: Micron's Yukon
- G. Kirsch, "Active Memory: Micron's Yukon," Proc. Int'l Parallel and Distributed Processing Symp., p. 89b, 2003.
- (2003) Proc. Int'l Parallel and Distributed Processing Symp.
- Kirsch, G.¹

23
- 0003985543
- WCDRAM: A fully associative integrated cached-DRAM with wide cache lines
- R.P. Koganti and G. Kedem, "WCDRAM: A Fully Associative Integrated Cached-DRAM with Wide Cache Lines," Proc. Fourth IEEE Workshop Architecture and Implementation of High Performance Comm. Systems, 1997.
- (1997) Proc. Fourth IEEE Workshop Architecture and Implementation of High Performance Comm. Systems
- Koganti, R.P.¹ Kedem, G.²

24
- 3242661621
- Technical Report CSD-99-1059, Univ. of California, Berkeley
- C. Kozyrakis, "A Media-Enhanced Vector Architecture for Embedded Memory Systems," Technical Report CSD-99-1059, Univ. of California, Berkeley, 1999.
- (1999) A Media-Enhanced Vector Architecture for Embedded Memory Systems
- Kozyrakis, C.¹

25
- 0005503448
- Vector IRAM: A media-enhanced vector processor with embedded DRAM
- C. Kozyrakis, J. Gebis, D. Martin, S. Williams, I. Mavroidis, S. Pope, D. Jones, and D. Patterson, "Vector IRAM: A Media-Enhanced Vector Processor with Embedded DRAM," Proc. Hot Chips 12, 2000.
- (2000) Proc. Hot Chips 12
- Kozyrakis, C.¹ Gebis, J.² Martin, D.³ Williams, S.⁴ Mavroidis, I.⁵ Pope, S.⁶ Jones, D.⁷ Patterson, D.⁸

26
- 0034461711
- Eager writeback - A technique for improving bandwidth utilization
- H.-H. Lee, G. Tyson, and M. Farrens, "Eager Writeback - A Technique for Improving Bandwidth Utilization," Proc. 33rd IEEE/ACM Int'l Symp. Microarchitecture, pp. 11-21, 2000.
- (2000) Proc. 33rd IEEE/ACM Int'l Symp. Microarchitecture , pp. 11-21
- Lee, H.-H.¹ Tyson, G.² Farrens, M.³

27
- 0034818343
- Reducing DRAM latencies with an integrated memory hierarchy design
- W. Lin, S. Reinhardt, and D. Burger, "Reducing DRAM Latencies with an Integrated Memory Hierarchy Design," Proc. Seventh Int'l Symp. High-Performance Computer Architecture, pp. 301-312, 2001.
- (2001) Proc. Seventh Int'l Symp. High-Performance Computer Architecture , pp. 301-312
- Lin, W.¹ Reinhardt, S.² Burger, D.³

28
- 0034839064
- Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
- C.-K. Luk, "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," Proc. 28th Ann. Int'l Symp. Computer Architecture, pp. 40-51, 2001.
- (2001) Proc. 28th Ann. Int'l Symp. Computer Architecture , pp. 40-51
- Luk, C.-K.¹

29
- 0028294834
- Evaluating stream buffers as a secondary cache replacement
- S. Palacharla and R.E. Kessler, "Evaluating Stream Buffers as a Secondary Cache Replacement," Proc. 21st Ann. Int'l Symp. Computer Architecture, pp. 24-33, 1994.
- (1994) Proc. 21st Ann. Int'l Symp. Computer Architecture , pp. 24-33
- Palacharla, S.¹ Kessler, R.E.²

30
- 0031096193
- A case for intelligent RAM
- Mar./Apr
- D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, "A Case for Intelligent RAM," IEEE Micro, pp. 34-44, Mar./Apr. 1997.
- (1997) IEEE Micro , pp. 34-44
- Patterson, D.¹ Anderson, T.² Cardwell, N.³ Fromm, R.⁴ Keeton, K.⁵ Kozyrakis, C.⁶ Thomas, R.⁷ Yelick, K.⁸

31
- 0033075272
- Functional implementation techniques for CPU cache memories
- Feb
- J.-K. Peir, W.W. Hsu, and A.J. Smith, "Functional Implementation Techniques for CPU Cache Memories," IEEE Trans. Computers, vol. 48, no. 2, pp. 100-110, Feb. 1999.
- (1999) IEEE Trans. Computers , vol.48 , Issue.2 , pp. 100-110
- Peir, J.-K.¹ Hsu, W.W.² Smith, A.J.³

32
- 0036375949
- Bloom filtering cache misses for accurate data speculation and prefetching
- J.-K. Peir, S.-C. Lai, S.-L. Lu, J. Stark, and K. Lai, "Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching," Proc. 16th Int'l Conf. Supercomputing (ICS-02), pp. 189-198, 2002.
- (2002) Proc. 16th Int'l Conf. Supercomputing (ICS-02) , pp. 189-198
- Peir, J.-K.¹ Lai, S.-C.² Lu, S.-L.³ Stark, J.⁴ Lai, K.⁵

33
- 0035101241
- The IA-64 Itanium processor cartridge
- Jan./Feb
- W.A. Samaras, N. Cherukuri, and S. Venkataraman, "The IA-64 Itanium Processor Cartridge," IEEE Micro, vol. 21, no. 1, pp. 82-89, Jan./Feb. 2001.
- (2001) IEEE Micro , vol.21 , Issue.1 , pp. 82-89
- Samaras, W.A.¹ Cherukuri, N.² Venkataraman, S.³

34
- 0029666645
- Missing the memory wall: The case for processor/memory integration
- A. Saulsbury, F. Pong, and A. Nowatzyk, "Missing the Memory Wall: The Case for Processor/Memory Integration," Proc. 23rd Ann. Int'l Symp. Computer Architecture, pp. 90-103, 1996.
- (1996) Proc. 23rd Ann. Int'l Symp. Computer Architecture , pp. 90-103
- Saulsbury, A.¹ Pong, F.² Nowatzyk, A.³

35
- 0028324009
- Decoupled sectored caches: Conciliating low tag implementation cost and low miss ratio
- A. Seznec, "Decoupled Sectored Caches: Conciliating Low Tag Implementation Cost and Low Miss Ratio," Proc. 21st Ann. Int'l Symp. Computer Architecture, pp. 384-393, 1994.
- (1994) Proc. 21st Ann. Int'l Symp. Computer Architecture , pp. 384-393
- Seznec, A.¹

36
- 0037340044
- A decoupled predictor-directed stream prefetching architecture
- Mar
- T. Sherwood and B. Calder, "A Decoupled Predictor-Directed Stream Prefetching Architecture," IEEE Trans. Computers, vol. 52, no. 5, Mar. 2003.
- (2003) IEEE Trans. Computers , vol.52 , Issue.5
- Sherwood, T.¹ Calder, B.²

37
- 0003450887
- technical report, COMPAQ Western Research Lab, Aug
- P. Shivakumar and N.P. Jouppi, "CACTI 3.0: An Integrated Cache Timing, Power, and Area Model," technical report, COMPAQ Western Research Lab, Aug. 2001.
- (2001) CACTI 3.0: An Integrated Cache Timing, Power, and Area Model
- Shivakumar, P.¹ Jouppi, N.P.²

38
- 0020564767
- A study of instruction cache organization and replacement policies
- J.E. Smith and J.R. Goodman, "A Study of Instruction Cache Organization and Replacement Policies," Proc. 10th Ann. Int'l Symp. Computer Architecture, pp. 132-137, 1983.
- (1983) Proc. 10th Ann. Int'l Symp. Computer Architecture , pp. 132-137
- Smith, J.E.¹ Goodman, J.R.²

39
- 3242726706
- Standard Performance Evaluation Corp., http://www.spec.org, 2004.
- (2004) Standard Performance Evaluation Corp.

40
- 0035272785
- Pinnacle: IBM MXT in a memory controller chip
- Mar/Apr
- R.B. Tremaine, T.B. Smith, M. Wazlowski, D. Har, K.-K. Mak, and S. Arramreddy, "Pinnacle: IBM MXT in a Memory Controller Chip," IEEE Micro, vol. 21, no. 2, pp. 56-68, Mar/Apr. 2001.
- (2001) IEEE Micro , vol.21 , Issue.2 , pp. 56-68
- Tremaine, R.B.¹ Smith, T.B.² Wazlowski, M.³ Har, D.⁴ Mak, K.-K.⁵ Arramreddy, S.⁶

41
- 0029508817
- A modified approach to data cache management
- G. Tyson, M. Farrens, J. Matthews, and A.R. Pleszkun, "A Modified Approach to Data Cache Management," Proc. 28th Ann. Int'l Symp. Microarchitecture, pp. 93-103, 1995.
- (1995) Proc. 28th Ann. Int'l Symp. Microarchitecture , pp. 93-103
- Tyson, G.¹ Farrens, M.² Matthews, J.³ Pleszkun, A.R.⁴

42
- 28444439498
- C. Weaver http://www.simplescalar.org/spec2000.html, SPEC2000 binaries, 2004.
- (2004) SPEC2000 Binaries
- Weaver, C.¹

43
- 0030681129
- Designing high bandwidth on-chip caches
- K.M. Wilson and K. Olukotun, "Designing High Bandwidth On-Chip Caches," Proc. 24th Ann. Int'l Symp. Computer Architecture, pp. 121-132, 1997.
- (1997) Proc. 24th Ann. Int'l Symp. Computer Architecture , pp. 121-132
- Wilson, K.M.¹ Olukotun, K.²

44
- 0003999721
- Technical Report UW CSE 97-03-04, Univ. of Washington, Feb
- W. Wong and J.-L. Baer, "DRAM On-Chip Caching," Technical Report UW CSE 97-03-04, Univ. of Washington, Feb. 1997.
- (1997) DRAM On-Chip Caching
- Wong, W.¹ Baer, J.-L.²

45
- 0034581198
- Modified LRU policies for improving second-level cache behavior
- W.A. Wong and J.-L. Baer, "Modified LRU Policies for Improving Second-Level Cache Behavior," Proc. Sixth Int'l Symp. High-Performance Computer Architecture, pp. 49-60, 2000.
- (2000) Proc. Sixth Int'l Symp. High-Performance Computer Architecture , pp. 49-60
- Wong, W.A.¹ Baer, J.-L.²

46
- 6344293529
- Technical Report CSL-TR-97-731, Computer Systems Laboratory, Stanford Univ., Aug
- T. Yamauchi, L. Hammond, and K. Olukotun, "A Single Chip Multiprocessor Integrated with High Density DRAM," Technical Report CSL-TR-97-731, Computer Systems Laboratory, Stanford Univ., Aug. 1997.
- (1997) A Single Chip Multiprocessor Integrated With High Density DRAM
- Yamauchi, T.¹ Hammond, L.² Olukotun, K.³

47
- 0026867221
- Alternative implementations of two-level adaptive branch prediction
- T.-Y. Yeh and Y.N. Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," Proc. 19th Ann. Int'l Symp. Computer Architecture, pp. 124-134, 1992.
- (1992) Proc. 19th Ann. Int'l Symp. Computer Architecture , pp. 124-134
- Yeh, T.-Y.¹ Patt, Y.N.²

48
- 0032651228
- Speculation techniques for improving load related instruction scheduling
- A. Yoaz, M. Erez, R. Ronen, and S. Jourdan, "Speculation Techniques for Improving Load Related Instruction Scheduling," Proc. 26th Ann. Int'l Symp. Computer Architecture, pp. 42-53, 1999.
- (1999) Proc. 26th Ann. Int'l Symp. Computer Architecture , pp. 42-53
- Yoaz, A.¹ Erez, M.² Ronen, R.³ Jourdan, S.⁴

49
- 0034460897
- A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality
- Z. Zhang, Z. Zhu, and X. Zhang, "A Permutation-Based Page Interleaving Scheme to Reduce Row-Buffer Conflicts and Exploit Data Locality," Proc. 33rd IEEE/ACM Int'l Symp. Microarchitecture, pp. 32-41, 2000.
- (2000) Proc. 33rd IEEE/ACM Int'l Symp. Microarchitecture , pp. 32-41
- Zhang, Z.¹ Zhu, Z.² Zhang, X.³

50
- 0035389657
- Cached DRAM: A simple and effective technique for memory access latency reduction on ILP processors
- July/Aug
- Z. Zhang, Z. Zhu, and X. Zhang, "Cached DRAM: A Simple and Effective Technique for Memory Access Latency Reduction on ILP Processors," IEEE Micro, vol. 21, no. 4, pp. 22-32, July/Aug. 2001.
- (2001) IEEE Micro , vol.21 , Issue.4 , pp. 22-32
- Zhang, Z.¹ Zhu, Z.² Zhang, X.³

51
- 84949752992
- Fine-grain priority scheduling on multi-channel memory systems
- Z. Zhu, Z. Zhang, and X. Zhang, "Fine-Grain Priority Scheduling on Multi-Channel Memory Systems," Proc. Eighth Int'l Symp. High-Performance Computer Architecture, pp. 107-116, 2002.
- (2002) Proc. Eighth Int'l Symp. High-Performance Computer Architecture , pp. 107-116
- Zhu, Z.¹ Zhang, Z.² Zhang, X.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.