SCOPUS Information Retrieval Platform

Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors

Volume , Issue , 2009, Pages 282-288

Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling

(2) Meng, Jiayuan a Skadron, Kevin a

a University of Virginia (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BASELINE CONFIGURATIONS; CACHE ORGANIZATION; CONFLICT MISS; DATA PARALLEL; DATA SHARING; DIRECTORY PROTOCOL; HIGH BANDWIDTH; MANY-CORE; MEMORY ALLOCATORS; PRIVATE DATA; SHARED DIRECTORIES; STACK RANDOMIZATION;

INTERCONNECTION NETWORKS;

EID: 77950987305 PISSN: 10636404 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCD.2009.5413143 Document Type: Conference Paper

Times cited : (27)

References (36)

1
- 33746683732
- Maximizing CMP throughput with mediocre cores
- J. D. Davis, J. Laudon, and K. Olukotun, "Maximizing CMP throughput with mediocre cores," in PACT, 2005, pp. 51-62.
- (2005) PACT , pp. 51-62
- Davis, J.D.¹ Laudon, J.² Olukotun, K.³

2
- 20344374162
- Niagara: A 32-way multithreaded Sparc processor
- P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: a 32-way multithreaded Sparc processor," IEEE Micro, vol.25, no.2, pp. 21-29, 2005.
- (2005) IEEE Micro , vol.25 , Issue.2 , pp. 21-29
- Kongetira, P.¹ Aingaran, K.² Olukotun, K.³

3
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: a many-core x86 architecture for visual computing," ACM TOG, vol.27, no.3, pp. 1-15, 2008.
- (2008) ACM TOG , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

4
- 0028201665
- Tradeoffs in two-level on-chip caching
- Apr.
- N. P. Jouppi and S. J. E. Wilton, "Tradeoffs in two-level on-chip caching," in ISCA, Apr. 1994, pp. 34-45.
- (1994) ISCA , pp. 34-45
- Jouppi, N.P.¹ Wilton, S.J.E.²

5
- 70449695861
- Non-inclusion property in multi-level caches revisited
- M. Zahran, K. Albayraktaroglu, and M. Franklin, "Non-inclusion property in multi-level caches revisited," in Int'l J. Computers and their Applications, no.2, 2007, pp. 99-108.
- (2007) Int'l J. Computers and Their Applications , Issue.2 , pp. 99-108
- Zahran, M.¹ Albayraktaroglu, K.² Franklin, M.³

6
- 34547282756
- Reducing verification complexity of a multicore coherence protocol using assume/guarantee
- X. Chen, Y. Yang, G. Gopalakrishnan, and C.-T. Chou, "Reducing verification complexity of a multicore coherence protocol using assume/guarantee," in FMCAD, 2006, pp. 81-88.
- (2006) FMCAD , pp. 81-88
- Chen, X.¹ Yang, Y.² Gopalakrishnan, G.³ Chou, C.-T.⁴

7
- 70449131864
- Corey: An operating system for many cores
- December
- S. B. Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang, "Corey: An operating system for many cores," in OSDI, December 2008.
- (2008) OSDI
- Wickizer, S.B.¹ Chen, H.² Chen, R.³ Mao, Y.⁴ Kaashoek, F.⁵ Morris, R.⁶ Pesterev, A.⁷ Stein, L.⁸ Wu, M.⁹ Dai, Y.¹⁰ Zhang, Y.¹¹ Zhang, Z.¹²

8
- 35448932427
- Intel Corporation
- "Intel threading building blocks," Intel Corporation.
- Intel Threading Building Blocks

9
- 34247273005
- Scalable locality-conscious multithreaded memory allocation
- S. Schneider, C. D. Antonopoulos, and D. S. Nikolopoulos, "Scalable locality-conscious multithreaded memory allocation," in ISMM, 2006, pp. 84-94.
- (2006) ISMM , pp. 84-94
- Schneider, S.¹ Antonopoulos, C.D.² Nikolopoulos, D.S.³

10
- 17544362263
- Hoard: A scalable memory allocator for multithreaded applications
- E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson, "Hoard: a scalable memory allocator for multithreaded applications," SIGPLAN Not, vol.35, no.11, pp. 117-128, 2000.
- (2000) SIGPLAN Not , vol.35 , Issue.11 , pp. 117-128
- Berger, E.D.¹ McKinley, K.S.² Blumofe, R.D.³ Wilson, P.R.⁴

11
- 84949769332
- A new memory monitoring scheme for memory-aware scheduling and partitioning
- G. E. Suh, S. Devadas, and L. Rudolph, "A new memory monitoring scheme for memory-aware scheduling and partitioning," in HPCA, 2002, p. 117.
- (2002) HPCA , pp. 117
- Suh, G.E.¹ Devadas, S.² Rudolph, L.³

12
- 0000444590
- Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors
- J. Torrellas, A. Tucker, and A. Gupta, "Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors," JPDC, vol.24, no.2, pp. 139-151, 1995.
- (1995) JPDC , vol.24 , Issue.2 , pp. 139-151
- Torrellas, J.¹ Tucker, A.² Gupta, A.³

13
- 0028754497
- Affinity scheduling of unbalanced workloads
- S. Subramaniam and D. L. Eager, "Affinity scheduling of unbalanced workloads," in SC, 1994, pp. 214-226.
- (1994) SC , pp. 214-226
- Subramaniam, S.¹ Eager, D.L.²

14
- 14844328033
- On the effectiveness of address-space randomization
- H. Shacham, E. jin Goh, N. Modadugu, B. Pfaff, and D. Boneh, "On the effectiveness of address-space randomization," in CCS, 2004, pp. 298-307.
- (2004) CCS , pp. 298-307
- Shacham, H.¹ Jin Goh, E.² Modadugu, N.³ Pfaff, B.⁴ Boneh, D.⁵

15
- 35348920021
- Adaptive insertion policies for high performance caching
- DOI 10.1145/1250662.1250709, ISCA'07: 34th Annual International Symposium on Computer Architecture, Conference Proceedings
- M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, "Adaptive insertion policies for high performance caching," ISCA, 2007, pp. 381-391. (Pubitemid 47582119)
- (2007) Proceedings - International Symposium on Computer Architecture , pp. 381-391
- Qureshi, M.K.¹ Jaleel, A.² Patt, Y.N.³ Steely Jr., S.C.⁴ Emer, J.⁵

16
- 0034592592
- Region-based caching: An energy-delay efficient memory architecture for embedded processors
- H. S. Lee and G. S. Tyson, "Region-based caching: an energy-delay efficient memory architecture for embedded processors," in CASES, 2000, pp. 120-127.
- (2000) CASES , pp. 120-127
- Lee, H.S.¹ Tyson, G.S.²

17
- 34548316872
- A novel technique to use scratch-pad memory for stack management
- DOI 10.1109/DATE.2007.364509, 4212019, Proceedings - 2007 Design, Automation and Test in Europe Conference and Exhibition, DATE 2007
- S. Park, H. woo Park, and S. Ha, "A novel technique to use scratch-pad memory for stack management," in DATE, 2007, pp. 1478-1483. (Pubitemid 47334172)
- (2007) Proceedings -Design, Automation and Test in Europe, DATE , pp. 1478-1483
- Park, S.¹ Park, H.-W.² Ha, S.³

18
- 23044524059
- On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems
- July
- P. R. Panda, N. D. Dutt, and A. Nicolau, "On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems," ACM TODAES, vol.5, no.3, pp. 682-704, July 2000.
- (2000) ACM TODAES , vol.5 , Issue.3 , pp. 682-704
- Panda, P.R.¹ Dutt, N.D.² Nicolau, A.³

19
- 77951001644
- A localizing directory coherence protocol
- C. McCurdy and C. Fischer, "A localizing directory coherence protocol," in WMPI, 2004, pp. 23-29.
- (2004) WMPI , pp. 23-29
- McCurdy, C.¹ Fischer, C.²

20
- 42549168687
- Exploring the cache design space for large scale CMPs
- L. Hsu, R. Iyer, S. Makineni, S. Reinhardt, and D. Newell, "Exploring the cache design space for large scale CMPs," dasCMP, vol.33, no.4, pp. 24-33, 2005.
- (2005) DasCMP , vol.33 , Issue.4 , pp. 24-33
- Hsu, L.¹ Iyer, R.² Makineni, S.³ Reinhardt, S.⁴ Newell, D.⁵

21
- 33845903561
- Cooperative caching for Chip Multiprocessors
- J. Chang and G. S. Sohi, "Cooperative caching for Chip Multiprocessors," in ISCA, 2006, pp. 264-276.
- (2006) ISCA , pp. 264-276
- Chang, J.¹ Sohi, G.S.²

22
- 27544495466
- Victim replication: Maximizing capacity while hiding wire delay in tiled Chip Multiprocessors
- M. Zhang and K. Asanovic, "Victim replication: Maximizing capacity while hiding wire delay in tiled Chip Multiprocessors," in ISCA, 2005, pp. 336-345.
- (2005) ISCA , pp. 336-345
- Zhang, M.¹ Asanovic, K.²

23
- 77950982560
- Victim migration: Dynamically adapting between private and shared CMP caches
- - "Victim migration: Dynamically adapting between private and shared CMP caches," in MIT Technical Report MIT-CSAIL-TR-2005-064, MIT-LCS-TR-1006, 2005.
- (2005) MIT Technical Report MIT-CSAIL-TR-2005-064, MIT-LCS-TR-1006

24
- 33846535493
- The M5 simulator: Modeling networked systems
- N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt, "The M5 simulator: Modeling networked systems," IEEE Micro, vol.26, no.4, 2006.
- (2006) IEEE Micro , vol.26 , Issue.4
- Binkert, N.L.¹ Dreslinski, R.G.² Hsu, L.R.³ Lim, K.T.⁴ Saidi, A.G.⁵ Reinhardt, S.K.⁶

25
- 70449699817
- N. Corporation, "GeForce GTX 280 Specifications," 2008.
- (2008) GeForce GTX 280 Specifications
- Corporation, N.¹

26
- 34547664408
- Cacti 4.0
- D. Tarjan, S. Thoziyoor, and N. P. Jouppi, "Cacti 4.0," HP Laboratories Palo Alto, Tech. Rep. HPL-2006-2086, 2006.
- (2006) HP Laboratories Palo Alto, Tech. Rep. HPL-2006-2086
- Tarjan, D.¹ Thoziyoor, S.² Jouppi, N.P.³

27
- 33845423872
- An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
- C. Kim, D. Burger, and S. W. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," ASPLOS, vol.36, no.5, 2002.
- (2002) ASPLOS , vol.36 , Issue.5
- Kim, C.¹ Burger, D.² Keckler, S.W.³

28
- 36849004429
- Bringing NoCs to 65 nm
- A. Pullini, F. Angiolini, S. Murali, D. Atienza, G. D. Micheli, and L. Benini, "Bringing NoCs to 65 nm," IEEE Micro, vol.27, no.5, 2007.
- (2007) IEEE Micro , vol.27 , Issue.5
- Pullini, A.¹ Angiolini, F.² Murali, S.³ Atienza, D.⁴ Micheli, G.D.⁵ Benini, L.⁶

29
- 0002255264
- SPLASH: Stanford parallel applications for shared memory
- Mar.
- J. P. Singh, W.-D. Weber, and A. Gupta, "SPLASH: Stanford parallel applications for shared memory," ISCA, vol.20, no.1, pp. 5-44, Mar. 1995.
- (1995) ISCA , vol.20 , Issue.1 , pp. 5-44
- Singh, J.P.¹ Weber, W.-D.² Gupta, A.³

30
- 51449118065
- A performance study of general purpose applications on graphics processors using CUDA
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, "A performance study of general purpose applications on graphics processors using CUDA," JPDC, 2008.
- (2008) JPDC
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

31
- 47349098275
- Minebench: A benchmark suite for data mining workloads
- Oct.
- R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary, "Minebench: A benchmark suite for data mining workloads," IISWC, pp. 182-188, Oct. 2006.
- (2006) IISWC , pp. 182-188
- Narayanan, R.¹ Ozisikyilmaz, B.² Zambreno, J.³ Memik, G.⁴ Choudhary, A.⁵

32
- 0004302191
- 3rd ed. Morgan Kaufmann
- J. L. Hennessy and D. A. Patterson, Computer Architecture - A Quantitative Approach, 3rd ed. Morgan Kaufmann, 2003.
- (2003) Computer Architecture - A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

33
- 77950964185
- "Reference guide: R700 family instruction set architecture," http://developer.amd.com/gpu-assets/R700FamilyJnstnjction.Set-Architecture.pdf. 2009.
- Reference Guide: R700 Family Instruction Set Architecture

34
- 77955733050
- NVIDIA Corporation
- "NVIDIA compute PTX: Parallel thread execution," NVIDIA Corporation, 2007.
- (2007) NVIDIA Compute PTX: Parallel Thread Execution

35
- 70349937457
- citeseer.ist.psu.edu/476450.html.October
- L. Dagum, "OpenMP: A proposed industry standard API for shared memory programming," citeseer.ist.psu.edu/476450.html.October 1997.
- (1997) OpenMP: A Proposed Industry Standard API for Shared Memory Programming
- Dagum, L.¹

36
- 42149168865
- NVIDIA Corporation
- "NVIDIA CUDA compute unified device architecture programming guide," NVIDIA Corporation, 2007.
- (2007) NVIDIA CUDA Compute Unified Device Architecture Programming Guide

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.