SCOPUS Information Retrieval Platform

2009 11th IEEE International Conference on High Performance Computing and Communications, HPCC 2009

2009, pages 188-195

Balancing locality and parallelism on shared-cache multi-core systems

Authors (2): Cade, Michael Jason (a); Qasem, Apan (a)

(a) Texas State University (United States)

Author keywords

Memory hierarchy optimization; Parallelism; Performance tuning; Shared cache

Indexed keywords

CONCURRENT THREADS; CORE SYSTEMS; DATA LOCALITY; DATA REUSE; HARDWARE FEATURES; MEMORY HIERARCHY OPTIMIZATION; MULTI-CORE SYSTEMS; MULTICORE ARCHITECTURES; NEW OPPORTUNITIES; NUMBER OF THREADS; PARALLELISM; PERFORMANCE CAPABILITY; PERFORMANCE ENHANCING; PERFORMANCE OPTIMIZATIONS; PERFORMANCE POTENTIALS; PERFORMANCE TUNING; PROCESSING UNITS; SHARED-CACHE; STATE OF THE ART; THREAD LEVEL PARALLELISM;

COMPUTER SCIENCE; MICROPROCESSOR CHIPS; PROFITABILITY; SOFTWARE ARCHITECTURE; SYNCHRONIZATION; TUNING; WINDOWS;

CACHE MEMORY;

EID: 70449562109; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/HPCC.2009.61; Document Type: Conference Paper

Times cited: 9

References (24)

1. L. A. Barroso, "The price of performance," Queue, vol. 3, no. 7, pp. 48-53, 2005.

2. A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, "Parallel tiled QR factorization for multicore architectures," Lecture Notes in Computer Science, vol. 4967, p. 639, 2008.

3. S. Carr and K. Kennedy, "Improving the ratio of memory operations to floating-point operations in loops," ACM Trans. Program. Lang. Syst., vol. 16, no. 6, pp. 1768-1810, 1994.

4. E. Daylight, D. Atienza, A. Vandecappelle, F. Catthoor, and J. Mendias, "Memory-access-aware data structure transformations for embedded software with dynamic data accesses," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 3, pp. 269-280, March 2004.

5. C. Ding and K. Kennedy, "Improving effective bandwidth through compiler enhancement of global cache reuse," in International Parallel and Distributed Processing Symposium, vol. 1, p. 10038b, 2001.

6. J. Dongarra, D. Gannon, G. Fox, and K. Kennedy, "The impact of multicore on computational science software," CTWatch Quarterly, vol. 3, pp. 3-10, 2007.

7. J. Dongarra, K. London, S. Moore, P. Mucci, and D. Terpstra, "Using PAPI for hardware performance monitoring on Linux systems," in Conference on Linux Clusters: The HPC Revolution, 2001.

8. D. Greer, "Chip makers turn to multicore processors," IEEE Computer, vol. 38, no. 5, pp. 11-13, 2005.

9. J. Gummaraju, J. Coburn, Y. Turner, and M. Rosenblum, "Streamware: Programming general-purpose multicore processors using streams," in ASPLOS XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA: ACM, 2008, pp. 297-307.

10. J. Huh, D. Burger, and S. Keckler, "Exploring the design space of future CMPs," in Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pp. 199-210, 2001.

11. A. Jaleel, M. Mattina, and B. Jacob, "Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads," in HPCA '07: Proceedings of the 12th International Symposium on High Performance Computer Architecture (HPCA 2007), 2007.

12. M. Kandemir, "Data locality enhancement for CMPs," in ICCAD '07: Proceedings of the 2007 IEEE/ACM International Conference on Computer-Aided Design. Piscataway, NJ, USA: IEEE Press, 2007, pp. 155-159.

13. M. S. Lam and M. E. Wolf, "A data locality optimizing algorithm," SIGPLAN Not., vol. 39, no. 4, pp. 442-459, 2004.

14. J. Laudon, "Performance/watt: The new server focus," SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 5-13, 2005.

15. F. Li and M. Kandemir, "Locality-conscious workload assignment for array-based computations in MPSoC architectures," in DAC '05: Proceedings of the 42nd Annual Conference on Design Automation. New York, NY, USA: ACM, 2005, pp. 95-100.

16. C. Liu, A. Sivasubramaniam, and M. Kandemir, "Organizing the last line of defense before hitting the memory wall for CMPs," in Proceedings of the 10th International Symposium on High Performance Computer Architecture. Washington, DC, USA: IEEE Computer Society, 2004, p. 176.

17. K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The case for a single-chip multiprocessor," SIGPLAN Not., vol. 31, no. 9, pp. 2-11, 1996.

18. B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner, "POWER5 system microarchitecture," IBM Journal of Research and Development, vol. 49, no. 4-5, pp. 505-522, 2005.

19. M. Stan and W. Burleson, "Bus-invert coding for low-power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 1, pp. 49-58, 1995.

20. S. N. Vadlamani and S. F. Jenks, "The synchronized pipelined parallelism model," in The 16th IASTED International Conference on Parallel and Distributed Computing and Systems, 2004.

21. X. Vera, J. Abella, J. Llosa, and A. González, "An accurate cost model for guiding data locality transformations," ACM Trans. Program. Lang. Syst., vol. 27, no. 5, pp. 946-987, 2005.

22. D. Wang, B. Ganesh, N. Tuaycharoen, K. Baynes, A. Jaleel, and B. Jacob, "DRAMsim: A memory system simulator," SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 100-107, 2005.

23. S. Wuytack, F. Catthoor, L. Nachtergaele, and H. De Man, "Power exploration for data dominated video applications," in International Symposium on Low Power Electronics and Design, 1996, pp. 359-364.

24. L. Zhao, R. Iyer, R. Illikkal, J. Moses, S. Makineni, and D. Newell, "CacheScouts: Fine-grain monitoring of shared caches in CMP platforms," in PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007). Washington, DC, USA: IEEE Computer Society, 2007, pp. 339-352.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.