SCOPUS Information Search Platform

Parallel Computing

Volume 39, Issue 1, 2013, Pages 36-57

Accurate prediction of the behavior of multithreaded applications in shared caches

Authors (3): Andrade, Diego (a); Fraguela, Basilio B. (a); Doallo, Ramón (a)

(a) University of A Coruña, Spain

Author keywords

Analytical modeling; Loop parallelism; Optimization; Parallel applications; Performance prediction; Shared caches

Indexed keywords

ACCURATE PREDICTION; CACHE ACCESS; LOOP LEVEL; LOOP PARALLELISM; MODEL PREDICTION; MULTI-CORES; MULTI-THREADED APPLICATION; PARALLEL APPLICATION; PARALLEL EXECUTIONS; PARALLELIZATION STRATEGIES; PERFORMANCE GAIN; PERFORMANCE PREDICTION; SHARED CACHE; THEORETICAL PERFORMANCE

MODELS; OPTIMIZATION; PROGRAM COMPILERS

ANALYTICAL MODELS

EID: 84872916559 | Print ISSN: 0167-8191 | Electronic ISSN: none | Source type: Journal
DOI: 10.1016/j.parco.2012.11.003 | Document type: Article

Times cited: 9

References (28)

1
- EID: 84874544712
- R. Balasubramonian, N.P. Jouppi, N. Muralimanohar, Multi-Core Cache Hierarchies, Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2011.

2
- EID: 0037340135
- B.B. Fraguela, R. Doallo, E.L. Zapata, Probabilistic miss equations: evaluating memory hierarchy performance, IEEE Trans. Comput. 52 (3) (2003) 321-336.

3
- EID: 3042664555
- J. Xue, X. Vera, Efficient and accurate analytical modeling of whole-program data cache behavior, IEEE Trans. Comput. 53 (5) (2004) 547-566.

4
- EID: 1142268809
- C. Cascaval, D. Padua, Estimating cache misses and locality using stack distances, in: Proc. 17th Intl. Conf. on Supercomputing, 2003, pp. 150-159.

5
- EID: 1842635044
- X. Vera, N. Bermudo, J. Llosa, A. Gonzalez, A fast and accurate framework to analyze and optimize cache memory behavior, ACM Trans. Prog. Lang. Syst. 26 (2) (2004) 263-300.

6
- EID: 70449640953
- B.B. Fraguela, Y. Voronenko, M. Püschel, Automatic tuning of discrete Fourier transforms driven by analytical modeling, in: Intl. Conf. on Parallel Architectures and Compilation Techniques, 2009, pp. 271-280.

7
- EID: 70349161684
- OpenMP Architecture Review Board, OpenMP Application Program Interface, Version 3.0, 2008.

8
- EID: 1342264156
- B.B. Fraguela, R. Doallo, J. Touriño, E.L. Zapata, A compiler tool to predict memory hierarchy performance of scientific codes, Parallel Comput. 30 (2) (2004) 225-248.

9
- EID: 36148935583
- D. Andrade, M. Arenaz, B.B. Fraguela, J. Touriño, R. Doallo, Automated and accurate cache behavior analysis for codes with irregular access patterns, Concur. Comput. Pract. Exp. 19 (18) (2007) 2407-2423. DOI: 10.1002/cpe.1173

10
- EID: 70449690852
- B.B. Fraguela, M.G. Carmueja, D. Andrade, Optimal tile size selection guided by analytical models, in: Proc. of Parallel Computing, vol. 33, John von Neumann Institute for Computing (NIC), 2005, pp. 565-572.

11
- EID: 33644879118
- J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Prvulovic, L. Ceze, S. Sarangi, P. Sack, K. Strauss, P. Montesinos, SESC Simulator, 2005.

12
- EID: 77956506076
- Intel Corporation, Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 3A, System Programming Guide, Part 1, 2010.

13
- EID: 14944380098
- K. Beyls, E.H. D'Hollander, Generating cache hints for improved program efficiency, J. Syst. Archit. 51 (4) (2005) 223-250. DOI: 10.1016/j.sysarc.2004.09.004

14
- EID: 77954699826
- A. Chauhan, C.-Y. Shei, Static reuse distances for locality-based optimizations in MATLAB, in: Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, 2010, pp. 295-304.

15
- EID: 10444238444
- S. Kim, D. Chandra, Y. Solihin, Fair cache sharing and partitioning in a chip multiprocessor architecture, in: Proc. 13th Intl. Conf. on Parallel Architectures and Compilation Techniques, 2004, pp. 111-122.

16
- EID: 78650744021
- R. West, P. Zaroo, C.A. Waldspurger, X. Zhang, Online cache modeling for commodity multicore processors, SIGOPS Oper. Syst. Rev. 44 (2010) 19-29.

17
- EID: 21244474546
- D. Chandra, F. Guo, S. Kim, Y. Solihin, Predicting inter-thread cache contention on a chip multi-processor architecture, in: Proc. 11th Intl. Symp. on High-Performance Computer Architecture (HPCA-11), 2005, pp. 340-351.

18
- EID: 33746292710
- Y. Solihin, F. Guo, S. Kim, Predicting cache space contention in utility computing servers, Int. Parallel Distrib. Process. Symp. 11 (2005) 226b.

19
- EID: 77952568608
- C. Xu, X. Chen, R. Dick, Z. Mao, Cache contention and application performance prediction for multi-core systems, in: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2010, pp. 76-86.

20
- EID: 47249151449
- F. Song, S. Moore, J. Dongarra, L2 cache modeling for scientific applications on chip multi-processors, in: Intl. Conf. on Parallel Processing, 2007, p. 51.

21
- EID: 78149254514
- D.L. Schuff, M. Kulkarni, V.S. Pai, Accelerating multicore reuse distance analysis with sampling and parallelization, in: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, ACM, New York, NY, USA, 2010, pp. 53-64.

22
- EID: 76749137634
- M. Kandemir, S.P. Muralidhara, S.H.K. Narayanan, Y. Zhang, O. Ozturk, Optimizing shared cache behavior of chip multiprocessors, in: Proc. 42nd Intl. Symposium on Microarchitecture, 2009, pp. 505-516.

23
- EID: 79957454903
- J. Liu, Y. Zhang, W. Ding, M. Kandemir, On-chip cache hierarchy-aware tile scheduling for multicore machines, in: 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2011, pp. 161-170.

24
- EID: 67650069905
- M.M. Baskaran, N. Vydyanathan, U. Bondhugula, J. Ramanujam, A. Rountev, P. Sadayappan, Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors, in: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2009, 2009, pp. 219-228.

25
- EID: 77951616746
- Y. Jiang, E.Z. Zhang, K. Tian, X. Shen, Is reuse distance applicable to data locality analysis on chip multiprocessors?, in: Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, CC'10/ETAPS'10, 2010, pp. 264-282.

26
- EID: 70449652924
- Q. Lu, J. Lin, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Soft-OLP: improving hardware cache performance through software-controlled object-level partitioning, in: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques, PACT 2009, 2009, pp. 246-257.

27
- EID: 57749186047
- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan, Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems, in: IEEE 14th International Symposium on High Performance Computer Architecture, HPCA 2008, 2008, pp. 367-378.

28
- EID: 66749168716
- L. Soares, D. Tam, M. Stumm, Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer, in: 41st IEEE/ACM International Symposium on Microarchitecture, MICRO-41, 2008, pp. 258-269.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.