SCOPUS Information Search Platform

ACM Transactions on Computer Systems

Volume 31, Issue 1, 2013, Pages

Efficient reuse distance analysis of multicore scaling for loop-based parallel programs

(2) Wu, Meng Ju (a); Yeung, Donald (a)

(a) University of Maryland (United States)

Author keywords

Cache performance; Chip multiprocessors; Reuse distance

Indexed keywords

APPLICATION PARAMETERS; CACHE PERFORMANCE; CHIP MULTIPROCESSOR; EXISTING PROBLEMS; MULTI-CORE PROCESSOR; PREDICTION ACCURACY; REUSE DISTANCE; SCALING PREDICTION;

COMPUTER ARCHITECTURE; MICROPROCESSOR CHIPS; PARALLEL ARCHITECTURES;

FORECASTING;

EID: 84874865302 PISSN: 07342071 EISSN: 15577333 Source Type: Journal
DOI: 10.1145/2427631.2427632 Document Type: Article

Times cited: (28)

References (40)

1
- 77949484043
- Tile processor: Embedded multicore for networking and multimedia
- Agarwal, A., Bao, L., Brown, J., Edwards, B., Mattina, M., Miao, C.-C., Ramey, C., and Wentzlaff, D. 2007. Tile processor: Embedded multicore for networking and multimedia. In Proceedings of the Symposium on High Performance Chips (Hot Chips).
- (2007) Proceedings of the Symposium on High Performance Chips (Hot Chips)
- Agarwal, A.¹ Bao, L.² Brown, J.³ Edwards, B.⁴ Mattina, M.⁵ Miao, C.-C.⁶ Ramey, C.⁷ Wentzlaff, D.⁸

2
- 70649107128
- A communication characterisation of splash-2 and parsec
- IEEE Computer Society
- Barrow-Williams, N., Fensch, C., and Moore, S. 2009. A communication characterisation of splash-2 and parsec. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 86-97.
- (2009) Proceedings of the IEEE International Symposium on Workload Characterization , pp. 86-97
- Barrow-Williams, N.¹ Fensch, C.² Moore, S.³

3
- 33750837706
- A statistical multiprocessor cache model
- 1620793, ISPASS 2006: IEEE International Symposium on Performance Analysis of Systems and Software, 2006
- Berg, E., Zeffer, H., and Hagersten, E. 2006. A statistical multiprocessor cache model. In Proceedings of the International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society, 89-99. (Pubitemid 44711113)
- (2006) ISPASS 2006: IEEE International Symposium on Performance Analysis of Systems and Software, 2006 , vol.2006 , pp. 89-99
- Berg, E.¹ Zeffer, H.² Hagersten, E.³

4
- 56449124998
- PARSEC vs. SPLASH2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors
- IEEE Computer Society
- Bienia, C., Kumar, S., and Li, K. 2008a. PARSEC vs. SPLASH2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 47-56.
- (2008) Proceedings of the IEEE International Symposium on Workload Characterization , pp. 47-56
- Bienia, C.¹ Kumar, S.² Li, K.³

5
- 63549095070
- The PARSEC benchmark suite: Characterization and architectural implications
- ACM
- Bienia, C., Kumar, S., Singh, J. P., and Li, K. 2008b. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72-81.
- (2008) Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques , pp. 72-81
- Bienia, C.¹ Kumar, S.² Singh, J.P.³ Li, K.⁴

6
- 33846535493
- The M5 simulator: Modeling networked systems
- DOI 10.1109/MM.2006.82
- Binkert, N., Dreslinski, R., Hsu, L., Lim, K., Saidi, A., and Reinhardt, S. 2006. The M5 simulator: Modeling networked systems. IEEE Micro 26, 4, 52-60. (Pubitemid 46504889)
- (2006) IEEE Micro , vol.26 , Issue.4 , pp. 52-60
- Binkert, N.L.¹ Dreslinski, R.G.² Hsu, L.R.³ Lim, K.T.⁴ Saidi, A.G.⁵ Reinhardt, S.K.⁶

7
- 21244474546
- Predicting inter-thread cache contention on a chip multi-processor architecture
- Proceedings - 11th International Symposium on High-Performance Computer Architecture, HPCA-11 2005
- Chandra, D., Guo, F., Kim, S., and Solihin, Y. 2005. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture. IEEE Computer Society, 340-351. (Pubitemid 41731513)
- (2005) Proceedings - International Symposium on High-Performance Computer Architecture , pp. 340-351
- Chandra, D.¹ Guo, F.² Kim, S.³ Solihin, Y.⁴

8
- 33746683732
- Maximizing CMP throughput with mediocre cores
- DOI 10.1109/PACT.2005.42, 1515580, 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
- Davis, J., Laudon, J., and Olukotun, K. 2005. Maximizing CMP throughput with mediocre cores. In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 51-62. (Pubitemid 44159727)
- (2005) Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT , vol.2005 , pp. 51-62
- Davis, J.D.¹ Laudon, J.² Olukotun, K.³

9
- 77954050277
- A composable model for analyzing locality of multi-threaded programs
- Ding, C. and Chilimbi, T. 2009. A composable model for analyzing locality of multi-threaded programs. Tech. rep. MSR-TR-2009-107, Microsoft Research.
- (2009) Tech. Rep. MSR-TR-2009-107, Microsoft Research
- Ding, C.¹ Chilimbi, T.²

10
- 0038716440
- Predicting whole-program locality through reuse distance analysis
- ACM
- Ding, C. and Zhong, Y. 2003. Predicting whole-program locality through reuse distance analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 245-257.
- (2003) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 245-257
- Ding, C.¹ Zhong, Y.²

11
- 67650312346
- A mechanistic performance model for superscalar out of order processors
- 3:1-3:37
- Eyerman, S., Eeckhout, L., and Karkhanis, T. 2009. A mechanistic performance model for superscalar out of order processors. ACM Trans. Comput. Syst. 27, 2, 3:1-3:37.
- (2009) ACM Trans. Comput. Syst. , vol.27 , Issue.2
- Eyerman, S.¹ Eeckhout, L.² Karkhanis, T.³

12
- 70350601187
- Reactive NUCA: Near-optimal block placement and replication in distributed caches
- ACM
- Hardavellas, N., Ferdman, M., Falsafi, B., and Ailamaki, A. 2009. Reactive NUCA: Near-Optimal block placement and replication in distributed caches. In Proceedings of the 36th International Symposium on Computer Architecture. ACM, 184-195.
- (2009) Proceedings of the 36th International Symposium on Computer Architecture , pp. 184-195
- Hardavellas, N.¹ Ferdman, M.² Falsafi, B.³ Ailamaki, A.⁴

13
- 77957912762
- Teraflop prototype processor with 80 cores
- Hoskote, Y., Vangal, S., Dighe, S., Borkar, N., and Borkar, S. 2007. Teraflop prototype processor with 80 Cores. In Proceedings of Symposium on High Performance Chips (Hot Chips).
- (2007) Proceedings of Symposium on High Performance Chips (Hot Chips)
- Hoskote, Y.¹ Vangal, S.² Dighe, S.³ Borkar, N.⁴ Borkar, S.⁵

14
- 42549168687
- Exploring the cache design space for large scale CMPs
- Hsu, L., Iyer, R., Makineni, S., Reinhardt, S., and Newell, D. 2005. Exploring the cache design space for large scale CMPs. SIGARCH Comput. Archit. News 33, 4, 24-33.
- (2005) SIGARCH Comput. Archit. News , vol.33 , Issue.4 , pp. 24-33
- Hsu, L.¹ Iyer, R.² Makineni, S.³ Reinhardt, S.⁴ Newell, D.⁵

15
- 0035187053
- Exploring the design space of future CMPs
- Huh, J., Burger, D., and Keckler, S. W. 2001. Exploring the design space of future CMPs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 199-210. (Pubitemid 33085437)
- (2001) Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT , pp. 199-210
- Huh, J.¹ Burger, D.² Keckler, S.W.³

16
- 32844471317
- A NUCA substrate for flexible CMP cache sharing
- ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
- Huh, J., Kim, C., Shafi, H., Zhang, L., Burger, D., and Keckler, S. W. 2005. A NUCA substrate for flexible CMP cache sharing. In Proceedings of the 19th International Conference on Supercomputing. ACM, 31-40. (Pubitemid 43251308)
- (2005) Proceedings of the International Conference on Supercomputing , pp. 31-40
- Huh, J.¹ Kim, C.² Shafi, H.³ Zhang, L.⁴ Burger, D.⁵ Keckler, S.W.⁶

17
- 77954998134
- High performance cache replacement using rereference interval prediction (RRIP)
- ACM
- Jaleel, A., Theobald, K. B., Steely Jr., S. C., and Emer, J. 2010. High performance cache replacement using rereference interval prediction (RRIP). In Proceedings of the 37th International Symposium on Computer Architecture. ACM, 60-71.
- (2010) Proceedings of the 37th International Symposium on Computer Architecture , pp. 60-71
- Jaleel, A.¹ Theobald, K.B.² Steely Jr., S.C.³ Emer, J.⁴

18
- 77951616746
- Is reuse distance applicable to data locality analysis on chip multiprocessors?
- Springer
- Jiang, Y., Zhang, E. Z., Tian, K., and Shen, X. 2010. Is reuse distance applicable to data locality analysis on chip multiprocessors? In Proceeding of the International Conference on Compiler Construction. Springer, 264-282.
- (2010) Proceeding of the International Conference on Compiler Construction , pp. 264-282
- Jiang, Y.¹ Zhang, E.Z.² Tian, K.³ Shen, X.⁴

19
- 4644299010
- A first order superscalar processor model
- ACM
- Karkhanis, T. S. and Smith, J. E. 2004. A first order superscalar processor model. In Proceedings of the 31st International Symposium on Computer Architecture. ACM, 338-349.
- (2004) Proceedings of the 31st International Symposium on Computer Architecture , pp. 338-349
- Karkhanis, T.S.¹ Smith, J.E.²

20
- 33744504467
- Power-performance implications of thread-level parallelism on chip multiprocessors
- DOI 10.1109/ISPASS.2005.1430567, 1430567, ISPASS 2005 - IEEE International Symposium on Performance Analysis of Systems and Software
- Li, J. and Martinez, J. F. 2005. Power-Performance implications of thread-level parallelism on chip multiprocessors. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society, 124-134. (Pubitemid 43804310)
- (2005) ISPASS 2005 - IEEE International Symposium on Performance Analysis of Systems and Software , vol.2005 , pp. 124-134
- Li, J.¹ Martinez, J.F.²

21
- 76749146060
- McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures
- ACM
- Li, S., Ahn, J. H., Strong, R. D., Brockman, J. B., Tullsen, D.M., and Jouppi, N. P. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd International Symposium on Microarchitecture. ACM, 469-480.
- (2009) Proceedings of the 42nd International Symposium on Microarchitecture , pp. 469-480
- Li, S.¹ Ahn, J.H.² Strong, R.D.³ Brockman, J.B.⁴ Tullsen, D.M.⁵ Jouppi, N.P.⁶

22
- 33748857902
- CMP design space exploration subject to physical constraints
- IEEE Computer Society
- Li, Y., Lee, B., Brooks, D., Hu, Z., and Skadron, K. 2006. CMP design space exploration subject to physical constraints. In Proceedings of the International Symposium on High-Performance Computer Architecture. IEEE Computer Society, 17-28.
- (2006) Proceedings of the International Symposium on High-Performance Computer Architecture , pp. 17-28
- Li, Y.¹ Lee, B.² Brooks, D.³ Hu, Z.⁴ Skadron, K.⁵

23
- 33745304805
- Pin: Building customized program analysis tools with dynamic instrumentation
- ACM
- Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood, K. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 190-200.
- (2005) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 190-200
- Luk, C.-K.¹ Cohn, R.² Muth, R.³ Patil, H.⁴ Klauser, A.⁵ Lowney, G.⁶ Wallace, S.⁷ Reddi, V.J.⁸ Hazelwood, K.⁹

24
- 0014701246
- Evaluation techniques for storage hierarchies
- Mattson, R. L., Gecsei, J., Slutz, D. R., and Traiger, I. L. 1970. Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 2, 78-117.
- (1970) IBM Syst. J. , vol.9 , Issue.2 , pp. 78-117
- Mattson, R.L.¹ Gecsei, J.² Slutz, D.R.³ Traiger, I.L.⁴

25
- 80051967684
- Using pin as a memory reference generator for multiprocessor simulation
- McCurdy, C. and Fischer, C. 2005. Using pin as a memory reference generator for multiprocessor simulation. SIGARCH Comput. Archit. News 33, 5, 39-44.
- (2005) SIGARCH Comput. Archit. News , vol.33 , Issue.5 , pp. 39-44
- McCurdy, C.¹ Fischer, C.²

26
- 47349098275
- MineBench: A benchmark suite for data mining workloads
- IEEE Computer Society
- Narayanan, R., Ozisikyilmaz, B., Zambreno, J., Memik, G., and Choudhary, A. 2006. MineBench: A benchmark suite for data mining workloads. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 182-188.
- (2006) Proceedings of the IEEE International Symposium on Workload Characterization , pp. 182-188
- Narayanan, R.¹ Ozisikyilmaz, B.² Zambreno, J.³ Memik, G.⁴ Choudhary, A.⁵

27
- 0028348615
- Exploring the design space for a shared-cache multiprocessor
- Nayfeh, B. A. and Olukotun, K. 1994. Exploring the design space for a shared-cache multiprocessor. In Proceedings of the 21st International Symposium on Computer Architecture. 166-175.
- (1994) Proceedings of the 21st International Symposium on Computer Architecture , pp. 166-175
- Nayfeh, B.A.¹ Olukotun, K.²

28
- 84856534005
- Evaluating a model for cache conflict miss prediction
- Qasem, A. and Kennedy, K. 2005. Evaluating a model for cache conflict miss prediction. Tech. rep. CS-TR05-457, Rice University.
- (2005) Tech. Rep. CS-TR05-457, Rice University
- Qasem, A.¹ Kennedy, K.²

29
- 70450285524
- Scaling the bandwidth wall: Challenges in and avenues for CMP scaling
- ACM
- Rogers, B., Krishna, A., Bell, G., Vu, K., Jiang, X., and Solihin, Y. 2009. Scaling the bandwidth wall: Challenges in and avenues for CMP scaling. In Proceedings of the 36th International Symposium on Computer Architecture. ACM, 371-382.
- (2009) Proceedings of the 36th International Symposium on Computer Architecture , pp. 371-382
- Rogers, B.¹ Krishna, A.² Bell, G.³ Vu, K.⁴ Jiang, X.⁵ Solihin, Y.⁶

30
- 78149247667
- Multicore aware reuse distance analysis
- Schuff, D. L., Parsons, B. S., and Pai, V. S. 2009. Multicore-Aware reuse distance analysis. Tech. rep. TR-ECE-09-08, Purdue University.
- (2009) Tech. Rep. TR-ECE-09-08, Purdue University
- Schuff, D.L.¹ Parsons, B.S.² Pai, V.S.³

31
- 78149254514
- Accelerating multicore reuse distance analysis with sampling and parallelization
- ACM
- Schuff, D. L., Kulkarni, M., and Pai, V. S. 2010. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques. ACM, 53-64.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 53-64
- Schuff, D.L.¹ Kulkarni, M.² Pai, V.S.³

32
- 0034826142
- Analytical cache models with applications to cache partitioning
- Suh, G. E., Devadas, S., and Rudolph, L. 2001. Analytical cache models with applications to cache partitioning. In Proceedings of the 15th International Conference on Supercomputing. ACM, 1-12. (Pubitemid 32865298)
- (2001) Proceedings of the International Conference on Supercomputing , pp. 1-12
- Edward Suh, G.¹ Devadas, S.² Rudolph, L.³

33
- 0029179077
- The SPLASH-2 Programs: Characterization and methodological considerations
- Woo, S. C., Ohara, M., Torrie, E., Singh, J. P., and Gupta, A. 1995. The SPLASH-2 Programs: Characterization and methodological considerations. In Proceedings of the 22nd International Symposium on Computer Architecture. ACM, 24-36.
- (1995) Proceedings of the 22nd International Symposium on Computer Architecture , pp. 24-36
- Woo, S.C.¹ Ohara, M.² Torrie, E.³ Singh, J.P.⁴ Gupta, A.⁵

34
- 84856557541
- Coherent profiles: Enabling efficient reuse distance analysis of multicore scaling for loop-based parallel programs
- IEEE Computer Society
- Wu, M.-J. and Yeung, D. 2011. Coherent profiles: Enabling efficient reuse distance analysis of multicore scaling for loop-based parallel programs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 264-275.
- (2011) Proceedings of the International Conference on Parallel Architectures and Compilation Techniques , pp. 264-275
- Wu, M.-J.¹ Yeung, D.²

35
- 84863053984
- Linear-time modeling of program working set in shared cache
- IEEE Computer Society
- Xiang, X., Bao, B., Ding, C., and Gao, Y. 2011. Linear-time modeling of program working set in shared cache. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 350-360.
- (2011) Proceedings of the International Conference on Parallel Architectures and Compilation Techniques , pp. 350-360
- Xiang, X.¹ Bao, B.² Ding, C.³ Gao, Y.⁴

36
- 27544495466
- Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
- Proceedings - 32nd International Symposium on Computer Architecture, ISCA 2005
- Zhang, M. and Asanovic, K. 2005. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In Proceedings of the 32nd International Symposium on Computer Architecture. IEEE Computer Society, 336-345. (Pubitemid 41543452)
- (2005) Proceedings - International Symposium on Computer Architecture , pp. 336-345
- Zhang, M.¹ Asanovic, K.²

37
- 52649176921
- Performance, area and bandwidth implications on large-scale CMP cache design
- Zhao, L., Iyer, R., Makineni, S., Moses, J., Illikkal, R., and Newell, D. 2007. Performance, area and bandwidth implications on large-scale CMP cache design. In Proceedings of the Workshop on Chip Multiprocessor Memory Systems and Interconnects.
- (2007) Proceedings of the Workshop on Chip Multiprocessor Memory Systems and Interconnects
- Zhao, L.¹ Iyer, R.² Makineni, S.³ Moses, J.⁴ Illikkal, R.⁵ Newell, D.⁶

38
- 57349160281
- Sampling-based program locality approximation
- ACM
- Zhong, Y. and Chang, W. 2008. Sampling-based program locality approximation. In Proceedings of the 7th International Symposium on Memory Management. ACM, 91-100.
- (2008) Proceedings of the 7th International Symposium on Memory Management , pp. 91-100
- Zhong, Y.¹ Chang, W.²

39
- 84968739606
- Miss rate prediction across all program inputs
- IEEE Computer Society
- Zhong, Y., Dropsho, S. G., and Ding, C. 2003. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 79-90.
- (2003) Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques , pp. 79-90
- Zhong, Y.¹ Dropsho, S.G.² Ding, C.³

40
- 70349743894
- Program locality analysis using reuse distance
- 20:1-20:39
- Zhong, Y., Shen, X., and Ding, C. 2009. Program locality analysis using reuse distance. ACM Trans. Program. Lang. Syst. 31, 6, 20:1-20:39.
- (2009) ACM Trans. Program. Lang. Syst. , vol.31 , Issue.6
- Zhong, Y.¹ Shen, X.² Ding, C.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.