-
1
-
-
77949484043
-
Tile processor: Embedded multicore for networking and multimedia
-
Agarwal, A., Bao, L., Brown, J., Edwards, B., Mattina, M., Miao, C.-C., Ramey, C., and Wentzlaff, D. 2007. Tile processor: Embedded multicore for networking and multimedia. In Proceedings of the Symposium on High Performance Chips (Hot Chips).
-
(2007)
Proceedings of the Symposium on High Performance Chips (Hot Chips)
-
-
Agarwal, A.1
Bao, L.2
Brown, J.3
Edwards, B.4
Mattina, M.5
Miao, C.-C.6
Ramey, C.7
Wentzlaff, D.8
-
2
-
-
70649107128
-
A communication characterisation of splash-2 and parsec
-
IEEE Computer Society
-
Barrow-Williams, N., Fensch, C., and Moore, S. 2009. A communication characterisation of splash-2 and parsec. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 86-97.
-
(2009)
Proceedings of the IEEE International Symposium on Workload Characterization
, pp. 86-97
-
-
Barrow-Williams, N.1
Fensch, C.2
Moore, S.3
-
3
-
-
33750837706
-
A statistical multiprocessor cache model
-
1620793, ISPASS 2006: IEEE International Symposium on Performance Analysis of Systems and Software, 2006
-
Berg, E., Zeffer, H., and Hagersten, E. 2006. A statistical multiprocessor cache model. In Proceedings of the International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society, 89-99. (Pubitemid 44711113)
-
(2006)
ISPASS 2006: IEEE International Symposium on Performance Analysis of Systems and Software, 2006
, vol.2006
, pp. 89-99
-
-
Berg, E.1
Zeffer, H.2
Hagersten, E.3
-
4
-
-
56449124998
-
PARSEC vs. SPLASH2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors
-
IEEE Computer Society
-
Bienia, C., Kumar, S., and Li, K. 2008a. PARSEC vs. SPLASH2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 47-56.
-
(2008)
Proceedings of the IEEE International Symposium on Workload Characterization
, pp. 47-56
-
-
Bienia, C.1
Kumar, S.2
Li, K.3
-
5
-
-
63549095070
-
The PARSEC benchmark suite: Characterization and architectural implications
-
ACM
-
Bienia, C., Kumar, S., Singh, J. P., and Li, K. 2008b. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 72-81.
-
(2008)
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
, pp. 72-81
-
-
Bienia, C.1
Kumar, S.2
Singh, J.P.3
Li, K.4
-
6
-
-
33846535493
-
The M5 simulator: Modeling networked systems
-
DOI 10.1109/MM.2006.82
-
Binkert, N., Dreslinski, R., Hsu, L., Lim, K., Saidi, A., and Reinhardt, S. 2006. The M5 simulator: Modeling networked systems. IEEE Micro 26, 4, 52-60. (Pubitemid 46504889)
-
(2006)
IEEE Micro
, vol.26
, Issue.4
, pp. 52-60
-
-
Binkert, N.L.1
Dreslinski, R.G.2
Hsu, L.R.3
Lim, K.T.4
Saidi, A.G.5
Reinhardt, S.K.6
-
7
-
-
21244474546
-
Predicting inter-thread cache contention on a chip multi-processor architecture
-
Proceedings - 11th International Symposium on High-Performance Computer Architecture, HPCA-11 2005
-
Chandra, D., Guo, F., Kim, S., and Solihin, Y. 2005. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture. IEEE Computer Society, 340-351. (Pubitemid 41731513)
-
(2005)
Proceedings - International Symposium on High-Performance Computer Architecture
, pp. 340-351
-
-
Chandra, D.1
Guo, F.2
Kim, S.3
Solihin, Y.4
-
8
-
-
33746683732
-
Maximizing CMP throughput with mediocre cores
-
DOI 10.1109/PACT.2005.42, 1515580, 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
-
Davis, J., Laudon, J., and Olukotun, K. 2005. Maximizing CMP throughput with mediocre cores. In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 51-62. (Pubitemid 44159727)
-
(2005)
Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
, vol.2005
, pp. 51-62
-
-
Davis, J.D.1
Laudon, J.2
Olukotun, K.3
-
11
-
-
67650312346
-
A mechanistic performance model for superscalar out of order processors
-
3:1-3:37
-
Eyerman, S., Eeckhout, L., and Karkhanis, T. 2009. A mechanistic performance model for superscalar out of order processors. ACM Trans. Comput. Syst. 27, 2, 3:1-3:37.
-
(2009)
ACM Trans. Comput. Syst.
, vol.27
, Issue.2
-
-
Eyerman, S.1
Eeckhout, L.2
Karkhanis, T.3
-
12
-
-
70350601187
-
Reactive NUCA: Near-optimal block placement and replication in distributed caches
-
ACM
-
Hardavellas, N., Ferdman, M., Falsafi, B., and Ailamaki, A. 2009. Reactive NUCA: Near-Optimal block placement and replication in distributed caches. In Proceedings of the 36th International Symposium on Computer Architecture. ACM, 184-195.
-
(2009)
Proceedings of the 36th International Symposium on Computer Architecture
, pp. 184-195
-
-
Hardavellas, N.1
Ferdman, M.2
Falsafi, B.3
Ailamaki, A.4
-
13
-
-
77957912762
-
Teraflop prototype processor with 80 cores
-
Hoskote, Y., Vangal, S., Dighe, S., Borkar, N., and Borkar, S. 2007. Teraflop prototype processor with 80 Cores. In Proceedings of Symposium on High Performance Chips (Hot Chips).
-
(2007)
Proceedings of Symposium on High Performance Chips (Hot Chips)
-
-
Hoskote, Y.1
Vangal, S.2
Dighe, S.3
Borkar, N.4
Borkar, S.5
-
14
-
-
42549168687
-
Exploring the cache design space for large scale CMPs
-
Hsu, L., Iyer, R., Makineni, S., Reinhardt, S., and Newell, D. 2005. Exploring the cache design space for large scale CMPs. SIGARCH Comput. Archit. News 33, 4, 24-33.
-
(2005)
SIGARCH Comput. Archit. News
, vol.33
, Issue.4
, pp. 24-33
-
-
Hsu, L.1
Iyer, R.2
Makineni, S.3
Reinhardt, S.4
Newell, D.5
-
16
-
-
32844471317
-
A NUCA substrate for flexible CMP cache sharing
-
ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
-
Huh, J., Kim, C., Shafi, H., Zhang, L., Burger, D., and Keckler, S. W. 2005. A NUCA substrate for flexible CMP cache sharing. In Proceedings of the 19th International Conference on Supercomputing. ACM, 31-40. (Pubitemid 43251308)
-
(2005)
Proceedings of the International Conference on Supercomputing
, pp. 31-40
-
-
Huh, J.1
Kim, C.2
Shafi, H.3
Zhang, L.4
Burger, D.5
Keckler, S.W.6
-
17
-
-
77954998134
-
High performance cache replacement using rereference interval prediction (RRIP)
-
ACM
-
Jaleel, A., Theobald, K. B., Steely Jr., S. C., and Emer, J. 2010. High performance cache replacement using rereference interval prediction (RRIP). In Proceedings of the 37th International Symposium on Computer Architecture. ACM, 60-71.
-
(2010)
Proceedings of the 37th International Symposium on Computer Architecture
, pp. 60-71
-
-
Jaleel, A.1
Theobald, K.B.2
Steely Jr., S.C.3
Emer, J.4
-
18
-
-
77951616746
-
Is reuse distance applicable to data locality analysis on chip multiprocessors?
-
Springer
-
Jiang, Y., Zhang, E. Z., Tian, K., and Shen, X. 2010. Is reuse distance applicable to data locality analysis on chip multiprocessors? In Proceedings of the International Conference on Compiler Construction. Springer, 264-282.
-
(2010)
Proceedings of the International Conference on Compiler Construction
, pp. 264-282
-
-
Jiang, Y.1
Zhang, E.Z.2
Tian, K.3
Shen, X.4
-
20
-
-
33744504467
-
Power-performance implications of thread-level parallelism on chip multiprocessors
-
DOI 10.1109/ISPASS.2005.1430567, 1430567, ISPASS 2005 - IEEE International Symposium on Performance Analysis of Systems and Software
-
Li, J. and Martinez, J. F. 2005. Power-Performance implications of thread-level parallelism on chip multiprocessors. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society, 124-134. (Pubitemid 43804310)
-
(2005)
ISPASS 2005 - IEEE International Symposium on Performance Analysis of Systems and Software
, vol.2005
, pp. 124-134
-
-
Li, J.1
Martinez, J.F.2
-
21
-
-
76749146060
-
McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures
-
ACM
-
Li, S., Ahn, J. H., Strong, R. D., Brockman, J. B., Tullsen, D.M., and Jouppi, N. P. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd International Symposium on Microarchitecture. ACM, 469-480.
-
(2009)
Proceedings of the 42nd International Symposium on Microarchitecture
, pp. 469-480
-
-
Li, S.1
Ahn, J.H.2
Strong, R.D.3
Brockman, J.B.4
Tullsen, D.M.5
Jouppi, N.P.6
-
22
-
-
33748857902
-
CMP design space exploration subject to physical constraints
-
IEEE Computer Society
-
Li, Y., Lee, B., Brooks, D., Hu, Z., and Skadron, K. 2006. CMP design space exploration subject to physical constraints. In Proceedings of the International Symposium on High-Performance Computer Architecture. IEEE Computer Society, 17-28.
-
(2006)
Proceedings of the International Symposium on High-Performance Computer Architecture
, pp. 17-28
-
-
Li, Y.1
Lee, B.2
Brooks, D.3
Hu, Z.4
Skadron, K.5
-
23
-
-
33745304805
-
Pin: Building customized program analysis tools with dynamic instrumentation
-
ACM
-
Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood, K. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 190-200.
-
(2005)
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
, pp. 190-200
-
-
Luk, C.-K.1
Cohn, R.2
Muth, R.3
Patil, H.4
Klauser, A.5
Lowney, G.6
Wallace, S.7
Reddi, V.J.8
Hazelwood, K.9
-
24
-
-
0014701246
-
Evaluation techniques for storage hierarchies
-
Mattson, R. L., Gecsei, J., Slutz, D. R., and Traiger, I. L. 1970. Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 2, 78-117.
-
(1970)
IBM Syst. J.
, vol.9
, Issue.2
, pp. 78-117
-
-
Mattson, R.L.1
Gecsei, J.2
Slutz, D.R.3
Traiger, I.L.4
-
25
-
-
80051967684
-
Using pin as a memory reference generator for multiprocessor simulation
-
McCurdy, C. and Fischer, C. 2005. Using pin as a memory reference generator for multiprocessor simulation. SIGARCH Comput. Archit. News 33, 5, 39-44.
-
(2005)
SIGARCH Comput. Archit. News
, vol.33
, Issue.5
, pp. 39-44
-
-
McCurdy, C.1
Fischer, C.2
-
26
-
-
47349098275
-
MineBench: A benchmark suite for data mining workloads
-
IEEE Computer Society
-
Narayanan, R., Ozisikyilmaz, B., Zambreno, J., Memik, G., and Choudhary, A. 2006. MineBench: A benchmark suite for data mining workloads. In Proceedings of the IEEE International Symposium on Workload Characterization. IEEE Computer Society, 182-188.
-
(2006)
Proceedings of the IEEE International Symposium on Workload Characterization
, pp. 182-188
-
-
Narayanan, R.1
Ozisikyilmaz, B.2
Zambreno, J.3
Memik, G.4
Choudhary, A.5
-
29
-
-
70450285524
-
Scaling the bandwidth wall: Challenges in and avenues for CMP scaling
-
ACM
-
Rogers, B., Krishna, A., Bell, G., Vu, K., Jiang, X., and Solihin, Y. 2009. Scaling the bandwidth wall: Challenges in and avenues for CMP scaling. In Proceedings of the 36th International Symposium on Computer Architecture. ACM, 371-382.
-
(2009)
Proceedings of the 36th International Symposium on Computer Architecture
, pp. 371-382
-
-
Rogers, B.1
Krishna, A.2
Bell, G.3
Vu, K.4
Jiang, X.5
Solihin, Y.6
-
30
-
-
78149247667
-
Multicore aware reuse distance analysis
-
Schuff, D. L., Parsons, B. S., and Pai, V. S. 2009. Multicore-Aware reuse distance analysis. Tech. rep. TR-ECE-09-08, Purdue University.
-
(2009)
Tech. Rep. TR-ECE-09-08, Purdue University
-
-
Schuff, D.L.1
Parsons, B.S.2
Pai, V.S.3
-
31
-
-
78149254514
-
Accelerating multicore reuse distance analysis with sampling and parallelization
-
ACM
-
Schuff, D. L., Kulkarni, M., and Pai, V. S. 2010. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques. ACM, 53-64.
-
(2010)
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques
, pp. 53-64
-
-
Schuff, D.L.1
Kulkarni, M.2
Pai, V.S.3
-
32
-
-
0034826142
-
Analytical cache models with applications to cache partitioning
-
Suh, G. E., Devadas, S., and Rudolph, L. 2001. Analytical cache models with applications to cache partitioning. In Proceedings of the 15th International Conference on Supercomputing. ACM, 1-12. (Pubitemid 32865298)
-
(2001)
Proceedings of the International Conference on Supercomputing
, pp. 1-12
-
-
Edward Suh, G.1
Devadas, S.2
Rudolph, L.3
-
33
-
-
0029179077
-
The SPLASH-2 Programs: Characterization and methodological considerations
-
Woo, S. C., Ohara, M., Torrie, E., Singh, J. P., and Gupta, A. 1995. The SPLASH-2 Programs: Characterization and methodological considerations. In Proceedings of the 22nd International Symposium on Computer Architecture. ACM, 24-36.
-
(1995)
Proceedings of the 22nd International Symposium on Computer Architecture
, pp. 24-36
-
-
Woo, S.C.1
Ohara, M.2
Torrie, E.3
Singh, J.P.4
Gupta, A.5
-
35
-
-
84863053984
-
Linear-time modeling of program working set in shared cache
-
IEEE Computer Society
-
Xiang, X., Bao, B., Ding, C., and Gao, Y. 2011. Linear-time modeling of program working set in shared cache. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 350-360.
-
(2011)
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
, pp. 350-360
-
-
Xiang, X.1
Bao, B.2
Ding, C.3
Gao, Y.4
-
36
-
-
27544495466
-
Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors
-
Proceedings - 32nd International Symposium on Computer Architecture, ISCA 2005
-
Zhang, M. and Asanovic, K. 2005. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In Proceedings of the 32nd International Symposium on Computer Architecture. IEEE Computer Society, 336-345. (Pubitemid 41543452)
-
(2005)
Proceedings - International Symposium on Computer Architecture
, pp. 336-345
-
-
Zhang, M.1
Asanovic, K.2
-
37
-
-
52649176921
-
Performance, area and bandwidth implications on large-scale CMP cache design
-
Zhao, L., Iyer, R., Makineni, S., Moses, J., Illikkal, R., and Newell, D. 2007. Performance, area and bandwidth implications on large-scale CMP cache design. In Proceedings of the Workshop on Chip Multiprocessor Memory Systems and Interconnects.
-
(2007)
Proceedings of the Workshop on Chip Multiprocessor Memory Systems and Interconnects
-
-
Zhao, L.1
Iyer, R.2
Makineni, S.3
Moses, J.4
Illikkal, R.5
Newell, D.6
-
39
-
-
84968739606
-
Miss rate prediction across all program inputs
-
IEEE Computer Society
-
Zhong, Y., Dropsho, S. G., and Ding, C. 2003. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 79-90.
-
(2003)
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
, pp. 79-90
-
-
Zhong, Y.1
Dropsho, S.G.2
Ding, C.3
-
40
-
-
70349743894
-
Program locality analysis using reuse distance
-
20:1-20:39
-
Zhong, Y., Shen, X., and Ding, C. 2009. Program locality analysis using reuse distance. ACM Trans. Program. Lang. Syst. 31, 6, 20:1-20:39.
-
(2009)
ACM Trans. Program. Lang. Syst.
, vol.31
, Issue.6
-
-
Zhong, Y.1
Shen, X.2
Ding, C.3