-
1
-
-
0029181140
-
Data and computation transformations for multiprocessors
-
Santa Barbara, CA
-
Anderson, J., Amarasinghe, S. and Lam, M. (1995) ‘Data and computation transformations for multiprocessors’, in Proceedings of the Fifth ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’95), Santa Barbara, CA, July, pp.166–178.
-
(1995)
Proceedings of the Fifth ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’95)
, vol.July
, pp. 166-178
-
-
Anderson, J.1
Amarasinghe, S.2
Lam, M.3
-
2
-
-
84877042382
-
Scalable cross-platform infrastructure for application performance tuning using hardware counters
-
Dallas, TX, November
-
Browne, S., Dongarra, J., Garner, N., London, K. and Mucci, P.A. (2000) ‘Scalable cross-platform infrastructure for application performance tuning using hardware counters’, in Proceedings of Supercomputing’2000: High Performance Networking and Computing Conference, Dallas, TX, November.
-
(2000)
Proceedings of Supercomputing’2000: High Performance Networking and Computing Conference
-
-
Browne, S.1
Dongarra, J.2
Garner, N.3
London, K.4
Mucci, P.A.5
-
3
-
-
0029235623
-
Hierarchical tiling for improved superscalar performance
-
Santa Barbara, CA
-
Carter, L., Ferrante, J. and Hummel, S. (1995) ‘Hierarchical tiling for improved superscalar performance’, in Proceedings of the Ninth International Parallel Processing Symposium (IPPS’95), Santa Barbara, CA, April, pp.239–245.
-
(1995)
Proceedings of the Ninth International Parallel Processing Symposium (IPPS’95)
, vol.April
, pp. 239-245
-
-
Carter, L.1
Ferrante, J.2
Hummel, S.3
-
4
-
-
84976745804
-
Tile size selection using cache organisation and data layout
-
San Diego, CA
-
Coleman, S. and McKinley, K. (1995) ‘Tile size selection using cache organisation and data layout’, in Proceedings of the 1995 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’95), San Diego, CA, June, pp.279–290.
-
(1995)
Proceedings of the 1995 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’95)
, vol.June
, pp. 279-290
-
-
Coleman, S.1
McKinley, K.2
-
5
-
-
0031223129
-
Compiler blockability of dense matrix factorisations
-
Carr, S. and Lehoucq, R. (1997) ‘Compiler blockability of dense matrix factorisations’, ACM Transactions on Mathematical Software, Vol. 23, No. 3, pp.336–361.
-
(1997)
ACM Transactions on Mathematical Software
, vol.23
, Issue.3
, pp. 336-361
-
-
Carr, S.1
Lehoucq, R.2
-
6
-
-
0032676178
-
A tile selection algorithm for data locality and cache interference
-
Rhodes, Greece
-
Chame, J. and Moon, S. (1999) ‘A tile selection algorithm for data locality and cache interference’, in Proceedings of the 13th ACM International Conference on Supercomputing (ICS’99), Rhodes, Greece, June, pp.492–499.
-
(1999)
Proceedings of the 13th ACM International Conference on Supercomputing (ICS’99)
, vol.June
, pp. 492-499
-
-
Chame, J.1
Moon, S.2
-
7
-
-
77953936946
-
Intel hyper-threading technology
-
Cross, R. (2002) ‘Intel hyper-threading technology’, Intel Technology Journal, Vol. 6, No. 1.
-
(2002)
Intel Technology Journal
, vol.6
, Issue.1
-
-
Cross, R.1
-
8
-
-
0037230301
-
High performance linear algebra algorithms using new generalized data structures and matrices
-
Gustavson, F. (2002) ‘High performance linear algebra algorithms using new generalized data structures and matrices’, IBM Journal of Research and Development, Vol. 17, No. 1, pp.31–56.
-
(2002)
IBM Journal of Research and Development
, vol.17
, Issue.1
, pp. 31-56
-
-
Gustavson, F.1
-
9
-
-
0347304618
-
Data-centric multilevel blocking
-
Las Vegas, NV
-
Kodukula, I., Ahmed, N. and Pingali, K. (1997) ‘Data-centric multilevel blocking’, in Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’97), Las Vegas, NV, June, pp.346–357.
-
(1997)
Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’97)
, vol.June
, pp. 346-357
-
-
Kodukula, I.1
Ahmed, N.2
Pingali, K.3
-
10
-
-
0031199614
-
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
-
Lo, J., Emer, J., Levy, H., Stamm, R., Tullsen, D. and Eggers, S. (1997) ‘Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading’, ACM Transactions on Computer Systems, Vol. 15, No. 3, pp.322–353.
-
(1997)
ACM Transactions on Computer Systems
, vol.15
, Issue.3
, pp. 322-353
-
-
Lo, J.1
Emer, J.2
Levy, H.3
Stamm, R.4
Tullsen, D.5
Eggers, S.6
-
11
-
-
0031364101
-
Tuning compiler optimisations for simultaneous multithreading
-
Research Triangle Park, NC
-
Lo, J., Eggers, S., Levy, H., Parekh, S. and Tullsen, D. (1997) ‘Tuning compiler optimisations for simultaneous multithreading’, in Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO’30), Research Triangle Park, NC, November, pp.114–124.
-
(1997)
Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO’30)
, vol.November
, pp. 114-124
-
-
Lo, J.1
Eggers, S.2
Levy, H.3
Parekh, S.4
Tullsen, D.5
-
12
-
-
0030190854
-
Improving data locality with loop transformations
-
McKinley, K., Carr, S. and Tseng, C. (1996) ‘Improving data locality with loop transformations’, ACM Transactions on Programming Languages and Systems, Vol. 18, No. 4, pp.424–453.
-
(1996)
ACM Transactions on Programming Languages and Systems
, vol.18
, Issue.4
, pp. 424-453
-
-
McKinley, K.1
Carr, S.2
Tseng, C.3
-
13
-
-
35248882282
-
Tiling imperfect loop nests
-
Dallas, TX, November
-
Mateev, N., Ahmed, N. and Pingali, K. (2000) ‘Tiling imperfect loop nests’, in Proceedings of the IEEE/ACM Supercomputing’2000: High Performance Networking and Computing Conference (SC’2000), Dallas, TX, November.
-
(2000)
Proceedings of the IEEE/ACM Supercomputing’2000: High Performance Networking and Computing Conference (SC’2000)
-
-
Mateev, N.1
Ahmed, N.2
Pingali, K.3
-
14
-
-
0038040130
-
Improving server software support for simultaneous multithreaded processors
-
San Diego, CA
-
McDowell, L., Eggers, S. and Gribble, S. (2003) ‘Improving server software support for simultaneous multithreaded processors’, in Proceedings of the 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’2003), San Diego, CA, June, pp.37–48.
-
(2003)
Proceedings of the 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’2003)
, vol.June
, pp. 37-48
-
-
McDowell, L.1
Eggers, S.2
Gribble, S.3
-
15
-
-
0242370926
-
Code and data transformations for improving shared cache performance on SMT processors
-
Tokyo, Japan
-
Nikolopoulos, D.S. (2003) ‘Code and data transformations for improving shared cache performance on SMT processors’, in Proceedings of the Fifth International Symposium on High Performance Computing (ISHPC-V), Tokyo, Japan, October, pp.54–69.
-
(2003)
Proceedings of the Fifth International Symposium on High Performance Computing (ISHPC-V)
, vol.October
, pp. 54-69
-
-
Nikolopoulos, D.S.1
-
16
-
-
24644482622
-
Analysis of memory hierarchy performance of block data layout
-
Vancouver, Canada
-
Park, N., Hong, B. and Prasanna, V. (2002) ‘Analysis of memory hierarchy performance of block data layout’, in Proceedings of the 2002 International Conference on Parallel Processing (ICPP’2002), Vancouver, Canada, August, pp.35–42.
-
(2002)
Proceedings of the 2002 International Conference on Parallel Processing (ICPP’2002)
, vol.August
, pp. 35-42
-
-
Park, N.1
Hong, B.2
Prasanna, V.3
-
17
-
-
84949210195
-
A Comparison of tiling algorithms
-
Amsterdam, the Netherlands
-
Rivera, G. and Tseng, C. (1999) ‘A Comparison of tiling algorithms’, in Proceedings of the Eighth International Conference on Compiler Construction (CC’99), Amsterdam, the Netherlands, March, pp.168–182.
-
(1999)
Proceedings of the Eighth International Conference on Compiler Construction (CC’99)
, vol.March
, pp. 168-182
-
-
Rivera, G.1
Tseng, C.2
-
18
-
-
0034443225
-
Analysis of operating system behavior on a simultaneous multithreaded architecture
-
Cambridge, MA
-
Redstone, J., Eggers, S. and Levy, H. (2000) ‘Analysis of operating system behavior on a simultaneous multithreaded architecture’, in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’IX), Cambridge, MA, November, pp.245–256.
-
(2000)
Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’IX)
, vol.November
, pp. 245-256
-
-
Redstone, J.1
Eggers, S.2
Levy, H.3
-
19
-
-
0034443570
-
Symbiotic job scheduling for a simultaneous multithreading processor
-
Cambridge, MA
-
Snavely, A. and Tullsen, D. (2000) ‘Symbiotic job scheduling for a simultaneous multithreading processor’, in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’IX), Cambridge, MA, November, pp.234–244.
-
(2000)
Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’IX)
, vol.November
, pp. 234-244
-
-
Snavely, A.1
Tullsen, D.2
-
20
-
-
0034826142
-
Analytical cache models with applications to cache partitioning
-
Sorrento, Italy
-
Suh, G., Devadas, S. and Rudolph, L. (2001) ‘Analytical cache models with applications to cache partitioning’, in Proceedings of the 15th ACM International Conference on Supercomputing (ICS’01), Sorrento, Italy, June, pp.1–12.
-
(2001)
Proceedings of the 15th ACM International Conference on Supercomputing (ICS’01)
, vol.June
, pp. 1-12
-
-
Suh, G.1
Devadas, S.2
Rudolph, L.3
-
21
-
-
84944063479
-
Effects of memory performance on parallel job scheduling
-
Edinburgh, Scotland
-
Suh, G., Rudolph, L. and Devadas, S. (2002) ‘Effects of memory performance on parallel job scheduling’, in Proceedings of the Eighth Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’02), Edinburgh, Scotland, June, pp.116–132.
-
(2002)
Proceedings of the Eighth Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’02)
, vol.June
, pp. 116-132
-
-
Suh, G.1
Rudolph, L.2
Devadas, S.3
-
22
-
-
0027764718
-
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts
-
Portland, OR
-
Temam, O., Granston, E. and Jalby, W. (1993) ‘To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts’, in Proceedings of the ACM/IEEE Supercomputing’93: High Performance Networking and Computing Conference (SC’93), Portland, OR, November, pp.410–419.
-
(1993)
Proceedings of the ACM/IEEE Supercomputing’93: High Performance Networking and Computing Conference (SC’93)
, vol.November
, pp. 410-419
-
-
Temam, O.1
Granston, E.2
Jalby, W.3
-
23
-
-
0029200683
-
Simultaneous multithreading: maximizing on-chip parallelism
-
St. Margherita Ligure, Italy
-
Tullsen, D., Eggers, S. and Levy, H. (1995) ‘Simultaneous multithreading: maximizing on-chip parallelism’, in Proceedings of the 22nd International Symposium on Computer Architecture (ISCA’95), St. Margherita Ligure, Italy, June, pp.392–403.
-
(1995)
Proceedings of the 22nd International Symposium on Computer Architecture (ISCA’95)
, vol.June
, pp. 392-403
-
-
Tullsen, D.1
Eggers, S.2
Levy, H.3
-
24
-
-
34547715870
-
Initial observations of the simultaneous multithreading Pentium IV processor
-
New Orleans, LA
-
Tuck, N. and Tullsen, D. (2003) ‘Initial observations of the simultaneous multithreading Pentium IV processor’, in Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT’2003), New Orleans, LA, September, pp.26–35.
-
(2003)
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT’2003)
, vol.September
, pp. 26-35
-
-
Tuck, N.1
Tullsen, D.2
-
25
-
-
84976827033
-
A Data locality optimizing algorithm
-
Toronto, Canada
-
Wolf, M. and Lam, M. (1991) ‘A Data locality optimizing algorithm’, in Proceedings of the 1991 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’91), Toronto, Canada, June, pp.30–44.
-
(1991)
Proceedings of the 1991 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’91)
, vol.June
, pp. 30-44
-
-
Wolf, M.1
Lam, M.2
-
27
-
-
84951724411
-
-
Currently, Linux assigns different processor IDs to the hardware threads on an SMT processor, as if each hardware thread were a CPU on its own
-
Currently, Linux assigns different processor IDs to the hardware threads on an SMT processor, as if each hardware thread were a CPU on its own.
-
-
-
-
28
-
-
84951724412
-
-
In Linux, each kernel thread has a unique identifier
-
In Linux, each kernel thread has a unique identifier.
-
-
-
-
29
-
-
84951724413
-
-
All reported averages are arithmetic means taken from the executions of a program with a fixed number of threads across all dataset sizes
-
All reported averages are arithmetic means taken from the executions of a program with a fixed number of threads across all dataset sizes.
-
-
-