[1] LEON2 Processor. http://vlsicad.eecs.umich.edu/BK/Slots/cache/www.gaisler.com/products/leon2/leon.html.
[2] NVIDIA Corporation. NVIDIA's next generation CUDA compute architecture: Fermi. 2009.
[3] G. Allen, W. Benger, T. Dramlitsch, T. Goodale, H.-C. Hege, G. Lanfermann, A. Merzky, T. Radke, and E. Seidel. Cactus grid computing: Review of current development. In Euro-Par '01, pages 817-824, London, UK, 2001. Springer-Verlag.
[4] K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 18, 2006.
[5] N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26(4), 2006.
[8] OpenMP Architecture Review Board. OpenMP application program interface, May 2008.
[9] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In ISCA 27, June 2000.
[10] L. Carter, J. Ferrante, and S. F. Hummel. Hierarchical tiling for improved superscalar performance. In IPPS '95, pages 239-245, Washington, DC, USA, 1995.
[11] D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA '05, pages 340-351, Washington, DC, USA, 2005.
[12] M. Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches. In HPCA, pages 227-238, Feb. 2009.
[13] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general-purpose applications on graphics processors using CUDA. JPDC'08, 2008.
[14] S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. In SPAA '07, pages 105-115, New York, NY, USA, 2007.
[15] Intel Corporation. Intel Threading Building Blocks.
[16] Intel Corporation. Picture the future now: Intel AVX. http://software.intel.com/en-us/avx/.
[17] NVIDIA Corporation. GeForce GTX 280 specifications. 2008.
[19] W. J. Dally et al. Merrimac: Supercomputing with streams. In SC'03, 2003.
[20] K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In SC '08, pages 1-12, 2008.
[21] J. D. Davis, J. Laudon, and K. Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05, pages 51-62, Washington, DC, USA, 2005.
[22] K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the memory hierarchy. In SC'06, 2006.
[23] M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS '05, pages 361-366, New York, NY, USA, 2005.
[24] M. Gschwind. Chip multiprocessing and the Cell Broadband Engine. In CF'06, New York, NY, USA, 2006.
[25] W. Huang, M. R. Stan, K. Skadron, S. Ghosh, K. Sankaranarayanan, and S. Velusamy. Compact thermal modeling for temperature-aware design. In DAC'04, 2004.
[26] I. Hur and C. Lin. A comprehensive approach to DRAM power management. In HPCA '08, pages 305-316, 2008.
[27] W. Jalby and U. Meier. Optimizing matrix operations on a parallel multiprocessor with a hierarchical memory system. In Proc. Int. Conf. Parallel Processing, pages 429-432, 1986.
[28] S. Jenks and J.-L. Gaudiot. An evaluation of thread migration for exploiting distributed array locality. In HPCA, pages 190-195, 2002.
[29] I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multilevel blocking. In PLDI '97, pages 346-357, New York, NY, USA, 1997.
[30] P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 25(2):21-29, 2005.
[31] M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS 13, 2008.
[32] S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: Architectural support for fine-grained parallelism on chip multiprocessors. SIGARCH Comput. Archit. News, 35(2), 2007.
[33] J. L. Lo, S. J. Eggers, H. M. Levy, S. S. Parekh, and D. M. Tullsen. Tuning compiler optimizations for simultaneous multithreading. In MICRO 30, pages 114-124, Washington, DC, USA, 1997.
[34] K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, and M. Horowitz. Smart Memories: A modular reconfigurable architecture. In ISCA '00, pages 161-171, New York, NY, USA, 2000.
[35] E. P. Markatos and T. J. LeBlanc. Using processor affinity in loop scheduling on shared-memory multiprocessors. In SC'92, pages 104-113, 1992.
[36] K. S. McKinley and O. Temam. Quantifying loop nest locality using SPEC'95 and the Perfect benchmarks. ACM Trans. Comput. Syst., 17(4):288-336, 1999.
[37] J. Meng and K. Skadron. Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling. In ICCD, Oct. 2009.
[38] R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A benchmark suite for data mining workloads. In IISWC '06, pages 182-188, Oct. 2006.
[39] NVIDIA Corporation. NVIDIA CUDA compute unified device architecture programming guide, 2007.
[41] J. Philbin, J. Edler, O. J. Anshus, C. C. Douglas, and K. Li. Thread scheduling for cache locality. In ASPLOS-VII, pages 60-71, New York, NY, USA, 1996. ACM.
[42] V. K. Pingali, S. A. McKee, W. C. Hsieh, and J. B. Carter. Computation regrouping: Restructuring programs for temporal data cache locality. In ICS '02, pages 252-261, New York, NY, USA, 2002.
[43] K. N. Premnath and J. Abraham. Three-dimensional multi-relaxation time (MRT) lattice-Boltzmann models for multiphase flow. J. Comput. Phys., 224(2):539-559, 2007.
[44] A. Pullini, F. Angiolini, S. Murali, D. Atienza, G. De Micheli, and L. Benini. Bringing NoCs to 65 nm. IEEE Micro, 27(5), 2007.
[46] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
[47] A. Settle, D. Connors, E. Gibert, and A. Gonzalez. A dynamically reconfigurable cache for multithreaded processors. J. Embedded Comput., 2(2):221-233, 2006.
[48] A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS '00, pages 234-244, New York, NY, USA, 2000.
[49] S. Subramaniam and D. L. Eager. Affinity scheduling of unbalanced workloads. In SC '94, pages 214-226, New York, NY, USA, 1994.
[50] G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA '02, page 117, 2002.
[52] J. Torrellas, A. Tucker, and A. Gupta. Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors. J. Parallel Distrib. Comput., 24(2):139-151, 1995.
-
53
-
-
34247272420
-
Thread-associative memory for multicore and multithreaded computing
-
New York, NY, USA
-
S. Wang and L. Wang. Thread-associative memory for multicore and multithreaded computing. In ISLPED '06, pages 139-142, New York, NY, USA, 2006.
-
(2006)
ISLPED '06
, pp. 139-142
-
-
Wang, S.1
Wang, L.2
-
54
-
-
0029179077
-
The SPLASH-2 programs: Characterization and methodological considerations
-
June
-
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. ISCA '95, pages 24-36, June 1995.
-
(1995)
ISCA '95
, pp. 24-36
-
-
Woo, S.C.1
Ohara, M.2
Torrie, E.3
Singh, J.P.4
Gupta, A.5
-
[55] Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X FPGA user guide.
[56] S.-H. Yang, B. Falsafi, M. D. Powell, and T. N. Vijaykumar. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In HPCA '02, page 151, 2002.
[57] Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss. An adaptive OpenMP loop scheduler for hyperthreaded SMPs. In PDCS '04, 2004.