-
1
-
-
35648995516
-
The landscape of parallel computing research: A view from Berkeley
-
University of California, Berkeley, December 18
-
Asanovic K et al. The landscape of parallel computing research: A view from Berkeley. Technical Report No.UCB/EECS-2006-183, University of California, Berkeley, December 18, 2006.
-
(2006)
Technical Report No.UCB/EECS-2006-183
-
-
Asanovic, K.1
-
2
-
-
33646892173
-
The problem with threads
-
DOI 10.1109/MC.2006.180
-
EA Lee 2006 The problem with threads Computer 39 5 33 42 10.1109/MC.2006.180 (Pubitemid 43786509)
-
(2006)
Computer
, vol.39
, Issue.5
, pp. 33-42
-
-
Lee, E.A.1
-
3
-
-
78651582149
-
Real-world concurrency
-
10.1145/1454456.1454462
-
B Cantrill J Bonwick 2008 Real-world concurrency ACM Queue 6 5 16 25 10.1145/1454456.1454462
-
(2008)
ACM Queue
, vol.6
, Issue.5
, pp. 16-25
-
-
Cantrill, B.1
Bonwick, J.2
-
5
-
-
70350610063
-
An efficient and flexible task management for many-core architectures
-
Beijing, China, June 22-26
-
Yuan N, Yu L, Fan D. An efficient and flexible task management for many-core architectures. In Proc. Workshop on Software and Hardware Challenges of Manycore Platforms, in Conjunction with the 35th International Symposium on Computer Architecture (ISCA-35), Beijing, China, June 22-26, 2008, pp.1-17.
-
(2008)
Proc. Workshop on Software and Hardware Challenges of Manycore Platforms, in Conjunction with the 35th International Symposium on Computer Architecture (ISCA-35)
, pp. 1-17
-
-
Yuan, N.1
Yu, L.2
Fan, D.3
-
6
-
-
0000269759
-
Scheduling multithreaded computations by work stealing
-
1065.68504 10.1145/324133.324234 1747653
-
RD Blumofe CE Leiserson 1999 Scheduling multithreaded computations by work stealing Journal of the ACM 46 5 720 748 1065.68504 10.1145/324133.324234 1747653
-
(1999)
Journal of the ACM
, vol.46
, Issue.5
, pp. 720-748
-
-
Blumofe, R.D.1
Leiserson, C.E.2
-
7
-
-
40349113716
-
CAPSULE: Hardware-assisted parallel execution of component-based programs
-
Washington, DC, USA: IEEE Computer Society, Dec. 9-13
-
Palatin P, Lhuillier Y, Temam O. CAPSULE: Hardware-assisted parallel execution of component-based programs. In Proc. the 39th Annual IEEE/ACM International Symposium on Micro-Architecture, Washington, DC, USA: IEEE Computer Society, Dec. 9-13, 2006, pp.247-258.
-
(2006)
Proc. the 39th Annual IEEE/ACM International Symposium on Micro-Architecture
, pp. 247-258
-
-
Palatin, P.1
Lhuillier, Y.2
Temam, O.3
-
8
-
-
63649096141
-
Efficiency and scalability of barrier synchronization on NoC based many-core architecture
-
Atlanta, USA, Oct. 19-24
-
Villa O, Palermo G, Silvano C. Efficiency and scalability of barrier synchronization on NoC based many-core architecture. In Proc. CASES 2008, Atlanta, USA, Oct. 19-24, 2008, pp.81-90.
-
(2008)
Proc. CASES 2008
, pp. 81-90
-
-
Villa, O.1
Palermo, G.2
Silvano, C.3
-
10
-
-
0002081678
-
Co-array Fortran for parallel programming
-
10.1145/289918.289920
-
RW Numrich J Reid 1998 Co-array Fortran for parallel programming SIGPLAN Fortran Forum 17 2 1 31 10.1145/289918.289920
-
(1998)
SIGPLAN Fortran Forum
, vol.17
, Issue.2
, pp. 1-31
-
-
Numrich, R.W.1
Reid, J.2
-
11
-
-
0032155556
-
Titanium: A high-performance Java dialect
-
10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H
-
K Yelick L Semenzato, et al. 1998 Titanium: A high-performance Java dialect Concurrency: Practice and Experience 10 11-13 825 836 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H
-
(1998)
Concurrency: Practice and Experience
, vol.10
, Issue.11-13
, pp. 825-836
-
-
Yelick, K.1
Semenzato, L.2
-
12
-
-
34548207355
-
Sequoia: Programming the memory hierarchy
-
Tampa, Florida, Nov. 11-17
-
Fatahalian K, Horn D R et al. Sequoia: Programming the memory hierarchy. In Proc. the 2006 ACM/IEEE Conference on Supercomputing, Tampa, Florida, Nov. 11-17, 2006, pp.83-95.
-
(2006)
Proc. the 2006 ACM/IEEE Conference on Supercomputing
, pp. 83-95
-
-
Fatahalian, K.1
Horn, D.R.2
-
13
-
-
33751022080
-
Programming for parallelism and locality with hierarchically tiled arrays
-
New York, USA, March 29-31
-
Bikshandi G, Guo J et al. Programming for parallelism and locality with hierarchically tiled arrays. In Proc. the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, USA, March 29-31, 2006, pp.48-57.
-
(2006)
Proc. the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 48-57
-
-
Bikshandi, G.1
Guo, J.2
-
14
-
-
0026137159
-
Synchronization without contention
-
Santa Clara, USA, April 8-11
-
Mellor-Crummey J M, Scott M L. Synchronization without contention. In Proc. Architectural Support for Programming Languages and Operating Systems, Santa Clara, USA, April 8-11, 1991, pp.269-278.
-
(1991)
Proc. Architectural Support for Programming Languages and Operating Systems
, pp. 269-278
-
-
Mellor-Crummey, J.M.1
Scott, M.L.2
-
15
-
-
0025028257
-
The Tera computer system
-
Amsterdam, The Netherlands, June 11-15
-
Alverson R, Callahan D et al. The Tera computer system. In Proc. the 4th Int. Conf. Supercomputing, Amsterdam, The Netherlands, June 11-15, 1990, pp.1-6.
-
(1990)
Proc. the 4th Int. Conf. Supercomputing
, pp. 1-6
-
-
Alverson, R.1
Callahan, D.2
-
16
-
-
35348812496
-
Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures
-
San Diego, USA, June 9-13
-
Zhu W, Sreedhar V C et al. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proc. the 34th Annual International Symposium on Computer Architecture, San Diego, USA, June 9-13, 2007, pp.35-45.
-
(2007)
Proc. the 34th Annual International Symposium on Computer Architecture
, pp. 35-45
-
-
Zhu, W.1
Sreedhar, V.C.2
-
17
-
-
0029179077
-
The SPLASH-2 programs: Characterization and methodological considerations
-
Santa Margherita Ligure, Italy, June 22-24
-
Woo S C, Ohara M et al. The SPLASH-2 programs: Characterization and methodological considerations. In Proc. the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 22-24, 1995, pp.24-36.
-
(1995)
Proc. the 22nd Annual International Symposium on Computer Architecture
, pp. 24-36
-
-
Woo, S.C.1
Ohara, M.2
-
18
-
-
4444237022
-
Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry
-
DOI 10.1093/bioinformatics/bth186
-
Y Fu Q Yang, et al. 2004 Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry Bioinformatics 20 12 1948 1954 10.1093/bioinformatics/bth186 (Pubitemid 39199057)
-
(2004)
Bioinformatics
, vol.20
, Issue.12
, pp. 1948-1954
-
-
Fu, Y.1
Yang, Q.2
Sun, R.3
Li, D.4
Zeng, R.5
Ling, C.X.6
Gao, W.7
-
19
-
-
0030801002
-
Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
-
DOI 10.1093/nar/25.17.3389
-
S Altschul T Madden A Schaffer, et al. 1997 Gapped BLAST and PSI-BLAST: A new generation of protein database search programs Nucleic Acids Research 25 17 3389 3402 10.1093/nar/25.17.3389 (Pubitemid 27359211)
-
(1997)
Nucleic Acids Research
, vol.25
, Issue.17
, pp. 3389-3402
-
-
Altschul, S.F.1
Madden, T.L.2
Schaffer, A.A.3
Zhang, J.4
Zhang, Z.5
Miller, W.6
Lipman, D.J.7
-
20
-
-
0032627704
-
Evaluating synchronization on shared address space multiprocessors: Methodology and performance
-
10.1145/301464.301477
-
S Kumar D Jiang, et al. 1999 Evaluating synchronization on shared address space multiprocessors: Methodology and performance ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS 1999) 27 1 23 34 10.1145/301464.301477
-
(1999)
ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS 1999)
, vol.27
, Issue.1
, pp. 23-34
-
-
Kumar, S.1
Jiang, D.2
-
21
-
-
0024032163
-
An analysis of the computational and parallel complexity of the Livermore loops
-
DOI 10.1016/0167-8191(88)90037-3
-
J Feo 1988 An analysis of the computational and parallel complexity of the Livermore loops Parallel Computing 7 2 163 185 0651.65033 10.1016/0167-8191(88)90037-3 (Pubitemid 18648054)
-
(1988)
Parallel Computing
, vol.7
, Issue.2
, pp. 163-185
-
-
Feo, J.T.1
-
22
-
-
70350630422
-
High performance matrix multiplication on many cores
-
Delft, The Netherlands, Aug. 25-28
-
Yuan N, Zhou Y et al. High performance matrix multiplication on many cores. In Proc. European Conference on Parallel and Distributed Computing (Euro-Par), Delft, The Netherlands, Aug. 25-28, 2009, pp.948-959.
-
(2009)
Proc. European Conference on Parallel and Distributed Computing (Euro-Par)
, pp. 948-959
-
-
Yuan, N.1
Zhou, Y.2
-
23
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
Austin, USA, Nov. 15-21, IEEE Press
-
Volkov V, Demmel J W. Benchmarking GPUs to tune dense linear algebra. In Proc. 2008 ACM/IEEE Conf. Supercomputing (SC 2008), Austin, USA, Nov. 15-21, IEEE Press, 2008, pp.1-11.
-
(2008)
Proc. 2008 ACM/IEEE Conf. Supercomputing (SC 2008)
, pp. 1-11
-
-
Volkov, V.1
Demmel, J.W.2
-
24
-
-
34548712562
-
Optimizing fast Fourier transform on a multi-core architecture
-
Long Beach, USA, March 26-30
-
Chen L, Hu Z et al. Optimizing fast Fourier transform on a multi-core architecture. In Proc. IEEE International Parallel and Distributed Processing Symposium, Long Beach, USA, March 26-30, 2007, pp.1-8.
-
(2007)
Proc. IEEE International Parallel and Distributed Processing Symposium
, pp. 1-8
-
-
Chen, L.1
Hu, Z.2
-
25
-
-
33750004191
-
Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences
-
Dresden, Germany, August 28-September 1
-
Hu Z, Cuvillo J et al. Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences. In Proc. Euro-Par 2006, Dresden, Germany, August 28-September 1, 2006, pp.134-144.
-
(2006)
Proc. Euro-Par 2006
, pp. 134-144
-
-
Hu, Z.1
Cuvillo, J.2
-
26
-
-
70350754502
-
High performance discrete Fourier transforms on graphics processors
-
Austin, USA, Nov. 15-21
-
Govindaraju N K et al. High performance discrete Fourier transforms on graphics processors. In Proc. the 2008 ACM/IEEE Conference on Supercomputing (SC2008), Austin, USA, Nov. 15-21, 2008, pp.13-24.
-
(2008)
Proc. the 2008 ACM/IEEE Conference on Supercomputing (SC2008)
, pp. 13-24
-
-
Govindaraju, N.K.1
-
27
-
-
34247349114
-
The potential of the Cell processor for scientific computing
-
Ischia, Italy, May 3-5
-
Williams S, Shalf J et al. The potential of the Cell processor for scientific computing. In Proc. CF'06, Ischia, Italy, May 3-5, 2006, pp.9-20.
-
(2006)
Proc. CF'06
, pp. 9-20
-
-
Williams, S.1
Shalf, J.2
-
28
-
-
0034246578
-
Location consistency - a new memory model and cache consistency protocol
-
DOI 10.1109/12.868026
-
GR Gao V Sarkar 2000 Location consistency - A new memory model and cache consistency protocol IEEE Transactions on Computers 49 8 798 813 10.1109/12.868026 (Pubitemid 30927304)
-
(2000)
IEEE Transactions on Computers
, vol.49
, Issue.8
, pp. 798-813
-
-
Gao, G.R.1
Sarkar, V.2
-
29
-
-
0032671416
-
Commit-reconcile & fences (CRF): A new memory model for architects and compiler writers
-
Atlanta, USA, May 2-4
-
Shen X et al. Commit-reconcile & fences (CRF): A new memory model for architects and compiler writers. In Proc. the 26th Annual International Symposium on Computer Architecture, Atlanta, USA, May 2-4, 1999, pp.150-161.
-
(1999)
Proc. the 26th Annual International Symposium on Computer Architecture
, pp. 150-161
-
-
Shen, X.1
-
30
-
-
0030402378
-
Scope consistency: A bridge between release consistency and entry consistency
-
Padua, Italy, June 24-26
-
Iftode L et al. Scope consistency: A bridge between release consistency and entry consistency. In Proc. the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, Padua, Italy, June 24-26, 1996, pp.277-287.
-
(1996)
Proc. the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures
, pp. 277-287
-
-
Iftode, L.1
-
31
-
-
35348862407
-
BulkSC: Bulk enforcement of sequential consistency
-
San Diego, USA, June 9-13
-
Ceze L, Tuck J et al. BulkSC: Bulk enforcement of sequential consistency. In Proc. the 34th Annual International Symposium on Computer Architecture, San Diego, USA, June 9-13, 2007, pp.278-289.
-
(2007)
Proc. the 34th Annual International Symposium on Computer Architecture
, pp. 278-289
-
-
Ceze, L.1
Tuck, J.2
-
32
-
-
27644567646
-
Power efficient architecture and the Cell processor
-
San Francisco, USA, February 12-16
-
Hofstee P. Power efficient architecture and the Cell processor. In Proc. HPCA-11, San Francisco, USA, February 12-16, 2005, pp.258-262.
-
(2005)
Proc. HPCA-11
, pp. 258-262
-
-
Hofstee, P.1
-
33
-
-
33746304031
-
Dissecting Cyclops: A detailed analysis of a multithreaded architecture
-
10.1145/773365.773369
-
G Almasi C Cascaval, et al. 2003 Dissecting Cyclops: A detailed analysis of a multithreaded architecture ACM SIGARCH Computer Architecture News 31 1 26 38 10.1145/773365.773369
-
(2003)
ACM SIGARCH Computer Architecture News
, vol.31
, Issue.1
, pp. 26-38
-
-
Almasi, G.1
Cascaval, C.2
-
34
-
-
44849137198
-
NVIDIA Tesla: A unified graphics and computing architecture
-
DOI 10.1109/MM.2008.31
-
E Lindholm, et al. 2008 NVIDIA Tesla: A unified graphics and computing architecture IEEE Micro 28 2 39 55 10.1109/MM.2008.31 (Pubitemid 351796170)
-
(2008)
IEEE Micro
, vol.28
, Issue.2
, pp. 39-55
-
-
Lindholm, E.1
Nickolls, J.2
Oberman, S.3
Montrym, J.4
-
35
-
-
0026137159
-
Synchronization without contention
-
Santa Clara, USA, April 8-11
-
Mellor-Crummey J M, Scott M L. Synchronization without contention. In Proc. Architectural Support for Programming Languages and Operating Systems, Santa Clara, USA, April 8-11, 1991, pp.269-278.
-
(1991)
Proc. Architectural Support for Programming Languages and Operating Systems
, pp. 269-278
-
-
Mellor-Crummey, J.M.1
Scott, M.L.2
-
36
-
-
0031593999
-
Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor
-
Barcelona, Spain, June 27-July 1
-
Keckler S W et al. Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor. In Proc. the 25th Annual International Symposium on Computer Architecture, Barcelona, Spain, June 27-July 1, 1998, pp.306-317.
-
(1998)
Proc. the 25th Annual International Symposium on Computer Architecture
, pp. 306-317
-
-
Keckler, S.W.1
-
37
-
-
40349086066
-
Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
-
Orlando, USA, Dec. 9-13
-
Sampson J, Gonzalez R. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In Proc. the 39th Annual IEEE/ACM International Symposium on Microarchitecture, Orlando, USA, Dec. 9-13, 2006, pp.235-246.
-
(2006)
Proc. the 39th Annual IEEE/ACM International Symposium on Microarchitecture
, pp. 235-246
-
-
Sampson, J.1
Gonzalez, R.2
-
38
-
-
63649096141
-
Efficiency and scalability of barrier synchronization on NoC based many-core architecture
-
Atlanta, USA, October 19-24
-
Villa O et al. Efficiency and scalability of barrier synchronization on NoC based many-core architecture. In Proc. CASES 2008, Atlanta, USA, October 19-24, 2008, pp.81-90.
-
(2008)
Proc. CASES 2008
, pp. 81-90
-
-
Villa, O.1