-
1
-
-
35348885705
-
LAPACK users' guide
-
3rd edition
-
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, 3rd edition, 1999.
-
(1999)
SIAM
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
Blackford, S.4
Demmel, J.5
Dongarra, J.6
Du Croz, J.7
Greenbaum, A.8
Hammarling, S.9
McKenney, A.10
Sorensen, D.11
-
2
-
-
35648995516
-
-
EECS Department, University of California, Dec.
-
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Dec. 2006.
-
(2006)
The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183
-
-
Asanovic, K.1
Bodik, R.2
Catanzaro, B.C.3
Gebis, J.J.4
Husbands, P.5
Keutzer, K.6
Patterson, D.A.7
Plishker, W.L.8
Shalf, J.9
Williams, S.W.10
Yelick, K.A.11
-
4
-
-
0025447908
-
Improving register allocation for subscripted variables
-
White Plains, NY, June
-
D. Callahan, S. Carr, and K. Kennedy. Improving Register Allocation for Subscripted Variables. In Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, pages 53-65, White Plains, NY, June 1990.
-
(1990)
Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation
, pp. 53-65
-
-
Callahan, D.1
Carr, S.2
Kennedy, K.3
-
6
-
-
84885772106
-
-
Clearspeed White Paper: CSX Processor Architecture. http://www.clearspeed.com/newsevents/presskit.
-
CSX Processor Architecture
-
-
-
7
-
-
84885802150
-
FAST: A functionally accurate simulation toolset for the Cyclops-64 cellular architecture
-
Madison, Wisconsin, June 2005
-
J. del Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture. In Proceedings of the 2005 Workshop on Modeling, Benchmarking, and Simulation (MoBS 2005), Madison, Wisconsin, June 2005.
-
Proceedings of the 2005 Workshop on Modeling, Benchmarking, and Simulation (MoBS 2005)
-
-
Del Cuvillo, J.1
Zhu, W.2
Hu, Z.3
Gao, G.R.4
-
8
-
-
33746317085
-
TiNy Threads: A thread virtual machine for the Cyclops64 cellular architecture
-
Denver, Apr.
-
J. del Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture. In Proceedings of the 5th Workshop on Massively Parallel Processing, Denver, Apr. 2005.
-
(2005)
Proceedings of the 5th Workshop on Massively Parallel Processing
-
-
Del Cuvillo, J.1
Zhu, W.2
Hu, Z.3
Gao, G.R.4
-
9
-
-
0025402476
-
A set of level 3 basic linear algebra subprograms
-
J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff. A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, 1990.
-
(1990)
ACM Transactions on Mathematical Software
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Hammarling, S.3
Duff, I.S.4
-
10
-
-
0029324485
-
Software libraries for linear algebra computations on high performance computers
-
J. J. Dongarra and D. W. Walker. Software Libraries for Linear Algebra Computations on High Performance Computers. SIAM Review, 37(2):151-180, 1995.
-
(1995)
SIAM Review
, vol.37
, Issue.2
, pp. 151-180
-
-
Dongarra, J.J.1
Walker, D.W.2
-
12
-
-
85034086461
-
On the problem of optimizing data transfers for complex memory systems
-
St. Malo, France
-
K. Gallivan, W. Jalby, and D. Gannon. On the problem of optimizing data transfers for complex memory systems. In Proceedings of the 2nd International Conference on Supercomputing, pages 238-253, St. Malo, France, 1988.
-
(1988)
Proceedings of the 2nd International Conference on Supercomputing
, pp. 238-253
-
-
Gallivan, K.1
Jalby, W.2
Gannon, D.3
-
13
-
-
0031273280
-
Recursion leads to automatic variable blocking for dense linear algebra algorithms
-
Nov.
-
F. G. Gustavson. Recursion leads to automatic variable blocking for dense linear algebra algorithms. IBM Journal of Research and Development, 41(6):737-753, Nov. 1997.
-
(1997)
IBM Journal of Research and Development
, vol.41
, Issue.6
, pp. 737-753
-
-
Gustavson, F.G.1
-
15
-
-
33750004191
-
Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences
-
Dresden, Germany, Aug. 2006
-
Z. Hu, J. del Cuvillo, W. Zhu, and G. R. Gao. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences. In 12th International European Conference on Parallel Processing (Euro-Par 2006), pages 134-144, Dresden, Germany, Aug. 2006.
-
(2006)
12th International European Conference on Parallel Processing (Euro-Par)
, pp. 134-144
-
-
Hu, Z.1
Del Cuvillo, J.2
Zhu, W.3
Gao, G.R.4
-
16
-
-
0003648799
-
The OpenMP implementation of NAS Parallel Benchmarks and its performance
-
NASA Ames Research Center
-
H. Jin, M. Frumkin, and J. Yan. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance. Technical Report NAS-99-011, NASA Ames Research Center, 1999.
-
(1999)
Technical Report NAS-99-011
-
-
Jin, H.1
Frumkin, M.2
Yan, J.3
-
17
-
-
0030190854
-
Improving data locality with loop transformations
-
K. S. McKinley, S. Carr, and C. W. Tseng. Improving Data Locality with Loop Transformations. ACM Transactions on Programming Languages and Systems, 18(4):424-453, 1996.
-
(1996)
ACM Transactions on Programming Languages and Systems
, vol.18
, Issue.4
, pp. 424-453
-
-
McKinley, K.S.1
Carr, S.2
Tseng, C.W.3
-
19
-
-
84885698984
-
-
FLAME Working Note 26, Sept.
-
G. Quintana-Orti, E. S. Quintana-Orti, E. Chan, R. A. van de Geijn, and F. G. Van Zee. Design and Scheduling of an Algorithm-by-Blocks for the LU Factorization on Multithreaded Architectures. FLAME Working Note 26, Sept. 2007.
-
(2007)
Design and Scheduling of an Algorithm-by-Blocks for the LU Factorization on Multithreaded Architectures
-
-
Quintana-Orti, G.1
Quintana-Orti, E.S.2
Chan, E.3
Van de Geijn, R.A.4
Van Zee, F.G.5
-
21
-
-
34548858682
-
An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS
-
San Francisco Marriott, CA, USA, Feb.
-
S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS. In Proceedings of the 2007 IEEE International Solid-State Circuits Conference, pages 5-7, San Francisco Marriott, CA, USA, Feb. 2007.
-
(2007)
Proceedings of the 2007 IEEE International Solid-State Circuits Conference
, pp. 5-7
-
-
Vangal, S.1
Howard, J.2
Ruhl, G.3
Dighe, S.4
Wilson, H.5
Tschanz, J.6
Finan, D.7
Iyer, P.8
Singh, A.9
Jacob, T.10
Jain, S.11
Venkataraman, S.12
Hoskote, Y.13
Borkar, N.14
-
22
-
-
35348879576
-
-
Technical Memo 75, Computer Architecture and Parallel Systems Laboratory, University of Delaware, Feb.
-
I. E. Venetis and G. R. Gao. Optimizing the LU Benchmark for the Cyclops-64 Architecture. Technical Memo 75, Computer Architecture and Parallel Systems Laboratory, University of Delaware, Feb. 2007. http://www.capsl.udel.edu/publications.shtml.
-
(2007)
Optimizing the LU Benchmark for the Cyclops-64 Architecture
-
-
Venetis, I.E.1
Gao, G.R.2
-
23
-
-
0029179077
-
The SPLASH-2 programs: Characterization and methodological considerations
-
Santa Margherita Ligure, Italy, June
-
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 24-36, Santa Margherita Ligure, Italy, June 1995.
-
(1995)
Proceedings of the 22nd International Symposium on Computer Architecture
, pp. 24-36
-
-
Woo, S.C.1
Ohara, M.2
Torrie, E.3
Singh, J.P.4
Gupta, A.5
-
24
-
-
35348812496
-
Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures
-
San Diego, California, USA, June
-
W. Zhu, V. C. Sreedhar, Z. Hu, and G. R. Gao. Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures. In Proceedings of the 34th International Symposium on Computer Architecture, pages 35-45, San Diego, California, USA, June 2007.
-
(2007)
Proceedings of the 34th International Symposium on Computer Architecture
, pp. 35-45
-
-
Zhu, W.1
Sreedhar, V.C.2
Hu, Z.3
Gao, G.R.4