-
1
-
-
33748317896
-
Fast additions on masked integers
-
May
-
M. D. Adams and D. S. Wise. Fast additions on masked integers. SIGPLAN Not., 41(5):39-45, May 2006. http://doi.acm.org/10.1145/1149982.1149987
-
(2006)
SIGPLAN Not
, vol.41
, Issue.5
, pp. 39-45
-
-
Adams, M.D.1
Wise, D.S.2
-
2
-
-
33745777806
-
Cache oblivious matrix multiplication using an element ordering based on the Peano curve
-
Parallel Processing and Applied Mathematics, Berlin, Springer
-
M. Bader and C. Zenger. Cache oblivious matrix multiplication using an element ordering based on the Peano curve. In Parallel Processing and Applied Mathematics, volume 3911 of Lecture Notes in Comput. Sci., pages 1042-1049, Berlin, 2006. Springer. http://dx.doi.org/10.1007/11752578_126
-
(2006)
Lecture Notes in Comput. Sci
, vol.3911
, pp. 1042-1049
-
-
Bader, M.1
Zenger, C.2
-
3
-
-
0036870763
-
Recursive array layouts and fast parallel matrix multiplication
-
Nov
-
S. Chatterjee, A. R. Lebeck, P. K. Patnala, and M. Thottethodi. Recursive array layouts and fast parallel matrix multiplication. IEEE Trans. Parallel Distrib. Syst., 13(11):1105-1123, Nov. 2002. http://dx.doi.org/10.1109/TPDS.2002.1058095
-
(2002)
IEEE Trans. Parallel Distrib. Syst
, vol.13
, Issue.11
, pp. 1105-1123
-
-
Chatterjee, S.1
Lebeck, A.R.2
Patnala, P.K.3
Thottethodi, M.4
-
4
-
-
0025402476
-
A set of level 3 Basic Linear Algebra Subprograms
-
Mar
-
J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Softw., 16(1):1-17, Mar. 1990. http://doi.acm.org/10.1145/77626.79170
-
(1990)
ACM Trans. Math. Softw
, vol.16
, Issue.1
, pp. 1-17
-
-
Dongarra, J.J.1
Du Croz, J.2
Hammarling, S.3
Duff, I.S.4
-
5
-
-
34248384701
-
-
G. C. Fox. A graphical approach to load balancing and sparse matrix-vector multiplication. In M. Schultz, editor, Numerical Algorithms for Modern Parallel Architectures, volume 13 of IMA Vol. in Math. & Appl., pages 37-61. Springer, New York, 1988.
-
-
-
-
6
-
-
77953929281
-
-
B. B. Fraguela, J. Guo, G. Bikshandi, M. J. Garzarán, G. Almási, J. Moreira, and D. Padua. The hierarchically tiled arrays programming approach. In LCR '04: Proc. 7th Wkshp. Languages, Compilers, and Run-Time Support for Scalable Systems, volume 81 of ACM Int. Conf. Proc. Series, pages 1-12. ACM Press, New York, 2004. http://doi.acm.org/10.1145/1066650.1066657
-
-
-
-
7
-
-
0033350255
-
-
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Ann. Symp. Foundations of Computer Science, pages 285-298. IEEE Computer Soc. Press, Washington, DC, Oct. 1999. http://dx.doi.org/10.1109/SFFCS.1999.814600
-
-
-
-
8
-
-
34248366388
-
-
Indiana University, Bloomington, IN, Apr
-
S. T. Gabriel, B. Chenoweth, K. P. Lorton, M. Carlson, and D. S. Wise. The Opie Compiler Distribution. Indiana University, Bloomington, IN, Apr. 2005. http://www.cs.indiana.edu/~dswise/Opie/distribution.html
-
(2005)
The Opie Compiler Distribution
-
-
Gabriel, S.T.1
Chenoweth, B.2
Lorton, K.P.3
Carlson, M.4
Wise, D.S.5
-
9
-
-
77954450405
-
The Opie compiler from rowmajor source to Morton-ordered matrices
-
J. Carter and L. Zhang, editors, ACM Press, New York
-
S. T. Gabriel and D. S. Wise. The Opie compiler from rowmajor source to Morton-ordered matrices. In J. Carter and L. Zhang, editors, Proc. 3rd Wkshp. on Memory Performance Issues, pages 136-144. ACM Press, New York, 2004. http://doi.acm.org/10.1145/1054943.1054962
-
(2004)
Proc. 3rd Wkshp. on Memory Performance Issues
, pp. 136-144
-
-
Gabriel, S.T.1
Wise, D.S.2
-
10
-
-
0020249952
-
An effective way to represent quadtrees
-
Dec
-
I. Gargantini. An effective way to represent quadtrees. Commun. ACM, 25(12):905-910, Dec. 1982. http://doi.acm.org/10.1145/358728.358741
-
(1982)
Commun. ACM
, vol.25
, Issue.12
, pp. 905-910
-
-
Gargantini, I.1
-
11
-
-
0004236492
-
-
The Johns Hopkins Univ. Press, Baltimore, third edition
-
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins Univ. Press, Baltimore, third edition, 1996.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
13
-
-
49149109685
-
Anatomy of high-performance matrix multiplication
-
Technical report, Univ. of Texas, Austin. Submitted for publication. Visited Sept
-
K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. Technical report, Univ. of Texas, Austin. Submitted for publication. Visited Sept. 2006. http://www.cs.utexas.edu/users/flame/pubs/GOTO_TOMS.pdf
-
(2006)
-
-
Goto, K.1
van de Geijn, R.A.2
-
14
-
-
34248333297
-
-
Innovative Computing Laboratory, Univ. of Tennessee, Knoxville, TN. Performance Application Programming Interface (PAPI), Dec. 2005. http://icl.cs.utk.edu/papi/
-
-
-
-
15
-
-
2942630889
-
A theoretician's guide to the experimental analysis of algorithms
-
M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch, editors, Data Structures, Near Neighbor Searches, and Methodology: 5th & 6th DIMACS Implementation Challenges, Amer. Math. Soc., Providence
-
D. S. Johnson. A theoretician's guide to the experimental analysis of algorithms. In M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch, editors, Data Structures, Near Neighbor Searches, and Methodology: 5th & 6th DIMACS Implementation Challenges, volume 59 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci., pages 215-250. Amer. Math. Soc., Providence, 2002. http://www.research.att.com/~dsj/papers.html
-
(2002)
DIMACS Ser. Discrete Math. Theoret. Comput. Sci
, vol.59
, pp. 215-250
-
-
Johnson, D.S.1
-
16
-
-
0033907995
-
Scalable parallel matrix multiplication on distributed memory parallel computers
-
IEEE Computer Soc. Press, Washington, DC, May
-
K. Li. Scalable parallel matrix multiplication on distributed memory parallel computers. In 14th Int. Parallel and Distributed Processing Symp. (IPDPS'00), pages 307-314. IEEE Computer Soc. Press, Washington, DC, May 2000. http://dx.doi.org/10.1109/IPDPS.2000.846000
-
(2000)
14th Int. Parallel and Distributed Processing Symp. (IPDPS'00)
, pp. 307-314
-
-
Li, K.1
-
17
-
-
34248372349
-
-
J. Markoff. Writing the fastest code, by hand, for fun: A human computer keeps speeding up chips. The New York Times, CLV(53,412):C1, C6, 2005 Nov. 28. http://www.nytimes.com/2005/11/28/technology/28super.html
-
-
-
-
18
-
-
0003460690
-
A computer oriented geodetic data base and a new technique in file sequencing
-
Technical report, IBM Ltd, Ottawa, Ontario, Mar
-
G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. Technical report, IBM Ltd., Ottawa, Ontario, Mar. 1966.
-
(1966)
-
-
Morton, G.M.1
-
19
-
-
0042235298
-
Tiling, block data layout, and memory hierarchy performance
-
July
-
N. Park, B. Hong, and V. K. Prasanna. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parallel Distrib. Syst., 14(7):640-654, July 2003. http://dx.doi.org/10.1109/TPDS.2003.1214317
-
(2003)
IEEE Trans. Parallel Distrib. Syst
, vol.14
, Issue.7
, pp. 640-654
-
-
Park, N.1
Hong, B.2
Prasanna, V.K.3
-
21
-
-
4544352521
-
Optimizing graph algorithms for improved cache performance
-
Sept
-
J. Sang Park, M. Penner, and V. K. Prasanna. Optimizing graph algorithms for improved cache performance. IEEE Trans. Parallel Distrib. Syst., 15(9):769-782, Sept. 2004. http://dx.doi.org/10.1109/TPDS.2004.44
-
(2004)
IEEE Trans. Parallel Distrib. Syst
, vol.15
, Issue.9
, pp. 769-782
-
-
Sang Park, J.1
Penner, M.2
Prasanna, V.K.3
-
22
-
-
0000058088
-
Finding neighbors of equal size in linear quadtrees and octrees in constant time
-
May
-
G. Schrack. Finding neighbors of equal size in linear quadtrees and octrees in constant time. CVGIP: Image Underst., 55(3):221-230, May 1992.
-
(1992)
CVGIP: Image Underst
, vol.55
, Issue.3
, pp. 221-230
-
-
Schrack, G.1
-
23
-
-
0037173976
-
A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels
-
V. Valsalam and A. Skjellum. A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels. Concur. Comp. Prac. Exper., 14(10):805-839, 2002. http://dx.doi.org/10.1002/cpe.630
-
(2002)
Concur. Comp. Prac. Exper
, vol.14
, Issue.10
, pp. 805-839
-
-
Valsalam, V.1
Skjellum, A.2
-
24
-
-
84937431996
-
Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free
-
A. Bode, T. Ludwig, W. Karl, and R. Wismüller, editors, Euro-Par 2000, Parallel Processing, Springer, Heidelberg
-
D. S. Wise. Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free. In A. Bode, T. Ludwig, W. Karl, and R. Wismüller, editors, Euro-Par 2000 - Parallel Processing, volume 1900 of Lecture Notes in Comput. Sci., pages 774-783. Springer, Heidelberg, 2000. http://www.springerlink.com/link.asp?id=0pc0e9gfk4x9j5fa
-
(2000)
Lecture Notes in Comput. Sci
, vol.1900
, pp. 774-783
-
-
Wise, D.S.1
-
25
-
-
27144518219
-
A paradigm for parallel matrix algorithms: Scalable Cholesky
-
J. C. Cunha and P. D. Medeiros, editors, Euro-Par 2005, Parallel Processing, Springer, Berlin, Aug
-
D. S. Wise, C. L. Citro, J. J. Hursey, F. Liu, and M. A. Rainey. A paradigm for parallel matrix algorithms: Scalable Cholesky. In J. C. Cunha and P. D. Medeiros, editors, Euro-Par 2005 - Parallel Processing, number 3648 in Lecture Notes in Comput. Sci., pages 687-698. Springer, Berlin, Aug. 2005. http://dx.doi.org/10.1007/11549468_76
-
(2005)
Lecture Notes in Comput. Sci
, vol.3648
, pp. 687-698
-
-
Wise, D.S.1
Citro, C.L.2
Hursey, J.J.3
Liu, F.4
Rainey, M.A.5
-
26
-
-
0024935630
-
More iteration space tiling
-
ACM Press, New York, NY, USA, Nov
-
M. Wolfe. More iteration space tiling. In Proc. Supercomputing '89, pages 655-664. ACM Press, New York, NY, USA, Nov. 1989.
-
(1989)
Proc. Supercomputing '89
, pp. 655-664
-
-
Wolfe, M.1