-
1
-
-
0343462141
-
Automated empirical optimizations of software and the ATLAS project
-
R. Clinton Whaley, Antoine Petitet and Jack J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 27(1-2):3-35, 2001.
-
(2001)
Parallel Computing
, vol.27
, Issue.1-2
, pp. 3-35
-
-
Clinton Whaley, R.1
Petitet, A.2
Dongarra, J.J.3
-
2
-
-
20744449792
-
The design and implementation of FFTW3
-
Matteo Frigo and Steven G. Johnson. The Design and Implementation of FFTW3. Proceedings of the IEEE, 93(2):216-231, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 216-231
-
-
Frigo, M.1
Johnson, S.G.2
-
3
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
Markus Püschel, José M. F. Moura, Jeremy Johnson, David A. Padua, Manuela M. Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson and Nicholas Rizzolo. SPIRAL: Code Generation for DSP Transforms. Proceedings of the IEEE, 93(2):232-275, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.A.4
Veloso, M.M.5
Singer, B.6
Xiong, J.7
Franchetti, F.8
Gacic, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
4
-
-
20744459570
-
Is search really necessary to generate high performance BLAS?
-
Kamen Yotov, Xiaoming Li, Gang Ren, María J. Garzarán, David A. Padua, Keshav Pingali and Paul Stodghill. Is Search Really Necessary to Generate High Performance BLAS? Proceedings of the IEEE, 93(2):358-386, 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 358-386
-
-
Yotov, K.1
Li, X.2
Ren, G.3
Garzarán, M.J.4
Padua, D.A.5
Pingali, K.6
Stodghill, P.7
-
5
-
-
84957602253
-
Optimization of MPI collectives on clusters of large-scale SMPs
-
Portland, OR, USA
-
Steve Sistare, Rolf van de Vaart and Eugene Loh. Optimization of MPI Collectives on Clusters of Large-Scale SMPs. In Proc. 12th Supercomputing Conf. (SC'99), pages 23-36, Portland, OR, USA, 1999.
-
(1999)
Proc. 12th Supercomputing Conf. (SC'99)
, pp. 23-36
-
-
Sistare, S.1
Van de Vaart, R.2
Loh, E.3
-
6
-
-
84956863176
-
The hierarchical factor algorithm for all-to-all communication
-
Paderborn, Germany
-
Peter Sanders and Jesper L. Träff. The Hierarchical Factor Algorithm for All-to-All Communication. In Proc. 8th Euro-Par Conf. (Euro-Par'02), volume 2400 of Lecture Notes in Computer Science, pages 799-804, Paderborn, Germany, 2002.
-
(2002)
Proc. 8th Euro-Par Conf. (Euro-Par'02), Volume 2400 of Lecture Notes in Computer Science
, pp. 799-804
-
-
Sanders, P.1
Träff, J.L.2
-
7
-
-
84947252147
-
Fast collective operations using shared and remote memory access protocols on clusters
-
Nice, France
-
Vinod Tipparaju, Jarek Nieplocha and Dhabaleswar K. Panda. Fast Collective Operations Using Shared and Remote Memory Access Protocols on Clusters. In Proc. 17th Intl. Parallel and Distributed Processing Symposium (IPDPS'03), pages 84-93, Nice, France, 2003.
-
(2003)
Proc. 17th Intl. Parallel and Distributed Processing Symposium (IPDPS'03)
, pp. 84-93
-
-
Tipparaju, V.1
Nieplocha, J.2
Panda, D.K.3
-
8
-
-
70449640953
-
Automatic tuning of discrete Fourier transforms driven by analytical modeling
-
Raleigh, NC, USA
-
Basilio B. Fraguela, Yevgen Voronenko and Markus Püschel. Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling. In 18th Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'09), pages 271-280, Raleigh, NC, USA, 2009.
-
(2009)
18th Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'09)
, pp. 271-280
-
-
Fraguela, B.B.1
Voronenko, Y.2
Püschel, M.3
-
9
-
-
34547472598
-
MPIPP: An automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters
-
Cairns, Australia
-
Hu Chen, Wenguang Chen, Jian Huang, Bob Robert and H. Kuhn. MPIPP: An Automatic Profile-guided Parallel Process Placement Toolset for SMP Clusters and Multiclusters. In Proc. 20th Intl. Conf. on Supercomputing (ICS'06), pages 353-360, Cairns, Australia, 2006.
-
(2006)
Proc. 20th Intl. Conf. on Supercomputing (ICS'06)
, pp. 353-360
-
-
Chen, H.1
Chen, W.2
Huang, J.3
Robert, B.4
Kuhn, H.5
-
10
-
-
70350462997
-
Towards an efficient process placement policy for MPI applications in multicore environments
-
Espoo, Finland
-
Guillaume Mercier and Jérôme Clet-Ortega. Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments. In Proc. 16th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'09), volume 5759 of Lecture Notes in Computer Science, pages 104-115, Espoo, Finland, 2009.
-
(2009)
Proc. 16th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'09), Volume 5759 of Lecture Notes in Computer Science
, pp. 104-115
-
-
Mercier, G.1
Clet-Ortega, J.2
-
11
-
-
70350662872
-
Process mapping for collective communications
-
Delft, The Netherlands
-
Jin Zhang, Jidong Zhai, Wenguang Chen and Weimin Zheng. Process Mapping for Collective Communications. In Proc. 15th Euro-Par Conf. (Euro-Par'09), volume 5704 of Lecture Notes in Computer Science, pages 81-92, Delft, The Netherlands, 2009.
-
(2009)
Proc. 15th Euro-Par Conf. (Euro-Par'09), Volume 5704 of Lecture Notes in Computer Science
, pp. 81-92
-
-
Zhang, J.1
Zhai, J.2
Chen, W.3
Zheng, W.4
-
12
-
-
33847254092
-
X-Ray: A tool for automatic measurement of hardware parameters
-
Torino, Italy
-
Kamen Yotov, Keshav Pingali and Paul Stodghill. X-Ray: A Tool for Automatic Measurement of Hardware Parameters. In Proc. 2nd Intl. Conf. on the Quantitative Evaluation of Systems (QEST'05), pages 168-177, Torino, Italy, 2005.
-
(2005)
Proc. 2nd Intl. Conf. on the Quantitative Evaluation of Systems (QEST'05)
, pp. 168-177
-
-
Yotov, K.1
Pingali, K.2
Stodghill, P.3
-
13
-
-
33244459867
-
Automatic measurement of memory hierarchy parameters
-
DOI 10.1145/1064212.1064233, SIGMETRICS 2005: International Conference on Measurement and Modeling of Computer Systems - Proceedings
-
Kamen Yotov, Keshav Pingali and Paul Stodghill. Automatic Measurement of Memory Hierarchy Parameters. In Proc. Intl. Conf. on Measurement and Modeling of Computer Systems (SIGMETRICS'05), pages 181-192, Banff, Canada, 2005.
-
(2005)
Performance Evaluation Review
, vol.33
, Issue.1
, pp. 181-192
-
-
Yotov, K.1
Pingali, K.2
Stodghill, P.3
-
14
-
-
58449097049
-
P-Ray: A suite of micro-benchmarks for multi-core architectures
-
Edmonton, Canada
-
Alexandre X. Duchateau, Albert Sidelnik, María J. Garzarán and David A. Padua. P-Ray: A Suite of Micro-benchmarks for Multi-core Architectures. In Proc. 21st Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC'08), volume 5335 of Lecture Notes in Computer Science, pages 187-201, Edmonton, Canada, 2008.
-
(2008)
Proc. 21st Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC'08), Volume 5335 of Lecture Notes in Computer Science
, pp. 187-201
-
-
Duchateau, A.X.1
Sidelnik, A.2
Garzarán, M.J.3
Padua, D.A.4
-
15
-
-
0000718681
-
Measuring cache and TLB performance and their effect on benchmark runtimes
-
Rafael H. Saavedra and Alan J. Smith. Measuring Cache and TLB Performance and their Effect on Benchmark Runtimes. IEEE Trans. Computers, 44(10):1223-1235, 1995.
-
(1995)
IEEE Trans. Computers
, vol.44
, Issue.10
, pp. 1223-1235
-
-
Saavedra, R.H.1
Smith, A.J.2
-
18
-
-
0009346826
-
LogP: Towards a realistic model of parallel computation
-
San Diego, CA, USA
-
David E. Culler, Richard Karp, David A. Patterson, Abhijit Sahay, Klaus E. Schauser, Eunice E. Santos, Ramesh Subramonian and Thorsten von Eicken. LogP: Towards a Realistic Model of Parallel Computation. In Proc. 4th Symposium on Principles and Practice of Parallel Programming (PPoPP'93), pages 1-12, San Diego, CA, USA, 1993.
-
(1993)
Proc. 4th Symposium on Principles and Practice of Parallel Programming (PPoPP'93)
, pp. 1-12
-
-
Culler, D.E.1
Karp, R.2
Patterson, D.A.3
Sahay, A.4
Schauser, K.E.5
Santos, E.E.6
Subramonian, R.7
Von Eicken, T.8
-
19
-
-
0028401457
-
The communication challenge for MPP: Intel Paragon and Meiko CS-2
-
Roger W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389-398, 1994.
-
(1994)
Parallel Computing
, vol.20
, Issue.3
, pp. 389-398
-
-
Hockney, R.W.1
-
21
-
-
77954006294
-
-
-
Finis Terrae. http://www.top500.org/system/9500, Last visit: January 2010.
-
(2010)
-
-