NVIDIA, "CUDA C Programming Guide, Version 4.2," April 2012.
Khronos OpenCL Working Group, "The OpenCL Specification, Version 1.1, Revision 44," June 2011.
OpenMP Architecture Review Board, "OpenMP Application Program Interface, Version 3.1," July 2011.
U. Drepper and I. Molnar, "The Native POSIX Thread Library for Linux," Red Hat, Tech. Rep., February 2003.
MPI Forum, "MPI: A Message-Passing Interface Standard, Version 3.0," July 1997.
S. Williams, A. Waterman, and D. Patterson, "Roofline: An Insightful Visual Performance Model for Multicore Architectures," Commun. ACM, vol. 52, no. 4, pp. 65-76, April 2009.
Intel Corporation, "Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual," September 2012, reference number 327364-001.
A. Heinecke, M. Klemm, and H.-J. Bungartz, "From GPGPUs to Many-Core: NVIDIA Fermi∗ and Intel® Many Integrated Core Architecture," Computing in Science and Engineering, vol. 14, no. 2, pp. 78-83, March-April 2012.
Intel Corporation, "Intel® C++ Compiler XE 13.0 User and Reference Guides," September 2012, document number 323273-130US.
D. Gutfreund, "Mesca BCS Systems," Bull SAS, rue Jean Jaurès, 78340 Les Clayes sous Bois, France, October 2012.
S. Wienke, D. Plotnikov, D. an Mey, C. Bischof, A. Hardjosuwito, C. Gorgels, and C. Brecher, "Simulation of bevel gear cutting with GPGPUs - performance and productivity," Computer Science - Research and Development, vol. 26, pp. 165-174, 2011.
K. W. Schulz, R. Ulerich, N. Malaya, P. T. Bauman, R. Stogner, and C. Simmons, "Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform," TACC-Intel Highly Parallel Computing Symposium, Tech. Rep., April 2012.
A. Heinecke, M. Klemm, D. Pflüger, A. Bode, and H.-J. Bungartz, "Extending a Highly Parallel Data Mining Algorithm to the Intel® Many Integrated Core Architecture," in Euro-Par 2011: Parallel Processing Workshops, Bordeaux, France, August 2011, pp. 375-384, LNCS 7156.
N. Bell and M. Garland, "Efficient Sparse Matrix-Vector Multiplication on CUDA," NVIDIA Corporation, Tech. Rep. NVR-2008-004, December 2008.
C. Terboven, D. an Mey, D. Schmidl, H. Jin, and T. Reichstein, "Data and Thread Affinity in OpenMP Programs," in Proc. of the 2008 Workshop on Memory Access on Future Processors: A Solved Problem?, Ischia, Italy, May 2008, pp. 377-384.
N. Berr, D. Schmidl, J. H. Göbbert, S. Lankes, D. an Mey, T. Bemmerl, and C. Bischof, "Trajectory-Search on ScaleMP's vSMP Architecture," Advances in Parallel Computing: Applications, Tools and Techniques on the Road to Exascale Computing, vol. 22, pp. 227-234, 2012.
D. Schmidl, C. Terboven, A. Wolf, D. an Mey, and C. H. Bischof, "How to Scale Nested OpenMP Applications on the ScaleMP vSMP Architecture," in Proc. of the IEEE Intl. Conf. on Cluster Computing, Heraklion, Greece, September 2010, pp. 29-37.
J. McCalpin, "STREAM: Sustainable Memory Bandwidth in High Performance Computers," http://www.cs.virginia.edu/stream, 1999, [Online, accessed 29-March-2012].
J. M. Bull, "Measuring Synchronisation and Scheduling Overheads in OpenMP," in Proc. of the 1st European Workshop on OpenMP, Lund, Sweden, October 1999, pp. 99-105.
M. R. Hestenes and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409-436, December 1952.
T. A. Davis, "University of Florida Sparse Matrix Collection," NA Digest, vol. 92, 1994.