-
1
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
R. D. Blumofe et al. Aug.
-
R. D. Blumofe et al. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207-216, Aug. 1995.
-
(1995)
SIGPLAN Not.
, vol.30
, Issue.8
, pp. 207-216
-
-
-
3
-
-
70450059008
-
Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors
-
May
-
M. Boyer et al. Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors. In IPDPS'09, May 2009.
-
(2009)
IPDPS'09
-
-
Boyer, M.1
-
4
-
-
59749100826
-
Optimization and performance modeling of stencil computations on modern microprocessors
-
Feb.
-
K. Datta et al. Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Rev., 51(1):129-159, Feb. 2009.
-
(2009)
SIAM Rev.
, vol.51
, Issue.1
, pp. 129-159
-
-
Datta, K.1
-
7
-
-
84899688523
-
Lists of instruction latencies, throughputs and micro-operation breakdowns
-
Feb.
-
A. Fog. Lists of instruction latencies, throughputs and micro-operation breakdowns. Technical report, Copenhagen University, Feb. 2012.
-
(2012)
Technical Report, Copenhagen University
-
-
Fog, A.1
-
8
-
-
84880053798
-
Modeling communication in cache-coherent SMP systems - A case-study with Xeon Phi
-
S. R. Garea and T. Hoefler. Modeling Communication in Cache-Coherent SMP Systems - A Case-Study with Xeon Phi. In HPDC'13, 2013.
-
(2013)
HPDC'13
-
-
Garea, S.R.1
Hoefler, T.2
-
9
-
-
33746763319
-
Instruction latencies and throughput for AMD and Intel x86 processors
-
Feb.
-
T. Granlund. Instruction latencies and throughput for AMD and Intel x86 processors. Technical report, KTH, Feb. 2012.
-
(2012)
Technical Report, KTH
-
-
Granlund, T.1
-
15
-
-
84875664258
-
-
Intel April
-
Intel. Intel Xeon Phi Coprocessor. http://software.intel.com/mic-developer, April 2013.
-
(2013)
Intel Xeon Phi Coprocessor
-
-
-
17
-
-
84865353129
-
Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
June
-
V. W. Lee et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Comput. Archit. News, 38(3), June 2010.
-
(2010)
SIGARCH Comput. Archit. News
, vol.38
, Issue.3
-
-
Lee, V.W.1
-
18
-
-
85084160699
-
lmbench: Portable tools for performance analysis
-
L. McVoy et al. lmbench: portable tools for performance analysis. In USENIX ATEC'96, 1996.
-
(1996)
USENIX ATEC'96
-
-
McVoy, L.1
-
19
-
-
70449643566
-
Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system
-
Sept.
-
D. Molka et al. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In PACT'09, Sept. 2009.
-
(2009)
PACT'09
-
-
Molka, D.1
-
20
-
-
48149094931
-
Memory hierarchy performance measurement of commercial dual-core desktop processors
-
Aug.
-
L. Peng et al. Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture, 54(8):816-828, Aug. 2008.
-
(2008)
Journal of Systems Architecture
, vol.54
, Issue.8
, pp. 816-828
-
-
Peng, L.1
-
21
-
-
10044237712
-
Motion gradient vector flow: An external force for tracking rolling leukocytes with shape and size constrained active contours
-
IEEE Transactions on, Dec.
-
N. Ray et al. Motion gradient vector flow: an external force for tracking rolling leukocytes with shape and size constrained active contours. IEEE Transactions on Medical Imaging, Dec. 2004.
-
(2004)
IEEE Transactions on Medical Imaging
-
-
Ray, N.1
-
22
-
-
84866875424
-
Radio astronomy beam forming on many-core architectures
-
A. Sclocco et al. Radio astronomy beam forming on Many-Core architectures. In IPDPS, 2012.
-
(2012)
IPDPS
-
-
Sclocco, A.1
-
23
-
-
0000718681
-
Measuring cache and TLB performance and their effect on benchmark runtimes
-
Oct.
-
A. J. Smith et al. Measuring cache and TLB performance and their effect on benchmark runtimes. IEEE Trans. Comput., (10), Oct. 1995.
-
(1995)
IEEE Trans. Comput.
, Issue.10
-
-
Smith, A.J.1
-
24
-
-
77952162137
-
OpenCL: A parallel programming standard for heterogeneous computing systems
-
May
-
J. E. Stone et al. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering, 12(3):66-72, May 2010.
-
(2010)
Computing in Science & Engineering
, vol.12
, Issue.3
, pp. 66-72
-
-
Stone, J.E.1
-
26
-
-
84899670707
-
Automatic OpenCL device characterization: Guiding optimized kernel design
-
P. Thoman et al. Automatic OpenCL device characterization: Guiding optimized kernel design. In Euro-Par'11, 2011.
-
(2011)
Euro-Par'11
-
-
Thoman, P.1
-
27
-
-
60649117768
-
Building high-resolution sky images using the Cell/B.E.
-
A. L. Varbanescu, A. S. van Amesfoort, T. Cornwell, G. van Diepen, R. van Nieuwpoort, B. G. Elmegreen, and H. J. Sips. Building high-resolution sky images using the Cell/B.E. Scientific Programming, 17(1-2):113-134, 2009.
-
(2009)
Scientific Programming
, vol.17
, Issue.1-2
, pp. 113-134
-
-
Varbanescu, A.L.1
Van Amesfoort, A.S.2
Cornwell, T.3
Van Diepen, G.4
Van Nieuwpoort, R.5
Elmegreen, B.G.6
Sips, H.J.7
-
28
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
SC 2008. International Conference for IEEE, Nov.
-
V. Volkov and J. W. Demmel. Benchmarking GPUs to tune dense linear algebra. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2008), pages 1-11. IEEE, Nov. 2008.
-
(2008)
High Performance Computing, Networking, Storage and Analysis, 2008
, pp. 1-11
-
-
Volkov, V.1
Demmel, J.W.2
-
29
-
-
77952579552
-
Demystifying GPU microarchitecture through microbenchmarking
-
IEEE, Mar.
-
H. Wong, M.-M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos. Demystifying GPU microarchitecture through microbenchmarking. In 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pages 235-246. IEEE, Mar. 2010.
-
(2010)
2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
, pp. 235-246
-
-
Wong, H.1
Papadopoulou, M.-M.2
Sadooghi-Alvandi, M.3
Moshovos, A.4