-
1
-
-
77956773183
-
Extending OpenMP to survive the heterogeneous multi-core era
-
E. Ayguadé, R. M. Badia, P. Bellens, D. Cabrera, A. Duran, R. Ferrer, M. González, F. D. Igual, D. Jiménez-González, and J. Labarta. Extending OpenMP to survive the heterogeneous multi-core era. International Journal of Parallel Programming, 38(5-6):440-459, 2010.
-
(2010)
International Journal of Parallel Programming
, vol.38
, Issue.5-6
, pp. 440-459
-
-
Ayguadé, E.1
Badia, R.M.2
Bellens, P.3
Cabrera, D.4
Duran, A.5
Ferrer, R.6
González, M.7
Igual, F.D.8
Jiménez-González, D.9
Labarta, J.10
-
2
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '09), pages 163-174, 2009.
-
(2009)
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '09)
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
3
-
-
0015330108
-
The Illiac IV system
-
W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, and D. L. Slotnick. The Illiac IV system. Proc. of IEEE, 60(4):369-388, 1972.
-
(1972)
Proc. of IEEE
, vol.60
, Issue.4
, pp. 369-388
-
-
Bouknight, W.J.1
Denenberg, S.A.2
McIntyre, D.E.3
Randall, J.M.4
Sameh, A.H.5
Slotnick, D.L.6
-
5
-
-
84879815667
-
Dynamic task parallelism with a GPU work-stealing runtime system
-
S. Chatterjee, M. Grossman, A. S. Sbîrlea, and V. Sarkar. Dynamic task parallelism with a GPU work-stealing runtime system. In International Workshop on Languages and Compilers for Parallel Computing (LCPC'11), pages 203-217, 2011.
-
(2011)
International Workshop on Languages and Compilers for Parallel Computing (LCPC'11)
, pp. 203-217
-
-
Chatterjee, S.1
Grossman, M.2
Sbîrlea, A.S.3
Sarkar, V.4
-
6
-
-
84863351470
-
SIMD re-convergence at thread frontiers
-
G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu, and S. Yalamanchili. SIMD re-convergence at thread frontiers. In IEEE/ACM International Symposium on Microarchitecture (MICRO '11), pages 477-488, 2011.
-
(2011)
IEEE/ACM International Symposium on Microarchitecture (MICRO '11)
, pp. 477-488
-
-
Diamos, G.1
Ashbaugh, B.2
Maiyuran, S.3
Kerr, A.4
Wu, H.5
Yalamanchili, S.6
-
7
-
-
77951455429
-
Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP
-
A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In International Conference on Parallel Processing (ICPP '09), pages 124-131, 2009.
-
(2009)
International Conference on Parallel Processing (ICPP '09)
, pp. 124-131
-
-
Duran, A.1
Teruel, X.2
Ferrer, R.3
Martorell, X.4
Ayguade, E.5
-
8
-
-
84879816910
-
-
Dwarf mine. http://view.eecs.berkeley.edu/wiki/Dwarf\-Mine.
-
Dwarf Mine
-
-
-
9
-
-
34249839187
-
The area of the Mandelbrot set
-
J. Ewing and G. Schober. The area of the Mandelbrot set. Numerische Mathematik, 61(1):59-72, 1992.
-
(1992)
Numerische Mathematik
, vol.61
, Issue.1
, pp. 59-72
-
-
Ewing, J.1
Schober, G.2
-
12
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In IEEE/ACM International Symposium on Microarchitecture (MICRO '07), pages 407-420, 2007.
-
(2007)
IEEE/ACM International Symposium on Microarchitecture (MICRO '07)
, pp. 407-420
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
13
-
-
84947255974
-
Vectorization of multigrid codes using SIMD ISA extensions
-
C. Garcia, R. Lario, M. Prieto, L. Pinuel, and F. Tirado. Vectorization of multigrid codes using SIMD ISA extensions. In International Parallel and Distributed Processing Symposium (IPDPS '03), 8 pp., 2003.
-
(2003)
International Parallel and Distributed Processing Symposium (IPDPS '03)
, pp. 8
-
-
Garcia, C.1
Lario, R.2
Prieto, M.3
Pinuel, L.4
Tirado, F.5
-
14
-
-
84892549898
-
-
GPGPU-Sim 3.x manual. http://gpgpu-sim.org/manual/index.php5/GPGPU-Sim-3.x-Manual#Introduction.
-
GPGPU-Sim 3.X Manual
-
-
-
15
-
-
0034228634
-
2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing
-
N. Ide, M. Hirano, Y. Endo, S. Yoshioka, H. Murakami, A. Kunimatsu, T. Sato, T. Kamei, T. Okada, and M. Suzuoki. 2.44-GFLOPS 300-MHz floating-point vector-processing unit for high-performance 3D graphics computing. IEEE Journal of Solid-State Circuits, 35(7):1025-1033, Jul 2000.
-
(2000)
IEEE Journal of Solid-State Circuits
, vol.35
, Issue.7
, pp. 1025-1033
-
-
Ide, N.1
Hirano, M.2
Endo, Y.3
Yoshioka, S.4
Murakami, H.5
Kunimatsu, A.6
Sato, T.7
Kamei, T.8
Okada, T.9
Suzuoki, M.10
-
16
-
-
4544235747
-
Graph coloring algorithms
-
W. Klotz. Graph coloring algorithms. Mathematics Report, pages 1-9, 2002.
-
(2002)
Mathematics Report
, pp. 1-9
-
-
Klotz, W.1
-
18
-
-
0003657590
-
-
The art of computer programming, volume 1: Fundamental algorithms (3rd ed.)
-
D. E. Knuth. The art of computer programming, volume 1: Fundamental algorithms (3rd ed.). Addison Wesley Longman Publishing Co., Inc., 1997.
-
(1997)
The Art of Computer Programming
, vol.1
-
-
Knuth, D.E.1
-
20
-
-
77955001720
-
Method for conditional branch execution in SIMD vector processors
-
R. A. Lorie and H. R. Strong Jr. Method for conditional branch execution in SIMD vector processors, Mar. 6, 1984. US Patent 4,435,758.
-
(1984)
-
-
Lorie, R.A.1
Strong Jr., H.R.2
-
22
-
-
84863342255
-
Improving GPU performance via large warps and two-level warp scheduling
-
V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt. Improving GPU performance via large warps and two-level warp scheduling. In IEEE/ACM International Symposium on Microarchitecture (MICRO '11), pages 308-317, 2011.
-
(2011)
IEEE/ACM International Symposium on Microarchitecture (MICRO '11)
, pp. 308-317
-
-
Narasiman, V.1
Shebanow, M.2
Lee, C.J.3
Miftakhutdinov, R.4
Mutlu, O.5
Patt, Y.N.6
-
25
-
-
0033743209
-
Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors
-
T. Rognes and E. Seeberg. Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics, 16(8):699-706, 2000.
-
(2000)
Bioinformatics
, vol.16
, Issue.8
, pp. 699-706
-
-
Rognes, T.1
Seeberg, E.2
-
27
-
-
30744459395
-
RPU: A programmable ray processing unit for realtime ray tracing
-
S. Woop, J. Schmittler, and P. Slusallek. RPU: a programmable ray processing unit for realtime ray tracing. In ACM SIGGRAPH 2005 Papers, SIGGRAPH '05, pages 434-444, 2005.
-
(2005)
ACM SIGGRAPH 2005 Papers, SIGGRAPH '05
, pp. 434-444
-
-
Woop, S.1
Schmittler, J.2
Slusallek, P.3
-
28
-
-
70350600765
-
Stack-based parallel recursion on graphics processors
-
K. Yang, B. He, Q. Luo, P. V. Sander, and J. Shi. Stack-based parallel recursion on graphics processors. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09, pages 299-300, 2009.
-
(2009)
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09
, pp. 299-300
-
-
Yang, K.1
He, B.2
Luo, Q.3
Sander, P.V.4
Shi, J.5
-
29
-
-
84860322837
-
CPU-assisted GPGPU on fused CPU-GPU architectures
-
Y. Yang, P. Xiang, M. Mantor, and H. Zhou. CPU-assisted GPGPU on fused CPU-GPU architectures. In IEEE International Symposium on High Performance Computer Architecture (HPCA '12), pages 1-12. IEEE, 2012.
-
(2012)
IEEE International Symposium on High Performance Computer Architecture (HPCA '12)
, pp. 1-12
-
-
Yang, Y.1
Xiang, P.2
Mantor, M.3
Zhou, H.4