-
2
-
-
57949083229
-
A dependency-aware task-based programming environment for multi-core architectures
-
September
-
J. M. Perez, R. M. Badia, and J. Labarta, "A dependency-aware task-based programming environment for multi-core architectures," IEEE Int. Conference on Cluster Computing, pp. 142-151, September 2008.
-
(2008)
IEEE Int. Conference on Cluster Computing
, pp. 142-151
-
-
Perez, J.M.1
Badia, R.M.2
Labarta, J.3
-
3
-
-
35649006026
-
CellSs: Making it easier to program the Cell Broadband Engine processor
-
September
-
J. M. Perez, P. Bellens, R. M. Badia, and J. Labarta, "CellSs: Making it easier to program the Cell Broadband Engine processor," IBM Journal of Research and Development, vol. 51, no. 5, pp. 593-604, September 2007.
-
(2007)
IBM Journal of Research and Development
, vol.51
, Issue.5
, pp. 593-604
-
-
Perez, J.M.1
Bellens, P.2
Badia, R.M.3
Labarta, J.4
-
5
-
-
84893623161
-
Productive Cluster Programming with OmpSs
-
to appear
-
J. Bueno, L. Martinell, A. Duran, M. Farreras, X. Martorell, R. M. Badia, E. Ayguade, and J. Labarta, "Productive Cluster Programming with OmpSs," in Europar'11 (to appear), 2011.
-
(2011)
Europar'11
-
-
Bueno, J.1
Martinell, L.2
Duran, A.3
Farreras, M.4
Martorell, X.5
Badia, R.M.6
Ayguade, E.7
Labarta, J.8
-
6
-
-
0029191296
-
Cilk: An efficient multithreaded runtime system
-
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou, "Cilk: an efficient multithreaded runtime system," SIGPLAN Not., vol. 30, no. 8, pp. 207-216, 1995.
-
(1995)
SIGPLAN Not.
, vol.30
, Issue.8
, pp. 207-216
-
-
Blumofe, R.D.1
Joerg, C.F.2
Kuszmaul, B.C.3
Leiserson, C.E.4
Randall, K.H.5
Zhou, Y.6
-
7
-
-
67650056929
-
Extending the OpenMP Tasking Model to Allow Dependent Tasks
-
Springer Berlin / Heidelberg
-
A. Duran, J. M. Pérez, E. Ayguadé, R. M. Badia, and J. Labarta, "Extending the OpenMP Tasking Model to Allow Dependent Tasks," in OpenMP in a New Era of Parallelism. Springer Berlin / Heidelberg, 2008, pp. 111-122.
-
(2008)
OpenMP in A New Era of Parallelism
, pp. 111-122
-
-
Duran, A.1
Pérez, J.M.2
Ayguadé, E.3
Badia, R.M.4
Labarta, J.5
-
8
-
-
77954751089
-
Handling task dependencies under strided and aliased references
-
ser. ICS '10. New York, NY, USA: ACM
-
J. M. Perez, R. M. Badia, and J. Labarta, "Handling task dependencies under strided and aliased references," in Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10. New York, NY, USA: ACM, 2010, pp. 263-274.
-
(2010)
Proceedings of the 24th ACM International Conference on Supercomputing
, pp. 263-274
-
-
Perez, J.M.1
Badia, R.M.2
Labarta, J.3
-
9
-
-
77951980969
-
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
-
Dresden, Germany: Springer, June
-
E. Ayguade, R. M. Badia, D. Cabrera, A. Duran, M. Gonzalez, F. Igual, D. Jimenez, J. Labarta, X. Martorell, R. Mayo, J. M. Perez, and E. S. Quintana-Orti, "A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures," in IWOMP: Evolving OpenMP in an Age of Extreme Parallelism, vol. 5568. Dresden, Germany: Springer, June 2009, pp. 154-167.
-
(2009)
IWOMP: Evolving OpenMP in An Age of Extreme Parallelism
, vol.5568
, pp. 154-167
-
-
Ayguade, E.1
Badia, R.M.2
Cabrera, D.3
Duran, A.4
Gonzalez, M.5
Igual, F.6
Jimenez, D.7
Labarta, J.8
Martorell, X.9
Mayo, R.10
Perez, J.M.11
Quintana-Orti, E.S.12
-
11
-
-
79957528059
-
Trace-driven Simulation of Multithreaded Applications
-
to appear
-
"Trace-driven Simulation of Multithreaded Applications," in Proceedings of the 2011 ISPASS (to appear), 2011.
-
(2011)
Proceedings of the 2011 ISPASS
-
-
-
12
-
-
84866874333
-
-
Master's thesis, Computer Architecture Department, Universitat Politècnica de Catalunya
-
L. Martinell, "Memory usage improvements for the SMPSs runtime," Master's thesis, Computer Architecture Department, Universitat Politècnica de Catalunya, 2010.
-
(2010)
Memory Usage Improvements for the SMPSs Runtime
-
-
Martinell, L.1
-
18
-
-
31744441529
-
X10: An object-oriented approach to non-uniform cluster computing
-
ser. OOPSLA '05. New York, NY, USA: ACM
-
P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar, "X10: an object-oriented approach to non-uniform cluster computing," in Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, ser. OOPSLA '05. New York, NY, USA: ACM, 2005, pp. 519-538.
-
(2005)
Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications
, pp. 519-538
-
-
Charles, P.1
Grothoff, C.2
Saraswat, V.3
Donawa, C.4
Kielstra, A.5
Ebcioglu, K.6
Von Praun, C.7
Sarkar, V.8
-
19
-
-
34249696738
-
Parallel programmability and the chapel language
-
August
-
B. Chamberlain, D. Callahan, and H. Zima, "Parallel programmability and the chapel language," Int. J. High Perform. Comput. Appl., vol. 21, pp. 291-312, August 2007.
-
(2007)
Int. J. High Perform. Comput. Appl.
, vol.21
, pp. 291-312
-
-
Chamberlain, B.1
Callahan, D.2
Zima, H.3
-
20
-
-
77749249189
-
Effective communication and computation overlap with hybrid mpi/smpss
-
ser. PPoPP '10. New York, NY, USA: ACM
-
E. A. V. Marjanovic, J. Labarta and M. Valero, "Effective communication and computation overlap with hybrid mpi/smpss," in Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming, ser. PPoPP '10. New York, NY, USA: ACM, 2010, pp. 337-338.
-
(2010)
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 337-338
-
-
Marjanovic, E.A.V.1
Labarta, J.2
Valero, M.3
-
21
-
-
77952597755
-
CUDA-lite: Reducing GPU Programming Complexity
-
S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-m. W. Hwu, "CUDA-lite: Reducing GPU Programming Complexity," in Languages and Compilers for Parallel Computing (LCPC) 21st Annual Workshop, August 2008.
-
Languages and Compilers for Parallel Computing (LCPC) 21st Annual Workshop, August 2008
-
-
Ueng, S.-Z.1
Lathara, M.2
Baghsorkhi, S.S.3
Hwu, W.-M.W.4
-
22
-
-
79951728783
-
-
8 December [Online]. Available
-
Khronos OpenCL Working Group, The OpenCL Specification, version 1.0.29, 8 December 2008. [Online]. Available: http://khronos.org/registry/cl/specs/opencl-1.0.29.pdf
-
(2008)
The OpenCL Specification, Version 1.0.29
-
-
-
23
-
-
78650835532
-
190 tflops astrophysical n-body simulation on a cluster of gpus
-
ser. SC '10. Washington, DC, USA: IEEE Computer Society
-
T. Hamada and K. Nitadori, "190 tflops astrophysical n-body simulation on a cluster of gpus," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-9.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
, pp. 1-9
-
-
Hamada, T.1
Nitadori, K.2
-
24
-
-
84856841346
-
Performance analysis of a hybrid mpi/cuda implementation of the naslu benchmark
-
March
-
S. J. Pennycook, S. D. Hammond, S. A. Jarvis, and G. R. Mudalige, "Performance analysis of a hybrid mpi/cuda implementation of the naslu benchmark," SIGMETRICS Perform. Eval. Rev., vol. 38, pp. 23-29, March 2011.
-
(2011)
SIGMETRICS Perform. Eval. Rev.
, vol.38
, pp. 23-29
-
-
Pennycook, S.J.1
Hammond, S.D.2
Jarvis, S.A.3
Mudalige, G.R.4
-
27
-
-
67650686517
-
Accelerating linpack with cuda on heterogenous clusters
-
ser. GPGPU-2. New York, NY, USA: ACM
-
M. Fatica, "Accelerating linpack with cuda on heterogenous clusters," in Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, ser. GPGPU-2. New York, NY, USA: ACM, 2009, pp. 46-51.
-
(2009)
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
, pp. 46-51
-
-
Fatica, M.1
-
28
-
-
78650802947
-
Openmpc: Extended openmp programming and tuning for gpus
-
ser. SC '10. Washington, DC, USA: IEEE Computer Society
-
S. Lee and R. Eigenmann, "Openmpc: Extended openmp programming and tuning for gpus," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-11.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
, pp. 1-11
-
-
Lee, S.1
Eigenmann, R.2
-
29
-
-
79952596877
-
Unified parallel c for gpu clusters: Language extensions and compiler implementation
-
Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, K. Cooper, J. Mellor-Crummey, and V. Sarkar, Eds. Springer Berlin / Heidelberg
-
L. Chen, L. Liu, S. Tang, L. Huang, Z. Jing, S. Xu, D. Zhang, and B. Shou, "Unified parallel c for gpu clusters: Language extensions and compiler implementation," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, K. Cooper, J. Mellor-Crummey, and V. Sarkar, Eds. Springer Berlin / Heidelberg, 2011, vol. 6548, pp. 151-165.
-
(2011)
Lecture Notes in Computer Science
, vol.6548
, pp. 151-165
-
-
Chen, L.1
Liu, L.2
Tang, S.3
Huang, L.4
Jing, Z.5
Xu, S.6
Zhang, D.7
Shou, B.8
-
31
-
-
79959601133
-
Mint: Realizing cuda performance in 3d stencil methods with annotated c
-
ser. ICS '11. New York, NY, USA: ACM
-
D. Unat, X. Cai, and S. B. Baden, "Mint: Realizing cuda performance in 3d stencil methods with annotated c," in Proceedings of the 25th ACM International Conference on Supercomputing, ser. ICS '11. New York, NY, USA: ACM, 2011, pp. 214-224.
-
(2011)
Proceedings of the 25th ACM International Conference on Supercomputing
, pp. 214-224
-
-
Unat, D.1
Cai, X.2
Baden, S.B.3
-
33
-
-
78649498878
-
Offload - Automating code migration to heterogeneous multicore systems
-
Lecture Notes in Computer Science
-
P. Cooper, U. Dolinsky, A. F. Donaldson, A. Richards, C. Riley, and G. Russell, "Offload - automating code migration to heterogeneous multicore systems," in Lecture Notes in Computer Science, HiPEAC Conference 2010, 2010, pp. 307-321.
-
(2010)
HiPEAC Conference 2010
, pp. 307-321
-
-
Cooper, P.1
Dolinsky, U.2
Donaldson, A.F.3
Richards, A.4
Riley, C.5
Russell, G.6
-
34
-
-
34748865391
-
Compilation for explicitly managed memory hierarchies
-
T. J. Knight, J. Y. Park, M. Ren, M. Houston, M. Erez, K. Fatahalian, A. Aiken, W. J. Dally, and P. Hanrahan, "Compilation for explicitly managed memory hierarchies," in Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007.
-
Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
-
-
Knight, T.J.1
Park, J.Y.2
Ren, M.3
Houston, M.4
Erez, M.5
Fatahalian, K.6
Aiken, A.7
Dally, W.J.8
Hanrahan, P.9
-
35
-
-
84865717999
-
-
Portland Group Inc., Sep
-
Portland Group Inc., "PGI Accelerator Compilers," Sep 2011.
-
(2011)
PGI Accelerator Compilers
-
-