-
1
-
-
79952784184
-
Copperhead: Compiling an embedded data parallel language
-
New York, NY, USA: ACM
-
B. Catanzaro, M. Garland, and K. Keutzer, "Copperhead: compiling an embedded data parallel language," in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, ser. PPoPP. New York, NY, USA: ACM, 2011, pp. 47-56.
-
(2011)
Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP
, pp. 47-56
-
-
Catanzaro, B.1
Garland, M.2
Keutzer, K.3
-
2
-
-
84860295764
-
Nikola: Embedding compiled GPU functions in Haskell
-
New York, NY, USA: ACM
-
G. Mainland and G. Morrisett, "Nikola: embedding compiled GPU functions in Haskell," in Proceedings of the third ACM Haskell symposium on Haskell, ser. Haskell '10. New York, NY, USA: ACM, 2010, pp. 67-78.
-
(2010)
Proceedings of the Third ACM Haskell Symposium on Haskell, Ser. Haskell '10
, pp. 67-78
-
-
Mainland, G.1
Morrisett, G.2
-
3
-
-
84887171337
-
Optimising purely functional GPU programs
-
New York, NY, USA: ACM. [Online]
-
T. L. McDonell, M. M. Chakravarty, G. Keller, and B. Lippmeier, "Optimising purely functional GPU programs," in Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '13. New York, NY, USA: ACM, 2013, pp. 49-60. [Online]. Available: http://doi.acm.org/10.1145/2500365.2500595
-
(2013)
Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, Ser. ICFP '13
, pp. 49-60
-
-
McDonell, T.L.1
Chakravarty, M.M.2
Keller, G.3
Lippmeier, B.4
-
4
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
Oct.
-
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, Oct 2009, pp. 44-54.
-
(2009)
Workload Characterization 2009. IISWC 2009. IEEE International Symposium on
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
6
-
-
81455154935
-
Firepile: Run-time compilation for GPUs in Scala
-
New York, NY, USA: ACM
-
N. Nystrom, D. White, and K. Das, "Firepile: run-time compilation for GPUs in Scala," in Proceedings of the 10th ACM international conference on Generative programming and component engineering, ser. GPCE. New York, NY, USA: ACM, 2011, pp. 107-116.
-
(2011)
Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering, Ser. GPCE
, pp. 107-116
-
-
Nystrom, N.1
White, D.2
Das, K.3
-
7
-
-
79952811127
-
Accelerating CUDA graph algorithms at maximum warp
-
S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun, "Accelerating CUDA graph algorithms at maximum warp," in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, ser. PPoPP, 2011, pp. 267-276.
-
(2011)
Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP
, pp. 267-276
-
-
Hong, S.1
Kim, S.K.2
Oguntebi, T.3
Olukotun, K.4
-
8
-
-
84863015363
-
A heterogeneous parallel framework for domain-specific languages
-
K. J. Brown, A. K. Sujeeth, H. Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun, "A heterogeneous parallel framework for domain-specific languages," ser. PACT, 2011.
-
(2011)
Ser. PACT
-
-
Brown, K.J.1
Sujeeth, A.K.2
Lee, H.3
Rompf, T.4
Chafi, H.5
Odersky, M.6
Olukotun, K.7
-
10
-
-
84880234733
-
Harnessing the multicores: Nested data parallelism in Haskell
-
S. L. P. Jones, R. Leshchinskiy, G. Keller, and M. M. T. Chakravarty, "Harnessing the multicores: Nested data parallelism in Haskell," in FSTTCS, 2008, pp. 383-414.
-
(2008)
FSTTCS
, pp. 383-414
-
-
Jones, S.L.P.1
Leshchinskiy, R.2
Keller, G.3
Chakravarty, M.M.T.4
-
11
-
-
84872943015
-
Polyhedral parallel code generation for CUDA
-
Jan. [Online]
-
S. Verdoolaege, J. Carlos Juega, A. Cohen, J. Ignacio Gómez, C. Tenllado, and F. Catthoor, "Polyhedral parallel code generation for CUDA," ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 54:1-54:23, Jan. 2013. [Online]. Available: http://doi.acm.org/10.1145/2400682.2400713
-
(2013)
ACM Trans. Archit. Code Optim.
, vol.9
, Issue.4
, pp. 54:1-54:23
-
-
Verdoolaege, S.1
Carlos Juega, J.2
Cohen, A.3
Ignacio Gómez, J.4
Tenllado, C.5
Catthoor, F.6
-
12
-
-
84908238161
-
Par4All: From convex array regions to heterogeneous computing
-
M. Amini, O. Goubier, S. Guelton, J. O. McMahon, F.-X. Pasquier, G. Péan, and P. Villalon, "Par4All: From convex array regions to heterogeneous computing," in Second International Workshop on Polyhedral Compilation Techniques, ser. IMPACT 2012, 2012.
-
(2012)
Second International Workshop on Polyhedral Compilation Techniques, Ser. IMPACT 2012
-
-
Amini, M.1
Goubier, O.2
Guelton, S.3
McMahon, J.O.4
Pasquier, F.-X.5
Péan, G.6
Villalon, P.7
-
13
-
-
0003780986
-
-
Stanford InfoLab, Technical Report 1999-66, November 1999, previous number = SIDL-WP-1999-0120. [Online]
-
L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web." Stanford InfoLab, Technical Report 1999-66, November 1999, previous number = SIDL-WP-1999-0120. [Online]. Available: http://ilpubs.stanford.edu:8090/422/
-
The PageRank Citation Ranking: Bringing Order to the Web
-
-
Page, L.1
Brin, S.2
Motwani, R.3
Winograd, T.4
-
14
-
-
85162467517
-
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
-
F. Niu, B. Recht, C. Ré, and S. J. Wright, "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," Advances in Neural Information Processing Systems, vol. 24, pp. 693-701, 2011.
-
(2011)
Advances in Neural Information Processing Systems
, vol.24
, pp. 693-701
-
-
Niu, F.1
Recht, B.2
Ré, C.3
Wright, S.J.4
-
15
-
-
63449118443
-
Using generalized ensemble simulations and Markov state models to identify conformational states
-
[Online]
-
G. R. Bowman, X. Huang, and V. S. Pande, "Using generalized ensemble simulations and Markov state models to identify conformational states," Methods, vol. 49, no. 2, pp. 197-201, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1046202309000978
-
(2009)
Methods
, vol.49
, Issue.2
, pp. 197-201
-
-
Bowman, G.R.1
Huang, X.2
Pande, V.S.3
-
16
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
New York, NY, USA: ACM. [Online]
-
S. Hong and H. Kim, "An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY, USA: ACM, 2009, pp. 152-163. [Online]. Available: http://doi.acm.org/10.1145/1555754.1555775
-
(2009)
Proceedings of the 36th Annual International Symposium on Computer Architecture, Ser. ISCA '09
, pp. 152-163
-
-
Hong, S.1
Kim, H.2
-
17
-
-
77749337497
-
An adaptive performance modeling tool for GPU architectures
-
New York, NY, USA: ACM. [Online]
-
S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '10. New York, NY, USA: ACM, 2010, pp. 105-114. [Online]. Available: http://doi.acm.org/10.1145/1693453.1693470
-
(2010)
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '10
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
19
-
-
58449127539
-
CUDAlite: Reducing GPU programming complexity
-
Springer
-
S.-Z. Ueng, M. Lathara, S. S. Baghsorkhi, and W.-m. W. Hwu, "CUDAlite: Reducing GPU programming complexity," in Languages and Compilers for Parallel Computing. Springer, 2008, pp. 1-15.
-
(2008)
Languages and Compilers for Parallel Computing
, pp. 1-15
-
-
Ueng, S.-Z.1
Lathara, M.2
Baghsorkhi, S.S.3
Hwu, W.-M.W.4
-
20
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
New York, NY, USA: ACM. [Online]
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou, "A GPGPU compiler for memory optimization and parallelism management," in Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '10. New York, NY, USA: ACM, 2010, pp. 86-97. [Online]. Available: http://doi.acm.org/10.1145/1806596.1806606
-
(2010)
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '10
, pp. 86-97
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
21
-
-
84863827989
-
Sponge: Portable stream programming on graphics engines
-
ACM
-
A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke, "Sponge: portable stream programming on graphics engines," in ACM SIGPLAN Notices, vol. 46, no. 3. ACM, 2011, pp. 381-392.
-
(2011)
ACM SIGPLAN Notices
, vol.46
, Issue.3
, pp. 381-392
-
-
Hormati, A.H.1
Samadi, M.2
Woh, M.3
Mudge, T.4
Mahlke, S.5
-
22
-
-
84959045524
-
StreamIt: A language for streaming applications
-
Springer
-
W. Thies, M. Karczmarek, and S. Amarasinghe, "StreamIt: A language for streaming applications," in Compiler Construction. Springer, 2002, pp. 179-196.
-
(2002)
Compiler Construction
, pp. 179-196
-
-
Thies, W.1
Karczmarek, M.2
Amarasinghe, S.3
-
23
-
-
67650563116
-
Software pipelined execution of stream programs on GPUs
-
IEEE
-
A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil, "Software pipelined execution of stream programs on GPUs," in Code Generation and Optimization, 2009. CGO 2009. International Symposium on. IEEE, 2009, pp. 200-209.
-
(2009)
Code Generation and Optimization 2009. CGO 2009. International Symposium on
, pp. 200-209
-
-
Udupa, A.1
Govindarajan, R.2
Thazhuthaveetil, M.J.3
-
24
-
-
79959904195
-
Automatic CPU-GPU communication management and optimization
-
T. B. Jablin, P. Prabhu, J. A. Jablin, N. P. Johnson, S. R. Beard, and D. I. August, "Automatic CPU-GPU communication management and optimization," ACM SIGPLAN Notices, vol. 46, no. 6, pp. 142-151, 2011.
-
(2011)
ACM SIGPLAN Notices
, vol.46
, Issue.6
, pp. 142-151
-
-
Jablin, T.B.1
Prabhu, P.2
Jablin, J.A.3
Johnson, N.P.4
Beard, S.R.5
August, D.I.6
-
25
-
-
78650145768
-
Lime: A Java-compatible and synthesizable language for heterogeneous architectures
-
New York, NY, USA: ACM
-
J. Auerbach, D. F. Bacon, P. Cheng, and R. Rabbah, "Lime: a Java-compatible and synthesizable language for heterogeneous architectures," in Proceedings of the ACM international conference on Object oriented programming systems languages and applications, ser. OOPSLA. New York, NY, USA: ACM, 2010, pp. 89-108.
-
(2010)
Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, Ser. OOPSLA
, pp. 89-108
-
-
Auerbach, J.1
Bacon, D.F.2
Cheng, P.3
Rabbah, R.4
-
26
-
-
84863463369
-
Compiling a high-level language for GPUs: (Via language support for architectures and compilers)
-
C. Dubach, P. Cheng, R. Rabbah, D. F. Bacon, and S. J. Fink, "Compiling a high-level language for GPUs: (via language support for architectures and compilers)," in Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, ser. PLDI '12, 2012, pp. 1-12.
-
(2012)
Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, Ser. PLDI '12
, pp. 1-12
-
-
Dubach, C.1
Cheng, P.2
Rabbah, R.3
Bacon, D.F.4
Fink, S.J.5
-
27
-
-
84889679621
-
Dandelion: A compiler and runtime for heterogeneous systems
-
ACM
-
C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly, "Dandelion: a compiler and runtime for heterogeneous systems," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013, pp. 49-68.
-
(2013)
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
, pp. 49-68
-
-
Rossbach, C.J.1
Yu, Y.2
Currey, J.3
Martin, J.-P.4
Fetterly, D.5
-
28
-
-
84867546922
-
Nested data-parallelism on the GPU
-
New York, NY, USA: ACM. [Online]
-
L. Bergstrom and J. Reppy, "Nested data-parallelism on the GPU," in Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '12. New York, NY, USA: ACM, 2012, pp. 247-258. [Online]. Available: http://doi.acm.org/10.1145/2364527.2364563
-
(2012)
Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, Ser. ICFP '12
, pp. 247-258
-
-
Bergstrom, L.1
Reppy, J.2
-
29
-
-
84896893237
-
CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications
-
New York, NY, USA: ACM. [Online]
-
Y. Yang and H. Zhou, "CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications," in Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '14. New York, NY, USA: ACM, 2014, pp. 93-106. [Online]. Available: http://doi.acm.org/10.1145/2555243.2555254
-
(2014)
Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Ser. PPoPP '14
, pp. 93-106
-
-
Yang, Y.1
Zhou, H.2