-
1
-
-
0014620011
-
TRANQUIL: A language for an array processing computer
-
ACM
-
Norma E. Abel, Paul P. Budnik, David J. Kuck, Yoichi Muraoka, Robert S. Northcote, and Robert B. Wilhelmson. TRANQUIL: a language for an array processing computer. In AFIPS, pages 57-73. ACM, 1969.
-
(1969)
AFIPS
, pp. 57-73
-
-
Abel, N.E.1
Budnik, P.P.2
Kuck, D.J.3
Muraoka, Y.4
Northcote, R.S.5
Wilhelmson, R.B.6
-
3
-
-
77749337497
-
An adaptive performance modeling tool for GPU architectures
-
ACM
-
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, and Wen-mei W. Hwu. An adaptive performance modeling tool for GPU architectures. In PPoPP, pages 105-114. ACM, 2010.
-
(2010)
PPoPP
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
4
-
-
0025545476
-
Vcode: A data-parallel intermediate language
-
ACM
-
Guy Blelloch and Siddhartha Chatterjee. Vcode: A data-parallel intermediate language. In FMPC, pages 471-480. ACM, 1990.
-
(1990)
FMPC
, pp. 471-480
-
-
Blelloch, G.1
Chatterjee, S.2
-
5
-
-
0026923480
-
Control structures for data-parallel SIMD languages: Semantics and implementation
-
DOI 10.1016/0167-739X(92)90069-N
-
Luc Bougé and Jean-Luc Levaire. Control structures for data-parallel SIMD languages: semantics and implementation. Future Generation Computer Systems, 8(4):363-378, 1992. (Pubitemid 23556759)
-
(1992)
Future Generation Computer Systems
, vol.8
, Issue.4
, pp. 363-378
-
-
Bougé, L.1
Levaire, J.-L.2
-
6
-
-
0015330108
-
The Illiac IV system
-
W.J. Bouknight, Stewart A. Denenberg, David E. McIntyre, J. M. Randall, Ahmed H. Sameh, and Daniel L. Slotnick. The Illiac IV system. Proceedings of the IEEE, 60(4):369-388, 1972.
-
(1972)
Proceedings of the IEEE
, vol.60
, Issue.4
, pp. 369-388
-
-
Bouknight, W.J.1
Denenberg, S.A.2
McIntyre, D.E.3
Randall, J.M.4
Sameh, A.H.5
Slotnick, D.L.6
-
7
-
-
0031385522
-
Efficient oblivious parallel sorting on the MasPar MP-1
-
Klaus Brockmann and Rolf Wanka. Efficient oblivious parallel sorting on the MasPar MP-1. ICSS, 1:200, 1997.
-
(1997)
ICSS
, vol.1
, pp. 200
-
-
Brockmann, K.1
Wanka, R.2
-
8
-
-
78650745912
-
GPU-quicksort: A practical quicksort algorithm for graphics processors
-
Daniel Cederman and Philippas Tsigas. GPU-quicksort: A practical quicksort algorithm for graphics processors. Journal of Experimental Algorithmics, 14(1):4-24, 2009.
-
(2009)
Journal of Experimental Algorithmics
, vol.14
, Issue.1
, pp. 4-24
-
-
Cederman, D.1
Tsigas, P.2
-
9
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
IEEE
-
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, pages 44-54. IEEE, 2009.
-
(2009)
IISWC
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
10
-
-
84856559490
-
Dynamic detection of uniform and affine vectors in GPGPU computations
-
Springer
-
Sylvain Collange, David Defour, and Yao Zhang. Dynamic detection of uniform and affine vectors in GPGPU computations. In HPPC, pages 46-55. Springer, 2009.
-
(2009)
HPPC
, pp. 46-55
-
-
Collange, S.1
Defour, D.2
Zhang, Y.3
-
11
-
-
78650730073
-
Performance debugging of GPGPU applications with the divergence map
-
IEEE
-
Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintao Pereira, and Wagner Meira Jr. Performance debugging of GPGPU applications with the divergence map. In SBAC-PAD, pages 33-40. IEEE, 2010.
-
(2010)
SBAC-PAD
, pp. 33-40
-
-
Coutinho, B.1
Sampaio, D.2
Pereira, F.M.Q.3
Meira Jr., W.4
-
12
-
-
0026243790
-
Efficiently computing static single assignment form and the control dependence graph
-
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451-490, 1991.
-
(1991)
TOPLAS
, vol.13
, Issue.4
, pp. 451-490
-
-
Cytron, R.1
Ferrante, J.2
Rosen, B.K.3
Wegman, M.N.4
Zadeck, F.K.5
-
13
-
-
78149233155
-
Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
-
Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. In PACT, pages 354-364, 2010.
-
(2010)
PACT
, pp. 354-364
-
-
Diamos, G.1
Kerr, A.2
Yalamanchili, S.3
Clark, N.4
-
14
-
-
51049096377
-
Massive supercomputing coping with heterogeneity of modern accelerators
-
IEEE
-
T Endo and S Matsuoka. Massive supercomputing coping with heterogeneity of modern accelerators. In IPDPS, pages 1-10. IEEE, 2008.
-
(2008)
IPDPS
, pp. 1-10
-
-
Endo, T.1
Matsuoka, S.2
-
15
-
-
0347244078
-
Formal specification of parallel SIMD execution
-
DOI 10.1016/S0304-3975(96)00113-2, PII S0304397596001132
-
Craig A. Farrell and Dorota H. Kieronska. Formal specification of parallel SIMD execution. Theo. Comp. Science, 169(1):39-65, 1996. (Pubitemid 126412425)
-
(1996)
Theoretical Computer Science
, vol.169
, Issue.1
, pp. 39-65
-
-
Farrell, C.A.1
Kieronska, D.H.2
-
17
-
-
0015401565
-
Some computer organizations and their effectiveness
-
Michael J. Flynn. Some computer organizations and their effectiveness. IEEE Trans. Comput., C-21:948+, 1972.
-
(1972)
IEEE Trans. Comput.
, vol.C-21
-
-
Flynn, M.J.1
-
18
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
IEEE
-
Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO, pages 407-420. IEEE, 2007.
-
(2007)
MICRO
, pp. 407-420
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
19
-
-
78149258346
-
Understanding throughput-oriented architectures
-
Michael Garland and David B. Kirk. Understanding throughput-oriented architectures. Commun. ACM, 53:58-66, 2010.
-
(2010)
Commun. ACM
, vol.53
, pp. 58-66
-
-
Garland, M.1
Kirk, D.B.2
-
20
-
-
56749137408
-
-
Technical Report Initial release on February 14, 2007, NVIDIA
-
Mark Harris. The parallel prefix sum (scan) with CUDA. Technical Report Initial release on February 14, 2007, NVIDIA, 2008.
-
(2008)
The Parallel Prefix Sum (Scan) with CUDA
-
-
Harris, M.1
-
22
-
-
70649104826
-
A characterization and analysis of PTX kernels
-
IEEE
-
Andrew Kerr, Gregory F. Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In IISWC, pages 3-12. IEEE, 2009.
-
(2009)
IISWC
, pp. 3-12
-
-
Kerr, A.1
Diamos, G.F.2
Yalamanchili, S.3
-
23
-
-
84956982868
-
POMP, or how to design a massively parallel machine with small developments
-
Springer
-
R. Keryell, Ph. Matherat, and N. Paris. POMP, or how to design a massively parallel machine with small developments. In PARLE, pages 83-100. Springer, 1991.
-
(1991)
PARLE
, pp. 83-100
-
-
Keryell, R.1
Matherat, Ph.2
Paris, N.3
-
24
-
-
0020203229
-
Wavefront array processor: language, architecture, and applications
-
Sun-Yuan Kung, K. S. Arun, R. J. Gal-Ezer, and D. V. Bhaskar Rao. Wavefront array processor: Language, architecture, and applications. IEEE Trans. Comput., 31:1054-1066, 1982. (Pubitemid 13478801)
-
(1982)
IEEE Transactions on Computers
, vol.C-31
, Issue.11
, pp. 1054-1066
-
-
Kung, S.-Y.1
Arun, K.S.2
Gal-Ezer, R.J.3
Bhaskar Rao, D.V.4
-
25
-
-
3042658703
-
LLVM: A compilation framework for lifelong program analysis & transformation
-
IEEE
-
Chris Lattner and Vikram S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, pages 75-88. IEEE, 2004.
-
(2004)
CGO
, pp. 75-88
-
-
Lattner, C.1
Adve, V.S.2
-
26
-
-
0016486794
-
Glypnir-a programming language for Illiac IV
-
Duncan H. Lawrie, T. Layman, D. Baer, and J. M. Randal. Glypnir-a programming language for Illiac IV. Commun. ACM, 18(3):157-164, 1975.
-
(1975)
Commun. ACM
, vol.18
, Issue.3
, pp. 157-164
-
-
Lawrie, D.H.1
Layman, T.2
Baer, D.3
Randal, J.M.4
-
28
-
-
77951154340
-
The GPU computing era
-
John Nickolls and William J. Dally. The GPU computing era. IEEE Micro, 30:56-69, 2010.
-
(2010)
IEEE Micro
, vol.30
, pp. 56-69
-
-
Nickolls, J.1
Dally, W.J.2
-
29
-
-
77951148621
-
Graphics and computing GPUs
-
(Patterson and Hennessy), chapter A. Elsevier, 4th edition
-
John Nickolls and David Kirk. Graphics and Computing GPUs. Computer Organization and Design, (Patterson and Hennessy), chapter A, pages A.1 - A.77. Elsevier, 4th edition, 2009.
-
(2009)
Computer Organization and Design
-
-
Nickolls, J.1
Kirk, D.2
-
30
-
-
84963624364
-
The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages
-
ACM
-
Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages. In PLDI, pages 257-271. ACM, 1990.
-
(1990)
PLDI
, pp. 257-271
-
-
Ottenstein, K.J.1
Ballance, R.A.2
MacCabe, A.B.3
-
31
-
-
84856548513
-
-
Fernando M. Q. Pereira, 2011. http://divmap.wordpress.com/.
-
(2011)
-
-
Pereira, F.M.Q.1
-
32
-
-
84976791215
-
A language for array and vector processors
-
R. H. Perrot. A language for array and vector processors. TOPLAS, 1:177-195, 1979.
-
(1979)
TOPLAS
, vol.1
, pp. 177-195
-
-
Perrot, R.H.1
-
33
-
-
49249086142
-
Larrabee: A many-core x86 architecture for visual computing
-
Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Espasa, Ed Grochowski, Toni Juan, and Pat Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
-
(2008)
ACM Trans. Graph.
, vol.27
, Issue.3
, pp. 1-15
-
-
Seiler, L.1
Carmean, D.2
Sprangle, E.3
Forsyth, T.4
Abrash, M.5
Dubey, P.6
Junkins, S.7
Lake, A.8
Sugerman, J.9
Cavin, R.10
Espasa, R.11
Grochowski, E.12
Juan, T.13
Hanrahan, P.14
-
34
-
-
47849103500
-
Introducing control flow into vectorized code
-
IEEE
-
Jaewook Shin. Introducing control flow into vectorized code. In PACT, pages 280-291. IEEE, 2007.
-
(2007)
PACT
, pp. 280-291
-
-
Shin, J.1
-
35
-
-
0019887799
-
Identification of common molecular subsequences
-
Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-197, 1981.
-
(1981)
Journal of Molecular Biology
, vol.147
, Issue.1
, pp. 195-197
-
-
Smith, T.F.1
Waterman, M.S.2
-
36
-
-
77953978573
-
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
-
IEEE
-
John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, and Wen-mei W. Hwu. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO, pages 111-119. IEEE, 2010.
-
(2010)
CGO
, pp. 111-119
-
-
Stratton, J.A.1
Grover, V.2
Marathe, J.3
Aarts, B.4
Murphy, M.5
Hu, Z.6
Hwu, W.-M.W.7
-
37
-
-
67649855320
-
Equality saturation: A new approach to optimization
-
ACM
-
Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: a new approach to optimization. In POPL, pages 264-276. ACM, 2009.
-
(2009)
POPL
, pp. 264-276
-
-
Tate, R.1
Stepp, M.2
Tatlock, Z.3
Lerner, S.4
-
38
-
-
85050273691
-
Program slicing
-
IEEE
-
Mark Weiser. Program slicing. In ICSE, pages 439-449. IEEE, 1981.
-
(1981)
ICSE
, pp. 439-449
-
-
Weiser, M.1
-
39
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
ACM
-
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, pages 369-380. ACM, 2011.
-
(2011)
ASPLOS
, pp. 369-380
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5