SCOPUS Information Search Platform

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

Volume , Issue , 2011, Pages 320-329

Divergence analysis and optimizations

(4) Coutinho, Bruno (a); Sampaio, Diogo (a); Pereira, Fernando Magno Quintão (a); Meira Jr., Wagner (a)

(a) Federal University of Minas Gerais (Brazil)

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION DEVELOPERS; AUTOMATIC OPTIMIZATION; COMPILER OPTIMIZATIONS; COMPUTATIONAL POWER; CONDITIONAL BRANCH; EXECUTION MODEL; GENE SEQUENCING; GPU PROGRAMMING; OPEN-SOURCE; PROCESSING ELEMENTS; PROGRAM VARIABLES; QUICKSORT; RODINIA; SIMD MACHINES; SINGLE INSTRUCTION MULTIPLE DATA; SYNCHRONIZATION POINTS;

GENES; OPTIMIZATION; PARALLEL ARCHITECTURES; PROGRAM TRANSLATORS; STATIC ANALYSIS;

PROGRAM COMPILERS;

EID: 84856530584 PISSN: 1089795X EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/PACT.2011.63 Document Type: Conference Paper

Times cited : (78)

References (39)

1
- 0014620011
- TRANQUIL: A language for an array processing computer
- ACM
- Norma E. Abel, Paul P. Budnik, David J. Kuck, Yoichi Muraoka, Robert S. Northcote, and Robert B. Wilhelmson. TRANQUIL: a language for an array processing computer. In AFIPS, pages 57-73. ACM, 1969.
- (1969) AFIPS , pp. 57-73
- Abel, N.E.¹ Budnik, P.P.² Kuck, D.J.³ Muraoka, Y.⁴ Northcote, R.S.⁵ Wilhelmson, R.B.⁶

2
- 0004242324
- Cambridge University Press, 2nd edition
- Andrew W. Appel and Jens Palsberg. Modern Compiler Implementation in Java. Cambridge University Press, 2nd edition, 2002.
- (2002) Modern Compiler Implementation in Java
- Appel, A.W.¹ Palsberg, J.²

3
- 77749337497
- An adaptive performance modeling tool for GPU architectures
- ACM
- Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, and Wen-mei W. Hwu. An adaptive performance modeling tool for GPU architectures. In PPoPP, pages 105-114. ACM, 2010.
- (2010) PPoPP , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

4
- 0025545476
- Vcode: A data-parallel intermediate language
- ACM
- Guy Blelloch and Siddhartha Chatterjee. Vcode: A data-parallel intermediate language. In FMPC, pages 471-480. ACM, 1990.
- (1990) FMPC , pp. 471-480
- Blelloch, G.¹ Chatterjee, S.²

5
- 0026923480
- Control structures for data-parallel SIMD languages: Semantics and implementation
- DOI 10.1016/0167-739X(92)90069-N
- Luc Bougé and Jean-Luc Levaire. Control structures for data-parallel SIMD languages: semantics and implementation. Future Generation Computer Systems, 8(4):363-378, 1992. (Pubitemid 23556759)
- (1992) Future Generation Computer Systems , vol.8 , Issue.4 , pp. 363-378
- Bougé, L.¹ Levaire, J.-L.²

6
- 0015330108
- The Illiac IV system
- W.J. Bouknight, Stewart A. Denenberg, David E. McIntyre, J. M. Randall, Ahmed H. Sameh, and Daniel L. Slotnick. The Illiac IV system. Proceedings of the IEEE, 60(4):369-388, 1972.
- (1972) Proceedings of the IEEE , vol.60 , Issue.4 , pp. 369-388
- Bouknight, W.J.¹ Denenberg, S.A.² McIntyre, D.E.³ Randall, J.M.⁴ Sameh, A.H.⁵ Slotnick, D.L.⁶

7
- 0031385522
- Efficient oblivious parallel sorting on the MasPar MP-1
- Klaus Brockmann and Rolf Wanka. Efficient oblivious parallel sorting on the MasPar MP-1. ICSS, 1:200, 1997.
- (1997) ICSS , vol.1 , pp. 200
- Brockmann, K.¹ Wanka, R.²

8
- 78650745912
- GPU-quicksort: A practical quicksort algorithm for graphics processors
- Daniel Cederman and Philippas Tsigas. GPU-quicksort: A practical quicksort algorithm for graphics processors. Journal of Experimental Algorithmics, 14(1):4-24, 2009.
- (2009) Journal of Experimental Algorithmics , vol.14 , Issue.1 , pp. 4-24
- Cederman, D.¹ Tsigas, P.²

9
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- IEEE
- Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, pages 44-54. IEEE, 2009.
- (2009) IISWC , pp. 44-54
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.-H.⁶ Skadron, K.⁷

10
- 84856559490
- Dynamic detection of uniform and affine vectors in GPGPU computations
- Springer
- Sylvain Collange, David Defour, and Yao Zhang. Dynamic detection of uniform and affine vectors in GPGPU computations. In HPPC, pages 46-55. Springer, 2009.
- (2009) HPPC , pp. 46-55
- Collange, S.¹ Defour, D.² Zhang, Y.³

11
- 78650730073
- Performance debugging of GPGPU applications with the divergence map
- IEEE
- Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintao Pereira, and Wagner Meira Jr. Performance debugging of GPGPU applications with the divergence map. In SBAC-PAD, pages 33-40. IEEE, 2010.
- (2010) SBAC-PAD , pp. 33-40
- Coutinho, B.¹ Sampaio, D.² Pereira, F.M.Q.³ Meira Jr., W.⁴

12
- 0026243790
- Efficiently computing static single assignment form and the control dependence graph
- Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451-490, 1991.
- (1991) TOPLAS , vol.13 , Issue.4 , pp. 451-490
- Cytron, R.¹ Ferrante, J.² Rosen, B.K.³ Wegman, M.N.⁴ Zadeck, F.K.⁵

13
- 78149233155
- Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
- Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. In PACT, pages 354-364, 2010.
- (2010) PACT , pp. 354-364
- Diamos, G.¹ Kerr, A.² Yalamanchili, S.³ Clark, N.⁴

14
- 51049096377
- Massive supercomputing coping with heterogeneity of modern accelerators
- IEEE
- T Endo and S Matsuoka. Massive supercomputing coping with heterogeneity of modern accelerators. In IPDPS, pages 1-10. IEEE, 2008.
- (2008) IPDPS , pp. 1-10
- Endo, T.¹ Matsuoka, S.²

15
- 0347244078
- Formal specification of parallel SIMD execution
- DOI 10.1016/S0304-3975(96)00113-2, PII S0304397596001132
- Craig A. Farrell and Dorota H. Kieronska. Formal specification of parallel SIMD execution. Theo. Comp. Science, 169(1):39-65, 1996. (Pubitemid 126412425)
- (1996) Theoretical Computer Science , vol.169 , Issue.1 , pp. 39-65
- Farrell, C.A.¹ Kieronska, D.H.²

16
- 0023385308
- The program dependence graph and its use in optimization
- DOI 10.1145/24039.24041
- Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. TOPLAS, 9(3):319-349, 1987. (Pubitemid 17641083)
- (1987) ACM Transactions on Programming Languages and Systems , vol.9 , Issue.3 , pp. 319-349
- Ferrante, J.¹ Ottenstein, K.J.² Warren, J.D.³

17
- 0015401565
- Some computer organizations and their effectiveness
- Michael J. Flynn. Some computer organizations and their effectiveness. IEEE Trans. Comput., C-21:948+, 1972.
- (1972) IEEE Trans. Comput. , vol.C-21
- Flynn, M.J.¹

18
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- IEEE
- Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO, pages 407-420. IEEE, 2007.
- (2007) MICRO , pp. 407-420
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

19
- 78149258346
- Understanding throughput-oriented architectures
- Michael Garland and David B. Kirk. Understanding throughput-oriented architectures. Commun. ACM, 53:58-66, 2010.
- (2010) Commun. ACM , vol.53 , pp. 58-66
- Garland, M.¹ Kirk, D.B.²

20
- 56749137408
- Technical Report Initial release on February 14, 2007, NVIDIA
- Mark Harris. The parallel prefix sum (scan) with CUDA. Technical Report Initial release on February 14, 2007, NVIDIA, 2008.
- (2008) The Parallel Prefix Sum (Scan) with CUDA
- Harris, M.¹

21
- 0025550566
- Loop distribution with arbitrary control flow
- Ken Kennedy and Kathryn S. McKinley. Loop distribution with arbitrary control flow. In Supercomputing, pages 407-416. IEEE, 1990. (Pubitemid 21675251)
- (1990) Supercomputing , pp. 407-416
- Kennedy, K.¹ McKinley, K.S.²

22
- 70649104826
- A characterization and analysis of PTX kernels
- IEEE
- Andrew Kerr, Gregory F. Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In IISWC, pages 3-12. IEEE, 2009.
- (2009) IISWC , pp. 3-12
- Kerr, A.¹ Diamos, G.F.² Yalamanchili, S.³

23
- 84956982868
- POMP, or how to design a massively parallel machine with small developments
- Springer
- R. Keryell, Ph. Materat, and N. Paris. POMP, or how to design a massively parallel machine with small developments. In PARLE, pages 83-100. Springer, 1991.
- (1991) PARLE , pp. 83-100
- Keryell, R.¹ Materat, Ph.² Paris, N.³

24
- 0020203229
- Wavefront array processor: language, architecture, and applications
- Sun-Yuan Kung, K. S. Arun, R. J. Gal-Ezer, and D. V. Bhaskar Rao. Wavefront array processor: Language, architecture, and applications. IEEE Trans. Comput., 31:1054-1066, 1982. (Pubitemid 13478801)
- (1982) IEEE Transactions on Computers , vol.C-31 , Issue.11 , pp. 1054-1066
- Kung, S.-Y.¹ Arun, K.S.² Gal-Ezer, R.J.³ Bhaskar Rao, D.V.⁴

25
- 3042658703
- LLVM: A compilation framework for lifelong program analysis & transformation
- IEEE
- Chris Lattner and Vikram S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, pages 75-88. IEEE, 2004.
- (2004) CGO , pp. 75-88
- Lattner, C.¹ Adve, V.S.²

26
- 0016486794
- Glypnir-a programming language for Illiac IV
- Duncan H. Lawrie, T. Layman, D. Baer, and J. M. Randal. Glypnir-a programming language for Illiac IV. Commun. ACM, 18(3):157-164, 1975.
- (1975) Commun. ACM , vol.18 , Issue.3 , pp. 157-164
- Lawrie, D.H.¹ Layman, T.² Baer, D.³ Randal, J.M.⁴

27
- 0003939246
- Cold Spring Harbor Laboratory Press, 1st edition
- David W. Mount. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, 1st edition, 2004.
- (2004) Bioinformatics: Sequence and Genome Analysis
- Mount, D.W.¹

28
- 77951154340
- The GPU computing era
- John Nickolls and William J. Dally. The GPU computing era. IEEE Micro, 30:56-69, 2010.
- (2010) IEEE Micro , vol.30 , pp. 56-69
- Nickolls, J.¹ Dally, W.J.²

29
- 77951148621
- Graphics and computing GPUs
- (Patterson and Hennessy), chapter A. Elsevier, 4th edition
- John Nickolls and David Kirk. Graphics and Computing GPUs. Computer Organization and Design, (Patterson and Hennessy), chapter A, pages A.1 - A.77. Elsevier, 4th edition, 2009.
- (2009) Computer Organization and Design
- Nickolls, J.¹ Kirk, D.²

30
- 84963624364
- The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages
- ACM
- Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages. In PLDI, pages 257-271. ACM, 1990.
- (1990) PLDI , pp. 257-271
- Ottenstein, K.J.¹ Ballance, R.A.² MacCabe, A.B.³

31
- 84856548513
- Fernando M. Q. Pereira, 2011. http://divmap.wordpress.com/.
- (2011)
- Pereira, F.M.Q.¹

32
- 84976791215
- A language for array and vector processors
- R. H. Perrot. A language for array and vector processors. TOPLAS, 1:177-195, 1979.
- (1979) TOPLAS , vol.1 , pp. 177-195
- Perrot, R.H.¹

33
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Espasa, Ed Grochowski, Toni Juan, and Pat Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
- (2008) ACM Trans. Graph. , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

34
- 47849103500
- Introducing control flow into vectorized code
- IEEE
- Jaewook Shin. Introducing control flow into vectorized code. In PACT, pages 280-291. IEEE, 2007.
- (2007) PACT , pp. 280-291
- Shin, J.¹

35
- 0019887799
- Identification of common molecular subsequences
- Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-197, 1981.
- (1981) Journal of Molecular Biology , vol.147 , Issue.1 , pp. 195-197
- Smith, T.F.¹ Waterman, M.S.²

36
- 77953978573
- Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
- IEEE
- John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, and Wen-mei W. Hwu. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO, pages 111-119. IEEE, 2010.
- (2010) CGO , pp. 111-119
- Stratton, J.A.¹ Grover, V.² Marathe, J.³ Aarts, B.⁴ Murphy, M.⁵ Hu, Z.⁶ Hwu, W.-M.W.⁷

37
- 67649855320
- Equality saturation: A new approach to optimization
- ACM
- Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: a new approach to optimization. In POPL, pages 264-276. ACM, 2009.
- (2009) POPL , pp. 264-276
- Tate, R.¹ Stepp, M.² Tatlock, Z.³ Lerner, S.⁴

38
- 85050273691
- Program slicing
- IEEE
- Mark Weiser. Program slicing. In ICSE, pages 439-449. IEEE, 1981.
- (1981) ICSE , pp. 439-449
- Weiser, M.¹

39
- 79953126288
- On-the-fly elimination of dynamic irregularities for GPU computing
- ACM
- Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, pages 369-380. ACM, 2011.
- (2011) ASPLOS , pp. 369-380
- Zhang, E.Z.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.