2011, Pages 320-329

Divergence analysis and optimizations

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION DEVELOPERS; AUTOMATIC OPTIMIZATION; COMPILER OPTIMIZATIONS; COMPUTATIONAL POWER; CONDITIONAL BRANCH; EXECUTION MODEL; GENE SEQUENCING; GPU PROGRAMMING; OPEN-SOURCE; PROCESSING ELEMENTS; PROGRAM VARIABLES; QUICKSORT; RODINIA; SIMD MACHINES; SINGLE INSTRUCTION MULTIPLE DATA; SYNCHRONIZATION POINTS;

EID: 84856530584     PISSN: 1089795X     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/PACT.2011.63     Document Type: Conference Paper
Times cited: 78

References (39)
  • [3] Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, and Wen-mei W. Hwu. An adaptive performance modeling tool for GPU architectures. In PPoPP, pages 105-114. ACM, 2010.
  • [4] Guy Blelloch and Siddhartha Chatterjee. VCODE: A data-parallel intermediate language. In FMPC, pages 471-480. ACM, 1990.
  • [5] Luc Bougé and Jean-Luc Levaire. Control structures for data-parallel SIMD languages: Semantics and implementation. Future Generation Computer Systems, 8(4):363-378, 1992. DOI 10.1016/0167-739X(92)90069-N.
  • [7] Klaus Brockmann and Rolf Wanka. Efficient oblivious parallel sorting on the MasPar MP-1. ICSS, 1:200, 1997.
  • [8] Daniel Cederman and Philippas Tsigas. GPU-quicksort: A practical quicksort algorithm for graphics processors. Journal of Experimental Algorithmics, 14(1):4-24, 2009.
  • [10] Sylvain Collange, David Defour, and Yao Zhang. Dynamic detection of uniform and affine vectors in GPGPU computations. In HPPC, pages 46-55. Springer, 2009.
  • [11] Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, and Wagner Meira Jr. Performance debugging of GPGPU applications with the divergence map. In SBAC-PAD, pages 33-40. IEEE, 2010.
  • [12] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451-490, 1991.
  • [13] Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. In PACT, pages 354-364, 2010.
  • [14] T. Endo and S. Matsuoka. Massive supercomputing coping with heterogeneity of modern accelerators. In IPDPS, pages 1-10. IEEE, 2008.
  • [15] Craig A. Farrell and Dorota H. Kieronska. Formal specification of parallel SIMD execution. Theoretical Computer Science, 169(1):39-65, 1996. DOI 10.1016/S0304-3975(96)00113-2.
  • [17] Michael J. Flynn. Some computer organizations and their effectiveness. IEEE Trans. Comput., C-21:948+, 1972.
  • [18] Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt. Dynamic warp formation and scheduling for efficient GPU control flow. In MICRO, pages 407-420. IEEE, 2007.
  • [19] Michael Garland and David B. Kirk. Understanding throughput-oriented architectures. Commun. ACM, 53:58-66, 2010.
  • [20] Mark Harris. The parallel prefix sum (scan) with CUDA. Technical report, NVIDIA, 2008 (initial release February 14, 2007).
  • [22] Andrew Kerr, Gregory F. Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In IISWC, pages 3-12. IEEE, 2009.
  • [23] R. Keryell, Ph. Materat, and N. Paris. POMP, or how to design a massively parallel machine with small developments. In PARLE, pages 83-100. Springer, 1991.
  • [25] Chris Lattner and Vikram S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, pages 75-88. IEEE, 2004.
  • [26] Duncan H. Lawrie, T. Layman, D. Baer, and J. M. Randal. Glypnir: A programming language for Illiac IV. Commun. ACM, 18(3):157-164, 1975.
  • [28] John Nickolls and William J. Dally. The GPU computing era. IEEE Micro, 30:56-69, 2010.
  • [29] John Nickolls and David Kirk. Graphics and computing GPUs. In Computer Organization and Design (Patterson and Hennessy), chapter A, pages A.1-A.77. Elsevier, 4th edition, 2009.
  • [30] Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages. In PLDI, pages 257-271. ACM, 1990.
  • [31] Fernando M. Q. Pereira, 2011. http://divmap.wordpress.com/.
  • [32] R. H. Perrott. A language for array and vector processors. TOPLAS, 1:177-195, 1979.
  • [34] Jaewook Shin. Introducing control flow into vectorized code. In PACT, pages 280-291. IEEE, 2007.
  • [35] Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-197, 1981.
  • [36] John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, and Wen-mei W. Hwu. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO, pages 111-119. IEEE, 2010.
  • [37] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: A new approach to optimization. In POPL, pages 264-276. ACM, 2009.
  • [38] Mark Weiser. Program slicing. In ICSE, pages 439-449. IEEE, 1981.
  • [39] Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, pages 369-380. ACM, 2011.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.