SCOPUS Information Retrieval Platform

Proceedings - International Symposium on Code Generation and Optimization, CGO 2012

Volume , Issue , 2012, Pages 23-32

Dynamic compilation of data-parallel kernels for vector processors

(3) Kerr, Andrew (a); Diamos, Gregory (a); Yalamanchili, S. (a)

(a) GEORGIA INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARCHITECTURE DESIGNERS; CONTROL-FLOW; DATA PARALLEL; DYNAMIC COMPILATION; FUNCTIONAL UNITS; GPU COMPUTING; INSTRUCTION SET EXTENSION; MICRO-BENCHMARK; MODERN PROCESSORS; MULTI CORE; OVER CURRENT; PERFORMANCE IMPROVEMENTS; PERFORMANCE SCALABILITY; POWER EFFICIENCY; PROGRAM TRANSFORMATIONS; REAL-WORLD APPLICATION; SOFTWARE PARALLELISM; STATE OF THE ART; VECTOR PROCESSORS;

COMPUTER SOFTWARE PORTABILITY; DIGITAL SIGNAL PROCESSING; MICROPROCESSOR CHIPS; MULTICORE PROGRAMMING; NETWORK COMPONENTS; OPTIMIZATION; PROGRAM COMPILERS; THROUGHPUT;

PARALLEL PROCESSING SYSTEMS;

EID: 84863449186 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2259016.2259020 Document Type: Conference Paper

Times cited : (10)

References (24)

1
- 70449098063
- Intel Corporation, Number 248966-018 in Intel 64 and IA-32 Optimization Manual, Intel Corporation, March
- Intel Corporation. Intel 64 and IA-32 Architectures Optimization Reference Manual. Number 248966-018 in Intel 64 and IA-32 Optimization Manual. Intel Corporation, March 2009.
- (2009) Intel 64 and IA-32 Architectures Optimization Reference Manual

2
- 70449722984
- Intel Corp., March
- Intel Corp. Intel AVX: New Frontiers in Performance Improvements and Energy Efficiency, March 2008.
- (2008) Intel AVX: New Frontiers in Performance Improvements and Energy Efficiency

3
- 70349100958
- KHRONOS OpenCL Working Group, December
- KHRONOS OpenCL Working Group. The OpenCL Specification, December 2008.
- (2008) The OpenCL Specification

4
- 67650694407
- NVIDIA, NVIDIA Corporation, Santa Clara, California, 2.1 edition, October
- NVIDIA. NVIDIA CUDA Compute Unified Device Architecture. NVIDIA Corporation, Santa Clara, California, 2.1 edition, October 2008.
- (2008) NVIDIA CUDA Compute Unified Device Architecture

5
- 77953978573
- Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
- Toronto, Canada, April
- John Stratton and Vinod Grover et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO 2010, Toronto, Canada, April 2010.
- (2010) CGO 2010
- Stratton, J.¹ Grover, V.²

6
- 78149276036
- Twin peaks: A software platform for heterogeneous computing on general-purpose and graphics processors
- New York, NY, USA, ACM
- Jayanth Gummaraju and Laurent Morichetti et al. Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors. PACT'10, pages 205-216, New York, NY, USA, 2010. ACM.
- (2010) PACT'10 , pp. 205-216
- Gummaraju, J.¹ Morichetti, L.²

7
- 78149255519
- An OpenCL framework for heterogeneous multicores with local memory
- New York, NY, USA, ACM
- Jaejin Lee and Jungwon Kim et al. An OpenCL framework for heterogeneous multicores with local memory. PACT'10, pages 193-204, New York, NY, USA, 2010. ACM.
- (2010) PACT'10 , pp. 193-204
- Lee, J.¹ Kim, J.²

8
- 84863457471
- Characterization and transformation of unstructured control flow in GPU applications
- June
- Haicheng Wu, G. Diamos, Si Li, and S. Yalamanchili. Characterization and transformation of unstructured control flow in GPU applications. In First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems, June 2011.
- (2011) First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems
- Wu, H.¹ Diamos, G.² Li, S.³ Yalamanchili, S.⁴

9
- 70649102016
- NVIDIA, NVIDIA Corporation, Santa Clara, California, 1.3 edition, October
- NVIDIA. NVIDIA Compute PTX: Parallel Thread Execution. NVIDIA Corporation, Santa Clara, California, 1.3 edition, October 2008.
- (2008) NVIDIA Compute PTX: Parallel Thread Execution

10
- 57649106258
- Larrabee: A many-core x86 architecture for visual computing
- pages 18:1-18:15, New York, NY, USA, ACM
- Larry Seiler and Doug Carmean et al. Larrabee: a many-core x86 architecture for visual computing. In ACM SIGGRAPH 2008 papers, SIGGRAPH'08, pages 18:1-18:15, New York, NY, USA, 2008. ACM.
- (2008) ACM SIGGRAPH 2008 Papers, SIGGRAPH'08
- Seiler, L.¹ Carmean, D.²

11
- 84856530584
- Divergence analysis and optimizations
- October
- Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintao Pereira, and Wagner Meira Jr. Divergence analysis and optimizations. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 320-329, October 2011.
- (2011) Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on , pp. 320-329
- Coutinho, B.¹ Sampaio, D.² Pereira, F.M.Q.³ Meira Jr., W.⁴

12
- 84856559490
- Dynamic detection of uniform and affine vectors in GPGPU computations
- Université de Perpignan, June
- Sylvain Collange and David Defour et al. Dynamic detection of uniform and affine vectors in GPGPU computations. Technical report, Université de Perpignan, University of California Davis, June 2009.
- (2009) Technical Report, University of California Davis
- Collange, S.¹ Defour, D.²

13
- 84856512446
- Correctly treating synchronizations in compiling fine-grained SPMD-threaded programs for CPU
- October
- Ziyu Guo, Eddy Zheng Zhang, and Xipeng Shen. Correctly treating synchronizations in compiling fine-grained SPMD-threaded programs for CPU. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 310-319, October 2011.
- (2011) Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on , pp. 310-319
- Guo, Z.¹ Zhang, E.Z.² Shen, X.³

14
- 70649104826
- A characterization and analysis of PTX kernels
- Austin, TX, USA, October
- Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In IISWC'09, Austin, TX, USA, October 2009.
- (2009) IISWC'09
- Kerr, A.¹ Diamos, G.² Yalamanchili, S.³

15
- 70649115322
- June
- Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili. Gpuocelot: A binary translation framework for PTX, June 2009. http://code.google.com/p/gpuocelot/.
- (2009) Gpuocelot: A Binary Translation Framework for PTX
- Diamos, G.¹ Kerr, A.² Yalamanchili, S.³

16
- 78149233155
- Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
- New York, NY, USA, ACM
- Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. PACT'10, pages 353-364, New York, NY, USA, 2010. ACM.
- (2010) PACT'10 , pp. 353-364
- Diamos, G.¹ Kerr, A.² Yalamanchili, S.³ Clark, N.⁴

17
- 84863474058
- The parboil benchmark suite
- IMPACT. The parboil benchmark suite, 2007.
- (2007) IMPACT

18
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Piscataway, NJ, USA
- Vasily Volkov and James W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Supercomputing'08, Piscataway, NJ, USA, 2008.
- (2008) Supercomputing'08
- Volkov, V.¹ Demmel, J.W.²

19
- 79957502935
- Whole-function vectorization
- Ralf Karrenberg and Sebastian Hack. Whole-function vectorization. CGO, 2011.
- (2011) CGO
- Karrenberg, R.¹ Hack, S.²

20
- 47849103500
- Introducing control flow into vectorized code
- Washington, DC, USA, IEEE Computer Society
- Jaewook Shin. Introducing control flow into vectorized code. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, PACT'07, pages 280-291, Washington, DC, USA, 2007. IEEE Computer Society.
- (2007) Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, PACT'07 , pp. 280-291
- Shin, J.¹

21
- 79951700098
- Improving SIMT efficiency of global rendering algorithms with architectural support for dynamic micro-kernels
- Washington, DC, USA
- Michael Steffen and Joseph Zambreno. Improving SIMT efficiency of global rendering algorithms with architectural support for dynamic micro-kernels. MICRO'43, Washington, DC, USA, 2010.
- (2010) MICRO'43
- Steffen, M.¹ Zambreno, J.²

22
- 79953126288
- On-the-fly elimination of dynamic irregularities for GPU computing
- New York, NY, USA, ACM
- Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, ASPLOS'11, pages 369-380, New York, NY, USA, 2011. ACM.
- (2011) Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11 , pp. 369-380
- Zhang, E.Z.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

23
- 34547678136
- Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping
- DOI 10.1109/HPCA.2007.346199, 4147662, 2007 IEEE 13th Annual International Symposium on High Performance Computer Architecture, HPCA-13
- Nathan Clark and Amir Hormati et al. Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping. In HPCA'07, pages 216-227, Washington, DC, USA, 2007. IEEE Computer Society. (Pubitemid 47208166)
- (2007) Proceedings - International Symposium on High-Performance Computer Architecture , pp. 216-227
- Clark, N.¹ Hormati, A.² Yehia, S.³ Mahlke, S.⁴ Flautner, K.⁵

24
- 79951702599
- Efficient selection of vector instructions using dynamic programming
- Washington, DC, USA, IEEE Computer Society
- Rajkishore Barik, J. Zhao, and V. Sarkar. Efficient selection of vector instructions using dynamic programming. MICRO'43, pages 201-212, Washington, DC, USA, 2010. IEEE Computer Society.
- (2010) MICRO'43 , pp. 201-212
- Barik, R.¹ Zhao, J.² Sarkar, V.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.