-
1
-
-
70449098063
-
-
Intel Corporation, Number 248966-018 in Intel 64 and IA-32 Optimization Manual, Intel Corporation, March
-
Intel Corporation. Intel 64 and IA-32 Architectures Optimization Reference Manual. Number 248966-018 in Intel 64 and IA-32 Optimization Manual. Intel Corporation, March 2009.
-
(2009)
Intel 64 and IA-32 Architectures Optimization Reference Manual
-
-
-
3
-
-
70349100958
-
-
KHRONOS OpenCL Working Group, December
-
KHRONOS OpenCL Working Group. The OpenCL Specification, December 2008.
-
(2008)
The OpenCL Specification
-
-
-
4
-
-
67650694407
-
-
NVIDIA, NVIDIA Corporation, Santa Clara, California, 2.1 edition, October
-
NVIDIA. NVIDIA CUDA Compute Unified Device Architecture. NVIDIA Corporation, Santa Clara, California, 2.1 edition, October 2008.
-
(2008)
NVIDIA CUDA Compute Unified Device Architecture
-
-
-
5
-
-
77953978573
-
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
-
Toronto, Canada, April
-
John Stratton and Vinod Grover et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs. In CGO 2010, Toronto, Canada, April 2010.
-
(2010)
CGO 2010
-
-
Stratton, J.1
Grover, V.2
-
6
-
-
78149276036
-
Twin peaks: A software platform for heterogeneous computing on general-purpose and graphics processors
-
New York, NY, USA, ACM
-
Jayanth Gummaraju and Laurent Morichetti et al. Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors. PACT'10, pages 205-216, New York, NY, USA, 2010. ACM.
-
(2010)
PACT'10
, pp. 205-216
-
-
Gummaraju, J.1
Morichetti, L.2
-
7
-
-
78149255519
-
An OpenCL framework for heterogeneous multicores with local memory
-
New York, NY, USA, ACM
-
Jaejin Lee and Jungwon Kim et al. An OpenCL framework for heterogeneous multicores with local memory. PACT'10, pages 193-204, New York, NY, USA, 2010. ACM.
-
(2010)
PACT'10
, pp. 193-204
-
-
Lee, J.1
Kim, J.2
-
9
-
-
70649102016
-
-
NVIDIA, NVIDIA Corporation, Santa Clara, California, 1.3 edition, October
-
NVIDIA. NVIDIA Compute PTX: Parallel Thread Execution. NVIDIA Corporation, Santa Clara, California, 1.3 edition, October 2008.
-
(2008)
NVIDIA Compute PTX: Parallel Thread Execution
-
-
-
10
-
-
57649106258
-
Larrabee: A many-core x86 architecture for visual computing
-
pages 18:1-18:15, New York, NY, USA, ACM
-
Larry Seiler and Doug Carmean et al. Larrabee: a many-core x86 architecture for visual computing. In ACM SIGGRAPH 2008 papers, SIGGRAPH'08, pages 18:1-18:15, New York, NY, USA, 2008. ACM.
-
(2008)
ACM SIGGRAPH 2008 Papers, SIGGRAPH'08
-
-
Seiler, L.1
Carmean, D.2
-
11
-
-
84856530584
-
Divergence analysis and optimizations
-
Oct.
-
Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintao Pereira, and Wagner Meira Jr. Divergence analysis and optimizations. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 320-329, Oct. 2011.
-
(2011)
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
, pp. 320-329
-
-
Coutinho, B.1
Sampaio, D.2
Pereira, F.M.Q.3
Meira Jr., W.4
-
12
-
-
84856559490
-
Dynamic detection of uniform and affine vectors in GPGPU computations
-
Université de Perpignan, June
-
Sylvain Collange and David Defour et al. Dynamic detection of uniform and affine vectors in GPGPU computations. Technical report, Université de Perpignan, University of California Davis, June 2009.
-
(2009)
Technical Report, University of California Davis
-
-
Collange, S.1
Defour, D.2
-
13
-
-
84856512446
-
Correctly treating synchronizations in compiling fine-grained SPMD-threaded programs for CPU
-
Oct.
-
Ziyu Guo, Eddy Zheng Zhang, and Xipeng Shen. Correctly treating synchronizations in compiling fine-grained SPMD-threaded programs for CPU. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 310-319, Oct. 2011.
-
(2011)
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
, pp. 310-319
-
-
Guo, Z.1
Zhang, E.Z.2
Shen, X.3
-
14
-
-
70649104826
-
A characterization and analysis of PTX kernels
-
Austin, TX, USA, October
-
Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In IISWC'09, Austin, TX, USA, October 2009.
-
(2009)
IISWC'09
-
-
Kerr, A.1
Diamos, G.2
Yalamanchili, S.3
-
16
-
-
78149233155
-
Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
-
New York, NY, USA, ACM
-
Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. PACT'10, pages 353-364, New York, NY, USA, 2010. ACM.
-
(2010)
PACT'10
, pp. 353-364
-
-
Diamos, G.1
Kerr, A.2
Yalamanchili, S.3
Clark, N.4
-
17
-
-
84863474058
-
The Parboil benchmark suite
-
IMPACT. The Parboil benchmark suite, 2007.
-
(2007)
IMPACT
-
-
-
18
-
-
70350771131
-
Benchmarking GPUs to tune dense linear algebra
-
Piscataway, NJ, USA
-
Vasily Volkov and James W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Supercomputing'08, Piscataway, NJ, USA, 2008.
-
(2008)
Supercomputing'08
-
-
Volkov, V.1
Demmel, J.W.2
-
19
-
-
79957502935
-
Whole-function vectorization
-
Ralf Karrenberg and Sebastian Hack. Whole-function vectorization. CGO, 2011.
-
(2011)
CGO
-
-
Karrenberg, R.1
Hack, S.2
-
20
-
-
47849103500
-
Introducing control flow into vectorized code
-
Washington, DC, USA, IEEE Computer Society
-
Jaewook Shin. Introducing control flow into vectorized code. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, PACT'07, pages 280-291, Washington, DC, USA, 2007. IEEE Computer Society.
-
(2007)
Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, PACT'07
, pp. 280-291
-
-
Shin, J.1
-
21
-
-
79951700098
-
Improving SIMT efficiency of global rendering algorithms with architectural support for dynamic micro-kernels
-
Washington, DC, USA
-
Michael Steffen and Joseph Zambreno. Improving SIMT efficiency of global rendering algorithms with architectural support for dynamic micro-kernels. MICRO'43, Washington, DC, USA, 2010.
-
(2010)
MICRO'43
-
-
Steffen, M.1
Zambreno, J.2
-
22
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
New York, NY, USA, ACM
-
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, ASPLOS'11, pages 369-380, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'11
, pp. 369-380
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
-
23
-
-
34547678136
-
Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping
-
DOI 10.1109/HPCA.2007.346199, 4147662, 2007 IEEE 13th Annual International Symposium on High Performance Computer Architecture, HPCA-13
-
Nathan Clark and Amir Hormati et al. Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping. In HPCA'07, pages 216-227, Washington, DC, USA, 2007. IEEE Computer Society. (Pubitemid 47208166)
-
(2007)
Proceedings - International Symposium on High-Performance Computer Architecture
, pp. 216-227
-
-
Clark, N.1
Hormati, A.2
Yehia, S.3
Mahlke, S.4
Flautner, K.5
-
24
-
-
79951702599
-
Efficient selection of vector instructions using dynamic programming
-
Washington, DC, USA, IEEE Computer Society
-
Rajkishore Barik, J. Zhao, and V. Sarkar. Efficient selection of vector instructions using dynamic programming. MICRO'43, pages 201-212, Washington, DC, USA, 2010. IEEE Computer Society.
-
(2010)
MICRO'43
, pp. 201-212
-
-
Barik, R.1
Zhao, J.2
Sarkar, V.3