SCOPUS Information Search Platform

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Volume , Issue , 2013, Pages

A large-scale cross-architecture evaluation of thread-coarsening

(3) Magni, Alberto (a); Dubach, Christophe (a); O'Boyle, Michael F P (a)

(a) UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

GPU; OpenCL; Regression trees; Thread coarsening

Indexed keywords

DIGITAL STORAGE; HARDWARE; PROGRAM COMPILERS;

DATA-PARALLEL PROGRAMMING; GPU; OPENCL; PARALLEL DEVICES; PERFORMANCE COUNTERS; PROGRAM PERFORMANCE; REGRESSION TREES; STATISTICAL REGRESSION;

COARSENING;

EID: 84899692998 PISSN: 21674329 EISSN: 21674337 Source Type: Conference Proceeding
DOI: 10.1145/2503210.2503268 Document Type: Conference Paper

Times cited : (65)

References (28)

1
- 84899700709
- AMD Inc., AMD APP Profiler
- AMD Inc., AMD APP Profiler http://developer.amd.com/tools/heterogeneous-computing/amd-app-profiler/

2
- 84899683908
- The llvm compiler infrastructure
- The llvm compiler infrastructure http://llvm.org

3
- 84899683649
- NVIDIA Corporation, NVIDIA Profiler
- NVIDIA Corporation, NVIDIA Profiler http://docs.nvidia.com/cuda/profiler-users-guide/

4
- 84899696089
- Nvidia's Next Generation CUDA Compute Architecture: Fermi
- Nvidia's Next Generation CUDA Compute Architecture: Fermi http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf, 2009.
- (2009)

5
- 84866665233
- AMD Accelerated parallel processing OpenCL, 2012.
- (2012) AMD Accelerated Parallel Processing OpenCL

6
- 84872539869
- Nvidia's Next Generation CUDA Compute Architecture: Kepler http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf, 2012.
- (2012) Nvidia's Next Generation CUDA Compute Architecture: Kepler

7
- 84899692748
- MICA: Microarchitecture-Independent Characterization of Applications http://boegel.kejo.be/ELIS/mica/, 2013.
- (2013) MICA: Microarchitecture-Independent Characterization of Applications

8
- 84856530584
- Divergence analysis and optimizations
- oct.
- B. Coutinho, D. Sampaio, F. Pereira, and W. Meira. Divergence analysis and optimizations. PACT, pages 320-329, oct. 2011.
- (2011) PACT , pp. 320-329
- Coutinho, B.¹ Sampaio, D.² Pereira, F.³ Meira, W.⁴

9
- 78149233155
- Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
- New York, NY, USA, ACM
- G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. PACT'10, pages 353-364, New York, NY, USA, 2010. ACM.
- (2010) PACT'10 , pp. 353-364
- Diamos, G.F.¹ Kerr, A.R.² Yalamanchili, S.³ Clark, N.⁴

10
- 84863463369
- Compiling a high-level language for gpus: (Via language support for architectures and compilers)
- C. Dubach, P. Cheng, R. M. Rabbah, D. F. Bacon, and S. J. Fink. Compiling a high-level language for gpus: (via language support for architectures and compilers). In PLDI, pages 1-12, 2012.
- (2012) PLDI , pp. 1-12
- Dubach, C.¹ Cheng, P.² Rabbah, R.M.³ Bacon, D.F.⁴ Fink, S.J.⁵

11
- 84876937393
- Portable mapping of data parallel programs to opencl for heterogeneous systems
- D. Grewe, Z. Wang, and M. F. O'Boyle. Portable mapping of data parallel programs to opencl for heterogeneous systems. CGO'13. ACM, 2013.
- (2013) CGO'13. ACM
- Grewe, D.¹ Wang, Z.² O'Boyle, M.F.³

12
- 79953071805
- Sponge: Portable stream programming on graphics engines
- New York, NY, USA, ACM
- A. H. Hormati, M. Samadi, M. Woh, T. Mudge, and S. Mahlke. Sponge: portable stream programming on graphics engines. ASPLOS'11, pages 381-392, New York, NY, USA, 2011. ACM.
- (2011) ASPLOS'11 , pp. 381-392
- Hormati, A.H.¹ Samadi, M.² Woh, M.³ Mudge, T.⁴ Mahlke, S.⁵

13
- 34548327455
- oct.
- K. Hoste and L. Eeckhout. Comparing benchmarks using key microarchitecture-independent characteristics. pages 83-92, oct. 2006.
- (2006) Comparing Benchmarks Using Key Microarchitecture-independent Characteristics , pp. 83-92
- Hoste, K.¹ Eeckhout, L.²

14
- 79957502935
- Whole-function vectorization
- april
- R. Karrenberg and S. Hack. Whole-function vectorization. CGO'11, pages 141-150, april 2011.
- (2011) CGO'11 , pp. 141-150
- Karrenberg, R.¹ Hack, S.²

15
- 84859143447
- Improving performance of opencl on cpus
- R. Karrenberg and S. Hack. Improving performance of opencl on cpus. CC, pages 1-20, 2012.
- (2012) CC , pp. 1-20
- Karrenberg, R.¹ Hack, S.²

16
- 77952256778
- Modeling gpu-cpu workloads and systems
- New York, NY, USA, ACM
- A. Kerr, G. Diamos, and S. Yalamanchili. Modeling gpu-cpu workloads and systems. GPGPU'10, pages 31-42, New York, NY, USA, 2010. ACM.
- (2010) GPGPU'10 , pp. 31-42
- Kerr, A.¹ Diamos, G.² Yalamanchili, S.³

17
- 70450103746
- A cross-input adaptive framework for gpu program optimizations
- may
- Y. Liu, E. Zhang, and X. Shen. A cross-input adaptive framework for gpu program optimizations. IPDPS'09, pages 1-10, may 2009.
- (2009) IPDPS'09 , pp. 1-10
- Liu, Y.¹ Zhang, E.² Shen, X.³

18
- 33745304805
- Pin: Building customized program analysis tools with dynamic instrumentation
- June
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. SIGPLAN Not., 40(6):190-200, June 2005.
- (2005) SIGPLAN Not. , vol.40 , Issue.6 , pp. 190-200
- Luk, C.-K.¹ Cohn, R.² Muth, R.³ Patil, H.⁴ Klauser, A.⁵ Lowney, G.⁶ Wallace, S.⁷ Reddi, V.J.⁸ Hazelwood, K.⁹

19
- 84899696234
- S. Moll. Decompilation of LLVM IR, 2011.
- (2011) Decompilation of LLVM IR
- Moll, S.¹

20
- 84953405534
- Cambridge University Press, Jan.
- B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Jan. 1996.
- (1996) Pattern Recognition and Neural Networks
- Ripley, B.D.¹

21
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded gpu using cuda
- New York, NY, USA, ACM
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded gpu using cuda. PPoPP'08, pages 73-82, New York, NY, USA, 2008. ACM.
- (2008) PPoPP'08 , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.-M.W.⁶

22
- 84863347222
- A performance analysis framework for identifying potential benefits in gpgpu applications
- New York, NY, USA, ACM
- J. Sim, A. Dasgupta, H. Kim, and R. Vuduc. A performance analysis framework for identifying potential benefits in gpgpu applications. PPoPP'12, pages 11-22, New York, NY, USA, 2012. ACM.
- (2012) PPoPP'12 , pp. 11-22
- Sim, J.¹ Dasgupta, A.² Kim, H.³ Vuduc, R.⁴

23
- 84859153100
- Automatic restructuring of gpu kernels for exploiting inter-thread data locality
- S. Unkule, C. Shaltz, and A. Qasem. Automatic restructuring of gpu kernels for exploiting inter-thread data locality. CC, pages 21-40, 2012.
- (2012) CC , pp. 21-40
- Unkule, S.¹ Shaltz, C.² Qasem, A.³

24
- 70350771131
- Benchmarking gpus to tune dense linear algebra
- Piscataway, NJ, USA, IEEE Press
- V. Volkov and J. W. Demmel. Benchmarking gpus to tune dense linear algebra. SC'08, pages 31:1-31:11, Piscataway, NJ, USA, 2008. IEEE Press.
- (2008) SC'08 , pp. 31:1-31:11
- Volkov, V.¹ Demmel, J.W.²

25
- 85050273691
- Program slicing
- Piscataway, NJ, USA, IEEE Press
- M. Weiser. Program slicing. ICSE'81, pages 439-449, Piscataway, NJ, USA, 1981. IEEE Press.
- (1981) ICSE'81 , pp. 439-449
- Weiser, M.¹

26
- 84863053984
- Linear-time modeling of program working set in shared cache
- X. Xiang, B. Bao, C. Ding, and Y. Gao. Linear-time modeling of program working set in shared cache. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 350-360, 2011.
- (2011) Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on , pp. 350-360
- Xiang, X.¹ Bao, B.² Ding, C.³ Gao, Y.⁴

27
- 84863663143
- A unified optimizing compiler framework for different gpgpu architectures
- Y. Yang, P. Xiang, J. Kong, M. Mantor, and H. Zhou. A unified optimizing compiler framework for different gpgpu architectures. TACO, 9(2):9, 2012.
- (2012) TACO , vol.9 , Issue.2 , pp. 9
- Yang, Y.¹ Xiang, P.² Kong, J.³ Mantor, M.⁴ Zhou, H.⁵

28
- 79953126288
- On-the-fly elimination of dynamic irregularities for gpu computing
- New York, NY, USA, ACM
- E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for gpu computing. ASPLOS'11, pages 369-380, New York, NY, USA, 2011. ACM.
- (2011) ASPLOS'11 , pp. 369-380
- Zhang, E.Z.¹ Jiang, Y.² Guo, Z.³ Tian, K.⁴ Shen, X.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.