SCOPUS Information Search Platform

Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015

2015, Pages 257-268

Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures

Authors (5): Kim, Hee Seok (a); El Hajj, Izzat (a); Stratton, John (b); Lumetta, Steven (a); Hwu, Wen Mei (a)

a University of Illinois at Urbana-Champaign (United States)

b Center for Language and Brain (United States)

Author keywords

[No Author keywords available]

Indexed keywords

PROGRAM COMPILERS; SCHEDULING;

BARRIER SYNCHRONIZATION; HETEROGENEOUS COMPUTING; IMPROVING PERFORMANCE; MEMORY ACCESS PATTERNS; MULTI-CORE CPU ARCHITECTURES; SYNCHRONIZATION POINTS; SYNCHRONOUS PROGRAMMING; THREAD SCHEDULING;

MEMORY ARCHITECTURE;

EID: 84961314978 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CGO.2015.7054205 Document Type: Conference Paper

Times cited: 18

References (41)

1
- 84969901203
- OpenCL for programming shared memory multicore CPUs
- A. Ali, U. Dastgeer, and C. Kessler. OpenCL for programming shared memory multicore CPUs. In Proceedings of the 5th Workshop on MULTIPROG, 2012.
- (2012) Proceedings of the 5th Workshop on MULTIPROG
- Ali, A.¹ Dastgeer, U.² Kessler, C.³

2
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In Performance Analysis of Systems and Software, IEEE International Symposium on, pages 163-174, Apr. 2009.
- (2009) Performance Analysis of Systems and Software, IEEE International Symposium on , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

3
- 57349139452
- A practical automatic polyhedral parallelizer and locality optimizer
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 101-113, 2008.
- (2008) Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 101-113
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

4
- 84877702106
- A scalable, numerically stable, high-performance tridiagonal solver using GPUs
- L. Chang, J. A. Stratton, H. Kim, and W. W. Hwu. A Scalable, Numerically Stable, High-performance Tridiagonal Solver Using GPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 27:1-27:11, 2012.
- (2012) Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis , pp. 27:1-27:11
- Chang, L.¹ Stratton, J.A.² Kim, H.³ Hwu, W.W.⁴

5
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, IEEE International Symposium on, pages 44-54, 2009.
- (2009) Workload Characterization, IEEE International Symposium on , pp. 44-54
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.⁶ Skadron, K.⁷

6
- 78049512154
- Barra: A parallel functional simulator for GPGPU
- S. Collange, M. Daumas, D. Defour, and D. Parello. Barra: A Parallel Functional Simulator for GPGPU. In Modeling, Analysis & Simulation of Computer and Telecommunication Systems, IEEE International Symposium on, pages 351-360, Aug. 2010.
- (2010) Modeling, Analysis & Simulation of Computer and Telecommunication Systems, IEEE International Symposium on , pp. 351-360
- Collange, S.¹ Daumas, M.² Defour, D.³ Parello, D.⁴

7
- 77954598423
- Dynamic detection of uniform and affine vectors in GPGPU computations
- S. Collange, D. Defour, and Y. Zhang. Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations. In Proceedings of the 2009 International Conference on Parallel Processing, pages 46-55, 2010.
- (2010) Proceedings of the 2009 International Conference on Parallel Processing , pp. 46-55
- Collange, S.¹ Defour, D.² Zhang, Y.³

8
- 84856530584
- Divergence analysis and optimizations
- B. Coutinho, D. Sampaio, F. M. Q. Pereira, and W. Meira. Divergence Analysis and Optimizations. In Parallel Architectures and Compilation Techniques, 2011 International Conference on, pages 320-329, Oct. 2011.
- (2011) Parallel Architectures and Compilation Techniques, 2011 International Conference on , pp. 320-329
- Coutinho, B.¹ Sampaio, D.² Pereira, F.M.Q.³ Meira, W.⁴

9
- 0002806690
- OpenMP: An industry standard API for shared-memory programming
- L. Dagum and R. Menon. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE, 5(1):46-55, 1998.
- (1998) Computational Science & Engineering, IEEE , vol.5 , Issue.1 , pp. 46-55
- Dagum, L.¹ Menon, R.²

10
- 78149233155
- Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
- G. F. Diamos, N. Clark, A. R. Kerr, and S. Yalamanchili. Ocelot: A Dynamic Optimization Framework for Bulk-synchronous Applications in Heterogeneous Systems. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 353-364, 2010.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 353-364
- Diamos, G.F.¹ Clark, N.² Kerr, A.R.³ Yalamanchili, S.⁴

11
- 84961365591
- Dynamic loop vectorization for executing OpenCL kernels on CPUs
- I. El Hajj. Dynamic loop vectorization for executing OpenCL kernels on CPUs (Master's thesis). 2014.
- (2014) Dynamic Loop Vectorization for Executing OpenCL Kernels on CPUs
- El Hajj, I.¹

12
- 49949106993
- Perfmon2: A flexible performance monitoring interface for Linux
- S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux. In Proc. of the 2006 Ottawa Linux Symposium, pages 269-288. Citeseer, 2006.
- (2006) Proc. of the 2006 Ottawa Linux Symposium , pp. 269-288
- Eranian, S.¹

13
- 78149276036
- Twin peaks: A software platform for heterogeneous computing on general-purpose and graphics processors
- J. Gummaraju, L. Morichetti, M. Houston, B. Sander, B. R. Gaster, and B. Zheng. Twin Peaks: A Software Platform for Heterogeneous Computing on General-purpose and Graphics Processors. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 205-216, 2010.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 205-216
- Gummaraju, J.¹ Morichetti, L.² Houston, M.³ Sander, B.⁴ Gaster, B.R.⁵ Zheng, B.⁶

14
- 33645444470
- Interprocedural parallelization analysis in SUIF
- M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, and M. S. Lam. Interprocedural parallelization analysis in SUIF. ACM Trans. Program. Lang. Syst., 27(4):662-731, July 2005.
- (2005) ACM Trans. Program. Lang. Syst. , vol.27 , Issue.4 , pp. 662-731
- Hall, M.W.¹ Amarasinghe, S.P.² Murphy, B.R.³ Liao, S.-W.⁴ Lam, M.S.⁵

15
- 84961295225
- P. Jaaskelainen, C. S. de La Lama, E. Schnetter, K. Raiskila, J. Takala, and H. Berg. pocl: A performance-portable OpenCL implementation, 2014.
- (2014) Pocl: A Performance-portable OpenCL Implementation
- Jaaskelainen, P.¹ De La Lama, C.S.² Schnetter, E.³ Raiskila, K.⁴ Takala, J.⁵ Berg, H.⁶

16
- 84899719703
- OpenCL framework for ARM processors with NEON support
- G. Jo, W. Jeon, W. Jung, G. Taft, and J. Lee. OpenCL Framework for ARM Processors with NEON Support. In Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, pages 33-40, 2014.
- (2014) Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing , pp. 33-40
- Jo, G.¹ Jeon, W.² Jung, W.³ Taft, G.⁴ Lee, J.⁵

17
- 79957502935
- Whole-function vectorization
- R. Karrenberg and S. Hack. Whole-function Vectorization. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 141-150, Apr. 2011.
- (2011) Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization , pp. 141-150
- Karrenberg, R.¹ Hack, S.²

18
- 84859143447
- Improving performance of OpenCL on CPUs
- R. Karrenberg and S. Hack. Improving Performance of OpenCL on CPUs. In Proceedings of the 21st International Conference on Compiler Construction, pages 1-20, 2012.
- (2012) Proceedings of the 21st International Conference on Compiler Construction , pp. 1-20
- Karrenberg, R.¹ Hack, S.²

19
- 0037952146
- Optimizing compilers for modern architectures: A dependence-based approach
- K. Kennedy and J. R. Allen. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers Inc., 2002.
- (2002) Optimizing Compilers for Modern Architectures: A Dependence-based Approach
- Kennedy, K.¹ Allen, J.R.²

20
- 84863449186
- Dynamic compilation of data-parallel kernels for vector processors
- A. Kerr, G. Diamos, and S. Yalamanchili. Dynamic Compilation of Data-parallel Kernels for Vector Processors. In Proceedings of the Tenth International Symposium on Code Generation and Optimization, pages 23-32, 2012.
- (2012) Proceedings of the Tenth International Symposium on Code Generation and Optimization , pp. 23-32
- Kerr, A.¹ Diamos, G.² Yalamanchili, S.³

21
- 84961318208
- The OpenCL Specification
- Khronos OpenCL Working Group and others. The OpenCL Specification. A. Munshi, Ed, 2008.

22
- 84961332840
- Multi-tier dynamic vectorization for translating GPU optimizations into CPU performance
- H.-S. Kim, I. El Hajj, J. A. Stratton, and W.-M. W. Hwu. Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance. IMPACT Technical Report, 2014.
- (2014) Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance
- Kim, H.-S.¹ El Hajj, I.² Stratton, J.A.³ Hwu, W.-M.W.⁴

23
- 84864054886
- SnuCL: An OpenCL framework for heterogeneous CPU/GPU clusters
- J. Kim, S. Seo, J. Lee, J. Nah, G. Jo, and J. Lee. SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters. In Proceedings of the 26th ACM International Conference on Supercomputing, pages 341-352, 2012.
- (2012) Proceedings of the 26th ACM International Conference on Supercomputing , pp. 341-352
- Kim, J.¹ Seo, S.² Lee, J.³ Nah, J.⁴ Jo, G.⁵ Lee, J.⁶

24
- 84883089997
- When polyhedral transformations meet SIMD code generation
- M. Kong, R. Veras, K. Stock, F. Franchetti, L. Pouchet, and P. Sadayappan. When Polyhedral Transformations Meet SIMD Code Generation. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, number 6, pages 127-138, June 2013.
- (2013) Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation , Issue.6 , pp. 127-138
- Kong, M.¹ Veras, R.² Stock, K.³ Franchetti, F.⁴ Pouchet, L.⁵ Sadayappan, P.⁶

25
- 78149255519
- An OpenCL framework for heterogeneous multicores with local memory
- J. Lee, J. Kim, S. Seo, S. Kim, J. Park, H. Kim, T. Dao, Y. Cho, S. Seo, S. Lee, S. Cho, H. Song, S. Suh, and J. Choi. An OpenCL Framework for Heterogeneous Multicores with Local Memory. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 193-204, 2010.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 193-204
- Lee, J.¹ Kim, J.² Seo, S.³ Kim, S.⁴ Park, J.⁵ Kim, H.⁶ Dao, T.⁷ Cho, Y.⁸ Seo, S.⁹ Lee, S.¹⁰ Cho, S.¹¹ Song, H.¹² Suh, S.¹³ Choi, J.¹⁴

26
- 84899746576
- OpenCL performance evaluation on modern multi core CPUs
- J. Lee, K. Patel, N. Nigania, H. Kim, and H. Kim. OpenCL Performance Evaluation on Modern Multi Core CPUs. In Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2013 IEEE 27th International, pages 1177-1185, May 2013.
- (2013) Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2013 IEEE 27th International , pp. 1177-1185
- Lee, J.¹ Patel, K.² Nigania, N.³ Kim, H.⁴ Kim, H.⁵

27
- 84876943307
- Convergence and scalarization for data-parallel architectures
- Y. Lee, R. Krashinsky, V. Grover, S. Keckler, and K. Asanovic. Convergence and scalarization for data-parallel architectures. In Code Generation and Optimization, 2013 IEEE/ACM International Symposium on, pages 1-11, Feb 2013.
- (2013) Code Generation and Optimization, 2013 IEEE/ACM International Symposium on , pp. 1-11
- Lee, Y.¹ Krashinsky, R.² Grover, V.³ Keckler, S.⁴ Asanovic, K.⁵

28
- 84899692998
- A large-scale cross-architecture evaluation of thread-coarsening
- A. Magni, C. Dubach, and M. F. P. O'Boyle. A Large-scale Cross-architecture Evaluation of Thread-coarsening. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 11:1-11:11, 2013.
- (2013) Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis , pp. 11:1-11:11
- Magni, A.¹ Dubach, C.² O'Boyle, M.F.P.³

29
- 0029202471
- A comparison of full and partial predicated execution support for ILP processors
- S. A. Mahlke, R. E. Hank, J. E. McCormick, D. I. August, and W. W. Hwu. A comparison of full and partial predicated execution support for ILP processors. In Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on, pages 138-149, June 1995.
- (1995) Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on , pp. 138-149
- Mahlke, S.A.¹ Hank, R.E.² McCormick, J.E.³ August, D.I.⁴ Hwu, W.W.⁵

30
- 0030190854
- Improving data locality with loop transformations
- K. S. McKinley, S. Carr, and C. Tseng. Improving Data Locality with Loop Transformations. ACM Trans. Program. Lang. Syst., 18(4):424-453, July 1996.
- (1996) ACM Trans. Program. Lang. Syst. , vol.18 , Issue.4 , pp. 424-453
- McKinley, K.S.¹ Carr, S.² Tseng, C.³

31
- 65649105504
- Intel threading building blocks
- C. Pheatt. Intel Threading Building Blocks. J. Comput. Sci. Coll., 23(4):298, Apr. 2008.
- (2008) J. Comput. Sci. Coll. , vol.23 , Issue.4 , pp. 298
- Pheatt, C.¹

32
- 10444289646
- Code generation in the polyhedral model is easier than you think
- C. Bastoul. Code Generation in the Polyhedral Model Is Easier Than You Think. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 7-16, 2004.
- (2004) Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques , pp. 7-16
- Bastoul, C.¹

33
- 84961311076
- N. Rotem. Intel OpenCL Implicit Vectorization Module, 2011.
- (2011) Intel OpenCL Implicit Vectorization Module
- Rotem, N.¹

34
- 84887446142
- Automatic OpenCL workgroup size selection for multicore CPUs
- S. Seo, J. Lee, G. Jo, and J. Lee. Automatic OpenCL workgroup size selection for multicore CPUs. In Parallel Architectures and Compilation Techniques, 2013 22nd International Conference on, pages 387-397, Sept 2013.
- (2013) Parallel Architectures and Compilation Techniques, 2013 22nd International Conference on , pp. 387-397
- Seo, S.¹ Lee, J.² Jo, G.³ Lee, J.⁴

35
- 84877647998
- Performance traps in OpenCL for CPUs
- J. Shen, J. Fang, H. Sips, and A. L. Varbanescu. Performance Traps in OpenCL for CPUs. In Parallel, Distributed and Network-Based Processing, 2013 21st Euromicro International Conference on, pages 38-45, Feb. 2013.
- (2013) Parallel, Distributed and Network-Based Processing, 2013 21st Euromicro International Conference on , pp. 38-45
- Shen, J.¹ Fang, J.² Sips, H.³ Varbanescu, A.L.⁴

36
- 58449109179
- MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs
- J. A. Stratton, S. S. Stone, and W. W. Hwu. MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs. In J. N. Amaral, editor, Languages and Compilers for Parallel Computing, pages 16-30. 2008.
- (2008) J. N. Amaral, Editor, Languages and Compilers for Parallel Computing , pp. 16-30
- Stratton, J.A.¹ Stone, S.S.² Hwu, W.W.³

37
- 77953978573
- Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs
- J. A. Stratton, V. Grover, J. Marathe, B. Aarts, M. Murphy, Z. Hu, and W. W. Hwu. Efficient Compilation of Fine-grained SPMD-threaded Programs for Multicore CPUs. In Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 111-119, 2010.
- (2010) Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization , pp. 111-119
- Stratton, J.A.¹ Grover, V.² Marathe, J.³ Aarts, B.⁴ Murphy, M.⁵ Hu, Z.⁶ Hwu, W.W.⁷

38
- 84873470137
- Parboil: A revised benchmark suite for scientific and commercial throughput computing
- J. A. Stratton, C. Rodrigues, I. Sung, N. Obeid, L. Chang, N. Anssari, G. D. Liu, and W. W. Hwu. Parboil: A revised benchmark suite for scientific and commercial throughput computing. IMPACT Technical Report, 2012.
- (2012) Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing
- Stratton, J.A.¹ Rodrigues, C.² Sung, I.³ Obeid, N.⁴ Chang, L.⁵ Anssari, N.⁶ Liu, G.D.⁷ Hwu, W.W.⁸

39
- 78149251414
- Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
- I. Sung, J. A. Stratton, and W. W. Hwu. Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 513-522, 2010.
- (2010) Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques , pp. 513-522
- Sung, I.¹ Stratton, J.A.² Hwu, W.W.³

40
- 84961329123
- Predicate vectors if you must
- S. Timnat, O. Shacham, and A. Zaks. Predicate Vectors If You Must. In WPMVP '14:Workshop on Programming Models for SIMD/Vector Processing, 2014.
- (2014) WPMVP '14:Workshop on Programming Models for SIMD/Vector Processing
- Timnat, S.¹ Shacham, O.² Zaks, A.³

41
- 77954691442
- A GPGPU compiler for memory optimization and parallelism management
- Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU Compiler for Memory Optimization and Parallelism Management. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 86-97, 2010.
- (2010) Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 86-97
- Yang, Y.¹ Xiang, P.² Kong, J.³ Zhou, H.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.