SCOPUS Information Retrieval Platform

ISPASS 2017 - IEEE International Symposium on Performance Analysis of Systems and Software

2017, Pages 43-54 (volume and issue not listed)

Chai: Collaborative heterogeneous applications for integrated-architectures

Authors (8): Gómez Luna, Juan (a); El Hajj, Izzat (b); Chang, Li Wen (b); García Flores, Víctor (c, d); Garcia De Gonzalo, Simon (b); Jablin, Thomas B. (b, e); Peña, Antonio J. (d); Hwu, Wen Mei (b)

a UNIVERSITY OF CÓRDOBA (Spain)

b UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

c UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

d BARCELONA SUPERCOMPUTING CENTER (Spain)

e Multicoreware Inc (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BENCHMARKING; C++ (PROGRAMMING LANGUAGE); COMPUTER PROGRAMMING; DATA HANDLING; DISTRIBUTED COMPUTER SYSTEMS; HIGH LEVEL LANGUAGES; MEMORY ARCHITECTURE; SPECIFICATIONS;

APPLICATION PERFORMANCE; COLLABORATION PATTERNS; DEVICE ARCHITECTURES; HETEROGENEOUS ARCHITECTURES; HETEROGENEOUS PLATFORMS; HETEROGENEOUS SYSTEMS; INTEGRATED ARCHITECTURE; SHARED VIRTUAL MEMORY;

COMPUTER ARCHITECTURE;

EID: 85019024615 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ISPASS.2017.7975269 Document Type: Conference Paper

Times cited: 72

References (37)

1
- 84966891675
- W.-m. W. Hwu, Heterogeneous System Architecture: A New Compute Platform Infrastructure. Morgan Kaufmann, 2015.

2
- 70349100958
- Khronos Group, "The OpenCL specification," Version 2.0, 2015.

3
- 84866918568
- NVIDIA, "CUDA C programming guide v. 8.0," September 2016.

4
- 84873470137
- J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, N. Anssari, G. D. Liu, and W.-m. W. Hwu, "Parboil: A revised benchmark suite for scientific and commercial throughput computing," IMPACT Technical Report, 2012.

5
- 70649092154
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in IEEE International Symposium on Workload Characterization, pp. 44-54, 2009.

6
- 77952273045
- A. Danalis, G. Marin, C. McCurdy, J. S. Meredith, P. C. Roth, K. Spafford, V. Tipparaju, and J. S. Vetter, "The scalable heterogeneous computing (SHOC) benchmark suite," in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 63-74, 2010.

7
- 84923879310
- Y. Ukidave, F. N. Paravecino, L. Yu, C. Kalra, A. Momeni, Z. Chen, N. Materise, B. Daley, P. Mistry, and D. Kaeli, "NUPAR: A benchmark suite for modern GPU architectures," in Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 2015.

8
- 84873458159
- M. Burtscher, R. Nasre, and K. Pingali, "A quantitative study of irregular programs on GPUs," in IEEE International Symposium on Workload Characterization, pp. 141-151, 2012.

9
- 84875979403
- P. Mistry, Y. Ukidave, D. Schaa, and D. Kaeli, "Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pp. 54-65, 2013.

10
- 84994777428
- Y. Sun, X. Gong, A. K. Ziabari, L. Yu, X. Li, S. Mukherjee, C. McCardwell, A. Villegas, and D. Kaeli, "Hetero-Mark, a benchmark suite for CPU-GPU collaborative computing," in IEEE International Symposium on Workload Characterization, 2016.

11
- 84958535612
- S. Mukherjee, X. Gong, L. Yu, C. McCardwell, Y. Ukidave, T. Dao, F. N. Paravecino, and D. Kaeli, "Exploring the features of OpenCL 2.0," in Proceedings of the 3rd International Workshop on OpenCL, pp. 51-55, 2015.

12
- 84978733890
- S. Mukherjee, Y. Sun, P. Blinzer, A. K. Ziabari, and D. Kaeli, "A comprehensive performance analysis of HSA and OpenCL 2.0," in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 183-193, 2016.

13
- 84962221365
- M. Gupta, D. Das, P. Raghavendra, T. Tye, L. Lobachev, A. Agarwal, and R. Hegde, "Implementing cross-device atomics in heterogeneous processors," in IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 659-668, 2015.

14
- 0022808786
- J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679-698, 1986.

15
- 84976501593
- J. Gómez-Luna, L.-W. Chang, I.-J. Sung, W.-M. Hwu, and N. Guil, "In-place data sliding algorithms for many-core architectures," in 44th International Conference on Parallel Processing, pp. 210-219, 2015.

16
- 0019574599
- M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, pp. 381-395, June 1981.

17
- 84870691946
- I.-J. Sung, G. Liu, and W.-M. Hwu, "DL: A data layout transformation system for heterogeneous computing," in Innovative Parallel Computing, pp. 1-11, 2012.

18
- 77953985375
- L. Chen, O. Villa, S. Krishnamoorthy, and G. Gao, "Dynamic load balancing on single- and multi-GPU systems," in IEEE International Symposium on Parallel and Distributed Processing, pp. 1-12, 2010.

19
- 77956200064
- L. Luo, M. Wong, and W.-m. Hwu, "An effective GPU implementation of breadth-first search," in Proceedings of the 47th Design Automation Conference, pp. 52-55, 2010.

20
- 84903968515
- J. Power, J. Hestness, M. Orr, M. Hill, and D. Wood, "gem5-gpu: A heterogeneous CPU-GPU simulator," Computer Architecture Letters, vol. 13, Jan. 2014.

21
- 85027458542
- Radeon Open Compute, "ROCm: Platform for GPU enabled HPC and ultrascale computing," https://github.com/RadeonOpenCompute/ROCm, 2016.

22
- 85027453863
- AMD, "App profiler settings," http://developer.amd.com/tools-And-sdks/archive/compute/amd-App-profiler/user-guide/app-profiler-settings/.

23
- 85027451719
- S. Kelley, https://github.com/smskelley/canny-opencl.

24
- 84928805583
- AMD, "Memory system on Fusion APUs: The benefits of zero copy," http://developer.amd.com/wordpress/media/2013/06/1004final.pdf, June 2011.

25
- 85027468688
- bshaozi, "Compile problem," https://github.com/RadeonOpenCompute/hcc/issues/124, September 2016.

26
- 84946020782
- B. Reagen, R. Adolf, Y. S. Shao, G. Y. Wei, and D. Brooks, "MachSuite: Benchmarks for accelerator design and customized architectures," in IEEE International Symposium on Workload Characterization, pp. 110-119, 2014.

27
- 84862695013
- K. L. Spafford, J. S. Meredith, S. Lee, D. Li, P. C. Roth, and J. S. Vetter, "The tradeoffs of fused memory hierarchies in heterogeneous computing architectures," in Proceedings of the 9th Conference on Computing Frontiers, pp. 103-112, 2012.

28
- 84882833309
- K. Lee, H. Lin, and W.-c. Feng, "Performance characterization of data-intensive kernels on AMD Fusion architectures," Computer Science - Research and Development, vol. 28, no. 2-3, pp. 175-184, 2013.

29
- 85016777931
- Q. Zhu, B. Wu, X. Shen, K. Shen, L. Shen, and Z. Wang, "Understanding co-run performance on CPU-GPU integrated processors: Observations, insights, directions," Frontiers of Computer Science, pp. 1-17, 2016.

30
- 84978477088
- N. Farooqui, I. Roy, Y. Chen, V. Talwar, and K. Schwan, "Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations," in Proceedings of the ACM International Conference on Computing Frontiers, pp. 19-28, 2016.

31
- 85027447368
- V. García-Flores, J. Gómez-Luna, T. Grass, A. Rico, E. Ayguadé, and A. J. Peña, "Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications," in IEEE International Symposium on Workload Characterization, pp. 1-10, 2016.

32
- 85015994987
- C. Erb, M. Collins, and J. L. Greathouse, "Dynamic buffer overflow detection for GPGPUs," in Proceedings of the 2017 International Symposium on Code Generation and Optimization, pp. 61-73, 2017.

33
- 84959927541
- G. Chen and X. Shen, "Free launch: Optimizing GPU dynamic kernel launches through thread reuse," in Proceedings of the 48th International Symposium on Microarchitecture, pp. 407-419, 2015.

34
- 85027464144
- J. Wang, N. Rubin, A. Sidelnik, and S. Yalamanchili, "Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs," in ACM SIGARCH Computer Architecture News, vol. 43, pp. 528-540, 2015.

35
- 84983239150
- H. Wu, D. Li, and M. Becchi, "Compiler-assisted workload consolidation for efficient dynamic parallelism on GPU," in 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 534-543, 2016.

36
- 85009382810
- I. El Hajj, J. Gómez-Luna, C. Li, L.-W. Chang, D. Milojicic, and W.-m. Hwu, "KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism," in 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1-12, IEEE, 2016.

37
- 85027441431
- X. Tang, A. Pattnaik, H. Jiang, O. Kayiran, A. Jog, M. I. Sreepathi Pai, M. T. Kandemir, and C. R. Das, "Controlled kernel launch for dynamic parallelism in GPUs."

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.