SCOPUS Information Search Platform

IEEE Transactions on Parallel and Distributed Systems

Volume 29, Issue 5, 2018, Pages 1089-1102

Intra-Node Memory Safe GPU Co-Scheduling

(4) Reano, Carlos (a); Silla, Federico (a); Nikolopoulos, Dimitrios S. (b); Varghese, Blesson (b)

a UNIVERSITAT POLITÈCNICA DE VALÈNCIA (Spain)

b QUEEN'S UNIVERSITY BELFAST (United Kingdom)

Author keywords

accelerator; access synchronisation; GPU co scheduling; memory safe; schedGPU; under utilisation

Indexed keywords

COMPUTER GRAPHICS; GRAPHICS PROCESSING UNIT; LIBRARIES; MEMORY ARCHITECTURE; MULTITASKING; PARTICLE ACCELERATORS; PROGRAM PROCESSORS; SCHEDULING; SERVERS;

CO-SCHEDULING; MEMORY MANAGEMENT; NON-VOLATILE MEMORY; SCHEDGPU; UNDER-UTILISATION;

MEMORY MANAGEMENT UNITS;

EID: 85039762526 PISSN: 10459219 EISSN: None Source Type: Journal
DOI: 10.1109/TPDS.2017.2784428 Document Type: Article

Times cited : (13)

References (35)

1
- 84934299651
- GPU cluster for high performance computing
- Z. Fan, F. Qiu, A. Kaufman, and S. Yoakum-Stover, "GPU cluster for high performance computing," in Proc. IEEE/ACM Conf. Supercomput., 2004, pp. 47-47.
- (2004) Proc. IEEE/ACM Conf. Supercomput. , pp. 47
- Fan, Z.¹ Qiu, F.² Kaufman, A.³ Yoakum-Stover, S.⁴

2
- 84904350194
- Trends in high-performance computing for engineering calculations
- Art. no. 20130319
- M. B. Giles and I. Reguly, "Trends in high-performance computing for engineering calculations," Philosoph. Trans. Roy. Soc. London Series A, vol. 372, 2014, Art. no. 20130319.
- (2014) Philosoph. Trans. Roy. Soc. London Series A , vol.372
- Giles, M.B.¹ Reguly, I.²

3
- 0242571753
- SLURM: Simple Linux utility for resource management
- A. B. Yoo, M. A. Jette, and M. Grondona, "SLURM: Simple Linux utility for resource management," in Proc. Int. Workshop Job Scheduling Strategies Parallel Process., 2003, pp. 44-60.
- (2003) Proc. Int. Workshop Job Scheduling Strategies Parallel Process. , pp. 44-60
- Yoo, A.B.¹ Jette, M.A.² Grondona, M.³

4
- 85045585440
- Adaptive Computing. TORQUE Resource Manager. [Online]
- Adaptive Computing. TORQUE Resource Manager. 2016. [Online]. Available: http://www.adaptivecomputing.com/products/open-source/torque/
- (2016)

5
- 67650668277
- QP: A heterogeneous multi-accelerator cluster
- M. Showerman, et al., "QP: A heterogeneous multi-accelerator cluster," in Proc. 10th LCI Int. Conf. High-Perform. Clustered Comput., 2009, pp. 1-8.
- (2009) Proc. 10th LCI Int. Conf. High-Perform. Clustered Comput. , pp. 1-8
- Showerman, M.¹

6
- 84907440423
- A survey of methods for analysing and improving GPU energy efficiency
- S. Mittal and J. S. Vetter, "A survey of methods for analysing and improving GPU energy efficiency," ACM Comput. Surveys, vol. 47, no. 2, pp. 19:1-19:23, 2014.
- (2014) ACM Comput. Surveys , vol.47 , Issue.2 , pp. 19:1-19:23
- Mittal, S.¹ Vetter, J.S.²

7
- 84959042403
- Acceleration-as-a-service: Exploiting virtualised GPUs for a financial application
- B. Varghese, J. Prades, C. Reano, and F. Silla, "Acceleration-as-a-service: Exploiting virtualised GPUs for a financial application," in Proc. 11th IEEE Int. Conf. e-Sci., 2015, pp. 47-56.
- (2015) Proc. 11th IEEE Int. Conf. E-Sci. , pp. 47-56
- Varghese, B.¹ Prades, J.² Reano, C.³ Silla, F.⁴

8
- 84863676008
- Scheduling concurrent applications on a cluster of CPU-GPU nodes
- V. T. Ravi, M. Becchi, W. Jiang, G. Agrawal, and S. Chakradhar, "Scheduling concurrent applications on a cluster of CPU-GPU nodes," in Proc. 12th IEEE/ACM Int. Symp. Cluster Cloud Grid Comput., 2012, pp. 140-147.
- (2012) Proc. 12th IEEE/ACM Int. Symp. Cluster Cloud Grid Comput. , pp. 140-147
- Ravi, V.T.¹ Becchi, M.² Jiang, W.³ Agrawal, G.⁴ Chakradhar, S.⁵

9
- 84867281703
- Dynamic load scheduling on CPU-GPU for iterative tomographic reconstruction
- J. I. Agulleiro, F. Vazquez, E. M. Garzon, and J. J. Fernandez, "Dynamic load scheduling on CPU-GPU for iterative tomographic reconstruction," in Proc. 10th IEEE Int. Symp. Parallel Distrib. Process. Appl., 2012, pp. 603-608.
- (2012) Proc. 10th IEEE Int. Symp. Parallel Distrib. Process. Appl. , pp. 603-608
- Agulleiro, J.I.¹ Vazquez, F.² Garzon, E.M.³ Fernandez, J.J.⁴

10
- 80955141000
- Exploring fine-grained task-based execution on multi-GPU systems
- L. Chen, O. Villa, and G. R. Gao, "Exploring fine-grained task-based execution on multi-GPU systems," in Proc. IEEE Int. Conf. Cluster Comput., 2011, pp. 386-394.
- (2011) Proc. IEEE Int. Conf. Cluster Comput. , pp. 386-394
- Chen, L.¹ Villa, O.² Gao, G.R.³

11
- 85045565081
- Towards multi-tenant GPGPU: Event-driven programming model for system-wide scheduling on shared GPUs
- Y. Suzuki, H. Yamada, S. Kato, and K. Kono, "Towards multi-tenant GPGPU: Event-driven programming model for system-wide scheduling on shared GPUs," in Proc. Workshop Multicore Rack-Scale Syst., 2016, pp. 1-7.
- (2016) Proc. Workshop Multicore Rack-Scale Syst. , pp. 1-7
- Suzuki, Y.¹ Yamada, H.² Kato, S.³ Kono, K.⁴

12
- 84894342197
- GPUSync: A framework for real-time GPU management
- G. A. Elliott, B. C. Ward, and J. H. Anderson, "GPUSync: A framework for real-time GPU management," in Proc. IEEE Real-Time Syst. Symp., 2013, pp. 33-44.
- (2013) Proc. IEEE Real-Time Syst. Symp. , pp. 33-44
- Elliott, G.A.¹ Ward, B.C.² Anderson, J.H.³

13
- 85077032008
- TimeGraph: GPU scheduling for real-time multi-tasking environments
- S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa, "TimeGraph: GPU scheduling for real-time multi-tasking environments," in Proc. USENIX Annu. Tech. Conf., 2011, pp. 17-30.
- (2011) Proc. USENIX Annu. Tech. Conf. , pp. 17-30
- Kato, S.¹ Lakshmanan, K.² Rajkumar, R.³ Ishikawa, Y.⁴

14
- 84964682582
- Improving application concurrency on GPUs by managing implicit and explicit synchronisations
- M. Butler, K. Sajjapongse, and M. Becchi, "Improving application concurrency on GPUs by managing implicit and explicit synchronisations," in Proc. 21st IEEE Int. Conf. Parallel Distrib. Syst., 2015, pp. 535-544.
- (2015) Proc. 21st IEEE Int. Conf. Parallel Distrib. Syst. , pp. 535-544
- Butler, M.¹ Sajjapongse, K.² Becchi, M.³

15
- 84991588487
- Improving GPU utilisation with multi-process service (MPS)
- [Online]
- P. Sah, "Improving GPU utilisation with multi-process service (MPS)," in Proc. GPU Technol. Conf., ID S5584, 2015. [Online]. Available: http://on-demand.gputechconf.com/gtc/2015/presentation/S5584-Priyanka-Sah.pdf
- (2015) Proc. GPU Technol. Conf., ID S5584
- Sah, P.¹

16
- 84906356952
- Multi-threaded Kernel offloading to GPGPU using hyper-Q on Kepler architecture
- Jun.
- F. Wende, T. Steinke, and F. Cordes, "Multi-threaded Kernel offloading to GPGPU using hyper-Q on Kepler architecture," in Proc. Zuse Inst. Berlin Rep., Jun. 2014, pp. 1-17.
- (2014) Proc. Zuse Inst. Berlin Rep , pp. 1-17
- Wende, F.¹ Steinke, T.² Cordes, F.³

17
- 85045564794
- NVIDIA, CUDA C Programming Guide 8.0, 2016. [Online]
- NVIDIA, CUDA C Programming Guide 8.0, 2016. [Online]. Available: https://docs.nvidia.com/cuda/pdf/CUDA-C-Programming-Guide.pdf

18
- 79951728783
- [Online]
- L. Howes, OpenCL 2.1 Specification, Khronos OpenCL Working Group, 2015. [Online]. Available: https://www.khronos.org/registry/cl/specs/opencl-2.1.pdf
- (2015) OpenCL 2.1 Specification Khronos OpenCL Working Group
- Howes, L.¹

19
- 84991641321
- SchedGPU: Fine-grain dynamic and adaptive scheduling for GPUs
- C. Reano, F. Silla, and M. J. Leslie, "schedGPU: Fine-grain dynamic and adaptive scheduling for GPUs," in Proc. Int. Conf. High Perform. Comput. Simul., 2016, pp. 993-997.
- (2016) Proc. Int. Conf. High Perform. Comput. Simul. , pp. 993-997
- Reano, C.¹ Silla, F.² Leslie, M.J.³

20
- 0042830650
- Performance analysis of five interprocess communication mechanisms across UNIX operating systems
- P. K. Immich, R. S. Bhagavatula, and R. Pendse, "Performance analysis of five interprocess communication mechanisms across UNIX operating systems," J. Syst. Softw., vol. 68, no. 1, pp. 27-43, 2003.
- (2003) J. Syst. Softw. , vol.68 , Issue.1 , pp. 27-43
- Immich, P.K.¹ Bhagavatula, R.S.² Pendse, R.³

21
- 0027721450
- Performance analysis of job scheduling policies in parallel supercomputing environments
- V. K. Naik, M. S. Squillante, and S. K. Setia, "Performance analysis of job scheduling policies in parallel supercomputing environments," in Proc. IEEE/ACM Conf. Supercomput., 1993, pp. 824-833.
- (1993) Proc. IEEE/ACM Conf. Supercomput. , pp. 824-833
- Naik, V.K.¹ Squillante, M.S.² Setia, S.K.³

22
- 84976722900
- The impact of operating system scheduling policies and synchronisation methods of performance of parallel applications
- A. Gupta, A. Tucker, and S. Urushibara, "The impact of operating system scheduling policies and synchronisation methods of performance of parallel applications," SIGMETRICS Perform. Eval. Rev., vol. 19, no. 1, pp. 120-132, 1991.
- (1991) SIGMETRICS Perform. Eval. Rev. , vol.19 , Issue.1 , pp. 120-132
- Gupta, A.¹ Tucker, A.² Urushibara, S.³

23
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, et al., "Rodinia: A benchmark suite for heterogeneous computing," in Proc. IEEE Int. Symp. Workload Characterization, 2009, pp. 44-54.
- (2009) Proc. IEEE Int. Symp. Workload Characterization , pp. 44-54
- Che, S.¹

24
- 84873470137
- Center Reliable High-Perform. Comput., IMPACT Technical Report, IMPACT-12-01, University of Illinois at Urbana-Champaign
- J. A. Stratton, et al., "Parboil: A revised benchmark suite for scientific and commercial throughput computing," Center Reliable High-Perform. Comput., IMPACT Technical Report, IMPACT-12-01, University of Illinois at Urbana-Champaign, 2012.
- (2012) Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing
- Stratton, J.A.¹

25
- 84876565276
- Parallel simulations for analysing portfolios of catastrophic event risk
- A. K. Bahl, O. Baltzer, A. Rau-Chaplin, and B. Varghese, "Parallel simulations for analysing portfolios of catastrophic event risk," in Proc. Int. Supercomput. Conf. Companion: High Perform. Comput. Netw. Storage Anal., 2012, pp. 1176-1184.
- (2012) Proc. Int. Supercomput. Conf. Companion: High Perform. Comput. Netw. Storage Anal. , pp. 1176-1184
- Bahl, A.K.¹ Baltzer, O.² Rau-Chaplin, A.³ Varghese, B.⁴

26
- 69949100622
- Optimising data intensive GPGPU computations for DNA sequence alignment
- C. Trapnell and M. C. Schatz, "Optimising data intensive GPGPU computations for DNA sequence alignment," Parallel Comput., vol. 35, no. 8/9, pp. 429-440, 2009.
- (2009) Parallel Comput. , vol.35 , Issue.8-9 , pp. 429-440
- Trapnell, C.¹ Schatz, M.C.²

27
- 78651415181
- GPU-BLAST: Using graphics processors to accelerate protein sequence alignment
- P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: Using graphics processors to accelerate protein sequence alignment," Bioinf., vol. 27, no. 2, pp. 182-188, 2011.
- (2011) Bioinf. , vol.27 , Issue.2 , pp. 182-188
- Vouzis, P.D.¹ Sahinidis, N.V.²

28
- 80052985746
- Exploiting concurrent kernel execution on graphic processing units
- L. Wang, M. Huang, and T. El-Ghazawi, "Exploiting concurrent kernel execution on graphic processing units," in Proc. Int. Conf. High Perform. Comput. Simul., 2011, pp. 24-32.
- (2011) Proc. Int. Conf. High Perform. Comput. Simul. , pp. 24-32
- Wang, L.¹ Huang, M.² El-Ghazawi, T.³

29
- 84894883016
- Fine-grained resource sharing for concurrent GPGPU kernels
- C. Gregg, J. Dorn, K. Hazelwood, and K. Skadron, "Fine-grained resource sharing for concurrent GPGPU kernels," in Proc. USENIX Workshop Hot Topics Parallelism, 2012, pp. 10-10.
- (2012) Proc. USENIX Workshop Hot Topics Parallelism , pp. 10
- Gregg, C.¹ Dorn, J.² Hazelwood, K.³ Skadron, K.⁴

30
- 85045564529
- NVIDIA CUDA Multi-Process Service, May 2015. [Online]
- NVIDIA, CUDA Multi-Process Service, May 2015. [Online]. Available: https://docs.nvidia.com/deploy/pdf/CUDA-Multi-Process-Service-Overview.pdf

31
- 85045563630
- [Online]
- T. Bradley, Hyper-Q Example, NVIDIA, 2013. [Online]. Available: https://www.ecse.rpi.edu/wrf/wiki/ParallelComputingSpring2014/cuda-samples/samples/6-Advanced/simpleHyperQ/doc/HyperQ.pdf
- (2013) Hyper-Q Example NVIDIA
- Bradley, T.¹

32
- 77953126928
- Grid Resource Management. Norwell, MA, USA: Kluwer
- B. Nitzberg, J. M. Schopf, and J. P. Jones, "PBS Pro: Grid computing and scheduling attributes," in Grid Resource Management. Norwell, MA, USA: Kluwer, 2004, pp. 183-190.
- (2004) PBS Pro: Grid Computing and Scheduling Attributes , pp. 183-190
- Nitzberg, B.¹ Schopf, J.M.² Jones, J.P.³

33
- 84905509992
- Enabling preemptive multiprogramming on GPUs
- I. Tanasic, I. Gelado, J. Cabezas, A. Ramirez, N. Navarro, and M. Valero, "Enabling preemptive multiprogramming on GPUs," in Proc. Annu. Int. Symp. Comput. Archit., 2014, pp. 193-204.
- (2014) Proc. Annu. Int. Symp. Comput. Archit. , pp. 193-204
- Tanasic, I.¹ Gelado, I.² Cabezas, J.³ Ramirez, A.⁴ Navarro, N.⁵ Valero, M.⁶

34
- 85045576958
- NVIDIA, Tesla P100, 2016. [Online]
- NVIDIA, Tesla P100, 2016. [Online]. Available: https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf

35
- 84975267270
- Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers
- Q. Chen, H. Yang, J. Mars, and L. Tang, "Baymax: QoS awareness and increased utilization for non-preemptive accelerators in warehouse scale computers," in Proc. 21st Int. Conf. Archit. Support Program. Languages Operating Syst., 2016, pp. 681-696.
- (2016) Proc. 21st Int. Conf. Archit. Support Program. Languages Operating Syst. , pp. 681-696
- Chen, Q.¹ Yang, H.² Mars, J.³ Tang, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.