SCOPUS Information Retrieval Platform

Proceedings - International Symposium on Computer Architecture

Volume , Issue , 2014, Pages 193-204

Enabling preemptive multiprogramming on GPUs

(6) Tanasic, Ivan a,b; Gelado, Isaac c; Cabezas, Javier a,b; Ramirez, Alex a,b; Navarro, Nacho a,b; Valero, Mateo a,b

a BARCELONA SUPERCOMPUTING CENTER (Spain)

b UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

c NVIDIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; MULTIPROGRAMMING; QUALITY OF SERVICE; TURNAROUND TIME;

CONCURRENT EXECUTION; CONCURRENT PROCESS; HARDWARE EXTENSION; MULTIPLE APPLICATIONS; PREEMPTIVE MULTITASKING; RESOURCE SHARING; SCHEDULING POLICIES; SYSTEM FAIRNESS;

PROGRAM PROCESSORS;

EID: 84905509992 PISSN: 10636897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ISCA.2014.6853208 Document Type: Conference Paper

Times cited : (192)

References (39)

1
- 84860351763
- The case for GPGPU spatial multitasking, High Performance Computer Architecture (HPCA) 2012
- J. T. Adriaens, K. Compton, N. S. Kim, and M. J. Schulte, "The case for GPGPU spatial multitasking," in High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on. IEEE, 2012, pp. 1-12.
- (2012) IEEE 18th International Symposium On. IEEE , pp. 1-12
- Adriaens, J.T.¹ Compton, K.² Kim, N.S.³ Schulte, M.J.⁴

2
- 70450183916
- Understanding the efficiency of ray traversal on GPUs
- T. Aila and S. Laine, "Understanding the efficiency of ray traversal on GPUs," in Proceedings of the Conference on High Performance Graphics 2009. ACM, 2009, pp. 145-149.
- (2009) Proceedings of the Conference on High Performance Graphics 2009 ACM , pp. 145-149
- Aila, T.¹ Laine, S.²

3
- 84905507634
- AMD, "AMD A-Series Processor-in-a-Box," 2012. [Online]. Available: http://www.amd.com/us/products/desktop/processors/a-series/Pages/a-series-pib.aspx
- (2012) AMD AMD A-Series Processor-in-a-Box

4
- 84889605275
- AMD, "AMD Graphics Cores Next (GCN) architecture white paper," 2012.
- (2012) AMD AMD Graphics Cores Next (GCN) Architecture White Paper

5
- 84905507635
- ARM, "ARM Mali," 2012. [Online]. Available: www.arm.com/products/multimedia/mali-graphics-plus-gpu-compute
- (2012) ARM ARM Mali

6
- 84866460333
- Supporting preemptive task executions and memory copies in GPGPUs
- C. Basaran and K.-D. Kang, "Supporting preemptive task executions and memory copies in GPGPUs," in Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on. IEEE, 2012, pp. 287-296.
- (2012) Real-Time Systems (ECRTS) 2012 24th Euromicro Conference on IEEE , pp. 287-296
- Basaran, C.¹ Kang, K.-D.²

7
- 43649096256
- Graphic engine resource management
- M. Bautin, A. Dwarakinath, and T. Chiueh, "Graphic engine resource management," in SPIE 2008, vol. 6818, 2008, p. 68180O.
- (2008) SPIE , vol.6818 , p. 68180O
- Bautin, M.¹ Dwarakinath, A.² Chiueh, T.³

8
- 84859702950
- AMD Fusion APU: Llano
- A. Branover, D. Foley, and M. Steinman, "AMD Fusion APU: Llano," Micro, IEEE, vol. 32, no. 2, pp. 28-37, 2012.
- (2012) Micro IEEE , vol.32 , Issue.2 , pp. 28-37
- Branover, A.¹ Foley, D.² Steinman, M.³

9
- 79951697459
- Task superscalar: An out-of-order task pipeline
- Y. Etsion, F. Cabarcas, A. Rico, A. Ramirez, R. M. Badia, E. Ayguade, J. Labarta, and M. Valero, "Task superscalar: An out-of-order task pipeline," in Microarchitecture (MICRO), 2010 43rd Annual IEEE/ACM International Symposium on. IEEE, 2010, pp. 89-100.
- (2010) Microarchitecture (MICRO) 2010 43rd Annual IEEE/ACM International Symposium On. IEEE , pp. 89-100
- Etsion, Y.¹ Cabarcas, F.² Rico, A.³ Ramirez, A.⁴ Badia, R.M.⁵ Ayguade, E.⁶ Labarta, J.⁷ Valero, M.⁸

10
- 47249094055
- System-level performance metrics for multiprogram workloads
- S. Eyerman and L. Eeckhout, "System-level performance metrics for multiprogram workloads," Micro, IEEE, vol. 28, no. 3, pp. 42-53, 2008.
- (2008) Micro IEEE , vol.28 , Issue.3 , pp. 42-53
- Eyerman, S.¹ Eeckhout, L.²

11
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- W. W. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," in Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007, pp. 407-420.
- (2007) Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 407-420
- Fung, W.W.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

12
- 84894883016
- Fine-grained resource sharing for concurrent GPGPU kernels
- C. Gregg, J. Dorn, K. Hazelwood, and K. Skadron, "Fine-grained resource sharing for concurrent GPGPU kernels," in Proceedings of the 4th USENIX conference on Hot Topics in Parallelism. USENIX Association, 2012, pp. 10-10.
- (2012) Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism. USENIX Association , pp. 10-10
- Gregg, C.¹ Dorn, J.² Hazelwood, K.³ Skadron, K.⁴

13
- 79960526623
- Enabling task parallelism in the CUDA scheduler
- M. Guevara, C. Gregg, K. Hazelwood, and K. Skadron, "Enabling task parallelism in the CUDA scheduler," in Workshop on Programming Models for Emerging Architectures, 2009, pp. 69-76.
- (2009) Workshop on Programming Models for Emerging Architectures , pp. 69-76
- Guevara, M.¹ Gregg, C.² Hazelwood, K.³ Skadron, K.⁴

14
- 84870690379
- A study of persistent threads style GPU programming for GPGPU workloads
- K. Gupta, J. A. Stuart, and J. D. Owens, "A study of persistent threads style GPU programming for GPGPU workloads," in Innovative Parallel Computing (InPar), 2012. IEEE, 2012, pp. 1-14.
- (2012) Innovative Parallel Computing (InPar) 2012 IEEE , pp. 1-14
- Gupta, K.¹ Stuart, J.A.² Owens, J.D.³

15
- 84905507636
- Intel, "4th generation Intel Core processors are here," 2012. [Online]. Available: http://www.intel.com/content/www/us/en/processors/core/4th-gen-core-processor-family.html
- (2012) Intel 4th Generation Intel Core Processors Are Here

16
- 84863015834
- RGEM: A responsive GPGPU execution model for runtime engines
- S. Kato, K. Lakshmanan, A. Kumar, M. Kelkar, Y. Ishikawa, and R. Rajkumar, "RGEM: A responsive GPGPU execution model for runtime engines," in Real-Time Systems Symposium (RTSS), 2011 IEEE 32nd. IEEE, 2011, pp. 57-66.
- (2011) Real-Time Systems Symposium (RTSS) 2011 IEEE 32nd. IEEE , pp. 57-66
- Kato, S.¹ Lakshmanan, K.² Kumar, A.³ Kelkar, M.⁴ Ishikawa, Y.⁵ Rajkumar, R.⁶

17
- 85077032008
- Time-Graph: GPU scheduling for real-time multi-tasking environments
- S. Kato, K. Lakshmanan, R. R. Rajkumar, and Y. Ishikawa, "Time-Graph: GPU scheduling for real-time multi-tasking environments," in 2011 USENIX Annual Technical Conference (USENIX ATC11), 2011, p. 17.
- (2011) 2011 USENIX Annual Technical Conference (USENIX ATC11) , pp. 17
- Kato, S.¹ Lakshmanan, K.² Rajkumar, R.R.³ Ishikawa, Y.⁴

18
- 84878156908
- Gdev: First-class GPU resource management in the operating system
- S. Kato, M. McThrow, C. Maltzahn, and S. Brandt, "Gdev: First-class GPU resource management in the operating system," in USENIX ATC, vol. 12, 2012, pp. 37-37.
- (2012) USENIX ATC , vol.12 , pp. 37-37
- Kato, S.¹ McThrow, M.² Maltzahn, C.³ Brandt, S.⁴

19
- 84888133920
- Heterogeneous System Architecture: A technical review
- G. Kyriazis, "Heterogeneous System Architecture: A technical review," AMD, 2012.
- (2012) AMD
- Kyriazis, G.¹

20
- 80155183121
- GPU resource sharing and virtualization on high performance computing systems
- T. Li, V. K. Narayana, E. El-Araby, and T. El-Ghazawi, "GPU resource sharing and virtualization on high performance computing systems," in Parallel Processing (ICPP), 2011 International Conference on. IEEE, 2011, pp. 733-742.
- (2011) Parallel Processing (ICPP), 2011 International Conference on IEEE , pp. 733-742
- Li, T.¹ Narayana, V.K.² El-Araby, E.³ El-Ghazawi, T.⁴

21
- 44849137198
- NVIDIA Tesla: A unified graphics and computing architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A unified graphics and computing architecture," Micro, IEEE, vol. 28, no. 2, pp. 39-55, 2008.
- (2008) Micro IEEE , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

22
- 84864857149
- iGPU: Exception support and speculative execution on GPUs
- J. Menon, M. De Kruijf, and K. Sankaralingam, "iGPU: Exception support and speculative execution on GPUs," in Proceedings of the 39th Annual International Symposium on Computer Architecture. IEEE, 2012, pp. 72-83.
- (2012) Proceedings of the 39th Annual International Symposium on Computer Architecture IEEE , pp. 72-83
- Menon, J.¹ De Kruijf, M.² Sankaralingam, K.³

23
- 84872539869
- NVIDIA
- NVIDIA, "Next generation CUDA computer architecture Kepler GK110," 2012.
- (2012) Next Generation CUDA Computer Architecture Kepler GK110

24
- 84905507628
- NVIDIA
- NVIDIA, "Sharing a GPU between MPI processes: multi-process service (MPS) overview," 2013.
- (2013) Sharing A GPU between MPI Processes: Multi-process Service (MPS) Overview

25
- 85130336256
- NVIDIA
- NVIDIA, "Programming guide-CUDA toolkit documentation," 2014. [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- (2014) Programming Guide-CUDA Toolkit Documentation

26
- 49049088756
- GPU computing
- J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, 2008.
- (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 879-899
- Owens, J.D.¹ Houston, M.² Luebke, D.³ Green, S.⁴ Stone, J.E.⁵ Phillips, J.C.⁶

27
- 84875669496
- Improving GPGPU concurrency with elastic kernels
- S. Pai, M. J. Thazhuthaveetil, and R. Govindarajan, "Improving GPGPU concurrency with elastic kernels," in Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems. ACM, 2013, pp. 407-418.
- (2013) Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems ACM , pp. 407-418
- Pai, S.¹ Thazhuthaveetil, M.J.² Govindarajan, R.³

28
- 84897759661
- Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces
- B. Pichai, L. Hsu, and A. Bhattacharjee, "Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces," in Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2014, pp. 743-758.
- (2014) Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems ACM , pp. 743-758
- Pichai, B.¹ Hsu, L.² Bhattacharjee, A.³

29
- 79960506159
- Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
- V. T. Ravi, M. Becchi, G. Agrawal, and S. Chakradhar, "Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework," in Proceedings of the 20th international symposium on High performance distributed computing. ACM, 2011, pp. 217-228.
- (2011) Proceedings of the 20th International Symposium on High Performance Distributed Computing ACM , pp. 217-228
- Ravi, V.T.¹ Becchi, M.² Agrawal, G.³ Chakradhar, S.⁴

30
- 82655162782
- PTask: Operating system abstractions to manage GPUs as compute devices
- C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel, "PTask: operating system abstractions to manage GPUs as compute devices," in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011, pp. 233-248.
- (2011) Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM , pp. 233-248
- Rossbach, C.J.¹ Currey, J.² Silberstein, M.³ Ray, B.⁴ Witchel, E.⁵

31
- 84905507630
- Samsung, "Samsung Exynos," 2012. [Online]. Available: www.samsung.com/exynos
- (2012) Samsung Samsung Exynos

32
- 0022205683
- Implementation of precise interrupts in pipelined processors
- J. E. Smith and A. R. Pleszkun, "Implementation of precise interrupts in pipelined processors," in Proceedings of the 12th annual International Symposium on Computer Architecture, ser. ISCA 85, 1985, pp. 36-44.
- (1985) Proceedings of the 12th Annual International Symposium on Computer Architecture, Ser. ISCA 85 , pp. 36-44
- Smith, J.E.¹ Pleszkun, A.R.²

33
- 84870188759
- Softshell: Dynamic scheduling on GPUs
- M. Steinberger, B. Kainz, B. Kerbl, S. Hauswiesner, M. Kenzel, and D. Schmalstieg, "Softshell: dynamic scheduling on GPUs," ACM Transactions on Graphics (TOG), vol. 31, no. 6, p. 161, 2012.
- (2012) ACM Transactions on Graphics (TOG) , vol.31 , Issue.6 , pp. 161
- Steinberger, M.¹ Kainz, B.² Kerbl, B.³ Hauswiesner, S.⁴ Kenzel, M.⁵ Schmalstieg, D.⁶

34
- 84873470137
- The parboil benchmarks
- University of Illinois at Urbana-Champaign, Tech. Rep
- J. Stratton, C. Rodrigues, I. Sung, N. Obeid, L. Chang, G. Liu, and W. Hwu, "The Parboil benchmarks," Technical Report IMPACT-12-01, University of Illinois at Urbana-Champaign, Tech. Rep., 2012.
- (2012) Technical Report IMPACT-12-01
- Stratton, J.¹ Rodrigues, C.² Sung, I.³ Obeid, N.⁴ Chang, L.⁵ Liu, G.⁶ Hwu, W.⁷

35
- 58449109179
- MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs
- J. Stratton, S. Stone, and W.-m. Hwu, "MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs," LCPC 2008, pp. 16-30, 2008.
- (2008) LCPC , vol.2008 , pp. 16-30
- Stratton, J.¹ Stone, S.² Hwu, W.-M.³

36
- 34547715870
- Initial observations of the simultaneous multithreading Pentium 4 processor
- N. Tuck and D. M. Tullsen, "Initial observations of the simultaneous multithreading Pentium 4 processor," in Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT 2003. IEEE, 2003, pp. 26-34.
- (2003) Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques, Ser. PACT 2003 IEEE , pp. 26-34
- Tuck, N.¹ Tullsen, D.M.²

37
- 47249121916
- FAME: Fairly measuring multithreaded architectures
- J. Vera, F. J. Cazorla, A. Pajuelo, O. J. Santana, E. Fernandez, and M. Valero, "FAME: Fairly measuring multithreaded architectures," in Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on. IEEE, 2007, pp. 305-316.
- (2007) Parallel Architecture and Compilation Techniques 2007. PACT 2007. 16th International Conference on IEEE , pp. 305-316
- Vera, J.¹ Cazorla, F.J.² Pajuelo, A.³ Santana, O.J.⁴ Fernandez, E.⁵ Valero, M.⁶

38
- 79955435088
- Fermi GF100 GPU architecture
- C. M. Wittenbrink, E. Kilgariff, and A. Prabhu, "Fermi GF100 GPU architecture," Micro, IEEE, vol. 31, no. 2, pp. 50-59, 2011.
- (2011) Micro IEEE , vol.31 , Issue.2 , pp. 50-59
- Wittenbrink, C.M.¹ Kilgariff, E.² Prabhu, A.³

39
- 84891139433
- arXiv preprint arXiv 1303.5164
- J. Zhong and B. He, "Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling," arXiv preprint arXiv:1303.5164, 2013.
- (2013) Kernelet: High-throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
- Zhong, J.¹ He, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.