SCOPUS Information Search Platform

Transactions on Architecture and Code Optimization

Volume 8, Issue 4, 2012, Pages

Exploring the limits of GPGPU scheduling in control flow bound applications

Authors (4): Malits, Roman (a); Bolotin, Evgeny (b); Kolodny, Avinoam (a); Mendelson, Avi (c)

a TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

b INTEL CORPORATION (Israel)

c MICROSOFT RESEARCH (United States)

Author keywords

GPGPU; Parallel machines; Scheduling algorithm

Indexed keywords

CONTROL FLOWS; DATA PARALLEL; GENERAL PURPOSE; GLOBAL SCHEDULING; GPGPU; HARDWARE UTILIZATION; HIGH RATE; IN-CONTROL; INHERENT LIMITATIONS; MACHINE UTILIZATION; MATRIX MULTIPLICATION; MEMORY ACCESS PATTERNS; PARALLEL GRAPH ALGORITHMS; PARALLEL MACHINE; PERFORMANCE GAIN; PERFORMANCE IMPROVEMENTS; ROOT CAUSE; RUNNING CONTROL; SCHEDULING MECHANISM; SCHEDULING METHODS; SHARED MEMORIES; THREAD SCHEDULING;

BENCHMARKING; OPTIMIZATION; PROGRAM PROCESSORS; SCHEDULING; SCHEDULING ALGORITHMS;

MULTITASKING;

EID: 84857873786 PISSN: 15443566 EISSN: 15443973 Source Type: Journal
DOI: 10.1145/2086696.2086708 Document Type: Article

Times cited: 10

References (27)

1
- 84857846339
- OpenCL, parallel computing on GPU and CPU
- AAFTAB M. 2008. OpenCL, parallel computing on GPU and CPU. In Proceedings of SigGraph.
- (2008) Proceedings of SigGraph
- Aaftab, M.¹

2
- 34548040112
- Thesis, Stanford University
- AHN, J. H. 2007. Memory and control organizations of stream processors. Thesis, Stanford University.
- (2007) Memory and Control Organizations of Stream Processors
- Ahn, J.H.¹

3
- 77952660587
- Visualizing complex dynamics in many-core accelerator architectures
- ARIEL, A., FUNG, W. W. L., TURNER, A., AND AAMODT, T. M. Visualizing complex dynamics in many-core accelerator architectures. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 164-174.
- Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) , pp. 164-174
- Ariel, A.¹ Fung, W.W.L.² Turner, A.³ Aamodt, T.M.⁴

4
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- BAKHODA, A., YUAN, G. L., FUNG, W. W. L., WONG, H., AND AAMODT, T. M. 2009. Analyzing CUDA workloads using a detailed GPU simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 163-174.
- (2009) Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

5
- 0000269759
- Scheduling multithreaded computations by work stealing
- BLUMOFE, R. D. AND LEISERSON, C. E. 1999. Scheduling multithreaded computations by work stealing. J. ACM. 46, 5.
- (1999) J. ACM , vol.46 , Issue.5
- Blumofe, R.D.¹ Leiserson, C.E.²

6
- 51449118065
- A performance study of general purpose applications on graphics processors using CUDA
- CHE, S., BOYER, M., MENG, J., TARJAN, D., SHEAFER, J. W., AND SKADRON, K. 2008. A performance study of general purpose applications on graphics processors using CUDA. J. Parall. Distrib. Comput. 68, 10.
- (2008) J. Parall. Distrib. Comput. , vol.68 , Issue.10
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheafer, J.W.⁵ Skadron, K.⁶

7
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- CHE, S., BOYER, M., MENG, J., TARJAN, D., SHEAFER, J., LEE, S.-H., AND SKADRON, K. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the IEEE International Symposium on Workload Characterization. 44-54.
- (2009) Proceedings of the IEEE International Symposium on Workload Characterization , pp. 44-54
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheafer, J.⁵ Lee, S.-H.⁶ Skadron, K.⁷

8
- 77950886971
- CONAN, B. AND GUY, K. 2008. A neural network on GPU. http://www.codeproject.com/KB/graphics/GPUNN.aspx.
- (2008) A Neural Network on GPU
- Conan, B.¹ Guy, K.²

9
- 80054828244
- Computer Science Department, Carleton University, Ottawa, Canada
- DEHNE, F. AND YOGARATNAM, K. 2010. Exploring the limits of GPU's with parallel graph algorithms. Computer Science Department, Carleton University, Ottawa, Canada.
- (2010) Exploring the Limits of GPU's with Parallel Graph Algorithms
- Dehne, F.¹ Yogaratnam, K.²

10
- 47349104432
- Dynamic warp formation and scheduling for efficient GPU control flow
- FUNG, W. W. L., SHAM, I., YUAN, G., AND AAMODT, T. M. 2007. Dynamic warp formation and scheduling for efficient GPU control flow. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.
- (2007) Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

11
- 79955923056
- Thread block compaction for efficient SIMT control flow
- FUNG, W. W. L. AND AAMODT, T. M. Thread block compaction for efficient SIMT control flow. In Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17). 25-36.
- Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17) , pp. 25-36
- Fung, W.W.L.¹ Aamodt, T.M.²

12
- 38349041620
- Accelerating large graph algorithms on the GPU using CUDA
- HARISH, P. AND NARAYANAN, P. J. 2007. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of HiPC. 197-208.
- (2007) Proceedings of HiPC , pp. 197-208
- Harish, P.¹ Narayanan, P.J.²

13
- 33845425948
- ClawHMMER: A streaming HMMer-search implementation
- HORN, R., HOUSTON, M., AND HANRAHAN, P. 2005. ClawHMMER: A streaming HMMer-search implementation. In Proceedings of the ACM/IEEE Supercomputing Conference.
- (2005) Proceedings of the ACM/IEEE Supercomputing Conference
- Horn, R.¹ Houston, M.² Hanrahan, P.³

14
- 79952811127
- Accelerating CUDA graph algorithms at maximum warp
- HONG, S., KIM, S. K., OGUNTEBI, T., AND OLUKOTUN, K. 2011. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of PPoPP.
- (2011) Proceedings of PPoPP
- Hong, S.¹ Kim, S.K.² Oguntebi, T.³ Olukotun, K.⁴

15
- 84856541553
- Efficient parallel graph exploration for multi-core CPU and GPU
- HONG, S., OGUNTEBI, T., AND OLUKOTUN, K. 2011. Efficient parallel graph exploration for multi-core CPU and GPU. In Proceedings of PACT.
- (2011) Proceedings of PACT
- Hong, S.¹ Oguntebi, T.² Olukotun, K.³

16
- 0034459255
- Efficient conditional operations for data-parallel architectures
- KAPASI, U. J., DALLY, W. J., RIXNER, S., MATTSON, P. R., OWENS, J. D., AND KHAILANY, B. 2000. Efficient conditional operations for data-parallel architectures. In Proceedings of MICRO.
- (2000) Proceedings of MICRO
- Kapasi, U.J.¹ Dally, W.J.² Rixner, S.³ Mattson, P.R.⁴ Owens, J.D.⁵ Khailany, B.⁶

17
- 80052335998
- British Columbia Canada
- KARIMI NEIL, K., DICKSON, G., AND HAMZE, F. 2010. A performance comparison of CUDA and OpenCL. British Columbia Canada.
- (2010) A Performance Comparison of CUDA and OpenCL
- Karimi Neil, K.¹ Dickson, G.² Hamze, F.³

18
- 70649104826
- A characterization and analysis of PTX kernels
- KERR, A., DIAMOS, G., AND YALAMANCHILI, S. 2009. A characterization and analysis of PTX kernels. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC).
- (2009) Proceedings of the IEEE International Symposium on Workload Characterization (IISWC)
- Kerr, A.¹ Diamos, G.² Yalamanchili, S.³

19
- 0042650298
- Software pipelining: An effective scheduling technique for VLIW machines
- LAM, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.
- (1988) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation
- Lam, M.¹

20
- 84857846338
- MAXIME, B. 2010. Ray tracing in CUDA. http://ercbench.ece.wisc.edu/index.php?option=com-content&view=article&id=59:ray-tracing&catid=18:gpgpu&Itemid=20.
- (2010) Ray Tracing in CUDA
- Maxime, B.¹

21
- 77954994930
- Dynamic warp subdivision for integrated branch and memory divergence tolerance
- University of Virginia
- MENG, J., TARJAN, D., AND SKADRON, K. 2010. Dynamic warp subdivision for integrated branch and memory divergence tolerance. Tech. rep. CS-2010-5, University of Virginia.
- (2010) Tech. Rep. CS-2010-5
- Meng, J.¹ Tarjan, D.² Skadron, K.³

22
- 57649232417
- GPU acceleration of numerical weather prediction
- MICHALAKES, J. AND VACHHARAJANI, M. GPU acceleration of numerical weather prediction. IEEE International Symposium on Parallel and Distributed Processing. 1-7.
- IEEE International Symposium on Parallel and Distributed Processing , pp. 1-7
- Michalakes, J.¹ Vachharajani, M.²

23
- 84857874275
- CUDA C programming best practices guide
- NVIDIA
- NVIDIA. 2009. CUDA C programming best practices guide. CUDA Toolkit 2.3.
- (2009) CUDA Toolkit 2.3

24
- 77951900491
- NVIDIA CORPORATION
- NVIDIA CORPORATION. 2009. NVIDIA's next generation CUDA compute architecture.
- (2009) NVIDIA's Next Generation CUDA Compute Architecture

25
- 0033691565
- Memory access scheduling
- RIXNER, S., DALLY, W. J., KAPASI, U. J., MATTSON, P., AND OWENS, J. D. 2000. Memory access scheduling. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00).
- (2000) Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00)
- Rixner, S.¹ Dally, W.J.² Kapasi, U.J.³ Mattson, P.⁴ Owens, J.D.⁵

26
- 38849131252
- High-throughput sequence alignment using graphics processing units
- SCHATZ, M., TRAPNELL, C., DELCHER, A., AND VARSHNEY, A. 2007. High-throughput sequence alignment using graphics processing units. BMC Bioinformatics 8, 1, 474.
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

27
- 83755201638
- Department of Electrical and Computer Engineering, University of Toronto
- WONG, H., PAPADOPOULOU, M., SADOOGHI-ALVANDI, M., AND MOSHOVOS, A. 2010. Demystifying GPU microarchitecture through microbenchmarking. Department of Electrical and Computer Engineering, University of Toronto.
- (2010) Demystifying GPU Microarchitecture Through Microbenchmarking
- Wong, H.¹ Papadopoulou, M.² Sadooghi-Alvandi, M.³ Moshovos, A.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.