SCOPUS Information Retrieval Platform

Journal of Computer Science and Technology

Volume 27, Issue 1, 2012, Pages 57-74

A hybrid circular queue method for iterative stencil computations on GPUs

(4) Yang, Yang a,b; Cui, Hui Min a,b; Feng, Xiao Bing a; Xue, Jing Ling c

a INSTITUTE OF COMPUTING TECHNOLOGY (China)

b UNIVERSITY OF CHINESE ACADEMY OF SCIENCES (China)

c UNIVERSITY OF NEW SOUTH WALES (Australia)

Author keywords

Circular queue; GPU; Occupancy; Register; Stencil computation

Indexed keywords

CIRCULAR QUEUE; GPU; OCCUPANCY; REGISTER; STENCIL COMPUTATIONS; PROGRAM PROCESSORS; QUEUEING THEORY

EID: 84861635761 PISSN: 10009000 EISSN: None Source Type: Journal
DOI: 10.1007/s11390-012-1206-3 Document Type: Article

Times cited : (9)

References (37)

1
- 1542392248
- Achieving scalable locality with time skewing
- Wonnacott D. Achieving scalable locality with time skewing. Int. J. Parallel Program, 2002, 30(3): 181-221.
- (2002) Int. J. Parallel Program , vol.30 , Issue.3 , pp. 181-221
- Wonnacott, D.¹

2
- 34547503691
- Time skewing: A value-based approach to optimizing for memory locality
- Department of Computer Science, Rutgers University
- McCalpin J, Wonnacott D. Time skewing: A value-based approach to optimizing for memory locality. Technical Report DCS-TR-379, Department of Computer Science, Rutgers University, 1999.
- (1999) Technical Report DCS-TR-379
- McCalpin, J.¹ Wonnacott, D.²

3
- 77954709215
- Cache oblivious parallelograms in iterative stencil computations
- Tsukuba, Japan, Jun. 1-4
- Strzodka R, Shaheen M, Pajak D et al. Cache oblivious parallelograms in iterative stencil computations. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.49-59.
- (2010) Proc. The 24th ACM Int. Conf. Supercomputing , pp. 49-59
- Strzodka, R.¹ Shaheen, M.² Pajak, D.³

4
- 0032635362
- New tiling techniques to improve cache temporal locality
- Atlanta, USA, May 1-4
- Song Y, Li Z. New tiling techniques to improve cache temporal locality. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, USA, May 1-4, 1999, pp.215-228.
- (1999) Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 215-228
- Song, Y.¹ Li, Z.²

5
- 84858693885
- Increasing temporal locality with skewing and recursive blocking
- Denver, USA, Nov. 10-16
- Jin G, Mellor-Crummey J, Fowler R. Increasing temporal locality with skewing and recursive blocking. In Proc. ACM/IEEE Conference on Supercomputing, Denver, USA, Nov. 10-16, 2001, pp.43-43.
- (2001) Proc. ACM/IEEE Conference on Supercomputing , pp. 43-43
- Jin, G.¹ Mellor-Crummey, J.² Fowler, R.³

6
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- Austin, USA, Nov. 15-21, Article 4.
- Datta K, Murphy M, Volkov V et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proc. ACM/IEEE Conference on Supercomputing, Austin, USA, Nov. 15-21, 2008, Article 4.
- (2008) Proc. ACM/IEEE Conference on Supercomputing
- Datta, K.¹ Murphy, M.² Volkov, V.³

7
- 34250216007
- Scientific computing kernels on the cell processor
- DOI 10.1007/s10766-007-0034-5
- Williams S, Shalf J, Oliker L et al. Scientific computing kernels on the cell processor. Int. J. Parallel Program, 2007, 35(3): 263-298. (Pubitemid 46904454)
- (2007) International Journal of Parallel Programming , vol.35 , Issue.3 , pp. 263-298
- Williams, S.¹ Shalf, J.² Oliker, L.³ Kamil, S.⁴ Husbands, P.⁵ Yelick, K.⁶

8
- 70449723385
- Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
- Yorktown Heights, USA, Jun. 8-12
- Meng J, Skadron K. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs. In Proc. The 23rd International Conference on Supercomputing, Yorktown Heights, USA, Jun. 8-12, 2009, pp.256-265.
- (2009) Proc. The 23rd International Conference on Supercomputing , pp. 256-265
- Meng, J.¹ Skadron, K.²

9
- 79953817719
- NVIDIA. NVIDIA CUDA programming guide 3.0, http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf, 2010.
- (2010) NVIDIA CUDA Programming Guide 3.0

10
- 84875678919
- NVIDIA Corp
- NVIDIA Corp. CUDA Occupancy Calculator, 2010.
- (2010) CUDA Occupancy Calculator

11
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- Austin, USA, Jun. 20-24
- Hong S, Kim H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In Proc. The 36th Annual Int. Symp. Computer Architecture, Austin, USA, Jun. 20-24, 2009, pp.152-163.
- (2009) Proc. The 36th Annual Int. Symp. Computer Architecture , pp. 152-163
- Hong, S.¹ Kim, H.²

12
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- Bangalore, India, Jan. 9-14
- Baghsorkhi S S, Delahaye M, Patel S J, Gropp W D, Hwu W W. An adaptive performance modeling tool for GPU architectures. In Proc. The 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, Jan. 9-14, 2010, pp.105-114.
- (2010) Proc. The 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.W.⁵

13
- 84861583048
- Sept.
- van der Laan W J. Decuda. http://wiki.github.com/laanwj/decuda/, Sept., 2010.
- (2010) Decuda
- Van Der Laan, W.J.¹

14
- 70350786536
- A memory optimization technique for software-managed scratchpad memory in GPUs
- San Francisco, USA, Jul. 27-28
- Moazeni M, Bui A, Sarrafzadeh M. A memory optimization technique for software-managed scratchpad memory in GPUs. In Proc. The 7th IEEE Symposium on Application Specific Processors, San Francisco, USA, Jul. 27-28, 2009, pp.43-49.
- (2009) Proc. The 7th IEEE Symposium on Application Specific Processors , pp. 43-49
- Moazeni, M.¹ Bui, A.² Sarrafzadeh, M.³

15
- 77951159230
- A control-structure splitting optimization for GPGPU
- Ischia, Italy, May 18-20
- Carrillo S, Siegel J, Li X. A control-structure splitting optimization for GPGPU. In Proc. The 6th ACM Conf. Computing Frontiers, Ischia, Italy, May 18-20, 2009, pp.147-150.
- (2009) Proc. The 6th ACM Conf. Computing Frontiers , pp. 147-150
- Carrillo, S.¹ Siegel, J.² Li, X.³

16
- 78649816542
- Hybrid core acceleration of UWB SIRE radar signal processing
- Park S J, Ross J, Shires D, Richie D, Henz B, Nguyen L. Hybrid core acceleration of UWB SIRE radar signal processing. IEEE Trans. Parallel Distrib. Syst, 2011, 22(1): 46-57.
- (2011) IEEE Trans. Parallel Distrib. Syst , vol.22 , Issue.1 , pp. 46-57
- Park, S.J.¹ Ross, J.² Shires, D.³ Richie, D.⁴ Henz, B.⁵ Nguyen, L.⁶

17
- 77954713684
- An empirically tuned 2D and 3D FFT library on CUDA GPU
- Tsukuba, Japan, Jun. 1-4
- Gu L, Li X, Siegel J. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proc. The 24th ACM Int. Conf. Supercomputing, Tsukuba, Japan, Jun. 1-4, 2010, pp.305-314.
- (2010) Proc. The 24th ACM Int. Conf. Supercomputing , pp. 305-314
- Gu, L.¹ Li, X.² Siegel, J.³

18
- 77949629485
- Optimal data distribution for versatile finite impulse response filtering on next-generation graphics hardware using CUDA
- Shenzhen, China, Dec. 9-11
- Goorts P, Rogmans S, Bekaert P. Optimal data distribution for versatile finite impulse response filtering on next-generation graphics hardware using CUDA. In Proc. The 15th International Conference on Parallel and Distributed Systems, Shenzhen, China, Dec. 9-11, 2009, pp.300-307.
- (2009) Proc. The 15th International Conference on Parallel and Distributed Systems , pp. 300-307
- Goorts, P.¹ Rogmans, S.² Bekaert, P.³

19
- 77951435761
- Accelerating lattice Boltzmann fluid flow simulations using graphics processors
- Vienna, Austria, Sep. 22-25
- Bailey P, Myre J, Walsh S D C, Lilja D J, Saar M O. Accelerating lattice Boltzmann fluid flow simulations using graphics processors. In Proc. International Conference on Parallel Processing, Vienna, Austria, Sep. 22-25, 2009, pp.550-557.
- (2009) Proc. International Conference on Parallel Processing , pp. 550-557
- Bailey, P.¹ Myre, J.² Walsh, S.D.C.³ Lilja, D.J.⁴ Saar, M.O.⁵

20
- 70449723384
- Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
- Yorktown Heights, USA, Jun. 8-12
- Venkatasubramanian S, Vuduc R W. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In Proc. The 23rd International Conference on Supercomputing, Yorktown Heights, USA, Jun. 8-12, 2009, pp.244-255.
- (2009) Proc. The 23rd International Conference on Supercomputing , pp. 244-255
- Venkatasubramanian, S.¹ Vuduc, R.W.²

21
- 70450077422
- Parallel data-locality aware stencil computations on modern micro-architectures
- Rome, Italy, May 23-29
- Christen M, Schenk O, Neufeld E et al. Parallel data-locality aware stencil computations on modern micro-architectures. In Proc. IEEE Int. Symp. Parallel & Distributed Processing, Rome, Italy, May 23-29, 2009, pp.1-10.
- (2009) Proc. IEEE Int. Symp. Parallel & Distributed Processing , pp. 1-10
- Christen, M.¹ Schenk, O.² Neufeld, E.³

22
- 67650671606
- 3D finite difference computation on GPUs using CUDA
- Washington, USA, Mar. 8
- Micikevicius P. 3D finite difference computation on GPUs using CUDA. In Proc. The 2nd Workshop on General Purpose Processing on Graphics Processing Units, Washington, USA, Mar. 8, 2009, pp.79-84.
- (2009) Proc. The 2nd Workshop on General Purpose Processing on Graphics Processing Units , pp. 79-84
- Micikevicius, P.¹

23
- 78649538491
- Toward harnessing DOACROSS parallelism for multi-GPGPUs
- San Diego, USA, Sep. 13-16
- Di P, Wan Q, Zhang X et al. Toward harnessing DOACROSS parallelism for multi-GPGPUs. In Proc. The 39th Int. Conf. Parallel Processing, San Diego, USA, Sep. 13-16, 2010, pp.40-50.
- (2010) Proc. The 39th Int. Conf. Parallel Processing , pp. 40-50
- Di, P.¹ Wan, Q.² Zhang, X.³

24
- 80053238973
- Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures
- Anchorage, USA, May 16-20
- Christen M, Schenk O, Burkhart H. Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In Proc. IEEE International Parallel & Distributed Processing Symposium, Anchorage, USA, May 16-20, 2011, pp.676-687.
- (2011) Proc. IEEE International Parallel & Distributed Processing Symposium , pp. 676-687
- Christen, M.¹ Schenk, O.² Burkhart, H.³

25
- 78650806116
- 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
- New Orleans, USA, Nov. 13-19
- Nguyen A, Satish N, Chhugani J, Kim C, Dubey P. 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, New Orleans, USA, Nov. 13-19, 2010, pp.1-13.
- (2010) Proc. ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis , pp. 1-13
- Nguyen, A.¹ Satish, N.² Chhugani, J.³ Kim, C.⁴ Dubey, P.⁵

26
- 34547401051
- Profitable loop fusion and tiling using model-driven empirical search
- DOI 10.1145/1183401.1183437, Proceedings of the 20th Annual International Conference on Supercomputing, ICS 2006
- Qasem A, Kennedy K. Profitable loop fusion and tiling using model-driven empirical search. In Proc. The 20th Annual International Conference on Supercomputing, Cairns, Australia, Jun. 28-Jul. 1, 2006, pp.249-258. (Pubitemid 47168511)
- (2006) Proceedings of the International Conference on Supercomputing , pp. 249-258
- Qasem, A.¹ Kennedy, K.²

27
- 0442295621
- The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles
- Knijnenburg P M W, Kisuki T, Gallivan K et al. The effect of cache models on iterative compilation for combined tiling and unrolling: Research articles. Concurrency and Computation: Practice & Experience, 2004, 16(2-3): 247-270.
- (2004) Concurrency and Computation: Practice & Experience , vol.16 , Issue.2-3 , pp. 247-270
- Knijnenburg, P.M.W.¹ Kisuki, T.² Gallivan, K.³

28
- 10744232785
- A comparison of empirical and model-driven optimization
- San Diego, USA, Jun. 8-11
- Yotov K, Li X, Ren G et al. A comparison of empirical and model-driven optimization. In Proc. The ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, San Diego, USA, Jun. 8-11, 2003, pp.63-76.
- (2003) Proc. The ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation , pp. 63-76
- Yotov, K.¹ Li, X.² Ren, G.³

29
- 33646828918
- Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
- 1402081, Proceedings of the 2005 International Symposium on Code Generation and Optimization, CGO 2005
- Chen C, Chame J, Hall M. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In Proc. Int. Symp. Code Generation and Optimization, San Jose, USA, Mar. 20-23, 2005, pp.111-122. (Pubitemid 43773797)
- (2005) Proceedings of the 2005 International Symposium on Code Generation and Optimization, CGO 2005 , vol.2005 , pp. 111-122
- Chen, C.¹ Chame, J.² Hall, M.³

30
- 43949083742
- Analytic models and empirical search: A hybrid approach to code optimization
- DOI 10.1007/978-3-540-69330-7-18, Languages and Compilers for Parallel Computing - 18th International Workshop, LCPC 2005, Revised Selected Papers
- Epshteyn A, Garzaran M, DeJong G et al. Analytical models and empirical search: A hybrid approach to code optimization. In Proc. The 18th International Workshop on Languages and Compilers for Parallel Computing, Hawthorne, USA, Oct. 20-22, 2005, pp.259-273. (Pubitemid 351702215)
- (2006) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.4339 , pp. 259-273
- Epshteyn, A.¹ Garzaran, M.J.² DeJong, G.³ Padua, D.⁴ Ren, G.⁵ Li, X.⁶ Yotov, K.⁷ Pingali, K.⁸

31
- 84886006847
- Using machine learning to focus iterative optimization
- New York, USA, Mar. 26-29
- Agakov F, Bonilla E, Cavazos J et al. Using machine learning to focus iterative optimization. In Proc. Int. Symp. Code Generation and Optimization, New York, USA, Mar. 26-29, 2006, pp.295-305.
- (2006) Proc. Int. Symp. Code Generation and Optimization , pp. 295-305
- Agakov, F.¹ Bonilla, E.² Cavazos, J.³

32
- 4544380943
- Finding effective compilation sequences
- Washington, USA, Jun. 11-13
- Almagor L, Cooper K D, Grosul A, Harvey T J, Reeves S W, Subramanian D, Torczon L, Waterman T. Finding effective compilation sequences. In Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, Washington, USA, Jun. 11-13, 2004, pp.231-239.
- (2004) Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems , pp. 231-239
- Almagor, L.¹ Cooper, K.D.² Grosul, A.³ Harvey, T.J.⁴ Reeves, S.W.⁵ Subramanian, D.⁶ Torczon, L.⁷ Waterman, T.⁸

33
- 34547681859
- Microarchitecture sensitive empirical models for compiler optimizations
- DOI 10.1109/CGO.2007.25, 4145110, International Symposium on Code Generation and Optimization, CGO 2007
- Vaswani K, Thazhuthaveetil M J, Srikant Y N et al. Microarchitecture sensitive empirical models for compiler optimizations. In Proc. Int. Symp. Code Generation and Optimization, San Jose, USA, Mar. 11-14, 2007, pp.131-143. (Pubitemid 47214304)
- (2007) International Symposium on Code Generation and Optimization, CGO 2007 , pp. 131-143
- Vaswani, K.¹ Thazhuthaveetil, M.J.² Srikant, Y.N.³ Joseph, P.J.⁴

34
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- DOI 10.1145/1356058.1356084, Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization
- Ryoo S, Rodrigues C I, Stone S S et al. Program optimization space pruning for a multithreaded GPU. In Proc. The 6th Annual IEEE/ACM Int. Symp. Code Generation and Optimization, Boston, USA, Apr. 6-9, 2008, pp.195-204. (Pubitemid 351667266)
- (2008) Proceedings of the 2008 CGO - Sixth International Symposium on Code Generation and Optimization , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

35
- 77749340082
- Model-driven autotuning of sparse matrix-vector multiply on GPUs
- Bangalore, India, Jan. 9-14
- Choi J W, Singh A, Vuduc R W. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proc. The 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Bangalore, India, Jan. 9-14, 2010, pp.115-126.
- (2010) Proc. The 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming , pp. 115-126
- Choi, J.W.¹ Singh, A.² Vuduc, R.W.³

36
- 0042650298
- Software pipelining: An effective scheduling technique for VLIW machines
- Atlanta, USA, Jun. 20-24
- Lam M. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, USA, Jun. 20-24, 1988, pp.318-328.
- (1988) Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 318-328
- Lam, M.¹

37
- 0025447908
- Improving register allocation for subscripted variables
- White Plains, USA, Jun. 20-22
- Callahan D, Carr S, Kennedy K. Improving register allocation for subscripted variables. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, White Plains, USA, Jun. 20-22, 1990, pp.53-65.
- (1990) Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 53-65
- Callahan, D.¹ Carr, S.² Kennedy, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.