SCOPUS Information Search Platform

Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC'06

Volume , Issue , 2006, Pages

A memory model for scientific algorithms on graphics processors

Authors (4): Govindaraju, Naga K. (a, b); Larsen, Scott (a); Gray, Jim (b); Manocha, Dinesh (a)

a UNIVERSITY OF NORTH CAROLINA (United States)

b MICROSOFT (United States)

Author keywords

Graphics processors; Memory model; Scientific algorithms

Indexed keywords

2D BLOCK REPRESENTATIONS; GRAPHICS PROCESSORS; MEMORY MODELS; SCIENTIFIC ALGORITHMS;

ALGORITHMS; CACHE MEMORY; COMPUTATION THEORY; COMPUTER ARCHITECTURE; FAST FOURIER TRANSFORMS; MATHEMATICAL MODELS;

DATA STORAGE EQUIPMENT;

EID: 34548292052 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1188455.1188549 Document Type: Conference Paper

Times cited : (128)

References (39)

1
- 0024082546
- The input/output complexity of sorting and related problems
- AGGARWAL, A., AND VITTER, J. S. 1988. The input/output complexity of sorting and related problems. Commun. ACM 31, 1116-1127.
- (1988) Commun. ACM , vol.31 , pp. 1116-1127
- AGGARWAL, A.¹ VITTER, J.S.²

2
- 0003706460
- SIAM, Philadelphia
- ANDERSON, E., BAI, Z., BISCHOF, C., DEMMEL, J., DONGARRA, J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., AND SORENSEN, D. 1992. LAPACK User's Guide, Release 1.0. SIAM, Philadelphia.
- (1992) LAPACK User's Guide, Release 1.0
- ANDERSON, E.¹ BAI, Z.² BISCHOF, C.³ DEMMEL, J.⁴ DONGARRA, J.⁵ DU CROZ, J.⁶ GREENBAUM, A.⁷ HAMMARLING, S.⁸ SORENSEN, D.⁹

3
- 34548217409
- ARGE, L., BRODAL, G., AND FAGERBERG, R. 2004. Cache oblivious data structures. Handbook on Data Structures and Applications.

4
- 0028743437
- Compiler transformations for high-performance computing
- BACON, D. F., GRAHAM, S. L., AND SHARP, O. J. 1994. Compiler transformations for high-performance computing. ACM Comput. Surv. 26, 4, 345-420.
- (1994) ACM Comput. Surv , vol.26 , Issue.4 , pp. 345-420
- BACON, D.F.¹ GRAHAM, S.L.² SHARP, O.J.³

5
- 0038164060
- Unimodular transformations of double loops
- BANERJEE, U. 1990. Unimodular transformations of double loops. Proc. of the Workshop on Advances in Languages and Compilers for Parallel Processing, 192-219.
- (1990) Proc. of the Workshop on Advances in Languages and Compilers for Parallel Processing , pp. 192-219
- BANERJEE, U.¹

6
- 85154002090
- Sorting networks and their applications
- BATCHER, K. 1968. Sorting networks and their applications. In AFIPS Spring Joint Computer Conference.
- (1968) AFIPS Spring Joint Computer Conference
- BATCHER, K.¹

7
- 0242533311
- Sparse matrix solvers on the GPU: Conjugate gradients and multigrid
- BOLZ, J., FARMER, I., GRINSPUN, E., AND SCHRÖDER, P. 2003. Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Trans. Graph. 22, 3, 917-924.
- (2003) ACM Trans. Graph , vol.22 , Issue.3 , pp. 917-924
- BOLZ, J.¹ FARMER, I.² GRINSPUN, E.³ SCHRÖDER, P.⁴

8
- 10644248153
- Brook for GPUs: Stream computing on graphics hardware
- BUCK, I., FOLEY, T., HORN, D., SUGERMAN, J., FATAHALIAN, K., HOUSTON, M., AND HANRAHAN, P. 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777-786.
- (2004) ACM Trans. Graph , vol.23 , Issue.3 , pp. 777-786
- BUCK, I.¹ FOLEY, T.² HORN, D.³ SUGERMAN, J.⁴ FATAHALIAN, K.⁵ HOUSTON, M.⁶ HANRAHAN, P.⁷

9
- 84964748976
- Compiler blockability of numerical algorithms
- CARR, S., AND KENNEDY, K. 1992. Compiler blockability of numerical algorithms. Proc. of ACM/IEEE Conference on Supercomputing, 114-124.
- (1992) Proc. of ACM/IEEE Conference on Supercomputing , pp. 114-124
- CARR, S.¹ KENNEDY, K.²

10
- 84976745804
- Tile size selection using cache organization and data layout
- COLEMAN, S., AND MCKINLEY, K. 1995. Tile size selection using cache organization and data layout. SIGPLAN Conference on Programming Language Design and Implementation, 279-290.
- (1995) SIGPLAN Conference on Programming Language Design and Implementation , pp. 279-290
- COLEMAN, S.¹ MCKINLEY, K.²

11
- 23944462603
- GPU cluster for high performance computing
- FAN, Z., QIU, F., KAUFMAN, A., AND YOAKUM-STOVER, S. 2004. GPU cluster for high performance computing. In ACM/IEEE Supercomputing Conference 2004.
- (2004) ACM/IEEE Supercomputing Conference 2004
- FAN, Z.¹ QIU, F.² KAUFMAN, A.³ YOAKUM-STOVER, S.⁴

12
- 78651269052
- Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
- Eurographics Association
- FATAHALIAN, K., SUGERMAN, J., AND HANRAHAN, P. 2004. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association.
- (2004) Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
- FATAHALIAN, K.¹ SUGERMAN, J.² HANRAHAN, P.³

13
- 0033350255
- Cache-oblivious algorithms
- FRIGO, M., LEISERSON, C., PROKOP, H., AND RAMACHANDRAN, S. 1999. Cache-oblivious algorithms. Symposium on Foundations of Computer Science.
- (1999) Symposium on Foundations of Computer Science
- FRIGO, M.¹ LEISERSON, C.² PROKOP, H.³ RAMACHANDRAN, S.⁴

14
- 33845468997
- LUGPU: Efficient algorithms for solving dense linear systems on graphics hardware
- GALOPPO, N., GOVINDARAJU, N., HENSON, M., AND MANOCHA, D. 2005. LUGPU: Efficient algorithms for solving dense linear systems on graphics hardware. In Proc. ACM/IEEE SuperComputing Conference.
- (2005) Proc. ACM/IEEE SuperComputing Conference
- GALOPPO, N.¹ GOVINDARAJU, N.² HENSON, M.³ MANOCHA, D.⁴

15
- 33845440618
- GPGPU performance tuning
- Tech. rep., University of Dortmund, Germany
- GÖDDEKE, D. 2005. GPGPU performance tuning. Tech. rep., University of Dortmund, Germany, http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/.
- (2005)
- GÖDDEKE, D.¹

16
- 3142739595
- Fast computation of database operations using graphics processors
- GOVINDARAJU, N., LLOYD, B., WANG, W., LIN, M., AND MANOCHA, D. 2004. Fast computation of database operations using graphics processors. Proc. of ACM SIGMOD.
- (2004) Proc. of ACM SIGMOD
- GOVINDARAJU, N.¹ LLOYD, B.² WANG, W.³ LIN, M.⁴ MANOCHA, D.⁵

17
- 29844438097
- Fast and approximate stream mining of quantiles and frequencies using graphics processors
- GOVINDARAJU, N., RAGHUVANSHI, N., AND MANOCHA, D. 2005. Fast and approximate stream mining of quantiles and frequencies using graphics processors. Proc. of ACM SIGMOD.
- (2005) Proc. of ACM SIGMOD
- GOVINDARAJU, N.¹ RAGHUVANSHI, N.² MANOCHA, D.³

18
- 33947607609
- GPUTeraSort: High performance graphics coprocessor sorting for large database management
- GOVINDARAJU, N., GRAY, J., KUMAR, R., AND MANOCHA, D. 2006. GPUTeraSort: High performance graphics coprocessor sorting for large database management. Proc. of ACM SIGMOD.
- (2006) Proc. of ACM SIGMOD
- GOVINDARAJU, N.¹ GRAY, J.² KUMAR, R.³ MANOCHA, D.⁴

19
- 0030677581
- The design and analysis of a cache architecture for texture mapping
- HAKURA, Z., AND GUPTA, A. 1997. The design and analysis of a cache architecture for texture mapping. Proc. of 24th International Symposium on Computer Architecture, 108-120.
- (1997) Proc. of 24th International Symposium on Computer Architecture , pp. 108-120
- HAKURA, Z.¹ GUPTA, A.²

20
- 10644280791
- Cache and bandwidth aware matrix multiplication on the GPU
- Technical Report UIUCDCS-R-2003-2328, University of Illinois at Urbana-Champaign
- HALL, J. D., CARR, N., AND HART, J. 2003. Cache and bandwidth aware matrix multiplication on the GPU. Technical Report UIUCDCS-R-2003-2328, University of Illinois at Urbana-Champaign.
- (2003)
- HALL, J.D.¹ CARR, N.² HART, J.³

21
- 78651284090
- Simulation of cloud dynamics on graphics hardware
- HARRIS, M., BAXTER, B., SCHEUERMANN, G., AND LASTRA, A. 2003. Simulation of cloud dynamics on graphics hardware. SIGGRAPH/Eurographics Workshop on Graphics Hardware.
- (2003) SIGGRAPH/Eurographics Workshop on Graphics Hardware
- HARRIS, M.¹ BAXTER, B.² SCHEUERMANN, G.³ LASTRA, A.⁴

22
- 0024903997
- Evaluating associativity in CPU caches
- HILL, M. D., AND SMITH, A. J. 1989. Evaluating associativity in CPU caches. IEEE Transactions on Computers 38, 12, 1612-1630.
- (1989) IEEE Transactions on Computers , vol.38 , Issue.12 , pp. 1612-1630
- HILL, M.D.¹ SMITH, A.J.²

23
- 85019066865
- Visual simulation of ice crystal growth
- KIM, T., AND LIN, M. 2003. Visual simulation of ice crystal growth. In Proc. of ACM SIGGRAPH / Eurographics Symposium on Computer Animation.
- (2003) Proc. of ACM SIGGRAPH / Eurographics Symposium on Computer Animation
- KIM, T.¹ LIN, M.²

24
- 78650965796
- Uberflow: A GPU-based particle engine
- KIPFER, P., SEGAL, M., AND WESTERMANN, R. 2004. Uberflow: A GPU-based particle engine. SIGGRAPH/Eurographics Workshop on Graphics Hardware.
- (2004) SIGGRAPH/Eurographics Workshop on Graphics Hardware
- KIPFER, P.¹ SEGAL, M.² WESTERMANN, R.³

25
- 0347304618
- Data-centric multi-level blocking
- KODUKULA, I., AHMED, N., AND PINGALI, K. 1997. Data-centric multi-level blocking. Proc. of ACM SIGPLAN, 346-357.
- (1997) Proc. of ACM SIGPLAN , pp. 346-357
- KODUKULA, I.¹ AHMED, N.² PINGALI, K.³

26
- 77954024744
- KRÜGER, J., AND WESTERMANN, R. 2003. Linear algebra operators for GPU implementation of numerical algorithms. ACM Trans. Graph. 22, 3, 908-916.

27
- 0026137116
- The performance and optimization of blocked algorithms
- LAM, M., ROTHBERG, E., AND WOLF, M. 1991. The performance and optimization of blocked algorithms. Proc. of 4th International conference on Architectural support for programming languages and operating systems, 63-74.
- (1991) Proc. of 4th International conference on Architectural support for programming languages and operating systems , pp. 63-74
- LAM, M.¹ ROTHBERG, E.² WOLF, M.³

28
- 85059252992
- Fast matrix multiplies using graphics hardware
- ACM Press
- LARSEN, E. S., AND MCALLISTER, D. 2001. Fast matrix multiplies using graphics hardware. In Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), ACM Press, 55-55.
- (2001) Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) , pp. 55-55
- LARSEN, E.S.¹ MCALLISTER, D.²

29
- 34548257439
- LASTRA, A., LIN, M., AND MANOCHA, D. 2004. ACM workshop on general purpose computation on graphics processors.
- (2004) ACM workshop on general purpose computation on graphics processors
- LASTRA, A.¹ LIN, M.² MANOCHA, D.³

30
- 0027694019
- Access normalization: Loop restructuring for NUMA computers
- LI, W., AND PINGALI, K. 1993. Access normalization: loop restructuring for NUMA computers. ACM Transactions on Computer Systems 11, 4, 353-375.
- (1993) ACM Transactions on Computer Systems , vol.11 , Issue.4 , pp. 353-375
- LI, W.¹ PINGALI, K.²

31
- 10644238428
- Shader algebra
- MCCOOL, M., TOIT, S. D., POPA, T., CHAN, B., AND MOULE, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787-795.
- (2004) ACM Trans. Graph , vol.23 , Issue.3 , pp. 787-795
- MCCOOL, M.¹ TOIT, S.D.² POPA, T.³ CHAN, B.⁴ MOULE, K.⁵

32
- 34249003958
- OWENS, J., LUEBKE, D., GOVINDARAJU, N., HARRIS, M., KRUGER, J., LEFOHN, A., AND PURCELL, T. 2005. A survey of general-purpose computation on graphics hardware.
- (2005) A survey of general-purpose computation on graphics hardware
- OWENS, J.¹ LUEBKE, D.² GOVINDARAJU, N.³ HARRIS, M.⁴ KRUGER, J.⁵ LEFOHN, A.⁶ PURCELL, T.⁷

33
- 10444224900
- Photon mapping on programmable graphics hardware
- PURCELL, T., DONNER, C., CAMMARANO, M., JENSEN, H., AND HANRAHAN, P. 2003. Photon mapping on programmable graphics hardware. ACM SIGGRAPH/Eurographics Conference on Graphics Hardware, 41-50.
- (2003) ACM SIGGRAPH/Eurographics Conference on Graphics Hardware , pp. 41-50
- PURCELL, T.¹ DONNER, C.² CAMMARANO, M.³ JENSEN, H.⁴ HANRAHAN, P.⁵

34
- 33845461400
- Using graphics cards for quantized FEM computations
- RUMPF, M., AND STRZODKA, R. 2001. Using graphics cards for quantized FEM computations. In Proc. of IASTED Visualization, Imaging and Image Processing Conference (VIIP'01), 193-202.
- (2001) Proc. of IASTED Visualization, Imaging and Image Processing Conference (VIIP'01) , pp. 193-202
- RUMPF, M.¹ STRZODKA, R.²

35
- 4243187062
- Towards a theory of cache-efficient algorithms
- SEN, S., CHATTERJEE, S., AND DUMIR, N. 2002. Towards a theory of cache-efficient algorithms. Journal of the ACM 49, 828-858.
- (2002) Journal of the ACM , vol.49 , pp. 828-858
- SEN, S.¹ CHATTERJEE, S.² DUMIR, N.³

36
- 0003985847
- Springer
- TOLIMIERI, R., AN, M., AND LU, C. 1997. Algorithms for Discrete Fourier Transforms and Convolution. Springer.
- (1997) Algorithms for Discrete Fourier Transforms and Convolution
- TOLIMIERI, R.¹ AN, M.² LU, C.³

37
- 0001321490
- External memory algorithms and data structures: Dealing with massive data
- VITTER, J. 2001. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 209-271.
- (2001) ACM Computing Surveys , pp. 209-271
- VITTER, J.¹

38
- 0003927035
- Addison-Wesley
- WOLFE, M., SHANKLIN, C., AND ORTEGA, L. 1995. High performance compilers for parallel computing. Addison-Wesley.
- (1995) High performance compilers for parallel computing
- WOLFE, M.¹ SHANKLIN, C.² ORTEGA, L.³

39
- 0002433589
- Iteration space tiling for memory hierarchies
- WOLFE, M. 1987. Iteration space tiling for memory hierarchies. Proc. of the Third SIAM Conference on Parallel Processing for Scientific Computing, 357-361.
- (1987) Proc. of the Third SIAM Conference on Parallel Processing for Scientific Computing , pp. 357-361
- WOLFE, M.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.