SCOPUS Information Search Platform

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Volume: None, Issue: None, 2010, Pages 207-216

Acceleration of streamed tensor contraction expressions on GPGPU-based clusters

(4) Ma, Wenjing (a); Krishnamoorthy, Sriram (b); Villa, Oreste (b); Kowalski, Karol (b)

(a) OHIO STATE UNIVERSITY (United States)

(b) PACIFIC NORTHWEST NATIONAL LABORATORY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

CCSD; COUPLED-CLUSTER METHODS; DATA MOVEMENTS; ENTIRE SYSTEM; MULTI-DIMENSIONAL MATRICES; TENSOR CONTRACTION; TENSOR CONTRACTION EXPRESSIONS;

CLUSTER COMPUTING; COMPUTATIONAL CHEMISTRY; PROGRAM PROCESSORS; QUANTUM CHEMISTRY;

TENSORS;

EID: 78649489299 PISSN: 15525244 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CLUSTER.2010.26 Document Type: Conference Paper

Times cited : (24)

References (35)

1
- 70450059008
- Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors
- M. Boyer, D. Tarjan, S. T. Acton, and K. Skadron, "Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors," in Proceedings of the international parallel and distributed processing symposium (IPDPS), 2009, pp. 1-12.
- (2009) Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS) , pp. 1-12
- Boyer, M.¹ Tarjan, D.² Acton, S.T.³ Skadron, K.⁴

2
- 51449118065
- A performance study of general-purpose applications on graphics processors using CUDA
- S. Che, J. Meng, J. W. Sheaffer, and K. Skadron, "A performance study of general-purpose applications on graphics processors using CUDA," Journal of parallel and distributed computing, vol. 68, no. 10, pp. 1370-1380, 2008.
- (2008) Journal of Parallel and Distributed Computing , vol.68 , Issue.10 , pp. 1370-1380
- Che, S.¹ Meng, J.² Sheaffer, J.W.³ Skadron, K.⁴

3
- 38349041620
- Accelerating large graph algorithms on the GPU Using CUDA
- P. Harish and P. Narayanan, "Accelerating Large Graph Algorithms on the GPU Using CUDA," in Proceedings of the international conference on high performance computing (HiPC), 2007, pp. 197-208.
- (2007) Proceedings of the International Conference on High Performance Computing (HiPC) , pp. 197-208
- Harish, P.¹ Narayanan, P.²

4
- 78651550268
- Scalable parallel programming with CUDA
- J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," Queue, vol. 6, no. 2, pp. 40-53, 2008.
- (2008) Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

5
- 70350759823
- Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
- A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka, "Bandwidth Intensive 3-D FFT Kernel for GPUs using CUDA," in Proceedings of the ACM/IEEE SC conference on high performance networking and computing, 2008, pp. 1-11.
- (2008) Proceedings of the ACM/IEEE SC Conference on High Performance Networking and Computing , pp. 1-11
- Nukada, A.¹ Ogata, Y.² Endo, T.³ Matsuoka, S.⁴

6
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. M. Hwu, "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU using CUDA," in Proceedings of the ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP), 2008, pp. 73-82.
- (2008) Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) , pp. 73-82
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.M.⁶

7
- 38849131252
- High-throughput sequence alignment using graphics processing units
- M. Schatz, C. Trapnell, A. Delcher, and A. Varshney, "High-throughput Sequence Alignment Using Graphics Processing Units," BMC Bioinformatics, vol. 8, no. 1, p. 474, 2007.
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

8
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- V. Volkov and J. W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," in Proceedings of the ACM/IEEE SC conference on high performance networking and computing, 2008, pp. 1-11.
- (2008) Proceedings of the ACM/IEEE SC Conference on High Performance Networking and Computing , pp. 1-11
- Volkov, V.¹ Demmel, J.W.²

9
- 22844457256
- A critical assessment of coupled cluster method in quantum chemistry
- J. Paldus and X. Li, "A Critical Assessment of Coupled Cluster Method in Quantum Chemistry," Advances in Chemical Physics, vol. 110, pp. 1-175, 1999.
- (1999) Advances in Chemical Physics , vol.110 , pp. 1-175
- Paldus, J.¹ Li, X.²

10
- 77952873681
- Nvidia
- Nvidia, "NVIDIA CUDA Programming Guide 2.3," 2009.
- (2009) NVIDIA CUDA Programming Guide 2.3

11
- 36849099976
- On correlation problem in atomic and molecular systems. Calculation of wavefunction components in ursell-type expansion using quantum-field theoretical methods
- J. Cizek, "On Correlation Problem in Atomic and Molecular Systems. Calculation of Wavefunction Components in Ursell-Type Expansion Using Quantum-Field Theoretical Methods," Journal of Chemical Physics, vol. 45, no. 11, pp. 4256-4266, 1966.
- (1966) Journal of Chemical Physics , vol.45 , Issue.11 , pp. 4256-4266
- Cizek, J.¹

12
- 33847389465
- Coupled-cluster theory in quantum chemistry
- R. J. Bartlett and M. Musiał, "Coupled-cluster theory in quantum chemistry," Reviews of Modern Physics, vol. 79, no. 1, pp. 291-352, Feb 2007.
- (2007) Reviews of Modern Physics , vol.79 , Issue.1 , pp. 291-352
- Bartlett, R.J.¹ Musiał, M.²

13
- 0006244148
- A 5th-order perturbation comparison of electron correlation theories
- K. Raghavachari, G. W. Trucks, J. A. Pople, and M. Head-Gordon, "A 5th-Order Perturbation Comparison of Electron Correlation Theories," Chemical Physics Letters, vol. 157, no. 6, pp. 479-483, 1989.
- (1989) Chemical Physics Letters , vol.157 , Issue.6 , pp. 479-483
- Raghavachari, K.¹ Trucks, G.W.² Pople, J.A.³ Head-Gordon, M.⁴

14
- 31744435977
- Automatic code generation for many-body electronic structure methods: The tensor contraction engine
- A A Auer et al., "Automatic Code Generation for Many-body Electronic Structure Methods: the Tensor Contraction Engine," Molecular Physics, vol. 2, p. 211, 2006.
- (2006) Molecular Physics , vol.2 , pp. 211
- Auer, A.A.¹

15
- 0345566357
- Tensor Contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories
- S. Hirata, "Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories," The Journal of Physical Chemistry A, vol. 107, no. 46, pp. 9887-9897, 2003.
- (2003) The Journal of Physical Chemistry A , vol.107 , Issue.46 , pp. 9887-9897
- Hirata, S.¹

16
- 68849128792
- A note on auto-tuning GEMM for GPUs
- Y. Li, J. Dongarra, and S. Tomov, "A Note on Auto-tuning GEMM for GPUs," in Proceedings of the international conference on computational science (ICCS), 2009, pp. 884-892.
- (2009) Proceedings of the International Conference on Computational Science (ICCS) , pp. 884-892
- Li, Y.¹ Dongarra, J.² Tomov, S.³

17
- 67650056991
- LU, QR and Cholesky factorizations using vector capabilities of GPUs
- V. Volkov and J. Demmel, "LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2008-49, May 2008. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-49.html
- (2008) LU, QR and Cholesky Factorizations Using Vector Capabilities of GPUs
- Volkov, V.¹ Demmel, J.²

18
- 57349180412
- A compiler framework for optimization of affine loop nests for GPGPUs
- M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, "A compiler framework for optimization of affine loop nests for GPGPUs," in Proceedings of the international conference on Supercomputing (ICS), 2008, pp. 225-234.
- (2008) Proceedings of the International Conference on Supercomputing (ICS) , pp. 225-234
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

19
- 70449707774
- A translation system for enabling data mining applications on GPUs
- W. Ma and G. Agrawal, "A translation system for enabling data mining applications on GPUs," in Proceedings of the international conference on Supercomputing (ICS), 2009, pp. 400-409.
- (2009) Proceedings of the International Conference on Supercomputing (ICS) , pp. 400-409
- Ma, W.¹ Agrawal, G.²

20
- 77749340082
- Model-driven autotuning of sparse matrix-vector multiply on GPUs
- J. W. Choi, A. Singh, and R. W. Vuduc, "Model-driven Autotuning of Sparse Matrix-vector Multiply on GPUs," in Proceedings of the ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP), 2010, pp. 115-126.
- (2010) Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) , pp. 115-126
- Choi, J.W.¹ Singh, A.² Vuduc, R.W.³

21
- 67650563116
- Software pipelined execution of stream programs on GPUs
- A. Udupa, R. Govindarajan, and M. J. Thazhuthaveetil, "Software pipelined execution of stream programs on GPUs," in Proceedings of the international symposium on code generation and optimization (CGO), 2009, pp. 200-209.
- (2009) Proceedings of the International Symposium on Code Generation and Optimization (CGO) , pp. 200-209
- Udupa, A.¹ Govindarajan, R.² Thazhuthaveetil, M.J.³

22
- 33846471996
- Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
- M. I. Gordon, W. Thies, and S. Amarasinghe, "Exploiting Coarse-grained Task, Data, and Pipeline Parallelism in Stream Programs," SIGOPS Oper. Syst. Rev., vol. 40, no. 5, pp. 151-162, 2006.
- (2006) SIGOPS Oper. Syst. Rev. , vol.40 , Issue.5 , pp. 151-162
- Gordon, M.I.¹ Thies, W.² Amarasinghe, S.³

23
- 43449094719
- Program optimization space pruning for a multithreaded GPU
- S. Ryoo, C. I. Rodrigues, S. S. Stone, S. S. Baghsorkhi, S.-Z. Ueng, J. A. Stratton, and W.-m. W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," in Proceedings of the international symposium on code generation and optimization (CGO), 2008, pp. 195-204.
- (2008) Proceedings of the International Symposium on Code Generation and Optimization (CGO) , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Baghsorkhi, S.S.⁴ Ueng, S.-Z.⁵ Stratton, J.A.⁶ Hwu, W.-M.W.⁷

24
- 70450231944
- An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
- S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness," SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 152-163, 2009.
- (2009) SIGARCH Comput. Archit. News , vol.37 , Issue.3 , pp. 152-163
- Hong, S.¹ Kim, H.²

25
- 78649485472
- Optimal loop unrolling for GPGPU programs
- S. G. Murthy, "Optimal loop unrolling for GPGPU programs," Master's thesis, The Ohio State University, 2009.
- (2009) Optimal Loop Unrolling for GPGPU Programs
- Murthy, S.G.¹

26
- 77957561221
- An adaptive performance modeling tool for GPU architectures
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, W. D. Gropp, and W.-m. W. Hwu, "An adaptive performance modeling tool for GPU architectures," in Proceedings of the ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP), 2010, pp. 105-114.
- (2010) Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) , pp. 105-114
- Baghsorkhi, S.S.¹ Delahaye, M.² Patel, S.J.³ Gropp, W.D.⁴ Hwu, W.-M.W.⁵

27
- 70349204069
- Absorption spectrum of the green fluorescent protein chromophore: A difficult case for ab initio methods?
- C. Filippi, M. Zaccheddu, and F. Buda, "Absorption Spectrum of the Green Fluorescent Protein Chromophore: A Difficult Case for ab Initio Methods?" Journal of Chemical Theory and Computation, vol. 5, pp. 2074-2087, Jul 2009.
- (2009) Journal of Chemical Theory and Computation , vol.5 , pp. 2074-2087
- Filippi, C.¹ Zaccheddu, M.² Buda, F.³

28
- 33746614482
- Gaussian basis sets for use in correlated molecular calculations. I. the atoms boron through neon and hydrogen
- T. Dunning, "Gaussian Basis Sets for Use in Correlated Molecular Calculations. I. The Atoms Boron through Neon and Hydrogen," Journal of Chemical Physics, vol. 90, pp. 1007-1023, 1989.
- (1989) Journal of Chemical Physics , vol.90 , pp. 1007-1023
- Dunning, T.¹

29
- 74049154762
- Liquid water: Obtaining the right answer for the right reasons
- E. Aprà, A. P. Rendell, R. J. Harrison, V. Tipparaju, W. A. deJong, and S. S. Xantheas, "Liquid water: Obtaining the Right Answer for the Right Reasons," in Proceedings of the ACM/IEEE SC conference on high performance networking and computing, 2009, pp. 1-7.
- (2009) Proceedings of the ACM/IEEE SC Conference on High Performance Networking and Computing , pp. 1-7
- Aprà, E.¹ Rendell, A.P.² Harrison, R.J.³ Tipparaju, V.⁴ Dejong, W.A.⁵ Xantheas, S.S.⁶

30
- 34247114368
- Combining analytical and empirical approaches in tuning matrix transposition
- Q. Lu, S. Krishnamoorthy, and P. Sadayappan, "Combining Analytical and Empirical Approaches in Tuning Matrix Transposition," in Proceedings of the conference on parallel architectures and compilation techniques (PACT), 2006, pp. 233-242.
- (2006) Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT) , pp. 233-242
- Lu, Q.¹ Krishnamoorthy, S.² Sadayappan, P.³

31
- 84870404942
- Nvidia, "NVIDIA's Next Generation CUDA Compute Architecture: Fermi," http://www.nvidia.com/object/fermi-architecture.html.
- NVIDIA's Next Generation CUDA Compute Architecture: Fermi

32
- 78649478659
- Nvidia, "Tesla 20-series," http://www.nvidia.com/object/tesla-computing-solutions.html.
- Tesla 20-series

33
- 70449643566
- Memory performance and cache coherency effects on an intel nehalem multiprocessor system
- D. Molka, D. Hackenberg, R. Schone, and M. S. Muller, "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System," in Proceedings of the conference on parallel architectures and compilation techniques (PACT), 2009, pp. 261-270.
- (2009) Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT) , pp. 261-270
- Molka, D.¹ Hackenberg, D.² Schone, R.³ Muller, M.S.⁴

34
- 84873160007
- H. T. Consortium, "PCI Express 3.0 specification," http://www.hypertransport.org/docs/twgdocs/HTC20051222-00046-0028.pdf.
- PCI Express 3.0 Specification

35
- 70449693703
- An introduction to the Intel QuickPath Interconnect
- Intel, "An Introduction to the Intel QuickPath Interconnect," Document Number: 320412, January 2009, http://www.intel.com/technology/quickpath/introduction.pdf.
- (2009) An Introduction to the Intel QuickPath Interconnect

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.