-
4
-
-
70449720693
-
-
Sara Baghsorkhi, Melvin Lathara, and Wen-mei Hwu. CUDA-lite: Reducing GPU Programming Complexity. In LCPC 2008, 2008.
-
-
-
-
5
-
-
0029394470
-
-
Prithviraj Banerjee, John A. Chandy, Manish Gupta, Eugene W. Hodges IV, John G. Holm, Antonio Lain, Daniel J. Palermo, Shankar Ramaswamy, and Ernesto Su. The Paradigm Compiler for Distributed-Memory Multicomputers. IEEE Computer, 28(10):37-47, October 1995.
-
-
-
-
6
-
-
57349180412
-
-
Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, and P. Sadayappan. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs. In International Conference on Supercomputing, pages 225-234, 2008.
-
-
-
-
7
-
-
0030382364
-
Parallel programming with Polaris
-
December
-
W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, B. Pottenger, L. Rauchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, 29(12):78-82, December 1996.
-
(1996)
IEEE Computer
, vol.29
, Issue.12
, pp. 78-82
-
-
Blume, W.1
Doallo, R.2
Eigenmann, R.3
Grout, J.4
Hoeflinger, J.5
Lawrence, T.6
Lee, J.7
Padua, D.8
Paek, Y.9
Pottenger, B.10
Rauchwerger, L.11
Tu, P.12
-
8
-
-
52949145167
-
Data-Intensive Supercomputing: The Case for DISC
-
Technical Report CMU-CS-07-128, School of Computer Science, Carnegie Mellon University
-
Randal E. Bryant. Data-Intensive Supercomputing: The Case for DISC. Technical Report CMU-CS-07-128, School of Computer Science, Carnegie Mellon University, 2007.
-
(2007)
-
-
Bryant, R.E.1
-
9
-
-
61849183365
-
-
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware, 2004.
-
(2004)
Brook for GPUs: Stream Computing on Graphics Hardware
-
-
Buck, I.1
Foley, T.2
Horn, D.3
Sugerman, J.4
Fatahalian, K.5
Houston, M.6
Hanrahan, P.7
-
10
-
-
33746614750
-
A Graphics Hardware Accelerated Algorithm for Nearest Neighbor Search
-
Vassil N. Alexandrov, Geert Dick van Albada, Peter M.A. Sloot, and Jack Dongarra, editors, Computational Science - ICCS 2006, Springer
-
Benjamin Bustos, Oliver Deussen, Stefan Hiller, and Daniel Keim. A Graphics Hardware Accelerated Algorithm for Nearest Neighbor Search. In Vassil N. Alexandrov, Geert Dick van Albada, Peter M.A. Sloot, and Jack Dongarra, editors, Computational Science - ICCS 2006, volume 3994 of LNCS, pages 196-199. Springer, 2006.
-
(2006)
LNCS
, vol.3994
, pp. 196-199
-
-
Bustos, B.1
Deussen, O.2
Hiller, S.3
Keim, D.4
-
12
-
-
70449717067
-
-
Shuai Che, Jiayuan Meng, and Jeremy W. Sheaffer. A Performance Study of General Purpose Applications on Graphics Processors
-
-
-
-
13
-
-
0002607026
-
Bayesian classification (autoclass): Theory and practice
-
AAAI Press, MIT Press
-
P. Cheeseman and J. Stutz. Bayesian classification (autoclass): Theory and practice. In Advances in Knowledge Discovery and Data Mining, pages 61-83. AAAI Press / MIT Press, 1996.
-
(1996)
Advances in Knowledge Discovery and Data Mining
, pp. 61-83
-
-
Cheeseman, P.1
Stutz, J.2
-
15
-
-
85030321143
-
MapReduce: Simplified data processing on large clusters
-
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137-150, 2004.
-
(2004)
OSDI
, pp. 137-150
-
-
Dean, J.1
Ghemawat, S.2
-
16
-
-
0002629270
-
Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm
-
Arthur Dempster, Nan Laird, and Donald Rubin. Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39(1):1-38, 1977.
-
(1977)
Journal of the Royal Statistical Society
, vol.39
, Issue.1
, pp. 1-38
-
-
Dempster, A.1
Laird, N.2
Rubin, D.3
-
17
-
-
84934299651
-
GPU Cluster for High Performance Computing
-
Washington, DC, USA, IEEE Computer Society
-
Zhe Fan, Feng Qiu, Arie Kaufman, and Suzanne Yoakum-Stover. GPU Cluster for High Performance Computing. In SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 47, Washington, DC, USA, 2004. IEEE Computer Society.
-
(2004)
SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing
, pp. 47
-
-
Fan, Z.1
Qiu, F.2
Kaufman, A.3
Yoakum-Stover, S.4
-
19
-
-
33947607609
-
GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management
-
New York, NY, USA, ACM
-
Naga Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. GPUTeraSort: High Performance Graphics Co-processor Sorting for Large Database Management. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 325-336, New York, NY, USA, 2006. ACM.
-
(2006)
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
, pp. 325-336
-
-
Govindaraju, N.1
Gray, J.2
Kumar, R.3
Manocha, D.4
-
22
-
-
0030380793
-
Maximizing Multiprocessor Performance with the SUIF Compiler
-
December
-
M. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Maximizing Multiprocessor Performance with the SUIF Compiler. IEEE Computer, 29(12), December 1996.
-
(1996)
IEEE Computer
, vol.29
, Issue.12
-
-
Hall, M.1
Amarasinghe, S.2
Murphy, B.3
Liao, S.4
Lam, M.5
-
25
-
-
63549097654
-
Mars: A MapReduce Framework on Graphics Processors
-
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, and Tuyong Wang. Mars: A MapReduce Framework on Graphics Processors. In PACT08: IEEE International Conference on Parallel Architecture and Compilation Techniques 2008, 2008.
-
(2008)
PACT08: IEEE International Conference on Parallel Architecture and Compilation Techniques 2008
-
-
He, B.1
Fang, W.2
Luo, Q.3
Govindaraju, N.K.4
Wang, T.5
-
26
-
-
84976813879
-
Compiling Fortran D for MIMD distributed-memory machines
-
August
-
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, August 1992.
-
(1992)
Communications of the ACM
, vol.35
, Issue.8
, pp. 66-80
-
-
Hiranandani, S.1
Kennedy, K.2
Tseng, C.-W.3
-
28
-
-
70449713451
-
-
R. Jin and G. Agrawal. Shared memory parallelization of data mining algorithms: Techniques. citeseer.ist.psu.edu/article/jin02shared.html, 2002.
-
-
-
-
30
-
-
70449715278
-
-
Andreas Klöckner. PyCUDA, 2008.
-
(2008)
-
-
-
33
-
-
67650081010
-
OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization
-
Seyong Lee, Seung-Jai Min, and Rudolf Eigenmann. OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization. In PPoPP'09, 2009.
-
(2009)
PPoPP'09
-
-
Lee, S.1
Min, S.-J.2
Eigenmann, R.3
-
34
-
-
33845187300
-
Parallelizing user-defined and implicit reductions globally on multiprocessors
-
Chris R. Jesshope and Colin Egan, editors, Asia-Pacific Computer Systems Architecture Conference, Springer
-
Shih-Wei Liao. Parallelizing user-defined and implicit reductions globally on multiprocessors. In Chris R. Jesshope and Colin Egan, editors, Asia-Pacific Computer Systems Architecture Conference, volume 4186 of Lecture Notes in Computer Science, pages 189-202. Springer, 2006.
-
(2006)
Lecture Notes in Computer Science
, vol.4186
, pp. 189-202
-
-
Liao, S.-W.1
-
36
-
-
0031674776
-
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
-
April
-
Bo Lu and John Mellor-Crummey. Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors. In Proceedings of the 12th International Parallel Processing Symposium (IPPS), April 1998.
-
(1998)
Proceedings of the 12th International Parallel Processing Symposium (IPPS)
-
-
-
37
-
-
0002431740
-
Automatic Construction of Decision Trees from Data: A Multi-disciplinary Survey
-
S. K. Murthy. Automatic Construction of Decision Trees from Data: A Multi-disciplinary Survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.
-
(1998)
Data Mining and Knowledge Discovery
, vol.2
, Issue.4
, pp. 345-389
-
-
Murthy, S.K.1
-
38
-
-
70449715277
-
-
NVIDIA. NVIDIA CUDA Compute Unified Device Architecture Programming Guide, version 2.0. http://developer.download.nvidia.com/compute/cuda/2.0-Beta2/docs/Programming-Guide-2.0beta2.pdf, June 7, 2008.
-
-
-
-
40
-
-
77951558943
-
A Performance-oriented Data Parallel Virtual Machine for GPUs
-
New York, NY, USA, ACM
-
Mark Peercy, Mark Segal, and Derek Gerstmann. A Performance-oriented Data Parallel Virtual Machine for GPUs. In SIGGRAPH '06: ACM SIGGRAPH 2006 Sketches, page 184, New York, NY, USA, 2006. ACM.
-
(2006)
SIGGRAPH '06: ACM SIGGRAPH 2006 Sketches
, pp. 184
-
-
Peercy, M.1
Segal, M.2
Gerstmann, D.3
-
41
-
-
0031631999
-
The Role of Associativity and Commutativity in the Detection and Transformation of Loop-Level Parallelism
-
ACM Press, July
-
William M. Pottenger. The Role of Associativity and Commutativity in the Detection and Transformation of Loop-Level Parallelism. In Conference Proceedings of the 1998 International Conference on Supercomputing (ICS), pages 188-195. ACM Press, July 1998.
-
(1998)
Conference Proceedings of the 1998 International Conference on Supercomputing (ICS)
, pp. 188-195
-
-
Pottenger, W.M.1
-
42
-
-
10444224900
-
Photon Mapping on Programmable Graphics Hardware
-
Eurographics Association
-
Timothy J. Purcell, Craig Donner, Mike Cammarano, Henrik Wann Jensen, and Pat Hanrahan. Photon Mapping on Programmable Graphics Hardware. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 41-50. Eurographics Association, 2003.
-
(2003)
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware
, pp. 41-50
-
-
Purcell, T.J.1
Donner, C.2
Cammarano, M.3
Jensen, H.W.4
Hanrahan, P.5
-
44
-
-
58449109179
-
-
John Stratton, Sam Stone, and Wen-mei Hwu. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-Core CPUs. In 21st Annual Workshop on Languages and Compilers for Parallel Computing (LCPC'2008), July 2008.
-
-
-
-
45
-
-
33947595619
-
Accelerator: Using Data Parallelism to Program GPUs for General-purpose Uses
-
New York, NY, USA, ACM
-
David Tarditi, Sidd Puri, and Jose Oglesby. Accelerator: Using Data Parallelism to Program GPUs for General-purpose Uses. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 325-335, New York, NY, USA, 2006. ACM.
-
(2006)
ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
, pp. 325-335
-
-
Tarditi, D.1
Puri, S.2
Oglesby, J.3
-
49
-
-
0027543560
-
Compiling for Distributed-Memory Systems
-
February, In Special Section on Languages and Compilers for Parallel Machines
-
Hans P. Zima and Barbara Mary Chapman. Compiling for Distributed-Memory Systems. Proceedings of the IEEE, 81(2):264-287, February 1993. In Special Section on Languages and Compilers for Parallel Machines.
-
(1993)
Proceedings of the IEEE
, vol.81
, Issue.2
, pp. 264-287
-
-
Zima, H.P.1
Chapman, B.M.2