-
1
-
-
0004072686
-
-
Addison Wesley
-
A. Aho, R. Sethi, and J. Ullman. Compilers: principles, techniques, and tools. Addison Wesley, 1986.
-
(1986)
Compilers: Principles, Techniques, and Tools
-
-
Aho, A.1
Sethi, R.2
Ullman, J.3
-
2
-
-
77957561221
-
An adaptive performance modeling tool for GPU architectures
-
January
-
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, and Wen-mei W. Hwu. An adaptive performance modeling tool for GPU architectures. SIGPLAN Notices, 45(5):105-114, January 2010.
-
(2010)
SIGPLAN Notices
, vol.45
, Issue.5
, pp. 105-114
-
-
Baghsorkhi, S.S.1
Delahaye, M.2
Patel, S.J.3
Gropp, W.D.4
Hwu, W.-M.W.5
-
3
-
-
84858379069
-
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors
-
New York, NY, USA, ACM
-
Sara S. Baghsorkhi, Isaac Gelado, Matthieu Delahaye, and Wen-mei W. Hwu. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, pages 23-34, New York, NY, USA, 2012. ACM.
-
(2012)
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 23-34
-
-
Baghsorkhi, S.S.1
Gelado, I.2
Delahaye, M.3
Hwu, W.-M.W.4
-
4
-
-
70349169075
-
Analyzing CUDA workloads using a detailed GPU simulator
-
April
-
A. Bakhoda, G.L. Yuan, W.W.L. Fung, H. Wong, and T.M. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 163-174, April 2009.
-
(2009)
IEEE International Symposium on Performance Analysis of Systems and Software
, pp. 163-174
-
-
Bakhoda, A.1
Yuan, G.L.2
Fung, W.W.L.3
Wong, H.4
Aamodt, T.M.5
-
5
-
-
33846349887
-
A hierarchical O(N log N) force-calculation algorithm
-
December
-
J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(4), December 1986.
-
(1986)
Nature
, vol.324
, Issue.4
-
-
Barnes, J.1
Hut, P.2
-
6
-
-
26944443478
-
Survey propagation: An algorithm for satisfiability
-
A. Braunstein, M. Mézard, and R. Zecchina. Survey propagation: An algorithm for satisfiability. Random Structures and Algorithms, 27(2):201-226, 2005.
-
(2005)
Random Structures and Algorithms
, vol.27
, Issue.2
, pp. 201-226
-
-
Braunstein, A.1
Mézard, M.2
Zecchina, R.3
-
7
-
-
84858427151
-
An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm
-
Morgan Kaufmann
-
Martin Burtscher and Keshav Pingali. An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In GPU Computing Gems Emerald Edition, pages 75-92. Morgan Kaufmann, 2011.
-
(2011)
GPU Computing Gems Emerald Edition
, pp. 75-92
-
-
Burtscher, M.1
Pingali, K.2
-
8
-
-
80052878699
-
High performance hybrid functional Petri net simulations of biological pathway models on CUDA
-
November
-
Georgios Chalkidis, Masao Nagasaki, and Satoru Miyano. High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(6):1545-1556, November 2011.
-
(2011)
IEEE/ACM Transactions on Computational Biology and Bioinformatics
, vol.8
, Issue.6
, pp. 1545-1556
-
-
Chalkidis, G.1
Nagasaki, M.2
Miyano, S.3
-
9
-
-
70649092154
-
Rodinia: A benchmark suite for heterogeneous computing
-
Washington, DC, USA, IEEE Computer Society
-
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pages 44-54, Washington, DC, USA, 2009. IEEE Computer Society.
-
(2009)
Proceedings of the 2009 IEEE International Symposium on Workload Characterization
, pp. 44-54
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.-H.6
Skadron, K.7
-
10
-
-
51449118065
-
A performance study of general-purpose applications on graphics processors using CUDA
-
October
-
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, and Kevin Skadron. A performance study of general-purpose applications on graphics processors using CUDA. Journal of Parallel and Distributed Computing, 68:1370-1380, October 2008.
-
(2008)
Journal of Parallel and Distributed Computing
, vol.68
, pp. 1370-1380
-
-
Che, S.1
Boyer, M.2
Meng, J.3
Tarjan, D.4
Sheaffer, J.W.5
Skadron, K.6
-
14
-
-
76349105923
-
Taming irregular EDA applications on GPUs
-
New York, NY, USA, ACM
-
Yangdong (Steve) Deng, Bo David Wang, and Shuai Mu. Taming irregular EDA applications on GPUs. In Proceedings of the 2009 International Conference on Computer-Aided Design, pages 539-546, New York, NY, USA, 2009. ACM.
-
(2009)
Proceedings of the 2009 International Conference on Computer-Aided Design
, pp. 539-546
-
-
Deng, Y.1
Wang, B.D.2
Mu, S.3
-
15
-
-
84873461993
-
-
Fermi. http://www.nvidia.com/content/PDF/fermiwhitepapers/ NVIDIAFermiComputeArchitectureWhitepaper.pdf, 2010.
-
(2010)
Fermi
-
-
-
16
-
-
79955923056
-
Thread block compaction for efficient SIMT control flow
-
Washington, DC, USA, IEEE Computer Society
-
Wilson W. L. Fung and Tor M. Aamodt. Thread block compaction for efficient SIMT control flow. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pages 25-36, Washington, DC, USA, 2011. IEEE Computer Society.
-
(2011)
Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
, pp. 25-36
-
-
Fung, W.W.L.1
Aamodt, T.M.2
-
17
-
-
47349104432
-
Dynamic warp formation and scheduling for efficient GPU control flow
-
Washington, DC, USA, IEEE Computer Society
-
Wilson W. L. Fung, Ivan Sham, George Yuan, and Tor M. Aamodt. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 407-420, Washington, DC, USA, 2007. IEEE Computer Society.
-
(2007)
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
, pp. 407-420
-
-
Fung, W.W.L.1
Sham, I.2
Yuan, G.3
Aamodt, T.M.4
-
19
-
-
78751477137
-
Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications
-
Washington, DC, USA, IEEE Computer Society
-
Nilanjan Goswami, Ramkumar Shankar, Madhura Joshi, and Tao Li. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications. In Proceedings of the IEEE International Symposium on Workload Characterization, pages 1-10, Washington, DC, USA, 2010. IEEE Computer Society.
-
(2010)
Proceedings of the IEEE International Symposium on Workload Characterization
, pp. 1-10
-
-
Goswami, N.1
Shankar, R.2
Joshi, M.3
Li, T.4
-
21
-
-
77951141930
-
Fast fluid dynamics simulation on the GPU
-
New York, NY, USA, ACM
-
Mark Harris. Fast fluid dynamics simulation on the GPU. In ACM SIGGRAPH 2005 Courses, New York, NY, USA, 2005. ACM.
-
(2005)
ACM SIGGRAPH 2005 Courses
-
-
Harris, M.1
-
22
-
-
84862107632
-
Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
-
April
-
T.H. Hetherington, T.G. Rogers, L. Hsu, M. O'Connor, and T.M. Aamodt. Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 88-98, April 2012.
-
(2012)
IEEE International Symposium on Performance Analysis of Systems and Software
, pp. 88-98
-
-
Hetherington, T.H.1
Rogers, T.G.2
Hsu, L.3
O'Connor, M.4
Aamodt, T.M.5
-
24
-
-
79952811127
-
Accelerating CUDA graph algorithms at maximum warp
-
New York, NY, USA, ACM
-
Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. Accelerating CUDA graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pages 267-276, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming
, pp. 267-276
-
-
Hong, S.1
Kim, S.K.2
Oguntebi, T.3
Olukotun, K.4
-
26
-
-
70649104826
-
A characterization and analysis of PTX kernels
-
Washington, DC, USA, IEEE Computer Society
-
Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. A characterization and analysis of PTX kernels. In Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pages 3-12, Washington, DC, USA, 2009. IEEE Computer Society.
-
(2009)
Proceedings of the 2009 IEEE International Symposium on Workload Characterization
, pp. 3-12
-
-
Kerr, A.1
Diamos, G.2
Yalamanchili, S.3
-
27
-
-
77952256778
-
Modeling GPU-CPU workloads and systems
-
New York, NY, USA, ACM
-
Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. Modeling GPU-CPU workloads and systems. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pages 31-42, New York, NY, USA, 2010. ACM.
-
(2010)
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
, pp. 31-42
-
-
Kerr, A.1
Diamos, G.2
Yalamanchili, S.3
-
28
-
-
84855707932
-
Programming massively parallel processors: A hands-on approach
-
David B. Kirk and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010.
-
(2010)
Morgan Kaufmann
-
-
Kirk, D.B.1
Hwu, W.-M.W.2
-
29
-
-
42549111870
-
Optimistic parallelism requires abstractions
-
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. Optimistic parallelism requires abstractions. SIGPLAN Notices (Proceedings of PLDI), 42(6):211-222, 2007.
-
(2007)
SIGPLAN Notices (Proceedings of PLDI)
, vol.42
, Issue.6
, pp. 211-222
-
-
Kulkarni, M.1
Pingali, K.2
Walter, B.3
Ramanarayanan, G.4
Bala, K.5
Chew, L.P.6
-
30
-
-
44849137198
-
NVIDIA Tesla: A unified graphics and computing architecture
-
Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28:39-55, 2008.
-
(2008)
IEEE Micro
, vol.28
, pp. 39-55
-
-
Lindholm, E.1
Nickolls, J.2
Oberman, S.3
Montrym, J.4
-
31
-
-
84857873786
-
Exploring the limits of GPGPU scheduling in control flow bound applications
-
January
-
Roman Malits, Evgeny Bolotin, Avinoam Kolodny, and Avi Mendelson. Exploring the limits of GPGPU scheduling in control flow bound applications. ACM Transactions on Architecture and Code Optimization, 8(4):29:1-29:22, January 2012.
-
(2012)
ACM Transactions on Architecture and Code Optimization
, vol.8
, Issue.4
, pp. 29:1-29:22
-
-
Malits, R.1
Bolotin, E.2
Kolodny, A.3
Mendelson, A.4
-
32
-
-
84878605997
-
A GPU implementation of inclusion-based points-to analysis
-
New York, NY, USA, ACM
-
Mario Mendez-Lojo, Martin Burtscher, and Keshav Pingali. A GPU implementation of inclusion-based points-to analysis. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 107-116, New York, NY, USA, 2012. ACM.
-
(2012)
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, pp. 107-116
-
-
Mendez-Lojo, M.1
Burtscher, M.2
Pingali, K.3
-
33
-
-
77954976292
-
Dynamic warp subdivision for integrated branch and memory divergence tolerance
-
New York, NY, USA, ACM
-
Jiayuan Meng, David Tarjan, and Kevin Skadron. Dynamic warp subdivision for integrated branch and memory divergence tolerance. In Proceedings of the 37th Annual International Symposium on Computer Architecture, pages 235-246, New York, NY, USA, 2010. ACM.
-
(2010)
Proceedings of the 37th Annual International Symposium on Computer Architecture
, pp. 235-246
-
-
Meng, J.1
Tarjan, D.2
Skadron, K.3
-
35
-
-
0022678067
-
Distributed discrete-event simulation
-
Jayadev Misra. Distributed discrete-event simulation. ACM Computing Surveys, 18(1):39-65, 1986.
-
(1986)
ACM Computing Surveys
, vol.18
, Issue.1
, pp. 39-65
-
-
Misra, J.1
-
37
-
-
84873416888
-
Floating-point data compression at 75 Gb/s on a GPU
-
New York, NY, USA, ACM
-
Molly A. O'Neil and Martin Burtscher. Floating-point data compression at 75 Gb/s on a GPU. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, pages 7:1-7:7, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
, pp. 7:1-7:7
-
-
O'Neil, M.A.1
Burtscher, M.2
-
38
-
-
33947588048
-
A survey of general-purpose computation on graphics hardware
-
John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron Lefohn, and Timothy J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, 2007.
-
(2007)
Computer Graphics Forum
, vol.26
, Issue.1
, pp. 80-113
-
-
Owens, J.D.1
Luebke, D.2
Govindaraju, N.3
Harris, M.4
Krüger, J.5
Lefohn, A.6
Purcell, T.J.7
-
40
-
-
79251566519
-
EigenCFA: Accelerating flow analysis with GPUs
-
New York, NY, USA, ACM
-
Tarun Prabhu, Shreyas Ramalingam, Matthew Might, and Mary Hall. EigenCFA: Accelerating flow analysis with GPUs. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 511-522, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
, pp. 511-522
-
-
Prabhu, T.1
Ramalingam, S.2
Might, M.3
Hall, M.4
-
41
-
-
78149343218
-
Option pricing on the GPU
-
Washington, DC, USA, IEEE Computer Society
-
Steven Solomon, Ruppa K. Thulasiram, and Parimala Thulasiraman. Option Pricing on the GPU. In Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications, pages 289-296, Washington, DC, USA, 2010. IEEE Computer Society.
-
(2010)
Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications
, pp. 289-296
-
-
Solomon, S.1
Thulasiram, R.K.2
Thulasiraman, P.3
-
43
-
-
84873470137
-
-
Technical Report IMPACT-12-01, University of Illinois at Urbana-Champaign
-
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W. Hwu. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. Technical Report IMPACT-12-01, University of Illinois at Urbana-Champaign, 2012.
-
(2012)
Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing
-
-
Stratton, J.A.1
Rodrigues, C.2
Sung, I.-J.3
Obeid, N.4
Chang, L.-W.5
Anssari, N.6
Liu, G.D.7
Hwu, W.-M.W.8
-
45
-
-
70450194802
-
Fast minimum spanning tree for large graphs on the GPU
-
New York, NY, USA, ACM
-
Vibhav Vineet, Pawan Harish, Suryakant Patidar, and P. J. Narayanan. Fast minimum spanning tree for large graphs on the GPU. In Proceedings of the Conference on High Performance Graphics 2009, pages 167-171, New York, NY, USA, 2009. ACM.
-
(2009)
Proceedings of the Conference on High Performance Graphics 2009
, pp. 167-171
-
-
Vineet, V.1
Harish, P.2
Patidar, S.3
Narayanan, P.J.4
-
47
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
New York, NY, USA, ACM
-
Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 369-380, New York, NY, USA, 2011. ACM.
-
(2011)
Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems
, pp. 369-380
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
-
48
-
-
57749174539
-
Real-time KD-tree construction on graphics hardware
-
Kun Zhou, Qiming Hou, Rui Wang, and Baining Guo. Real-time KD-tree construction on graphics hardware. ACM Transactions on Graphics, 27(5):1-11, 2008.
-
(2008)
ACM Transactions on Graphics
, vol.27
, Issue.5
, pp. 1-11
-
-
Zhou, K.1
Hou, Q.2
Wang, R.3
Guo, B.4