-
1
-
-
84893426008
-
-
GE Intelligent Platforms. http://defense.ge-ip. com/products/hpec/c560.
-
-
-
-
2
-
-
84893408658
-
-
Mosek. http://www.mosek.com/.
-
-
-
-
3
-
-
84893362632
-
-
NVIDIA. Fermi GPUs www.nvidia.com/object/fermi-architecture.html.
-
-
-
-
4
-
-
84893389126
-
-
NVIDIA. Kepler GPUs www.nvidia.com/object/nvidia-kepler.html.
-
-
-
-
5
-
-
84893347454
-
-
NVIDIA. PTX Code http://docs.nvidia.com/cuda/pdf/ptx-isa-3.1.pdf.
-
-
-
-
7
-
-
84893425171
-
-
NVIDIA. Profiler http://docs.nvidia.com/cuda/profiler-users-guide/index. html.
-
-
-
-
8
-
-
84893367579
-
-
NVIDIA GPU Computing SDK. http://developer.nvidia.com/gpu-computing-sdk.
-
-
-
-
9
-
-
84893429325
-
-
NVIDIA Tegra. http://www.nvidia.com/object/tegra.html.
-
-
-
-
10
-
-
84893381500
-
-
QualcommInc. http://www.qualcomm.com/snapdragon.
-
-
-
-
11
-
-
84893398220
-
-
SamSung Inc. www.samsung.com/exynos.
-
-
-
-
12
-
-
0004116989
-
-
McGraw-Hill Higher Education, 2nd edition
-
T. Cormen, C. Stein, R. Rivest, and C. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
-
(2001)
Introduction to Algorithms
-
-
Cormen, T.1
Stein, C.2
Rivest, R.3
Leiserson, C.4
-
13
-
-
84866876242
-
An accurate GPU performance model for effective control flow divergence optimization
-
Z. Cui, Y. Liang, K. Rupnow, and D. Chen. An accurate GPU performance model for effective control flow divergence optimization. In IPDPS, 2012.
-
(2012)
IPDPS
-
-
Cui, Z.1
Liang, Y.2
Rupnow, K.3
Chen, D.4
-
14
-
-
84863389330
-
SHiP: Signature-based hit predictor for high performance caching
-
C. J. Wu et al. SHiP: signature-based hit predictor for high performance caching. In Micro, 2011.
-
(2011)
Micro
-
-
Wu, C.J.1
-
15
-
-
84873470137
-
Parboil: A revised benchmark suite for scientific and commercial throughput computing
-
J. A. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. In IMPACT Technical Report, 2012.
-
(2012)
IMPACT Technical Report
-
-
Stratton, J.A.1
-
17
-
-
57349180412
-
A compiler framework for optimization of affine loop nests for GPGPUs
-
M. M. Baskaran et al. A compiler framework for optimization of affine loop nests for GPGPUs. In ICS, 2008.
-
(2008)
ICS
-
-
Baskaran, M.M.1
-
19
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, 2008.
-
(2008)
PPoPP
-
-
Ryoo, S.1
-
20
-
-
84948958301
-
Compiler managed micro-cache bypassing for high performance EPIC processors
-
Y. Wu et al. Compiler managed micro-cache bypassing for high performance EPIC processors. In Micro, 2002.
-
(2002)
Micro
-
-
Wu, Y.1
-
21
-
-
4444328501
-
An integrated hardware/software approach for run-time scratchpad management
-
P. Francesco et al. An integrated hardware/software approach for run-time scratchpad management. In DAC, 2004.
-
(2004)
DAC
-
-
Francesco, P.1
-
22
-
-
70450231944
-
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
-
S. Hong and H. Kim. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In ISCA, 2009.
-
(2009)
ISCA
-
-
Hong, S.1
Kim, H.2
-
23
-
-
84864068497
-
Characterizing and improving the use of demand-fetched caches in GPUs
-
W. Jia, K. A. Shaw, and M. Martonosi. Characterizing and improving the use of demand-fetched caches in GPUs. In ICS, 2012.
-
(2012)
ICS
-
-
Jia, W.1
Shaw, K.A.2
Martonosi, M.3
-
25
-
-
80052655793
-
CuMAPz: A tool to analyze memory access patterns in CUDA
-
Y. Kim and A. Shrivastava. CuMAPz: A tool to analyze memory access patterns in CUDA. In DAC, 2011.
-
(2011)
DAC
-
-
Kim, Y.1
Shrivastava, A.2
-
26
-
-
84877739484
-
Cache capacity aware thread scheduling for irregular memory access on many-core GPGPUs
-
H. Kuo, T. Yen, B. C. Lai, and J. Jou. Cache capacity aware thread scheduling for irregular memory access on many-core GPGPUs. In ASPDAC, 2013.
-
(2013)
ASPDAC
-
-
Kuo, H.1
Yen, T.2
Lai, B.C.3
Jou, J.4
-
27
-
-
84877777934
-
Register and thread structure optimization for GPUs
-
Y. Liang, Z. Cui, K. Rupnow, and D. Chen. Register and thread structure optimization for GPUs. In ASPDAC, 2013.
-
(2013)
ASPDAC
-
-
Liang, Y.1
Cui, Z.2
Rupnow, K.3
Chen, D.4
-
28
-
-
84862069040
-
Real-time implementation and performance optimization of 3D sound localization on GPUs
-
Y. Liang et al. Real-time implementation and performance optimization of 3D sound localization on GPUs. In DATE, 2012.
-
(2012)
DATE
-
-
Liang, Y.1
-
29
-
-
63349099764
-
Static analysis for fast and accurate design space exploration of caches
-
Y. Liang and T. Mitra. Static analysis for fast and accurate design space exploration of caches. In CODES+ISSS, 2008.
-
(2008)
CODES+ISSS
-
-
Liang, Y.1
Mitra, T.2
-
30
-
-
66749155879
-
Cache Bursts: A new approach for eliminating dead blocks and increasing cache efficiency
-
H. Liu, M. Ferdman, J. Huh, and D. Burger. Cache Bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Micro, 2008.
-
(2008)
Micro
-
-
Liu, H.1
Ferdman, M.2
Huh, J.3
Burger, D.4
-
31
-
-
78149251414
-
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
-
I. J. Sung, J. A. Stratton, and W. W. Hwu. Data layout transformation exploiting memory-level parallelism in structured grid many-core applications. In PACT, 2010.
-
(2010)
PACT
-
-
Sung, I.J.1
Stratton, J.A.2
Hwu, W.W.3
-
32
-
-
47649086892
-
Dynamic allocation for scratch-pad memory using compile-time decisions
-
May
-
S. Udayakumaran, A. Dominguez, and R. Barua. Dynamic allocation for scratch-pad memory using compile-time decisions. ACM Trans. Embed. Comput. Syst., 5(2):472-511, May 2006.
-
(2006)
ACM Trans. Embed. Comput. Syst.
, vol.5
, Issue.2
, pp. 472-511
-
-
Udayakumaran, S.1
Dominguez, A.2
Barua, R.3
-
34
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, 2010.
-
(2010)
PLDI
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
35
-
-
79953126288
-
On-the-fly elimination of dynamic irregularities for GPU computing
-
E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In ASPLOS, 2011.
-
(2011)
ASPLOS
-
-
Zhang, E.Z.1
Jiang, Y.2
Guo, Z.3
Tian, K.4
Shen, X.5
|