[1] "Open|SpeedShop." [Online]. Available: http://www.openspeedshop.org/wp/
[2] ATI CTM Guide, 1st ed., Advanced Micro Devices, Inc., 2006.
[5] S. Al-Kiswany, A. Gharaibeh, E. Santos-Neto, G. Yuan, and M. Ripeanu, "StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems," in Proc. 17th Int'l Symp. on High Performance Distributed Computing, 2008, pp. 165-174.
[7] A. Alexandrov, S. Bratanov, J. Fedorova, D. Levinthal, I. Lopatin, and D. Ryabtsev, "Parallelization Made Easier with Intel Performance-Tuning Utility," Intel Technology Journal, vol. 11, no. 4, 2007.
[8] Apple Inc., "Optimizing with Shark." [Online]. Available: http://developer.apple.com/tools/shark_optimize.html
[9] A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009, pp. 163-174.
[10] M. Biberstein, U. Shvadron, J. Turek, B. Mendelson, and M. Chang, "Trace-based Performance Analysis on Cell BE," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2008), April 2008, pp. 213-222.
[14] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," in Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture, 2007.
[15] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware," ACM Trans. Archit. Code Optim., vol. 6, no. 2, pp. 1-37, 2009.
[17] P. Harish and P. J. Narayanan, "Accelerating Large Graph Algorithms on the GPU Using CUDA," in HiPC, 2007, pp. 197-208.
[19] S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th Int'l Symp. on Computer Architecture, vol. 37, no. 3, pp. 152-163, 2009.
[21] OpenCL 1.0 Specification, 1st ed., Khronos Group, 2009.
[23] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008.
[25] Maxime, "Ray tracing," http://www.nvidia.com/cuda.
[26] J. Meng, D. Tarjan, and K. Skadron, "Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision," Department of Computer Science, University of Virginia, Tech. Rep. CS-2009-2102, 2009.
[27] J. Merino, L. Alvarez, M. Gil, and N. Navarro, "Cetra: A Trace and Analysis Framework for the Evaluation of Cell BE systems," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009, pp. 43-52.
[28] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," ACM Queue, vol. 6, no. 2, pp. 40-53, Mar.-Apr. 2008.
[29] NVIDIA CUDA Visual Profiler, 1st ed., NVIDIA Corp., 2008. [Online]. Available: http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/cudaprof_2.3_readme.txt
[30] NVIDIA Corporation, "NVIDIA CUDA SDK code samples." [Online]. Available: http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
[33] Rice University, "HPCToolkit." [Online]. Available: http://hpctoolkit.org/
[34] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, "Memory Access Scheduling," in Proc. 27th Int'l Symp. on Computer Architecture, 2000, pp. 128-138.
[35] S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S.-Z. Ueng, J. Stratton, and W. W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," in Proc. 6th Int'l Symp. on Code Generation and Optimization (CGO), April 2008, pp. 195-204.
[36] M. Schatz, C. Trapnell, A. Delcher, and A. Varshney, "High-Throughput Sequence Alignment Using Graphics Processing Units," BMC Bioinformatics, vol. 8, no. 1, p. 474, 2007. [Online]. Available: http://www.biomedcentral.com/1471-2105/8/474
[37] S. S. Shende and A. D. Malony, "The TAU Parallel Performance System," Int. J. High Perform. Comput. Appl., vol. 20, no. 2, pp. 287-311, 2006.
[38] Sun Microsystems, "Sun Studio Performance Analyzer." [Online]. Available: http://developers.sun.com/sunstudio/
[39] N. R. Tallent, J. M. Mellor-Crummey, L. Adhianto, M. W. Fagan, and M. Krentel, "Diagnosing Performance Bottlenecks in Emerging Petascale Applications," in ACM/IEEE Conference on Supercomputing (SC'09). ACM, 2009, pp. 1-11.
[40] N. R. Tallent, J. M. Mellor-Crummey, and M. W. Fagan, "Binary Analysis for Measurement and Attribution of Program Performance," in Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI'09), 2009, pp. 441-452.
[42] C. Weaver, K. C. Barr, E. Marsman, D. Ernst, and T. Austin, "Performance Analysis Using Pipeline Visualization," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2001), 2001, pp. 18-21.