SCOPUS Information Retrieval Platform

ISPASS 2010 - IEEE International Symposium on Performance Analysis of Systems and Software

Volume , Issue , 2010, Pages 164-174

Visualizing complex dynamics in many-core accelerator architectures

Authors (4): Ariel, Aaron (a); Fung, Wilson W L (a); Turner, Andrew E (a); Aamodt, Tor M (a)

a UNIVERSITY OF BRITISH COLUMBIA (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

ACCELERATOR ARCHITECTURES; ANOMALOUS BEHAVIOR; APPLICATION EXECUTION; ARCHITECTURE DESIGNS; COMPLEX DYNAMIC BEHAVIOR; COMPLEX DYNAMICS; COMPUTING POWER; DYNAMIC BEHAVIORS; GRAPHICS PROCESSING UNITS; HARDWARE DESIGNERS; IDENTIFICATION PROCESS; MANY-CORE; MASSIVE PARALLELISM; NOVEL METHODOLOGY; ORDERS OF MAGNITUDE; PERFORMANCE ANALYSIS; PERFORMANCE COUNTERS; PERFORMANCE LOSS; PERFORMANCE POTENTIALS; PERFORMANCE STATISTICS; PERFORMANCE VISUALIZATION; SOFTWARE DEVELOPER; SOURCE CODES;

OPTIMIZATION; VISUALIZATION;

PROGRAM PROCESSORS;

EID: 77952660587 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ISPASS.2010.5452029 Document Type: Conference Paper

Times cited : (27)

References (44)

1
- 77952659348
- "Open|SpeedShop." [Online]. Available: http://www.openspeedshop.org/wp/

2
- 43649092214
- 1st ed., Advanced Micro Devices, Inc.
- ATI CTM Guide, 1st ed., Advanced Micro Devices, Inc., 2006.
- (2006) ATI CTM Guide

3
- 84964634356
- 28 January
- Press Release: AMD Delivers Enthusiast Performance Leadership with the Introduction of the ATI Radeon HD 3870 X2, http://www.amd.com, Advanced Micro Devices, Inc., 28 January 2008.
- (2008) Press Release: AMD Delivers Enthusiast Performance Leadership with the Introduction of the ATI Radeon HD 3870 X2

4
- 77952633238
- S. Al-Kiswany, "Personal Communication," 2009.
- (2009) Personal Communication
- Al-Kiswany, S.¹

5
- 57349130987
- StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems
- S. Al-Kiswany, A. Gharaibeh, E. Santos-Neto, G. Yuan, and M. Ripeanu, "StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems," in Proc. 17th Int'l Symp. on High Performance Distributed Computing, 2008, pp. 165-174.
- Proc. 17th Int'l Symp. on High Performance Distributed Computing, 2008 , pp. 165-174
- Al-Kiswany, S.¹ Gharaibeh, A.² Santos-Neto, E.³ Yuan, G.⁴ Ripeanu, M.⁵

6
- 27544481926
- Variability in architectural simulations of multi-threaded workloads
- A. R. Alameldeen and D. A. Wood, "Variability in architectural simulations of multi-threaded workloads," in Proc. 9th Int'l Symp. on High Performance Computer Architecture, 2003, pp. 7-18.
- Proc. 9th Int'l Symp. on High Performance Computer Architecture, 2003 , pp. 7-18
- Alameldeen, A.R.¹ Wood, D.A.²

7
- 77952597070
- Parallelization Made Easier with Intel Performance-Tuning Utility
- A. Alexandrov, S. Bratanov, J. Fedorova, D. Levinthal, I. Lopatin, and D. Ryabtsev, "Parallelization Made Easier with Intel Performance-Tuning Utility," Intel Technology Journal, vol.11, no.4, 2007.
- (2007) Intel Technology Journal , vol.11 , Issue.4
- Alexandrov, A.¹ Bratanov, S.² Fedorova, J.³ Levinthal, D.⁴ Lopatin, I.⁵ Ryabtsev, D.⁶

8
- 77952591082
- Apple Inc., "Optimizing with Shark." [Online]. Available: http://developer.apple.com/tools/shark optimize.html
- Optimizing with Shark

9
- 70349169075
- Analyzing CUDA Workloads Using a Detailed GPU Simulator
- A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009, pp. 163-174.
- IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009 , pp. 163-174
- Bakhoda, A.¹ Yuan, G.L.² Fung, W.W.L.³ Wong, H.⁴ Aamodt, T.M.⁵

10
- 52249111370
- Trace-based Performance Analysis on Cell BE
- M. Biberstein, U. Shvadron, J. Turek, B. Mendelson, and M. Chang, "Trace-based Performance Analysis on Cell BE," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2008), April 2008, pp. 213-222.
- IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2008), April 2008 , pp. 213-222
- Biberstein, M.¹ Shvadron, U.² Turek, J.³ Mendelson, B.⁴ Chang, M.⁵

11
- 0003465202
- D. Burger and T. M. Austin, "The SimpleScalar Tool Set, Version 2.0," http://www.simplescalar.com, 1997.
- (1997) The SimpleScalar Tool Set, Version 2.0
- Burger, D.¹ Austin, T.M.²

12
- 60649113854
- CEPBA
- CEPBA, "Paraver - Parallel Program Visualization and Analysis tool - REFERENCE MANUAL," 2001.
- (2001) Paraver - Parallel Program Visualization and Analysis Tool - REFERENCE MANUAL

13
- 77952591944
- D. Dale, M. Droettboom, E. Firing, and J. Hunter, "Matplotlib User's Guide," http://matplotlib.sourceforge.net/Matplotlib.pdf.
- Matplotlib User's Guide
- Dale, D.¹ Droettboom, M.² Firing, E.³ Hunter, J.⁴

14
- 47349104432
- Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," in Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture, 2007.
- Proc. 40th IEEE/ACM Int'l Symp. on Microarchitecture, 2007
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

15
- 68549096107
- Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation: Efficient MIMD Control Flow on SIMD Graphics Hardware," ACM Trans. Archit. Code Optim., vol.6, no.2, pp. 1-37, 2009.
- (2009) ACM Trans. Archit. Code Optim. , vol.6 , Issue.2 , pp. 1-37

16
- 77952620490
- M. Giles and S. Xiaoke, "Notes on Using the NVIDIA 8800 GTX Graphics Card." [Online]. Available: http://people.maths.ox.ac.uk/~gilesm/hpc/
- Notes on Using the NVIDIA 8800 GTX Graphics Card
- Giles, M.¹ Xiaoke, S.²

17
- 38349041620
- Accelerating Large Graph Algorithms on the GPU Using CUDA
- P. Harish and P. J. Narayanan, "Accelerating Large Graph Algorithms on the GPU Using CUDA," in HiPC, 2007, pp. 197-208.
- (2007) HiPC , pp. 197-208
- Harish, P.¹ Narayanan, P.J.²

18
- 77952587909
- M. Harris, "UNSW CUDA Tutorial Part 4 - Optimizing CUDA." [Online]. Available: http://www.cse.unsw.edu.au/~pls/cuda-workshop09/slides/04 OptimizingCUDA full.pdf
- UNSW CUDA Tutorial Part 4 - Optimizing CUDA
- Harris, M.¹

19
- 70450231944
- An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness
- S. Hong and H. Kim, "An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness," Proc. 36th Int'l Symp. on Computer Architecture, vol.37, no.3, pp. 152-163, 2009.
- (2009) Proc. 36th Int'l Symp. on Computer Architecture , vol.37 , Issue.3 , pp. 152-163
- Hong, S.¹ Kim, H.²

20
- 19644399541
- Intel Corp., "Intel VTune™ Performance Analyzer." [Online]. Available: http://software.intel.com/en-us/intel-vtune/
- Intel VTune™ Performance Analyzer
- Intel Corp.¹

21
- 77952640012
- 1st ed., Khronos Group
- OpenCL 1.0 Specification, 1st ed., Khronos Group, 2009.
- (2009) OpenCL 1.0 Specification

22
- 0021458622
- Chap - A SIMD Graphics Processor
- A. Levinthal and T. Porter, "Chap - a SIMD Graphics Processor," in Proc. 11th Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '84), 1984, pp. 77-82.
- Proc. 11th Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '84), 1984 , pp. 77-82
- Levinthal, A.¹ Porter, T.²

23
- 44849137198
- NVIDIA Tesla: A Unified Graphics and Computing Architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol.28, no.2, pp. 39-55, 2008.
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

24
- 77954440613
- Marco Chiappetta, Online. Available
- Marco Chiappetta, "ATI Stream Computing: ATI Radeon™ HD 3800/4800 Series GPU Hardware Overview." [Online]. Available: http://developer.amd.com/gpu/ATIStreamSDK/pages/Publications.aspx
- ATI Stream Computing: ATI Radeon™ HD 3800/4800 Series GPU Hardware Overview

25
- 85015171905
- Maxime
- Maxime, "Ray tracing," http://www.nvidia.com/cuda.
- Ray Tracing

26
- 77952603264
- Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision
- J. Meng, D. Tarjan, and K. Skadron, "Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision," Department of Computer Science, University of Virginia, Tech. Rep. CS-2009-2102, 2009.
- (2009) Department of Computer Science, University of Virginia, Tech. Rep. CS-2009-2102
- Meng, J.¹ Tarjan, D.² Skadron, K.³

27
- 70349189978
- Cetra: A Trace and Analysis Framework for the Evaluation of Cell BE systems
- J. Merino, L. Alvarez, M. Gil, and N. Navarro, "Cetra: A Trace and Analysis Framework for the Evaluation of Cell BE systems," in IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009, pp. 43-52.
- IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS 2009), April 2009 , pp. 43-52
- Merino, J.¹ Alvarez, L.² Gil, M.³ Navarro, N.⁴

28
- 78651550268
- Scalable Parallel Programming with CUDA
- Mar.-Apr.
- J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," ACM Queue, vol.6, no.2, pp. 40-53, Mar.-Apr. 2008.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

29
- 70449710954
- 1st ed., NVIDIA Corp., Online. Available
- NVIDIA CUDA Visual Profiler, 1st ed., NVIDIA Corp., 2008. [Online]. Available: http://developer.download.nvidia.com/compute/cuda/2 3/toolkit/docs/cudaprof 2.3 readme.txt
- (2008) NVIDIA CUDA Visual Profiler

30
- 84872053761
- NVIDIA Corporation, Online. Available
- NVIDIA Corporation, "NVIDIA CUDA SDK code samples." [Online]. Available: http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
- NVIDIA CUDA SDK Code Samples

31
- 35948991669
- 1st ed., NVIDIA Corporation
- NVIDIA CUDA Programming Guide, 1st ed., NVIDIA Corporation, 2007.
- (2007) NVIDIA CUDA Programming Guide

32
- 70349189054
- 20 June
- Press Release: NVIDIA Tesla GPU Computing Processor Ushers In the Era of Personal Supercomputing, http://www.nvidia.com, NVIDIA Corporation, 20 June 2007.
- (2007) Press Release: NVIDIA Tesla GPU Computing Processor Ushers in the Era of Personal Supercomputing

33
- 77952626611
- Rice University, Online. Available
- Rice University, "HPCToolkit." [Online]. Available: http://hpctoolkit.org/
- HPCToolkit

34
- 0033691565
- Memory Access Scheduling
- S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, "Memory Access Scheduling," in Proc. 27th Int'l Symp. on Computer Architecture, 2000, pp. 128-138.
- Proc. 27th Int'l Symp. on Computer Architecture, 2000 , pp. 128-138
- Rixner, S.¹ Dally, W.J.² Kapasi, U.J.³ Mattson, P.⁴ Owens, J.D.⁵

35
- 43449094719
- Program Optimization Space Pruning for a Multithreaded GPU
- S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S.-Z. Ueng, J. Stratton, and W. W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," in Proc. 6th Int'l Symp. on Code Generation and Optimization (CGO), April 2008, pp. 195-204.
- Proc. 6th Int'l Symp. on Code Generation and Optimization (CGO), April 2008 , pp. 195-204
- Ryoo, S.¹ Rodrigues, C.² Stone, S.³ Baghsorkhi, S.⁴ Ueng, S.-Z.⁵ Stratton, J.⁶ Hwu, W.W.⁷

36
- 38849131252
- High-Throughput Sequence Alignment Using Graphics Processing Units
- M. Schatz, C. Trapnell, A. Delcher, and A. Varshney, "High-Throughput Sequence Alignment Using Graphics Processing Units," BMC Bioinformatics, vol.8, no.1, p. 474, 2007. [Online]. Available: http://www.biomedcentral.com/1471-2105/8/474
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

37
- 33645998439
- The TAU Parallel Performance System
- S. S. Shende and A. D. Malony, "The TAU Parallel Performance System," Int. J. High Perform. Comput. Appl., vol.20, no.2, pp. 287-311, 2006.
- (2006) Int. J. High Perform. Comput. Appl. , vol.20 , Issue.2 , pp. 287-311
- Shende, S.S.¹ Malony, A.D.²

38
- 51849084074
- Sun Microsystems, Online. Available
- Sun Microsystems, "Sun Studio Performance Analyzer." [Online]. Available: http://developers.sun.com/sunstudio/
- Sun Studio Performance Analyzer

39
- 74049095154
- Diagnosing Performance Bottlenecks in Emerging Petascale Applications
- ACM
- N. R. Tallent, J. M. Mellor-Crummey, L. Adhianto, M. W. Fagan, and M. Krentel, "Diagnosing Performance Bottlenecks in Emerging Petascale Applications," in ACM/IEEE Conference on Supercomputing (SC'09). ACM, 2009, pp. 1-11.
- (2009) ACM/IEEE Conference on Supercomputing (SC'09) , pp. 1-11
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Adhianto, L.³ Fagan, M.W.⁴ Krentel, M.⁵

40
- 70450255123
- Binary Analysis for Measurement and Attribution of Program Performance
- N. R. Tallent, J. M. Mellor-Crummey, and M. W. Fagan, "Binary Analysis for Measurement and Attribution of Program Performance," in Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI'09), 2009, pp. 441-452.
- Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI'09), 2009 , pp. 441-452
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Fagan, M.W.³

41
- 77952649149
- Increasing Memory Miss Tolerance for SIMD Cores
- D. Tarjan, J. Meng, and K. Skadron, "Increasing Memory Miss Tolerance for SIMD Cores," in ACM/IEEE Conference on Supercomputing (SC'09), 2009.
- ACM/IEEE Conference on Supercomputing (SC'09), 2009
- Tarjan, D.¹ Meng, J.² Skadron, K.³

42
- 84937496563
- Performance Analysis Using Pipeline Visualization
- C. Weaver, K. C. Barr, E. Marsman, D. Ernst, and T. Austin, "Performance Analysis Using Pipeline Visualization," in Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE Int'l Symp. on, 2001, pp. 18-21.
- Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE Int'l Symp. On, 2001 , pp. 18-21
- Weaver, C.¹ Barr, K.C.² Marsman, E.³ Ernst, D.⁴ Austin, T.⁵

43
- 77952654737
- G. Yuan, "Personal Communication," 2009.
- (2009) Personal Communication
- Yuan, G.¹

44
- 76749123978
- Complexity Effective Memory Access Scheduling for Many-Core Accelerator Architectures
- G. L. Yuan, A. Bakhoda, and T. M. Aamodt, "Complexity Effective Memory Access Scheduling for Many-Core Accelerator Architectures," in Proc. 42nd IEEE/ACM Int'l Symp. on Microarchitecture, 2009.
- Proc. 42nd IEEE/ACM Int'l Symp. on Microarchitecture, 2009
- Yuan, G.L.¹ Bakhoda, A.² Aamodt, T.M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.