SCOPUS information retrieval platform

Concurrency and Computation: Practice and Experience

Volume 22, Issue 6, 2010, Pages 685-701

HPCTOOLKIT: Tools for performance analysis of optimized parallel programs

Authors (7): Adhianto, L. (a); Banerjee, S. (a); Fagan, M. (a); Krentel, M. (a); Marin, G. (b); Mellor-Crummey, J. (a); Tallent, N.R. (a)

a Rice University (United States)

b Oak Ridge National Laboratory (United States)

Author keywords

Binary analysis; Call path profiling; Execution monitoring; Performance tools; Tracing

Indexed keywords

PROGRAM COMPILERS; SPACE TIME CODES; USER INTERFACES;

BINARY ANALYSIS; CALL PATH; EXECUTION MONITORING; PERFORMANCE TOOLS; TRACING;

APPLICATION PROGRAMS;

EID: 77950611743 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe Document Type: Article

Times cited: 584

References (44)

1
- 0036679608
- HPCView: A tool for top-down analysis of node performance
- Mellor-Crummey JM, Fowler R, Marin G, Tallent N. HPCView: A tool for top-down analysis of node performance. The Journal of Supercomputing 2002; 23(1):81-104.
- (2002) The Journal of Supercomputing, vol.23, Issue.1, pp. 81-104
- Mellor-Crummey, J.M.¹ Fowler, R.² Marin, G.³ Tallent, N.⁴

2
- 67650844203
- Producing wrong data without doing anything obviously wrong!
- ACM: New York, NY, U.S.A.
- Mytkowicz T, Diwan A, Hauswirth M, Sweeney PF. Producing wrong data without doing anything obviously wrong! Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM: New York, NY, U.S.A., 2009; 265-276.
- (2009) Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 265-276
- Mytkowicz, T.¹ Diwan, A.² Hauswirth, M.³ Sweeney, P.F.⁴

3
- 84976736522
- Gprof: A call graph execution profiler
- ACM Press: New York, NY, U.S.A.
- Graham SL, Kessler PB, McKusick MK. Gprof: A call graph execution profiler. Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction. ACM Press: New York, NY, U.S.A., 1982; 120-126.
- (1982) Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction, pp. 120-126
- Graham, S.L.¹ Kessler, P.B.² McKusick, M.K.³

4
- 32844470371
- Low-overhead call path profiling of unmodified, optimized code
- DOI 10.1145/1088149.1088161, ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
- Froyd N, Mellor-Crummey JM, Fowler R. Low-overhead call path profiling of unmodified, optimized code. Proceedings of the 19th Annual International Conference on Supercomputing. ACM Press: New York, NY, U.S.A., 2005; 81-90.
- (2005) Proceedings of the International Conference on Supercomputing, pp. 81-90
- Froyd, N.¹ Mellor-Crummey, J.² Fowler, R.³

5
- 77950623122
- Intel VTune performance analyzer
- Intel Corporation. Intel VTune performance analyzer. Available at: http://software.intel.com/en-us/intel-vtune [2 December 2009].
- (2009)

6
- 77950608876
- Intel Performance Tuning Utility
- Intel Corporation. Intel Performance Tuning Utility. Available at: http://software.intel.com/en-us/articles/intel-performancetuning-utility [2 December 2009].
- (2009)

7
- 31944440969
- Pin: Building customized program analysis tools with dynamic instrumentation
- Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 05
- Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K. Pin: Building customized program analysis tools with dynamic instrumentation. Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press: New York, NY, U.S.A., 2005; 190-200.
- (2005) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 190-200
- Luk, C.-K.¹ Cohn, R.² Muth, R.³ Patil, H.⁴ Klauser, A.⁵ Lowney, G.⁶ Wallace, S.⁷ Reddi, V.J.⁸ Hazelwood, K.⁹

8
- 70450255123
- Binary analysis for measurement and attribution of program performance
- ACM: New York, NY, U.S.A.
- Tallent NR, Mellor-Crummey JM, Fagan MW. Binary analysis for measurement and attribution of program performance. Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM: New York, NY, U.S.A., 2009; 441-452.
- (2009) Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 441-452
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Fagan, M.W.³

9
- 38049043035
- Springer: Berlin
- Shende S, Malony A, Morris A. Optimization of Instrumentation in Parallel Performance Evaluation Tools (Lecture Notes in Computer Science, vol. 4699). Springer: Berlin, 2008; 440-449.
- (2008) Optimization of Instrumentation in Parallel Performance Evaluation Tools (Lecture Notes in Computer Science, vol. 4699), pp. 440-449
- Shende, S.¹ Malony, A.² Morris, A.³

10
- 0030645124
- Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling
- Ammons G, Ball T, Larus JR. Exploiting hardware performance counters with flow and context sensitive profiling. SIGPLAN Conference on Programming Language Design and Implementation. ACM: New York, NY, U.S.A., 1997; 85-96.
- (1997) SIGPLAN Notices (ACM Special Interest Group on Programming Languages), vol.32, Issue.5, pp. 85-96
- Ammons, G.¹ Ball, T.² Larus, J.R.³

11
- 34548010778
- Scalability analysis of SPMD codes using expectations
- DOI 10.1145/1274971.1274976, Proceedings of ICS07: 21st ACM International Conference on Supercomputing
- Coarfa C, Mellor-Crummey JM, Froyd N, Dotsenko Y. Scalability analysis of SPMD codes using expectations. ICS'07: Proceedings of the 21st Annual International Conference on Supercomputing. ACM: New York, NY, U.S.A., 2007; 13-22.
- (2007) Proceedings of the International Conference on Supercomputing, pp. 13-22
- Coarfa, C.¹ Mellor-Crummey, J.² Froyd, N.³ Dotsenko, Y.⁴

12
- 74049095154
- Diagnosing performance bottlenecks in emerging petascale applications
- ACM: New York, NY
- Tallent NR, Mellor-Crummey JM, Adhianto L, Fagan MW, Krentel M. Diagnosing performance bottlenecks in emerging petascale applications. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. SC'09. ACM: New York, NY, 2009; 1-11. DOI: http://doi.acm.org/10.1145/1654059.1654111.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. SC'09, pp. 1-11
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Adhianto, L.³ Fagan, M.W.⁴ Krentel, M.⁵

13
- 70350597876
- Effective performance measurement and analysis of multithreaded applications
- ACM: New York, NY, U.S.A.
- Tallent NR, Mellor-Crummey JM. Effective performance measurement and analysis of multithreaded applications. Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM: New York, NY, U.S.A., 2009; 229-240.
- (2009) Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 229-240
- Tallent, N.R.¹ Mellor-Crummey, J.M.²

14
- 77957574504
- Analyzing lock contention in multithreaded applications
- Bangalore, India
- Tallent NR, Mellor-Crummey JM, Porterfield A. Analyzing lock contention in multithreaded applications. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, 2010.
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Porterfield, A.³

15
- 0031622953
- The Implementation of the Cilk-5 Multithreaded Language
- Frigo M, Leiserson CE, Randall KH. The implementation of the Cilk-5 multithreaded language. Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, Montreal, Que., Canada, 1998; 212-223.
- (1998) SIGPLAN Notices (ACM Special Interest Group on Programming Languages), vol.33, Issue.5, pp. 212-223
- Frigo, M.¹ Leiserson, C.E.² Randall, K.H.³

16
- 0032544628
- Turbulent transport reduction by zonal flows: Massively parallel simulations
- DOI 10.1126/science.281.5384.1835
- Lin Z, Hahm TS, Lee WW, Tang WM, White RB. Turbulent transport reduction by zonal flows: Massively parallel simulations. Science 1998; 281(5384):1835-1837.
- (1998) Science, vol.281, Issue.5384, pp. 1835-1837
- Lin, Z.¹ Hahm, T.S.² Lee, W.W.³ Tang, W.M.⁴ White, R.B.⁵

17
- 0002438680
- VAMPIR: Visualization and analysis of MPI resources
- Nagel WE, Arnold A, Weber M, Hoppe HC, Solchenbach K. VAMPIR: Visualization and analysis of MPI resources. Supercomputer 1996; 12(1):69-80.
- (1996) Supercomputer, vol.12, Issue.1, pp. 69-80
- Nagel, W.E.¹ Arnold, A.² Weber, M.³ Hoppe, H.-Ch.⁴ Solchenbach, K.⁵

18
- 0032139230
- Falcon: On-line monitoring for steering parallel programs
- Gu W, Eisenhauer G, Schwan K, Vetter J. Falcon: On-line monitoring for steering parallel programs. Concurrency: Practice and Experience 1998; 10(9):699-736.
- (1998) Concurrency: Practice and Experience, vol.10, Issue.9, pp. 699-736
- Gu, W.¹ Eisenhauer, G.² Schwan, K.³ Vetter, J.⁴

19
- 0032593334
- Toward scalable performance visualization with Jumpshot
- Zaki O, Lusk E, Gropp W, Swider D. Toward scalable performance visualization with Jumpshot. High Performance Computing Applications 1999; 13(2):277-288.
- (1999) High Performance Computing Applications, vol.13, Issue.2, pp. 277-288
- Zaki, O.¹ Lusk, E.² Gropp, W.³ Swider, D.⁴

20
- 79959603625
- MPICL: A port of the PICL tracing logic to MPI
- Worley PH. MPICL: A port of the PICL tracing logic to MPI. Available at: http://www.csm.ornl.gov/picl [2 December 2009].
- (2009) MPICL: A Port of the PICL Tracing Logic to MPI
- Worley, P.H.¹

21
- 84974695561
- A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications
- OpenMP Shared Memory Parallel Programming: International Workshop on OpenMP Applications and Tools, WOMPAT 2001, West Lafayette, IN, USA, July 30-31, 2001, Proceedings
- Caubet J, Gimenez J, Labarta J, DeRose L, Vetter JS. A dynamic tracing mechanism for performance analysis of OpenMP applications. Proceedings of the International Workshop on OpenMP Applications and Tools. Springer: London, U.K., 2001; 53-67.
- (2001) Lecture Notes in Computer Science, vol.2104, pp. 53-67
- Caubet, J.¹ Gimenez, J.² Labarta, J.³ DeRose, L.⁴ Vetter, J.⁵

22
- 33745149889
- EPILOG binary trace-data format
- Forschungszentrum Jülich, May
- Wolf F, Mohr B. EPILOG binary trace-data format. Technical Report FZJ-ZAM-IB-2004-06, Forschungszentrum Jülich, May 2004.
- (2004) Technical Report FZJ-ZAM-IB-2004-06
- Wolf, F.¹ Mohr, B.²

23
- 0025567275
- Quartz: A tool for tuning parallel program performance
- Anderson TE, Lazowska ED. Quartz: A tool for tuning parallel program performance. SIGMETRICS Performance Evaluation Review 1990; 18(1):115-125.
- (1990) SIGMETRICS Performance Evaluation Review, vol.18, Issue.1, pp. 115-125
- Anderson, T.E.¹ Lazowska, E.D.²

24
- 0005973264
- Origin 2000 and Onyx2 performance tuning and optimization guide
- Silicon Graphics, Inc.
- Cortesi D, Fier J, Wilson J, Boney J. Origin 2000 and Onyx2 performance tuning and optimization guide. Technical Report 007-3430-003, Silicon Graphics, Inc., 2001.
- (2001) Technical Report 007-3430-003
- Cortesi, D.¹ Fier, J.² Wilson, J.³ Boney, J.⁴

25
- 38049186498
- On using incremental profiling for the performance analysis of shared memory parallel applications
- Rennes, France
- Fürlinger K, Gerndt M, Dongarra J. On using incremental profiling for the performance analysis of shared memory parallel applications. Proceedings of the 13th International Euro-Par Conference on Parallel Processing, Rennes, France, 2007; 62-71.
- (2007) Proceedings of the 13th International Euro-Par Conference on Parallel Processing, pp. 62-71
- Fürlinger, K.¹ Gerndt, M.² Dongarra, J.³

26
- 51849091556
- Observing performance dynamics using parallel profile snapshots
- Springer: Berlin, Heidelberg
- Morris A, Spear W, Malony AD, Shende S. Observing performance dynamics using parallel profile snapshots. Proceedings of the 14th International Euro-Par Conference on Parallel Processing. Springer: Berlin, Heidelberg, 2008; 162-171.
- (2008) Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pp. 162-171
- Morris, A.¹ Spear, W.² Malony, A.D.³ Shende, S.⁴

27
- 51849136706
- SpeedShop user's guide
- Silicon Graphics Inc. (SGI), SGI
- Silicon Graphics, Inc. (SGI). SpeedShop User's Guide. Technical Report 007-3311-011, SGI, 2003.
- (2003) Technical Report 007-3311-011

28
- 84875944868
- Krell Institute
- Krell Institute. Open SpeedShop for Linux. Available at: http://www.openspeedshop.org.
- Open SpeedShop for Linux

29
- 33645998439
- The TAU parallel performance system
- Shende SS, Malony AD. The TAU parallel performance system. International Journal of High Performance Computing Applications 2006; 20(2):287-311.
- (2006) International Journal of High Performance Computing Applications, vol.20, Issue.2, pp. 287-311
- Shende, S.S.¹ Malony, A.D.²

30
- 0034819519
- Statistical scalability analysis of communication operations in distributed applications
- Snowbird, UT
- Vetter JS, McCracken MO. Statistical scalability analysis of communication operations in distributed applications. Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Snowbird, UT, 2001.
- (2001) Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Vetter, J.S.¹ McCracken, M.O.²

31
- 85052019260
- From trace generation to visualization: A performance framework for distributed parallel systems
- IEEE Computer Society: Washington, DC, U.S.A.
- Wu CE, Bolmarcich A, Snir M, Wootton D, Parpia F, Chan A, Lusk E, Gropp W. From trace generation to visualization: A performance framework for distributed parallel systems. Proceedings of the ACM/IEEE Conference on Supercomputing. IEEE Computer Society: Washington, DC, U.S.A., 2000.
- (2000) Proceedings of the ACM/IEEE Conference on Supercomputing
- Wu, C.E.¹ Bolmarcich, A.² Snir, M.³ Wootton, D.⁴ Parpia, F.⁵ Chan, A.⁶ Lusk, E.⁷ Gropp, W.⁸

32
- 0036036949
- Dynamic statistical profiling of communication activity in distributed applications
- Vetter J. Dynamic statistical profiling of communication activity in distributed applications. Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM Press: New York, NY, U.S.A., 2002; 240-250.
- (2002) Performance Evaluation Review, vol.30, Issue.1, pp. 240-250
- Vetter, J.¹

33
- 35048825254
- Design and prototype of a performance tool interface for OpenMP
- Santa Fe, NM, October
- Mohr B, Malony AD, Shende S, Wolf F. Design and prototype of a performance tool interface for OpenMP. Proceedings of the Los Alamos Computer Science Institute Second Annual Symposium, Santa Fe, NM, October 2001.
- (2001) Proceedings of the Los Alamos Computer Science Institute Second Annual Symposium
- Mohr, B.¹ Malony, A.D.² Shende, S.³ Wolf, F.⁴

34
- 77950605550
- GASP! A standardized performance analysis tool interface for global address space programming models
- Lawrence Berkeley National Laboratory
- Su H-H, Bonachea D, Leko A, Sherburne H, Billingsley III M, George AD. GASP! A standardized performance analysis tool interface for global address space programming models. Technical Report LBNL-61659, Lawrence Berkeley National Laboratory, 2006.
- (2006) Technical Report LBNL-61659
- Su, H.-H.¹ Bonachea, D.² Leko, A.³ Sherburne, H.⁴ Billingsley III, M.⁵ George, A.D.⁶

35
- 85040770718
- Scalable performance analysis: The Pablo performance analysis environment
- IEEE Computer Society: Silver Spring, MD
- Reed DA, Aydt RA, Noe RJ, Roth PC, Shields KA, Schwartz BW, Tavera LF. Scalable performance analysis: The Pablo performance analysis environment. Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer Society: Silver Spring, MD, 1993; 104-113.
- (1993) Proceedings of the Scalable Parallel Libraries Conference, pp. 104-113
- Reed, D.A.¹ Aydt, R.A.² Noe, R.J.³ Roth, P.C.⁴ Shields, K.A.⁵ Schwartz, B.W.⁶ Tavera, L.F.⁷

36
- 33646427877
- A performance monitoring interface for OpenMP
- Rome, Italy
- Mohr B, Malony AD, Hoppe H-C, Schlimbach F, Haab G, Hoeflinger J, Shah S. A performance monitoring interface for OpenMP. Proceedings of the Fourth European Workshop on OpenMP, Rome, Italy, 2002.
- (2002) Proceedings of the Fourth European Workshop on OpenMP
- Mohr, B.¹ Malony, A.D.² Hoppe, H.-C.³ Schlimbach, F.⁴ Haab, G.⁵ Hoeflinger, J.⁶ Shah, S.⁷

37
- 77952005316
- ompP: A profiling tool for OpenMP
- Eugene, OR, U.S.A.
- Fürlinger K, Gerndt M. ompP: A profiling tool for OpenMP. Proceedings of the First and Second International Workshops on OpenMP (Lecture Notes in Computer Science, vol. 4315), Eugene, OR, U.S.A., 2005; 12-23.
- (2005) Proceedings of the First and Second International Workshops on OpenMP (Lecture Notes in Computer Science, vol. 4315), pp. 12-23
- Fürlinger, K.¹ Gerndt, M.²

38
- 56749160395
- PnMPI tools: A whole lot greater than the sum of their parts
- ACM: New York, NY, U.S.A.
- Schulz M, de Supinski BR. PnMPI tools: A whole lot greater than the sum of their parts. Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. ACM: New York, NY, U.S.A., 2007; 1-10.
- (2007) Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 1-10
- Schulz, M.¹ de Supinski, B.R.²

39
- 0034543798
- An API for runtime code patching
- Buck B, Hollingsworth JK. An API for runtime code patching. The International Journal of High Performance Computing Applications 2000; 14(4):317-329.
- (2000) The International Journal of High Performance Computing Applications, vol.14, Issue.4, pp. 317-329
- Buck, B.¹ Hollingsworth, J.K.²

40
- 84981167256
- The dynamic probe class library - An infrastructure for developing instrumentation for performance tools
- San Francisco, CA, U.S.A., April
- DeRose L, Ted Hoover J, Hollingsworth JK. The dynamic probe class library - An infrastructure for developing instrumentation for performance tools. Proceedings of the International Parallel and Distributed Processing Symposium, San Francisco, CA, U.S.A., April 2001.
- (2001) Proceedings of the International Parallel and Distributed Processing Symposium
- DeRose, L.¹ Ted Hoover, J.² Hollingsworth, J.K.³

41
- 0029408429
- The Paradyn parallel performance measurement tool
- Miller BP, Callaghan MD, Cargille JM, Hollingsworth JK, Irvin RB, Karavanic KL, Kunchithapadam K, Newhall T. The Paradyn parallel performance measurement tool. IEEE Computer 1995; 28(11):37-46.
- (1995) IEEE Computer, vol.28, Issue.11, pp. 37-46
- Miller, B.P.¹ Callaghan, M.D.² Cargille, J.M.³ Hollingsworth, J.K.⁴ Irvin, R.B.⁵ Karavanic, K.L.⁶ Kunchithapadam, K.⁷ Newhall, T.⁸

42
- 77950623438
- PapiEx-Execute arbitrary application and measure hardware performance counters with PAPI
- Mucci PJ. PapiEx-Execute arbitrary application and measure hardware performance counters with PAPI. Available at: http://icl.cs.utk.edu/~mucci/papiex [2 December 2009].
- (2009) PapiEx-execute Arbitrary Application and Measure Hardware Performance Counters with PAPI
- Mucci, P.J.¹

43
- 0033691589
- Performance analysis of distributed applications using automatic classification of communication inefficiencies
- Santa Fe, NM, U.S.A.
- Vetter J. Performance analysis of distributed applications using automatic classification of communication inefficiencies. International Conference on Supercomputing, Santa Fe, NM, U.S.A., 2000; 245-254.
- (2000) International Conference on Supercomputing, pp. 245-254
- Vetter, J.¹

44
- 33646137721
- Efficient Pattern Search in Large Traces Through Successive Refinement
- Euro-Par 2004 Parallel Processing
- Wolf F, Mohr B, Dongarra J, Moore S. Efficient pattern search in large traces through successive refinement. Proceedings of the European Conference on Parallel Computing, Pisa, Italy, August 2004.
- (2004) Lecture Notes in Computer Science, vol.3149, pp. 47-54
- Wolf, F.¹ Mohr, B.² Dongarra, J.³ Moore, S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.