SCOPUS Information Search Platform

Annual ACM Symposium on Parallelism in Algorithms and Architectures

2010, Pages 145-156

The Cilkview scalability analyzer

(3) He, Yuxiong a,b; Leiserson, Charles E. a,c; Leiserson, William M. a

a INTEL CORPORATION (United States)

b MICROSOFT RESEARCH (United States)

c MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Burdened parallelism; Cilk++; Cilkview; Dag model; Multicore programming; Multithreading; Parallel programming; Parallelism; Performance; Scalability; Software tools; Span; Speedup; Work

Indexed keywords

CILKVIEW; MULTI CORE; MULTI-THREADING; PERFORMANCE SCALABILITY; SOFTWARE TOOL;

COMPUTER AIDED SOFTWARE ENGINEERING; COMPUTER SOFTWARE; GRAIN SIZE AND SHAPE; JAVA PROGRAMMING LANGUAGE; METADATA; MULTITASKING; PARALLEL PROGRAMMING; SCALABILITY;

BENCHMARKING;

EID: 77954942121 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1810479.1810509 Document Type: Conference Paper

Times cited : (84)

References (51)

1
- 77952597070
- Parallelization made easier with Intel Performance-Tuning Utility
- A. Alexandrov, S. Bratanov, J. Fedorova, D. Levinthal, I. Lopatin, and D. Ryabtsev. Parallelization made easier with Intel Performance-Tuning Utility. Intel Technology Journal, 2007. http://www.intel.com/technology/itj/2007/v11i4// 1-adstract.htm.
- (2007) Intel Technology Journal
- Alexandrov, A.¹ Bratanov, S.² Fedorova, J.³ Levinthal, D.⁴ Lopatin, I.⁵ Ryabtsev, D.⁶

2
- 33646421297
- Version 1.0. Sun Microsystems, Inc.
- E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. The Fortress Language Specification, Version 1.0. Sun Microsystems, Inc., 2008.
- (2008) The Fortress Language Specification
- Allen, E.¹ Chase, D.² Hallett, J.³ Luchangco, V.⁴ Maessen, J.-W.⁵ Ryu, S.⁶ Steele Jr., G.L.⁷ Tobin-Hochstadt, S.⁸

3
- 0025567275
- Quartz: A tool for tuning parallel program performance
- T. E. Anderson and E. D. Lazowska. Quartz: a tool for tuning parallel program performance. SIGMETRICS Perform. Eval. Rev., 18(1):115-125, 1990.
- (1990) SIGMETRICS Perform. Eval. Rev. , vol.18 , Issue.1 , pp. 115-125
- Anderson, T.E.¹ Lazowska, E.D.²

4
- 0038036149
- Space-efficient scheduling of multithreaded computations
- R. D. Blumofe and C. E. Leiserson. Space-efficient scheduling of multithreaded computations. SIAM J. Comput., 27(1):202-229, 1998.
- (1998) SIAM J. Comput. , vol.27 , Issue.1 , pp. 202-229
- Blumofe, R.D.¹ Leiserson, C.E.²

5
- 0000269759
- Scheduling multithreaded computations by work stealing
- R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5):720-748, 1999.
- (1999) JACM , vol.46 , Issue.5 , pp. 720-748
- Blumofe, R.D.¹ Leiserson, C.E.²

6
- 0013244471
- Hood: A user-level threads library for multiprogrammed multiprocessors
- R. D. Blumofe and D. Papadopoulos. Hood: A user-level threads library for multiprogrammed multiprocessors. Technical Report, University of Texas at Austin, 1999.
- (1999) Technical Report, University of Texas at Austin
- Blumofe, R.D.¹ Papadopoulos, D.²

7
- 0016046965
- The parallel evaluation of general arithmetic expressions
- R. P. Brent. The parallel evaluation of general arithmetic expressions. JACM, 21(2):201-206, 1974.
- (1974) JACM , vol.21 , Issue.2 , pp. 201-206
- Brent, R.P.¹

8
- 31944438135
- PhD thesis, MIT EECS
- D. Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. PhD thesis, MIT EECS, 2004.
- (2004) Efficient, Transparent, and Comprehensive Runtime Code Manipulation
- Bruening, D.¹

9
- 0034543798
- An API for runtime code patching
- B. Buck and J. K. Hollingsworth. An API for Runtime Code Patching. Int. J. High Perf. Comput. Appl., 14(4):317-329, 2000.
- (2000) Int. J. High Perf. Comput. Appl. , vol.14 , Issue.4 , pp. 317-329
- Buck, B.¹ Hollingsworth, J.K.²

10
- 77954949865
- Available from
- J. Carr. A parallel bzip2. Available from http://software.intel.com/en-us/articles/a-parallel-bzip2/, 2009.
- (2009) A Parallel bzip2
- Carr, J.¹

11
- 84974695561
- A dynamic tracing mechanism for performance analysis of OpenMP applications
- J. Caubet, J. Gimenez, J. Labarta, L. DeRose, and J. S. Vetter. A dynamic tracing mechanism for performance analysis of OpenMP applications. In WOMPAT, pp. 53-67, 2001.
- (2001) WOMPAT , pp. 53-67
- Caubet, J.¹ Gimenez, J.² Labarta, J.³ DeRose, L.⁴ Vetter, J.S.⁵

12
- 33745200313
- X10: An object-oriented approach to non-uniform cluster computing
- P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA, pp. 519-538, 2005.
- (2005) OOPSLA , pp. 519-538
- Charles, P.¹ Grothoff, C.² Saraswat, V.³ Donawa, C.⁴ Kielstra, A.⁵ Ebcioglu, K.⁶ von Praun, C.⁷ Sarkar, V.⁸

13
- 0004116989
- The MIT Press, third edition
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, third edition, 2009.
- (2009) Introduction to Algorithms
- Cormen, T.H.¹ Leiserson, C.E.² Rivest, R.L.³ Stein, C.⁴

14
- 0001162786
- On the partial difference equations of mathematical physics
- R. Courant, K. Friedrichs, and H. Lewy. On the partial difference equations of mathematical physics. IBM J. R&D, 11(2):215-234, 1967.
- (1967) IBM J. R&D , vol.11 , Issue.2 , pp. 215-234
- Courant, R.¹ Friedrichs, K.² Lewy, H.³

15
- 84981167256
- The Dynamic Probe Class Library - An infrastructure for developing instrumentation for performance tools
- L. DeRose, T. Hoover Jr., and J. K. Hollingsworth. The Dynamic Probe Class Library - an infrastructure for developing instrumentation for performance tools. In IPDPS, p. 10066b, 2001.
- (2001) IPDPS
- DeRose, L.¹ Hoover Jr., T.² Hollingsworth, J.K.³

16
- 0001801746
- Protocol verification as a hardware design aid
- D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as a hardware design aid. In ICCD, pp. 522-525, 1992.
- (1992) ICCD , pp. 522-525
- Dill, D.L.¹ Drexler, A.J.² Hu, A.J.³ Yang, C.H.⁴

17
- 0024627264
- Speedup versus efficiency in parallel systems
- D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. IEEE Trans. Comput., 38(3):408-423, 1989.
- (1989) IEEE Trans. Comput. , vol.38 , Issue.3 , pp. 408-423
- Eager, D.L.¹ Zahorjan, J.² Lazowska, E.D.³

18
- 70449631676
- Reducers and other Cilk++ hyperobjects
- M. Frigo, P. Halpern, C. E. Leiserson, and S. Lewin-Berlin. Reducers and other Cilk++ hyperobjects. In SPAA, pp. 79-90, 2009.
- (2009) SPAA , pp. 79-90
- Frigo, M.¹ Halpern, P.² Leiserson, C.E.³ Lewin-Berlin, S.⁴

19
- 0033350255
- Cache-oblivious algorithms
- New York, New York
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS, pp. 285-297, New York, New York, 1999.
- (1999) FOCS , pp. 285-297
- Frigo, M.¹ Leiserson, C.E.² Prokop, H.³ Ramachandran, S.⁴

20
- 0031622953
- The implementation of the Cilk-5 multithreaded language
- M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pp. 212-223, 1998.
- (1998) PLDI , pp. 212-223
- Frigo, M.¹ Leiserson, C.E.² Randall, K.H.³

21
- 32844463802
- Cache oblivious stencil computations
- M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS, pp. 361-366, 2005.
- (2005) ICS , pp. 361-366
- Frigo, M.¹ Strumpen, V.²

22
- 33749564381
- The cache complexity of multithreaded cache oblivious algorithms
- M. Frigo and V. Strumpen. The cache complexity of multithreaded cache oblivious algorithms. In SPAA, pp. 271-280, 2006.
- (2006) SPAA , pp. 271-280
- Frigo, M.¹ Strumpen, V.²

23
- 0004127488
- W. H. Freeman
- M. R. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman, 1979.
- (1979) Computers and Intractability
- Garey, M.R.¹ Johnson, D.S.²

24
- 0026284572
- Performance debugging shared memory multiprocessor programs with MTOOL
- A. J. Goldberg and J. L. Hennessy. Performance debugging shared memory multiprocessor programs with MTOOL. In SC'91, pp. 481-490, 1991.
- (1991) SC'91 , pp. 481-490
- Goldberg, A.J.¹ Hennessy, J.L.²

25
- 84944813080
- Bounds for certain multiprocessing anomalies
- R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45:1563-1581, 1966.
- (1966) Bell System Technical Journal , vol.45 , pp. 1563-1581
- Graham, R.L.¹

26
- 0020807223
- An execution profiler for modular programs
- S. Graham, P. Kessler, and M. McKusick. An execution profiler for modular programs. Software-Practice and Experience, 13(8):671-685, 1983.
- (1983) Software-Practice and Experience , vol.13 , Issue.8 , pp. 671-685
- Graham, S.¹ Kessler, P.² McKusick, M.³

27
- 77954904570
- Available from
- Y. He. Multicore-enabling the Murphi verification tool. Available from http://software.intel.com/en-us/articles/multicore-enabling-the-murphi-verification-tool/, 2009.
- (2009) Multicore-enabling the Murphi Verification Tool
- He, Y.¹

28
- 77954004251
- Available from, Document No. 322581-001US
- Intel Corp. Intel Cilk++ SDK Programmer's Guide, 2009. Available from http://software.intel.com/en-us/articles/download-intel-cilk-sdk/. Document No. 322581-001US.
- (2009) Intel Cilk++ SDK Programmer's Guide

29
- 77954930852
- Available from, Document No. 320486-003US
- Intel Corp. Intel Parallel Amplifier. Available from http://software.intel.com/sites/products/documentation/studio/amplifier/en-us/2009/ug-docs/index.htm. Document No. 320486-003US, 2009.
- (2009) Intel Parallel Amplifier

30
- 77954936577
- Available from
- Intel Corp. Intel Thread Profiler. Available from http://software.intel.com/en-us/articles/intel-thread-profiler-for-windows-documentation/, 2010.
- (2010) Intel Thread Profiler

31
- 0034593391
- A Java fork/join framework
- D. Lea. A Java fork/join framework. In Java Grande, pp. 36-43, 2000.
- (2000) Java Grande , pp. 36-43
- Lea, D.¹

32
- 72249096886
- The design of a task parallel library
- D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In OOPSLA, pp. 227-242, 2009.
- (2009) OOPSLA , pp. 227-242
- Leijen, D.¹ Schulte, W.² Burckhardt, S.³

33
- 77951240770
- The Cilk++ concurrency platform
- C. E. Leiserson. The Cilk++ concurrency platform. J. Supercomput., 51(3):244-257, 2010.
- (2010) J. Supercomput. , vol.51 , Issue.3 , pp. 244-257
- Leiserson, C.E.¹

34
- 77954929696
- A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers)
- C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In SPAA, 2010.
- (2010) SPAA
- Leiserson, C.E.¹ Schardl, T.B.²

35
- 31944440969
- Pin: Building customized program analysis tools with dynamic instrumentation
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI, pp. 190-200, 2005.
- (2005) PLDI , pp. 190-200
- Luk, C.-K.¹ Cohn, R.² Muth, R.³ Patil, H.⁴ Klauser, A.⁵ Lowney, G.⁶ Wallace, S.⁷ Reddi, V.J.⁸ Hazelwood, K.⁹

36
- 77950985865
- Balanced dense polynomial multiplication on multi-cores
- M. M. Maza and Y. Xie. Balanced dense polynomial multiplication on multi-cores. In PDCAT, pp. 1-9, 2009.
- (2009) PDCAT , pp. 1-9
- Maza, M.M.¹ Xie, Y.²

37
- 77952351456
- FFT-based dense polynomial arithmetic on multi-cores
- M. M. Maza and Y. Xie. FFT-based dense polynomial arithmetic on multi-cores. In HPCS, pp. 378-399, 2009.
- (2009) HPCS , pp. 378-399
- Maza, M.M.¹ Xie, Y.²

38
- 0029408429
- The Paradyn parallel performance measurement tool
- B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The Paradyn parallel performance measurement tool. IEEE Computer, 28(11):37-46, 1995.
- (1995) IEEE Computer , vol.28 , Issue.11 , pp. 37-46
- Miller, B.P.¹ Callaghan, M.D.² Cargille, J.M.³ Hollingsworth, J.K.⁴ Irvin, R.B.⁵ Karavanic, K.L.⁶ Kunchithapadam, K.⁷ Newhall, T.⁸

39
- 33646427877
- A performance monitoring interface for OpenMP
- B. Mohr, A. D. Malony, F. Schlimbach, G. Haab, J. Hoeflinger, and S. Shah. A performance monitoring interface for OpenMP. In IWOMP, 2002.
- (2002) IWOMP
- Mohr, B.¹ Malony, A.D.² Schlimbach, F.³ Haab, G.⁴ Hoeflinger, J.⁵ Shah, S.⁶

40
- 0036679605
- Design and prototype of a performance tool interface for OpenMP
- B. Mohr, A. D. Malony, S. Shende, and F. Wolf. Design and prototype of a performance tool interface for OpenMP. J. Supercomput., 23(1):105-128, 2002.
- (2002) J. Supercomput. , vol.23 , Issue.1 , pp. 105-128
- Mohr, B.¹ Malony, A.D.² Shende, S.³ Wolf, F.⁴

41
- 33646152753
- A scalable approach to MPI application performance analysis
- S. Moore, F. Wolf, J. Dongarra, S. Shende, A. Malony, and B. Mohr. A scalable approach to MPI application performance analysis. In EUROPVMMPI, pp. 309-316, 2005.
- (2005) EUROPVMMPI , pp. 309-316
- Moore, S.¹ Wolf, F.² Dongarra, J.³ Shende, S.⁴ Malony, A.⁵ Mohr, B.⁶

42
- 33745612838
- version 3.0
- OpenMP Architecture Review Board. OpenMP application program interface, version 3.0. http://www.openmp.org/mp-documents/spec30.pdf, 2008.
- (2008) OpenMP Application Program Interface

43
- 85040770718
- Scalable performance analysis: The Pablo performance analysis environment
- D. A. Reed, R. A. Aydt, R. J. Noe, P. C. Roth, K. A. Shields, B. W. Schwartz, and L. F. Tavera. Scalable performance analysis: The Pablo performance analysis environment. In Scalable Parallel Lib. Conf., pp. 104-113, 1993.
- (1993) Scalable Parallel Lib. Conf. , pp. 104-113
- Reed, D.A.¹ Aydt, R.A.² Noe, R.J.³ Roth, P.C.⁴ Shields, K.A.⁵ Schwartz, B.W.⁶ Tavera, L.F.⁷

44
- 67650370031
- Intel Press
- J. Reinders. VTune Performance Analyzer Essentials. Intel Press, 2005.
- (2005) VTune Performance Analyzer Essentials
- Reinders, J.¹

45
- 43149087461
- O'Reilly
- J. Reinders. Intel Threading Building Blocks. O'Reilly, 2007.
- (2007) Intel Threading Building Blocks
- Reinders, J.¹

46
- 77954935024
- Available from
- J. Seward. bzip2 and libbzip2, version 1.0.5: A program and library for data compression. Available from http://www.bzip2.org.
- Bzip2 and Libbzip2, Version 1.0.5: A Program and Library for Data Compression
- Seward, J.¹

47
- 67650034867
- Effective performance measurement and analysis of multithreaded applications
- N. R. Tallent and J. M. Mellor-Crummey. Effective performance measurement and analysis of multithreaded applications. In PPoPP, pp. 229-240, 2009.
- (2009) PPoPP , pp. 229-240
- Tallent, N.R.¹ Mellor-Crummey, J.M.²

48
- 67650837951
- Binary analysis for measurement and attribution of program performance
- N. R. Tallent, J. M. Mellor-Crummey, and M. W. Fagan. Binary analysis for measurement and attribution of program performance. In PLDI, pp. 441-452, 2009.
- (2009) PLDI , pp. 441-452
- Tallent, N.R.¹ Mellor-Crummey, J.M.² Fagan, M.W.³

49
- 0036036949
- Dynamic statistical profiling of communication activity in distributed applications
- J. Vetter. Dynamic statistical profiling of communication activity in distributed applications. In SIGMETRICS, pp. 240-250, 2002.
- (2002) SIGMETRICS , pp. 240-250
- Vetter, J.¹

50
- 0034819519
- Statistical scalability analysis of communication operations in distributed applications
- J. S. Vetter and M. O. McCracken. Statistical scalability analysis of communication operations in distributed applications. SIGPLAN Not., 36(7):123-132, 2001.
- (2001) SIGPLAN Not. , vol.36 , Issue.7 , pp. 123-132
- Vetter, J.S.¹ McCracken, M.O.²

51
- 33750427372
- From trace generation to visualization: A performance framework for distributed parallel systems
- C. E. Wu, A. Bolmarcich, M. Snir, D. Wootton, F. Parpia, A. Chan, and E. Lusk. From trace generation to visualization: A performance framework for distributed parallel systems. In SC'00, p. 50, 2000.
- (2000) SC'00 , p. 50
- Wu, C.E.¹ Bolmarcich, A.² Snir, M.³ Wootton, D.⁴ Parpia, F.⁵ Chan, A.⁶ Lusk, E.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.