-
1
-
-
77952597070
-
Parallelization made easier with Intel Performance-Tuning Utility
-
A. Alexandrov, S. Bratanov, J. Fedorova, D. Levinthal, I. Lopatin, and D. Ryabtsev. Parallelization made easier with Intel Performance-Tuning Utility. Intel Technology Journal, 2007. http://www.intel.com/technology/itj/2007/v11i4// 1-adstract.htm.
-
(2007)
Intel Technology Journal
-
-
Alexandrov, A.1
Bratanov, S.2
Fedorova, J.3
Levinthal, D.4
Lopatin, I.5
Ryabtsev, D.6
-
2
-
-
33646421297
-
-
Version 1.0. Sun Microsystems, Inc.
-
E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. The Fortress Language Specification, Version 1.0. Sun Microsystems, Inc., 2008.
-
(2008)
The Fortress Language Specification
-
-
Allen, E.1
Chase, D.2
Hallett, J.3
Luchangco, V.4
Maessen, J.-W.5
Ryu, S.6
Steele Jr., G.L.7
Tobin-Hochstadt, S.8
-
3
-
-
0025567275
-
Quartz: A tool for tuning parallel program performance
-
T. E. Anderson and E. D. Lazowska. Quartz: a tool for tuning parallel program performance. SIGMETRICS Perform. Eval. Rev., 18(1):115-125, 1990.
-
(1990)
SIGMETRICS Perform. Eval. Rev.
, vol.18
, Issue.1
, pp. 115-125
-
-
Anderson, T.E.1
Lazowska, E.D.2
-
4
-
-
0038036149
-
Space-efficient scheduling of multithreaded computations
-
R. D. Blumofe and C. E. Leiserson. Space-efficient scheduling of multithreaded computations. SIAM J. Comput., 27(1):202-229, 1998.
-
(1998)
SIAM J. Comput.
, vol.27
, Issue.1
, pp. 202-229
-
-
Blumofe, R.D.1
Leiserson, C.E.2
-
5
-
-
0000269759
-
Scheduling multithreaded computations by work stealing
-
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5):720-748, 1999.
-
(1999)
JACM
, vol.46
, Issue.5
, pp. 720-748
-
-
Blumofe, R.D.1
Leiserson, C.E.2
-
7
-
-
0016046965
-
The parallel evaluation of general arithmetic expressions
-
R. P. Brent. The parallel evaluation of general arithmetic expressions. JACM, 21(2):201-206, 1974.
-
(1974)
JACM
, vol.21
, Issue.2
, pp. 201-206
-
-
Brent, R.P.1
-
10
-
-
77954949865
-
-
Available from
-
J. Carr. A parallel bzip2. Available from http://software.intel.com/en-us/articles/a-parallel-bzip2/, 2009.
-
(2009)
A Parallel bzip2
-
-
Carr, J.1
-
11
-
-
84974695561
-
A dynamic tracing mechanism for performance analysis of OpenMP applications
-
J. Caubet, J. Gimenez, J. Labarta, L. D. Rose, and J. S. Vetter. A dynamic tracing mechanism for performance analysis of OpenMP applications. In WOMPAT, pp. 53-67, 2001.
-
(2001)
WOMPAT
, pp. 53-67
-
-
Caubet, J.1
Gimenez, J.2
Labarta, J.3
Rose, L.D.4
Vetter, J.S.5
-
12
-
-
33745200313
-
X10: An object-oriented approach to non-uniform cluster computing
-
P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: An object-oriented approach to non-uniform cluster computing. In OOPSLA, pp. 519-538, 2005.
-
(2005)
OOPSLA
, pp. 519-538
-
-
Charles, P.1
Grothoff, C.2
Saraswat, V.3
Donawa, C.4
Kielstra, A.5
Ebcioglu, K.6
von Praun, C.7
Sarkar, V.8
-
13
-
-
0004116989
-
-
The MIT Press, third edition
-
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, third edition, 2009.
-
(2009)
Introduction to Algorithms
-
-
Cormen, T.H.1
Leiserson, C.E.2
Rivest, R.L.3
Stein, C.4
-
14
-
-
0001162786
-
On the partial difference equations of mathematical physics
-
R. Courant, K. Friedrichs, and H. Lewy. On the partial difference equations of mathematical physics. IBM J. R&D, 11(2):215-234, 1967.
-
(1967)
IBM J. R&D
, vol.11
, Issue.2
, pp. 215-234
-
-
Courant, R.1
Friedrichs, K.2
Lewy, H.3
-
15
-
-
84981167256
-
The Dynamic Probe Class Library - An infrastructure for developing instrumentation for performance tools
-
L. DeRose, T. Hoover Jr., and J. K. Hollingsworth. The Dynamic Probe Class Library - an infrastructure for developing instrumentation for performance tools. In IPDPS, p. 10066b, 2001.
-
(2001)
IPDPS
-
-
DeRose, L.1
Hoover Jr., T.2
Hollingsworth, J.K.3
-
16
-
-
0001801746
-
Protocol verification as a hardware design aid
-
D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as a hardware design aid. In ICCD, pp. 522-525, 1992.
-
(1992)
ICCD
, pp. 522-525
-
-
Dill, D.L.1
Drexler, A.J.2
Hu, A.J.3
Yang, C.H.4
-
17
-
-
0024627264
-
Speedup versus efficiency in parallel systems
-
D. L. Eager, J. Zahorjan, and E. D. Lazowska. Speedup versus efficiency in parallel systems. IEEE Trans. Comput., 38(3):408-423, 1989.
-
(1989)
IEEE Trans. Comput.
, vol.38
, Issue.3
, pp. 408-423
-
-
Eager, D.L.1
Zahorjan, J.2
Lazowska, E.D.3
-
18
-
-
70449631676
-
Reducers and other Cilk++ hyperobjects
-
M. Frigo, P. Halpern, C. E. Leiserson, and S. Lewin-Berlin. Reducers and other Cilk++ hyperobjects. In SPAA, pp. 79-90, 2009.
-
(2009)
SPAA
, pp. 79-90
-
-
Frigo, M.1
Halpern, P.2
Leiserson, C.E.3
Lewin-Berlin, S.4
-
19
-
-
0033350255
-
Cache-oblivious algorithms
-
New York, New York
-
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS, pp. 285-297, New York, New York, 1999.
-
(1999)
FOCS
, pp. 285-297
-
-
Frigo, M.1
Leiserson, C.E.2
Prokop, H.3
Ramachandran, S.4
-
20
-
-
0031622953
-
The implementation of the Cilk-5 multithreaded language
-
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pp. 212-223, 1998.
-
(1998)
PLDI
, pp. 212-223
-
-
Frigo, M.1
Leiserson, C.E.2
Randall, K.H.3
-
21
-
-
32844463802
-
Cache oblivious stencil computations
-
M. Frigo and V. Strumpen. Cache oblivious stencil computations. In ICS, pp. 361-366, 2005.
-
(2005)
ICS
, pp. 361-366
-
-
Frigo, M.1
Strumpen, V.2
-
22
-
-
33749564381
-
The cache complexity of multithreaded cache oblivious algorithms
-
M. Frigo and V. Strumpen. The cache complexity of multithreaded cache oblivious algorithms. In SPAA, pp. 271-280, 2006.
-
(2006)
SPAA
, pp. 271-280
-
-
Frigo, M.1
Strumpen, V.2
-
24
-
-
0026284572
-
Performance debugging shared memory multiprocessor programs with MTOOL
-
A. J. Goldberg and J. L. Hennessy. Performance debugging shared memory multiprocessor programs with MTOOL. In SC'91, pp. 481-490, 1991.
-
(1991)
SC'91
, pp. 481-490
-
-
Goldberg, A.J.1
Hennessy, J.L.2
-
25
-
-
84944813080
-
Bounds for certain multiprocessing anomalies
-
R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45:1563-1581, 1966.
-
(1966)
Bell System Technical Journal
, vol.45
, pp. 1563-1581
-
-
Graham, R.L.1
-
26
-
-
0020807223
-
An execution profiler for modular programs
-
S. Graham, P. Kessler, and M. McKusick. An execution profiler for modular programs. Software-Practice and Experience, 13(8):671-685, 1983.
-
(1983)
Software-Practice and Experience
, vol.13
, Issue.8
, pp. 671-685
-
-
Graham, S.1
Kessler, P.2
McKusick, M.3
-
28
-
-
77954004251
-
-
Available from, Document No. 322581-001US
-
Intel Corp. Intel Cilk++ SDK Programmer's Guide, 2009. Available from http://software.intel.com/en-us/articles/download-intel-cilk-sdk/. Document No. 322581-001US.
-
(2009)
Intel Cilk++ SDK Programmer's Guide
-
-
-
29
-
-
77954930852
-
-
Available from, Document No. 320486-003US
-
Intel Corp. Intel Parallel Amplifier. Available from http://software.intel.com/sites/products/documentation/studio/amplifier/en-us/2009/ug-docs/index.htm. Document No. 320486-003US, 2009.
-
(2009)
Intel Parallel Amplifier
-
-
-
30
-
-
77954936577
-
-
Available from
-
Intel Corp. Intel Thread Profiler. Available from http://software.intel.com/en-us/articles/intel-thread-profiler-for-windows-documentation/, 2010.
-
(2010)
Intel Thread Profiler
-
-
-
31
-
-
0034593391
-
A Java fork/join framework
-
D. Lea. A Java fork/join framework. In Java Grande, pp. 36-43, 2000.
-
(2000)
Java Grande
, pp. 36-43
-
-
Lea, D.1
-
32
-
-
72249096886
-
The design of a task parallel library
-
D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In OOPSLA, pp. 227-242, 2009.
-
(2009)
OOPSLA
, pp. 227-242
-
-
Leijen, D.1
Schulte, W.2
Burckhardt, S.3
-
33
-
-
77951240770
-
The Cilk++ concurrency platform
-
C. E. Leiserson. The Cilk++ concurrency platform. J. Supercomput., 51(3):244-257, 2010.
-
(2010)
J. Supercomput.
, vol.51
, Issue.3
, pp. 244-257
-
-
Leiserson, C.E.1
-
34
-
-
77954929696
-
A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers)
-
C. E. Leiserson and T. B. Schardl. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In SPAA, 2010.
-
(2010)
SPAA
-
-
Leiserson, C.E.1
Schardl, T.B.2
-
35
-
-
31944440969
-
Pin: Building customized program analysis tools with dynamic instrumentation
-
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI, pp. 190-200, 2005.
-
(2005)
PLDI
, pp. 190-200
-
-
Luk, C.-K.1
Cohn, R.2
Muth, R.3
Patil, H.4
Klauser, A.5
Lowney, G.6
Wallace, S.7
Reddi, V.J.8
Hazelwood, K.9
-
36
-
-
77950985865
-
Balanced dense polynomial multiplication on multi-cores
-
M. M. Maza and Y. Xie. Balanced dense polynomial multiplication on multi-cores. In PDCAT, pp. 1-9, 2009.
-
(2009)
PDCAT
, pp. 1-9
-
-
Maza, M.M.1
Xie, Y.2
-
37
-
-
77952351456
-
FFT-based dense polynomial arithmetic on multi-cores
-
M. M. Maza and Y. Xie. FFT-based dense polynomial arithmetic on multi-cores. In HPCS, pp. 378-399, 2009.
-
(2009)
HPCS
, pp. 378-399
-
-
Maza, M.M.1
Xie, Y.2
-
38
-
-
0029408429
-
The Paradyn parallel performance measurement tool
-
B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The Paradyn parallel performance measurement tool. IEEE Computer, 28(11):37-46, 1995.
-
(1995)
IEEE Computer
, vol.28
, Issue.11
, pp. 37-46
-
-
Miller, B.P.1
Callaghan, M.D.2
Cargille, J.M.3
Hollingsworth, J.K.4
Irvin, R.B.5
Karavanic, K.L.6
Kunchithapadam, K.7
Newhall, T.8
-
39
-
-
33646427877
-
A performance monitoring interface for OpenMP
-
B. Mohr, A. D. Malony, F. Schlimbach, G. Haab, J. Hoeflinger, and S. Shah. A performance monitoring interface for OpenMP. In IWOMP, 2002.
-
(2002)
IWOMP
-
-
Mohr, B.1
Malony, A.D.2
Schlimbach, F.3
Haab, G.4
Hoeflinger, J.5
Shah, S.6
-
40
-
-
0036679605
-
Design and prototype of a performance tool interface for OpenMP
-
B. Mohr, A. D. Malony, S. Shende, and F. Wolf. Design and prototype of a performance tool interface for OpenMP. J. Supercomput., 23(1):105-128, 2002.
-
(2002)
J. Supercomput.
, vol.23
, Issue.1
, pp. 105-128
-
-
Mohr, B.1
Malony, A.D.2
Shende, S.3
Wolf, F.4
-
41
-
-
33646152753
-
A scalable approach to MPI application performance analysis
-
S. Moore, F. Wolf, J. Dongarra, S. Shende, A. Malony, and B. Mohr. A scalable approach to MPI application performance analysis. In EUROPVMMPI, pp. 309-316, 2005.
-
(2005)
EUROPVMMPI
, pp. 309-316
-
-
Moore, S.1
Wolf, F.2
Dongarra, J.3
Shende, S.4
Malony, A.5
Mohr, B.6
-
42
-
-
33745612838
-
-
version 3.0
-
OpenMP Architecture Review Board. OpenMP application program interface, version 3.0. http://www.openmp.org/mp-documents/spec30.pdf, 2008.
-
(2008)
OpenMP Application Program Interface
-
-
-
43
-
-
85040770718
-
Scalable performance analysis: The Pablo performance analysis environment
-
D. A. Reed, R. A. Aydt, R. J. Noe, P. C. Roth, K. A. Shields, B. W. Schwartz, and L. F. Tavera. Scalable performance analysis: The Pablo performance analysis environment. In Scalable Parallel Lib. Conf., pp. 104-113, 1993.
-
(1993)
Scalable Parallel Lib. Conf.
, pp. 104-113
-
-
Reed, D.A.1
Aydt, R.A.2
Noe, R.J.3
Roth, P.C.4
Shields, K.A.5
Schwartz, B.W.6
Tavera, L.F.7
-
47
-
-
67650034867
-
Effective performance measurement and analysis of multithreaded applications
-
N. R. Tallent and J. M. Mellor-Crummey. Effective performance measurement and analysis of multithreaded applications. In PPoPP, pp. 229-240, 2009.
-
(2009)
PPoPP
, pp. 229-240
-
-
Tallent, N.R.1
Mellor-Crummey, J.M.2
-
48
-
-
67650837951
-
Binary analysis for measurement and attribution of program performance
-
N. R. Tallent, J. M. Mellor-Crummey, and M. W. Fagan. Binary analysis for measurement and attribution of program performance. In PLDI, pp. 441-452, 2009.
-
(2009)
PLDI
, pp. 441-452
-
-
Tallent, N.R.1
Mellor-Crummey, J.M.2
Fagan, M.W.3
-
49
-
-
0036036949
-
Dynamic statistical profiling of communication activity in distributed applications
-
J. Vetter. Dynamic statistical profiling of communication activity in distributed applications. In SIGMETRICS, pp. 240-250, 2002.
-
(2002)
SIGMETRICS
, pp. 240-250
-
-
Vetter, J.1
-
50
-
-
0034819519
-
Statistical scalability analysis of communication operations in distributed applications
-
J. S. Vetter and M. O. McCracken. Statistical scalability analysis of communication operations in distributed applications. SIGPLAN Not., 36(7):123-132, 2001.
-
(2001)
SIGPLAN Not.
, vol.36
, Issue.7
, pp. 123-132
-
-
Vetter, J.S.1
McCracken, M.O.2
-
51
-
-
33750427372
-
From trace generation to visualization: A performance framework for distributed parallel systems
-
C. E. Wu, A. Bolmarcich, M. Snir, D. Wootton, F. Parpia, A. Chan, and E. Lusk. From trace generation to visualization: A performance framework for distributed parallel systems. In SC'00, p. 50, 2000.
-
(2000)
SC'00
, p. 50
-
-
Wu, C.E.1
Bolmarcich, A.2
Snir, M.3
Wootton, D.4
Parpia, F.5
Chan, A.6
Lusk, E.7