SCOPUS Information Search Platform

Performance Tuning of Scientific Applications

Volume , Issue , 2010, Pages 1-395

Performance tuning of scientific applications

Authors (3): Bailey, David H. (a); Lucas, Robert F. (b); Williams, Samuel (a)

(a) Lawrence Berkeley National Laboratory (United States)

(b) University of Southern California (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPLICATION PROGRAMS; COMPUTER HARDWARE;

ANALYSIS OF PERFORMANCE; AUTOMATIC PERFORMANCE TUNING; IMPROVING PERFORMANCE; PARALLEL COMPUTER SYSTEMS; PERFORMANCE MONITORING; SCIENCE AND ENGINEERING; SCIENTIFIC APPLICATIONS; SCIENTIFIC COMPUTATION;

BENCHMARKING;

EID: 85054469366 PISSN: None EISSN: None Source Type: Book
DOI: 10.1201/b10509 Document Type: Book

Times cited : (14)

References (392)

1
- 85054422128
- Frequently Asked Questions
- ATLAS Frequently Asked Questions. http://math-atlas.sourceforge.net/faq.html

2
- 85054451643
- BLAS: Basic linear algebra subprograms. http://www.netlib.org/blas

3
- 85054431715
- CactusEinstein toolkit home page. http://www.cactuscode.org/Community/NumericalRelativity

4
- 85054462117
- GEO 600.

5
- 85054448244
- Gnu standard: Formatting error messages. http://www.gnu.org/prep/standards/html_node/Errors.html

6
- 85054445397
- Kranc: Automated code generation.

7
- 85054463898
- LIGO: Laser Interferometer Gravitational wave Observatory.

8
- 85054461122
- LISA: Laser Interferometer Space Antenna.

9
- 85054468897
- Mesh refinement with Carpet.

10
- 85054446144
- Netlib repository. http://www.netlib.org

11
- 85054449561
- Queen Bee, the core supercomputer of LONI
- Queen Bee, the core supercomputer of LONI.

12
- 85054439712
- Sun Constellation Linux Cluster: Ranger.

13
- 85054451362
- Top500 Supercomputer Sites. http://www.top500.org

14
- 85054466880
- Optimizing applications on the Cray X1™ system, 2009. http://docs.cray.com/books/S-2315-50/html-S-2315-50/z1055157958smg.html
- (2009)

15
- 85054468206
- ROSE Web Reference, 2010. http://www.rosecompiler.org
- (2010) ROSE Web Reference

16
- 85054459462
- SciDAC Performance Engineering Research Institute, 2010. http://www.peri-scidac.org/perci
- (2010) SciDAC Performance Engineering Research Institute

17
- 0010540283
- An automatic design optimization tool and its application to computational fluid dynamics
- New York, NY, ACM
- D. Abramson, A. Lewis, T. Peachey, and C. Fletcher. An automatic design optimization tool and its application to computational fluid dynamics. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC01), pages 25-25, New York, NY, 2001. ACM.
- (2001) Proceedings of the ACM/IEEE Conference on Supercomputing (SC01) , pp. 25
- Abramson, D.¹ Lewis, A.² Peachey, T.³ Fletcher, C.⁴

18
- 85054452279
- February 11
- E.M. Abreu. Gordon Moore sees another decade for Moore’s Law. February 11 2003. http://www.cnn.com/2003/TECH/biztechj/02/11/moore.law.reut
- (2003) Gordon Moore sees another decade for Moore’s Law
- Abreu, E.M.¹

19
- 2142697354
- A distributed memory unstructured Gauss-Seidel algorithm for multigrid smoothers
- Denver, CO, November
- M.F. Adams. A distributed memory unstructured Gauss-Seidel algorithm for multigrid smoothers. In ACM/IEEE Proceedings of SC2001: High Performance Networking and Computing, Denver, CO, November 2001.
- (2001) ACM/IEEE Proceedings of SC2001: High Performance Networking and Computing
- Adams, M.F.¹

20
- 84934276784
- Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom
- M.F. Adams, H.H. Bayraktar, T.M. Keaveny, and P. Papadopoulos. Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom. In ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing, 2004. 356
- (2004) ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing , pp. 356
- Adams, M.F.¹ Bayraktar, H.H.² Keaveny, T.M.³ Papadopoulos, P.⁴

21
- 0037670448
- Parallel multigrid smoothing: Polynomial versus Gauss-Seidel
- M.F. Adams, M. Brezina, J. J. Hu, and R.S. Tuminaro. Parallel multigrid smoothing: Polynomial versus Gauss-Seidel. Journal of Computational Physics, 188(2): 593-610, 2003.
- (2003) Journal of Computational Physics , vol.188 , Issue.2 , pp. 593-610
- Adams, M.F.¹ Brezina, M.² Hu, J.J.³ Tuminaro, R.S.⁴

22
- 77950611743
- HPCToolkit: Tools for performance analysis of optimized parallel programs
- L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N.R. Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 2010. http://dx.doi.org/10.1002/cpe.1553
- (2010) Concurrency and Computation: Practice and Experience
- Adhianto, L.¹ Banerjee, S.² Fagan, M.³ Krentel, M.⁴ Marin, G.⁵ Mellor-Crummey, J.⁶ Tallent, N.R.⁷

23
- 78649880246
- Effectively presenting call path profiles of application performance
- L. Adhianto, J. Mellor-Crummey, and N.R. Tallent. Effectively presenting call path profiles of application performance. In Proceedings of the 2010 Workshop on Parallel Software Tools and Tool Infrastructures, held in conjunction with the 2010 International Conference on Parallel Processing, 2010.
- (2010) Proceedings of the 2010 Workshop on Parallel Software Tools and Tool Infrastructures, held in conjunction with the 2010 International Conference on Parallel Processing
- Adhianto, L.¹ Mellor-Crummey, J.² Tallent, N.R.³

24
- 12444309581
- November
- N.R. Adiga, et al. An overview of the BlueGene/L supercomputer. November 2002.
- (2002) An overview of the BlueGene/L supercomputer
- Adiga, N.R.¹

25
- 63449134093
- Advanced Micro Devices. AMD CodeAnalyst performance analyzer. http://developer.amd.com/cpu/codeanalyst
- AMD CodeAnalyst performance analyzer

26
- 0038102538
- Gauge conditions for long-term numerical black hole evolutions without excision
- M. Alcubierre, B. Brügmann, P. Diener, M. Koppitz, D. Pollney, E. Seidel, and R. Takahashi. Gauge conditions for long-term numerical black hole evolutions without excision. Physical Review D, 67: 084023, 2003.
- (2003) Physical Review D , vol.67 , pp. 084023
- Alcubierre, M.¹ Brügmann, B.² Diener, P.³ Koppitz, M.⁴ Pollney, D.⁵ Seidel, E.⁶ Takahashi, R.⁷

27
- 17144408376
- Towards a stable numerical evolution of strongly gravitating systems in general relativity: The conformal treatments
- M. Alcubierre, B. Brügmann, T. Dramlitsch, J.A. Font, P. Papadopoulos, E. Seidel, N. Stergioulas, and R. Takahashi. Towards a stable numerical evolution of strongly gravitating systems in general relativity: The conformal treatments. Physical Review D, 62: 044034, 2000.
- (2000) Physical Review D , vol.62 , pp. 044034
- Alcubierre, M.¹ Brügmann, B.² Dramlitsch, T.³ Font, J.A.⁴ Papadopoulos, P.⁵ Seidel, E.⁶ Stergioulas, N.⁷ Takahashi, R.⁸

28
- 0008688174
- A.S. Almgren, J.B. Bell, P. Colella, L.H. Howell, and M. Welcome. A conservative adaptive projection method for the variable density incompressible Navier-Stokes equations. 142: 1-46, 1998.
- (1998) A conservative adaptive projection method for the variable density incompressible Navier-Stokes equations , vol.142 , pp. 1-46
- Almgren, A.S.¹ Bell, J.B.² Colella, P.³ Howell, L.H.⁴ Welcome, M.⁵

29
- 85054451287
- Alpaca: Cactus tools for application-level profiling and correctness analysis. http://www.cct.lsu.edu/~eschnett/Alpaca

30
- 77952572316
- AMD. Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors, 2007.
- (2007) Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors

31
- 0030645124
- Exploiting hardware performance counters with flow and context sensitive profiling
- New York, NY, USA, ACM
- G. Ammons, T. Ball, and J.R. Larus. Exploiting hardware performance counters with flow and context sensitive profiling. In SIGPLAN Conference on Programming Language Design and Implementation, pages 85-96, New York, NY, USA, 1997. ACM.
- (1997) SIGPLAN Conference on Programming Language Design and Implementation , pp. 85-96
- Ammons, G.¹ Ball, T.² Larus, J.R.³

32
- 0031270220
- Continuous profiling: Where have all the cycles gone?
- J.M. Anderson, L.M. Berc, J. Dean, S. Ghemawat, M.R. Henzinger, S-T A. Leung, R.L. Sites, M.T. Vandevoorde, C.A. Waldspurger, and W.E. Weihl. Continuous profiling: Where have all the cycles gone? ACM Transactions on Computer Systems, 15(4): 357-390, 1997.
- (1997) ACM Transactions on Computer Systems , vol.15 , Issue.4 , pp. 357-390
- Anderson, J.M.¹ Berc, L.M.² Dean, J.³ Ghemawat, S.⁴ Henzinger, M.R.⁵ Leung, S.-T.A.⁶ Sites, R.L.⁷ Vandevoorde, M.T.⁸ Waldspurger, C.A.⁹ Weihl, W.E.¹⁰

33
- 85054464091
- home page
- Astrophysics Simulation Collaboratory (ASC) home page.

34
- 70350635626
- An extension of the StarSs programming model for platforms with multiple GPUs
- Springer
- E. Ayguade, R.M. Badia, F.D. Igual, J. Labarta, R. Mayo, and E.S. Quintana-Orti. An extension of the StarSs programming model for platforms with multiple GPUs. In Procs. of the 15th international Euro-Par Conference (Euro-Par 2009), pages 851-862. Springer, 2009.
- (2009) Procs. of the 15th international Euro-Par Conference (Euro-Par 2009) , pp. 851-862
- Ayguade, E.¹ Badia, R.M.² Igual, F.D.³ Labarta, J.⁴ Mayo, R.⁵ Quintana-Orti, E.S.⁶

35
- 85054455825
- June
- R. Azimi, M. Stumm, and R. Wisniewski. Online performance analysis by statistical sampling of microprocessor performance counters. June 2005.
- (2005) Online performance analysis by statistical sampling of microprocessor performance counters
- Azimi, R.¹ Stumm, M.² Wisniewski, R.³

36
- 85054439613
- September
- L. Bachega, S. Chatterjee, K. Dockser, J. Gunnels, M. Gupta, F. Gustavson, C. Lapkowski, G. Liu, M. Mendell, C. Wait, and T.J.C. Ward. A high-performance SIMD floating point unit design for BlueGene/L: Architecture, compilation, and algorithm design. September 2004.
- (2004) A high-performance SIMD floating point unit design for BlueGene/L: Architecture, compilation, and algorithm design
- Bachega, L.¹ Chatterjee, S.² Dockser, K.³ Gunnels, J.⁴ Gupta, M.⁵ Gustavson, F.⁶ Lapkowski, C.⁷ Liu, G.⁸ Mendell, M.⁹ Wait, C.¹⁰ Ward, T.J.C.¹¹

37
- 33646425180
- Programming grid applications with grid superscalar
- R. Badia, J. Labarta, R. Sirvent, J.M. Perez, J.M. Cela, and R. Grima. Programming grid applications with grid superscalar. Journal of Grid Computing, 1(2): 151-170, 2003.
- (2003) Journal of Grid Computing , vol.1 , Issue.2 , pp. 151-170
- Badia, R.¹ Labarta, J.² Sirvent, R.³ Perez, J.M.⁴ Cela, J.M.⁵ Grima, R.⁶

38
- 0002404913
- The NAS parallel benchmarks
- D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga. The NAS parallel benchmarks. International Journal of Supercomputer Applications, 5: 66-73, 1991.
- (1991) International Journal of Supercomputer Applications , vol.5 , pp. 66-73
- Bailey, D.¹ Barszcz, E.² Barton, J.³ Browning, D.⁴ Carter, R.⁵ Dagum, L.⁶ Fatoohi, R.⁷ Fineberg, S.⁸ Frederickson, P.⁹ Lasinski, T.¹⁰ Schreiber, R.¹¹ Simon, H.¹² Venkatakrishnan, V.¹³ Weeratunga, S.¹⁴

39
- 0041638552
- Twelve ways to fool the masses when giving performance results on parallel computers
- August
- D.H. Bailey. Twelve ways to fool the masses when giving performance results on parallel computers. Supercomputing Review, pages 54-55, August 1991.
- (1991) Supercomputing Review , pp. 54-55
- Bailey, D.H.¹

40
- 34147135028
- Misleading performance reporting in the supercomputing field
- D.H. Bailey. Misleading performance reporting in the supercomputing field. Scientific Programming, 1: 141-151, 1992.
- (1992) Scientific Programming , vol.1 , pp. 141-151
- Bailey, D.H.¹

41
- 4243648129
- D.H. Bailey. Little’s law and high performance computing, 1997. http://crd.lbl.gov/~dhbailey/dhbpapers/little.pdf
- (1997) Little’s law and high performance computing
- Bailey, D.H.¹

42
- 34548276622
- D.H. Bailey and A.S. Snavely. Performance modeling: Understanding the present and predicting the future, 2005.
- (2005) Performance modeling: Understanding the present and predicting the future
- Bailey, D.H.¹ Snavely, A.S.²

43
- 0003660984
- PETSc users manual
- Argonne National Laboratory
- S. Balay, K. Buschelman, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, L.C. McInnes, B.F. Smith, and H. Zhang. PETSc users manual. Technical Report ANL-95/11 -Revision 3.0.0, Argonne National Laboratory, 2008.
- (2008) Technical Report ANL-95/11 -Revision 3.0.0
- Balay, S.¹ Buschelman, K.² Eijkhout, V.³ Gropp, W.D.⁴ Kaushik, D.⁵ Knepley, M.G.⁶ McInnes, L.C.⁷ Smith, B.F.⁸ Zhang, H.⁹

44
- 67650069905
- Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors
- Raleigh, North Carolina, February
- M.M. Baskaran, N. Vydyanathan, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors. In 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Raleigh, North Carolina, February 2009.
- (2009) 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Baskaran, M.M.¹ Vydyanathan, N.² Bondhugula, U.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

45
- 33645441190
- Micromechanics of the human vertebral body
- San Francisco
- H.H. Bayraktar, M.F. Adams, P.F. Hoffmann, D.C. Lee, A. Gupta, P. Papadopoulos, and T.M. Keaveny. Micromechanics of the human vertebral body. In Transactions of the Orthopaedic Research Society, volume 29, page 1129, San Francisco, 2004.
- (2004) Transactions of the Orthopaedic Research Society , vol.29 , pp. 1129
- Bayraktar, H.H.¹ Adams, M.F.² Hoffmann, P.F.³ Lee, D.C.⁴ Gupta, A.⁵ Papadopoulos, P.⁶ Keaveny, T.M.⁷

46
- 38449102073
- Timestamp synchronization for event traces of large-scale message-passing applications
- Paris
- D. Becker, R. Rabenseifner, and F. Wolf. Timestamp synchronization for event traces of large-scale message-passing applications. In Proceedings of 14th European PVM and MPI Conference (EuroPVM/MPI), pages 315-325, Paris, 2007.
- (2007) Proceedings of 14th European PVM and MPI Conference (EuroPVM/MPI) , pp. 315-325
- Becker, D.¹ Rabenseifner, R.² Wolf, F.³

47
- 37549015666
- Bell’s law for the birth and death of computer classes
- January
- G. Bell. Bell’s law for the birth and death of computer classes. Communications of the ACM, 51(1): 86-94, January 2008.
- (2008) Communications of the ACM , vol.51 , Issue.1 , pp. 86-94
- Bell, G.¹

48
- 0000843403
- J. Bell, M. Berger, J. Saltzman, and M. Welcome. A three-dimensional adaptive mesh refinement for hyperbolic conservation laws. 15(1): 127-138, 1994.
- (1994) A three-dimensional adaptive mesh refinement for hyperbolic conservation laws , vol.15 , Issue.1 , pp. 127-138
- Bell, J.¹ Berger, M.² Saltzman, J.³ Welcome, M.⁴

49
- 85054429184
- A portable, extensible, and scalable tool for parallel performance profile analysis
- R. Bell, A. Malony, and S. Shende. A portable, extensible, and scalable tool for parallel performance profile analysis. In Proceedings of European Conference on Parallel Computing, 2003.
- (2003) Proceedings of European Conference on Parallel Computing
- Bell, R.¹ Malony, A.² Shende, S.³

50
- 34548265764
- CellSs: A programming model for the Cell BE architecture
- P. Bellens, J.M. Perez, R.M. Badia, and J. Labarta. CellSs: A programming model for the Cell BE architecture. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC06), 2006.
- (2006) Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC06)
- Bellens, P.¹ Perez, J.M.² Badia, R.M.³ Labarta, J.⁴

51
- 11744289966
- Local adaptive mesh refinement for shock hydrodynamics
- May
- M.J. Berger and P. Colella. Local adaptive mesh refinement for shock hydrodynamics. Journal of Computational Physics, 82(1): 64-84, May 1989.
- (1989) Journal of Computational Physics , vol.82 , Issue.1 , pp. 64-84
- Berger, M.J.¹ Colella, P.²

52
- 48749141209
- Adaptive mesh refinement for hyperbolic partial differential equations
- M.J. Berger and J. Oliger. Adaptive mesh refinement for hyperbolic partial differential equations. Journal of Computational Physics, 53: 484-512, 1984.
- (1984) Journal of Computational Physics , vol.53 , pp. 484-512
- Berger, M.J.¹ Oliger, J.²

53
- 0029428752
- Lattice QCD on the IBM scalable POWERParallel systems SP2
- San Diego, California, November
- C. Bernard, C. DeTar, S. Gottlieb, U.M. Heller, J. Hetrick, N. Ishizuka, L. Kärkkäinen, S.R. Lantz, K. Rummukainen, R. Sugar, D. Toussaint, and M. Wingate. Lattice QCD on the IBM scalable POWERParallel systems SP2. In ACM/IEEE Proceedings of SC 1995: High Performance Networking and Computing, San Diego, California, November 1995.
- (1995) ACM/IEEE Proceedings of SC 1995: High Performance Networking and Computing
- Bernard, C.¹ DeTar, C.² Gottlieb, S.³ Heller, U.M.⁴ Hetrick, J.⁵ Ishizuka, N.⁶ Kärkkäinen, L.⁷ Lantz, S.R.⁸ Rummukainen, K.⁹ Sugar, R.¹⁰ Toussaint, D.¹¹ Wingate, M.¹²

54
- 23844515651
- A component architecture for high-performance scientific computing
- ACTS Collection Special Issue
- D.E. Bernholdt, B.A. Allan, R. Armstrong, F. Bertrand, K. Chiu, T.L. Dahlgren, K. Damevski, W.R. Elwasif, T.G.W. Epperly, M. Govindaraju, D.S. Katz, J.A. Kohl, M. Krishnan, G. Kumfert, J.W. Larson, S. Lefantzi, M.J. Lewis, A.D. Malony, L.C. McInnes, J. Nieplocha, B. Norris, S.G. Parker, J. Ray, S. Shende, T.L. Windus, and S. Zhou. A component architecture for high-performance scientific computing. Intl. Journal of High-Performance Computing Applications, ACTS Collection Special Issue, 2005.
- (2005) Intl. Journal of High-Performance Computing Applications
- Bernholdt, D.E.¹ Allan, B.A.² Armstrong, R.³ Bertrand, F.⁴ Chiu, K.⁵ Dahlgren, T.L.⁶ Damevski, K.⁷ Elwasif, W.R.⁸ Epperly, T.G.W.⁹ Govindaraju, M.¹⁰ Katz, D.S.¹¹ Kohl, J.A.¹² Krishnan, M.¹³ Kumfert, G.¹⁴ Larson, J.W.¹⁵ Lefantzi, S.¹⁶ Lewis, M.J.¹⁷ Malony, A.D.¹⁸ McInnes, L.C.¹⁹ Nieplocha, J.²⁰ et al.

55
- 0030661485
- Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology
- Vienna, Austria
- J. Bilmes, K. Asanovic, C-W Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In International Conference on Supercomputing, pages 340-347, Vienna, Austria, 1997.
- (1997) International Conference on Supercomputing , pp. 340-347
- Bilmes, J.¹ Asanovic, K.² Chin, C.-W.³ Demmel, J.⁴

56
- 1242267320
- Cambridge University Press, U.K
- D. Biskamp. Magnetohydrodynamic Turbulence. Cambridge University Press, U.K., 2003.
- (2003) Magnetohydrodynamic Turbulence
- Biskamp, D.¹

57
- 0033407555
- An energy-conserving thermodynamic model of sea ice
- C.M. Bitz and W.H. Lipscomb. An energy-conserving thermodynamic model of sea ice. Journal of Geophysical Research, 104: 15669-15677, 1999.
- (1999) Journal of Geophysical Research , vol.104 , pp. 15669-15677
- Bitz, C.M.¹ Lipscomb, W.H.²

58
- 0003615167
- SIAM, Philadelphia
- L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK Users' Guide. SIAM, Philadelphia, 1997.
- (1997) ScaLAPACK Users' Guide
- Blackford, L.S.¹ Choi, J.² Cleary, A.³ D'Azevedo, E.⁴ Demmel, J.⁵ Dhillon, I.⁶ Dongarra, J.⁷ Hammarling, S.⁸ Henry, G.⁹ Petitet, A.¹⁰ Stanley, K.¹¹ Walker, D.¹² Whaley, R.C.¹³

59
- 0030382364
- Parallel programming with Polaris
- December
- W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, B. Pottenger, L. Rauchwerger, and P. Tu. Parallel programming with Polaris. Computer, 29(12), December 1996.
- (1996) Computer , vol.29 , Issue.12
- Blume, W.¹ Doallo, R.² Eigenmann, R.³ Grout, J.⁴ Hoeflinger, J.⁵ Lawrence, T.⁶ Lee, J.⁷ Padua, D.⁸ Paek, Y.⁹ Pottenger, B.¹⁰ Rauchwerger, L.¹¹ Tu, P.¹²

60
- 67650079888
- A practical automatic polyhedral parallelizer and locality optimizer
- June
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2008.
- (2008) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

61
- 0032046628
- Performance modeling for SPMD message-passing programs
- J. Brehm, P.H. Worley, and M. Madhukar. Performance modeling for SPMD message-passing programs. Concurrency: Practice and Experience, 10(5): 333-357, 1998.
- (1998) Concurrency: Practice and Experience , vol.10 , Issue.5 , pp. 333-357
- Brehm, J.¹ Worley, P.H.² Madhukar, M.³

62
- 62549150832
- Turduckening black holes: An analytical and computational study
- D. Brown, P. Diener, O. Sarbach, E. Schnetter, and M. Tiglio. Turduckening black holes: An analytical and computational study. Physical Review D (submitted), 2008.
- (2008) Physical Review D (submitted)
- Brown, D.¹ Diener, P.² Sarbach, O.³ Schnetter, E.⁴ Tiglio, M.⁵

63
- 0033708935
- Semicoarsening multigrid on distributed memory machines
- P.N. Brown, R.D. Falgout, and J.E. Jones. Semicoarsening multigrid on distributed memory machines. SIAM Journal on Scientific Computing, 21(5): 1823-1834, 2000.
- (2000) SIAM Journal on Scientific Computing , vol.21 , Issue.5 , pp. 1823-1834
- Brown, P.N.¹ Falgout, R.D.² Jones, J.E.³

64
- 0034268943
- A portable programming interface for performance evaluation on modern processors
- S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. The International Journal of High Performance Computing Applications, 14(4): 189-204, 2000.
- (2000) The International Journal of High Performance Computing Applications , vol.14 , Issue.4 , pp. 189-204
- Browne, S.¹ Dongarra, J.² Garner, N.³ Ho, G.⁴ Mucci, P.⁵

65
- 0034268943
- A portable programming interface for performance evaluation on modern processors
- S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A portable programming interface for performance evaluation on modern processors. International Journal of High Performance Computing Applications, 14(3): 189-204, 2000.
- (2000) International Journal of High Performance Computing Applications , vol.14 , Issue.3 , pp. 189-204
- Browne, S.¹ Dongarra, J.² Garner, N.³ Ho, G.⁴ Mucci, P.⁵

66
- 0242339524
- Online remote trace analysis of parallel applications on high-performance clusters
- Springer
- H. Brunst, A.D. Malony, S. Shende, and R. Bell. Online remote trace analysis of parallel applications on high-performance clusters. In Proceedings of the ISHPC Conference (LNCS 2858), pages 440-449. Springer, 2003.
- (2003) Proceedings of the ISHPC Conference (LNCS 2858) , pp. 440-449
- Brunst, H.¹ Malony, A.D.² Shende, S.³ Bell, R.⁴

67
- 38049022805
- Scalable performance analysis of parallel systems: Concepts and experiences
- H. Brunst and W.E. Nagel. Scalable performance analysis of parallel systems: Concepts and experiences. In Parallel Computing: Software, Algorithms, Architectures Applications, pages 737-744, 2003.
- (2003) Parallel Computing: Software, Algorithms, Architectures Applications , pp. 737-744
- Brunst, H.¹ Nagel, W.E.²

68
- 84908429370
- A distributed performance analysis architecture for clusters
- IEEE Computer Society
- H. Brunst, W.E. Nagel, and A.D. Malony. A distributed performance analysis architecture for clusters. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster 2003), pages 73-83. IEEE Computer Society, 2003.
- (2003) Proceedings of the IEEE International Conference on Cluster Computing (Cluster 2003) , pp. 73-83
- Brunst, H.¹ Nagel, W.E.² Malony, A.D.³

69
- 0034543798
- An API for runtime code patching
- Winter
- B. Buck and J.K. Hollingsworth. An API for runtime code patching. The International Journal of High Performance Computing Applications, 14(4): 317-329, Winter 2000.
- (2000) The International Journal of High Performance Computing Applications , vol.14 , Issue.4 , pp. 317-329
- Buck, B.¹ Hollingsworth, J.K.²

70
- 85054454180
- Perfexpert: An automated HPC performance measurement and analysis tool with optimization recommendations
- New York, NY, November, ACM
- M. Burtscher, B.D. Kim, J. Diamond, J. McCalpin, L. Koesterke, and J. Browne. Perfexpert: An automated HPC performance measurement and analysis tool with optimization recommendations. In Proceedings of ACM/IEEE Conference on Supercomputing (SC10), New York, NY, November 2010. ACM.
- (2010) Proceedings of ACM/IEEE Conference on Supercomputing (SC10)
- Burtscher, M.¹ Kim, B.D.² Diamond, J.³ McCalpin, J.⁴ Koesterke, L.⁵ Browne, J.⁶

71
- 58149269099
- A class of parallel tiled linear algebra algorithms for multicore architectures
- A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Computing, 35(1): 38-53, 2009.
- (2009) Parallel Computing , vol.35 , Issue.1 , pp. 38-53
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

72
- 85054441479
- home page
- Cactus computational toolkit home page. http://www.cactuscode.org

73
- 0000493064
- Estimating interlock and improving balance for pipelined architectures
- D. Callahan, J. Cocke, and K. Kennedy. Estimating interlock and improving balance for pipelined architectures. Journal of Parallel and Distributed Computing, 5(4): 334-358, 1988.
- (1988) Journal of Parallel and Distributed Computing , vol.5 , Issue.4 , pp. 334-358
- Callahan, D.¹ Cocke, J.² Kennedy, K.³

74
- 31144456100
- A parallel chemical reactor simulation using Cactus
- K. Camarda, Y. He, and K.A. Bishop. A parallel chemical reactor simulation using Cactus. In Proceedings of Linux Clusters: The HPC Revolution, NCSA, 2001.
- (2001) Proceedings of Linux Clusters: The HPC Revolution, NCSA
- Camarda, K.¹ He, Y.² Bishop, K.A.³

75
- 4243606192
- R. Car and M. Parrinello. Physical Review Letters, 55, 2471, 1985.
- (1985) Physical Review Letters , vol.55 , pp. 2471
- Car, R.¹ Parrinello, M.²

76
- 0003510632
- Introduction to UPC and language specification
- 17100 Science Dr., Bowie, MD 20715, May
- W.W. Carlson, J.M. Draper, D.E. Culler, K. Yelick, E. Brooks, and K. Warren. Introduction to UPC and language specification. Technical Report CCS-TR-99-157, Center for Computing Sciences, 17100 Science Dr., Bowie, MD 20715, May 1999.
- (1999) Technical Report CCS-TR-99-157, Center for Computing Sciences
- Carlson, W.W.¹ Draper, J.M.² Culler, D.E.³ Yelick, K.⁴ Brooks, E.⁵ Warren, K.⁶

77
- 0028549474
- Improving the ratio of memory operations to floating-point operations in loops
- S. Carr and K. Kennedy. Improving the ratio of memory operations to floating-point operations in loops. ACM Transactions on Programming Languages and Systems, 16(6): 1768-1810, 1994.
- (1994) ACM Transactions on Programming Languages and Systems , vol.16 , Issue.6 , pp. 1768-1810
- Carr, S.¹ Kennedy, K.²

78
- 27144483997
- A performance prediction framework for scientific applications
- June
- L. Carrington, A. Snavely, X. Gao, and N. Wolter. A performance prediction framework for scientific applications. ICCS Workshop on Performance Modeling and Analysis (PMA03), June 2003.
- (2003) ICCS Workshop on Performance Modeling and Analysis (PMA03)
- Carrington, L.¹ Snavely, A.² Gao, X.³ Wolter, N.⁴

79
- 34250161860
- Applying an automated framework to produce accurate blind performance predictions of full-scale HPC applications
- June
- L. Carrington, N. Wolter, A. Snavely, and C.B. Lee. Applying an automated framework to produce accurate blind performance predictions of full-scale HPC applications. DoD Users Group Conference (UGC2004), June 2004.
- (2004) DoD Users Group Conference (UGC2004)
- Carrington, L.¹ Wolter, N.² Snavely, A.³ Lee, C.B.⁴

80
- 57349096291
- Automatic analysis of speedup of MPI applications
- M. Casas, R. Badia, and J. Labarta. Automatic analysis of speedup of MPI applications. In Proceedings of the 22nd ACM International Conference on Supercomputing (ICS), pages 349-358, 2008.
- (2008) Proceedings of the 22nd ACM International Conference on Supercomputing (ICS) , pp. 349-358
- Casas, M.¹ Badia, R.² Labarta, J.³

81
- 38049152838
- Automatic structure extraction from MPI applications tracefiles
- M. Casas, R.M. Badia, and J. Labarta. Automatic structure extraction from MPI applications tracefiles. In European Conference on Parallel Computing, pages 3-12, 2007.
- (2007) European Conference on Parallel Computing , pp. 3-12
- Casas, M.¹ Badia, R.M.² Labarta, J.³

82
- 85054448359
- Analyzing the temporal behavior of application using spectral analysis
- M. Casas, H. Servat, R.M. Badia, and J. Labarta. Analyzing the temporal behavior of application using spectral analysis. In Research Report UPC-RR-CAP-2009-14, 2009.
- (2009) Research Report UPC-RR-CAP-2009-14
- Casas, M.¹ Servat, H.² Badia, R.M.³ Labarta, J.⁴

83
- 33646073716
- Multiple page size modeling and optimization
- 17-21 September
- C. Cascaval, E. Duesterwald, P.F. Sweeney, and R.W. Wisniewski. Multiple page size modeling and optimization. Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on, pages 339-349, 17-21 September 2005.
- (2005) Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on , pp. 339-349
- Cascaval, C.¹ Duesterwald, E.² Sweeney, P.F.³ Wisniewski, R.W.⁴

84
- 85054462630
- CCSM Software Engineering Group. http://www.ccsm.ucar.edu/cseg

85
- 85054427463
- CCSM Software Engineering Working Group. http://www.ccsm.ucar.edu/csm/working_groups/Software

86
- 85054462150
- National Energy Research Scientific Computing Center. Parallel total energy code, 2009.
- (2009) Parallel total energy code

87
- 36048968626
- PhD thesis, University of Southern California
- C. Chen. Model-Guided Empirical Optimization for Memory Hierarchy. PhD thesis, University of Southern California, 2007.
- (2007) Model-Guided Empirical Optimization for Memory Hierarchy
- Chen, C.¹

88
- 36049036039
- March
- C. Chen, J. Chame, and M. Hall. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. March 2005.
- (2005) Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
- Chen, C.¹ Chame, J.² Hall, M.³

89
- 70449959487
- Technical Report 08-897, University of Southern California, June
- C. Chen, J. Chame, and M. Hall. CHiLL: A framework for composing high-level loop transformations. Technical Report 08-897, University of Southern California, June 2008.
- (2008) CHiLL: A framework for composing high-level loop transformations
- Chen, C.¹ Chame, J.² Hall, M.³

90
- 33646828918
- Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
- March
- C. Chen, J. Chame, and M.W. Hall. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In Proceedings of the International Symposium on Code Generation and Optimization, March 2005.
- (2005) Proceedings of the International Symposium on Code Generation and Optimization
- Chen, C.¹ Chame, J.² Hall, M.W.³

91
- 77954007684
- April
- D. Chen, N. Vachharajani, R. Hundt, S.W. Liao, V. Ramasamy, P. Yuan, W. Chen, and W. Zheng. Taming hardware event samples for FDO compilation. pages 42-53, April 2010.
- (2010) Taming hardware event samples for FDO compilation , pp. 42-53
- Chen, D.¹ Vachharajani, N.² Hundt, R.³ Liao, S.W.⁴ Ramasamy, V.⁵ Yuan, P.⁶ Chen, W.⁷ Zheng, W.⁸

92
- 0029204978
- Scalable linear algebra software libraries for distributed memory concurrent computers
- Washington, DC, USA, IEEE Computer Society
- J. Choi and J.J. Dongarra. Scalable linear algebra software libraries for distributed memory concurrent computers. In FTDCS '95: Proceedings of the 5th IEEE Workshop on Future Trends of Distributed Computing Systems, page 170, Washington, DC, USA, 1995. IEEE Computer Society.
- (1995) FTDCS '95: Proceedings of the 5th IEEE Workshop on Future Trends of Distributed Computing Systems , pp. 170
- Choi, J.¹ Dongarra, J.J.²

93
- 84934324812
- Using Information from Prior Runs to Improve Automated Tuning Systems
- Washington, DC, USA, IEEE Computer Society
- I-H Chung and J.K. Hollingsworth. Using Information from Prior Runs to Improve Automated Tuning Systems. In Proceedings of the 2004 ACM/IEEE conference on Supercomputing (SC04), page 30, Washington, DC, USA, 2004. IEEE Computer Society.
- (2004) Proceedings of the 2004 ACM/IEEE conference on Supercomputing (SC04) , pp. 30
- Chung, I.-H.¹ Hollingsworth, J.K.²

94
- 33845876030
- A case study using automatic performance tuning for large-scale scientific programs
- I.-H. Chung and J.K. Hollingsworth. A case study using automatic performance tuning for large-scale scientific programs. In High Performance Distributed Computing, 2006 15th IEEE International Symposium on High Performance Distributed Computing, pages 45-56, 2006.
- (2006) High Performance Distributed Computing, 2006 15th IEEE International Symposium on High Performance Distributed Computing , pp. 45-56
- Chung, I.-H.¹ Hollingsworth, J.K.²

95
- 34548010778
- Scalability analysis of SPMD codes using expectations
- New York, NY, ACM
- C. Coarfa, J. Mellor-Crummey, N. Froyd, and Y. Dotsenko. Scalability analysis of SPMD codes using expectations. In ICS '07: Proceedings of the 21st annual International Conference on Supercomputing, pages 13-22, New York, NY, 2007. ACM.
- (2007) ICS '07: Proceedings of the 21st annual International Conference on Supercomputing , pp. 13-22
- Coarfa, C.¹ Mellor-Crummey, J.² Froyd, N.³ Dotsenko, Y.⁴

96
- 26844455510
- Multidimensional Upwind Methods for Hyperbolic Conservation Laws
- P. Colella. Multidimensional Upwind Methods for Hyperbolic Conservation Laws. Journal of Computational Physics, 87: 171-200, 1990.
- (1990) Journal of Computational Physics , vol.87 , pp. 171-200
- Colella, P.¹

97
- 85054466975
- A limiter for PPM that preserves accuracy at smooth extrema
- submitted
- P. Colella and M.D. Sekora. A limiter for PPM that preserves accuracy at smooth extrema. Journal of Computational Physics, submitted.
- Journal of Computational Physics
- Colella, P.¹ Sekora, M.D.²

98
- 33744490657
- The community climate system model version 3 (CCSM3)
- W.D. Collins, C.M. Bitz, M.L. Blackmon, G.B. Bonan, C.S. Bretherton, J.A. Carton, P. Chang, S.C. Doney, J.H. Hack, T.B. Henderson, J.T. Kiehl, W.G. Large, D.S. McKenna, B.D. Santer, and R.D. Smith. The community climate system model version 3 (CCSM3). Journal of Climate, 19(11): 2122-2143, 2006.
- (2006) Journal of Climate , vol.19 , Issue.11 , pp. 2122-2143
- Collins, W.D.¹ Bitz, C.M.² Blackmon, M.L.³ Bonan, G.B.⁴ Bretherton, C.S.⁵ Carton, J.A.⁶ Chang, P.⁷ Doney, S.C.⁸ Hack, J.H.⁹ Henderson, T.B.¹⁰ Kiehl, J.T.¹¹ Large, W.G.¹² McKenna, D.S.¹³ Santer, B.D.¹⁴ Smith, R.D.¹⁵

99
- 1842826742
- NCAR Tech Note NCAR/TN-464+STR, National Center for Atmospheric Research, Boulder, CO 80307
- W.D. Collins and P.J. Rasch, et al. Description of the NCAR community atmosphere model (CAM 3.0). NCAR Tech Note NCAR/TN-464+STR, National Center for Atmospheric Research, Boulder, CO 80307, 2004.
- (2004) Description of the NCAR community atmosphere model (CAM 3.0)
- Collins, W.D.¹ Rasch, P.J.²

100
- 33947636363
- The formulation and atmospheric simulation of the community atmosphere model: CAM3
- W.D. Collins, et al. The formulation and atmospheric simulation of the community atmosphere model: CAM3. Journal of Climate, 2005.
- (2005) Journal of Climate
- Collins, W.D.¹

101
- 85054469230
- Community Climate System Model. http://www.ccsm.ucar.edu

102
- 0036679993
- Adaptive optimizing compilers for the 21st century
- August
- K.D. Cooper, D. Subramanian, and L. Torczon. Adaptive optimizing compilers for the 21st century. The Journal of Supercomputing, 23(1): 7-22, August 2002.
- (2002) The Journal of Supercomputing , vol.23 , Issue.1 , pp. 7-22
- Cooper, K.D.¹ Subramanian, D.² Torczon, L.³

103
- 85117245869
- Active harmony: Towards automated performance tuning
- Los Alamitos, CA, USA, IEEE Computer Society Press
- C. Ţăpuş, I-H Chung, and J.K. Hollingsworth. Active harmony: Towards automated performance tuning. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC02), pages 1-11, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.
- (2002) Proceedings of the ACM/IEEE Conference on Supercomputing (SC02) , pp. 1-11
- Ţăpuş, C.¹ Chung, I.-H.² Hollingsworth, J.K.³

104
- 0003662159
- Morgan Kaufmann, San Francisco
- D. Culler, J.P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, San Francisco, 1999.
- (1999) Parallel Computer Architecture: A Hardware/Software Approach
- Culler, D.¹ Singh, J.P.² Gupta, A.³

105
- 70350720198
- A.N. Cutler. A history of the speed of light. 2001. http://www.sigma-engineering.co.uk/light/lightindex.shtml
- (2001) A history of the speed of light
- Cutler, A.N.¹

106
- 0002806690
- OpenMP: An industry-standard API for shared-memory programming
- January/March
- L. Dagum and R. Menon. OpenMP: an industry-standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1): 46-55, January/March 1998.
- (1998) IEEE Computational Science and Engineering , vol.5 , Issue.1 , pp. 46-55
- Dagum, L.¹ Menon, R.²

107
- 33845393854
- Transformations to parallel codes for communication-computation overlap
- November
- A. Danalis, K. Kim, L. Pollock, and M. Swany. Transformations to parallel codes for communication-computation overlap. In Proceedings of IEEE/ACM Conference on Supercomputing (SC05), November 2005.
- (2005) Proceedings of IEEE/ACM Conference on Supercomputing (SC05)
- Danalis, A.¹ Kim, K.² Pollock, L.³ Swany, M.⁴

108
- 70350771127
- Stencil computation optimization and autotuning on state-of-the-art multicore architectures
- K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. Stencil computation optimization and autotuning on state-of-the-art multicore architectures. In Proceedings of ACM/IEEE Conference on Supercomputing (SC08), 2008.
- (2008) Proceedings of ACM/IEEE Conference on Supercomputing (SC08)
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

109
- 34547470812
- K. Davis, A. Hoisie, G. Johnson, D. Kerbyson, M. Lang, S. Pakin, and F. Petrini. A performance and scalability analysis of the BlueGene/L architecture.
- A performance and scalability analysis of the BlueGene/L architecture
- Davis, K.¹ Hoisie, A.² Johnson, G.³ Kerbyson, D.⁴ Lang, M.⁵ Pakin, S.⁶ Petrini, F.⁷

110
- 0031340339
- ProfileMe: Hardware support for instruction-level profiling on out-of-order processors
- Washington, DC, IEEE Computer Society
- J. Dean, J.E. Hicks, C.A. Waldspurger, W.E. Weihl, and G. Chrysos. ProfileMe: Hardware support for instruction-level profiling on out-of-order processors. In MICRO 30: Proceedings of the 30th annual ACM/IEEE International Symposium on Microarchitecture, pages 292-302, Washington, DC, 1997. IEEE Computer Society.
- (1997) MICRO 30: Proceedings of the 30th annual ACM/IEEE International Symposium on Microarchitecture , pp. 292-302
- Dean, J.¹ Hicks, J.E.² Waldspurger, C.A.³ Weihl, W.E.⁴ Chrysos, G.⁵

111
- 20744452904
- Self adapting linear algebra algorithms and software
- Special issue on Program Generation, Optimization, and Adaptation
- J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, C. Whaley, and K. Yelick. Self adapting linear algebra algorithms and software. Proceedings of the IEEE, 93(2), 2005. Special issue on Program Generation, Optimization, and Adaptation.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2
- Demmel, J.¹ Dongarra, J.² Eijkhout, V.³ Fuentes, E.⁴ Petitet, A.⁵ Vuduc, R.⁶ Whaley, C.⁷ Yelick, K.⁸

112
- 0003252789
- Applied Numerical Linear Algebra
- Philadelphia, PA
- J.W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997.
- (1997) Society for Industrial and Applied Mathematics
- Demmel, J.W.¹

113
- 0012612903
- Technical Report TR-01-23, Department of Computer Sciences, The University of Texas at Austin
- R. Desikan, D. Burger, S. Keckler, and T. Austin. Sim-alpha: a validated, execution-driven Alpha 21264 simulator. Technical Report TR-01-23, Department of Computer Sciences, The University of Texas at Austin, 2001.
- (2001) Sim-alpha: A validated, execution-driven Alpha 21264 simulator
- Desikan, R.¹ Burger, D.² Keckler, S.³ Austin, T.⁴

114
- 31344460981
- The Community Land Model and its climate statistics as a component of the Climate System Model
- R.E. Dickinson, K.W. Oleson, G. Bonan, F. Hoffman, P. Thornton, M. Vertenstein, Z-L Yang, and X. Zeng. The Community Land Model and its climate statistics as a component of the Climate System Model. Journal of Climate, 19(11): 2032-2324, 2006.
- (2006) Journal of Climate , vol.19 , Issue.11 , pp. 2032-2324
- Dickinson, R.E.¹ Oleson, K.W.² Bonan, G.³ Hoffman, F.⁴ Thornton, P.⁵ Vertenstein, M.⁶ Yang, Z.-L.⁷ Zeng, X.⁸

115
- 34250840018
- Optimized high-order derivative and dissipation operators satisfying summation by parts, and applications in three-dimensional multi-block evolutions
- P. Diener, E.N. Dorband, E. Schnetter, and M. Tiglio. Optimized high-order derivative and dissipation operators satisfying summation by parts, and applications in three-dimensional multi-block evolutions. Journal of Scientific Computing, 32: 109-145, 2007.
- (2007) Journal of Scientific Computing , vol.32 , pp. 109-145
- Diener, P.¹ Dorband, E.N.² Schnetter, E.³ Tiglio, M.⁴

116
- 85054449328
- F. Dijkstra and A. van der Steen. Integration of two ocean models.
- Integration of two ocean models
- Dijkstra, F.¹ van Der Steen, A.²

117
- 34547709622
- A language for the compact representation of multiple program versions
- October
- S. Donadio, J. Brodman, T. Roeder, K. Yotov, D. Barthou, A. Cohen, M.J. Garzarán, D. Padua, and K. Pingali. A language for the compact representation of multiple program versions. In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing, October 2005.
- (2005) Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing
- Donadio, S.¹ Brodman, J.² Roeder, T.³ Yotov, K.⁴ Barthou, D.⁵ Cohen, A.⁶ Garzarán, M.J.⁷ Padua, D.⁸ Pingali, K.⁹

118
- 28044453637
- Performance instrumentation and measurement for terascale systems
- J. Dongarra, A.D. Malony, S. Moore, P. Mucci, and S. Shende. Performance instrumentation and measurement for terascale systems. In Proceedings of the ICCS 2003 Conference (LNCS 2660), pages 53-62, 2003.
- (2003) Proceedings of the ICCS 2003 Conference (LNCS 2660) , pp. 53-62
- Dongarra, J.¹ Malony, A.D.² Moore, S.³ Mucci, P.⁴ Shende, S.⁵

119
- 0004304389
- PCCM2: A GCM adapted for scalable parallel computer
- American Meteorological Society, Boston
- J.B. Drake, I.T. Foster, J.J. Hack, J.G. Michalakes, B.D. Semeraro, B. Toonen, D.L. Williamson, and P.H. Worley. PCCM2: A GCM adapted for scalable parallel computer. In Fifth Symposium on Global Change Studies, pages 91-98. American Meteorological Society, Boston, 1994.
- (1994) Fifth Symposium on Global Change Studies , pp. 91-98
- Drake, J.B.¹ Foster, I.T.² Hack, J.J.³ Michalakes, J.G.⁴ Semeraro, B.D.⁵ Toonen, B.⁶ Williamson, D.L.⁷ Worley, P.H.⁸

120
- 0029389354
- Design and performance of a scalable parallel community climate model
- J.B. Drake, I.T. Foster, J.G. Michalakes, B. Toonen, and P.H. Worley. Design and performance of a scalable parallel community climate model. Parallel Computing, 21(10): 1571-1591, 1995.
- (1995) Parallel Computing , vol.21 , Issue.10 , pp. 1571-1591
- Drake, J.B.¹ Foster, I.T.² Michalakes, J.G.³ Toonen, B.⁴ Worley, P.H.⁵

121
- 69949095393
- Performance tuning and evaluation of a parallel community climate model
- New York, NY, USA, ACM
- J.B. Drake, S. Hammond, R. James, and P.H. Worley. Performance tuning and evaluation of a parallel community climate model. In Proceedings of 1999 ACM/IEEE Conference on Supercomputing (SC99), page 34, New York, NY, USA, 1999. ACM.
- (1999) Proceedings of 1999 ACM/IEEE Conference on Supercomputing (SC99) , pp. 34
- Drake, J.B.¹ Hammond, S.² James, R.³ Worley, P.H.⁴

122
- 23844488736
- Overview of the software design of the Community Climate System Model
- Fall
- J.B. Drake, P.W. Jones, and G. Carr. Overview of the software design of the Community Climate System Model. International Journal of High Performance Computing Applications, 19(3): 177-186, Fall 2005.
- (2005) International Journal of High Performance Computing Applications , vol.19 , Issue.3 , pp. 177-186
- Drake, J.B.¹ Jones, P.W.² Carr, G.³

123
- 23844488736
- Special issue on climate modeling
- August
- J.B. Drake, P.W. Jones, and G.R. Carr, Jr. Special issue on climate modeling. International Journal of High Performance Computing Applications, 19(3), August 2005.
- (2005) International Journal of High Performance Computing Applications , vol.19 , Issue.3
- Drake, J.B.¹ Jones, P.W.² Carr, G.R.³

124
- 79953225178
- Software design for petascale climate science
- D.A. Bader, editor, chapter 7, Chapman & Hall/CRC, New York, NY
- J.B. Drake, P.W. Jones, M. Vertenstein, J.B. White III, and P.H. Worley. Software design for petascale climate science. In D.A. Bader, editor, Petascale Computing: Algorithms and Applications, chapter 7, pages 125-146. Chapman & Hall/CRC, New York, NY, 2008.
- (2008) Petascale Computing: Algorithms and Applications , pp. 125-146
- Drake, J.B.¹ Jones, P.W.² Vertenstein, M.³ White III, J.B.⁴ Worley, P.H.⁵

125
- 77952572316
- November
- P.J. Drongowski. Instruction-based sampling: A new performance analysis technique for AMD family 10h processors, November 2007. http://developer.amd.com/Assets/AMD_IBS_paper_EN.pdf
- (2007) Instruction-based sampling: A new performance analysis technique for AMD family 10h processors
- Drongowski, P.J.¹

126
- 67650793277
- Introduction to FLASH 3.0, with application to supersonic turbulence
- A. Dubey, L.B. Reid, and R. Fisher. Introduction to FLASH 3.0, with application to supersonic turbulence. Physica Scripta, 132: 014046, 2008.
- (2008) Physica Scripta , vol.132 , pp. 014046
- Dubey, A.¹ Reid, L.B.² Fisher, R.³

127
- 0027532792
- A reformulation and implementation of the Bryan-Cox-Semtner ocean model
- J.K. Dukowicz, R.D. Smith, and R.C. Malone. A reformulation and implementation of the Bryan-Cox-Semtner ocean model. Journal of Atmospheric and Oceanic Technology, 10: 195-208, 1993.
- (1993) Journal of Atmospheric and Oceanic Technology , vol.10 , pp. 195-208
- Dukowicz, J.K.¹ Smith, R.D.² Malone, R.C.³

128
- 85054464329
- Octave home page
- J.W. Eaton. Octave home page. http://www.octave.org
- Eaton, J.W.¹

129
- 49949106993
- July
- S. Eranian. Perfmon2: A flexible performance monitoring interface for Linux. pages 269-288, July 2006.
- (2006) Perfmon2: A flexible performance monitoring interface for Linux , pp. 269-288
- Eranian, S.¹

130
- 85170282443
- A density-based algorithm for discovering clusters in large spatial databases with noise
- M. Ester, H.P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 226-231, 1996.
- (1996) Proceedings of the Second International Conference on Knowledge Discovery and Data Mining , pp. 226-231
- Ester, M.¹ Kriegel, H.P.² Sander, J.³ Xu, X.⁴

131
- 85054465022
- FEAP.

132
- 38049119143
- PhD thesis, Harvard University, September
- A. Fedorova. Operating System Scheduling for Chip Multithreaded Processors. PhD thesis, Harvard University, September 2006.
- (2006) Operating System Scheduling for Chip Multithreaded Processors
- Fedorova, A.¹

133
- 34548715722
- The importance of being low power in high-performance computing
- W. Feng. The importance of being low power in high-performance computing. CTWatch Quarterly, 1(3): 12-20, 2005.
- (2005) CTWatch Quarterly , vol.1 , Issue.3 , pp. 12-20
- Feng, W.¹

134
- 85054425042
- March
- Solaris memory placement optimization and Sun Fire servers. http://www.sun.com/software/solaris/performance.jsp, March 2003.
- (2003)

135
- 0042071939
- Spectral element methods for transitional flows in complex geometries
- P.F. Fischer, G.W. Kruse, and F. Loth. Spectral element methods for transitional flows in complex geometries. Journal of Scientific Computing, 17, 2002.
- (2002) Journal of Scientific Computing , vol.17
- Fischer, P.F.¹ Kruse, G.W.² Loth, F.³

136
- 0346575937
- Performance of parallel computers for spectral atmospheric models
- I.T. Foster, B. Toonen, and P.H. Worley. Performance of parallel computers for spectral atmospheric models. Journal of Atmospheric and Oceanic Technology, 13(5): 1031-1045, 1996.
- (1996) Journal of Atmospheric and Oceanic Technology , vol.13 , Issue.5 , pp. 1031-1045
- Foster, I.T.¹ Toonen, B.² Worley, P.H.³

137
- 0031129188
- May
- I.T. Foster and P.H. Worley. Parallel algorithms for the spectral transform method. 18(3): 806-837, May 1997.
- (1997) Parallel algorithms for the spectral transform method , vol.18 , Issue.3 , pp. 806-837
- Foster, I.T.¹ Worley, P.H.²

138
- 35048845536
- Exploring the predictability of MPI messages
- F. Freitag, J. Caubet, M. Farreras, T. Cortes, and J. Labarta. Exploring the predictability of MPI messages. In Proceedings of the 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS03), pages 46-55, 2003.
- (2003) Proceedings of the 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS03) , pp. 46-55
- Freitag, F.¹ Caubet, J.² Farreras, M.³ Cortes, T.⁴ Labarta, J.⁵

139
- 0348209599
- A fast Fourier transform compiler
- May
- M. Frigo. A fast Fourier transform compiler. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, May 1999.
- (1999) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
- Frigo, M.¹

140
- 0031636309
- FFTW: An adaptive software architecture for the FFT
- IEEE
- M. Frigo and S.G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing, volume 3, pages 1381-1384. IEEE, 1998.
- (1998) Proceedings of 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing , vol.3 , pp. 1381-1384
- Frigo, M.¹ Johnson, S.G.²

141
- 80155153602
- M. Frigo and S.G. Johnson. FFTW for version 3.0, 2003. http://www.fftw.org/fftw3.pdf
- (2003) FFTW for version 3.0
- Frigo, M.¹ Johnson, S.G.²

142
- 0031622953
- The implementation of the Cilk-5 multithreaded language
- Montreal, Quebec, Canada, June
- M. Frigo, C.E. Leiserson, and K.H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 212-223, Montreal, Quebec, Canada, June 1998.
- (1998) Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 212-223
- Frigo, M.¹ Leiserson, C.E.² Randall, K.H.³

143
- 32844470371
- Low-overhead call path profiling of unmodified, optimized code
- New York, NY, ACM Press
- N. Froyd, J. Mellor-Crummey, and R. Fowler. Low-overhead call path profiling of unmodified, optimized code. In Proceedings of 19th International Conference on Supercomputing, pages 81-90, New York, NY, 2005. ACM Press.
- (2005) Proceedings of 19th International Conference on Supercomputing , pp. 81-90
- Froyd, N.¹ Mellor-Crummey, J.² Fowler, R.³

144
- 85054465813
- Capturing and visualizing event flow graphs of MPI applications
- August
- K. Fürlinger and D. Skinner. Capturing and visualizing event flow graphs of MPI applications. Proceedings of the Workshop on Productivity and Performance (PROPER 2009), August 2009.
- (2009) Proceedings of the Workshop on Productivity and Performance (PROPER 2009)
- Fürlinger, K.¹ Skinner, D.²

145
- 70350755747
- Scalable load-balance measurement for SPMD codes
- Piscataway, NJ, IEEE Press
- T. Gamblin, B.R. de Supinski, M. Schulz, R. Fowler, and D.A. Reed. Scalable load-balance measurement for SPMD codes. In Proceedings of ACM/IEEE Conference on Supercomputing (SC08), pages 1-12, Piscataway, NJ, 2008. IEEE Press.
- (2008) Proceedings of ACM/IEEE Conference on Supercomputing (SC08) , pp. 1-12
- Gamblin, T.¹ De Supinski, B.R.² Schulz, M.³ Fowler, R.⁴ Reed, D.A.⁵

146
- 77951467652
- LeWI: A runtime balancing algorithm for nested parallelism
- M. Garcia, J. Corbalan, and J. Labarta. LeWI: A runtime balancing algorithm for nested parallelism. In Proceedings of the International Conference on Parallel Processing (ICPP’09), 2009.
- (2009) Proceedings of the International Conference on Parallel Processing (ICPP’09)
- Garcia, M.¹ Corbalan, J.² Labarta, J.³

147
- 72149119839
- Scalable collation and presentation of call-path profile data with cube
- Julich (Germany)
- M. Geimer, B. Kuhlmann, F. Pulatova, F. Wolf, and B.J.N. Wylie. Scalable collation and presentation of call-path profile data with cube. In Parallel Computing: Architectures, Algorithms and Applications: Proceedings of Parallel Computing (ParCo07), volume 15, pages 645-652, Julich (Germany), 2007.
- (2007) Parallel Computing: Architectures, Algorithms and Applications: Proceedings of Parallel Computing (ParCo07) , vol.15 , pp. 645-652
- Geimer, M.¹ Kuhlmann, B.² Pulatova, F.³ Wolf, F.⁴ Wylie, B.J.N.⁵

148
- 70149102227
- A generic and configurable source-code instrumentation component
- G. Allen, J. Nabrzyski, E. Seidel, G. van Albada, J. Dongarra, and P. Sloot, editors, Baton Rouge, LA, May, Springer
- M. Geimer, S. Shende, A. Malony, and F. Wolf. A generic and configurable source-code instrumentation component. In G. Allen, J. Nabrzyski, E. Seidel, G. van Albada, J. Dongarra, and P. Sloot, editors, International Conference on Computational Science (ICCS), volume 5545 of Lecture Notes in Computer Science, pages 696-705, Baton Rouge, LA, May 2009. Springer.
- (2009) International Conference on Computational Science (ICCS), volume 5545 of Lecture Notes in Computer Science , pp. 696-705
- Geimer, M.¹ Shende, S.² Malony, A.³ Wolf, F.⁴

149
- 33746593747
- Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
- June
- S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. International Journal of Parallel Programming, 34(3): 261-317, June 2006.
- (2006) International Journal of Parallel Programming , vol.34 , Issue.3 , pp. 261-317
- Girbal, S.¹ Vasilache, N.² Bastoul, C.³ Cohen, A.⁴ Parello, D.⁵ Sigler, M.⁶ Temam, O.⁷

150
- 70449983121
- Automatic detection of parallel applications computation phases
- J. Gonzalez, J. Gimenez, and J. Labarta. Automatic detection of parallel applications computation phases. In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS09), 2009.
- (2009) Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS09)
- Gonzalez, J.¹ Gimenez, J.² Labarta, J.³

151
- 0345584934
- The Cactus framework and toolkit: Design and applications
- Berlin, Springer
- T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf. The Cactus framework and toolkit: Design and applications. In Vector and Parallel Processing - VECPAR’2002, 5th International Conference, Lecture Notes in Computer Science, Berlin, 2003. Springer.
- (2003) Vector and Parallel Processing - VECPAR’2002, 5th International Conference, Lecture Notes in Computer Science
- Goodale, T.¹ Allen, G.² Lanfermann, G.³ Massó, J.⁴ Radke, T.⁵ Seidel, E.⁶ Shalf, J.⁷

152
- 85015306739
- gprof: A call graph execution profiler
- June
- S. Graham, P. Kessler, and M. McKusick. gprof: A call graph execution profiler. SIGPLAN '82 Symposium on Compiler Construction, pages 120-126, June 1982.
- (1982) SIGPLAN '82 Symposium on Compiler Construction , pp. 120-126
- Graham, S.¹ Kessler, P.² McKusick, M.³

153
- 0039435412
- FLAME: Formal linear algebra methods environment
- J.A. Gunnels, R.A. Van De Geijn, and G.M. Henry. FLAME: Formal linear algebra methods environment. ACM Transactions on Mathematical Software, 27, 2001.
- (2001) ACM Transactions on Mathematical Software , vol.27
- Gunnels, J.A.¹ Van De Geijn, R.A.² Henry, G.M.³

154
- 85054436902
- D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang. Performance database technology for SciDAC applications. 2007.
- (2007) Performance database technology for SciDAC applications
- Gunter, D.¹ Huck, K.² Karavanic, K.³ May, J.⁴ Malony, A.⁵ Mohror, K.⁶ Moore, S.⁷ Morris, A.⁸ Shende, S.⁹ Taylor, V.¹⁰ Wu, X.¹¹ Zhang, Y.¹²

155
- 40749124008
- Architecture of Qbox: A scalable first-principles molecular dynamics code
- January/March
- F. Gygi. Architecture of Qbox: A scalable first-principles molecular dynamics code. IBM Journal of Research and Development, 52, January/March 2008.
- (2008) IBM Journal of Research and Development , vol.52
- Gygi, F.¹

156
- 33845422522
- Large-scale first-principles molecular dynamics simulations on the BlueGene/L platform using the Qbox code
- F. Gygi, E. Draeger, B.R. de Supinski, R.K. Yates, F. Franchetti, S. Kral, J. Lorenz, C.W. Überhuber, J.A. Gunnels, and J.C. Sexton. Large-scale first-principles molecular dynamics simulations on the BlueGene/L platform using the Qbox code. In Proceedings of ACM/IEEE Conference on Supercomputing (SC05), 2005.
- (2005) Proceedings of ACM/IEEE Conference on Supercomputing (SC05)
- Gygi, F.¹ Draeger, E.² De Supinski, B.R.³ Yates, R.K.⁴ Franchetti, F.⁵ Kral, S.⁶ Lorenz, J.⁷ Überhuber, C.W.⁸ Gunnels, J.A.⁹ Sexton, J.C.¹⁰

157
- 34548239117
- Large-scale electronic structure calculations of high-z metals on the BlueGene/L Platform
- November
- F. Gygi, E.W. Draeger, M. Schulz, B.R. de Supinski, J.A. Gunnels, V. Austel, J.C. Sexton, F. Franchetti, S. Kral, J. Lorenz, and C.W. Überhuber. Large-scale electronic structure calculations of high-z metals on the BlueGene/L Platform. In Proceedings of ACM/IEEE Conference on Supercomputing (SC06), November 2006.
- (2006) Proceedings of ACM/IEEE Conference on Supercomputing (SC06)
- Gygi, F.¹ Draeger, E.W.² Schulz, M.³ De Supinski, B.R.⁴ Gunnels, J.A.⁵ Austel, V.⁶ Sexton, J.C.⁷ Franchetti, F.⁸ Kral, S.⁹ Lorenz, J.¹⁰ Überhuber, C.W.¹¹

158
- 0003501882
- NCAR Tech. Note NCAR/TN-382+STR, National Center for Atmospheric Research, Boulder, CO
- J.J. Hack, B.A. Boville, B.P. Briegleb, J.T. Kiehl, P.J. Rasch, and D.L. Williamson. Description of the NCAR community climate model (CCM2). NCAR Tech. Note NCAR/TN-382+STR, National Center for Atmospheric Research, Boulder, CO, 1992.
- (1992) Description of the NCAR community climate model (CCM2)
- Hack, J.J.¹ Boville, B.A.² Briegleb, B.P.³ Kiehl, J.T.⁴ Rasch, P.J.⁵ Williamson, D.L.⁶

159
- 84870211068
- Loop transformation recipes for code generation and auto-tuning
- October
- M. Hall, J. Chame, J. Shin, C. Chen, G. Rudy, and M.M. Khan. Loop transformation recipes for code generation and auto-tuning. In LCPC, October, 2009.
- (2009) LCPC
- Hall, M.¹ Chame, J.² Shin, J.³ Chen, C.⁴ Rudy, G.⁵ Khan, M.M.⁶

160
- 58849101883
- Compiler research: The next fifty years
- February
- M. Hall, D. Padua, and K. Pingali. Compiler research: The next fifty years. Communications of the ACM, February 2009.
- (2009) Communications of the ACM
- Hall, M.¹ Padua, D.² Pingali, K.³

161
- 0030380793
- Maximizing multiprocessor performance with the SUIF compiler
- December
- M.W. Hall, J.M. Anderson, S.P. Amarasinghe, B.R. Murphy, S. Liao, E. Bugnion, and M.S. Lam. Maximizing multiprocessor performance with the SUIF compiler. IEEE Computer, 29(12): 84-89, December 1996.
- (1996) IEEE Computer , vol.29 , Issue.12 , pp. 84-89
- Hall, M.W.¹ Anderson, J.M.² Amarasinghe, S.P.³ Murphy, B.R.⁴ Liao, S.⁵ Bugnion, E.⁶ Lam, M.S.⁷

162
- 70449793159
- Annotation-based empirical performance tuning using Orio
- May
- A. Hartono, B. Norris, and P. Sadayappan. Annotation-based empirical performance tuning using Orio. In Proceedings of the 23rd International Parallel and Distributed Processing Symposium, May 2009.
- (2009) Proceedings of the 23rd International Parallel and Distributed Processing Symposium
- Hartono, A.¹ Norris, B.² Sadayappan, P.³

163
- 70449793159
- Annotation-based empirical performance tuning using Orio
- May
- A. Hartono and S. Ponnuswamy. Annotation-based empirical performance tuning using Orio. In 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS) Rome, Italy, May 2009.
- (2009) 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS) Rome, Italy
- Hartono, A.¹ Ponnuswamy, S.²

164
- 0004302191
- Morgan Kaufmann, San Francisco
- J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Francisco, 2006.
- (2006) Computer Architecture: A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

165
- 0024903997
- Evaluating Associativity in CPU Caches
- M.D. Hill and A.J. Smith. Evaluating Associativity in CPU Caches. IEEE Transactions on Computers, 38(12): 1612-1630, 1989.
- (1989) IEEE Transactions on Computers , vol.38 , Issue.12 , pp. 1612-1630
- Hill, M.D.¹ Smith, A.J.²

166
- 10644250257
- Inhomogeneous electron gas
- P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Physical Review, 136: B864, 1964.
- (1964) Physical Review , vol.136 , pp. B864
- Hohenberg, P.¹ Kohn, W.²

167
- 0034543848
- Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications
- A. Hoisie, O. Lubeck, and H. Wasserman. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications. International Journal of High Performance Computing Applications, 14: 330-346, 2000.
- (2000) International Journal of High Performance Computing Applications , vol.14 , pp. 330-346
- Hoisie, A.¹ Lubeck, O.² Wasserman, H.³

168
- 12444335040
- Prediction and adaptation in Active Harmony
- J.K. Hollingsworth and P.J. Keleher. Prediction and adaptation in Active Harmony. Cluster Computing, 2(3): 195-205, 1999.
- (1999) Cluster Computing , vol.2 , Issue.3 , pp. 195-205
- Hollingsworth, J.K.¹ Keleher, P.J.²

169
- 0028553216
- Dynamic program instrumentation for scalable performance tools
- Knoxville, TN, May
- J.K. Hollingsworth, B.P. Miller, and J. Cargille. Dynamic program instrumentation for scalable performance tools. In 1994 Scalable High Performance Computing Conference, pages 841-850, Knoxville, TN, May 1994.
- (1994) 1994 Scalable High Performance Computing Conference , pp. 841-850
- Hollingsworth, J.K.¹ Miller, B.P.² Cargille, J.³

170
- 84938447945
- Direct search solution of numerical and statistical problems
- R. Hooke and T.A. Jeeves. Direct search solution of numerical and statistical problems. Journal of the ACM, 8(2): 212-229, 1961.
- (1961) Journal of the ACM , vol.8 , Issue.2 , pp. 212-229
- Hooke, R.¹ Jeeves, T.A.²

171
- 85054424049
- HPC challenge benchmark. http://icl.cs.utk.edu/hpcc/index.html

172
- 33845441581
- PerfExplorer: A performance data mining framework for large-scale parallel computing
- K. Huck and A. Malony. PerfExplorer: A performance data mining framework for large-scale parallel computing. In Proceedings of ACM/IEEE Conference on Supercomputing (SC05), 2005.
- (2005) Proceedings of ACM/IEEE Conference on Supercomputing (SC05)
- Huck, K.¹ Malony, A.²

173
- 48849093309
- Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0
- (special issue on Large-Scale Programming Tools and Environments)
- K. Huck, A. Malony, S. Shende, and A. Morris. Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0. The Journal of Scientific Programming, 16(2-3): 123-134, 2008. (special issue on Large-Scale Programming Tools and Environments).
- (2008) The Journal of Scientific Programming , vol.16 , Issue.2-3 , pp. 123-134
- Huck, K.¹ Malony, A.² Shende, S.³ Morris, A.⁴

174
- 33745170397
- Design and implementation of a parallel performance data management framework
- Washington, DC, USA, IEEE Computer Society
- K.A. Huck, A.D. Malony, and A. Morris. Design and implementation of a parallel performance data management framework. In Proceedings of the 2005 International Conference on Parallel Processing (ICPP05), pages 473-482, Washington, DC, USA, 2005. IEEE Computer Society.
- (2005) Proceedings of the 2005 International Conference on Parallel Processing (ICPP05) , pp. 473-482
- Huck, K.A.¹ Malony, A.D.² Morris, A.³

175
- 0001439727
- An elastic-viscous-plastic model for sea ice dynamics
- E.C. Hunke and J.K. Dukowicz. An elastic-viscous-plastic model for sea ice dynamics. Journal of Physical Oceanography, 27: 1849-1867, 1997.
- (1997) Journal of Physical Oceanography , vol.27 , pp. 1849-1867
- Hunke, E.C.¹ Dukowicz, J.K.²

176
- 47049122028
- Automatic tuning of PDGEMM towards optimal performance
- August
- S. Hunold and T. Rauber. Automatic tuning of PDGEMM towards optimal performance. In Proceedings European Conference on Parallel Computing, August 2005.
- (2005) Proceedings European Conference on Parallel Computing
- Hunold, S.¹ Rauber, T.²

177
- 33646765746
- Kranc: A Mathematica application to generate numerical codes for tensorial evolution equations
- S. Husa, I. Hinder, and C. Lechner. Kranc: A Mathematica application to generate numerical codes for tensorial evolution equations. Computer Physics Communications, 174: 983-1004, 2006.
- (2006) Computer Physics Communications , vol.174 , pp. 983-1004
- Husa, S.¹ Hinder, I.² Lechner, C.³

178
- 0030168832
- Lua - an extensible extension language
- June
- R. Ierusalimschy, L.H. de Figueiredo, and W.C. Filho. Lua - an extensible extension language. Software: Practice and Experience, 26: 635-652, June 1996.
- (1996) Software: Practice and Experience , vol.26 , pp. 635-652
- Ierusalimschy, R.¹ De Figueiredo, L.H.² Filho, W.C.³

179
- 34447569672
- part 2, number 253669-032us, September
- Intel Corporation. Intel 64 and IA-32 architectures software developer's manual, volume 3B: System programming guide, part 2, number 253669-032us, September 2009. http://www.intel.com/Assets/PDF/manual/253669.pdf
- (2009) Intel 64 and IA-32 architectures software developer's manual, volume 3B: System programming guide

180
- 27144518084
- An approach to performance prediction for parallel applications
- E. Ipek, B.R. de Supinski, M. Schulz, and S.A. McKee. An approach to performance prediction for parallel applications. In Euro-Par 2005 Parallel Processing, pages 196-205, 2005.
- (2005) Euro-Par 2005 Parallel Processing , pp. 196-205
- Ipek, E.¹ De Supinski, B.R.² Schulz, M.³ McKee, S.A.⁴

181
- 85054431030
- ITER: International thermonuclear experimental reactor.

182
- 85054461393
- HPC profiling with the Sun Studio(TM) performance tools
- Dresden, Germany, September
- M. Itzkowitz and Y. Maruyama. HPC profiling with the Sun Studio(TM) performance tools. In Third Parallel Tools Workshop, Dresden, Germany, September 2009.
- (2009) Third Parallel Tools Workshop
- Itzkowitz, M.¹ Maruyama, Y.²

183
- 0037595554
- Sheared poloidal flow driven by mode conversion in tokamak plasmas
- E. Jaeger, L. Berry, J. Myra, et al. Sheared poloidal flow driven by mode conversion in tokamak plasmas. Physical Review Letters, 90, 2003.
- (2003) Physical Review Letters , vol.90
- Jaeger, E.¹ Berry, L.² Myra, J.³

184
- 0028602950
- June
- J.A. Joines and C.R. Houck. On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GA’s. pages 579-584 vol.2, June 1994.
- (1994) On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GA’s , vol.2 , pp. 579-584
- Joines, J.A.¹ Houck, C.R.²

185
- 23244452422
- Practical performance portability in the Parallel Ocean Program (POP)
- P.W. Jones, P.H. Worley, Y. Yoshida, J.B. White III, and J. Levesque. Practical performance portability in the Parallel Ocean Program (POP). Concurrency and Computation: Practice and Experience, 17(10): 1317-1327, 2005.
- (2005) Concurrency and Computation: Practice and Experience , vol.17 , Issue.10 , pp. 1317-1327
- Jones, P.W.¹ Worley, P.H.² Yoshida, Y.³ White III, J.B.⁴ Levesque, J.⁵

186
- 0003561904
- Parallel multilevel k-way partitioning scheme for irregular graphs
- G. Karypis and V. Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. ACM/IEEE Proceedings of SC1996: High Performance Networking and Computing, 1996.
- (1996) ACM/IEEE Proceedings of SC1996: High Performance Networking and Computing
- Karypis, G.¹ Kumar, V.²

187
- 78149347218
- Predictive performance and scalability modeling of a large-scale application
- New York, NY, USA, ACM
- D.J. Kerbyson, H.J. Alme, A. Hoisie, F. Petrini, H.J. Wasserman, and M. Gittings. Predictive performance and scalability modeling of a large-scale application. In Proceedings of ACM/IEEE Conference on Supercomputing (SC01), page 37, New York, NY, USA, 2001. ACM.
- (2001) Proceedings of ACM/IEEE Conference on Supercomputing (SC01) , pp. 37
- Kerbyson, D.J.¹ Alme, H.J.² Hoisie, A.³ Petrini, F.⁴ Wasserman, H.J.⁵ Gittings, M.⁶

188
- 0031718804
- The National Center for Atmospheric Research Community Climate Model: CCM3
- J.T. Kiehl, J.J. Hack, G. Bonan, B.A. Boville, D.L. Williamson, and P.J. Rasch. The National Center for Atmospheric Research Community Climate Model: CCM3. Journal of Climate, 11: 1131-1149, 1998.
- (1998) Journal of Climate , vol.11 , pp. 1131-1149
- Kiehl, J.T.¹ Hack, J.J.² Bonan, G.³ Boville, B.A.⁴ Williamson, D.L.⁵ Rasch, P.J.⁶

189
- 35048895994
- Advanced simulation technique for modeling multiphase fluid flow in porous media
- J.G. Kim and H.W. Park. Advanced simulation technique for modeling multiphase fluid flow in porous media. In Computational Science and Its Applications - ICCSA 2004, LNCS 2004, by A. Lagana et al., pages 1-9, 2004.
- (2004) Computational Science and Its Applications - ICCSA 2004, LNCS 2004, by A. Lagana et al. , pp. 1-9
- Kim, J.G.¹ Park, H.W.²

190
- 0034512401
- Combined selection of tile sizes and unroll factors using iterative compilation
- Washington, DC, USA, IEEE Computer Society
- T. Kisuki, P.M.W. Knijnenburg, and M.F.P. O’Boyle. Combined selection of tile sizes and unroll factors using iterative compilation. In PACT '00: Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques, Washington, DC, USA, 2000. IEEE Computer Society.
- (2000) PACT '00: Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
- Kisuki, T.¹ Knijnenburg, P.M.W.² O’Boyle, M.F.P.³

191
- 33746635961
- Introducing the Open Trace Format (OTF)
- Reading, UK, May
- A. Knüpfer, R. Brendel, H. Brunst, H. Mix, and W.E. Nagel. Introducing the Open Trace Format (OTF). In Proceedings of the 6th International Conference on Computational Science, volume 3992 of Springer Lecture Notes in Computer Science, pages 526-533, Reading, UK, May 2006.
- (2006) Proceedings of the 6th International Conference on Computational Science, volume 3992 of Springer Lecture Notes in Computer Science , pp. 526-533
- Knüpfer, A.¹ Brendel, R.² Brunst, H.³ Mix, H.⁴ Nagel, W.E.⁵

192
- 33745136466
- Construction and compression of complete call graphs for post-mortem program trace analysis
- A. Knüpfer and W.E. Nagel. Construction and compression of complete call graphs for post-mortem program trace analysis. In Proceedings of the International Conference on Parallel Processing (ICPP), pages 165-172, 2005.
- (2005) Proceedings of the International Conference on Parallel Processing (ICPP) , pp. 165-172
- Knüpfer, A.¹ Nagel, W.E.²

193
- 24944580988
- Springer
- S-H Ko, K.W. Cho, Y.D. Song, Y.G. Kim, J-S Na, and C. Kim. Development of Cactus driver for CFD analyses in the grid computing environment, pages 771-777. Springer, 2005.
- (2005) Development of Cactus driver for CFD analyses in the grid computing environment , pp. 771-777
- Ko, S.-H.¹ Cho, K.W.² Song, Y.D.³ Kim, Y.G.⁴ Na, J.-S.⁵ Kim, C.⁶

194
- 7444248681
- Divorcing language dependencies from a scientific software library
- S. Kohn, G. Kumfert, J. Painter, and C. Ribbens. Divorcing language dependencies from a scientific software library. In Proceedings of the 10th SIAM Conference on Parallel Processing, 2001.
- (2001) Proceedings of the 10th SIAM Conference on Parallel Processing
- Kohn, S.¹ Kumfert, G.² Painter, J.³ Ribbens, C.⁴

195
- 4043140349
- Density functional and density matrix method scaling linearly with the number of atoms
- W. Kohn. Density functional and density matrix method scaling linearly with the number of atoms. Physical Review Letters, 76(17): 3168-3171, 1996.
- (1996) Physical Review Letters , vol.76 , Issue.17 , pp. 3168-3171
- Kohn, W.¹

196
- 0042113153
- Self-consistent equations including exchange and correlation effects
- W. Kohn and L.J. Sham. Self-consistent equations including exchange and correlation effects. Physical Review, 140: A1133, 1965.
- (1965) Physical Review , vol.140 , pp. A1133
- Kohn, W.¹ Sham, L.J.²

197
- 0242667172
- Optimization by direct search: New perspectives on some classical and modern methods
- T.G. Kolda, R.M. Lewis, and V. Torczon. Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45(3): 385-482, 2004.
- (2004) SIAM Review , vol.45 , Issue.3 , pp. 385-482
- Kolda, T.G.¹ Lewis, R.M.² Torczon, V.³

198
- 0029359304
- Comparison of initial value and eigenvalue codes for kinetic toroidal plasma instabilities
- August
- M. Kotschenreuther, G. Rewoldt, and W.M. Tang. Comparison of initial value and eigenvalue codes for kinetic toroidal plasma instabilities. Computer Physics Communications, 88: 128-140, August 1995.
- (1995) Computer Physics Communications , vol.88 , pp. 128-140
- Kotschenreuther, M.¹ Rewoldt, G.² Tang, W.M.³

199
- 85054429973
- Kranc: Automated code generation. http://www.cct.lsu.edu/~eschnett/Kranc

200
- 65549119644
- Quantum chromodynamics with advanced computing
- A.S. Kronfeld. Quantum chromodynamics with advanced computing. Journal of Physics: Conference Series, 125: 012067, 2008.
- (2008) Journal of Physics: Conference Series , vol.125 , pp. 012067
- Kronfeld, A.S.¹

201
- 34548776288
- PerfSuite: An accessible, open source performance analysis environment for Linux
- R. Kufrin. PerfSuite: An accessible, open source performance analysis environment for Linux. In Sixth International Conference on Linux Clusters (LCI), 2005.
- (2005) Sixth International Conference on Linux Clusters (LCI)
- Kufrin, R.¹

202
- 1442337776
- Finding effective optimization phase sequences
- P. Kulkarni, W. Zhao, H. Moon, K. Cho, D. Whalley, J. Davidson, M. Bailey, Y. Paek, and K. Gallivan. Finding effective optimization phase sequences. SIGPLAN Not., 38(7): 12-23, 2003.
- (2003) SIGPLAN Not. , vol.38 , Issue.7 , pp. 12-23
- Kulkarni, P.¹ Zhao, W.² Moon, H.³ Cho, K.⁴ Whalley, D.⁵ Davidson, J.⁶ Bailey, M.⁷ Paek, Y.⁸ Gallivan, K.⁹

203
- 48849094389
- Scalability of tracing and visualization tools
- Malaga
- J. Labarta, J. Gimenez, E. Martinez, P. Gonzalez, H. Servat, G. Llort, and X. Aguilar. Scalability of tracing and visualization tools. In Parallel Computing 2005, Malaga, 2005.
- (2005) Parallel Computing 2005
- Labarta, J.¹ Gimenez, J.² Martinez, E.³ Gonzalez, P.⁴ Servat, H.⁵ Llort, G.⁶ Aguilar, X.⁷

204
- 84947944896
- Dip: A parallel program development environment
- Lyon (France), August
- J. Labarta, S. Girona, V. Pillet, T. Cortes, and L. Gregoris. Dip: A parallel program development environment. In Proceedings of 2nd International EuroPar Conference (EuroPar 96), Lyon (France), August 1996.
- (1996) Proceedings of 2nd International EuroPar Conference (EuroPar 96)
- Labarta, J.¹ Girona, S.² Pillet, V.³ Cortes, T.⁴ Gregoris, L.⁵

205
- 0032251894
- Convergence properties of the Nelder-Mead simplex algorithm in low dimensions
- J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright. Convergence properties of the Nelder-Mead simplex algorithm in low dimensions. SIAM Journal on Optimization, 9: 112-147, 1998.
- (1998) SIAM Journal on Optimization , vol.9 , pp. 112-147
- Lagarias, J.C.¹ Reeds, J.A.² Wright, M.H.³ Wright, P.E.⁴

206
- 0028380268
- Rewriting executable files to measure program behavior
- J.R. Larus and T. Ball. Rewriting executable files to measure program behavior. Software Practice and Experience, 24(2): 197-218, 1994.
- (1994) Software Practice and Experience , vol.24 , Issue.2 , pp. 197-218
- Larus, J.R.¹ Ball, T.²

207
- 0003834102
- Prentice-Hall, Inc., Upper Saddle River, NJ, USA
- E.D. Lazowska, J. Zahorjan, G.S. Graham, and K.C. Sevcik. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1984.
- (1984) Quantitative System Performance: Computer System Analysis Using Queueing Network Models
- Lazowska, E.D.¹ Zahorjan, J.² Graham, G.S.³ Sevcik, K.C.⁴

208
- 70249083648
- From tensor equations to numerical code - computer algebra tools for numerical relativity
- C. Lechner, D. Alic, and S. Husa. From tensor equations to numerical code - computer algebra tools for numerical relativity. In SYNASC 2004 - 6th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania, 2004.
- (2004) SYNASC 2004 - 6th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania
- Lechner, C.¹ Alic, D.² Husa, S.³

209
- 34748909426
- Methods of inference and learning for performance modeling of parallel applications
- New York, NY, ACM
- B.C. Lee, D.M. Brooks, B.R. de Supinski, M. Schulz, K. Singh, and S.A. McKee. Methods of inference and learning for performance modeling of parallel applications. In PPoPP '07: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 249-258, New York, NY, 2007. ACM.
- (2007) PPoPP '07: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 249-258
- Lee, B.C.¹ Brooks, D.M.² De Supinski, B.R.³ Schulz, M.⁴ Singh, K.⁵ McKee, S.A.⁶

210
- 20844459296
- Gyrokinetic particle simulation model
- W.W. Lee. Gyrokinetic particle simulation model. Journal of Computational Physics, 72: 243-269, 1987.
- (1987) Journal of Computational Physics , vol.72 , pp. 243-269
- Lee, W.W.¹

211
- 33645434039
- A code isolator: Isolating code fragments from large programs
- September
- Y. Lee and M. Hall. A code isolator: Isolating code fragments from large programs. In Proceedings of the Seventeenth Workshop on Languages, Compilers for Parallel Computing (LCPC’04), September 2004.
- (2004) Proceedings of the Seventeenth Workshop on Languages, Compilers for Parallel Computing (LCPC’04)
- Lee, Y.¹ Hall, M.²

212
- 85054467691
- Dyninst as a binary rewriter
- M. Legendre. Dyninst as a binary rewriter. In Paradyn/Dyninst week, 2009. http://www.dyninst.org/pdWeek09/slides/legendre-binrewriter.pdf
- (2009) Paradyn/Dyninst week
- Legendre, M.¹

213
- 31844448677
- J. Levon and P. Elie. Oprofile: A system profiler for Linux. http://oprofile.sourceforge.net
- Oprofile: A system profiler for Linux
- Levon, J.¹ Elie, P.²

214
- 77954054624
- A note on auto-tuning GEMM for GPUs
- Baton Rouge, LA, May
- Y. Li, J. Dongarra, and S. Tomov. A note on auto-tuning GEMM for GPUs. In 9th International Conference on Computational Science (ICCS’09), Baton Rouge, LA, May 2009.
- (2009) 9th International Conference on Computational Science (ICCS’09)
- Li, Y.¹ Dongarra, J.² Tomov, S.³

215
- 85048177394
- Effective source-to-source outlining to support whole program empirical optimization
- October
- C. Liao, D.J. Quinlan, R. Vuduc, and T. Panas. Effective source-to-source outlining to support whole program empirical optimization. In Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC09), October 2009.
- (2009) Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC09)
- Liao, C.¹ Quinlan, D.J.² Vuduc, R.³ Panas, T.⁴

216
- 0037071357
- Size scaling of turbulent transport in magnetically confined plasmas
- Z. Lin, S. Ethier, T.S. Hahm, and W.M. Tang. Size scaling of turbulent transport in magnetically confined plasmas. Physical Review Letters, 88, 2002.
- (2002) Physical Review Letters , vol.88
- Lin, Z.¹ Ethier, S.² Hahm, T.S.³ Tang, W.M.⁴

217
- 0032544628
- Turbulent transport reduction by zonal flows: Massively parallel simulations
- September
- Z. Lin, T.S. Hahm, W.W. Lee, W.M. Tang, and R.B. White. Turbulent transport reduction by zonal flows: Massively parallel simulations. Science, 281(5384): 1835-1837, September 1998.
- (1998) Science , vol.281 , Issue.5384 , pp. 1835-1837
- Lin, Z.¹ Hahm, T.S.² Lee, W.W.³ Tang, W.M.⁴ White, R.B.⁵

218
- 84862321901
- A tool framework for static and dynamic analysis of object-oriented software with templates
- K.A. Lindlan, J. Cuny, A.D. Malony, S. Shende, B. Mohr, R. Rivenburgh, and C. Rasmussen. A tool framework for static and dynamic analysis of object-oriented software with templates. In Proceedings of ACM/IEEE Conference on Supercomputing (SC2000), 2000.
- (2000) Proceedings of ACM/IEEE Conference on Supercomputing (SC2000)
- Lindlan, K.A.¹ Cuny, J.² Malony, A.D.³ Shende, S.⁴ Mohr, B.⁵ Rivenburgh, R.⁶ Rasmussen, C.⁷

219
- 85054463495
- High-resolution peripheral quantitative computed tomography can assess microstructural and mechanical properties of human distal tibial bone
- press
- X.S. Liu, X.H. Zhang, K.K. Sekhon, M.F. Adam, D.J. McMahon, E. Shane, J.P. Bilezikian, and X.E. Guo. High-resolution peripheral quantitative computed tomography can assess microstructural and mechanical properties of human distal tibial bone. Journal of Bone and Mineral Research, in press.
- Journal of Bone and Mineral Research
- Liu, X.S.¹ Zhang, X.H.² Sekhon, K.K.³ Adam, M.F.⁴ McMahon, D.J.⁵ Shane, E.⁶ Bilezikian, J.P.⁷ Guo, X.E.⁸

220
- 77954020714
- On-line detection of large-scale parallel application’s structure
- April
- G. Llort, J. Gonzalez, H. Servat, J. Gimenez, and J. Labarta. On-line detection of large-scale parallel application’s structure. In IPDPS 2010, April 2010.
- (2010) IPDPS 2010
- Llort, G.¹ Gonzalez, J.² Servat, H.³ Gimenez, J.⁴ Labarta, J.⁵

221
- 31944440969
- Pin: Building customized program analysis tools with dynamic instrumentation
- C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In Proceedings of Programming Language Design and Implementation (PLDI), pages 191-200, 2005.
- (2005) Proceedings of Programming Language Design and Implementation (PLDI) , pp. 191-200
- Luk, C.K.¹ Cohn, R.² Muth, R.³ Patil, H.⁴ Klauser, A.⁵ Lowney, G.⁶ Wallace, S.⁷ Reddi, V.J.⁸ Hazelwood, K.⁹

222
- 0031164889
- Increasing the efficiency of ideal solar cells by photon induced transitions at intermediate levels
- A. Luque and A. Marti. Increasing the efficiency of ideal solar cells by photon induced transitions at intermediate levels. Physical Review Letters, 78: 5014, 1997.
- (1997) Physical Review Letters , vol.78 , pp. 5014
- Luque, A.¹ Marti, A.²

223
- 33645446819
- Lattice Boltzmann model for dissipative MHD
- Montreux, Switzerland, June 17-21
- A. Macnab, G. Vahala, L. Vahala, and P. Pavlo. Lattice Boltzmann model for dissipative MHD. In 29th EPS Conference on Controlled Fusion and Plasma Physics, volume 26B, Montreux, Switzerland, June 17-21, 2002.
- (2002) 29th EPS Conference on Controlled Fusion and Plasma Physics , vol.26B
- Macnab, A.¹ Vahala, G.² Vahala, L.³ Pavlo, P.⁴

224
- 33745859061
- Spatial hypersurfaces in causal set cosmology
- Jun
- S. Major, D. Rideout, and S. Surya. Spatial hypersurfaces in causal set cosmology. Classical and Quantum Gravity, 23: 4743-4752, Jun 2006.
- (2006) Classical and Quantum Gravity , vol.23 , pp. 4743-4752
- Major, S.¹ Rideout, D.² Surya, S.³

225
- 0002646447
- Kluwer, Norwell, MA
- A. Malony and S. Shende. Performance technology for complex parallel and distributed systems, pages 37-46. Kluwer, Norwell, MA, 2000.
- (2000) Performance technology for complex parallel and distributed systems , pp. 37-46
- Malony, A.¹ Shende, S.²

226
- 38049136248
- Phase-based parallel performance profiling
- Malaga, Spain, September
- A. Malony, S. Shende, and A. Morris. Phase-based parallel performance profiling. In ParCo 2005: Parallel Computing 2005, Malaga, Spain, September 2005.
- (2005) ParCo 2005: Parallel Computing 2005
- Malony, A.¹ Shende, S.² Morris, A.³

227
- 8344269521
- Cross-architecture performance predictions for scientific applications using parameterized models
- New York, NY
- G. Marin and J. Mellor-Crummey. Cross-architecture performance predictions for scientific applications using parameterized models. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2004), pages 2-13, New York, NY, 2004.
- (2004) Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2004) , pp. 2-13
- Marin, G.¹ Mellor-Crummey, J.²

228
- 77749249189
- Effective communication and computation overlap with hybrid MPI/SMPSs
- V. Marjanovic, J. Labarta, E. Ayguadé, and M. Valero. Effective communication and computation overlap with hybrid MPI/SMPSs. In Poster at PPoPP 2010, 2010.
- (2010) Poster at PPoPP 2010
- Marjanovic, V.¹ Labarta, J.² Ayguadé, E.³ Valero, M.⁴

229
- 70350727348
- Measuring how fast computers really are
- September
- J. Markoff. Measuring how fast computers really are. New York Times, page 14F, September 1991.
- (1991) New York Times , pp. 14
- Markoff, J.¹

230
- 85054455832
- Performance Measurement of Applications with GPU Acceleration using CUDA
- to appear
- S. Mayanglambam, A. Malony, and M. Sottile. Performance Measurement of Applications with GPU Acceleration using CUDA. In Parallel Computing (ParCo), 2009. to appear.
- (2009) Parallel Computing (ParCo)
- Mayanglambam, S.¹ Malony, A.² Sottile, M.³

231
- 0033098422
- Differential profiling
- P.E. McKenney. Differential profiling. Software: Practice and Experience, 29(3): 219-234, 1998.
- (1998) Software: Practice and Experience , vol.29 , Issue.3 , pp. 219-234
- McKenney, P.E.¹

232
- 0032252855
- Convergence of the Nelder-Mead simplex method to a nonstationary point
- K.I.M. McKinnon. Convergence of the Nelder-Mead simplex method to a nonstationary point. SIAM Journal on Optimization, 9(1): 148-158, 1998.
- (1998) SIAM Journal on Optimization , vol.9 , Issue.1 , pp. 148-158
- McKinnon, K.I.M.¹

233
- 85054461446
- a public BSSN code
- McLachlan, a public BSSN code.
- McLachlan¹

234
- 36049031758
- Harnessing the power of emerging petascale platforms
- June
- J. Mellor-Crummey. Harnessing the power of emerging petascale platforms. Journal of Physics: Conference Series, 78, June 2007.
- (2007) Journal of Physics: Conference Series , vol.78
- Mellor-Crummey, J.¹

235
- 0036679608
- HPCView: A tool for top-down analysis of node performance
- J. Mellor-Crummey, R.J. Fowler, G. Marin, and N. Tallent. HPCView: A tool for top-down analysis of node performance. Journal of Supercomputing, 23(1): 81-104, 2002.
- (2002) Journal of Supercomputing , vol.23 , Issue.1 , pp. 81-104
- Mellor-Crummey, J.¹ Fowler, R.J.² Marin, G.³ Tallent, N.⁴

236
- 33846529179
- Performance monitoring on the POWER5 microprocessor
- L.K. John and L. Eeckhout, CRC PRESS
- A. Mericas. Performance monitoring on the POWER5 microprocessor. In L.K. John and L. Eeckhout, editors, Performance Evaluation and Benchmarking, pages 247-266. CRC PRESS, 2006.
- (2006) Performance Evaluation and Benchmarking , pp. 247-266
- Mericas, A.¹

237
- 84886585478
- A. Mericas, et al. CPI analysis on POWER5, Part 2: Introducing the CPI breakdown model. https://www.ibm.com/developerworks/library/pa-cpipower2
- CPI analysis on POWER5, Part 2: Introducing the CPI breakdown model
- Mericas, A.¹

238
- 85054428712
- Message Passing Interface Forum. MPI: A Message Passing Interface Standard. International Journal of Supercomputer Applications (Special Issue on MPI), 8(3/4), 1994.
- (1994) International Journal of Supercomputer Applications (Special Issue on MPI) , vol.8 , Issue.3-4

239
- 84919437437
- Trace-driven co-simulation of high-performance computing systems using OMNeT++
- C. Mikenberg and G. Rodriguez. Trace-driven co-simulation of high-performance computing systems using OMNeT++. In 2nd International Workshop on OMNeT++, in conjunction with the 2nd International Conference on Simulation Tools and Techniques (SIMUTools’09), 2009.
- (2009) 2nd International Workshop on OMNeT++, in conjunction with the 2nd International Conference on Simulation Tools and Techniques (SIMUTools’09)
- Mikenberg, C.¹ Rodriguez, G.²

240
- 0037146399
- A Conservative Three-Dimensional Eulerian Method for Coupled Solid-Fluid Shock Capturing
- G.H. Miller and P. Colella. A Conservative Three-Dimensional Eulerian Method for Coupled Solid-Fluid Shock Capturing. Journal of Computational Physics, 183: 26-82, 2002.
- (2002) Journal of Computational Physics , vol.183 , pp. 26-82
- Miller, G.H.¹ Colella, P.²

241
- 83155177863
- Coping at the user-level with resource limitations in the Cray message passing toolkit MPI at scale: How not to spend your summer vacation
- R. Winget and K. Winget, editors, Eagan, MN, Cray User Group, Inc
- R. Mills, F. Hoffman, P. Worley, K. Perumalla, A. Mirin, G. Hammond, and B. Smith. Coping at the user-level with resource limitations in the Cray message passing toolkit MPI at scale: How not to spend your summer vacation. In R. Winget and K. Winget, editors, Proceedings of the 51st Cray User Group Conference, May 4-7, 2009, Eagan, MN, 2009. Cray User Group, Inc.
- (2009) Proceedings of the 51st Cray User Group Conference, May 4-7, 2009
- Mills, R.¹ Hoffman, F.² Worley, P.³ Perumalla, K.⁴ Mirin, A.⁵ Hammond, G.⁶ Smith, B.⁷

242
- 35348840289
- Block structured adaptive mesh and time refinement for hybrid, hyperbolic + n-body systems
- F. Miniati and P. Colella. Block structured adaptive mesh and time refinement for hybrid, hyperbolic + n-body systems. Journal of Computational Physics, 227: 400-430, 2007.
- (2007) Journal of Computational Physics , vol.227 , pp. 400-430
- Miniati, F.¹ Colella, P.²

243
- 36049045668
- Extending scalability of the Community Atmosphere Model
- A. Mirin and P. Worley. Extending scalability of the Community Atmosphere Model. Journal of Physics: Conference Series, 78, 2007. doi: 10.1088/1742-6596/78/1/012082
- (2007) Journal of Physics: Conference Series , vol.78
- Mirin, A.¹ Worley, P.²

244
- 23844539932
- A scalable implementation of a finite-volume dynamical core in the Community Atmosphere Model
- August
- A.A. Mirin and W.B. Sawyer. A scalable implementation of a finite-volume dynamical core in the Community Atmosphere Model. International Journal of High Performance Computing Applications, 19(3), August 2005.
- (2005) International Journal of High Performance Computing Applications , vol.19 , Issue.3
- Mirin, A.A.¹ Sawyer, W.B.²

245
- 33646403324
- Towards a performance tool interface for OpenMP: An approach based on directive rewriting
- B. Mohr, A.D. Malony, S. Shende, and F. Wolf. Towards a performance tool interface for OpenMP: An approach based on directive rewriting. In Proceedings of Third European Workshop on OpenMP.
- Proceedings of Third European Workshop on OpenMP
- Mohr, B.¹ Malony, A.D.² Shende, S.³ Wolf, F.⁴

246
- 35048890131
- KOJAK - A tool set for automatic performance analysis of parallel programs
- August
- B. Mohr and F. Wolf. KOJAK - A tool set for automatic performance analysis of parallel programs. In Procs. of the International Conference on Parallel and Distributed Computing (Euro-Par 2003). (Lecture notes in computer science; 2790), pages 1301-1304, August 2003.
- (2003) Procs. of the International Conference on Parallel and Distributed Computing (Euro-Par 2003). (Lecture notes in computer science; 2790) , pp. 1301-1304
- Mohr, B.¹ Wolf, F.²

247
- 0000793139
- Cramming more components onto integrated circuits
- April
- G.E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8), April 1965.
- (1965) Electronics , vol.38 , Issue.8
- Moore, G.E.¹

248
- 51849091556
- Observing performance dynamics using parallel profile snapshots
- Canary Island, Spain, August, Springer
- A. Morris, W. Spear, A. Malony, and S. Shende. Observing performance dynamics using parallel profile snapshots. In EuroPar 2008, volume LNCS 5168, pages 162-171, Canary Island, Spain, August 2008. Springer.
- (2008) EuroPar 2008, volume LNCS 5168 , pp. 162-171
- Morris, A.¹ Spear, W.² Malony, A.³ Shende, S.⁴

249
- 84869354999
- D. Mosberger-Tang. libunwind. http://www.nongnu.org/libunwind
- Libunwind
- Mosberger-Tang, D.¹

250
- 67650844203
- Producing wrong data without doing anything obviously wrong!
- New York, NY, USA, ACM
- T. Mytkowicz, A. Diwan, M. Hauswirth, and P.F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems , pp. 265-276
- Mytkowicz, T.¹ Diwan, A.² Hauswirth, M.³ Sweeney, P.F.⁴

251
- 0002438680
- VAMPIR: Visualization and Analysis of MPI Resources
- W.E. Nagel, A. Arnold, M. Weber, H-C. Hoppe, and K. Solchenbach. VAMPIR: Visualization and Analysis of MPI Resources. Supercomputer, 12(1): 69-80, 1996.
- (1996) Supercomputer , vol.12 , Issue.1 , pp. 69-80
- Nagel, W.E.¹ Arnold, A.² Weber, M.³ Hoppe, H.-C.⁴ Solchenbach, K.⁵

252
- 85054452978
- VAMPIR: Visualization and analysis of MPI resources
- W.E. Nagel, A. Arnold, M. Weber, H.C. Hoppe, and K. Solchenbach. VAMPIR: Visualization and analysis of MPI resources. The International Journal of Supercomputer Applications and High Performance Computing, 11(2): 144-159, 1997.
- (1997) The International Journal of Supercomputer Applications and High Performance Computing , vol.11 , Issue.2 , pp. 144-159
- Nagel, W.E.¹ Arnold, A.² Weber, M.³ Hoppe, H.C.⁴ Solchenbach, K.⁵

253
- 77954018807
- TAUoverMRNet (ToM): A framework for scalable parallel performance monitoring
- A. Nataraj, A. Malony, A. Morris, D. Arnold, and B. Miller. TAUoverMRNet (ToM): A framework for scalable parallel performance monitoring. In International Workshop on Scalable Tools for High-End Computing (STHEC '08), 2008.
- (2008) International Workshop on Scalable Tools for High-End Computing (STHEC '08)
- Nataraj, A.¹ Malony, A.² Morris, A.³ Arnold, D.⁴ Miller, B.⁵

254
- 40449120689
- Integrated parallel performance views
- A. Nataraj, A.D. Malony, S. Shende, and A. Morris. Integrated parallel performance views. Cluster Computing, 11(1): 57-73, 2008.
- (2008) Cluster Computing , vol.11 , Issue.1 , pp. 57-73
- Nataraj, A.¹ Malony, A.D.² Shende, S.³ Morris, A.⁴

255
- 56749181050
- The ghost in the machine: Observing the effects of kernel operation on parallel application performance
- Reno, Nevada, November 10-16
- A. Nataraj, A. Morris, A.D. Malony, M. Sottile, and P. Beckman. The ghost in the machine: Observing the effects of kernel operation on parallel application performance. In Proceedings of 2007 ACM/IEEE Conference on Supercomputing (SC2007), Reno, Nevada, November 10-16, 2007.
- (2007) Proceedings of 2007 ACM/IEEE Conference on Supercomputing (SC2007)
- Nataraj, A.¹ Morris, A.² Malony, A.D.³ Sottile, M.⁴ Beckman, P.⁵

256
- 51849107896
- TAUoverSupermon: Low-overhead online parallel performance monitoring
- A. Nataraj, M. Sottile, A. Morris, A.D. Malony, and S. Shende. TAUoverSupermon: Low-overhead online parallel performance monitoring. In Europar’07: European Conference on Parallel Processing, 2007.
- (2007) Europar’07: European Conference on Parallel Processing
- Nataraj, A.¹ Sottile, M.² Morris, A.³ Malony, A.D.⁴ Shende, S.⁵

257
- 85054443810
- National Center for Supercomputing Applications. Blue Waters hardware. http://www.ncsa.illinois.edu/BlueWaters/hardware.html
- Blue Waters hardware

258
- 0000238336
- A simplex method for function minimization
- J.A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7: 308-313, 1965.
- (1965) Computer Journal , vol.7 , pp. 308-313
- Nelder, J.A.¹ Mead, R.²

259
- 51049092126
- Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization
- April
- Y.L. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman. Model-guided performance tuning of parameter values: A case study with molecular dynamics visualization. IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), April 2008.
- (2008) IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008)
- Nelson, Y.L.¹ Bansal, B.² Hall, M.³ Nakano, A.⁴ Lerman, K.⁵

260
- 85054466071
- Real-time statistical clustering for event trace reduction
- O.Y. Nickolayev, P.C. Roth, and D.A. Reed. Real-time statistical clustering for event trace reduction. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing (SC08), pages 1-12, 2008.
- (2008) Proceedings of the 2008 ACM/IEEE conference on Supercomputing (SC08) , pp. 1-12
- Nickolayev, O.Y.¹ Roth, P.C.² Reed, D.A.³

261
- 67349187344
- Scalatrace: Scalable compression and replay of communication traces in high performance computing
- Aug
- M. Noeth, P. Ratn, F. Mueller, M. Schulz, and B. de Supinski. Scalatrace: Scalable compression and replay of communication traces in high performance computing. Journal of Parallel and Distributed Computing, 69(8): 696-710, Aug 2009.
- (2009) Journal of Parallel and Distributed Computing , vol.69 , Issue.8 , pp. 696-710
- Noeth, M.¹ Ratn, P.² Mueller, F.³ Schulz, M.⁴ De Supinski, B.⁵

262
- 0002081678
- Co-Array Fortran for parallel programming
- R.W. Numrich and J.K. Reid. Co-Array Fortran for parallel programming. ACM Fortran Forum, 17(2): 1-31, 1998.
- (1998) ACM Fortran Forum , vol.17 , Issue.2 , pp. 1-31
- Numrich, R.W.¹ Reid, J.K.²

263
- 85054449155
- Department of Energy
- July 30
- Office of Science, U.S. Department of Energy. A science-based case for large-scale simulation. http://www.pnl.gov/scales, July 30 2003.
- (2003) A science-based case for large-scale simulation

264
- 84934325826
- Scientific computations on modern parallel vector systems
- Washington, DC, USA, IEEE Computer Society
- L. Oliker, A. Canning, J. Carter, J. Shalf, and S. Ethier. Scientific computations on modern parallel vector systems. In Proceedings of ACM/IEEE Conference on Supercomputing (SC04), page 10, Washington, DC, USA, 2004. IEEE Computer Society.
- (2004) Proceedings of ACM/IEEE Conference on Supercomputing (SC04) , pp. 10
- Oliker, L.¹ Canning, A.² Carter, J.³ Shalf, J.⁴ Ethier, S.⁵

265
- 85054453885
- L. Oliker, A. Canning, and J. Carter, et al. Scientific application performance on candidate petascale platforms.
- Scientific application performance on candidate petascale platforms
- Oliker, L.¹ Canning, A.² Carter, J.³

266
- 84894478110
- March
- M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient deterministic multithreading in software. March 2009.
- (2009) Kendo: Efficient deterministic multithreading in software
- Olszewski, M.¹ Ansel, J.² Amarasinghe, S.³

267
- 0031123703
- From silicon to RNA: The coming of age of first-principle molecular dynamics
- M. Parrinello. From silicon to RNA: The coming of age of first-principle molecular dynamics. Solid State Communications, 103: 107, 1997.
- (1997) Solid State Communications , vol.103 , pp. 107
- Parrinello, M.¹

268
- 0003719406
- Morgan Kaufmann, San Francisco
- D.A. Patterson and J.L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, San Francisco, 2008.
- (2008) Computer Organization and Design: The Hardware/Software Interface
- Patterson, D.A.¹ Hennessy, J.L.²

269
- 11944256577
- Iterative minimization techniques for ab initio total-energy calculations: Molecular dynamics and conjugate gradients
- M.C. Payne, M.P. Teter, D.C. Allan, T.A. Arias, and J.D. Joannopoulos. Iterative minimization techniques for ab initio total-energy calculations: Molecular dynamics and conjugate gradients. Reviews of Modern Physics, 64: 1045, 1992.
- (1992) Reviews of Modern Physics , vol.64 , pp. 1045
- Payne, M.C.¹ Teter, M.P.² Allan, D.C.³ Arias, T.A.⁴ Joannopoulos, J.D.⁵

270
- 27144551353
- Using simpoint for accurate and efficient simulation
- E. Perelman, G. Hamerly, M.V. Biesbrouck, T. Sherwood, and B. Calder. Using simpoint for accurate and efficient simulation. ACM SIGMETRICS Performance Evaluation Review, 31: 318-319, 2003.
- (2003) ACM SIGMETRICS Performance Evaluation Review , vol.31 , pp. 318-319
- Perelman, E.¹ Hamerly, G.² Biesbrouck, M.V.³ Sherwood, T.⁴ Calder, B.⁵

271
- 85054451417
- SciDAC Performance Engineering Research Institute (PERI).

272
- 85054459001
- PETSc: Portable, extensible toolkit for scientific computation.

273
- 70649090070
- Victoria Falls: Scaling highly-threaded processor cores
- S. Phillips. Victoria Falls: Scaling highly-threaded processor cores. In HotChips 19, 2007.
- (2007) HotChips 19
- Phillips, S.¹

274
- 0028409163
- The NX message passing interface
- April
- P. Pierce. The NX message passing interface. Parallel Computing, 20(4): 463-480, April 1994.
- (1994) Parallel Computing , vol.20 , Issue.4 , pp. 463-480
- Pierce, P.¹

275
- 33751095034
- PARAVER: A tool to visualise and analyze parallel code
- Amsterdam, IOS Press
- V. Pillet, J. Labarta, T. Cortes, and S. Girona. PARAVER: A tool to visualise and analyze parallel code. In Proceedings of WoTUG-18: Transputer and occam Developments, volume 44, pages 17-31, Amsterdam, 1995. IOS Press.
- (1995) Proceedings of WoTUG-18: Transputer and occam Developments , vol.44 , pp. 17-31
- Pillet, V.¹ Labarta, J.² Cortes, T.³ Girona, S.⁴

276
- 33645436979
- Technical Report UPC-CEPBA 95-3, European Center for Parallelism of Barcelona (CEPBA), Universitat Politècnica de Catalunya (UPC)
- V. Pillet, J. Labarta, T. Cortes, and S. Girona. PARAVER: A tool to visualize and analyze parallel code. Technical Report UPC-CEPBA 95-3, European Center for Parallelism of Barcelona (CEPBA), Universitat Politècnica de Catalunya (UPC), 1995. http://tinyurl.com/paraver95
- (1995) PARAVER: A tool to visualize and analyze parallel code
- Pillet, V.¹ Labarta, J.² Cortes, T.³ Girona, S.⁴

277
- 0003789144
- Viking, New York
- S. Pinker. The Blank Slate: The Modern Denial of Human Nature. Viking, New York, 2002.
- (2002) The Blank Slate: The Modern Denial of Human Nature
- Pinker, S.¹

278
- 85054461952
- PLASMA project. http://icl.cs.utk.edu/plasma

279
- 84871295761
- Graphite: Polyhedral analyses and optimizations for gcc
- S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Silber, and N. Vasilache. Graphite: Polyhedral analyses and optimizations for gcc. In Proceedings of the 2006 GCC Developers Summit, page 2006, 2006.
- (2006) Proceedings of the 2006 GCC Developers Summit , pp. 2006
- Pop, S.¹ Cohen, A.² Bastoul, C.³ Girbal, S.⁴ Silber, G.⁵ Vasilache, N.⁶

280
- 14744298722
- Technical Report TR03-419, Rice University, October
- A. Qasem, G. Jin, and J. Mellor-Crummey. Improving performance with integrated program transformations. Technical Report TR03-419, Rice University, October 2003.
- (2003) Improving performance with integrated program transformations
- Qasem, A.¹ Jin, G.² Mellor-Crummey, J.³

281
- 34547401051
- Profitable loop fusion and tiling using model-driven empirical search
- June
- A. Qasem and K. Kennedy. Profitable loop fusion and tiling using model-driven empirical search. In Proceedings of the 2006 ACM International Conference on Supercomputing, June 2006.
- (2006) Proceedings of the 2006 ACM International Conference on Supercomputing
- Qasem, A.¹ Kennedy, K.²

282
- 85054441534
- Coefficient of determination. mathbits.com/mathbits/tisection/statistics2/correlation.htm

283
- 57349170105
- Preserving time in large-scale communication traces
- June
- P. Ratn, F. Mueller, M. Schulz, and B. de Supinski. Preserving time in large-scale communication traces. In International Conference on Supercomputing, pages 46-55, June 2008.
- (2008) International Conference on Supercomputing , pp. 46-55
- Ratn, P.¹ Mueller, F.² Schulz, M.³ De Supinski, B.⁴

284
- 67650060203
- Rice University. HPCToolkit performance tools. http://hpctoolkit.org
- HPCToolkit performance tools

285
- 33846164822
- Evidence for an entropy bound from fundamentally discrete gravity
- D. Rideout and S. Zohren. Evidence for an entropy bound from fundamentally discrete gravity. Classical and Quantum Gravity, 2006.
- (2006) Classical and Quantum Gravity
- Rideout, D.¹ Zohren, S.²

286
- 84877034501
- MRNet: A software-based multicast/reduction network for scalable tools
- IEEE Computer Society
- P.C. Roth, D.C. Arnold, and B.P. Miller. MRNet: A software-based multicast/reduction network for scalable tools. In International Conference on Supercomputing, pages 21-36. IEEE Computer Society, 2003.
- (2003) International Conference on Supercomputing , pp. 21-36
- Roth, P.C.¹ Arnold, D.C.² Miller, B.P.³

287
- 79952608531
- Master’s thesis, May
- G. Rudy. CUDA-CHiLL: A programming language interface for GPGPU optimizations and code generation. Master’s thesis, May 2010.
- (2010) CUDA-CHiLL: A programming language interface for GPGPU optimizations and code generation
- Rudy, G.¹

288
- 84863064747
- Technical Report RC24351 W0709-061, IBM Research Division
- V. Salapura, K. Ganesan, A. Gara, M. Gschwind, J. Sexton, and R. Walkup. Next-generation performance counters: Towards monitoring over a thousand concurrent events. Technical Report RC24351 W0709-061, IBM Research Division, 2007.
- (2007) Next-generation performance counters: Towards monitoring over a thousand concurrent events
- Salapura, V.¹ Ganesan, K.² Gara, A.³ Gschwind, M.⁴ Sexton, J.⁵ Walkup, R.⁶

289
- 70249124202
- CBHPC 2008 (Component-Based High Performance Computing) (accepted)
- E. Schnetter. Multi-physics coupling of Einstein and hydrodynamics evolution: A case study of the Einstein Toolkit. CBHPC 2008 (Component-Based High Performance Computing) (accepted), 2008.
- (2008) Multi-physics coupling of Einstein and hydrodynamics evolution: A case study of the Einstein Toolkit
- Schnetter, E.¹

290
- 33746604824
- A multi-block infrastructure for three-dimensional time-dependent numerical relativity
- E. Schnetter, P. Diener, E.N. Dorband, and M. Tiglio. A multi-block infrastructure for three-dimensional time-dependent numerical relativity. Classical and Quantum Gravity, 23: S553-S578, 2006.
- (2006) Classical and Quantum Gravity , vol.23 , pp. S553-S578
- Schnetter, E.¹ Diener, P.² Dorband, E.N.³ Tiglio, M.⁴

291
- 1842479966
- Evolutions in 3D numerical relativity using fixed mesh refinement
- E. Schnetter, S.H. Hawley, and I. Hawke. Evolutions in 3D numerical relativity using fixed mesh refinement. Classical and Quantum Gravity, 21: 1465-1488, 2004.
- (2004) Classical and Quantum Gravity , vol.21 , pp. 1465-1488
- Schnetter, E.¹ Hawley, S.H.² Hawke, I.³

292
- 34548192076
- Optical properties of ZnO/ZnS and ZnO/ZnTe heterostructures for photovoltaic applications
- J. Schrier, D.O. Demchenko, L.-W. Wang, and A.P. Alivisatos. Optical properties of ZnO/ZnS and ZnO/ZnTe heterostructures for photovoltaic applications. Nano Lett., 7: 2377, 2007.
- (2007) Nano Lett. , vol.7 , pp. 2377
- Schrier, J.¹ Demchenko, D.O.² Wang, L.-W.³ Alivisatos, A.P.⁴

293
- 34547489425
- A flexible and dynamic infrastructure for MPI tool interoperability
- M. Schulz and B.R. de Supinski. A flexible and dynamic infrastructure for MPI tool interoperability. In Proceedings of ICPP 2006, pages 193-202, 2006.
- (2006) Proceedings of ICPP 2006 , pp. 193-202
- Schulz, M.¹ De Supinski, B.R.²

294
- 56749160395
- pnMPI tools: A whole lot greater than the sum of their parts
- M. Schulz and B.R. de Supinski. pnMPI tools: A whole lot greater than the sum of their parts. In Proceedings of SC07, 2007.
- (2007) Proceedings of SC07
- Schulz, M.¹ De Supinski, B.R.²

295
- 85054464259
- Report of the High-End Computing Revitalization Task Force (HECRTF)
- National Science and Technology Council Committee on Technology High-End Computing Revitalization Task Force. Report of the High-End Computing Revitalization Task Force (HECRTF). 2004.
- (2004) Report of the High-End Computing Revitalization Task Force (HECRTF)

296
- 33645982477
- Technical Report ZHR-R-0304, Dresden University of Technology, Center for High-Performance Computing, Nov
- S. Seidl. VTF3 - A fast Vampir trace file low-level management library. Technical Report ZHR-R-0304, Dresden University of Technology, Center for High-Performance Computing, Nov 2003.
- (2003) VTF3 - A fast Vampir trace file low-level management library
- Seidl, S.¹

297
- 80155189010
- Detailed performance analysis using coarse grain sampling
- H. Servat, G. Llort, J. Gimenez, and J. Labarta. Detailed performance analysis using coarse grain sampling. In 2nd Workshop on Productivity and Performance (PROPER 2009), 2009.
- (2009) 2nd Workshop on Productivity and Performance (PROPER 2009)
- Servat, H.¹ Llort, G.² Gimenez, J.³ Labarta, J.⁴

298
- 0013500375
- PhD thesis, University of Oregon, August
- S. Shende. The Role of Instrumentation and Mapping in Performance Measurement. PhD thesis, University of Oregon, August 2001.
- (2001) The Role of Instrumentation and Mapping in Performance Measurement
- Shende, S.¹

299
- 38049043035
- Springer
- S. Shende, A. Malony, and A. Morris. Optimization of Instrumentation in Parallel Performance Evaluation Tools, volume 4699 of LNCS, pages 440-449. Springer, 2008.
- (2008) Optimization of Instrumentation in Parallel Performance Evaluation Tools, volume 4699 of LNCS , pp. 440-449
- Shende, S.¹ Malony, A.² Morris, A.³

300
- 33645998439
- The TAU parallel performance system
- Summer
- S. Shende and A.D. Malony. The TAU parallel performance system. The International Journal of High Performance Computing Applications, 20(2): 287-331, Summer 2006.
- (2006) The International Journal of High Performance Computing Applications , vol.20 , Issue.2 , pp. 287-331
- Shende, S.¹ Malony, A.D.²

301
- 0031635137
- Portable Profiling and Tracing for Parallel Scientific Applications using C++
- S. Shende, A.D. Malony, J. Cuny, K. Lindlan, P. Beckman, and S. Karmesin. Portable Profiling and Tracing for Parallel Scientific Applications using C++. In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, SPDT’98, pages 134-145, 1998.
- (1998) Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, SPDT’98 , pp. 134-145
- Shende, S.¹ Malony, A.D.² Cuny, J.³ Lindlan, K.⁴ Beckman, P.⁵ Karmesin, S.⁶

302
- 84947296432
- A Performance Interface for Component-Based Applications
- S. Shende, A.D. Malony, C. Rasmussen, and M. Sottile. A Performance Interface for Component-Based Applications. In Proceedings of International Workshop on Performance Modeling, Evaluation and Optimization, International Parallel and Distributed Processing Symposium, 2003.
- (2003) Proceedings of International Workshop on Performance Modeling, Evaluation and Optimization, International Parallel and Distributed Processing Symposium
- Shende, S.¹ Malony, A.D.² Rasmussen, C.³ Sottile, M.⁴

303
- 85054444397
- Autotuning and specialization: Speeding up Nek5000 with compiler technology
- June
- J. Shin, M.W. Hall, J. Chame, C. Chen, P. Fischer, and P.D. Hovland. Autotuning and specialization: Speeding up Nek5000 with compiler technology. In Proceedings of the International Conference on Supercomputing, June 2010.
- (2010) Proceedings of the International Conference on Supercomputing
- Shin, J.¹ Hall, M.W.² Chame, J.³ Chen, C.⁴ Fischer, P.⁵ Hovland, P.D.⁶

304
- 79958257802
- Autotuning and specialization: Speeding up matrix multiply for small matrices with compiler technology
- October
- J. Shin, M.W. Hall, J. Chame, C. Chen, and P.D. Hovland. Autotuning and specialization: Speeding up matrix multiply for small matrices with compiler technology. In The Fourth International Workshop on Automatic Performance Tuning, October 2009.
- (2009) The Fourth International Workshop on Automatic Performance Tuning
- Shin, J.¹ Hall, M.W.² Chame, J.³ Chen, C.⁴ Hovland, P.D.⁵

305
- 77954733934
- Real time power estimation of multi-cores via performance counters
- November
- K. Singh, M. Bhadauria, and S.A. McKee. Real time power estimation of multi-cores via performance counters. Proceedings of Workshop on Design, Architecture and Simulation of Chip Multi-Processors, November 2008.
- (2008) Proceedings of Workshop on Design, Architecture and Simulation of Chip Multi-Processors
- Singh, K.¹ Bhadauria, M.² McKee, S.A.³

306
- 35948986416
- Predicting parallel application performance via machine learning approaches
- K. Singh, E. Ipek, S.A. McKee, B.R. de Supinski, M. Schulz, and R. Caruana. Predicting parallel application performance via machine learning approaches. Concurrency and Computation: Practice and Experience, 19(17): 2219-2235, 2007.
- (2007) Concurrency and Computation: Practice and Experience , vol.19 , Issue.17 , pp. 2219-2235
- Singh, K.¹ Ipek, E.² McKee, S.A.³ De Supinski, B.R.⁴ Schulz, M.⁵ Caruana, R.⁶

307
- 56749144383
- Technical Report LBNL-5503, Lawrence Berkeley National Laboratory
- D. Skinner. Performance monitoring of parallel scientific applications. Technical Report LBNL-5503, Lawrence Berkeley National Laboratory, 2005.
- (2005) Performance monitoring of parallel scientific applications
- Skinner, D.¹

308
- 22944475131
- Morgan Kaufmann Publishers Inc., San Francisco, CA
- A. Sloss, D. Symes, and C. Wright. ARM System Developer’s Guide: Designing and Optimizing System Software. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2004.
- (2004) ARM System Developer’s Guide: Designing and Optimizing System Software
- Sloss, A.¹ Symes, D.² Wright, C.³

309
- 0017949328
- A comparative study of set associative memory mapping algorithms and their use for cache and main memory
- A.J. Smith. A comparative study of set associative memory mapping algorithms and their use for cache and main memory. IEEE Transactions on Software Engineering, (2): 121-130.
- IEEE Transactions on Software Engineering , Issue.2 , pp. 121-130
- Smith, A.J.¹

310
- 44049110107
- Parallel ocean general circulation modeling
- R.D. Smith, J.K. Dukowicz, and R.C. Malone. Parallel ocean general circulation modeling. Phys. D, 60(1-4): 38-61, 1992.
- (1992) Phys. D , vol.60 , Issue.1-4 , pp. 38-61
- Smith, R.D.¹ Dukowicz, J.K.² Malone, R.C.³

311
- 0242505770
- A framework for application performance modeling and prediction
- A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia, and A. Purkayastha. A framework for application performance modeling and prediction. In Proceedings of ACM/IEEE Conference on Supercomputing (SC02), 2002.
- (2002) Proceedings of ACM/IEEE Conference on Supercomputing (SC02)
- Snavely, A.¹ Carrington, L.² Wolter, N.³ Labarta, J.⁴ Badia, R.⁵ Purkayastha, A.⁶

312
- 33845456969
- Performance modeling of HPC applications
- October
- A. Snavely, X. Gao, C. Lee, N. Wolter, J. Labarta, J. Gimenez, and P. Jones. Performance modeling of HPC applications. Proceedings of the Parallel Computing Conference 2003, October 2003.
- (2003) Proceedings of the Parallel Computing Conference 2003
- Snavely, A.¹ Gao, X.² Lee, C.³ Wolter, N.⁴ Labarta, J.⁵ Gimenez, J.⁶ Jones, P.⁷

313
- 10044276950
- An Algebra for Cross-Experiment Performance Analysis
- August
- F. Song, F. Wolf, N. Bhatia, J. Dongarra, and S. Moore. An Algebra for Cross-Experiment Performance Analysis. In Proceedings of International Conference on Parallel Processing (ICPP-04), August 2004.
- (2004) Proceedings of International Conference on Parallel Processing (ICPP-04)
- Song, F.¹ Wolf, F.² Bhatia, N.³ Dongarra, J.⁴ Moore, S.⁵

314
- 85054454559
- SPIRAL project. http://www.spiral.net

315
- 0036652569
- Pentium 4 performance-monitoring features
- B. Sprunt. Pentium 4 performance-monitoring features. IEEE Micro, 22(4): 72-82, 2002.
- (2002) IEEE Micro , vol.22 , Issue.4 , pp. 72-82
- Sprunt, B.¹

316
- 0028132513
- Atom: A system for building customized program analysis tools
- Orlando, FL, June
- A. Srivastava and A. Eustace. Atom: A system for building customized program analysis tools. In Proceedings of the SIGPLAN 94 Conf. on Programming Language Design and Implementation, pages 196-205, Orlando, FL, June 1994.
- (1994) Proceedings of the SIGPLAN 94 Conf. on Programming Language Design and Implementation , pp. 196-205
- Srivastava, A.¹ Eustace, A.²

317
- 85054430208
- STREAM: Sustainable memory bandwidth in high performance computers. http://www.cs.virginia.edu/stream

318
- 16244377113
- Architecture independent performance characterization and benchmarking for scientific applications
- October
- E. Strohmaier and H. Shan. Architecture independent performance characterization and benchmarking for scientific applications. In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, October 2004.
- (2004) International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
- Strohmaier, E.¹ Shan, H.²

319
- 33845468993
- Apex-MAP: A global data access benchmark to analyze HPC systems and parallel programming paradigms
- E. Strohmaier and H. Shan. Apex-MAP: A global data access benchmark to analyze HPC systems and parallel programming paradigms. In Proceedings of 2005 ACM/IEEE Conference on Supercomputing (SC05), 2005.
- (2005) Proceedings of 2005 ACM/IEEE Conference on Supercomputing (SC05)
- Strohmaier, E.¹ Shan, H.²

320
- 0346569455
- Technical Report PSC-Sandia-FR-3.0, Pittsburgh Supercomputing Center, PA
- R. Subramanya and R. Reddy. Sandia DNS code for 3D compressible flows - Final Report. Technical Report PSC-Sandia-FR-3.0, Pittsburgh Supercomputing Center, PA, 2000.
- (2000) Sandia DNS code for 3D compressible flows - Final Report
- Subramanya, R.¹ Reddy, R.²

321
- 85054452679
- Sun Microsystems. Sun Studio Performance Analyzer. http://developers.sun.com/sunstudio/overview/topics/analyzing.jsp, 2009.
- (2009) Sun Studio Performance Analyzer

322
- 33845443250
- Parallel Parameter Tuning for Applications with Performance Variability
- Washington, DC, IEEE Computer Society
- V. Tabatabaee, A. Tiwari, and J.K. Hollingsworth. Parallel Parameter Tuning for Applications with Performance Variability. In SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page 57, Washington, DC, 2005. IEEE Computer Society.
- (2005) SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing , pp. 57
- Tabatabaee, V.¹ Tiwari, A.² Hollingsworth, J.K.³

323
- 84892209892
- task 4 report - earth system modeling framework survey
- B. Talbot, S. Zhou, and G. Higgins. Review of the Cactus framework: Software engineering support of the third round of scientific grand challenge investigations, task 4 report - earth system modeling framework survey.
- Review of the Cactus framework: Software engineering support of the third round of scientific grand challenge investigations
- Talbot, B.¹ Zhou, S.² Higgins, G.³

324
- 74049095154
- Diagnosing performance bottlenecks in emerging petascale applications
- New York, NY, USA, ACM
- N. Tallent, J. Mellor-Crummey, L. Adhianto, M. Fagan, and M. Krentel. Diagnosing performance bottlenecks in emerging petascale applications. In Proceedings of ACM/IEEE Conference on Supercomputing (SC09), pages 1-11, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of ACM/IEEE Conference on Supercomputing (SC09) , pp. 1-11
- Tallent, N.¹ Mellor-Crummey, J.² Adhianto, L.³ Fagan, M.⁴ Krentel, M.⁵

325
- 78650837195
- Scalable identification of load imbalance in parallel executions using call path profiles
- New York, NY, November, ACM
- N.R. Tallent, L. Adhianto, and J. Mellor-Crummey. Scalable identification of load imbalance in parallel executions using call path profiles. In Proceedings of ACM/IEEE Conference on Supercomputing (SC10), New York, NY, November 2010. ACM.
- (2010) Proceedings of ACM/IEEE Conference on Supercomputing (SC10)
- Tallent, N.R.¹ Adhianto, L.² Mellor-Crummey, J.³

326
- 67650034867
- Effective performance measurement and analysis of multithreaded applications
- New York, NY, USA, ACM
- N.R. Tallent and J. Mellor-Crummey. Effective performance measurement and analysis of multithreaded applications. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 229-240, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 229-240
- Tallent, N.R.¹ Mellor-Crummey, J.²

327
- 67650837951
- Binary analysis for measurement and attribution of program performance
- New York, NY, USA, ACM
- N.R. Tallent, J. Mellor-Crummey, and M.W. Fagan. Binary analysis for measurement and attribution of program performance. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 441-452, New York, NY, USA, 2009. ACM.
- (2009) Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 441-452
- Tallent, N.R.¹ Mellor-Crummey, J.² Fagan, M.W.³

328
- 77957574504
- Analyzing lock contention in multithreaded applications
- N.R. Tallent, J. Mellor-Crummey, and A. Porterfield. Analyzing lock contention in multithreaded applications. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010.
- (2010) Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Tallent, N.R.¹ Mellor-Crummey, J.² Porterfield, A.³

329
- 77957022710
- Technical Report CCT-TR-2008-5, Louisiana State University
- J. Tao, G. Allen, I. Hinder, E. Schnetter, and Y. Zlochower. XiRel: Standard benchmarks for numerical relativity codes using Cactus and Carpet. Technical Report CCT-TR-2008-5, Louisiana State University, 2008.
- (2008) XiRel: Standard benchmarks for numerical relativity codes using Cactus and Carpet
- Tao, J.¹ Allen, G.² Hinder, I.³ Schnetter, E.⁴ Zlochower, Y.⁵

330
- 23944471086
- Prophesy: An infrastructure for performance analysis and modeling of parallel and grid applications
- V. Taylor, X. Wu, and R. Stevens. Prophesy: An infrastructure for performance analysis and modeling of parallel and grid applications. SIGMETRICS Perform. Eval. Rev., 30(4): 13-18, 2003.
- (2003) SIGMETRICS Perform. Eval. Rev. , vol.30 , Issue.4 , pp. 13-18
- Taylor, V.¹ Wu, X.² Stevens, R.³

331
- 85054432382
- The Parallel Ocean Program. http://climate.lanl.gov/Models/POP

332
- 18544378971
- The R Foundation for Statistical Computing. R project for statistical computing. http://www.r-project.org, 2007.
- (2007) R project for statistical computing

333
- 0345421747
- sixth edition, May
- K. Thompson and D.M. Ritchie. Unix programmers manual, sixth edition, May 1975.
- (1975) Unix programmers manual
- Thompson, K.¹ Ritchie, D.M.²

334
- 0002862950
- Gravitational Radiation - a New Window Onto the Universe. (Karl Schwarzschild Lecture 1996)
- K.S. Thorne. Gravitational Radiation - a New Window Onto the Universe. (Karl Schwarzschild Lecture 1996). Reviews of Modern Astronomy, 10: 1-28, 1997.
- (1997) Reviews of Modern Astronomy , vol.10 , pp. 1-28
- Thorne, K.S.¹

335
- 56749185811
- A genetic algorithm approach to modeling the performance of memory-bound computations
- M.M. Tikir, L. Carrington, E. Strohmaier, and A. Snavely. A genetic algorithm approach to modeling the performance of memory-bound computations. In Proceedings of ACM/IEEE Conference on Supercomputing (SC07), 2007.
- (2007) Proceedings of ACM/IEEE Conference on Supercomputing (SC07)
- Tikir, M.M.¹ Carrington, L.² Strohmaier, E.³ Snavely, A.⁴

336
- 0034375534
- The accuracy, consistency, and speed of an electron-positron equation of state based on table interpolation of the Helmholtz free energy
- F.X. Timmes and F.D. Swesty. The accuracy, consistency, and speed of an electron-positron equation of state based on table interpolation of the Helmholtz free energy. Astrophysical Journal, Supplement, 126: 501-516, 2000.
- (2000) Astrophysical Journal, Supplement , vol.126 , pp. 501-516
- Timmes, F.X.¹ Swesty, F.D.²

337
- 70449844310
- A scalable autotuning framework for compiler optimization
- April
- A. Tiwari, C. Chen, J. Chame, M. Hall, and J.K. Hollingsworth. A scalable autotuning framework for compiler optimization. In Proceedings of the 24th International Parallel and Distributed Processing Symposium, April 2009.
- (2009) Proceedings of the 24th International Parallel and Distributed Processing Symposium
- Tiwari, A.¹ Chen, C.² Chame, J.³ Hall, M.⁴ Hollingsworth, J.K.⁵

338
- 75849140925
- Technical Report UT-CS-08-632, University of Tennessee, LAPACK Working Note 210
- S. Tomov, J. Dongarra, and M. Baboulin. Towards dense linear algebra for hybrid GPU accelerated manycore systems. Technical Report UT-CS-08-632, University of Tennessee, 2008. LAPACK Working Note 210.
- (2008) Towards dense linear algebra for hybrid GPU accelerated manycore systems
- Tomov, S.¹ Dongarra, J.² Baboulin, M.³

339
- 84858922896
- University of Oregon. TAU Portable Profiling. http://tau.uoregon.edu
- TAU Portable Profiling

340
- 85054446997
- University of Oregon. TAU Portal. http://tau.nic.uoregon.edu
- TAU Portal

341
- 0036036949
- Dynamic statistical profiling of communication activity in distributed applications
- New York, NY, USA, ACM
- J. Vetter. Dynamic statistical profiling of communication activity in distributed applications. In Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 240-250, New York, NY, USA, 2002. ACM.
- (2002) Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems , pp. 240-250
- Vetter, J.¹

342
- 38049063674
- J. Vetter and C. Chambreau. mpiP: Lightweight, scalable MPI profiling. http://www.llnl.gov/CASC/mpip
- mpiP: Lightweight, scalable MPI profiling
- Vetter, J.¹ Chambreau, C.²

343
- 38049063674
- April
- J.S. Vetter and C. Chambreau. mpiP: Lightweight, scalable MPI profiling, April 2005. http://www.llnl.gov/CASC/mpip
- (2005) mpiP: Lightweight, scalable MPI profiling
- Vetter, J.S.¹ Chambreau, C.²

344
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- IEEE, to appear
- V. Volkov and J. Demmel. Benchmarking GPUs to tune dense linear algebra. In Supercomputing 08. IEEE, 2008. to appear.
- (2008) Supercomputing 08
- Volkov, V.¹ Demmel, J.²

345
- 84949504639
- ADAPT: Automated de-coupled adaptive program transformation
- M.J. Voss and R. Eigenmann. ADAPT: Automated de-coupled adaptive program transformation. Parallel Processing, 2000. Proceedings. 2000 International Conference on, 2000.
- (2000) Parallel Processing, 2000. Proceedings. 2000 International Conference on
- Voss, M.J.¹ Eigenmann, R.²

346
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- Institute of Physics Publishing, June
- R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series. Institute of Physics Publishing, June 2005.
- (2005) Proceedings of SciDAC 2005, Journal of Physics: Conference Series
- Vuduc, R.¹ Demmel, J.² Yelick, K.³

347
- 24344485098
- OSKI: A library of automatically tuned sparse matrix kernels
- June
- R. Vuduc, J. Demmel, and K. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series, 16: 521-530, June 2005.
- (2005) Journal of Physics: Conference Series , vol.16 , pp. 521-530
- Vuduc, R.¹ Demmel, J.² Yelick, K.³

348
- 84990706303
- Parallelizing the spectral transform method. Part II
- October
- D.W. Walker, P.H. Worley, and J.B. Drake. Parallelizing the spectral transform method. Part II. Concurrency: Practice and Experience, 4(7): 509-531, October 1992.
- (1992) Concurrency: Practice and Experience , vol.4 , Issue.7 , pp. 509-531
- Walker, D.W.¹ Worley, P.H.² Drake, J.B.³

349
- 84947558486
- SPMD OpenMP vs MPI for ocean models
- Lund, Sweden, Lund University
- A.J. Wallcraft. SPMD OpenMP vs MPI for ocean models. In Proceedings of the First European Workshop on OpenMP, Lund, Sweden, 1999. Lund University. http://www.it.lth.se/ewomp99
- (1999) Proceedings of the First European Workshop on OpenMP
- Wallcraft, A.J.¹

350
- 70449505546
- Linearly scaling 3D fragment method for large-scale electronic structure calculations
- L.-W. Wang, B. Lee, H. Shan, Z. Zhao, J. Meza, E. Strohmaier, and D. Bailey. Linearly scaling 3D fragment method for large-scale electronic structure calculations. Proceedings of ACM/IEEE Conference on Supercomputing (SC08), 2008.
- (2008) Proceedings of ACM/IEEE Conference on Supercomputing (SC08)
- Wang, L.-W.¹ Lee, B.² Shan, H.³ Zhao, Z.⁴ Meza, J.⁵ Strohmaier, E.⁶ Bailey, D.⁷

351
- 42749103540
- First-principles thousand-atoms quantum dot calculations
- L.-W. Wang and J. Li. First-principles thousand-atoms quantum dot calculations. Physical Review B, 69: 153302, 2004.
- (2004) Physical Review B , vol.69 , pp. 153302
- Wang, L.-W.¹ Li, J.²

352
- 42049097273
- Linear scaling three-dimensional fragment method for large-scale electronic structure calculations
- L.-W. Wang, Z. Zhao, and J. Meza. Linear scaling three-dimensional fragment method for large-scale electronic structure calculations. Physical Review B, 77: 165113, 2008.
- (2008) Physical Review B , vol.77 , pp. 165113
- Wang, L.-W.¹ Zhao, Z.² Meza, J.³

353
- 0001604458
- Solving Schrödinger’s equation around a desired energy: Application to silicon quantum dots
- L.-W. Wang and A. Zunger. Solving Schrödinger’s equation around a desired energy: Application to silicon quantum dots. Journal of Chemical Physics, 100: 2394, 1994.
- (1994) Journal of Chemical Physics , vol.100 , pp. 2394
- Wang, L.-W.¹ Zunger, A.²

354
- 70449490133
- L.W. Wang. Parallel planewave pseudopotential ab initio package, 2004. http://hpcrd.lbl.gov/~linwang/PEtot/PEtot.html
- (2004) Parallel planewave pseudopotential ab initio package
- Wang, L.W.¹

355
- 0000761743
- T.A. Weaver, G.B. Zimmerman, and S.E. Woosley. Presupernova evolution of massive stars. 225: 1021-1029, 1978.
- (1978) Presupernova evolution of massive stars , vol.225 , pp. 1021-1029
- Weaver, T.A.¹ Zimmerman, G.B.² Woosley, S.E.³

356
- 56449095414
- September
- V.M. Weaver and S.A. McKee. Can hardware performance counters be trusted? pages 141-150, September 2008.
- (2008) Can hardware performance counters be trusted? , pp. 141-150
- Weaver, V.M.¹ McKee, S.A.²

357
- 33845417137
- Quantifying locality in the memory access patterns of HPC applications
- Nov
- J. Weinberg, M.O. McCracken, E. Strohmaier, and A. Snavely. Quantifying locality in the memory access patterns of HPC applications. Proceedings of ACM/IEEE Conference on Supercomputing (SC05), pages 50-61, Nov. 2005.
- (2005) Proceedings of ACM/IEEE Conference on Supercomputing (SC05) , pp. 50-61
- Weinberg, J.¹ McCracken, M.O.² Strohmaier, E.³ Snavely, A.⁴

358
- 85054453781
- Atlas version 3.8: Status and overview
- Tokyo, Japan, September
- R.C. Whaley. Atlas version 3.8: Status and overview. In International Workshop on Automatic Performance Tuning (iWAPT07), Tokyo, Japan, September 2007.
- (2007) International Workshop on Automatic Performance Tuning (iWAPT07)
- Whaley, R.C.¹

359
- 0003278639
- Automatically tuned linear algebra software
- November
- R.C. Whaley and J. Dongarra. Automatically tuned linear algebra software. In Proceedings of Supercomputing '98, November 1998.
- (1998) Proceedings of Supercomputing '98
- Whaley, R.C.¹ Dongarra, J.²

360
- 84943297310
- Automatically tuned linear algebra software
- R.C. Whaley and J.J. Dongarra. Automatically tuned linear algebra software. In SuperComputing, 1998.
- (1998) SuperComputing
- Whaley, R.C.¹ Dongarra, J.J.²

361
- 0343462141
- Automated empirical optimization of software and the ATLAS project
- R.C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1-2): 3-35, 2001.
- (2001) Parallel Computing , vol.27 , Issue.1-2 , pp. 3-35
- Whaley, R.C.¹ Petitet, A.² Dongarra, J.³

362
- 65649090648
- PhD thesis, EECS Department, University of California, Berkeley, Dec
- S. Williams. Auto-tuning Performance on Multicore Computers. PhD thesis, EECS Department, University of California, Berkeley, Dec 2008.
- (2008) Auto-tuning Performance on Multicore Computers
- Williams, S.¹

363
- 51049106193
- Lattice Boltzmann simulation optimization on leading multicore platforms
- Miami, FL
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Lattice Boltzmann simulation optimization on leading multicore platforms. In International Conference on Parallel and Distributed Computing Systems (IPDPS), Miami, FL, 2008.
- (2008) International Conference on Parallel and Distributed Computing Systems (IPDPS)
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

364
- 67650998701
- Lattice Boltzmann simulation optimization on leading multicore platforms
- S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick. Lattice Boltzmann simulation optimization on leading multicore platforms. Journal of Parallel and Distributed Computing, 69(9): 762-777, 2009.
- (2009) Journal of Parallel and Distributed Computing , vol.69 , Issue.9 , pp. 762-777
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

365
- 56749158843
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proceedings of ACM/IEEE Conference on Supercomputing (SC07), 2007.
- (2007) Proceedings of ACM/IEEE Conference on Supercomputing (SC07)
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

366
- 60949098907
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing - Special Issue on Revolutionary Technologies for Acceleration of Emerging Petascale Applications, 35(3): 178-194, 2008.
- (2008) Parallel Computing - Special Issue on Revolutionary Technologies for Acceleration of Emerging Petascale Applications , vol.35 , Issue.3 , pp. 178-194
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

367
- 68949198052
- The roofline model: A pedagogical tool for auto-tuning kernels on multicore architectures
- August
- S. Williams, D. Patterson, L. Oliker, J. Shalf, and K. Yelick. The roofline model: A pedagogical tool for auto-tuning kernels on multicore architectures. In IEEE HotChips Symposium on High-Performance Chips (HotChips 2008), August 2008.
- (2008) IEEE HotChips Symposium on High-Performance Chips (HotChips 2008)
- Williams, S.¹ Patterson, D.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

368
- 67650797544
- Roofline: An insightful visual performance model for floating-point programs and multicore architectures
- April
- S. Williams, A. Waterman, and D. Patterson. Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Communications of the ACM, April 2009.
- (2009) Communications of the ACM
- Williams, S.¹ Waterman, A.² Patterson, D.³

369
- 0004039521
- NCAR Tech. Note NCAR/TN-210+STR, NTIS PB83 231068, National Center for Atmospheric Research, Boulder, Colo
- D. L. Williamson. Description of NCAR Community Climate Model (CCM0B). NCAR Tech. Note NCAR/TN-210+STR, NTIS PB83 231068, National Center for Atmospheric Research, Boulder, Colo., 1983.
- (1983) Description of NCAR Community Climate Model (CCM0B)
- Williamson, D.L.¹

370
- 0004039521
- NCAR Tech. Note NCAR/TN-285+STR, NTIS PB87-203782/AS, June
- D.L. Williamson, J.T. Kiehl, V. Ramanathan, R.E. Dickinson, and J.J. Hack. Description of NCAR community climate model (CCM1). NCAR Tech. Note NCAR/TN-285+STR, NTIS PB87-203782/AS, June 1987.
- (1987) Description of NCAR community climate model (CCM1)
- Williamson, D.L.¹ Kiehl, J.T.² Ramanathan, V.³ Dickinson, R.E.⁴ Hack, J.J.⁵

371
- 0003957032
- Morgan Kaufmann
- I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
- (2005) Data Mining: Practical Machine Learning Tools and Techniques
- Witten, I.¹ Frank, E.²

372
- 33646137721
- Efficient pattern search in large traces through successive refinement
- Springer
- F. Wolf, B. Mohr, J. Dongarra, and S. Moore. Efficient pattern search in large traces through successive refinement. In Proceedings of the European Conference on Parallel Computing (EuroPar 2004, LNCS 3149), pages 47-54. Springer, 2004.
- (2004) Proceedings of the European Conference on Parallel Computing (EuroPar 2004, LNCS 3149) , pp. 47-54
- Wolf, F.¹ Mohr, B.² Dongarra, J.³ Moore, S.⁴

373
- 84885411868
- Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications
- Stuttgart, Germany, July, Springer. 978-3-540-68561-6
- F. Wolf, B. Wylie, E. Ábrahám, D. Becker, W. Frings, K. Fürlinger, M. Geimer, M. Hermanns, B. Mohr, S. Moore, M. Pfeifer, and Z. Szebenyi. Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications. In Proceedings of the 2nd HLRS Parallel Tools Workshop, pages 157-167, Stuttgart, Germany, July 2008. Springer. ISBN 978-3-540-68561-6.
- (2008) Proceedings of the 2nd HLRS Parallel Tools Workshop , pp. 157-167
- Wolf, F.¹ Wylie, B.² Ábrahám, E.³ Becker, D.⁴ Frings, W.⁵ Fürlinger, K.⁶ Geimer, M.⁷ Hermanns, M.⁸ Mohr, B.⁹ Moore, S.¹⁰ Pfeifer, M.¹¹ Szebenyi, Z.¹²

374
- 85054458024
- Performance of the Community Atmosphere Model on the Cray X1E and XT3
- R. Winget and K. Winget, editors, Eagan, MN, Cray User Group, Inc
- P. Worley. Performance of the Community Atmosphere Model on the Cray X1E and XT3. In R. Winget and K. Winget, editors, Proceedings of the 48th Cray User Group Conference, May 8-11, 2006, Eagan, MN, 2006. Cray User Group, Inc.
- (2006) Proceedings of the 48th Cray User Group Conference, May 8-11, 2006
- Worley, P.¹

375
- 85054459662
- June, Poster Presentation at the 13th Annual CCSM Workshop, June 17-19, 2008, Breckenridge, CO
- P. Worley and A. Mirin. Performance Results for the new CAM Benchmark Suite, June 2008. Poster Presentation at the 13th Annual CCSM Workshop, June 17-19, 2008, Breckenridge, CO.
- (2008) Performance Results for the new CAM Benchmark Suite
- Worley, P.¹ Mirin, A.²

376
- 33749065293
- Performance engineering in the community atmosphere model
- P. Worley, A. Mirin, J. Drake, and W. Sawyer. Performance engineering in the community atmosphere model. Journal of Physics: Conference Series, 46: 356-362, 2006. doi: 10.1088/1742-6596/46/1/050
- (2006) Journal of Physics: Conference Series , vol.46 , pp. 356-362
- Worley, P.¹ Mirin, A.² Drake, J.³ Sawyer, W.⁴

377
- 0346941076
- MPI performance evaluation and characterization using a compact application benchmark code
- IEEE Computer Society Press, Los Alamitos, CA
- P.H. Worley. MPI performance evaluation and characterization using a compact application benchmark code. In Proceedings of the Second MPI Developers Conference and Users’ Meeting, pages 170-177. IEEE Computer Society Press, Los Alamitos, CA, 1996.
- (1996) Proceedings of the Second MPI Developers Conference and Users’ Meeting , pp. 170-177
- Worley, P.H.¹

378
- 85054450271
- Scaling the unscalable: A case study on the AlphaServer SC
- P.H. Worley. Scaling the unscalable: A case study on the AlphaServer SC. In Proceedings of ACM/IEEE Conference on Supercomputing (SC02). 2002.
- (2002) Proceedings of ACM/IEEE Conference on Supercomputing (SC02)
- Worley, P.H.¹

379
- 84883330114
- Benchmarking using the Community Atmosphere Model
- Warrenton, VA, The Standard Performance Evaluation Corp
- P.H. Worley. Benchmarking using the Community Atmosphere Model. In Proceedings of the 2006 SPEC Benchmark Workshop, January 23, 2006, Warrenton, VA, 2006. The Standard Performance Evaluation Corp.
- (2006) Proceedings of the 2006 SPEC Benchmark Workshop, January 23, 2006
- Worley, P.H.¹

380
- 84990717295
- Parallelizing the spectral transform method
- June
- P.H. Worley and J.B. Drake. Parallelizing the spectral transform method. Concurrency: Practice and Experience, 4(4): 269-291, June 1992.
- (1992) Concurrency: Practice and Experience , vol.4 , Issue.4 , pp. 269-291
- Worley, P.H.¹ Drake, J.B.²

381
- 23844503894
- Performance portability in the physical parameterizations of the Community Atmosphere Model
- August
- P.H. Worley and J.B. Drake. Performance portability in the physical parameterizations of the Community Atmosphere Model. International Journal of High Performance Computing Applications, 19(3): 1-15, August 2005.
- (2005) International Journal of High Performance Computing Applications , vol.19 , Issue.3 , pp. 1-15
- Worley, P.H.¹ Drake, J.B.²

382
- 0028565417
- Parallel spectral transform shallow water model: A runtime-tunable parallel benchmark code
- J. J. Dongarra and D. W. Walker, editors, IEEE Computer Society Press, Los Alamitos, CA
- P.H. Worley and I.T. Foster. Parallel spectral transform shallow water model: a runtime-tunable parallel benchmark code. In J. J. Dongarra and D. W. Walker, editors, Proceedings of the Scalable High Performance Computing Conference, pages 207-214. IEEE Computer Society Press, Los Alamitos, CA, 1994.
- (1994) Proceedings of the Scalable High Performance Computing Conference , pp. 207-214
- Worley, P.H.¹ Foster, I.T.²

383
- 0002372482
- Algorithm comparison and benchmarking using a parallel spectral transform shallow water model
- G.-R. Hoffman and N. Kreitz, editors, World Scientific Publishing Co. Pte. Ltd., Singapore
- P.H. Worley, I.T. Foster, and B. Toonen. Algorithm comparison and benchmarking using a parallel spectral transform shallow water model. In G.-R. Hoffman and N. Kreitz, editors, Coming of Age: Proceedings of the Sixth ECMWF Workshop on Use of Parallel Processors in Meteorology, pages 277-289. World Scientific Publishing Co. Pte. Ltd., Singapore, 1995.
- (1995) Coming of Age: Proceedings of the Sixth ECMWF Workshop on Use of Parallel Processors in Meteorology , pp. 277-289
- Worley, P.H.¹ Foster, I.T.² Toonen, B.³

384
- 25144441529
- The performance evolution of the Parallel Ocean Program on the Cray X1
- R. Winget and K. Winget, editors, Eagan, MN, Cray User Group, Inc
- P.H. Worley and J. Levesque. The performance evolution of the Parallel Ocean Program on the Cray X1. In R. Winget and K. Winget, editors, Proceedings of the 46th Cray User Group Conference, May 17-21, 2004, Eagan, MN, 2004. Cray User Group, Inc.
- (2004) Proceedings of the 46th Cray User Group Conference, May 17-21, 2004
- Worley, P.H.¹ Levesque, J.²

385
- 85052019260
- From trace generation to visualization: A performance framework for distributed parallel systems
- November
- C.E. Wu, A. Bolmarcich, M. Snir, D. Wootton, F. Parpia, A. Chan, E. Lusk, and W. Gropp. From trace generation to visualization: A performance framework for distributed parallel systems. In Proceedings of ACM/IEEE Conference on Supercomputing (SC00), November 2000.
- (2000) Proceedings of ACM/IEEE Conference on Supercomputing (SC00)
- Wu, C.E.¹ Bolmarcich, A.² Snir, M.³ Wootton, D.⁴ Parpia, F.⁵ Chan, A.⁶ Lusk, E.⁷ Gropp, W.⁸

386
- 18844422753
- SPL: A language and compiler for DSP algorithms
- June
- J. Xiong, J. Johnson, R. Johnson, and D. Padua. SPL: A language and compiler for DSP algorithms. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2001.
- (2001) Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation
- Xiong, J.¹ Johnson, J.² Johnson, R.³ Padua, D.⁴

387
- 34548765138
- POET: Parameterized optimizations for empirical tuning
- March
- Q. Yi, K. Seymour, H. You, R. Vuduc, and D. Quinlan. POET: parameterized optimizations for empirical tuning. In Proceedings of the 21st International Parallel and Distributed Processing Symposium, March 2007.
- (2007) Proceedings of the 21st International Parallel and Distributed Processing Symposium
- Yi, Q.¹ Seymour, K.² You, H.³ Vuduc, R.⁴ Quinlan, D.⁵

388
- 20744459570
- Is search really necessary to generate high-performance BLAS?
- K. Yotov, X. Li, G. Ren, M.J. Garzarán, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? Proceedings of the IEEE, 93(2): 358-386, 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 358-386
- Yotov, K.¹ Li, X.² Ren, G.³ Garzarán, M.J.⁴ Padua, D.⁵ Pingali, K.⁶ Stodghill, P.⁷

389
- 0942279071
- Diluted II-VI oxide semiconductors with multiple band gaps
- K.M. Yu, W. Walukiewicz, J. Wu, W. Shan, J.W. Beeman, M.A. Scarpulla, O.D. Dubon, and P. Becla. Diluted II-VI oxide semiconductors with multiple band gaps. Physical Review Letters, 91: 246403, 2003.
- (2003) Physical Review Letters , vol.91 , pp. 246403
- Yu, K.M.¹ Walukiewicz, W.² Wu, J.³ Shan, W.⁴ Beeman, J.W.⁵ Scarpulla, M.A.⁶ Dubon, O.D.⁷ Becla, P.⁸

390
- 47249088157
- A divide and conquer linear scaling three dimensional fragment method for large scale electronic structure calculations
- Z. Zhao, J. Meza, and L.-W. Wang. A divide and conquer linear scaling three dimensional fragment method for large scale electronic structure calculations. Journal of Physics: Condensed Matter, 20(294203), 2008.
- (2008) Journal of Physics: Condensed Matter , vol.20
- Zhao, Z.¹ Meza, J.² Wang, L.-W.³

391
- 70449635864
- Model-guided autotuning of high-productivity languages for petascale computing
- May
- H. Zima, M. Hall, C. Chen, and J. Chame. Model-guided autotuning of high-productivity languages for petascale computing. In Proceedings of the Symposium on High Performance Distributed Computing, May 2009.
- (2009) Proceedings of the Symposium on High Performance Distributed Computing
- Zima, H.¹ Hall, M.² Chen, C.³ Chame, J.⁴

392
- 44649089660
- Multipatch methods in general relativistic astrophysics - hydrodynamical flows on fixed backgrounds
- B. Zink, E. Schnetter, and M. Tiglio. Multipatch methods in general relativistic astrophysics - hydrodynamical flows on fixed backgrounds. Physical Review D, 77: 103015, 2008.
- (2008) Physical Review D , vol.77 , pp. 103015
- Zink, B.¹ Schnetter, E.² Tiglio, M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.