Year: 2010 (volume, issue, and pages not listed)

Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method

Author keywords

[No Author keywords available]

Indexed keywords

AUTOMATED PERFORMANCE ANALYSIS; END USER PROGRAMMERS; FAST MULTIPOLE METHOD; MULTI CORE; MULTI-CORE SYSTEMS; MULTI-THREADED IMPLEMENTATION; PERFORMANCE ANALYSIS;

EID: 78650822594     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/SC.2010.19     Document Type: Conference Paper
Times cited: 28

References (47)
  • 3
    • Scopus EID: 20744459570
    • K. Yotov, X. Li, G. Ren, M. J. Garzarán, D. Padua, K. Pingali, and P. Stodghill, "Is search really necessary to generate high-performance BLAS?" Proc. IEEE, vol. 93, no. 2, pp. 358-386, February 2005.
  • 5
    • Scopus EID: 44249094647
    • K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix multiplication," ACM Trans. Mathematical Software (TOMS), vol. 34, no. 3, article 12, 25 pp., May 2008.
  • 6
    • Scopus EID: 48149094931
    • L. Peng, J.-K. Peir, T. K. Prakash, C. Staelin, Y.-K. Chen, and D. Koppelman, "Memory hierarchy performance measurement of commercial dual-core desktop processors," J. Sys. Arch., vol. 54, no. 8, pp. 816-828, August 2008.
  • 9
    • Scopus EID: 65949107549
    • S. Williams, A. Waterman, and D. Patterson, "Roofline: An insightful visual performance model for multicore architectures," Comm. ACM (CACM), vol. 52, no. 4, pp. 65-76, April 2009.
  • 10
    • Scopus EID: 72049130291
    • F. Song, S. Moore, and J. Dongarra, "Analytical modeling and optimization for affinity based thread scheduling on multicore systems," in Proc. IEEE Int'l. Conf. Cluster Computing (CLUSTER), New Orleans, LA, USA, October 2009.
  • 12
    • Scopus EID: 78650812184
    • S. R. Alam, N. Bhatia, and J. S. Vetter, "An exploration of performance attributes for symbolic modeling of emerging processing devices," in Proc. High Performance Computation Conference (HPCC), Houston, TX, USA, September 2007. [Online]. Available: http://www2.cs.uh.edu/~openuh/hpcc07/papers/64-Alam.pdf
  • 14
    • Scopus EID: 56749185811
    • M. M. Tikir, L. Carrington, E. Strohmaier, and A. Snavely, "A genetic algorithms approach to modeling the performance of memory-bound computations," in Proc. ACM/IEEE Conf. Supercomputing (SC), no. 47, Reno, NV, USA, November 2007.
  • 17
    • Scopus EID: 0000396658
    • L. Greengard and V. Rokhlin, "A fast algorithm for particle simulations," J. Comp. Phys., vol. 73, pp. 325-348, 1987.
  • 18
    • Scopus EID: 2442446356
    • L. Ying, D. Zorin, and G. Biros, "A kernel-independent adaptive fast multipole method in two and three dimensions," J. Comp. Phys., vol. 196, pp. 591-626, May 2004.
  • 21
    • Scopus EID: 0037482408
    • J. Board and K. Schulten, "The fast multipole algorithm," Computing in Science and Engineering, vol. 2, no. 1, pp. 76-79, January/February 2000.
  • 23
    • Scopus EID: 77956199814
    • P. Ram, D. Lee, W. March, and A. Gray, "Linear-time algorithms for pairwise statistical problems," in Proc. Advances in Neural Information Processing Systems (NIPS), Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds. MIT Press, 2009, pp. 1527-1535. [Online]. Available: http://books.nips.cc/papers/files/nips22/NIPS20090436.pdf
  • 24
    • Scopus EID: 78650847982
    • Intel VTune™ Performance Analyzer, April 2010. [Online]. Available: http://software.intel.com/en-us/intel-vtune/
  • 25
    • Scopus EID: 55749098636
    • J. Levon, "OProfile manual," 2004. [Online]. Available: http://oprofile.sourceforge.net/doc/index.html
  • 27
    • Scopus EID: 78650835381
    • Open|SpeedShop™, version 1.9.3, October 2009. [Online]. Available: http://www.openspeedshop.org/wp/
  • 29
    • Scopus EID: 85084160699
    • L. McVoy and C. Staelin, "lmbench: Portable tools for performance analysis," in Proc. USENIX Ann. Technical Conf., San Diego, CA, USA, January 1996. [Online]. Available: http://lmbench.sourceforge.net/
  • 30
    • Scopus EID: 0038998034
    • J. McCalpin, "Memory bandwidth and machine balance in high performance computers," IEEE Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995. [Online]. Available: http://www.cs.virginia.edu/~mccalpin/papers/balance/index.html
  • 32
    • Scopus EID: 37249081465
    • O. Coulaud, P. Fortin, and J. Roman, "High performance BLAS formulation of the multipole-to-local operator in the fast multipole method," J. Comp. Phys., vol. 227, no. 3, pp. 1836-1862, January 2008.
  • 33
    • Scopus EID: 84877033732
    • L. Ying, G. Biros, D. Zorin, and H. Langston, "A new parallel kernel-independent fast multipole method," in Proc. ACM/IEEE Conf. Supercomputing (SC), Phoenix, AZ, USA, November 2003. [Online]. Available: http://portal.acm.org/citation.cfm?id=1050165
  • 34
    • Scopus EID: 48149107858
    • N. A. Gumerov and R. Duraiswami, "Fast multipole methods on graphics processors," J. Comp. Phys., vol. 227, pp. 8290-8313, 2008.
  • 35
    • Scopus EID: 70350754499
    • J. C. Phillips, J. E. Stone, and K. Schulten, "Adapting a message-driven parallel application to GPU-accelerated clusters," in Proc. ACM/IEEE Conf. Supercomputing (SC), Austin, TX, USA, November 2008.
  • 36
    • Scopus EID: 77953999937
    • P. Ajmera, R. Goradia, S. Chandran, and S. Aluru, "Fast, parallel, GPU-based construction of space filling curves and octrees," in Proc. Symp. Interactive 3D Graphics (I3D), Redwood City, CA, USA, 2008 (poster).
  • 37
    • Scopus EID: 74049152899
    • T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori, and M. Taiji, "42 TFlops hierarchical n-body simulations on GPUs with applications in both astrophysics and turbulence," in Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009.
  • 38
    • Scopus EID: 19944419779
    • J. Kurzak and B. M. Pettitt, "Massively parallel implementation of a fast multipole method for distributed memory machines," J. Parallel Distrib. Comput., vol. 65, pp. 870-881, July 2005.
  • 39
    • Scopus EID: 0038825209
    • S. Ogata, T. J. Campbell, R. K. Kalia, A. Nakano, P. Vashishta, and S. Vemparala, "Scalable and portable implementation of the fast multipole method on parallel computers," Computer Phys. Comm., vol. 153, no. 3, pp. 445-461, July 2003.
  • 40
    • Scopus EID: 0027747808
    • M. S. Warren and J. K. Salmon, "A parallel hashed oct-tree n-body algorithm," in Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 1993, pp. 12-21.
  • 41
    • Scopus EID: 18844402673
    • B. Hariharan and S. Aluru, "Efficient parallel algorithms and software for compressed octrees with applications to hierarchical methods," Parallel Computing (ParCo), vol. 31, no. 3-4, pp. 311-331, March-April 2005.
  • 42
    • Scopus EID: 85060036181
    • G. M. Amdahl, "Validity of the single processor approach to achieving large-scale computing capabilities," in Proc. AFIPS Joint Computer Conf., vol. 30, Atlantic City, NJ, USA, April 1967, pp. 483-485.
  • 45
    • Scopus EID: 51049101584
    • A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, "A class of parallel tiled linear algebra algorithms for multicore architectures," Innovative Computing Laboratory, University of Tennessee Knoxville, Tech. Rep. UT-CS-07-600 (LAPACK Working Note 191), September 2007. [Online]. Available: http://www.netlib.org/lapack/lawnspdf/lawn191.pdf


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.