SCOPUS Information Retrieval Platform

Proceedings of the 6th ACM Conference on Computing Frontiers, CF 2009

2009, Pages 71-80

Mapping the LU decomposition on a many-core architecture: Challenges and solutions

Authors (2): Venetis, Ioannis E. (a); Gao, Guang R. (b)

(a) University of Patras (Greece)

(b) University of Delaware (United States)

Author keywords

Load balancing; Local memory; LU decomposition; Multi core; Register tiling

Indexed keywords

ADAPTIVE LOAD DISTRIBUTION; LOCAL MEMORY; LU DECOMPOSITION; MANY-CORE ARCHITECTURE; MULTI CORE; MULTICORE ARCHITECTURES; PERFORMANCE POTENTIALS; REGISTER TILING

COMPUTER SCIENCE; RESOURCE ALLOCATION

MEMORY ARCHITECTURE

EID: 84885779509
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/1531743.1531756
Document Type: Conference Paper

Times cited: 26

References (24)

1
- 35348885705
- LAPACK users' guide
- 3rd edition
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, 3rd edition, 1999.
- (1999) SIAM
- Anderson, E.¹ Bai, Z.² Bischof, C.³ Blackford, S.⁴ Demmel, J.⁵ Dongarra, J.⁶ Du Croz, J.⁷ Greenbaum, A.⁸ Hammarling, S.⁹ McKenney, A.¹⁰ Sorensen, D.¹¹

2
- 35648995516
- EECS Department, University of California, Dec.
- K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Dec. 2006.
- (2006) The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB/EECS-2006-183
- Asanovic, K.¹ Bodik, R.² Catanzaro, B.C.³ Gebis, J.J.⁴ Husbands, P.⁵ Keutzer, K.⁶ Patterson, D.A.⁷ Plishker, W.L.⁸ Shalf, J.⁹ Williams, S.W.¹⁰ Yelick, K.A.¹¹

3
- 51049101584
- LAPACK Working Note 194, Nov.
- A. Buttari, J. Langou, J. Kurzak, and J. Dongarra. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures. LAPACK Working Note 194, Nov. 2007.
- (2007) A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
- Buttari, A.¹ Langou, J.² Kurzak, J.³ Dongarra, J.⁴

4
- 0025447908
- Improving register allocation for subscripted variables
- White Plains, NY, June
- D. Callahan, S. Carr, and K. Kennedy. Improving Register Allocation for Subscripted Variables. In Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, pages 53-65, White Plains, NY, June 1990.
- (1990) Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation , pp. 53-65
- Callahan, D.¹ Carr, S.² Kennedy, K.³

5
- 34548753903
- T. Chen, R. Raghavan, J. Dale, and E. Iwata. Cell Broadband Engine Architecture and its First Implementation: A Performance View. http://www-128.ibm.com/developerworks/power/library/pa-cellperf.
- Cell Broadband Engine Architecture and Its First Implementation: A Performance View
- Chen, T.¹ Raghavan, R.² Dale, J.³ Iwata, E.⁴

6
- 84885772106
- Clearspeed White Paper: CSX Processor Architecture. http://www.clearspeed.com/newsevents/presskit.
- CSX Processor Architecture

7
- 84885802150
- FAST: A functionally accurate simulation toolset for the Cyclops-64 cellular architecture
- Madison, Wisconsin, June 2005
- J. del Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture. In Proceedings of the 2005 Workshop on Modeling, Benchmarking, and Simulation (MoBS 2005), Madison, Wisconsin, June 2005.
- Proceedings of the 2005 Workshop on Modeling, Benchmarking, and Simulation (MoBS 2005)
- Del Cuvillo, J.¹ Zhu, W.² Hu, Z.³ Gao, G.R.⁴

8
- 33746317085
- TiNy threads: A thread virtual machine for the Cyclops64 cellular architecture
- Denver, Apr.
- J. del Cuvillo, W. Zhu, Z. Hu, and G. R. Gao. TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture. In Proceedings of the 5th Workshop on Massively Parallel Processing, Denver, Apr. 2005.
- (2005) Proceedings of the 5th Workshop on Massively Parallel Processing
- Del Cuvillo, J.¹ Zhu, W.² Hu, Z.³ Gao, G.R.⁴

9
- 0025402476
- A set of level 3 basic linear algebra subprograms
- J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff. A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, 1990.
- (1990) ACM Transactions on Mathematical Software , vol.16 , Issue.1 , pp. 1-17
- Dongarra, J.J.¹ Du Croz, J.² Hammarling, S.³ Duff, I.S.⁴

10
- 0029324485
- Software libraries for linear algebra computations on high performance computers
- J. J. Dongarra and D. W. Walker. Software Libraries for Linear Algebra Computations on High Performance Computers. SIAM Review, 37(2):151-180, 1995.
- (1995) SIAM Review , vol.37 , Issue.2 , pp. 151-180
- Dongarra, J.J.¹ Walker, D.W.²

11
- 36049006000
- The push of network processing to the top of the pyramid
- Princeton, NJ
- W. Eatherton. The Push of Network Processing to the Top of the Pyramid. Keynote at the Symposium on Architectures for Networking and Communication Systems, Princeton, NJ.
- Keynote at the Symposium on Architectures for Networking and Communication Systems
- Eatherton, W.¹

12
- 85034086461
- On the problem of optimizing data transfers for complex memory systems
- St. Malo, France
- K. Gallivan, W. Jalby, and D. Gannon. On the problem of optimizing data transfers for complex memory systems. In Proceedings of the 2nd International Conference on Supercomputing, pages 238-253, St. Malo, France, 1988.
- (1988) Proceedings of the 2nd International Conference on Supercomputing , pp. 238-253
- Gallivan, K.¹ Jalby, W.² Gannon, D.³

13
- 0031273280
- Recursion leads to automatic variable blocking for dense linear algebra algorithms
- Nov.
- F. G. Gustavson. Recursion leads to automatic variable blocking for dense linear algebra algorithms. IBM Journal of Research and Development, 41(6):737-753, Nov. 1997.
- (1997) IBM Journal of Research and Development , vol.41 , Issue.6 , pp. 737-753
- Gustavson, F.G.¹

14
- 0041893747
- HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. http://www.netlib.org/benchmark/hpl, 2004.
- (2004) HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers

15
- 33750004191
- Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences
- Dresden, Germany, Aug. 2006
- Z. Hu, J. del Cuvillo, W. Zhu, and G. R. Gao. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences. In 12th International European Conference on Parallel Processing (Euro-Par 2006), pages 134-144, Dresden, Germany, Aug. 2006.
- (2006) 12th International European Conference on Parallel Processing (Euro-Par) , pp. 134-144
- Hu, Z.¹ Del Cuvillo, J.² Zhu, W.³ Gao, G.R.⁴

16
- 0003648799
- The OpenMP implementation of NAS parallel benchmarks and its performance
- NASA Ames Research Center
- H. Jin, M. Frumkin, and J. Yan. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance. Technical Report NAS-99-011, NASA Ames Research Center, 1999.
- (1999) Technical Report NAS-99-011
- Jin, H.¹ Frumkin, M.² Yan, J.³

17
- 0030190854
- Improving data locality with loop transformations
- K. S. McKinley, S. Carr, and C. W. Tseng. Improving Data Locality with Loop Transformations. ACM Transactions on Programming Languages and Systems, 18(4):424-453, 1996.
- (1996) ACM Transactions on Programming Languages and Systems , vol.18 , Issue.4 , pp. 424-453
- McKinley, K.S.¹ Carr, S.² Tseng, C.W.³

18
- 84858075638
- Message Passing Interface Forum
- Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface, 2003.
- (2003) MPI-2: Extensions to the Message-Passing Interface

19
- 84885698984
- FLAME Working Note 26, Sept.
- G. Quintana-Orti, E. S. Quintana-Orti, E. Chan, R. A. van de Geijn, and F. G. Van Zee. Design and Scheduling of an Algorithm-by-Blocks for the LU Factorization on Multithreaded Architectures. FLAME Working Note 26, Sept. 2007.
- (2007) Design and Scheduling of an Algorithm-by-Blocks for the LU Factorization on Multithreaded Architectures
- Quintana-Orti, G.¹ Quintana-Orti, E.S.² Chan, E.³ Van de Geijn, R.A.⁴ Van Zee, F.G.⁵

20
- 84881285062
- The Top500 List. http://www.top500.org.
- The Top500 List

21
- 34548858682
- An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS
- San Francisco Marriott, CA, USA, Feb.
- S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote, and N. Borkar. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS. In Proceedings of the 2007 IEEE International Solid-State Circuits Conference, pages 5-7, San Francisco Marriott, CA, USA, Feb. 2007.
- (2007) Proceedings of the 2007 IEEE International Solid-State Circuits Conference , pp. 5-7
- Vangal, S.¹ Howard, J.² Ruhl, G.³ Dighe, S.⁴ Wilson, H.⁵ Tschanz, J.⁶ Finan, D.⁷ Iyer, P.⁸ Singh, A.⁹ Jacob, T.¹⁰ Jain, S.¹¹ Venkataraman, S.¹² Hoskote, Y.¹³ Borkar, N.¹⁴

22
- 35348879576
- Technical Memo 75, Computer Architecture and Parallel Systems Laboratory, University of Delaware, Feb.
- I. E. Venetis and G. R. Gao. Optimizing the LU Benchmark for the Cyclops-64 Architecture. Technical Memo 75, Computer Architecture and Parallel Systems Laboratory, University of Delaware, Feb. 2007. http://www.capsl.udel.edu/publications.shtml.
- (2007) Optimizing the LU Benchmark for the Cyclops-64 Architecture
- Venetis, I.E.¹ Gao, G.R.²

23
- 0029179077
- The SPLASH-2 programs: Characterization and methodological considerations
- Santa Margherita Ligure, Italy, June
- S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 24-36, Santa Margherita Ligure, Italy, June 1995.
- (1995) Proceedings of the 22nd International Symposium on Computer Architecture , pp. 24-36
- Woo, S.C.¹ Ohara, M.² Torrie, E.³ Singh, J.P.⁴ Gupta, A.⁵

24
- 35348812496
- Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures
- San Diego, California, USA, June
- W. Zhu, V. C. Sreedhar, Z. Hu, and G. R. Gao. Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures. In Proceedings of the 34th International Symposium on Computer Architecture, pages 35-45, San Diego, California, USA, June 2007.
- (2007) Proceedings of the 34th International Symposium on Computer Architecture , pp. 35-45
- Zhu, W.¹ Sreedhar, V.C.² Hu, Z.³ Gao, G.R.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.