SCOPUS Information Retrieval Platform

Journal of Computer Science and Technology

Volume 24, Issue 6, 2009, Pages 1061-1073

Godson-t: An efficient many-core architecture for parallel program executions

Authors (12): Fan, Dong Rui (a); Yuan, Nan (a); Zhang, Jun Chao (a); Zhou, Yong Bin (a); Lin, Wei (a); Song, Feng Long (a); Ye, Xiao Chun (a); Huang, He (a); Yu, Lei (a); Long, Guo Ping (a); Zhang, Hao (a); Liu, Lei (a)

(a) Institute of Computing Technology (China)

Author keywords

Data communication; Many core; Multithread; Parallel computing; Runtime system; Thread synchronization

Indexed keywords

DATA-COMMUNICATION; MANY-CORE; MULTI-THREAD; PARALLEL COMPUTING; RUNTIME SYSTEMS; THREAD SYNCHRONIZATION;

COMPUTER SCIENCE; CONVOLUTIONAL CODES; DATA TRANSFER; MULTITASKING; PARALLEL ARCHITECTURES; SIMULATORS; SYNCHRONIZATION;

COMPUTATIONAL EFFICIENCY;

EID: 70450170935 PISSN: 10009000 EISSN: None Source Type: Journal
DOI: 10.1007/s11390-009-9295-3 Document Type: Article

Times cited : (41)

References (38)

1
- 35648995516
- The landscape of parallel computing research: A view from Berkeley
- University of California, Berkeley, December 18
- Asanovic K et al. The landscape of parallel computing research: A view from Berkeley. Technical Report No.UCB/EECS-2006-183, University of California, Berkeley, December 18, 2006.
- (2006) Technical Report No.UCB/EECS-2006-183
- Asanovic, K.¹

2
- 33646892173
- The problem with threads
- DOI 10.1109/MC.2006.180
- EA Lee 2006 The problem with threads Computer 39 5 33 42 10.1109/MC.2006.180 (Pubitemid 43786509)
- (2006) Computer , vol.39 , Issue.5 , pp. 33-42
- Lee, E.A.¹

3
- 78651582149
- Real-world concurrency
- 10.1145/1454456.1454462
- B Cantrill J Bonwick 2008 Real-world concurrency ACM Queue 6 5 16 25 10.1145/1454456.1454462
- (2008) ACM Queue , vol.6 , Issue.5 , pp. 16-25
- Cantrill, B.¹ Bonwick, J.²

4
- 70349652835
- University of Illinois at Urbana-Champaign, November
- Adve S V, Adve V S et al. Parallel computing research at Illinois: The UPCRC agenda. Technical Report, University of Illinois at Urbana-Champaign, November 2008.
- (2008) Parallel Computing Research at Illinois: The UPCRC Agenda. Technical Report
- Adve, S.V.¹ Adve, V.S.²

5
- 70350610063
- An efficient and flexible task management for many-core architectures
- Beijing, China, June 22-26
- Yuan N, Yu L, Fan D. An efficient and flexible task management for many-core architectures. In Proc. Workshop on Software and Hardware Challenges of Manycore Platforms, in Conjunction with the 35th International Symposium on Computer Architecture (ISCA-35), Beijing, China, June 22-26, 2008, pp.1-17.
- (2008) Proc. Workshop on Software and Hardware Challenges of Manycore Platforms, in Conjunction with the 35th International Symposium on Computer Architecture (ISCA-35) , pp. 1-17
- Yuan, N.¹ Yu, L.² Fan, D.³

6
- 0000269759
- Scheduling multithreaded computations by work stealing
- 1065.68504 10.1145/324133.324234 1747653
- RD Blumofe CE Leiserson 1999 Scheduling multithreaded computations by work stealing Journal of the ACM 46 5 720 748 1065.68504 10.1145/324133.324234 1747653
- (1999) Journal of the ACM , vol.46 , Issue.5 , pp. 720-748
- Blumofe, R.D.¹ Leiserson, C.E.²

7
- 40349113716
- CAPSULE: Hardware-assisted parallel execution of component-based programs
- Washington, DC, USA: IEEE Computer Society, Dec. 9-13
- Palatin P, Lhuillier Y, Temam O. CAPSULE: Hardware-assisted parallel execution of component-based programs. In Proc. the 39th Annual IEEE/ACM International Symposium on Micro-Architecture, Washington, DC, USA: IEEE Computer Society, Dec. 9-13, 2006, pp.247-258.
- (2006) Proc. the 39th Annual IEEE/ACM International Symposium on Micro-Architecture , pp. 247-258
- Palatin, P.¹ Lhuillier, Y.² Temam, O.³

8
- 63649096141
- Efficiency and scalability of barrier synchronization on NoC based many-core architecture
- Atlanta, USA, Oct. 19-24
- Villa O, Palermo G, Silvano C. Efficiency and scalability of barrier synchronization on NoC based many-core architecture. In Proc. CASES 2008, Atlanta, USA, Oct. 19-24, 2008, pp.81-90.
- (2008) Proc. CASES 2008 , pp. 81-90
- Villa, O.¹ Palermo, G.² Silvano, C.³

9
- 0003510632
- University of California, Berkeley
- Carlson W W, Draper J M et al. Introduction to UPC and language specification. Technical Report No. CCS-TR-99-157, University of California, Berkeley, 1999.
- (1999) Introduction to UPC and Language Specification. Technical Report No. CCS-TR-99-157
- Carlson, W.W.¹ Draper, J.M.²

10
- 0002081678
- Co-array Fortran for parallel programming
- 10.1145/289918.289920
- RW Numrich J Reid 1998 Co-array Fortran for parallel programming SIGPLAN Fortran Forum 17 2 1 31 10.1145/289918.289920
- (1998) SIGPLAN Fortran Forum , vol.17 , Issue.2 , pp. 1-31
- Numrich, R.W.¹ Reid, J.²

11
- 0032155556
- Titanium: A high-performance Java dialect
- 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H
- K Yelick L Semenzato, et al. 1998 Titanium: A high-performance Java dialect Concurrency: Practice and Experience 10 11-13 825 836 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H
- (1998) Concurrency: Practice and Experience , vol.10 , Issue.11-13 , pp. 825-836
- Yelick, K.¹ Semenzato, L.²

12
- 34548207355
- Sequoia: Programming the memory hierarchy
- Tampa, Florida, Nov. 11-17
- Fatahalian K, Horn D R et al. Sequoia: Programming the memory hierarchy. In Proc. the 2006 ACM/IEEE Conference on Supercomputing, Tampa, Florida, Nov. 11-17, 2006, pp.83-95.
- (2006) Proc. the 2006 ACM/IEEE Conference on Supercomputing , pp. 83-95
- Fatahalian, K.¹ Horn, D.R.²

13
- 33751022080
- Programming for parallelism and locality with hierarchically tiled arrays
- New York, USA, March 29-31
- Bikshandi G, Guo J et al. Programming for parallelism and locality with hierarchically tiled arrays. In Proc. the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, USA, March 29-31, 2006, pp.48-57.
- (2006) Proc. the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 48-57
- Bikshandi, G.¹ Guo, J.²

14
- 0026137159
- Synchronization without contention
- Santa Clara, USA, April 8-11
- Mellor-Crummey J M, Scott M L. Synchronization without contention. In Proc. Architectural Support for Programming Languages and Operating Systems, Santa Clara, USA, April 8-11, 1991, pp.269-278.
- (1991) Proc. Architectural Support for Programming Languages and Operating Systems , pp. 269-278
- Mellor-Crummey, J.M.¹ Scott, M.L.²

15
- 0025028257
- The Tera computer system
- Amsterdam, The Netherlands, June 11-15
- Alverson R, Callahan D et al. The Tera computer system. In Proc. the 4th Int. Conf. Supercomputing, Amsterdam, The Netherlands, June 11-15, 1990, pp.1-6.
- (1990) Proc. the 4th Int. Conf. Supercomputing , pp. 1-6
- Alverson, R.¹ Callahan, D.²

16
- 35348812496
- Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures
- San Diego, USA, June 9-13
- Zhu W, Sreedhar V C et al. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proc. the 34th Annual International Symposium on Computer Architecture, San Diego, USA, June 9-13, 2007, pp.35-45.
- (2007) Proc. the 34th Annual International Symposium on Computer Architecture , pp. 35-45
- Zhu, W.¹ Sreedhar, V.C.²

17
- 0029179077
- The SPLASH-2 programs: Characterization and methodological considerations
- Santa Margherita Ligure, Italy, June 22-24
- Woo S C, Ohara M et al. The SPLASH-2 programs: Characterization and methodological considerations. In Proc. the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 22-24, 1995, pp.24-36.
- (1995) Proc. the 22nd Annual International Symposium on Computer Architecture , pp. 24-36
- Woo, S.C.¹ Ohara, M.²

18
- 4444237022
- Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry
- DOI 10.1093/bioinformatics/bth186
- Y Fu Q Yang, et al. 2004 Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry Bioinformatics 20 12 1948 1954 10.1093/bioinformatics/bth186 (Pubitemid 39199057)
- (2004) Bioinformatics , vol.20 , Issue.12 , pp. 1948-1954
- Fu, Y.¹ Yang, Q.² Sun, R.³ Li, D.⁴ Zeng, R.⁵ Ling, C.X.⁶ Gao, W.⁷

19
- 0030801002
- Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
- DOI 10.1093/nar/25.17.3389
- S Altschul T Madden A Schaffer, et al. 1997 Gapped Blast and Psi-Blast: A new generation of protein database search programs Nucleic Acids Research 25 17 3389 3402 10.1093/nar/25.17.3389 (Pubitemid 27359211)
- (1997) Nucleic Acids Research , vol.25 , Issue.17 , pp. 3389-3402
- Altschul, S.F.¹ Madden, T.L.² Schaffer, A.A.³ Zhang, J.⁴ Zhang, Z.⁵ Miller, W.⁶ Lipman, D.J.⁷

20
- 0032627704
- Evaluating synchronization on shared address space multiprocessors: Methodology and performance
- 10.1145/301464.301477
- S Kumar D Jiang, et al. 1999 Evaluating synchronization on shared address space multiprocessors: Methodology and performance ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS 1999) 27 1 23 34 10.1145/301464.301477
- (1999) ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS 1999) , vol.27 , Issue.1 , pp. 23-34
- Kumar, S.¹ Jiang, D.²

21
- 0024032163
- An analysis of the computational and parallel complexity of the Livermore loops
- DOI 10.1016/0167-8191(88)90037-3
- J Feo 1988 An analysis of the computational and parallel complexity of the Livermore loops Parallel Computing 7 2 163 185 0651.65033 10.1016/0167-8191(88)90037-3 (Pubitemid 18648054)
- (1988) Parallel Computing , vol.7 , Issue.2 , pp. 163-185
- Feo, J.T.¹

22
- 70350630422
- High performance matrix multiplication on many cores
- Delft, The Netherlands, Aug. 25-28
- Yuan N, Zhou Y et al. High performance matrix multiplication on many cores. In Proc. European Conference on Parallel and Distributed Computing (Euro-Par), Delft, The Netherlands, Aug. 25-28, 2009, pp.948-959.
- (2009) Proc. European Conference on Parallel and Distributed Computing (Euro-Par) , pp. 948-959
- Yuan, N.¹ Zhou, Y.²

23
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Austin, USA, Nov. 15-21, IEEE Press
- Volkov V, Demmel J W. Benchmarking GPUs to tune dense linear algebra. In Proc. 2008 ACM/IEEE Conf. Supercomputing (SC 2008), Austin, USA, Nov. 15-21, IEEE Press, 2008, pp.1-11.
- (2008) Proc. 2008 ACM/IEEE Conf. Supercomputing (SC 2008) , pp. 1-11
- Volkov, V.¹ Demmel, J.W.²

24
- 34548712562
- Optimizing fast Fourier transform on a multi-core architecture
- Long Beach, USA, March 26-30
- Chen L, Hu Z et al. Optimizing fast Fourier transform on a multi-core architecture. In Proc. IEEE International Parallel and Distributed Processing Symposium, Long Beach, USA, March 26-30, 2007, pp.1-8.
- (2007) Proc. IEEE International Parallel and Distributed Processing Symposium , pp. 1-8
- Chen, L.¹ Hu, Z.²

25
- 33750004191
- Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences
- Dresden, Germany, August 28-September 1
- Hu Z, Cuvillo J et al. Optimization of dense matrix multiplication on IBM Cyclops-64: Challenges and experiences. In Proc. Euro-Par 2006, Dresden, Germany, August 28-September 1, 2006, pp.134-144.
- (2006) Proc. Euro-Par 2006 , pp. 134-144
- Hu, Z.¹ Cuvillo, J.²

26
- 70350754502
- High performance discrete Fourier transforms on graphics processors
- Austin, USA, Nov. 15-21
- Govindaraju N K et al. High performance discrete Fourier transforms on graphics processors. In Proc. the 2008 ACM/IEEE Conference on Supercomputing (SC2008), Austin, USA, Nov. 15-21, 2008, pp.13-24.
- (2008) Proc. the 2008 ACM/IEEE Conference on Supercomputing (SC2008) , pp. 13-24
- Govindaraju, N.K.¹ et al.

27
- 34247349114
- The potential of the cell processor for scientific computing
- Ischia, Italy, May 3-5
- Williams S, Shalf J et al. The potential of the cell processor for scientific computing. In Proc. CF'06, Ischia, Italy, May 3-5, 2006, pp.9-20.
- (2006) Proc. CF'06 , pp. 9-20
- Williams, S.¹ Shalf, J.² et al.

28
- 0034246578
- Location consistency - a new memory model and cache consistency protocol
- DOI 10.1109/12.868026
- GR Gao V Sarkar 2000 Location consistency - A new memory model and cache consistency protocol IEEE Transactions on Computers 49 8 798 813 10.1109/12.868026 (Pubitemid 30927304)
- (2000) IEEE Transactions on Computers , vol.49 , Issue.8 , pp. 798-813
- Gao, G.R.¹ Sarkar, V.²

29
- 0032671416
- Commit-reconcile & fences (CRF): A new memory model for architects and compiler writers
- Atlanta, USA, May 2-4
- Shen X et al. Commit-reconcile & fences (CRF): A new memory model for architects and compiler writers. In Proc. the 26th Annual International Symposium on Computer Architecture, Atlanta, USA, May 2-4, 1999, pp.150-161.
- (1999) Proc. the 26th Annual International Symposium on Computer Architecture , pp. 150-161
- Shen, X.¹ et al.

30
- 0030402378
- Scope consistency: A bridge between release consistency and entry consistency
- Padua, Italy, June 24-26
- Iftode L et al. Scope consistency: A bridge between release consistency and entry consistency. In Proc. the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, Padua, Italy, June 24-26, 1996, pp.277-287.
- (1996) Proc. the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures , pp. 277-287
- Iftode, L.¹ et al.

31
- 35348862407
- BulkSC: Bulk enforcement of sequential consistency
- San Diego, USA, June 9-13
- Ceze L, Tuck J et al. BulkSC: Bulk enforcement of sequential consistency. In Proc. the 34th Annual International Symposium on Computer Architecture, San Diego, USA, June 9-13, 2007, pp.278-289.
- (2007) Proc. the 34th Annual International Symposium on Computer Architecture , pp. 278-289
- Ceze, L.¹ Tuck, J.² et al.

32
- 27644567646
- Power efficient architecture and the cell processor
- San Francisco, USA, February 12-16
- Hofstee P. Power efficient architecture and the cell processor. In Proc. HPCA-11, San Francisco, USA, February 12-16, 2005, pp.258-262.
- (2005) Proc. HPCA-11 , pp. 258-262
- Hofstee, P.¹

33
- 33746304031
- Dissecting cyclops: A detailed analysis of a multithreaded architecture
- 10.1145/773365.773369
- G Almasi C Cascaval, et al. 2003 Dissecting cyclops: A detailed analysis of a multithreaded architecture ACM SIGARCH Computer Architecture News 31 1 26 38 10.1145/773365.773369
- (2003) ACM SIGARCH Computer Architecture News , vol.31 , Issue.1 , pp. 26-38
- Almasi, G.¹ Cascaval, C.²

34
- 44849137198
- NVIDIA Tesla: A unified graphics and computing architecture
- DOI 10.1109/MM.2008.31
- E Lindholm, et al. 2008 NVIDIA Tesla: A unified graphics and computing architecture IEEE Micro 28 2 39 55 10.1109/MM.2008.31 (Pubitemid 351796170)
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

35
- 0026137159
- Synchronization without contention
- Santa Clara, USA, April 8-11
- Mellor-Crummey J M, Scott M L. Synchronization without contention. In Proc. Architectural Support for Programming Languages and Operating Systems, Santa Clara, USA, April 8-11, 1991, pp.269-278.
- (1991) Proc. Architectural Support for Programming Languages and Operating Systems , pp. 269-278
- Mellor-Crummey, J.M.¹ Scott, M.L.²

36
- 0031593999
- Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor
- Barcelona, Spain, June 27-July 1
- Keckler S W et al. Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor. In Proc. the 25th Annual International Symposium on Computer Architecture, Barcelona, Spain, June 27-July 1, 1998, pp.306-317.
- (1998) Proc. the 25th Annual International Symposium on Computer Architecture , pp. 306-317
- Keckler, S.W.¹

37
- 40349086066
- Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
- Orlando, USA, Dec. 9-13
- Sampson J, Gonzalez R. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In Proc. the 39th Annual IEEE/ACM International Symposium on Microarchitecture, Orlando, USA, Dec. 9-13, 2006, pp.235-246.
- (2006) Proc. the 39th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 235-246
- Sampson, J.¹ Gonzalez, R.²

38
- 63649096141
- Efficiency and scalability of barrier synchronization on NoC based many-core architecture
- Atlanta, USA, October 19-24
- Villa O et al. Efficiency and scalability of barrier synchronization on NoC based many-core architecture. In Proc. CASES 2008, Atlanta, USA, October 19-24, 2008, pp.81-90.
- (2008) Proc. CASES 2008 , pp. 81-90
- Villa, O.¹ et al.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.