SCOPUS Information Retrieval Platform

Proceedings - International Symposium on Computer Architecture

Volume , Issue , 2009, Pages 140-151

Rigel: An architecture and scalable programming interface for a 1000-core accelerator

(9) Kelm, John H.ᵃ Johnson, Daniel R.ᵃ Johnson, Matthew R.ᵃ Crago, Neal C.ᵃ Tuohy, Williamᵃ Mahesri, Aqeelᵃ Lumetta, Steven S.ᵃ Frank, Matthew I.ᵃ Patel, Sanjay J.ᵃ

ᵃ University of Illinois at Urbana-Champaign (United States)

Author keywords

Accelerator; Computer architecture; Low level programming interface

Indexed keywords

A-DENSITY; ACCELERATOR ARCHITECTURES; ADDRESS SPACE; BARRIER OPERATIONS; DESIGN ANALYSIS; DOMAIN SPECIFIC; EXECUTION MODEL; EXPERIMENTAL ANALYSIS; HARDWARE SUPPORTS; INITIAL DESIGN; LOAD-BALANCING; PARALLEL COMPUTATION; POWER EFFICIENCY; PROGRAMMING INTERFACE; PROGRAMMING MODELS; SCALABILITY ISSUE; SOFTWARE TECHNIQUES; SPECIALIZED HARDWARE; TASK DISTRIBUTION; WORK DISTRIBUTION;

ACCELERATION; COMPUTER HARDWARE; COMPUTER PROGRAMMING; HARDWARE; INTERFACES (COMPUTER); PROGRAM PROCESSORS;

COMPUTER ARCHITECTURE;

EID: 70450237431 PISSN: 10636897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1555754.1555774 Document Type: Conference Paper

Times cited : (94)

References (31)

1
- 34547471544
- Design tradeoffs for tiled CMP on-chip networks
- J. Balfour and W. J. Dally. Design tradeoffs for tiled CMP on-chip networks. In ICS'06, 2006.
- (2006) ICS'06
- Balfour, J.¹ Dally, W.J.²

2
- 0024770039
- Scans as primitive parallel operations
- G. E. Blelloch. Scans as primitive parallel operations. IEEE Trans. Comput., 38(11), 1989.
- (1989) IEEE Trans. Comput , vol.38 , Issue.11
- Blelloch, G.E.¹

3
- 0029666646
- Memory bandwidth limitations of future microprocessors
- D. Burger, J. R. Goodman, and A. Kägi. Memory bandwidth limitations of future microprocessors. In ISCA'96, 1996.
- (1996) ISCA'96
- Burger, D.¹ Goodman, J.R.² Kägi, A.³

4
- 0029209574
- A hierarchical task queue organization for shared-memory multiprocessor systems
- S. P. Dandamudi and P. S. P. Cheng. A hierarchical task queue organization for shared-memory multiprocessor systems. IEEE Trans. Parallel Distrib. Syst., 6(1), 1995.
- (1995) IEEE Trans. Parallel Distrib. Syst , vol.6 , Issue.1
- Dandamudi, S.P.¹ Cheng, P.S.P.²

5
- 80052037090
- Poster session - N-body simulation on GPUs
- E. Elsen, M. Houston, V. Vishal, E. Darve, P. Hanrahan, and V. Pande. Poster session - N-body simulation on GPUs. In SC'06, 2006.
- (2006) SC'06
- Elsen, E.¹ Houston, M.² Vishal, V.³ Darve, E.⁴ Hanrahan, P.⁵ Pande, V.⁶

6
- 34548207355
- Sequoia: Programming the memory hierarchy
- K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In SC'06, 2006.
- (2006) SC'06
- Fatahalian, K.¹ Horn, D.R.² Knight, T.J.³ Leem, L.⁴ Houston, M.⁵ Park, J.Y.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.J.¹⁰ Hanrahan, P.¹¹

7
- 56649087761
- GPUs: A closer look
- K. Fatahalian and M. Houston. GPUs: a closer look. Queue, 6(2):18-28, 2008.
- (2008) Queue , vol.6 , Issue.2 , pp. 18-28
- Fatahalian, K.¹ Houston, M.²

8
- 70450264487
- Cedar: A large scale multiprocessor
- D. Gajski, D. Kuck, D. Lawrie, and A. Sameh. Cedar: a large scale multiprocessor. SIGARCH Comput. Archit. News, 11(1):7-11, 1983.
- (1983) SIGARCH Comput. Archit. News , vol.11 , Issue.1 , pp. 7-11
- Gajski, D.¹ Kuck, D.² Lawrie, D.³ Sameh, A.⁴

9
- 70450275953
- The NYU ultracomputer
- A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir. The NYU ultracomputer. In ISCA'82, 1982.
- (1982) ISCA'82
- Gottlieb, A.¹ Grishman, R.² Kruskal, C.P.³ McAuliffe, K.P.⁴ Rudolph, L.⁵ Snir, M.⁶

10
- 34247376580
- Chip multiprocessing and the Cell broadband engine
- New York, NY, USA
- M. Gschwind. Chip multiprocessing and the Cell broadband engine. In CF'06, pages 1-8, New York, NY, USA, 2006.
- (2006) CF'06 , pp. 1-8
- Gschwind, M.¹

11
- 33847108581
- Hierarchically tiled arrays for parallelism and locality
- April
- J. Guo, G. Bikshandi, D. Hoeflinger, G. Almasi, B. Fraguela, M. Garzaran, D. Padua, and C. von Praun. Hierarchically tiled arrays for parallelism and locality. In Parallel and Distributed Processing Symposium, April 2006.
- (2006) Parallel and Distributed Processing Symposium
- Guo, J.¹ Bikshandi, G.² Hoeflinger, D.³ Almasi, G.⁴ Fraguela, B.⁵ Garzaran, M.⁶ Padua, D.⁷ von Praun, C.⁸

12
- 70450249565
- Intel. Intel microprocessor export compliance metrics, February 2009.

13
- 0041562664
- Programmable stream processors
- U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens. Programmable stream processors. Computer, 36(8), 2003.
- (2003) Computer , vol.36 , Issue.8
- Kapasi, U.J.¹ Rixner, S.² Dally, W.J.³ Khailany, B.⁴ Ahn, J.H.⁵ Mattson, P.⁶ Owens, J.D.⁷

14
- 35348855586
- Carbon: Architectural support for fine-grained parallelism on chip multiprocessors
- New York, NY, USA
- S. Kumar, C. J. Hughes, and A. Nguyen. Carbon: architectural support for fine-grained parallelism on chip multiprocessors. In ISCA'07, pages 162-173, New York, NY, USA, 2007.
- (2007) ISCA'07 , pp. 162-173
- Kumar, S.¹ Hughes, C.J.² Nguyen, A.³

15
- 16144366475
- The network architecture of the Connection Machine CM-5
- C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. S. Pierre, D. S. Wells, M. C. Wong-Chan, S.-W. Yang, and R. Zak. The network architecture of the Connection Machine CM-5. J. Parallel Distrib. Comput., 33(2), 1996.
- (1996) J. Parallel Distrib. Comput , vol.33 , Issue.2
- Leiserson, C.E.¹ Abuhamdeh, Z.S.² Douglas, D.C.³ Feynman, C.R.⁴ Ganmukhi, M.N.⁵ Hill, J.V.⁶ Hillis, W.D.⁷ Kuszmaul, B.C.⁸ Pierre, M.A.S.⁹ Wells, D.S.¹⁰ Wong-Chan, M.C.¹¹ Yang, S.-W.¹² Zak, R.¹³

16
- 44849137198
- NVIDIA Tesla: A unified graphics and computing architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2), 2008.
- (2008) IEEE Micro , vol.28 , Issue.2
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

17
- 66749170578
- Tradeoffs in designing accelerator architectures for visual computing
- A. Mahesri, D. Johnson, N. Crago, and S. J. Patel. Tradeoffs in designing accelerator architectures for visual computing. In MICRO'08, 2008.
- (2008) MICRO'08
- Mahesri, A.¹ Johnson, D.² Crago, N.³ Patel, S.J.⁴

18
- 84976718540
- Algorithms for scalable synchronization on shared-memory multiprocessors
- J. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst., 9(1):21-65, 1991.
- (1991) ACM Trans. Comput. Syst , vol.9 , Issue.1 , pp. 21-65
- Mellor-Crummey, J.M.¹ Scott, M.L.²

19
- 70450249564
- MIPS
- MIPS. MIPS32 24K Family of Synthesizable Processor Cores, 2009.
- (2009) MIPS32 24K Family of Synthesizable Processor Cores

20
- 78651550268
- Scalable parallel programming with CUDA
- J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with CUDA. Queue, 6(2), 2008.
- (2008) Queue , vol.6 , Issue.2
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

21
- 33947588048
- A survey of general-purpose computation on graphics hardware
- J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krueger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, 2007.
- (2007) Computer Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.D.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Krueger, J.⁵ Lefohn, A.E.⁶ Purcell, T.J.⁷

22
- 70349285149
- A 45nm 8-core enterprise Xeon processor
- February
- S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayers, R. V. J. Chang, M. Ratta, and S. Kottapalli. A 45nm 8-core enterprise Xeon processor. In ISSCC'09, February 2009.
- (2009) ISSCC'09
- Rusu, S.¹ Tam, S.² Muljono, H.³ Stinson, J.⁴ Ayers, D.⁵ Chang, R.V.J.⁶ Ratta, M.⁷ Kottapalli, S.⁸

23
- 40349086066
- Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
- J. Sampson, R. Gonzalez, J.-F. Collard, N. P. Jouppi, M. Schlansker, and B. Calder. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In MICRO'06, 2006.
- (2006) MICRO'06
- Sampson, J.¹ Gonzalez, R.² Collard, J.-F.³ Jouppi, N.P.⁴ Schlansker, M.⁵ Calder, B.⁶

24
- 0030259457
- Synchronization and communication in the T3E multiprocessor
- S. L. Scott. Synchronization and communication in the T3E multiprocessor. In ASPLOS'96, pages 26-36, 1996.
- (1996) ASPLOS'96 , pp. 26-36
- Scott, S.L.¹

25
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Trans. Graph., 27(3):1-15, 2008.
- (2008) ACM Trans. Graph , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

26
- 0009384049
- The architecture of HEP
- Massachusetts Institute of Technology
- B. Smith. The architecture of HEP. In On Parallel MIMD computation, pages 41-55. Massachusetts Institute of Technology, 1985.
- (1985) On Parallel MIMD computation , pp. 41-55
- Smith, B.¹

27
- 35948963714
- Accelerating molecular modeling applications with graphics processors
- J. E. Stone, J. C. Phillips, P. L. Freddolino, D. J. Hardy, L. G. Trabuco, and K. Schulten. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry, 28:2618-2640, 2007.
- (2007) Journal of Computational Chemistry , vol.28 , pp. 2618-2640
- Stone, J.E.¹ Phillips, J.C.² Freddolino, P.L.³ Hardy, D.J.⁴ Trabuco, L.G.⁵ Schulten, K.⁶

28
- 51449100575
- Accelerating advanced MRI reconstructions on GPUs
- S. S. Stone, J. P. Haldar, S. C. Tsao, W. m. W. Hwu, B. P. Sutton, and Z. P. Liang. Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distrib. Comput., 68(10):1307-1318, 2008.
- (2008) J. Parallel Distrib. Comput , vol.68 , Issue.10 , pp. 1307-1318
- Stone, S.S.¹ Haldar, J.P.² Tsao, S.C.³ Hwu, W.M.W.⁴ Sutton, B.P.⁵ Liang, Z.P.⁶

29
- 70450261675
- Tensilica. 570T Static-Superscalar CPU Core PRODUCT BRIEF, 2007.
- (2007) 570T Static-Superscalar CPU Core PRODUCT BRIEF
- Tensilica¹

30
- 49549084422
- A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor
- Feb
- M. Tremblay and S. Chaudhry. A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor. In ISSCC'08, Feb. 2008.
- (2008) ISSCC'08
- Tremblay, M.¹ Chaudhry, S.²

31
- 0025467711
- A bridging model for parallel computation
- L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
- (1990) Communications of the ACM , vol.33 , Issue.8 , pp. 103-111
- Valiant, L.G.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.