SCOPUS Information Retrieval Platform

Annual ACM Symposium on Parallelism in Algorithms and Architectures

2012, Pages 121-130

High-performance RMA-based broadcast on the Intel SCC

(4) Petrović, Darko (a); Shahmirzadi, Omid (a); Ropars, Thomas (a); Schiper, André (a)

(a) EPFL (Switzerland)

Author keywords

Broadcast; HPC; Many Core Chips; Message Passing; RMA

Indexed keywords

ANALYTICAL EVALUATION; BROADCAST; BROADCAST ALGORITHM; CACHE COHERENCE; COLLECTIVE OPERATIONS; HARDWARE FEATURES; HPC; K-ARY TREE; MANY-CORE; MANY-CORE ARCHITECTURE; ON CHIPS; PROGRAMMING MODELS; REMOTE MEMORY ACCESS; RESEARCH DIRECTIONS; RMA; SCALABILITY ISSUE; SINGLE-CHIP;

ALGORITHMS; COMPUTER ARCHITECTURE; COMPUTER PROGRAMMING;

MESSAGE PASSING;

EID: 84864151929 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2312005.2312029 Document Type: Conference Paper

Times cited : (12)

References (30)

1
- 32844464238
- Optimization of MPI collective communication on BlueGene/L systems
- ICS05 - Proceedings of the 19th ACM International Conference on Supercomputing
- G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway, J. E. Moreira, B. Steinmacher-Burow, and Y. Zheng. Optimization of MPI collective communication on BlueGene/L systems. In Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pages 253-262, 2005. (Pubitemid 43251330)
- (2005) Proceedings of the International Conference on Supercomputing , pp. 253-262
- Almasi, G.¹ Heidelberger, P.² Archer, C.J.³ Martorell, X.⁴ Erway, C.C.⁵ Moreira, J.E.⁶ Steinmacher-Burow, B.⁷ Zheng, Y.⁸

2
- 0007910858
- I. T. Association, InfiniBand Trade Association
- I. T. Association. InfiniBand Architecture Specification: Release 1.0. InfiniBand Trade Association, 2000.
- (2000) InfiniBand Architecture Specification: Release 1.0

3
- 72249097688
- The multikernel: A new OS architecture for scalable multicore systems
- A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pages 29-44, 2009.
- (2009) Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP '09 , pp. 29-44
- Baumann, A.¹ Barham, P.² Dagand, P.-E.³ Harris, T.⁴ Isaacs, R.⁵ Peter, S.⁶ Roscoe, T.⁷ Schüpbach, A.⁸ Singhania, A.⁹

4
- 33746284933
- Broadcast trees for heterogeneous platforms
- O. Beaumont, L. Marchal, and Y. Robert. Broadcast Trees for Heterogeneous Platforms. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 80-92, 2005.
- (2005) Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05 , pp. 80-92
- Beaumont, O.¹ Marchal, L.² Robert, Y.³

5
- 34547261834
- Thousand core chips - A technology perspective
- DOI 10.1109/DAC.2007.375263, 4261282, 2007 44th ACM/IEEE Design Automation Conference, DAC'07
- S. Borkar. Thousand core chips: a technology perspective. In Proceedings of the 44th annual Design Automation Conference, DAC '07, pages 746-749, 2007. (Pubitemid 47130064)
- (2007) Proceedings - Design Automation Conference , pp. 746-749
- Borkar, S.¹

6
- 0031269329
- Efficient algorithms for all-to-all communications in multiport message-passing systems
- J. Bruck, C.-T. Ho, E. Upfal, S. Kipnis, and D. Weathersby. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems. IEEE Transactions on Parallel and Distributed Systems, 8:1143-1156, November 1997. (Pubitemid 127763326)
- (1997) IEEE Transactions on Parallel and Distributed Systems , vol.8 , Issue.11 , pp. 1143-1156
- Bruck, J.¹ Ho, C.-T.² Kipnis, S.³ Upfal, E.⁴ Weathersby, D.⁵

7
- 84864147959
- A Collective Communication Library for the Intel Single-chip Cloud Computer
- E. Chan. RCCE comm: A Collective Communication Library for the Intel Single-chip Cloud Computer. http://communities.intel.com/docs/DOC-5663, 2010.
- (2010)
- Chan, E.¹

8
- 84857417606
- C. Clauss, S. Lankes, J. Galowicz, and T. Bemmerl. iRCCE: a non-blocking communication extension to the RCCE communication library for the Intel Single-chip Cloud Computer. http://communities.intel.com/docs/DOC-6003, 2011.
- (2011) IRCCE: A Non-blocking Communication Extension to the RCCE Communication Library for the Intel Single-chip Cloud Computer
- Clauss, C.¹ Lankes, S.² Galowicz, J.³ Bemmerl, T.⁴

9
- 0009346826
- LogP: Towards a realistic model of parallel computation
- D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a Realistic Model of Parallel Computation. In Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '93, pages 1-12, 1993.
- (1993) Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '93 , pp. 1-12
- Culler, D.¹ Karp, R.² Patterson, D.³ Sahay, A.⁴ Schauser, K.E.⁵ Santos, E.⁶ Subramonian, R.⁷ Von Eicken, T.⁸

10
- 3643067761
- Assessing fast network interfaces
- D. E. Culler, L. T. Liu, R. P. Martin, and C. O. Yoshikawa. Assessing Fast Network Interfaces. In IEEE Micro, pages 35-43, Feb. 1996. (Pubitemid 126530205)
- (1996) IEEE Micro , vol.16 , Issue.1 , pp. 35-43
- Culler, D.E.¹ Liu, L.T.² Martin, R.P.³ Yoshikawa, C.O.⁴

11
- 35048884271
- Open MPI: Goals, concept, and design of a next generation MPI implementation
- Budapest, Hungary, September
- E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users' Group Meeting, pages 97-104, Budapest, Hungary, September 2004.
- (2004) Proceedings, 11th European PVM/MPI Users' Group Meeting , pp. 97-104
- Gabriel, E.¹ Fagg, G.E.² Bosilca, G.³ Angskun, T.⁴ Dongarra, J.J.⁵ Squyres, J.M.⁶ Sahay, V.⁷ Kambadur, P.⁸ Barrett, B.⁹ Lumsdaine, A.¹⁰ Castain, R.H.¹¹ Daniel, D.J.¹² Graham, R.L.¹³ Woodall, T.S.¹⁴

12
- 84947273700
- Efficient collective operations using remote memory operations on VIA-based clusters
- R. Gupta, P. Balaji, D. K. Panda, and J. Nieplocha. Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters. In Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IPDPS '03, pages 46-62, 2003.
- (2003) Proceedings of the 17th International Symposium on Parallel and Distributed Processing, IPDPS '03 , pp. 46-62
- Gupta, R.¹ Balaji, P.² Panda, D.K.³ Nieplocha, J.⁴

13
- 34548793392
- A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast
- T. Hoefler, C. Siebert, and W. Rehm. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, IPDPS '07, page 232, 2007.
- (2007) Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, IPDPS '07 , pp. 232
- Hoefler, T.¹ Siebert, C.² Rehm, W.³

14
- 77952123736
- A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS
- IEEE
- J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS. In 2010 IEEE International Solid-State Circuits Conference, pages 108-109. IEEE, 2010.
- (2010) 2010 IEEE International Solid-State Circuits Conference , pp. 108-109
- Howard, J.¹ Dighe, S.² Hoskote, Y.³ Vangal, S.⁴ Finan, D.⁵ Ruhl, G.⁶ Jenkins, D.⁷ Wilson, H.⁸ Borkar, N.⁹ Schrom, G.¹⁰

15
- 0018518295
- Virtual cut-through: A new computer communication switching technique
- DOI 10.1016/0376-5075(79)90032-1
- P. Kermani and L. Kleinrock. Virtual cut-through: A new computer communication switching technique. Computer Networks, 3(4):267-286, 1979. (Pubitemid 10422271)
- (1979) Computer networks , vol.3 , Issue.4 , pp. 267-286
- Kermani Parviz¹ Kleinrock Leonard²

16
- 66749092384
- Exascale computing study: Technology challenges in achieving exascale systems
- P. Kogge et al. Exascale Computing Study: Technology Challenges in Achieving Exascale Systems. Technical report, DARPA, 2008.
- (2008) Technical Report DARPA
- Kogge, P.¹

17
- 12444269036
- Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support
- J. Liu, A. R. Mamidala, and D. K. Panda. Fast and Scalable MPI-Level Broadcast Using InfiniBand's Hardware Multicast Support. In Proceedings of the 18th International Symposium on Parallel and Distributed Processing, IPDPS '04, page 10, 2004.
- (2004) Proceedings of the 18th International Symposium on Parallel and Distributed Processing, IPDPS '04 , pp. 10
- Liu, J.¹ Mamidala, A.R.² Panda, D.K.³

18
- 1142305191
- High performance RDMA-based MPI implementation over InfiniBand
- J. Liu, J. Wu, S. P. Kini, P. Wyckoff, and D. K. Panda. High performance RDMA-based MPI implementation over InfiniBand. In Proceedings of the 17th annual international conference on Supercomputing, ICS '03, pages 295-304, 2003.
- (2003) Proceedings of the 17th Annual International Conference on Supercomputing, ICS '03 , pp. 295-304
- Liu, J.¹ Wu, J.² Kini, S.P.³ Wyckoff, P.⁴ Panda, D.K.⁵

19
- 84864147958
- RCCE: a Small Library for Many-Core Communication
- T. Mattson and R. Van Der Wijngaart. RCCE: a Small Library for Many-Core Communication. http://techresearch.intel.com, 2010.
- (2010)
- Mattson, T.¹ Van Der Wijngaart, R.²

20
- 70350754500
- Programming the Intel 80-core network-on-a-chip terascale processor
- T. G. Mattson, R. Van der Wijngaart, and M. Frumkin. Programming the Intel 80-core network-on-a-chip terascale processor. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, SC '08, pages 38:1-38:11, 2008.
- (2008) Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08 , pp. 38:1-38:11
- Mattson, T.G.¹ Van Der Wijngaart, R.² Frumkin, M.³

21
- 0003604499
- MPI Forum. MPI2: Extensions to the Message-Passing Interface. www.mpi-forum.org, 1997.
- (1997) MPI2: Extensions to the Message-Passing Interface

22
- 78650735454
- BatchQueue: Fast and memory-thrifty core to core communication
- T. Preud'homme, J. Sopena, G. Thomas, and B. Folliot. BatchQueue: Fast and Memory-Thrifty Core to Core Communication. In Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD '10, pages 215-222, 2010.
- (2010) Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD '10 , pp. 215-222
- Preud'homme, T.¹ Sopena, J.² Thomas, G.³ Folliot, B.⁴

23
- 84863890827
- On efficient message passing on the Intel SCC
- R. Rotta. On efficient message passing on the Intel SCC. In Proceedings of the 3rd MARC Symposium, pages 53-58, 2011.
- (2011) Proceedings of the 3rd MARC Symposium , pp. 53-58
- Rotta, R.¹

24
- 5044232557
- CollMark: MPI collective communication benchmark
- M. Shroff and R. Van De Geijn. CollMark: MPI collective communication benchmark. In International Conference on Supercomputing 2000, page 10, 1999.
- (1999) International Conference on Supercomputing 2000 , pp. 10
- Shroff, M.¹ Van De Geijn, R.²

25
- 33646719765
- High performance RDMA based all-to-all broadcast for infiniband clusters
- S. Sur, U. K. R. Bondhugula, A. Mamidala, H. W. Jin, and D. K. Panda. High performance RDMA based all-to-all broadcast for infiniband clusters. In Proceedings of the 12th international conference on High Performance Computing, HiPC'05, pages 148-157, 2005.
- (2005) Proceedings of the 12th International Conference on High Performance Computing, HiPC'05 , pp. 148-157
- Sur, S.¹ Bondhugula, U.K.R.² Mamidala, A.³ Jin, H.W.⁴ Panda, D.K.⁵

26
- 14744288044
- Optimization of collective communication operations in MPICH
- DOI 10.1177/1094342005051521
- R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of Collective Communication Operations in MPICH. IJHPCA, 19(1):49-66, 2005. (Pubitemid 40329106)
- (2005) International Journal of High Performance Computing Applications , vol.19 , Issue.1 , pp. 49-66
- Thakur, R.¹ Rabenseifner, R.² Gropp, W.³

27
- 70450209566
- Architectures for extreme-scale computing
- Nov.
- J. Torrellas. Architectures for Extreme-Scale Computing. Computer, 42(11):28-35, Nov. 2009.
- (2009) Computer , vol.42 , Issue.11 , pp. 28-35
- Torrellas, J.¹

28
- 80053027876
- RCKMPI - Lightweight MPI implementation for intel's single-chip cloud computer (SCC)
- I. A. C. Ureña, M. Riepen, and M. Konow. RCKMPI - lightweight MPI implementation for intel's single-chip cloud computer (SCC). In Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface, EuroMPI'11, pages 208-217, 2011.
- (2011) Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface, EuroMPI'11 , pp. 208-217
- Ureña, I.A.C.¹ Riepen, M.² Konow, M.³

29
- 84856529095
- Light-weight communications on Intel's single-chip cloud computer processor
- Feb.
- R. F. van der Wijngaart, T. G. Mattson, and W. Haas. Light-weight communications on Intel's single-chip cloud computer processor. ACM SIGOPS Operating Systems Review, 45(1):73-83, Feb. 2011.
- (2011) ACM SIGOPS Operating Systems Review , vol.45 , Issue.1 , pp. 73-83
- Van Der Wijngaart, R.F.¹ Mattson, T.G.² Haas, W.³

30
- 84870534520
- Efficient memory copy operations on the 48-core intel SCC processor
- M. W. van Tol, R. Bakker, M. Verstraaten, C. Grelck, and C. R. Jesshope. Efficient Memory Copy Operations on the 48-core Intel SCC Processor. In Proceedings of the 3rd MARC Symposium, pages 13-18, 2011.
- (2011) Proceedings of the 3rd MARC Symposium , pp. 13-18
- Van Tol, M.W.¹ Bakker, R.² Verstraaten, M.³ Grelck, C.⁴ Jesshope, C.R.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.