SCOPUS Information Search Platform

Transactions on Architecture and Code Optimization

Volume 8, Issue 4, 2012, Pages

Optimizing explicit data transfers for data parallel applications on the cell architecture

(4) Saidi, Selma a,b; Tendulkar, Pranav a; Lepley, Thierry b; Maler, Oded a,c

a UNIV GRENOBLE ALPES (France)

b STMICROELECTRONICS (France)

c VERIMAG (France)

Author keywords

Cell B.E.; Data parallelization; Direct memory access (DMA); Double buffering

Indexed keywords

CELL ARCHITECTURES; CELL BROADBAND ENGINE ARCHITECTURE; COMPUTATION TIME; CYCLE-ACCURATE SIMULATORS; DATA ITEMS; DATA PARALLEL; DATA PARALLELIZATION; DATA SHARING; DIRECT MEMORY ACCESS; DOUBLE BUFFERING; GENERAL APPROACH; LOCAL MEMORIES; MULTI CORE; NUMBER OF BLOCKS; OFF-CHIP; OPTIMAL VALUES;

DATA TRANSFER; OPTIMIZATION;

COMPUTER ARCHITECTURE;

EID: 84857823732 PISSN: 15443566 EISSN: 15443973 Source Type: Journal
DOI: 10.1145/2086696.2086716 Document Type: Article

Times cited : (22)

References (36)

1
- 0029373981
- Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
- AGARWAL, A., KRANZ, D., AND NATARAJAN, V. 1995. Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. IEEE Trans. Parall. Distrib. Syst. 6, 9, 943-962.
- (1995) IEEE Trans. Parall. Distrib. Syst. , vol.6 , Issue.9 , pp. 943-962
- Agarwal, A.¹ Kranz, D.² Natarajan, V.³

2
- 0023563093
- A model for hierarchical memory
- ACM, New York, NY
- AGGARWAL, A., ALPERN, B., CHANDRA, A., AND SNIR, M. 1987. A model for hierarchical memory. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC'87). ACM, New York, NY, 305-314.
- (1987) Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC'87) , pp. 305-314
- Aggarwal, A.¹ Alpern, B.² Chandra, A.³ Snir, M.⁴

3
- 77949601287
- Spenk: Adding another level of parallelism on the cell broadband engine
- ACM
- AHMED, M., AMMAR, R., AND RAJASEKARAN, S. 2008. Spenk: adding another level of parallelism on the cell broadband engine. In Proceedings of the 1st International Forum on Next-Generation Multicore/Manycore Technologies. ACM, 1-10.
- (2008) Proceedings of the 1st International Forum on Next-Generation Multicore/Manycore Technologies , pp. 1-10
- Ahmed, M.¹ Ammar, R.² Rajasekaran, S.³

4
- 52149087674
- Barrier synchronization for cell multi-processor architecture
- BAI, S., ZHOU, Q., ZHOU, R., AND LI, L. 2008. Barrier synchronization for cell multi-processor architecture. In Proceedings of the 1st IEEE International Conference on Ubi-Media Computing. 155-158.
- (2008) Proceedings of the 1st IEEE International Conference on Ubi-Media Computing , pp. 155-158
- Bai, S.¹ Zhou, Q.² Zhou, R.³ Li, L.⁴

5
- 77952225553
- CellMT: A cooperative multithreading library for the Cell/BE
- IEEE
- BELTRAN, V., CARRERA, D., TORRES, J., AND AYGUADÉ, E. 2009. CellMT: A cooperative multithreading library for the Cell/BE. In Proceedings of the International Conference on High Performance Computing (HiPC). IEEE, 245-253.
- (2009) Proceedings of the International Conference on High Performance Computing (HiPC) , pp. 245-253
- Beltran, V.¹ Carrera, D.² Torres, J.³ Ayguadé, E.⁴

6
- 49949114550
- Modeling multigrain parallelism on heterogeneous multi-core processors: A case study of the cell be
- Springer-Verlag, Berlin
- BLAGOJEVIC, F., FENG, X., CAMERON, K. W., AND NIKOLOPOULOS, D. S. 2008. Modeling multigrain parallelism on heterogeneous multi-core processors: a case study of the cell be. In Proceedings of the 3rd International Conference on High Performance Embedded Architectures and Compilers (HiPEAC'08). Springer-Verlag, Berlin, 38-52.
- (2008) Proceedings of the 3rd International Conference on High Performance Embedded Architectures and Compilers (HiPEAC'08) , pp. 38-52
- Blagojevic, F.¹ Feng, X.² Cameron, K.W.³ Nikolopoulos, D.S.⁴

7
- 0026138044
- Software prefetching
- CALLAHAN, D., KENNEDY, K., AND PORTERFIELD, A. 1991. Software prefetching. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV). ACM, New York, NY, 40-52. (Pubitemid 21702169)
- (1991) International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS , vol.26 , Issue.4 , pp. 40-52
- Callahan, D.¹ Kennedy, K.² Porterfield, A.³

8
- 77949756278
- Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
- Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., Lecture Notes in Computer Science Series, Springer Berlin
- CARPENTER, P., RAMIREZ, A., AND AYGUADÉ, E. 2010. Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors. In High Performance Embedded Architectures and Compilers, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., Lecture Notes in Computer Science Series, vol. 5952, Springer Berlin, 96-110.
- (2010) High Performance Embedded Architectures and Compilers , vol.5952 , pp. 96-110
- Carpenter, P.¹ Ramirez, A.² Ayguadé, E.³

9
- 38149004865
- Optimizing the use of static buffers for dma on a cell chip
- Springer-Verlag, Berlin
- CHEN, T., SURA, Z., O'BRIEN, K., AND O'BRIEN, J. K. 2007. Optimizing the use of static buffers for dma on a cell chip. In Proceedings of the 19th International Conference on Languages and Compilers for Parallel Computing (LCPC'06). Springer-Verlag, Berlin, 314-329.
- (2007) Proceedings of the 19th International Conference on Languages and Compilers for Parallel Computing (LCPC'06) , pp. 314-329
- Chen, T.¹ Sura, Z.² O'Brien, K.³ O'Brien, J.K.⁴

10
- 0028202735
- A performance study of software and hardware data prefetching schemes
- CHEN, T.-F. AND BAER, J.-L. 1994. A performance study of software and hardware data prefetching schemes. SIGARCH Comput. Archit. News 22, 223-232.
- (1994) SIGARCH Comput. Archit. News , vol.22 , pp. 223-232
- Chen, T.-F.¹ Baer, J.-L.²

11
- 84976745804
- Tile size selection using cache organization and data layout
- ACM, New York, NY
- COLEMAN, S. AND MCKINLEY, K. S. 1995. Tile size selection using cache organization and data layout. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95). ACM, New York, NY, 279-290.
- (1995) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95) , pp. 279-290
- Coleman, S.¹ McKinley, K.S.²

12
- 0009346826
- LogP: Towards a realistic model of parallel computation
- ACM, New York, NY
- CULLER, D., KARP, R., PATTERSON, D., SAHAY, A., SCHAUSER, K. E., SANTOS, E., SUBRAMONIAN, R., AND VON EICKEN, T. 1993. LogP: Towards a realistic model of parallel computation. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'93). ACM, New York, NY, 1-12.
- (1993) Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'93) , pp. 1-12
- Culler, D.¹ Karp, R.² Patterson, D.³ Sahay, A.⁴ Schauser, K.E.⁵ Santos, E.⁶ Subramonian, R.⁷ Von Eicken, T.⁸

13
- 0029341212
- Sequential hardware prefetching in shared-memory multiprocessors
- DAHLGREN, F., DUBOIS, M., AND STENSTRÖM, P. 1995. Sequential hardware prefetching in shared-memory multiprocessors. IEEE Trans. Parall. Distrib. Syst. 6, 733-746.
- (1995) IEEE Trans. Parall. Distrib. Syst. , vol.6 , pp. 733-746
- Dahlgren, F.¹ Dubois, M.² Stenström, P.³

14
- 0003455775
- Master's thesis, Department of Computer Science, Rice University
- ESSEGHIR, K. 1993. Improving data locality for caches. Master's thesis, Department of Computer Science, Rice University.
- (1993) Improving Data Locality for Caches
- Esseghir, K.¹

15
- 34548207355
- Sequoia: Programming the memory hierarchy
- FATAHALIAN, K., HORN, D., KNIGHT, T., LEEM, L., HOUSTON, M., PARK, J., EREZ, M., REN, M., AIKEN, A., DALLY, W., ET AL. 2006. Sequoia: Programming the memory hierarchy. In Proceedings of the ACM/IEEE Conference on Supercomputing.
- (2006) Proceedings of the ACM/IEEE Conference on Supercomputing
- Fatahalian, K.¹ Horn, D.² Knight, T.³ Leem, L.⁴ Houston, M.⁵ Park, J.⁶ Erez, M.⁷ Ren, M.⁸ Aiken, A.⁹ Dally, W.¹⁰

16
- 33646429702
- Multi-level memory prefetching for media and stream processing
- FRITTS, J. 2002. Multi-level memory prefetching for media and stream processing. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '02). vol. 2, 101-104.
- (2002) Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '02) , vol.2 , pp. 101-104
- Fritts, J.¹

17
- 0029308368
- Effective hardware-based data prefetching for high-performance processors
- FU CHEN, T. AND LOUP BAER, J. 1995. Effective hardware-based data prefetching for high-performance processors. IEEE Trans. Comput. 44, 609-623.
- (1995) IEEE Trans. Comput. , vol.44 , pp. 609-623
- Chen, T.-F.¹ Baer, J.-L.²

18
- 34250167228
- The cell broadband engine: Exploiting multiple levels of parallelism in a chip multiprocessor
- DOI 10.1007/s10766-007-0035-4
- GSCHWIND, M. 2007. The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor. Int. J. Parall. Program. 35, 3, 233-262. (Pubitemid 46904453)
- (2007) International Journal of Parallel Programming , vol.35 , Issue.3 , pp. 233-262
- Gschwind, M.¹

19
- 84857885288
- IBM. 2008. Cell SDK 3.1. https://www.ibm.com/developerworks/power/cell/.
- (2008) Cell SDK 3.1

20
- 84857805403
- IBM. 2009. Cell Simulator. http://www.alphaworks.ibm.com/tech/cellsystemsim.
- (2009) Cell Simulator

21
- 33746923043
- Cell multiprocessor communication network: Built for speed
- DOI 10.1109/MM.2006.49
- KISTLER, M., PERRONE, M., AND PETRINI, F. 2006. Cell multiprocessor communication network: Built for speed. IEEE Micro, 26, 3, 10-23. (Pubitemid 44194065)
- (2006) IEEE Micro , vol.26 , Issue.3 , pp. 10-23
- Kistler, M.¹ Perrone, M.² Petrini, F.³

22
- 84976859541
- The cache performance and optimizations of blocked algorithms
- LAM, M. D., ROTHBERG, E. E., AND WOLF, M. E. 1991. The cache performance and optimizations of blocked algorithms. SIGOPS Oper. Syst. Rev. 25, 63-74.
- (1991) SIGOPS Oper. Syst. Rev. , vol.25 , pp. 63-74
- Lam, M.D.¹ Rothberg, E.E.² Wolf, M.E.³

23
- 0002031606
- Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
- MOWRY, T. AND GUPTA, A. 1991. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parall. Distrib. Comp. 12, 87-106.
- (1991) J. Parall. Distrib. Comp. , vol.12 , pp. 87-106
- Mowry, T.¹ Gupta, A.²

24
- 0003631857
- Springer-Verlag, Berlin
- NUSSBAUMER, H. J. 1981. Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, Berlin.
- (1981) Fast Fourier Transform and Convolution Algorithms
- Nussbaumer, H.J.¹

25
- 34548757858
- Multicore surprises: Lessons learned from optimizing sweep3d on the cell broadband engine
- IEEE
- PETRINI, F., FOSSUM, G., FERNANDEZ, J., VARBANESCU, A., KISTLER, M., AND PERRONE, M. 2007. Multicore surprises: Lessons learned from optimizing sweep3d on the cell broadband engine. In Proceedings of IPDPS'07. IEEE, 1-10.
- (2007) Proceedings of IPDPS'07 , pp. 1-10
- Petrini, F.¹ Fossum, G.² Fernandez, J.³ Varbanescu, A.⁴ Kistler, M.⁵ Perrone, M.⁶

26
- 33748632352
- Key features of the design methodology enabling a multi-core SoC implementation of a first-generation CELL processor
- 1594796, Proceedings of the ASP-DAC 2006: Asia and South Pacific Design Automation Conference 2006
- PHAM, D., ANDERSON, H.-W., BEHNEN, E., BOLLIGER, M., GUPTA, S., HOFSTEE, H. P., HARVEY, P. E., JOHNS, C. R., KAHLE, J. A., KAMEYAMA, A., KEATY, J. M., LE, B., LEE, S., NGUYEN, T. V., PETROVICK, J. G., PHAM, M., PILLE, J., POSLUSZNY, S. D., RILEY, M. W., VEROCK, J., WARNOCK, J. D., WEITZEL, S., AND WENDEL, D. F. 2006. Key features of the design methodology enabling a multi-core soc implementation of a first-generation cell processor. In Proceedings of ASP-DAC. F. Hirose, Ed., IEEE, 871-878. (Pubitemid 44376041)
- (2006) Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC , vol.2006 , pp. 871-878
- Pham, D.¹ Anderson, H.-W.² Behnen, E.³ Bolliger, M.⁴ Gupta, S.⁵ Hofstee, P.⁶ Harvey, P.⁷ Johns, C.⁸ Kahle, J.⁹ Kameyama, A.¹⁰ Keaty, J.¹¹ Le, B.¹² Lee, S.¹³ Nguyen, T.¹⁴ Petrovick, J.¹⁵ Pham, M.¹⁶ Pille, J.¹⁷ Posluszny, S.¹⁸ Riley, M.¹⁹ Verock, J.²⁰

27
- 51049109661
- Analysis of double buffering on two different multicore architectures: Quad-core Opteron and the Cell-BE
- IEEE
- SANCHO, J. AND KERBYSON, D. 2008. Analysis of double buffering on two different multicore architectures: Quad-core Opteron and the Cell-BE. In IPDPS 2008. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing. IEEE.
- (2008) IPDPS 2008. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing
- Sancho, J.¹ Kerbyson, D.²

28
- 34548214244
- Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications
- ACM, New York, NY
- SANCHO, J. C., BARKER, K. J., KERBYSON, D. J., AND DAVIS, K. 2006. Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC'06). ACM, New York, NY.
- (2006) Proceedings of the ACM/IEEE Conference on Supercomputing (SC'06)
- Sancho, J.C.¹ Barker, K.J.² Kerbyson, D.J.³ Davis, K.⁴

29
- 73449104291
- Programming multiprocessors with explicitly managed memory hierarchies
- SCHNEIDER, S., YEOM, J., AND NIKOLOPOULOS, D. 2009. Programming multiprocessors with explicitly managed memory hierarchies. Computer 42, 12, 28-34.
- (2009) Computer , vol.42 , Issue.12 , pp. 28-34
- Schneider, S.¹ Yeom, J.² Nikolopoulos, D.³

30
- 70350597883
- A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
- SCHNEIDER, S., YEOM, S., ROSE, B., LINFORD, J., SANDU, A., AND NIKOLOPOULOS, D. 2009. A comparison of programming models for multiprocessors with explicitly managed memory hierarchies. ACM SIGPLAN Notices.
- (2009) ACM SIGPLAN Notices
- Schneider, S.¹ Yeom, S.² Rose, B.³ Linford, J.⁴ Sandu, A.⁵ Nikolopoulos, D.⁶

31
- 80052586589
- STMICROELECTRONICS AND CEA
- STMICROELECTRONICS AND CEA. 2010. Platform 2012: A many core programmable accelerator for ultra efficient embedded computing in nanometer technology.
- (2010) Platform 2012: A Many Core Programmable Accelerator for Ultra Efficient Embedded Computing in Nanometer Technology

32
- 70450000311
- Optimizing assignment of threads to spes on the cell be processor
- SUDHEER, C., NAGARAJU, T., BARUAH, P., AND SRINIVASAN, A. 2009. Optimizing assignment of threads to spes on the cell be processor. In Proceedings of the International Parallel and Distributed Processing Symposium. 0, 1-8.
- (2009) Proceedings of the International Parallel and Distributed Processing Symposium , pp. 1-8
- Sudheer, C.¹ Nagaraju, T.² Baruah, P.³ Srinivasan, A.⁴

33
- 0025467711
- A bridging model for parallel computation
- VALIANT, L. G. 1990. A bridging model for parallel computation. Comm. ACM 33, 103-111.
- (1990) Comm. ACM , vol.33 , pp. 103-111
- Valiant, L.G.¹

34
- 0038345683
- Guided region prefetching: A cooperative hardware/software approach
- WANG, Z., BURGER, D., MCKINLEY, K. S., REINHARDT, S. K., AND WEEMS, C. C. 2003. Guided region prefetching: a cooperative hardware/software approach. SIGARCH Comput. Archit. News 31, 388-398.
- (2003) SIGARCH Comput. Archit. News , vol.31 , pp. 388-398
- Wang, Z.¹ Burger, D.² McKinley, K.S.³ Reinhardt, S.K.⁴ Weems, C.C.⁵

35
- 0024935630
- More iteration space tiling
- WOLFE, M. 1989. More iteration space tiling. In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing '89). ACM, New York, NY, 655-664. (Pubitemid 20665965)
- (1989) Proc Supercomput 89 , pp. 655-664
- Wolfe, M.¹

36
- 33749633118
- ROS-DMA: A DMA double buffering method for embedded image processing with resource optimized slicing
- DOI 10.1109/RTAS.2006.38, 1613350, Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium
- ZINNER, C. AND KUBINGER, W. 2006. Ros-dma: A dma double buffering method for embedded image processing with resource optimized slicing. In Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium. 361-372. (Pubitemid 44539785)
- (2006) Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS , vol.2006 , pp. 361-372
- Zinner, C.¹ Kubinger, W.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.