Volume 8, Issue 4, 2012, Pages

Optimizing explicit data transfers for data parallel applications on the cell architecture

Author keywords

Cell B.E.; Data parallelization; Direct memory access (DMA); Double buffering

Indexed keywords

CELL ARCHITECTURES; CELL BROADBAND ENGINE ARCHITECTURE; COMPUTATION TIME; CYCLE-ACCURATE SIMULATORS; DATA ITEMS; DATA PARALLEL; DATA PARALLELIZATION; DATA SHARING; DIRECT MEMORY ACCESS; DOUBLE BUFFERING; GENERAL APPROACH; LOCAL MEMORIES; MULTI CORE; NUMBER OF BLOCKS; OFF-CHIP; OPTIMAL VALUES;

EID: 84857823732     PISSN: 15443566     EISSN: 15443973     Source Type: Journal    
DOI: 10.1145/2086696.2086716     Document Type: Article
Times cited: 22

References (36)
  • 1
    • 0029373981
    • Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors
    • AGARWAL, A., KRANZ, D., AND NATARAJAN, V. 1995. Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors. IEEE Trans. Parall. Distrib. Syst. 6, 9, 943-962.
    • (1995) IEEE Trans. Parall. Distrib. Syst. , vol.6 , Issue.9 , pp. 943-962
    • Agarwal, A.1    Kranz, D.2    Natarajan, V.3
  • 8
    • 77949756278
    • Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
    • Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., Lecture Notes in Computer Science Series, Springer Berlin
    • CARPENTER, P., RAMIREZ, A., AND AYGUADÉ, E. 2010. Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors. In High Performance Embedded Architectures and Compilers, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., Lecture Notes in Computer Science Series, vol. 5952, Springer Berlin, 96-110.
    • (2010) High Performance Embedded Architectures and Compilers , vol.5952 , pp. 96-110
    • Carpenter, P.1    Ramirez, A.2    Ayguadé, E.3
  • 10
    • 0028202735
    • A performance study of software and hardware data prefetching schemes
    • CHEN, T.-F. AND BAER, J.-L. 1994. A performance study of software and hardware data prefetching schemes. SIGARCH Comput. Archit. News 22, 223-232.
    • (1994) SIGARCH Comput. Archit. News , vol.22 , pp. 223-232
    • Chen, T.-F.1    Baer, J.-L.2
  • 14
    • 0003455775
    • Master's thesis, Department of Computer Science, Rice University
    • ESSEGHIR, K. 1993. Improving data locality for caches. Master's thesis, Department of Computer Science, Rice University.
    • (1993) Improving Data Locality for Caches
    • Esseghir, K.1
  • 17
    • 0029308368
    • Effective hardware-based data prefetching for high-performance processors
    • CHEN, T.-F. AND BAER, J.-L. 1995. Effective hardware-based data prefetching for high-performance processors. IEEE Trans. Comput. 44, 609-623.
    • (1995) IEEE Trans. Comput. , vol.44 , pp. 609-623
    • Chen, T.-F.1    Baer, J.-L.2
  • 18
    • 34250167228
    • The cell broadband engine: Exploiting multiple levels of parallelism in a chip multiprocessor
    • DOI 10.1007/s10766-007-0035-4
    • GSCHWIND, M. 2007. The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor. Int. J. Parall. Program. 35, 3, 233-262. (Pubitemid 46904453)
    • (2007) International Journal of Parallel Programming , vol.35 , Issue.3 , pp. 233-262
    • Gschwind, M.1
  • 19
    • 84857885288
    • IBM. 2008. Cell SDK 3.1. https://www.ibm.com/developerworks/power/cell/.
    • (2008) Cell SDK 3.1
  • 20
    • 84857805403
    • IBM. 2009. Cell Simulator. http://www.alphaworks.ibm.com/tech/cellsystemsim.
    • (2009) Cell Simulator
  • 21
    • 33746923043
    • Cell multiprocessor communication network: Built for speed
    • DOI 10.1109/MM.2006.49
    • KISTLER, M., PERRONE, M., AND PETRINI, F. 2006. Cell multiprocessor communication network: Built for speed. IEEE Micro, 26, 3, 10-23. (Pubitemid 44194065)
    • (2006) IEEE Micro , vol.26 , Issue.3 , pp. 10-23
    • Kistler, M.1    Perrone, M.2    Petrini, F.3
  • 22
    • 84976859541
    • The cache performance and optimizations of blocked algorithms
    • LAM, M. D., ROTHBERG, E. E., AND WOLF, M. E. 1991. The cache performance and optimizations of blocked algorithms. SIGOPS Oper. Syst. Rev. 25, 63-74.
    • (1991) SIGOPS Oper. Syst. Rev. , vol.25 , pp. 63-74
    • Lam, M.D.1    Rothberg, E.E.2    Wolf, M.E.3
  • 23
    • 0002031606
    • Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
    • MOWRY, T. AND GUPTA, A. 1991. Tolerating latency through software-controlled prefetching in shared-memory multiprocessors. J. Parall. Distrib. Comp. 12, 87-106.
    • (1991) J. Parall. Distrib. Comp. , vol.12 , pp. 87-106
    • Mowry, T.1    Gupta, A.2
  • 29
    • 73449104291
    • Programming multiprocessors with explicitly managed memory hierarchies
    • SCHNEIDER, S., YEOM, J., AND NIKOLOPOULOS, D. 2009. Programming multiprocessors with explicitly managed memory hierarchies. Computer 42, 12, 28-34.
    • (2009) Computer , vol.42 , Issue.12 , pp. 28-34
    • Schneider, S.1    Yeom, J.2    Nikolopoulos, D.3
  • 33
    • 0025467711
    • A bridging model for parallel computation
    • VALIANT, L. G. 1990. A bridging model for parallel computation. Comm. ACM 33, 103-111.
    • (1990) Comm. ACM , vol.33 , pp. 103-111
    • Valiant, L.G.1
  • 35
    • 0024935630
    • More iteration space tiling
    • WOLFE, M. 1989. More iteration space tiling. In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing '89). ACM, New York, NY, 655-664. (Pubitemid 20665965)
    • (1989) Proc Supercomput 89 , pp. 655-664
    • Wolfe, M.1
  • 36
    • 33749633118
    • ROS-DMA: A DMA double buffering method for embedded image processing with resource optimized slicing
    • DOI 10.1109/RTAS.2006.38, 1613350, Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium
    • ZINNER, C. AND KUBINGER, W. 2006. ROS-DMA: A DMA double buffering method for embedded image processing with resource optimized slicing. In Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium. 361-372. (Pubitemid 44539785)
    • (2006) Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS , vol.2006 , pp. 361-372
    • Zinner, C.1    Kubinger, W.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.