Volume 14, Issue 3, 2006, Pages 279-290

A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck

Author keywords

Computer aided analysis; Design methodology; Memory architecture; Performance; Prefetching

Indexed keywords

DESIGN METHODOLOGY; MEMORY ARCHITECTURE; PREFETCHING

EID: 33646413433     PISSN: 10638210     EISSN: None     Source Type: Journal    
DOI: 10.1109/TVLSI.2006.871759     Document Type: Article
Times cited: 23

References (44)
  • 1
    • 33646429702
    • Multi-level memory prefetching for media and stream processors
    • J. Fritts, "Multi-level memory prefetching for media and stream processors," in Proc. Int. Conf. Multimedia Expo (ICME), 2002, pp. 101-104.
    • (2002) Proc. Int. Conf. Multimedia Expo (ICME) , pp. 101-104
    • Fritts, J.1
  • 2
    • 33646414407
    • Paged control store prefetch mechanism
    • Dec.
    • T. A. Enger, "Paged control store prefetch mechanism," IBM Tech. Discl. Bull., vol. 7, no. 16, pp. 2140-2141, Dec. 1973.
    • (1973) IBM Tech. Discl. Bull. , vol.7 , Issue.16 , pp. 2140-2141
    • Enger, T.A.1
  • 3
    • 33646400221
    • Cache memory with prefetching of data by priority
    • May
    • B. T. Bennet and P. A. Franaczek, "Cache memory with prefetching of data by priority," IBM Technical Disclosure Bulletin, vol. 18, no. 12, pp. 4231-4232, May 1976.
    • (1976) IBM Technical Disclosure Bulletin , vol.18 , Issue.12 , pp. 4231-4232
    • Bennet, B.T.1    Franaczek, P.A.2
  • 5
    • 0018106484
    • Sequential program prefetching in memory hierarchies
    • Dec.
    • A. J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp. 7-21, Dec. 1978.
    • (1978) IEEE Computer , vol.11 , Issue.12 , pp. 7-21
    • Smith, A.J.1
  • 6
    • 0025429331
    • Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
    • N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Proc. Int. Symp. Comput. Arch., 1990, pp. 363-373.
    • (1990) Proc. Int. Symp. Comput. Arch. , pp. 363-373
    • Jouppi, N.P.1
  • 7
    • 0026267802
    • An effective on-chip preloading scheme to reduce data access penalty
    • J.-L. Baer and T.-F. Chen, "An effective on-chip preloading scheme to reduce data access penalty," in Proc. Supercomputing, 1991, pp. 176-186.
    • (1991) Proc. Supercomputing , pp. 176-186
    • Baer, J.-L.1    Chen, T.-F.2
  • 8
    • 33646429702
    • Multi-level memory prefetching for media and stream processors
    • J. Fritts, "Multi-level memory prefetching for media and stream processors," in Int. Conf. Multimedia Expo (ICME), 2002, pp. 101-104.
    • (2002) Int. Conf. Multimedia Expo (ICME) , pp. 101-104
    • Fritts, J.1
  • 9
    • 0036005098
    • Prefetching for improved bus wrapper performance in cores
    • Jan.
    • R. Lysecky and F. Vahid, "Prefetching for improved bus wrapper performance in cores," ACM Trans. Des. Automat. Electron. Syst., vol. 7, no. 1, pp. 58-90, Jan. 2002.
    • (2002) ACM Trans. Des. Automat. Electron. Syst. , vol.7 , Issue.1 , pp. 58-90
    • Lysecky, R.1    Vahid, F.2
  • 10
    • 0038344707
    • Improving data prefetching efficacy in multimedia applications
    • Jun.
    • R. Cucchiara, A. Prati, and M. Piccardi, "Improving data prefetching efficacy in multimedia applications," Multimedia Tools Appl., vol. 20, no. 2, pp. 159-178, Jun. 2003.
    • (2003) Multimedia Tools Appl. , vol.20 , Issue.2 , pp. 159-178
    • Cucchiara, R.1    Prati, A.2    Piccardi, M.3
  • 11
    • 84944748972
    • A hardware-based cache pollution filtering mechanism for aggressive prefetches
    • X. Zhuang and H.-H. S. Lee, "A hardware-based cache pollution filtering mechanism for aggressive prefetches," in Proc. IEEE Int. Conf. Parallel Process. (ICPP), 2003, pp. 286-293.
    • (2003) Proc. IEEE Int. Conf. Parallel Process. (ICPP) , pp. 286-293
    • Zhuang, X.1    Lee, H.-H.S.2
  • 12
    • 0003690936
    • Software methods for improvement of cache performance on supercomputer applications
    • Ph.D. dissertation, Rice University, Houston, TX
    • A. K. Porterfield, "Software methods for improvement of cache performance on supercomputer applications," Ph.D. dissertation, Rice University, Houston, TX, 1989, Tech. Rep. CRPC-TR89009.
    • (1989) Tech. Rep. , vol.CRPC-TR89009
    • Porterfield, A.K.1
  • 15
    • 0030129806
    • The MIPS R10000 superscalar microprocessor
    • Apr.
    • K. Yeager, "The MIPS R10000 superscalar microprocessor," IEEE Micro, vol. 16, no. 2, pp. 28-40, Apr. 1996.
    • (1996) IEEE Micro , vol.16 , Issue.2 , pp. 28-40
    • Yeager, K.1
  • 18
    • 0002031606
    • Tolerating latency through software-controlled data prefetching
    • Jun.
    • T. Mowry and A. Gupta, "Tolerating latency through software-controlled data prefetching," J. Parallel Distrib. Comput., vol. 12, no. 2, pp. 87-106, Jun. 1991.
    • (1991) J. Parallel Distrib. Comput. , vol.12 , Issue.2 , pp. 87-106
    • Mowry, T.1    Gupta, A.2
  • 19
    • 0034839064
    • Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
    • C.-K. Luk, "Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors," in Proc. 28th Int. Conf. Comput. Arch., 2001, pp. 40-51.
    • (2001) Proc. 28th Int. Conf. Comput. Arch. , pp. 40-51
    • Luk, C.-K.1
  • 21
    • 84858881855
    • [Online]
    • Intel, Intel Corp. [Online]. Available: http://www.intel.com, 2005
    • (2005)
  • 22
    • 0004864204
    • An integrated hardware/software scheme for shared-memory multiprocessors
    • E. H. Gornish and A. V. Veidenbaum, "An integrated hardware/software scheme for shared-memory multiprocessors," in Proc. Int. Conf. Parallel Process., 1994, pp. 281-284.
    • (1994) Proc. Int. Conf. Parallel Process. , pp. 281-284
    • Gornish, E.H.1    Veidenbaum, A.V.2
  • 23
    • 0029511258
    • An effective programmable prefetch engine for on-chip caches
    • T. Chen, "An effective programmable prefetch engine for on-chip caches," in Proc. 28th Int. Symp. Microarch., 1995, pp. 237-242.
    • (1995) Proc. 28th Int. Symp. Microarch. , pp. 237-242
    • Chen, T.1
  • 28
    • 84858877281
    • [Online]
    • IMEC [Online]. Available: http://www.imec.be/design/atomium/, 2005
    • (2005)
  • 35
    • 2342635671
    • CACTI 3.0: An integrated cache timing, power and area model
    • COMPAQ, Palo Alto, CA
    • P. Shivakumar and N. Jouppi, "CACTI 3.0: An Integrated Cache Timing, Power and Area Model," COMPAQ, Palo Alto, CA, WRL Res. Rep. 2001/2, 2001.
    • (2001) WRL Res. Rep. , vol.2001 , Issue.2
    • Shivakumar, P.1    Jouppi, N.2
  • 37
    • 0029356792
    • A fast hierarchical motion vector estimation algorithm using mean pyramid
    • Aug.
    • M. Nam, J.-S. Kim, R.-H. Park, and Y. S. Shim, "A fast hierarchical motion vector estimation algorithm using mean pyramid," IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 4, pp. 344-351, Aug. 1995.
    • (1995) IEEE Trans. Circuits Syst. Video Technol. , vol.5 , Issue.4 , pp. 344-351
    • Nam, M.1    Kim, J.-S.2    Park, R.-H.3    Shim, Y.S.4
  • 40
    • 33646424580
    • Platform independent data transfer and storage exploration illustrated on a parallel cavity detection algorithm
    • K. Danckaert, F. Catthoor, and H. D. Man, "Platform independent data transfer and storage exploration illustrated on a parallel cavity detection algorithm," in Proc. ACM Conf. Parallel Distrib. Process. Tech. Appl., 1999, pp. 1669-1675.
    • (1999) Proc. ACM Conf. Parallel Distrib. Process. Tech. Appl. , pp. 1669-1675
    • Danckaert, K.1    Catthoor, F.2    Man, H.D.3
  • 41
    • 0033875764
    • The local wavelet transform: A memory-efficient, high-speed architecture optimized to a region-oriented zero-tree coder
    • G. Lafruit, L. Nachtergaele, B. Vanhoof, and F. Catthoor, "The local wavelet transform: A memory-efficient, high-speed architecture optimized to a region-oriented zero-tree coder," Integr. Comput.-Aided Eng., vol. 7, no. 2, pp. 89-103, 2000.
    • (2000) Integr. Comput.-Aided Eng. , vol.7 , Issue.2 , pp. 89-103
    • Lafruit, G.1    Nachtergaele, L.2    Vanhoof, B.3    Catthoor, F.4
  • 43
    • 33646413590
    • Texas Instruments, Dallas, TX, SPRA486C
    • "Power Consumption Summary," Texas Instruments, Dallas, TX, SPRA486C, 2002.
    • (2002) Power Consumption Summary


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.