SCOPUS Information Search Platform

2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Volume , Issue , 2014, Pages

HAMLeT: Hardware accelerated memory layout transform within 3D-stacked DRAM

(3) Akin, Berkin (a); Hoe, James C (a); Franchetti, Franz (a)

a CARNEGIE MELLON UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATION THEORY; ENERGY EFFICIENCY; HARDWARE; MATRIX ALGEBRA; MEMORY ARCHITECTURE; THREE DIMENSIONAL INTEGRATED CIRCUITS;

COMMON OPERATIONS; DATA REORGANIZATION; DATA-INTENSIVE APPLICATION; HARDWARE-ACCELERATED; MEMORY ACCESS PATTERNS; PERFORMANCE OPTIMIZATIONS; SYSTEM UTILIZATION; TRANSFORM ALGORITHM;

DYNAMIC RANDOM ACCESS STORAGE;

EID: 84946692636 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HPEC.2014.7040954 Document Type: Conference Paper

Times cited : (23)

References (27)

1
- 0033691565
- Memory access scheduling
- S. Rixner et al., "Memory access scheduling," in Proc. of the 27th Int. Symposium on Computer Architecture (ISCA), 2000, pp. 128-138.
- (2000) Proc. of the 27th Int. Symposium on Computer Architecture (ISCA) , pp. 128-138
- Rixner, S.¹

2
- 0030190854
- Improving data locality with loop transformations
- K. S. McKinley et al., "Improving data locality with loop transformations," ACM Trans. Prog. Lang. Syst., vol. 18, no. 4, pp. 424-453, 1996.
- (1996) ACM Trans. Prog. Lang. Syst. , vol.18 , Issue.4 , pp. 424-453
- McKinley, K.S.¹

3
- 84976827033
- A data locality optimizing algorithm
- M. E. Wolf et al., "A data locality optimizing algorithm," in Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, ser. PLDI '91, 1991, pp. 30-44.
- (1991) Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, Ser. PLDI '91 , pp. 30-44
- Wolf, M.E.¹

4
- 77952283542
- Micro-pages: Increasing dram efficiency with locality-aware data placement
- K. Sudan et al., "Micro-pages: Increasing dram efficiency with locality-aware data placement," in Proc. of Arch. Sup. for Prog. Lang. and OS, ser. ASPLOS XV, 2010, pp. 219-230.
- (2010) Proc. of Arch. Sup. for Prog. Lang. and OS, Ser. ASPLOS XV , pp. 219-230
- Sudan, K.¹

5
- 84876588873
- Hybrid memory cube (HMC)
- J. T. Pawlowski, "Hybrid memory cube (HMC)," in Hotchips, 2011.
- (2011) Hotchips
- Pawlowski, J.T.¹

6
- 52649125840
- 3D-stacked memory architectures for multi-core processors
- G. H. Loh, "3D-stacked memory architectures for multi-core processors," in Proc. of the 35th Annual International Symposium on Computer Architecture, (ISCA), 2008, pp. 453-464.
- (2008) Proc. of the 35th Annual International Symposium on Computer Architecture, (ISCA) , pp. 453-464
- Loh, G.H.¹

7
- 84875163754
- Exploration and optimization of 3-D integrated dram subsystems
- April
- C. Weis et al., "Exploration and optimization of 3-D integrated dram subsystems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 4, pp. 597-610, April 2013.
- (2013) IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol.32 , Issue.4 , pp. 597-610
- Weis, C.¹

8
- 84893898462
- A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing
- Oct
- Q. Zhu et al., "A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing," in 3D Systems Integration Conference (3DIC), 2013 IEEE International, Oct 2013, pp. 1-7.
- (2013) 3D Systems Integration Conference (3DIC), 2013 IEEE International , pp. 1-7
- Zhu, Q.¹

9
- 84862084382
- CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory
- K. Chen et al., "CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory," in Design, Automation Test in Europe (DATE), 2012, pp. 33-38.
- (2012) Design, Automation Test in Europe (DATE) , pp. 33-38
- Chen, K.¹

10
- 84866544858
- Hybrid memory cube new dram architecture increases density and performance
- June
- J. Jeddeloh et al., "Hybrid memory cube new dram architecture increases density and performance," in VLSI Technology (VLSIT), 2012 Symposium on, June 2012, pp. 87-88.
- (2012) VLSI Technology (VLSIT), 2012 Symposium on , pp. 87-88
- Jeddeloh, J.¹

11
- 70349972511
- Permuting streaming data using RAMs
- M. Püschel et al., "Permuting streaming data using RAMs," Journal of the ACM, vol. 56, no. 2, pp. 10:1-10:34, 2009.
- (2009) Journal of the ACM , vol.56 , Issue.2 , pp. 10:1-10:34
- Püschel, M.¹

12
- 84905216981
- FFTs with near-optimal memory access through block data layouts
- B. Akin et al., "FFTs with near-optimal memory access through block data layouts," in Proc. IEEE Intl. Conf. Acoustics Speech and Signal Processing (ICASSP), 2014.
- (2014) Proc. IEEE Intl. Conf. Acoustics Speech and Signal Processing (ICASSP)
- Akin, B.¹

13
- 84924476773
- "Gromacs," http://www.gromacs.org, 2008.
- (2008)

14
- 0000011164
- A fast computer method for matrix transposing
- July
- J. O. Eklundh, "A fast computer method for matrix transposing," IEEE Transactions on Computers, vol. C-21, no. 7, pp. 801-803, July 1972.
- (1972) IEEE Transactions on Computers , vol.C-21 , Issue.7 , pp. 801-803
- Eklundh, J.O.¹

15
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- July
- N. Park et al., "Tiling, block data layout, and memory hierarchy performance," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 7, pp. 640-654, July 2003.
- (2003) IEEE Transactions on Parallel and Distributed Systems , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹

16
- 84947031567
- Parallel matrix transpose algorithms on distributed memory concurrent computers
- Oct
- J. Choi et al., "Parallel matrix transpose algorithms on distributed memory concurrent computers," in Proceedings of the Scalable Parallel Libraries Conference, Oct 1993, pp. 245-252.
- (1993) Proceedings of the Scalable Parallel Libraries Conference , pp. 245-252
- Choi, J.¹

17
- 84864952164
- Memory bandwidth efficient two-dimensional fast Fourier transform algorithm and implementation for large problem sizes
- B. Akin et al., "Memory bandwidth efficient two-dimensional fast Fourier transform algorithm and implementation for large problem sizes," in Proc. of the IEEE Symp. on FCCM, 2012, pp. 188-191.
- (2012) Proc. of the IEEE Symp. on FCCM , pp. 188-191
- Akin, B.¹

18
- 78650833009
- Simple but effective heterogeneous main memory with on-chip memory controller support
- Nov
- X. Dong et al., "Simple but effective heterogeneous main memory with on-chip memory controller support," in Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), Nov 2010, pp. 1-11.
- (2010) Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC) , pp. 1-11
- Dong, X.¹

19
- 84924495320
- Aug
- A. Vladimirov, "Multithreaded transposition of square matrices with common code for Intel Xeon processors and Intel Xeon Phi coprocessors," http://research.colfaxinternational.com, Aug 2013.
- (2013) Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors
- Vladimirov, A.¹

20
- 77952265152
- Optimizing matrix transpose in cuda
- Jan
- G. Ruetsch et al., "Optimizing matrix transpose in cuda," Nvidia Tech. Report, Jan 2009.
- (2009) Nvidia Tech. Report
- Ruetsch, G.¹

21
- 84924476772
- CACTI 6.5, HP labs
- "CACTI 6.5, HP labs," http://www.hpl.hp.com/research/cacti/.

22
- 3142665556
- Dynamic data layouts for cache-conscious implementation of a class of signal transforms
- July
- N. Park et al., "Dynamic data layouts for cache-conscious implementation of a class of signal transforms," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 2120-2134, July 2004.
- (2004) IEEE Transactions on Signal Processing , vol.52 , Issue.7 , pp. 2120-2134
- Park, N.¹

23
- 33748543231
- Hardware support for bulk data movement in server platforms
- Oct
- L. Zhao et al., "Hardware support for bulk data movement in server platforms," in Proc. of IEEE Intl. Conf. on Computer Design, (ICCD), Oct 2005, pp. 53-60.
- (2005) Proc. of IEEE Intl. Conf. on Computer Design, (ICCD) , pp. 53-60
- Zhao, L.¹

24
- 84892504664
- Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization
- V. Seshadri et al., "Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization," in Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, ser. MICRO-46, 2013, pp. 185-197.
- (2013) Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, Ser. MICRO-46 , pp. 185-197
- Seshadri, V.¹

25
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- S. Che et al., "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), 2011, pp. 13:1-13:11.
- (2011) Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC) , pp. 13:1-13:11
- Che, S.¹

26
- 84904469580
- NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads
- S. Pugsley et al., "NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads," in Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS), 2014.
- (2014) Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS)
- Pugsley, S.¹

27
- 84906342287
- Understanding the design space of DRAM-optimized FFT hardware accelerators
- B. Akin et al., "Understanding the design space of DRAM-optimized FFT hardware accelerators," in Proc. of IEEE Int. Conf. on Application-Specific Systems, Architectures and Processors (ASAP), 2014.
- (2014) Proc. of IEEE Int. Conf. on Application-Specific Systems, Architectures and Processors (ASAP)
- Akin, B.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.