-
2
-
-
0030190854
-
Improving data locality with loop transformations
-
K. S. McKinley et al., "Improving data locality with loop transformations," ACM Trans. Prog. Lang. Syst., vol. 18, no. 4, pp. 424-453, 1996.
-
(1996)
ACM Trans. Prog. Lang. Syst.
, vol.18
, Issue.4
, pp. 424-453
-
-
McKinley, K.S.1
-
4
-
-
77952283542
-
Micro-pages: Increasing dram efficiency with locality-aware data placement
-
K. Sudan et al., "Micro-pages: Increasing dram efficiency with locality-aware data placement," in Proc. of Arch. Sup. for Prog. Lang. and OS, ser. ASPLOS XV, 2010, pp. 219-230.
-
(2010)
Proc. of Arch. Sup. for Prog. Lang. and OS, Ser. ASPLOS XV
, pp. 219-230
-
-
Sudan, K.1
-
5
-
-
84876588873
-
Hybrid memory cube (HMC)
-
J. T. Pawlowski, "Hybrid memory cube (HMC)," in Hotchips, 2011.
-
(2011)
Hotchips
-
-
Pawlowski, J.T.1
-
7
-
-
84875163754
-
Exploration and optimization of 3-D integrated dram subsystems
-
April
-
C. Weis et al., "Exploration and optimization of 3-D integrated dram subsystems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 4, pp. 597-610, April 2013.
-
(2013)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, vol.32
, Issue.4
, pp. 597-610
-
-
Weis, C.1
-
8
-
-
84893898462
-
A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing
-
Oct
-
Q. Zhu et al., "A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing," in 3D Systems Integration Conference (3DIC), 2013 IEEE International, Oct 2013, pp. 1-7.
-
(2013)
3D Systems Integration Conference (3DIC), 2013 IEEE International
, pp. 1-7
-
-
Zhu, Q.1
-
9
-
-
84862084382
-
CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory
-
K. Chen et al., "CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory," in Design, Automation Test in Europe (DATE), 2012, pp. 33-38.
-
(2012)
Design, Automation Test in Europe (DATE)
, pp. 33-38
-
-
Chen, K.1
-
10
-
-
84866544858
-
Hybrid memory cube new dram architecture increases density and performance
-
June
-
J. Jeddeloh et al., "Hybrid memory cube new dram architecture increases density and performance," in VLSI Technology (VLSIT), 2012 Symposium on, June 2012, pp. 87-88.
-
(2012)
VLSI Technology (VLSIT), 2012 Symposium on
, pp. 87-88
-
-
Jeddeloh, J.1
-
11
-
-
70349972511
-
Permuting streaming data using RAMs
-
M. Püschel et al., "Permuting streaming data using RAMs," Journal of the ACM, vol. 56, no. 2, pp. 10:1-10:34, 2009.
-
(2009)
Journal of the ACM
, vol.56
, Issue.2
, pp. 10:1-10:34
-
-
Püschel, M.1
-
13
-
-
84924476773
-
-
"Gromacs," http://www.gromacs.org, 2008.
-
(2008)
-
-
-
14
-
-
0000011164
-
A fast computer method for matrix transposing
-
July
-
J. O. Eklundh, "A fast computer method for matrix transposing," IEEE Transactions on Computers, vol. C-21, no. 7, pp. 801-803, July 1972.
-
(1972)
IEEE Transactions on Computers
, vol.C-21
, Issue.7
, pp. 801-803
-
-
Eklundh, J.O.1
-
15
-
-
0042235298
-
Tiling, block data layout, and memory hierarchy performance
-
July
-
N. Park et al., "Tiling, block data layout, and memory hierarchy performance," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 7, pp. 640-654, July 2003.
-
(2003)
IEEE Transactions on Parallel and Distributed Systems
, vol.14
, Issue.7
, pp. 640-654
-
-
Park, N.1
-
16
-
-
84947031567
-
Parallel matrix transpose algorithms on distributed memory concurrent computers
-
Oct
-
J. Choi et al., "Parallel matrix transpose algorithms on distributed memory concurrent computers," in Proceedings of the Scalable Parallel Libraries Conference, Oct 1993, pp. 245-252.
-
(1993)
Proceedings of the Scalable Parallel Libraries Conference
, pp. 245-252
-
-
Choi, J.1
-
17
-
-
84864952164
-
Memory bandwidth efficient two-dimensional fast Fourier transform algorithm and implementation for large problem sizes
-
B. Akin et al., "Memory bandwidth efficient two-dimensional fast Fourier transform algorithm and implementation for large problem sizes," in Proc. of the IEEE Symp. on FCCM, 2012, pp. 188-191.
-
(2012)
Proc. of the IEEE Symp. on FCCM
, pp. 188-191
-
-
Akin, B.1
-
18
-
-
78650833009
-
Simple but effective heterogeneous main memory with on-chip memory controller support
-
Nov
-
X. Dong et al., "Simple but effective heterogeneous main memory with on-chip memory controller support," in Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), Nov 2010, pp. 1-11.
-
(2010)
Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC)
, pp. 1-11
-
-
Dong, X.1
-
20
-
-
77952265152
-
Optimizing matrix transpose in cuda
-
Jan
-
G. Ruetsch et al., "Optimizing matrix transpose in cuda," Nvidia Tech. Report, Jan 2009.
-
(2009)
Nvidia Tech. Report
-
-
Ruetsch, G.1
-
21
-
-
84924476772
-
-
CACTI 6.5, HP Labs
-
"CACTI 6.5, HP Labs," http://www.hpl.hp.com/research/cacti/.
-
-
-
-
22
-
-
3142665556
-
Dynamic data layouts for cache-conscious implementation of a class of signal transforms
-
July
-
N. Park et al., "Dynamic data layouts for cache-conscious implementation of a class of signal transforms," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 2120-2134, July 2004.
-
(2004)
IEEE Transactions on Signal Processing
, vol.52
, Issue.7
, pp. 2120-2134
-
-
Park, N.1
-
23
-
-
33748543231
-
Hardware support for bulk data movement in server platforms
-
Oct
-
L. Zhao et al., "Hardware support for bulk data movement in server platforms," in Proc. of IEEE Intl. Conf. on Computer Design, (ICCD), Oct 2005, pp. 53-60.
-
(2005)
Proc. of IEEE Intl. Conf. on Computer Design, (ICCD)
, pp. 53-60
-
-
Zhao, L.1
-
25
-
-
83155184570
-
Dymaxion: Optimizing memory access patterns for heterogeneous systems
-
S. Che et al., "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), 2011, pp. 13:1-13:11.
-
(2011)
Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC)
, pp. 13:1-13:11
-
-
Che, S.1