-
1
-
-
84960084201
-
-
"CACTI 6.5, HP labs," http://www.hpl.hp.com/research/cacti/.
-
CACTI 6.5, HP Labs
-
-
-
4
-
-
84924470066
-
-
"McPAT 1.0, HP labs," http://www.hpl.hp.com/research/mcpat/.
-
McPAT 1.0, HP Labs
-
-
-
6
-
-
84960130034
-
-
"Gromacs," http://www.gromacs.org, 2008.
-
(2008)
Gromacs
-
-
-
9
-
-
84960189963
-
High bandwidth memory (HBM) dram
-
"High bandwidth memory (HBM) dram," JEDEC, JESD235, 2013.
-
(2013)
JEDEC, JESD235
-
-
-
10
-
-
84960079694
-
-
October
-
"Intel 64 and ia-32 architectures software developers," http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf, October 2014.
-
(2014)
Intel 64 and ia-32 Architectures Software Developers
-
-
-
11
-
-
84905216981
-
FFTs with near-optimal memory access through block data layouts
-
Florence, Italy, May 4-9, 2014
-
B. Akin, F. Franchetti, and J. C. Hoe, "FFTs with near-optimal memory access through block data layouts," in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, 2014, pp. 3898-3902.
-
(2014)
IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014
, pp. 3898-3902
-
-
Akin, B.1
Franchetti, F.2
Hoe, J.C.3
-
12
-
-
84906342287
-
Understanding the design space of dram-optimized hardware FFT accelerators
-
Zurich, Switzerland, June 18-20, 2014
-
B. Akin, F. Franchetti, and J. C. Hoe, "Understanding the design space of dram-optimized hardware FFT accelerators," in IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2014, Zurich, Switzerland, June 18-20, 2014, 2014, pp. 248-255.
-
(2014)
IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2014
, pp. 248-255
-
-
Akin, B.1
Franchetti, F.2
Hoe, J.C.3
-
13
-
-
84946692636
-
Hamlet: Hardware accelerated memory layout transform within 3d-stacked DRAM
-
Waltham, MA, USA, September 9-11, 2014
-
B. Akin, J. C. Hoe, and F. Franchetti, "Hamlet: Hardware accelerated memory layout transform within 3d-stacked DRAM," in IEEE High Performance Extreme Computing Conference, HPEC 2014, Waltham, MA, USA, September 9-11, 2014, 2014, pp. 1-6.
-
(2014)
IEEE High Performance Extreme Computing Conference, HPEC 2014
, pp. 1-6
-
-
Akin, B.1
Hoe, J.C.2
Franchetti, F.3
-
14
-
-
84864952164
-
Memory bandwidth efficient two-dimensional fast fourier transform algorithm and implementation for large problem sizes
-
29 April-1 May 2012, Toronto, Ontario, Canada
-
B. Akin, P. A. Milder, F. Franchetti, and J. C. Hoe, "Memory bandwidth efficient two-dimensional fast fourier transform algorithm and implementation for large problem sizes," in 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2012, 29 April-1 May 2012, Toronto, Ontario, Canada, 2012, pp. 188-191.
-
(2012)
2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2012
, pp. 188-191
-
-
Akin, B.1
Milder, P.A.2
Franchetti, F.3
Hoe, J.C.4
-
15
-
-
84881179047
-
Efficient virtual memory for big memory servers
-
ACM
-
A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift, "Efficient virtual memory for big memory servers," in Proceedings of the 40th Annual International Symposium on Computer Architecture. ACM, 2013, pp. 237-248.
-
(2013)
Proceedings of the 40th Annual International Symposium on Computer Architecture
, pp. 237-248
-
-
Basu, A.1
Gandhi, J.2
Chang, J.3
Hill, M.D.4
Swift, M.M.5
-
16
-
-
20744453223
-
Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models
-
Feb
-
G. Baumgartner, A. Auer, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models," Proceedings of the IEEE, vol. 93, no. 2, pp. 276-292, Feb 2005.
-
(2005)
Proceedings of the IEEE
, vol.93
, Issue.2
, pp. 276-292
-
-
Baumgartner, G.1
Auer, A.2
Bernholdt, D.3
Bibireata, A.4
Choppella, V.5
Cociorva, D.6
Gao, X.7
Harrison, R.8
Hirata, S.9
Krishnamoorthy, S.10
Krishnan, S.11
Lam, C.12
Lu, Q.13
Nooijen, M.14
Pitzer, R.15
Ramanujam, J.16
Sadayappan, P.17
Sibiryakov, A.18
-
17
-
-
63549095070
-
The parsec benchmark suite: Characterization and architectural implications
-
ACM
-
C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The parsec benchmark suite: Characterization and architectural implications," in Proceedings of the 17th international conference on Parallel architectures and compilation techniques. ACM, 2008, pp. 72-81.
-
(2008)
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
, pp. 72-81
-
-
Bienia, C.1
Kumar, S.2
Singh, J.P.3
Li, K.4
-
18
-
-
70449629588
-
Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks
-
ACM
-
A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson, "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. ACM, 2009, pp. 233-244.
-
(2009)
Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures
, pp. 233-244
-
-
Buluç, A.1
Fineman, J.T.2
Frigo, M.3
Gilbert, J.R.4
Leiserson, C.E.5
-
19
-
-
0032761638
-
Impulse: Building a smarter memory controller
-
Jan
-
J. Carter, W. Hsieh, L. Stoller, M. Swanson, L. Zhang, E. Brunvand, A. Davis, C.-C. Kuo, R. Kuramkote, M. Parker, L. Schaelicke, and T. Tateyama, "Impulse: building a smarter memory controller," in High-Performance Computer Architecture, 1999. Proceedings. Fifth International Symposium On, Jan 1999, pp. 70-79.
-
(1999)
High-Performance Computer Architecture, 1999. Proceedings. Fifth International Symposium on
, pp. 70-79
-
-
Carter, J.1
Hsieh, W.2
Stoller, L.3
Swanson, M.4
Zhang, L.5
Brunvand, E.6
Davis, A.7
Kuo, C.-C.8
Kuramkote, R.9
Parker, M.10
Schaelicke, L.11
Tateyama, T.12
-
20
-
-
84876514971
-
-
N. Chatterjee, R. Balasubramonian, M. Shevgoor, S. Pugsley, A. Udipi, A. Shafiee, K. Sudan, M. Awasthi, and Z. Chishti, "Usimm: the utah simulated memory module," 2012.
-
(2012)
Usimm: The Utah Simulated Memory Module
-
-
Chatterjee, N.1
Balasubramonian, R.2
Shevgoor, M.3
Pugsley, S.4
Udipi, A.5
Shafiee, A.6
Sudan, K.7
Awasthi, M.8
Chishti, Z.9
-
21
-
-
83155184570
-
Dymaxion: Optimizing memory access patterns for heterogeneous systems
-
S. Che, J. W. Sheaffer, and K. Skadron, "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), 2011, pp. 13:1-13:11.
-
(2011)
Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC)
, pp. 13:1-13:11
-
-
Che, S.1
Sheaffer, J.W.2
Skadron, K.3
-
22
-
-
84862084382
-
CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory
-
K. Chen, S. Li, N. Muralimanohar, J.-H. Ahn, J. Brockman, and N. Jouppi, "CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory," in Design, Automation Test in Europe (DATE), 2012, pp. 33-38.
-
(2012)
Design, Automation Test in Europe (DATE)
, pp. 33-38
-
-
Chen, K.1
Li, S.2
Muralimanohar, N.3
Ahn, J.-H.4
Brockman, J.5
Jouppi, N.6
-
23
-
-
84859721885
-
An 8x 10-gb/s source-synchronous i/o system based on high-density silicon carrier interconnects
-
T. O. Dickson, Y. Liu, S. V. Rylov, B. Dang, C. K. Tsang, P. S. Andry, J. F. Bulzacchelli, H. A. Ainspan, X. Gu, L. Turlapati et al., "An 8x 10-gb/s source-synchronous i/o system based on high-density silicon carrier interconnects," Solid-State Circuits, IEEE Journal of, vol. 47, no. 4, pp. 884-896, 2012.
-
(2012)
Solid-State Circuits, IEEE Journal of
, vol.47
, Issue.4
, pp. 884-896
-
-
Dickson, T.O.1
Liu, Y.2
Rylov, S.V.3
Dang, B.4
Tsang, C.K.5
Andry, P.S.6
Bulzacchelli, J.F.7
Ainspan, H.A.8
Gu, X.9
Turlapati, L.10
-
24
-
-
78650833009
-
Simple but effective heterogeneous main memory with on-chip memory controller support
-
X. Dong, Y. Xie, N. Muralimanohar, and N. P. Jouppi, "Simple but effective heterogeneous main memory with on-chip memory controller support," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 2010, pp. 1-11.
-
(2010)
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society
, pp. 1-11
-
-
Dong, X.1
Xie, Y.2
Muralimanohar, N.3
Jouppi, N.P.4
-
25
-
-
84887089342
-
Centip3de: A many-core prototype exploring 3d integration and near-threshold computing
-
Nov
-
R. G. Dreslinski, D. Fick, B. Giridhar, G. Kim, S. Seo, M. Fojtik, S. Satpathy, Y. Lee, D. Kim, N. Liu, M. Wieckowski, G. Chen, D. Sylvester, D. Blaauw, and T. Mudge, "Centip3de: A many-core prototype exploring 3d integration and near-threshold computing," Commun. ACM, vol. 56, no. 11, pp. 97-104, Nov. 2013.
-
(2013)
Commun. ACM
, vol.56
, Issue.11
, pp. 97-104
-
-
Dreslinski, R.G.1
Fick, D.2
Giridhar, B.3
Kim, G.4
Seo, S.5
Fojtik, M.6
Satpathy, S.7
Lee, Y.8
Kim, D.9
Liu, N.10
Wieckowski, M.11
Chen, G.12
Sylvester, D.13
Blaauw, D.14
Mudge, T.15
-
26
-
-
84934280905
-
Nda: Near-dram acceleration architecture leveraging commodity dram devices and standard memory modules
-
Feb
-
A. Farmahini-Farahani, J. H. Ahn, K. Morrow, and N. S. Kim, "Nda: Near-dram acceleration architecture leveraging commodity dram devices and standard memory modules," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, Feb 2015, pp. 283-295.
-
(2015)
High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on
, pp. 283-295
-
-
Farmahini-Farahani, A.1
Ahn, J.H.2
Morrow, K.3
Kim, N.S.4
-
27
-
-
20744449792
-
The design and implementation of FFTW3
-
M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proceedings of the IEEE, Special issue on "Program Generation, Optimization, and Platform Adaptation", vol. 93, no. 2, pp. 216-231, 2005.
-
(2005)
Proceedings of the IEEE, Special Issue on "Program Generation, Optimization, and Platform Adaptation"
, vol.93
, Issue.2
, pp. 216-231
-
-
Frigo, M.1
Johnson, S.G.2
-
28
-
-
0029290396
-
Processing in memory: The terasys massively parallel pim array
-
Apr
-
M. Gokhale, B. Holmes, and K. Iobst, "Processing in memory: the terasys massively parallel pim array," Computer, vol. 28, no. 4, pp. 23-31, Apr 1995.
-
(1995)
Computer
, vol.28
, Issue.4
, pp. 23-31
-
-
Gokhale, M.1
Holmes, B.2
Iobst, K.3
-
29
-
-
44249094647
-
Anatomy of high-performance matrix multiplication
-
May
-
K. Goto and R. A. v. d. Geijn, "Anatomy of high-performance matrix multiplication," ACM Trans. Math. Softw., vol. 34, no. 3, pp. 12:1-12:25, May 2008.
-
(2008)
ACM Trans. Math. Softw.
, vol.34
, Issue.3
, pp. 12:1-12:25
-
-
Goto, K.1
Geijn, R.A.2
-
30
-
-
77954724842
-
Sams multi-layout memory: Providing multiple views of data to boost simd performance
-
C. Gou, G. Kuzmanov, and G. N. Gaydadjiev, "Sams multi-layout memory: Providing multiple views of data to boost simd performance," in Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10, 2010, pp. 179-188.
-
(2010)
Proceedings of the 24th ACM International Conference on Supercomputing, Ser. ICS '10
, pp. 179-188
-
-
Gou, C.1
Kuzmanov, G.2
Gaydadjiev, G.N.3
-
31
-
-
84959888157
-
3d-stacked memory-side acceleration: Accelerator and system design
-
Q. Guo, N. Alachiotis, B. Akin, F. Sadi, G. Xu, T. M. Low, L. Pileggi, J. C. Hoe, and F. Franchetti, "3d-stacked memory-side acceleration: Accelerator and system design," in the Workshop on Near-Data Processing (WoNDP) (Held in conjunction with MICRO-47), 2014.
-
(2014)
The Workshop on Near-Data Processing (WoNDP) (Held in Conjunction with MICRO-47.)
-
-
Guo, Q.1
Alachiotis, N.2
Akin, B.3
Sadi, F.4
Xu, G.5
Low, T.M.6
Pileggi, L.7
Hoe, J.C.8
Franchetti, F.9
-
32
-
-
36849034066
-
Spec cpu2006 benchmark descriptions
-
J. L. Henning, "Spec cpu2006 benchmark descriptions," ACM SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1-17, 2006.
-
(2006)
ACM SIGARCH Computer Architecture News
, vol.34
, Issue.4
, pp. 1-17
-
-
Henning, J.L.1
-
33
-
-
84960189965
-
Improving node-level map-reduce performance using processing-in-memory technologies
-
M. Islam, M. Scrback, K. Kavi, M. Ignatowski, and N. Jayasena, "Improving node-level map-reduce performance using processing-in-memory technologies," in 7th Workshop on UnConventional High Performance Computing held in conjunction with the EuroPar 2014, ser. UCHPC2014, 2014.
-
(2014)
7th Workshop on UnConventional High Performance Computing Held in Conjunction with the EuroPar 2014, Ser. UCHPC2014
-
-
Islam, M.1
Scrback, M.2
Kavi, K.3
Ignatowski, M.4
Jayasena, N.5
-
34
-
-
84866544858
-
Hybrid memory cube new dram architecture increases density and performance
-
June
-
J. Jeddeloh and B. Keeth, "Hybrid memory cube new dram architecture increases density and performance," in VLSI Technology (VLSIT), 2012 Symposium on, June 2012, pp. 87-88.
-
(2012)
VLSI Technology (VLSIT), 2012 Symposium on
, pp. 87-88
-
-
Jeddeloh, J.1
Keeth, B.2
-
35
-
-
0032318285
-
Improving locality using loop and data transformations in an integrated framework
-
M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, ser. MICRO 31, 1998, pp. 285-297.
-
(1998)
Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, Ser. MICRO 31
, pp. 285-297
-
-
Kandemir, M.1
Choudhary, A.2
Ramanujam, J.3
Banerjee, P.4
-
36
-
-
84872090206
-
Flexram: Toward an advanced intelligent memory system
-
IEEE
-
Y. Kang, W. Huang, S.-M. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, and J. Torrellas, "Flexram: Toward an advanced intelligent memory system," in Computer Design (ICCD), 2012 IEEE 30th International Conference on. IEEE, 2012, pp. 5-14.
-
(2012)
Computer Design (ICCD), 2012 IEEE 30th International Conference on
, pp. 5-14
-
-
Kang, Y.1
Huang, W.2
Yoo, S.-M.3
Keen, D.4
Ge, Z.5
Lam, V.6
Pattnaik, P.7
Torrellas, J.8
-
37
-
-
80054875176
-
GPUs and the future of parallel computing
-
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, "Gpus and the future of parallel computing," IEEE Micro, vol. 31, no. 5, pp. 7-17, 2011.
-
(2011)
IEEE Micro
, vol.31
, Issue.5
, pp. 7-17
-
-
Keckler, S.W.1
Dally, W.J.2
Khailany, B.3
Garland, M.4
Glasco, D.5
-
38
-
-
84893595327
-
Quantifying the energy cost of data movement in scientific applications
-
Sept
-
G. Kestor, R. Gioiosa, D. Kerbyson, and A. Hoisie, "Quantifying the energy cost of data movement in scientific applications," in Workload Characterization (IISWC), 2013 IEEE International Symposium on, Sept 2013, pp. 56-65.
-
(2013)
Workload Characterization (IISWC), 2013 IEEE International Symposium on
, pp. 56-65
-
-
Kestor, G.1
Gioiosa, R.2
Kerbyson, D.3
Hoisie, A.4
-
39
-
-
84860655377
-
3d-maps: 3d massively parallel processor with stacked memory
-
Feb
-
D. H. Kim, K. Athikulwongse, M. Healy, M. Hossain, M. Jung, I. Khorosh, G. Kumar, Y.-J. Lee, D. Lewis, T.-W. Lin, C. Liu, S. Panth, M. Pathak, M. Ren, G. Shen, T. Song, D. H. Woo, X. Zhao, J. Kim, H. Choi, G. Loh, H.-H. Lee, and S.-K. Lim, "3d-maps: 3d massively parallel processor with stacked memory," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International, Feb 2012, pp. 188-190.
-
(2012)
Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International
, pp. 188-190
-
-
Kim, D.H.1
Athikulwongse, K.2
Healy, M.3
Hossain, M.4
Jung, M.5
Khorosh, I.6
Kumar, G.7
Lee, Y.-J.8
Lewis, D.9
Lin, T.-W.10
Liu, C.11
Panth, S.12
Pathak, M.13
Ren, M.14
Shen, G.15
Song, T.16
Woo, D.H.17
Zhao, X.18
Kim, J.19
Choi, H.20
Loh, G.21
Lee, H.-H.22
Lim, S.-K.23
-
41
-
-
84876532873
-
A scalable 0.128-to-1tb/s 0.8-to-2.6 pj/b 64-lane parallel i/o in 32nm CMOS
-
IEEE
-
M. Mansuri, J. E. Jaussi, J. T. Kennedy, T. Hsueh, S. Shekhar, G. Balamurugan, F. O'Mahony, C. Roberts, R. Mooney, and B. Casper, "A scalable 0.128-to-1tb/s 0.8-to-2.6 pj/b 64-lane parallel i/o in 32nm cmos," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013, pp. 402-403.
-
(2013)
Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International
, pp. 402-403
-
-
Mansuri, M.1
Jaussi, J.E.2
Kennedy, J.T.3
Hsueh, T.4
Shekhar, S.5
Balamurugan, G.6
O'Mahony, F.7
Roberts, C.8
Mooney, R.9
Casper, B.10
-
42
-
-
84888048152
-
A computer oriented geodetic data base and a new technique in file sequencing
-
G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing. International Business Machines Company, 1966.
-
(1966)
International Business Machines Company
-
-
Morton, G.M.1
-
43
-
-
0031594009
-
Active pages: A computation model for intelligent memory
-
M. Oskin, F. T. Chong, and T. Sherwood, "Active pages: A computation model for intelligent memory," in ISCA, 1998, pp. 192-203.
-
(1998)
ISCA
, pp. 192-203
-
-
Oskin, M.1
Chong, F.T.2
Sherwood, T.3
-
44
-
-
0042235298
-
Tiling, block data layout, and memory hierarchy performance
-
July
-
N. Park, B. Hong, and V. Prasanna, "Tiling, block data layout, and memory hierarchy performance," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 7, pp. 640-654, July 2003.
-
(2003)
IEEE Transactions on Parallel and Distributed Systems
, vol.14
, Issue.7
, pp. 640-654
-
-
Park, N.1
Hong, B.2
Prasanna, V.3
-
45
-
-
0031096193
-
A case for intelligent ram
-
Mar
-
D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, "A case for intelligent ram," Micro, IEEE, vol. 17, no. 2, pp. 34-44, Mar 1997.
-
(1997)
Micro, IEEE
, vol.17
, Issue.2
, pp. 34-44
-
-
Patterson, D.1
Anderson, T.2
Cardwell, N.3
Fromm, R.4
Keeton, K.5
Kozyrakis, C.6
Thomas, R.7
Yelick, K.8
-
46
-
-
84876588873
-
Hybrid memory cube (HMC)
-
J. T. Pawlowski, "Hybrid memory cube (HMC)," in Hotchips, 2011.
-
(2011)
Hotchips
-
-
Pawlowski, J.T.1
-
47
-
-
84960110662
-
-
J. W. Poulton, W. J. Dally, X. Chen, J. G. Eyles, T. H. Greer, S. G. Tell, J. M. Wilson, and C. T. Gray, "A 0.54 pj/b 20 gb/s ground-referenced single-ended short-reach serial link in 28 nm cmos for advanced packaging applications," 2013.
-
(2013)
A 0.54 pJ/b 20 Gb/s Ground-referenced Single-ended Short-reach Serial Link in 28 nm CMOS for Advanced Packaging Applications
-
-
Poulton, J.W.1
Dally, W.J.2
Chen, X.3
Eyles, J.G.4
Greer, T.H.5
Tell, S.G.6
Wilson, J.M.7
Gray, C.T.8
-
48
-
-
84904469580
-
NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads
-
S. Pugsley, J. Jestes, H. Zhang, R. Balasubramonian, V. Srinivasan, A. Buyuktosunoglu, A. Davis, and F. Li, "NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads," in Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS), 2014.
-
(2014)
Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS)
-
-
Pugsley, S.1
Jestes, J.2
Zhang, H.3
Balasubramonian, R.4
Srinivasan, V.5
Buyuktosunoglu, A.6
Davis, A.7
Li, F.8
-
49
-
-
70349972511
-
Permuting streaming data using rams
-
Apr
-
M. Püschel, P. A. Milder, and J. C. Hoe, "Permuting streaming data using rams," J. ACM, vol. 56, no. 2, pp. 10:1-10:34, Apr. 2009.
-
(2009)
J. ACM
, vol.56
, Issue.2
, pp. 10:1-10:34
-
-
Püschel, M.1
Milder, P.A.2
Hoe, J.C.3
-
50
-
-
19344368072
-
SPIRAL: Code generation for DSP transforms
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proc. of IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232-275, 2005.
-
(2005)
Proc. of IEEE, Special Issue on "Program Generation, Optimization, and Adaptation"
, vol.93
, Issue.2
, pp. 232-275
-
-
Püschel, M.1
Moura, J.M.F.2
Johnson, J.3
Padua, D.4
Veloso, M.5
Singer, B.6
Xiong, J.7
Franchetti, F.8
Gacic, A.9
Voronenko, Y.10
Chen, K.11
Johnson, R.W.12
Rizzolo, N.13
-
51
-
-
79959583242
-
Page placement in hybrid memory systems
-
ACM
-
L. E. Ramos, E. Gorbatov, and R. Bianchini, "Page placement in hybrid memory systems," in Proceedings of the international conference on Supercomputing. ACM, 2011, pp. 85-95.
-
(2011)
Proceedings of the International Conference on Supercomputing
, pp. 85-95
-
-
Ramos, L.E.1
Gorbatov, E.2
Bianchini, R.3
-
53
-
-
84892504664
-
Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization
-
V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization," in Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, ser. MICRO-46, 2013, pp. 185-197.
-
(2013)
Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, Ser. MICRO-46
, pp. 185-197
-
-
Seshadri, V.1
Kim, Y.2
Fallin, C.3
Lee, D.4
Ausavarungnirun, R.5
Pekhimenko, G.6
Luo, Y.7
Mutlu, O.8
Gibbons, P.B.9
Kozuch, M.A.10
Mowry, T.C.11
-
54
-
-
77952283542
-
Micro-pages: Increasing dram efficiency with locality-aware data placement
-
K. Sudan, N. Chatterjee, D. Nellans, M. Awasthi, R. Balasubramonian, and A. Davis, "Micro-pages: Increasing dram efficiency with locality-aware data placement," in Proc. of Arch. Sup. for Prog. Lang. and OS, ser. ASPLOS XV, 2010, pp. 219-230.
-
(2010)
Proc. of Arch. Sup. for Prog. Lang. and OS, Ser. ASPLOS XV
, pp. 219-230
-
-
Sudan, K.1
Chatterjee, N.2
Nellans, D.3
Awasthi, M.4
Balasubramonian, R.5
Davis, A.6
-
55
-
-
84870691946
-
Dl: A data layout transformation system for heterogeneous computing
-
May 2012
-
I.-J. Sung, G. Liu, and W.-M. Hwu, "Dl: A data layout transformation system for heterogeneous computing," in Innovative Parallel Computing (InPar), 2012, May 2012, pp. 1-11.
-
(2012)
Innovative Parallel Computing (InPar)
, pp. 1-11
-
-
Sung, I.-J.1
Liu, G.2
Hwu, W.-M.3
-
56
-
-
0003215611
-
Computational frameworks for the fast Fourier transform
-
C. Van Loan, Computational frameworks for the fast Fourier transform. SIAM, 1992.
-
(1992)
SIAM
-
-
Van Loan, C.1
-
57
-
-
84875163754
-
Exploration and optimization of 3-d integrated dram subsystems
-
April
-
C. Weis, I. Loi, L. Benini, and N. Wehn, "Exploration and optimization of 3-d integrated dram subsystems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 4, pp. 597-610, April 2013.
-
(2013)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, vol.32
, Issue.4
, pp. 597-610
-
-
Weis, C.1
Loi, I.2
Benini, L.3
Wehn, N.4
-
58
-
-
77952554764
-
An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth
-
IEEE
-
D. H. Woo, N. H. Seong, D. L. Lewis, and H.-H. Lee, "An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth," in High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on. IEEE, 2010, pp. 1-12.
-
(2010)
High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on
, pp. 1-12
-
-
Woo, D.H.1
Seong, N.H.2
Lewis, D.L.3
Lee, H.-H.4
-
59
-
-
0034826555
-
SPL: A language and compiler for DSP algorithms
-
J. Xiong, J. Johnson, R. W. Johnson, and D. Padua, "SPL: A language and compiler for DSP algorithms," in Programming Languages Design and Implementation (PLDI), 2001, pp. 298-308.
-
(2001)
Programming Languages Design and Implementation (PLDI)
, pp. 298-308
-
-
Xiong, J.1
Johnson, J.2
Johnson, R.W.3
Padua, D.4
-
60
-
-
84904424285
-
Top-pim: Throughput-oriented programmable processing in memory
-
New York, NY, USA: ACM
-
D. Zhang, N. Jayasena, A. Lyashevsky, J. L. Greathouse, L. Xu, and M. Ignatowski, "Top-pim: Throughput-oriented programmable processing in memory," in Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, ser. HPDC '14. New York, NY, USA: ACM, 2014, pp. 85-98.
-
(2014)
Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, Ser. HPDC '14
, pp. 85-98
-
-
Zhang, D.1
Jayasena, N.2
Lyashevsky, A.3
Greathouse, J.L.4
Xu, L.5
Ignatowski, M.6
-
61
-
-
0034460897
-
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality
-
ACM Press
-
Z. Zhang, Z. Zhu, and X. Zhang, "A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality," in Proceedings of the 33rd Annual International Symposium on Microarchitecture. ACM Press, 2000, pp. 32-41.
-
(2000)
Proceedings of the 33rd Annual International Symposium on Microarchitecture
, pp. 32-41
-
-
Zhang, Z.1
Zhu, Z.2
Zhang, X.3
-
62
-
-
33748543231
-
Hardware support for bulk data movement in server platforms
-
Oct
-
L. Zhao, R. Iyer, S. Makineni, L. Bhuyan, and D. Newell, "Hardware support for bulk data movement in server platforms," in Proc. of IEEE Intl. Conf. on Computer Design, (ICCD), Oct 2005, pp. 53-60.
-
(2005)
Proc. of IEEE Intl. Conf. on Computer Design, (ICCD)
, pp. 53-60
-
-
Zhao, L.1
Iyer, R.2
Makineni, S.3
Bhuyan, L.4
Newell, D.5
-
63
-
-
84893898462
-
A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing
-
Oct
-
Q. Zhu, B. Akin, H. Sumbul, F. Sadi, J. Hoe, L. Pileggi, and F. Franchetti, "A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing," in 3D Systems Integration Conference (3DIC), 2013 IEEE International, Oct 2013, pp. 1-7.
-
(2013)
3D Systems Integration Conference (3DIC), 2013 IEEE International
, pp. 1-7
-
-
Zhu, Q.1
Akin, B.2
Sumbul, H.3
Sadi, F.4
Hoe, J.5
Pileggi, L.6
Franchetti, F.7