SCOPUS Information Retrieval Platform

Proceedings - International Symposium on Computer Architecture

Volume 13-17-June-2015, Issue , 2015, Pages 131-143

Data reorganization in memory using 3D-stacked DRAM

(3) Akin, Berkin (a); Franchetti, Franz (a); Hoe, James C. (a)

(a) CARNEGIE MELLON UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER ARCHITECTURE; ENERGY EFFICIENCY; INTEGRATED CIRCUIT LAYOUT; MEMORY ARCHITECTURE; PHYSICAL ADDRESSES; PROGRAM PROCESSORS; THREE DIMENSIONAL INTEGRATED CIRCUITS;

CONVENTIONAL SYSTEMS; DATA REORGANIZATION; ENERGY EFFICIENCY IMPROVEMENTS; MATHEMATICAL FRAMEWORKS; MEMORY HIERARCHY; OPTIMIZED IMPLEMENTATION; ORDERS OF MAGNITUDE; PERFORMANCE BENEFITS;

DYNAMIC RANDOM ACCESS STORAGE;

EID: 84960125983 PISSN: 10636897 EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2749469.2750397 Document Type: Conference Paper

Times cited : (158)

References (63)

1
- 84960084201
- "CACTI 6.5, HP labs," http://www.hpl.hp.com/research/cacti/.
- CACTI 6.5, HP Labs

2
- 84960102542
- "DDR3-1600 dram datasheet, MT41J256M4, Micron," http://www.micron.com/parts/dram/ddr3-sdram.
- DDR3-1600 Dram Datasheet, MT41J256M4, Micron

3
- 77952650268
- "Intel math kernel library (MKL)," http://software.intel.com/enus/articles/intel-mkl/.
- Intel Math Kernel Library (MKL)

4
- 84924470066
- "McPAT 1.0, HP labs," http://www.hpl.hp.com/research/mcpat/.
- McPAT 1.0, HP Labs

5
- 84870671761
- "Performance application programming interface (PAPI)," http://icl.cs.utk.edu/papi/.
- Performance Application Programming Interface (PAPI)

6
- 84960130034
- "Gromacs," http://www.gromacs.org, 2008.
- (2008) Gromacs

7
- 84960080825
- Dec
- "Itrs interconnect working group, winter update," http://www.itrs.net/, Dec 2012.
- (2012) Itrs Interconnect Working Group, Winter Update

8
- 84897531508
- "Memory scheduling championship (MSC)," http://www.cs.utah.edu/rajeev/jwac12/, 2012.
- (2012) Memory Scheduling Championship (MSC)

9
- 84960189963
- High bandwidth memory (HBM) dram
- "High bandwidth memory (HBM) dram," JEDEC, JESD235, 2013.
- (2013) JEDEC, JESD235

10
- 84960079694
- October
- "Intel 64 and ia-32 architectures software developers," http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf, October 2014.
- (2014) Intel 64 and ia-32 Architectures Software Developers

11
- 84905216981
- FFTS with near-optimal memory access through block data layouts
- Florence, Italy, May 4-9, 2014
- B. Akin, F. Franchetti, and J. C. Hoe, "FFTS with near-optimal memory access through block data layouts," in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, 2014, pp. 3898-3902.
- (2014) IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, pp. 3898-3902
- Akin, B.¹ Franchetti, F.² Hoe, J.C.³

12
- 84906342287
- Understanding the design space of dram-optimized hardware FFT accelerators
- Zurich, Switzerland, June 18-20, 2014
- -, "Understanding the design space of dram-optimized hardware FFT accelerators," in IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2014, Zurich, Switzerland, June 18-20, 2014, 2014, pp. 248-255.
- (2014) IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2014, pp. 248-255
- Akin, B.¹ Franchetti, F.² Hoe, J.C.³

13
- 84946692636
- Hamlet: Hardware accelerated memory layout transform within 3d-stacked DRAM
- Waltham, MA, USA, September 9-11, 2014
- B. Akin, J. C. Hoe, and F. Franchetti, "Hamlet: Hardware accelerated memory layout transform within 3d-stacked DRAM," in IEEE High Performance Extreme Computing Conference, HPEC 2014, Waltham, MA, USA, September 9-11, 2014, 2014, pp. 1-6.
- (2014) IEEE High Performance Extreme Computing Conference, HPEC 2014, pp. 1-6
- Akin, B.¹ Hoe, J.C.² Franchetti, F.³

14
- 84864952164
- Memory bandwidth efficient two-dimensional fast fourier transform algorithm and implementation for large problem sizes
- 29 April-1 May 2012, Toronto, Ontario, Canada
- B. Akin, P. A. Milder, F. Franchetti, and J. C. Hoe, "Memory bandwidth efficient two-dimensional fast fourier transform algorithm and implementation for large problem sizes," in 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2012, 29 April-1 May 2012, Toronto, Ontario, Canada, 2012, pp. 188-191.
- (2012) 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2012, pp. 188-191
- Akin, B.¹ Milder, P.A.² Franchetti, F.³ Hoe, J.C.⁴

15
- 84881179047
- Efficient virtual memory for big memory servers
- ACM
- A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift, "Efficient virtual memory for big memory servers," in Proceedings of the 40th Annual International Symposium on Computer Architecture. ACM, 2013, pp. 237-248.
- (2013) Proceedings of the 40th Annual International Symposium on Computer Architecture , pp. 237-248
- Basu, A.¹ Gandhi, J.² Chang, J.³ Hill, M.D.⁴ Swift, M.M.⁵

16
- 20744453223
- Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models
- Feb
- G. Baumgartner, A. Auer, D. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models," Proceedings of the IEEE, vol. 93, no. 2, pp. 276-292, Feb 2005.
- (2005) Proceedings of the IEEE , vol.93 , Issue.2 , pp. 276-292
- Baumgartner, G.¹ Auer, A.² Bernholdt, D.³ Bibireata, A.⁴ Choppella, V.⁵ Cociorva, D.⁶ Gao, X.⁷ Harrison, R.⁸ Hirata, S.⁹ Krishnamoorthy, S.¹⁰ Krishnan, S.¹¹ Lam, C.¹² Lu, Q.¹³ Nooijen, M.¹⁴ Pitzer, R.¹⁵ Ramanujam, J.¹⁶ Sadayappan, P.¹⁷ Sibiryakov, A.¹⁸

17
- 63549095070
- The parsec benchmark suite: Characterization and architectural implications
- ACM
- C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The parsec benchmark suite: Characterization and architectural implications," in Proceedings of the 17th international conference on Parallel architectures and compilation techniques. ACM, 2008, pp. 72-81.
- (2008) Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques , pp. 72-81
- Bienia, C.¹ Kumar, S.² Singh, J.P.³ Li, K.⁴

18
- 70449629588
- Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks
- ACM
- A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson, "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. ACM, 2009, pp. 233-244.
- (2009) Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures , pp. 233-244
- Buluç, A.¹ Fineman, J.T.² Frigo, M.³ Gilbert, J.R.⁴ Leiserson, C.E.⁵

19
- 0032761638
- Impulse: Building a smarter memory controller
- Jan
- J. Carter, W. Hsieh, L. Stoller, M. Swanson, L. Zhang, E. Brunvand, A. Davis, C.-C. Kuo, R. Kuramkote, M. Parker, L. Schaelicke, and T. Tateyama, "Impulse: building a smarter memory controller," in High-Performance Computer Architecture, 1999. Proceedings. Fifth International Symposium On, Jan 1999, pp. 70-79.
- (1999) High-Performance Computer Architecture, 1999. Proceedings. Fifth International Symposium on , pp. 70-79
- Carter, J.¹ Hsieh, W.² Stoller, L.³ Swanson, M.⁴ Zhang, L.⁵ Brunvand, E.⁶ Davis, A.⁷ Kuo, C.-C.⁸ Kuramkote, R.⁹ Parker, M.¹⁰ Schaelicke, L.¹¹ Tateyama, T.¹²

20
- 84876514971
- N. Chatterjee, R. Balasubramonian, M. Shevgoor, S. Pugsley, A. Udipi, A. Shafiee, K. Sudan, M. Awasthi, and Z. Chishti, "Usimm: the utah simulated memory module," 2012.
- (2012) Usimm: The Utah Simulated Memory Module
- Chatterjee, N.¹ Balasubramonian, R.² Shevgoor, M.³ Pugsley, S.⁴ Udipi, A.⁵ Shafiee, A.⁶ Sudan, K.⁷ Awasthi, M.⁸ Chishti, Z.⁹

21
- 83155184570
- Dymaxion: Optimizing memory access patterns for heterogeneous systems
- S. Che, J. W. Sheaffer, and K. Skadron, "Dymaxion: Optimizing memory access patterns for heterogeneous systems," in Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC), 2011, pp. 13:1-13:11.
- (2011) Proc. of Intl. Conf. for High Perf. Comp., Networking, Storage and Analysis (SC) , pp. 13:1-13:11
- Che, S.¹ Sheaffer, J.W.² Skadron, K.³

22
- 84862084382
- CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory
- K. Chen, S. Li, N. Muralimanohar, J.-H. Ahn, J. Brockman, and N. Jouppi, "CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory," in Design, Automation Test in Europe (DATE), 2012, pp. 33-38.
- (2012) Design, Automation Test in Europe (DATE) , pp. 33-38
- Chen, K.¹ Li, S.² Muralimanohar, N.³ Ahn, J.-H.⁴ Brockman, J.⁵ Jouppi, N.⁶

23
- 84859721885
- An 8x 10-gb/s source-synchronous i/o system based on high-density silicon carrier interconnects
- T. O. Dickson, Y. Liu, S. V. Rylov, B. Dang, C. K. Tsang, P. S. Andry, J. F. Bulzacchelli, H. A. Ainspan, X. Gu, L. Turlapati et al., "An 8x 10-gb/s source-synchronous i/o system based on high-density silicon carrier interconnects," Solid-State Circuits, IEEE Journal of, vol. 47, no. 4, pp. 884-896, 2012.
- (2012) Solid-State Circuits, IEEE Journal of , vol.47 , Issue.4 , pp. 884-896
- Dickson, T.O.¹ Liu, Y.² Rylov, S.V.³ Dang, B.⁴ Tsang, C.K.⁵ Andry, P.S.⁶ Bulzacchelli, J.F.⁷ Ainspan, H.A.⁸ Gu, X.⁹ Turlapati, L.¹⁰

24
- 78650833009
- Simple but effective heterogeneous main memory with on-chip memory controller support
- X. Dong, Y. Xie, N. Muralimanohar, and N. P. Jouppi, "Simple but effective heterogeneous main memory with on-chip memory controller support," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 2010, pp. 1-11.
- (2010) Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society , pp. 1-11
- Dong, X.¹ Xie, Y.² Muralimanohar, N.³ Jouppi, N.P.⁴

25
- 84887089342
- Centip3de: A many-core prototype exploring 3d integration and near-threshold computing
- Nov
- R. G. Dreslinski, D. Fick, B. Giridhar, G. Kim, S. Seo, M. Fojtik, S. Satpathy, Y. Lee, D. Kim, N. Liu, M. Wieckowski, G. Chen, D. Sylvester, D. Blaauw, and T. Mudge, "Centip3de: A many-core prototype exploring 3d integration and near-threshold computing," Commun. ACM, vol. 56, no. 11, pp. 97-104, Nov. 2013.
- (2013) Commun. ACM , vol.56 , Issue.11 , pp. 97-104
- Dreslinski, R.G.¹ Fick, D.² Giridhar, B.³ Kim, G.⁴ Seo, S.⁵ Fojtik, M.⁶ Satpathy, S.⁷ Lee, Y.⁸ Kim, D.⁹ Liu, N.¹⁰ Wieckowski, M.¹¹ Chen, G.¹² Sylvester, D.¹³ Blaauw, D.¹⁴ Mudge, T.¹⁵

26
- 84934280905
- Nda: Near-dram acceleration architecture leveraging commodity dram devices and standard memory modules
- Feb
- A. Farmahini-Farahani, J. H. Ahn, K. Morrow, and N. S. Kim, "Nda: Near-dram acceleration architecture leveraging commodity dram devices and standard memory modules," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, Feb 2015, pp. 283-295.
- (2015) High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on , pp. 283-295
- Farmahini-Farahani, A.¹ Ahn, J.H.² Morrow, K.³ Kim, N.S.⁴

27
- 20744449792
- The design and implementation of FFTW3
- M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proceedings of the IEEE, Special issue on "Program Generation, Optimization, and Platform Adaptation", vol. 93, no. 2, pp. 216-231, 2005.
- (2005) Proceedings of the IEEE, Special Issue on "Program Generation, Optimization, and Platform Adaptation" , vol.93 , Issue.2 , pp. 216-231
- Frigo, M.¹ Johnson, S.G.²

28
- 0029290396
- Processing in memory: The terasys massively parallel pim array
- Apr
- M. Gokhale, B. Holmes, and K. Iobst, "Processing in memory: the terasys massively parallel pim array," Computer, vol. 28, no. 4, pp. 23-31, Apr 1995.
- (1995) Computer , vol.28 , Issue.4 , pp. 23-31
- Gokhale, M.¹ Holmes, B.² Iobst, K.³

29
- 44249094647
- Anatomy of high-performance matrix multiplication
- May
- K. Goto and R. A. v. d. Geijn, "Anatomy of high-performance matrix multiplication," ACM Trans. Math. Softw., vol. 34, no. 3, pp. 12:1-12:25, May 2008.
- (2008) ACM Trans. Math. Softw. , vol.34 , Issue.3 , pp. 12:1-12:25
- Goto, K.¹ Geijn, R.A.²

30
- 77954724842
- Sams multi-layout memory: Providing multiple views of data to boost simd performance
- C. Gou, G. Kuzmanov, and G. N. Gaydadjiev, "Sams multi-layout memory: Providing multiple views of data to boost simd performance," in Proceedings of the 24th ACM International Conference on Supercomputing, ser. ICS '10, 2010, pp. 179-188.
- (2010) Proceedings of the 24th ACM International Conference on Supercomputing, Ser. ICS '10 , pp. 179-188
- Gou, C.¹ Kuzmanov, G.² Gaydadjiev, G.N.³

31
- 84959888157
- 3d-stacked memory-side acceleration: Accelerator and system design
- Q. Guo, N. Alachiotis, B. Akin, F. Sadi, G. Xu, T. M. Low, L. Pileggi, J. C. Hoe, and F. Franchetti, "3d-stacked memory-side acceleration: Accelerator and system design," in the Workshop on Near-Data Processing (WoNDP) (Held in conjunction with MICRO-47.), 2014.
- (2014) The Workshop on Near-Data Processing (WoNDP) (Held in Conjunction with MICRO-47.)
- Guo, Q.¹ Alachiotis, N.² Akin, B.³ Sadi, F.⁴ Xu, G.⁵ Low, T.M.⁶ Pileggi, L.⁷ Hoe, J.C.⁸ Franchetti, F.⁹

32
- 36849034066
- Spec cpu2006 benchmark descriptions
- J. L. Henning, "Spec cpu2006 benchmark descriptions," ACM SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1-17, 2006.
- (2006) ACM SIGARCH Computer Architecture News , vol.34 , Issue.4 , pp. 1-17
- Henning, J.L.¹

33
- 84960189965
- Improving node-level map-reduce performance using processing-in-memory technologies
- M. Islam, M. Scrback, K. Kavi, M. Ignatowski, and N. Jayasena, "Improving node-level map-reduce performance using processing-in-memory technologies," in 7th Workshop on UnConventional High Performance Computing held in conjunction with the EuroPar 2014, ser. UCHPC2014, 2014.
- (2014) 7th Workshop on UnConventional High Performance Computing Held in Conjunction with the EuroPar 2014, Ser. UCHPC2014
- Islam, M.¹ Scrback, M.² Kavi, K.³ Ignatowski, M.⁴ Jayasena, N.⁵

34
- 84866544858
- Hybrid memory cube new dram architecture increases density and performance
- June
- J. Jeddeloh and B. Keeth, "Hybrid memory cube new dram architecture increases density and performance," in VLSI Technology (VLSIT), 2012 Symposium on, June 2012, pp. 87-88.
- (2012) VLSI Technology (VLSIT), 2012 Symposium on , pp. 87-88
- Jeddeloh, J.¹ Keeth, B.²

35
- 0032318285
- Improving locality using loop and data transformations in an integrated framework
- M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, ser. MICRO 31, 1998, pp. 285-297.
- (1998) Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, Ser. MICRO 31, pp. 285-297
- Kandemir, M.¹ Choudhary, A.² Ramanujam, J.³ Banerjee, P.⁴

36
- 84872090206
- Flexram: Toward an advanced intelligent memory system
- IEEE
- Y. Kang, W. Huang, S.-M. Yoo, D. Keen, Z. Ge, V. Lam, P. Pattnaik, and J. Torrellas, "Flexram: Toward an advanced intelligent memory system," in Computer Design (ICCD), 2012 IEEE 30th International Conference on. IEEE, 2012, pp. 5-14.
- (2012) Computer Design (ICCD), 2012 IEEE 30th International Conference on , pp. 5-14
- Kang, Y.¹ Huang, W.² Yoo, S.-M.³ Keen, D.⁴ Ge, Z.⁵ Lam, V.⁶ Pattnaik, P.⁷ Torrellas, J.⁸

37
- 80054875176
- GPUs and the future of parallel computing
- S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, "Gpus and the future of parallel computing," IEEE Micro, vol. 31, no. 5, pp. 7-17, 2011.
- (2011) IEEE Micro , vol.31 , Issue.5 , pp. 7-17
- Keckler, S.W.¹ Dally, W.J.² Khailany, B.³ Garland, M.⁴ Glasco, D.⁵

38
- 84893595327
- Quantifying the energy cost of data movement in scientific applications
- Sept
- G. Kestor, R. Gioiosa, D. Kerbyson, and A. Hoisie, "Quantifying the energy cost of data movement in scientific applications," in Workload Characterization (IISWC), 2013 IEEE International Symposium on, Sept 2013, pp. 56-65.
- (2013) Workload Characterization (IISWC), 2013 IEEE International Symposium on , pp. 56-65
- Kestor, G.¹ Gioiosa, R.² Kerbyson, D.³ Hoisie, A.⁴

39
- 84860655377
- 3d-maps: 3d massively parallel processor with stacked memory
- Feb
- D. H. Kim, K. Athikulwongse, M. Healy, M. Hossain, M. Jung, I. Khorosh, G. Kumar, Y.-J. Lee, D. Lewis, T.-W. Lin, C. Liu, S. Panth, M. Pathak, M. Ren, G. Shen, T. Song, D. H. Woo, X. Zhao, J. Kim, H. Choi, G. Loh, H.-H. Lee, and S.-K. Lim, "3d-maps: 3d massively parallel processor with stacked memory," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International, Feb 2012, pp. 188-190.
- (2012) Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International , pp. 188-190
- Kim, D.H.¹ Athikulwongse, K.² Healy, M.³ Hossain, M.⁴ Jung, M.⁵ Khorosh, I.⁶ Kumar, G.⁷ Lee, Y.-J.⁸ Lewis, D.⁹ Lin, T.-W.¹⁰ Liu, C.¹¹ Panth, S.¹² Pathak, M.¹³ Ren, M.¹⁴ Shen, G.¹⁵ Song, T.¹⁶ Woo, D.H.¹⁷ Zhao, X.¹⁸ Kim, J.¹⁹ Choi, H.²⁰ Loh, G.²¹ Lee, H.-H.²² Lim, S.-K.²³

40
- 52649125840
- 3d-stacked memory architectures for multi-core processors
- G. H. Loh, "3d-stacked memory architectures for multi-core processors," in Proc. of the 35th Annual International Symposium on Computer Architecture, (ISCA), 2008, pp. 453-464.
- (2008) Proc. of the 35th Annual International Symposium on Computer Architecture, (ISCA) , pp. 453-464
- Loh, G.H.¹

41
- 84876532873
- A scalable 0.128-to-1tb/s 0.8-to-2.6 pj/b 64-lane parallel i/o in 32nm CMOS
- IEEE
- M. Mansuri, J. E. Jaussi, J. T. Kennedy, T. Hsueh, S. Shekhar, G. Balamurugan, F. O'Mahony, C. Roberts, R. Mooney, and B. Casper, "A scalable 0.128-to-1tb/s 0.8-to-2.6 pj/b 64-lane parallel i/o in 32nm cmos," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013, pp. 402-403.
- (2013) Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International , pp. 402-403
- Mansuri, M.¹ Jaussi, J.E.² Kennedy, J.T.³ Hsueh, T.⁴ Shekhar, S.⁵ Balamurugan, G.⁶ O'Mahony, F.⁷ Roberts, C.⁸ Mooney, R.⁹ Casper, B.¹⁰

42
- 84888048152
- A computer oriented geodetic data base and a new technique in file sequencing
- G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing. International Business Machines Company, 1966.
- (1966) International Business Machines Company
- Morton, G.M.¹

43
- 0031594009
- Active pages: A computation model for intelligent memory
- M. Oskin, F. T. Chong, and T. Sherwood, "Active pages: A computation model for intelligent memory," in ISCA, 1998, pp. 192-203.
- (1998) ISCA , pp. 192-203
- Oskin, M.¹ Chong, F.T.² Sherwood, T.³

44
- 0042235298
- Tiling, block data layout, and memory hierarchy performance
- July
- N. Park, B. Hong, and V. Prasanna, "Tiling, block data layout, and memory hierarchy performance," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 7, pp. 640-654, July 2003.
- (2003) IEEE Transactions on Parallel and Distributed Systems , vol.14 , Issue.7 , pp. 640-654
- Park, N.¹ Hong, B.² Prasanna, V.³

45
- 0031096193
- A case for intelligent ram
- Mar
- D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, "A case for intelligent ram," Micro, IEEE, vol. 17, no. 2, pp. 34-44, Mar 1997.
- (1997) Micro, IEEE , vol.17 , Issue.2 , pp. 34-44
- Patterson, D.¹ Anderson, T.² Cardwell, N.³ Fromm, R.⁴ Keeton, K.⁵ Kozyrakis, C.⁶ Thomas, R.⁷ Yelick, K.⁸

46
- 84876588873
- Hybrid memory cube (HMC)
- J. T. Pawlowski, "Hybrid memory cube (HMC)," in Hotchips, 2011.
- (2011) Hotchips
- Pawlowski, J.T.¹

47
- 84960110662
- J. W. Poulton, W. J. Dally, X. Chen, J. G. Eyles, T. H. Greer, S. G. Tell, J. M. Wilson, and C. T. Gray, "A 0.54 pj/b 20 gb/s ground-referenced single-ended short-reach serial link in 28 nm cmos for advanced packaging applications," 2013.
- (2013) A 0.54 Pj/b 20 Gb/s Ground-referenced Single-ended Short-reach Serial Link in 28 Nm CMOS for Advanced Packaging Applications
- Poulton, J.W.¹ Dally, W.J.² Chen, X.³ Eyles, J.G.⁴ Greer, T.H.⁵ Tell, S.G.⁶ Wilson, J.M.⁷ Gray, C.T.⁸

48
- 84904469580
- NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads
- S. Pugsley, J. Jestes, H. Zhang, R. Balasubramonian, V. Srinivasan, A. Buyuktosunoglu, A. Davis, and F. Li, "NDC: Analyzing the impact of 3D-stacked memory+logic devices on mapreduce workloads," in Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS), 2014.
- (2014) Proc. of IEEE Intl. Symp. on Perf. Analysis of Sys. and Soft. (ISPASS)
- Pugsley, S.¹ Jestes, J.² Zhang, H.³ Balasubramonian, R.⁴ Srinivasan, V.⁵ Buyuktosunoglu, A.⁶ Davis, A.⁷ Li, F.⁸

49
- 70349972511
- Permuting streaming data using rams
- Apr
- M. Püschel, P. A. Milder, and J. C. Hoe, "Permuting streaming data using rams," J. ACM, vol. 56, no. 2, pp. 10:1-10:34, Apr. 2009.
- (2009) J. ACM , vol.56 , Issue.2 , pp. 10:1-10:34
- Püschel, M.¹ Milder, P.A.² Hoe, J.C.³

50
- 19344368072
- SPIRAL: Code generation for DSP transforms
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proc. of IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232-275, 2005.
- (2005) Proc. of IEEE, Special Issue on "Program Generation, Optimization, and Adaptation" , vol.93 , Issue.2 , pp. 232-275
- Püschel, M.¹ Moura, J.M.F.² Johnson, J.³ Padua, D.⁴ Veloso, M.⁵ Singer, B.⁶ Xiong, J.⁷ Franchetti, F.⁸ Gacic, A.⁹ Voronenko, Y.¹⁰ Chen, K.¹¹ Johnson, R.W.¹² Rizzolo, N.¹³

51
- 79959583242
- Page placement in hybrid memory systems
- ACM
- L. E. Ramos, E. Gorbatov, and R. Bianchini, "Page placement in hybrid memory systems," in Proceedings of the international conference on Supercomputing. ACM, 2011, pp. 85-95.
- (2011) Proceedings of the International Conference on Supercomputing , pp. 85-95
- Ramos, L.E.¹ Gorbatov, E.² Bianchini, R.³

52
- 84879521273
- Optimizing matrix transpose in CUDA
- G. Ruetsch and P. Micikevicius, "Optimizing matrix transpose in CUDA," Nvidia CUDA SDK Application Note, 2009.
- (2009) NVIDIA CUDA SDK Application Note
- Ruetsch, G.¹ Micikevicius, P.²

53
- 84892504664
- Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization
- V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization," in Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, ser. MICRO-46, 2013, pp. 185-197.
- (2013) Proc. of the IEEE/ACM Intl. Symp. on Microarchitecture, Ser. MICRO-46 , pp. 185-197
- Seshadri, V.¹ Kim, Y.² Fallin, C.³ Lee, D.⁴ Ausavarungnirun, R.⁵ Pekhimenko, G.⁶ Luo, Y.⁷ Mutlu, O.⁸ Gibbons, P.B.⁹ Kozuch, M.A.¹⁰ Mowry, T.C.¹¹

54
- 77952283542
- Micro-pages: Increasing dram efficiency with locality-aware data placement
- K. Sudan, N. Chatterjee, D. Nellans, M. Awasthi, R. Balasubramonian, and A. Davis, "Micro-pages: Increasing dram efficiency with locality-aware data placement," in Proc. of Arch. Sup. for Prog. Lang. and OS, ser. ASPLOS XV, 2010, pp. 219-230.
- (2010) Proc. of Arch. Sup. for Prog. Lang. and OS, Ser. ASPLOS XV , pp. 219-230
- Sudan, K.¹ Chatterjee, N.² Nellans, D.³ Awasthi, M.⁴ Balasubramonian, R.⁵ Davis, A.⁶

55
- 84870691946
- Dl: A data layout transformation system for heterogeneous computing
- May 2012
- I.-J. Sung, G. Liu, and W.-M. Hwu, "Dl: A data layout transformation system for heterogeneous computing," in Innovative Parallel Computing (InPar), 2012, May 2012, pp. 1-11.
- (2012) Innovative Parallel Computing (InPar) , pp. 1-11
- Sung, I.-J.¹ Liu, G.² Hwu, W.-M.³

56
- 0003215611
- Computational frameworks for the fast Fourier transform
- C. Van Loan, Computational frameworks for the fast Fourier transform. SIAM, 1992.
- (1992) SIAM
- Van Loan, C.¹

57
- 84875163754
- Exploration and optimization of 3-d integrated dram subsystems
- April
- C. Weis, I. Loi, L. Benini, and N. Wehn, "Exploration and optimization of 3-d integrated dram subsystems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 4, pp. 597-610, April 2013.
- (2013) IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol.32 , Issue.4 , pp. 597-610
- Weis, C.¹ Loi, I.² Benini, L.³ Wehn, N.⁴

58
- 77952554764
- An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth
- IEEE
- D. H. Woo, N. H. Seong, D. L. Lewis, and H.-H. Lee, "An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth," in High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on. IEEE, 2010, pp. 1-12.
- (2010) High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on , pp. 1-12
- Woo, D.H.¹ Seong, N.H.² Lewis, D.L.³ Lee, H.-H.⁴

59
- 0034826555
- SPL: A language and compiler for DSP algorithms
- J. Xiong, J. Johnson, R. W. Johnson, and D. Padua, "SPL: A language and compiler for DSP algorithms," in Programming Languages Design and Implementation (PLDI), 2001, pp. 298-308.
- (2001) Programming Languages Design and Implementation (PLDI) , pp. 298-308
- Xiong, J.¹ Johnson, J.² Johnson, R.W.³ Padua, D.⁴

60
- 84904424285
- Top-pim: Throughput-oriented programmable processing in memory
- New York, NY, USA: ACM
- D. Zhang, N. Jayasena, A. Lyashevsky, J. L. Greathouse, L. Xu, and M. Ignatowski, "Top-pim: Throughput-oriented programmable processing in memory," in Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, ser. HPDC '14. New York, NY, USA: ACM, 2014, pp. 85-98.
- (2014) Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, Ser. HPDC '14 , pp. 85-98
- Zhang, D.¹ Jayasena, N.² Lyashevsky, A.³ Greathouse, J.L.⁴ Xu, L.⁵ Ignatowski, M.⁶

61
- 0034460897
- A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality
- ACM Press
- Z. Zhang, Z. Zhu, and X. Zhang, "A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality," in Proceedings of the 33rd Annual International Symposium on Microarchitecture. ACM Press, 2000, pp. 32-41.
- (2000) Proceedings of the 33rd Annual International Symposium on Microarchitecture , pp. 32-41
- Zhang, Z.¹ Zhu, Z.² Zhang, X.³

62
- 33748543231
- Hardware support for bulk data movement in server platforms
- Oct
- L. Zhao, R. Iyer, S. Makineni, L. Bhuyan, and D. Newell, "Hardware support for bulk data movement in server platforms," in Proc. of IEEE Intl. Conf. on Computer Design, (ICCD), Oct 2005, pp. 53-60.
- (2005) Proc. of IEEE Intl. Conf. on Computer Design, (ICCD) , pp. 53-60
- Zhao, L.¹ Iyer, R.² Makineni, S.³ Bhuyan, L.⁴ Newell, D.⁵

63
- 84893898462
- A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing
- Oct
- Q. Zhu, B. Akin, H. Sumbul, F. Sadi, J. Hoe, L. Pileggi, and F. Franchetti, "A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing," in 3D Systems Integration Conference (3DIC), 2013 IEEE International, Oct 2013, pp. 1-7.
- (2013) 3D Systems Integration Conference (3DIC), 2013 IEEE International , pp. 1-7
- Zhu, Q.¹ Akin, B.² Sumbul, H.³ Sadi, F.⁴ Hoe, J.⁵ Pileggi, L.⁶ Franchetti, F.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.