SCOPUS Information Retrieval Platform

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09

Volume , Issue , 2009, Pages

Increasing memory miss tolerance for SIMD cores

(3) Tarjan, David a,b; Meng, Jiayuan a; Skadron, Kevin a

a University of Virginia (United States)

b NVIDIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

MANY-CORE; MEMORY ACCESS PATTERNS; MEMORY LATENCIES; COMPUTER ARCHITECTURE; WEAVING

EID: 74049151553 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1654059.1654082 Document Type: Conference Paper

Times cited : (34)

References (26)

1
- 84856653383
- Intel Advanced Vector Extensions Programming Reference, 2009. http://software.intel.com/file/21558.
- (2009) Intel Advanced Vector Extensions Programming Reference

2
- 20344374162
- Niagara: A 32-way Multithreaded Sparc Processor
- K. Aingaran, P. Kongetira, and K. Olukotun. Niagara: A 32-way Multithreaded Sparc Processor. IEEE Micro, 25:21-29, 2005.
- (2005) IEEE Micro , vol.25 , pp. 21-29
- Aingaran, K.¹ Kongetira, P.² Olukotun, K.³

3
- 43649092214
- ATI CTM Guide: Technical reference manual
- AMD, Technical report, AMD, 2006. Version 1.01
- AMD. ATI CTM Guide: Technical reference manual. Technical report, AMD, 2006. Version 1.01.

4
- 74049108383
- AMD. ATI Radeon HD 2900 Technology, GPU Specifications, 2007.
- (2007) ATI Radeon HD 2900 Technology, GPU Specifications
- AMD¹

5
- 41249087856
- General Purpose Molecular Dynamics Simulations fully implemented on Graphics Processing Units
- J. A. Anderson, C. D. Lorenz, and A. Travesset. General Purpose Molecular Dynamics Simulations fully implemented on Graphics Processing Units. J. of Computational Physics, 227(10):5342-5359, 2008.
- (2008) J. of Computational Physics , vol.227 , Issue.10 , pp. 5342-5359
- Anderson, J.A.¹ Lorenz, C.D.² Travesset, A.³

6
- 84944390453
- Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining
- R. D. Barnes, E. M. Nystrom, J. W. Sias, S. J. Patel, N. Navarro, and W.-m. W. Hwu. Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining. In Proc. 36th IEEE/ACM Int'l Symp. Microarchitecture (MICRO '03), pages 387-398, 2003.
- (2003) Proc. 36th IEEE/ACM Int'l Symp. Microarchitecture (MICRO '03) , pp. 387-398
- Barnes, R.D.¹ Nystrom, E.M.² Sias, J.W.³ Patel, S.J.⁴ Navarro, N.⁵ Hwu, W.-M.W.⁶

7
- 0033722744
- Piranha: A Scalable Architecture based on Single-Chip Multiprocessing
- L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture based on Single-Chip Multiprocessing. In Proc. 27th Int'l Symp. Computer Architecture (ISCA '00), pages 282-293, 2000.
- (2000) Proc. 27th Int'l Symp. Computer Architecture (ISCA '00) , pp. 282-293
- Barroso, L.A.¹ Gharachorloo, K.² McNamara, R.³ Nowatzyk, A.⁴ Qadeer, S.⁵ Sano, B.⁶ Smith, S.⁷ Stets, R.⁸ Verghese, B.⁹

8
- 77953980486
- The Direct3D 10 system
- D. Blythe. The Direct3D 10 system. ACM Trans. Graphics, 25(3):724-734, 2006.
- (2006) ACM Trans. Graphics , vol.25 , Issue.3 , pp. 724-734
- Blythe, D.¹

9
- 70450059008
- Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors
- M. Boyer, D. Tarjan, S. T. Acton, and K. Skadron. Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors. In Proc. 24th Int'l Parallel and Distributed Processing Symp. (IPDPS '09), pages 1-12, 2009.
- (2009) Proc. 24th Int'l Parallel and Distributed Processing Symp. (IPDPS '09) , pp. 1-12
- Boyer, M.¹ Tarjan, D.² Acton, S.T.³ Skadron, K.⁴

10
- 10644248153
- Brook for GPUs: Stream Computing on Graphics Hardware
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. ACM Trans. on Graphics, 23(3):777-786, 2004.
- (2004) ACM Trans. on Graphics , vol.23 , Issue.3 , pp. 777-786
- Buck, I.¹ Foley, T.² Horn, D.³ Sugerman, J.⁴ Fatahalian, K.⁵ Houston, M.⁶ Hanrahan, P.⁷

11
- 51449118065
- A Performance Study of General-Purpose Applications on Graphics Processors using CUDA
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A Performance Study of General-Purpose Applications on Graphics Processors using CUDA. J. of Parallel and Distributed Computing, 68(10):1370-1380, 2008.
- (2008) J. of Parallel and Distributed Computing , vol.68 , Issue.10 , pp. 1370-1380
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

12
- 84877083867
- Merrimac: Supercomputing with Streams
- W. J. Dally, F. Labonte, A. Das, P. Hanrahan, J.-H. Ahn, J. Gummaraju, M. Erez, N. Jayasena, I. Buck, T. J. Knight, and U. J. Kapasi. Merrimac: Supercomputing with Streams. In Proc. 15th ACM/IEEE Conf. Supercomputing (SC '03), page 35, 2003.
- (2003) Proc. 15th ACM/IEEE Conf. Supercomputing (SC '03) , pp. 35
- Dally, W.J.¹ Labonte, F.² Das, A.³ Hanrahan, P.⁴ Ahn, J.-H.⁵ Gummaraju, J.⁶ Erez, M.⁷ Jayasena, N.⁸ Buck, I.⁹ Knight, T.J.¹⁰ Kapasi, U.J.¹¹

13
- 35649007366
- B. K. Flachs, S. Asano, S. H. Dhong, H. P. Hofstee, G. Gervais, R. Kim, T. Le, P. Liu, J. Leenstra, J. S. Liberty, B. W. Michael, H.-J. Oh, S. M. Müller, O. Takahashi, K. Hirairi, A. Kawasumi, H. Murakami, H. Noro, S. Onishi, J. Pille, J. Silberman, S. Yong, A. Hatakeyama, Y. Watanabe, N. Yano, D. A. Brokenshire, M. Peyravian, V. To, and E. Iwata. Microarchitecture and Implementation of the Synergistic Processor in 65-nm and 90-nm SOI. IBM J. Research and Development, 51(5):529-544, 2007.

14
- 47349104432
- Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
- W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. In Proc. 40th IEEE/ACM Int'l Symp. Microarchitecture (MICRO '07), pages 407-420, 2007.
- (2007) Proc. 40th IEEE/ACM Int'l Symp. Microarchitecture (MICRO '07) , pp. 407-420
- Fung, W.W.L.¹ Sham, I.² Yuan, G.³ Aamodt, T.M.⁴

15
- 53749092570
- Parallel Computing Experiences with CUDA
- M. Garland, S. L. Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Y. Zhang, and V. Volkov. Parallel Computing Experiences with CUDA. IEEE Micro, 28(4):13-27, 2008.
- (2008) IEEE Micro , vol.28 , Issue.4 , pp. 13-27
- Garland, M.¹ Grand, S.L.² Nickolls, J.³ Anderson, J.⁴ Hardwick, J.⁵ Morton, S.⁶ Phillips, E.⁷ Zhang, Y.⁸ Volkov, V.⁹

16
- 34247369230
- A. Glew. MLP yes! ILP no! In ASPLOS Wild and Crazy Ideas, 1998.
- (1998) ASPLOS Wild and Crazy Ideas
- Glew, A.¹

17
- 44849137198
- NVIDIA Tesla: A Unified Graphics and Computing Architecture
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, 28(2):39-55, 2008.
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

18
- 74049148302
- The Cost of Uncore in Throughput-Oriented Many-Core Processors
- G. H. Loh. The Cost of Uncore in Throughput-Oriented Many-Core Processors. In Workshop on Architectures and Languages for Throughput Applications, 2008.
- (2008) Workshop on Architectures and Languages for Throughput Applications
- Loh, G.H.¹

19
- 84955506994
- Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
- O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. In Proc. 9th Int'l Conf. High Performance Computer Architecture (HPCA '03), pages 129-140, 2003.
- (2003) Proc. 9th Int'l Conf. High Performance Computer Architecture (HPCA '03) , pp. 129-140
- Mutlu, O.¹ Stark, J.² Wilkerson, C.³ Patt, Y.N.⁴

20
- 47349098275
- MineBench: A Benchmark Suite for Data Mining Workloads
- R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A Benchmark Suite for Data Mining Workloads. In Proc. 2006 IEEE Int'l Symposium on Workload Characterization (ISWC '06), pages 182-188, 2006.
- (2006) Proc. 2006 IEEE Int'l Symposium on Workload Characterization (ISWC '06) , pp. 182-188
- Narayanan, R.¹ Ozisikyilmaz, B.² Zambreno, J.³ Memik, G.⁴ Choudhary, A.⁵

21
- 78651550268
- Scalable Parallel Programming with CUDA
- J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. ACM Queue, 6(2):40-53, 2008.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

22
- 0030259458
- The Case for a Single-Chip Multiprocessor
- K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang. The Case for a Single-Chip Multiprocessor. In Proc. 7th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pages 2-11, 1996.
- (1996) Proc. 7th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII) , pp. 2-11
- Olukotun, K.¹ Nayfeh, B.A.² Hammond, L.³ Wilson, K.⁴ Chang, K.⁵

23
- 84868071789
- M. Raab, L. Grünschloss, J. Hanikaz, M. Finckh, and A. Keller. bwfirt. http://bwfirt.sourceforge.net/.

24
- 38849131252
- High-Throughput Sequence Alignment using Graphics Processing Units
- M. Schatz, C. Trapnell, A. Delcher, and A. Varshney. High-Throughput Sequence Alignment using Graphics Processing Units. BMC Bioinformatics, 8(1):474, 2007.
- (2007) BMC Bioinformatics , vol.8 , Issue.1 , pp. 474
- Schatz, M.¹ Trapnell, C.² Delcher, A.³ Varshney, A.⁴

25
- 49249086142
- Larrabee: A Many-Core x86 Architecture for Visual Computing
- L. Seiler et al. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Trans. on Graphics, 27(3):1-15, 2008.
- (2008) ACM Trans. on Graphics , vol.27 , Issue.3 , pp. 1-15
- Seiler, L.¹

26
- 12844269176
- Continual Flow Pipelines
- S. T. Srinivasan, R. Rajwar, H. Akkary, A. Gandhi, and M. Upton. Continual Flow Pipelines. In Proc. 11th Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-XI), pages 107-119, 2004.
- (2004) Proc. 11th Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-XI) , pp. 107-119
- Srinivasan, S.T.¹ Rajwar, R.² Akkary, H.³ Gandhi, A.⁴ Upton, M.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.