SCOPUS Information Search Platform

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Volume , Issue , 2012, Pages 659-664

Thread affinity mapping for irregular data access on shared cache GPGPU

(4) Kuo, Hsien Kai (a); Chen, Kuan Ting (a); Lai, Bo Cheng Charles (a); Jou, Jing Yang (a)

(a) National Chiao Tung University (Taiwan)

Author keywords

[No Author keywords available]

Indexed keywords

CONCURRENT THREADS; DATA SHARING; IRREGULAR DATA; MANY-CORE; MAPPING METHODOLOGY; MEMORY ACCESS; MEMORY BOTTLENECK; ON CHIPS; PERFORMANCE BENEFITS; RUNTIMES; SHARED CACHE; TEST CASE;

COMPUTER AIDED DESIGN; FLOCCULATION; PROGRAM PROCESSORS;

CACHE MEMORY;

EID: 84860003663 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ASPDAC.2012.6165038 Document Type: Conference Paper

Times cited : (10)

References (20)

1
- 77951154340
- The GPU Computing Era
- J. Nickolls and W. J. Dally, "The GPU Computing Era," IEEE Micro, vol. 30, pp. 56-69, Mar-Apr 2010.
- (2010) IEEE Micro , vol.30 , pp. 56-69
- Nickolls, J.¹ Dally, W.J.²

2
- 78149258346
- Understanding Throughput-Oriented Architectures
- M. Garland and D. B. Kirk, "Understanding Throughput-Oriented Architectures," Communications of the ACM, vol. 53, pp. 58-66, Nov 2010.
- (2010) Communications of the ACM , vol.53 , pp. 58-66
- Garland, M.¹ Kirk, D.B.²

3
- 65349159175
- Compute Unified Device Architecture Application Suitability
- H. Wen-Mei, C. Rodrigues, S. Ryoo, and J. Stratton, "Compute Unified Device Architecture Application Suitability," Computing in Science & Engineering, vol. 11, pp. 16-26, 2009.
- (2009) Computing in Science & Engineering , vol.11 , pp. 16-26
- Wen-Mei, H.¹ Rodrigues, C.² Ryoo, S.³ Stratton, J.⁴

4
- 35948991669
- NVIDIA. NVIDIA CUDA C Programming Guide 3.2. Available: http://developer.nvidia.com/object/cuda-download.html
- NVIDIA CUDA C Programming Guide 3.2

5
- 84870404942
- NVIDIA. NVIDIA's Next Generation CUDA Compute Architecture: Fermi. Available: http://www.nvidia.com.tw/object/LO-gtx400-whitepaper-architecture-tw.html
- NVIDIA's Next Generation CUDA Compute Architecture: Fermi

6
- 84859966698
- NVIDIA. GPU Computing SDK code samples 3.2. Available: http://developer.nvidia.com/object/cuda-download.html
- GPU Computing SDK Code Samples 3.2

7
- 77954020709
- Exploiting inter-thread temporal locality for chip multithreading
- M. Jiayuan, J. W. Sheaffer, and K. Skadron, "Exploiting inter-thread temporal locality for chip multithreading," in Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on, 2010, pp. 1-12.
- Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on, 2010 , pp. 1-12
- Jiayuan, M.¹ Sheaffer, J.W.² Skadron, K.³

8
- 0029235623
- Hierarchical tiling for improved superscalar performance
- L. Carter, J. Ferrante, and S. F. Hummel, "Hierarchical tiling for improved superscalar performance," in Parallel Processing Symposium, 1995. Proceedings., 9th International, 1995, pp. 239-245.
- Parallel Processing Symposium, 1995. Proceedings., 9th International, 1995 , pp. 239-245
- Carter, L.¹ Ferrante, J.² Hummel, S.F.³

9
- 0030685988
- Data-centric multi-level blocking
- I. Kodukula, N. Ahmed, and K. Pingali, "Data-centric multi-level blocking," presented at the Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, Las Vegas, Nevada, United States, 1997.
- Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation, Las Vegas, Nevada, United States, 1997
- Kodukula, I.¹ Ahmed, N.² Pingali, K.³

10
- 76349105923
- Taming irregular EDA applications on GPUs
- D. Yangdong, B. D. Wang, and M. Shuai, "Taming irregular EDA applications on GPUs," in Proceedings of the 2009 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2009), 2009, pp. 539-546.
- Proceedings of the 2009 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2009), 2009 , pp. 539-546
- Yangdong, D.¹ Wang, B.D.² Shuai, M.³

11
- 0033075285
- Effects of multithreading on cache performance
- H. Kwak, B. Lee, A. R. Hurson, S. H. Yoon, and W. J. Hahn, "Effects of multithreading on cache performance," IEEE Transactions on Computers, vol. 48, pp. 176-184, Feb 1999.
- (1999) IEEE Transactions on Computers , vol.48 , pp. 176-184
- Kwak, H.¹ Lee, B.² Hurson, A.R.³ Yoon, S.H.⁴ Hahn, W.J.⁵

12
- 70649092154
- Rodinia: A Benchmark Suite for Heterogeneous Computing
- S. A. Che, M. Boyer, J. Y. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, et al., "Rodinia: A Benchmark Suite for Heterogeneous Computing," Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pp. 44-54, 2009.
- (2009) Proceedings of the 2009 IEEE International Symposium on Workload Characterization , pp. 44-54
- Che, S.A.¹ Boyer, M.² Meng, J.Y.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Lee, S.H.⁶

13
- 21244474546
- Predicting inter-thread cache contention on a chip multi-processor architecture
- D. Chandra, F. Guo, S. Kim, and Y. Solihin, "Predicting inter-thread cache contention on a chip multi-processor architecture," in 11th International Symposium on High-Performance Computer Architecture, Proceedings, 2005, pp. 340-351.
- 11th International Symposium on High-Performance Computer Architecture, Proceedings, 2005 , pp. 340-351
- Chandra, D.¹ Guo, F.² Kim, S.³ Solihin, Y.⁴

14
- 70350712411
- GPU-Based Parallelization for Fast Circuit Optimization
- Y. F. Liu and J. A. Hu, "GPU-Based Parallelization for Fast Circuit Optimization," in DAC: 2009 46th ACM/IEEE Design Automation Conference, Vols 1 and 2, 2009, pp. 943-946.
- (2009) DAC: 2009 46th ACM/IEEE Design Automation Conference , vol.1-2 , pp. 943-946
- Liu, Y.F.¹ Hu, J.A.²

15
- 51549120204
- Towards acceleration of fault simulation using Graphics Processing Units
- K. Gulati and S. P. Khatri, "Towards acceleration of fault simulation using Graphics Processing Units," in 2008 45th ACM/IEEE Design Automation Conference, Vols 1 and 2, 2008, pp. 822-827.
- (2008) 2008 45th ACM/IEEE Design Automation Conference , vol.1-2 , pp. 822-827
- Gulati, K.¹ Khatri, S.P.²

16
- 84990479742
- An Efficient Heuristic Procedure for Partitioning Graphs
- B. W. Kernighan and B. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs," The Bell System Technical Journal, vol. 49, pp. 291-307, 1970.
- (1970) The Bell System Technical Journal , vol.49 , pp. 291-307
- Kernighan, B.W.¹ Lin, B.²

17
- 85046457769
- A Linear-Time Heuristic for Improving Network Partitions
- C. M. Fiduccia and R. M. Mattheyses, "A Linear-Time Heuristic for Improving Network Partitions," in Design Automation, 1982. 19th Conference on, 1982, pp. 175-181.
- Design Automation, 1982. 19th Conference on, 1982 , pp. 175-181
- Fiduccia, C.M.¹ Mattheyses, R.M.²

18
- 0032131147
- A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
- G. Karypis and V. Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs," SIAM J. Sci. Comput., vol. 20, pp. 359-392, 1998.
- (1998) SIAM J. Sci. Comput. , vol.20 , pp. 359-392
- Karypis, G.¹ Kumar, V.²

19
- 70449461110
- ITC'99 Benchmarks. Available: http://www.cad.polito.it/downloads/tools/itc99.html
- ITC'99 Benchmarks

20
- 76649122896
- "Electronic Design Automation: Synthesis, Verification, and Test," L.-T. Wang, et al., Eds. Morgan Kaufmann, 2009, pp. 236, 537.
- (2009) Electronic Design Automation: Synthesis, Verification, and Test

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.