SCOPUS Information Retrieval Platform

Proceedings - International Symposium on Computer Architecture

Volume: n/a, Issue: n/a, 2013, Pages 368-379

SIMD divergence optimization through intra-warp compaction

Authors (5): Vaidya, Aniruddha S. (a, b); Shayesteh, Anahita (a); Woo, Dong Hyuk (a); Saharoy, Roy (a); Azimi, Mani (a)

(a) Intel Corporation (United States)

(b) NVIDIA (United States)

Author keywords

Branch divergence; GPU; SIMD

Indexed keywords

Branch divergence; Compression techniques; Data-parallel applications; GPU; Instruction streams; Microarchitectures; Micro-architectural optimization; SIMD; Application programming interfaces (API); Compaction; Computer architecture; Computer graphics; Energy efficiency; Optimization; Program processors

EID: 84881183039; PISSN: 1063-6897; EISSN: none; Source type: Conference Proceeding
DOI: 10.1145/2485922.2485954; Document type: Conference Paper

Times cited: 29

References (34)

1. AMD Radeon HD 7970 Graphics, AMD. [Online]. Available: amd.com

2. K. Asanovic, "Vector microprocessors," Ph.D. dissertation, UC Berkeley, 1998.

3. A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in Proceedings of International Symposium on Performance Analysis of Systems and Software, 2009.

4. C. F. Batten, "Simplified Vector-Thread Architectures for Flexible and Efficient Data-Parallel Accelerators," Ph.D. dissertation, MIT, 2010.

5. N. Brunie, S. Collange, and G. Diamos, "Simultaneous branch and warp interweaving for sustained GPU performance," in Proceedings of International Symposium on Computer Architecture, 2012, pp. 49-60.

6. ILLIAC IV - System Description, Burroughs Corp, 1974, Computer History Museum resource.

7. S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Proceedings of International Symposium on Workload Characterization, 2009, pp. 44-54.

8. S. Che, J. Sheaffer, M. Boyer, L. Szafaryn, L. Wang, and K. Skadron, "A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads," in Proceedings of International Symposium on Workload Characterization, 2010.

9. G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu, and S. Yalamanchili, "SIMD re-convergence at thread frontiers," in Proceedings of International Symposium on Microarchitecture, 2011, pp. 477-488.

10. R. Espasa and M. Valero, "Multithreaded vector architectures," in International Symposium on High Performance Computer Architecture, 1997, pp. 237-248.

11. W. Fung and T. Aamodt, "Thread block compaction for efficient SIMT control flow," in International Symposium on High Performance Computer Architecture, 2011, pp. 25-36.

12. W. Fung, I. Sham, G. Yuan, and T. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," in Proceedings of International Symposium on Microarchitecture, 2007, pp. 407-420.

13. V. George and H. Jiang, "Intel next generation microarchitecture code name IvyBridge," in Intel Developer Forum, 2012, Technology Insight Video.

14. T. Han and T. Abdelrahman, "Reducing branch divergence in GPU programs," in Workshop on General Purpose Processing on GPU, 2011, p. 3.

15. W. Hwu, Ed., GPU Computing Gems - Jade and Emerald Eds. Morgan Kaufmann, 2011.

16. DirectX Developer's Guide for Intel Processor Graphics: Maximizing Performance on the New Intel Microarchitecture Codenamed IvyBridge, Intel Corp, April 2012. [Online]. Available: software.intel.com

17. Intel Open Source HD Graphics Programmer's Reference Manual (PRM) for 2012 Intel Core Processor Family (codenamed IvyBridge), Intel Corp, 2012. [Online]. Available: intellinuxgraphics.org

18. Intel SDK for OpenCL Applications 2012: OpenCL Optimization Guide, Intel Corp, 2012. [Online]. Available: software.intel.com

19. D. Kanter, "Intel's IvyBridge graphics architecture," [Online]. Available: realworldtech.com/ivy-bridge-gpu/

20. OpenCL - The open standard for parallel programming of heterogeneous systems, The Khronos Group. [Online]. Available: khronos.org/opencl/

21. Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart, C. Batten, and K. Asanović, "Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators," in Proceedings of International Symposium on Computer Architecture, 2011, pp. 129-140.

22. A. Levinthal and T. Porter, "CHAP - A SIMD graphics processor," in ACM SIGGRAPH Computer Graphics, vol. 18, no. 3, 1984, pp. 77-82.

23. J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," in Proceedings of International Symposium on Computer Architecture, 2010, pp. 235-246.

24. Compute Shader Overview, Microsoft Corp. [Online]. Available: msdn.microsoft.com/en-us/library/ff476331.aspx

25. V. Narasiman, M. Shebanow, C. Lee, R. Miftakhutdinov, O. Mutlu, and Y. Patt, "Improving GPU performance via large warps and two-level warp scheduling," in Proceedings of International Symposium on Microarchitecture, 2011, pp. 308-317.

26. Technical Brief: NVIDIA GeForce 8800 GPU Architecture Overview, Nvidia Corp, November 2006. [Online]. Available: nvidia.com

27. NVIDIA CUDA C Programming Guide: Version 4.2, Nvidia Corp, April 2012. [Online]. Available: nvidia.com

28. NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110, Nvidia Corp, 2012. [Online]. Available: nvidia.com

29. J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, and J. Phillips, "GPU computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, 2008.

30. M. Rhu and M. Erez, "CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures," in Proceedings of International Symposium on Computer Architecture, 2012, pp. 61-71.

31. S. Rivoire, R. Schultz, T. Okuda, and C. Kozyrakis, "Vector lane threading," in Proceedings of International Conference on Parallel Processing, 2006, pp. 55-64.

32. J. E. Smith, S. G. Faanes, and R. Sugumar, "Vector instruction set support for conditional operations," in Proceedings of International Symposium on Computer Architecture, 2000, pp. 260-269.

33. I. Wald, "Active thread compaction for GPU path tracing," in Proceedings of ACM SIGGRAPH Symposium on High Performance Graphics, 2011, pp. 51-58.

34. D. Woligroski, "AMD A10-4600M review: Mobile Trinity gets tested," Tom's Hardware, May 2012. [Online]. Available: tomshardware.com

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.