[3] A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in Proceedings of International Symposium on Performance Analysis of Systems and Software, 2009.

[5] N. Brunie, S. Collange, and G. Diamos, "Simultaneous branch and warp interweaving for sustained GPU performance," in Proceedings of International Symposium on Computer Architecture, 2012, pp. 49-60.

[7] S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Proceedings of International Symposium on Workload Characterization, 2009, pp. 44-54.

[8] S. Che, J. Sheaffer, M. Boyer, L. Szafaryn, L. Wang, and K. Skadron, "A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads," in Proceedings of International Symposium on Workload Characterization, 2010.

[9] G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu, and S. Yalamanchili, "SIMD re-convergence at thread frontiers," in Proceedings of International Symposium on Microarchitecture, 2011, pp. 477-488.

[12] W. Fung, I. Sham, G. Yuan, and T. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow," in Proceedings of International Symposium on Microarchitecture, 2007, pp. 407-420.

[13] V. George and H. Jiang, "Intel next generation microarchitecture code name IvyBridge," in Intel Developer Forum, 2012, Technology Insight Video.

[15] W. Hwu, Ed., GPU Computing Gems - Jade and Emerald Editions. Morgan Kaufmann, 2011.

[17] Intel Open Source HD Graphics Programmer's Reference Manual (PRM) for 2012 Intel Core Processor Family (codenamed IvyBridge), Intel Corp, 2012. [Online]. Available: intellinuxgraphics.org

[18] Intel SDK for OpenCL Applications 2012: OpenCL Optimization Guide, Intel Corp, 2012. [Online]. Available: software.intel.com

[21] Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart, C. Batten, and K. Asanović, "Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators," in Proceedings of International Symposium on Computer Architecture, 2011, pp. 129-140.

[22] A. Levinthal and T. Porter, "Chap - A SIMD graphics processor," in ACM SIGGRAPH Computer Graphics, vol. 18, no. 3, 1984, pp. 77-82.

[23] J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," in Proceedings of International Symposium on Computer Architecture, 2010, pp. 235-246.

[24] Compute Shader Overview, Microsoft Corp. [Online]. Available: msdn.microsoft.com/en-us/library/ff476331.aspx

[25] V. Narasiman, M. Shebanow, C. Lee, R. Miftakhutdinov, O. Mutlu, and Y. Patt, "Improving GPU performance via large warps and two-level warp scheduling," in Proceedings of International Symposium on Microarchitecture, 2011, pp. 308-317.

[29] J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, and J. Phillips, "GPU computing," Proceedings of the IEEE, vol. 96, no. 5, pp. 879-899, 2008.

[30] M. Rhu and M. Erez, "CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures," in Proceedings of International Symposium on Computer Architecture, 2012, pp. 61-71.

[31] S. Rivoire, R. Schultz, T. Okuda, and C. Kozyrakis, "Vector lane threading," in Proceedings of International Conference on Parallel Processing, 2006, pp. 55-64.

[32] J. E. Smith, S. G. Faanes, and R. Sugumar, "Vector instruction set support for conditional operations," in Proceedings of International Symposium on Computer Architecture, 2000, pp. 260-269.