-
1
-
-
77951154340
-
The GPU Computing Era
-
Mar-Apr
-
J. Nickolls and W. J. Dally, "The GPU Computing Era," IEEE Micro, vol. 30, pp. 56-69, Mar-Apr 2010.
-
(2010)
IEEE Micro
, vol.30
, pp. 56-69
-
-
Nickolls, J.1
Dally, W.J.2
-
2
-
-
78149258346
-
Understanding Throughput-Oriented Architectures
-
Nov
-
M. Garland and D. B. Kirk, "Understanding Throughput-Oriented Architectures," Communications of the ACM, vol. 53, pp. 58-66, Nov 2010.
-
(2010)
Communications of the ACM
, vol.53
, pp. 58-66
-
-
Garland, M.1
Kirk, D.B.2
-
3
-
-
65349159175
-
Compute Unified Device Architecture Application Suitability
-
W.-M. Hwu, C. Rodrigues, S. Ryoo, and J. Stratton, "Compute Unified Device Architecture Application Suitability," Computing in Science & Engineering, vol. 11, pp. 16-26, 2009.
-
(2009)
Computing in Science & Engineering
, vol.11
, pp. 16-26
-
-
Hwu, W.-M.1
Rodrigues, C.2
Ryoo, S.3
Stratton, J.4
-
7
-
-
77954020709
-
Exploiting inter-thread temporal locality for chip multithreading
-
J. Meng, J. W. Sheaffer, and K. Skadron, "Exploiting inter-thread temporal locality for chip multithreading," in 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12.
-
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010
, pp. 1-12
-
-
Meng, J.1
Sheaffer, J.W.2
Skadron, K.3
-
8
-
-
0029235623
-
Hierarchical tiling for improved superscalar performance
-
L. Carter, J. Ferrante, and S. F. Hummel, "Hierarchical tiling for improved superscalar performance," in Proceedings of the 9th International Parallel Processing Symposium, 1995, pp. 239-245.
-
Proceedings of the 9th International Parallel Processing Symposium, 1995
, pp. 239-245
-
-
Carter, L.1
Ferrante, J.2
Hummel, S.F.3
-
9
-
-
0030685988
-
Data-centric multi-level blocking
-
presented at the
-
I. Kodukula, N. Ahmed, and K. Pingali, "Data-centric multi-level blocking," in Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation, Las Vegas, Nevada, United States, 1997.
-
Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation, Las Vegas, Nevada, United States, 1997
-
-
Kodukula, I.1
Ahmed, N.2
Pingali, K.3
-
10
-
-
76349105923
-
Taming irregular EDA applications on GPUs
-
Y. Deng, B. D. Wang, and S. Mu, "Taming irregular EDA applications on GPUs," in Proceedings of the 2009 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2009), 2009, pp. 539-546.
-
Proceedings of the 2009 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2009), 2009
, pp. 539-546
-
-
Deng, Y.1
Wang, B.D.2
Mu, S.3
-
11
-
-
0033075285
-
Effects of multithreading on cache performance
-
Feb
-
H. Kwak, B. Lee, A. R. Hurson, S. H. Yoon, and W. J. Hahn, "Effects of multithreading on cache performance," IEEE Transactions on Computers, vol. 48, pp. 176-184, Feb 1999.
-
(1999)
IEEE Transactions on Computers
, vol.48
, pp. 176-184
-
-
Kwak, H.1
Lee, B.2
Hurson, A.R.3
Yoon, S.H.4
Hahn, W.J.5
-
12
-
-
70649092154
-
Rodinia: A Benchmark Suite for Heterogeneous Computing
-
S. A. Che, M. Boyer, J. Y. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, et al., "Rodinia: A Benchmark Suite for Heterogeneous Computing," Proceedings of the 2009 IEEE International Symposium on Workload Characterization, pp. 44-54, 2009.
-
(2009)
Proceedings of the 2009 IEEE International Symposium on Workload Characterization
, pp. 44-54
-
-
Che, S.A.1
Boyer, M.2
Meng, J.Y.3
Tarjan, D.4
Sheaffer, J.W.5
Lee, S.H.6
-
13
-
-
21244474546
-
Predicting inter-thread cache contention on a chip multi-processor architecture
-
D. Chandra, F. Guo, S. Kim, and Y. Solihin, "Predicting inter-thread cache contention on a chip multi-processor architecture," in 11th International Symposium on High-Performance Computer Architecture, Proceedings, 2005, pp. 340-351.
-
11th International Symposium on High-Performance Computer Architecture, Proceedings, 2005
, pp. 340-351
-
-
Chandra, D.1
Guo, F.2
Kim, S.3
Solihin, Y.4
-
15
-
-
51549120204
-
Towards acceleration of fault simulation using Graphics Processing Units
-
K. Gulati and S. P. Khatri, "Towards acceleration of fault simulation using Graphics Processing Units," in 2008 45th ACM/IEEE Design Automation Conference, Vols 1 and 2, 2008, pp. 822-827.
-
(2008)
2008 45th ACM/IEEE Design Automation Conference
, vol.1-2
, pp. 822-827
-
-
Gulati, K.1
Khatri, S.P.2
-
16
-
-
84990479742
-
An Efficient Heuristic Procedure for Partitioning Graphs
-
B. W. Kernighan and S. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs," The Bell System Technical Journal, vol. 49, pp. 291-307, 1970.
-
(1970)
The Bell System Technical Journal
, vol.49
, pp. 291-307
-
-
Kernighan, B.W.1
Lin, S.2
-
17
-
-
85046457769
-
A Linear-Time Heuristic for Improving Network Partitions
-
C. M. Fiduccia and R. M. Mattheyses, "A Linear-Time Heuristic for Improving Network Partitions," in 19th Conference on Design Automation, 1982, pp. 175-181.
-
19th Conference on Design Automation, 1982
, pp. 175-181
-
-
Fiduccia, C.M.1
Mattheyses, R.M.2
-
18
-
-
0032131147
-
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
-
G. Karypis and V. Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs," SIAM J. Sci. Comput., vol. 20, pp. 359-392, 1998.
-
(1998)
SIAM J. Sci. Comput.
, vol.20
, pp. 359-392
-
-
Karypis, G.1
Kumar, V.2
-
19
-
-
70449461110
-
-
Available
-
ITC'99 Benchmarks. Available: http://www.cad.polito.it/downloads/tools/itc99.html
-
ITC'99 Benchmarks
-
-
-
20
-
-
76649122896
-
-
L.-T. Wang, et al., Eds., ed: Morgan Kaufmann
-
L.-T. Wang, et al., Eds., "Electronic Design Automation: Synthesis, Verification, and Test," Morgan Kaufmann, 2009, pp. 236, 537.
-
(2009)
Electronic Design Automation: Synthesis, Verification, and Test
-
-