[1] AMD CodeAnalyst, 2009. http://developer.amd.com/cpu/CodeAnalyst.
[2] PAPI, 2009. http://icl.cs.utk.edu/papi.
[3] The R project for statistical computing, 2009. http://www.r-project.org/.
[5] O. Bockenbach, M. Knaup, and M. Kachelriess. Implementation of a cone-beam backprojection algorithm on the Cell Broadband Engine processor. In Proc. SPIE Medical Imaging, San Diego, CA, Feb. 2007.
[6] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general-purpose applications on graphics processors using CUDA. Journal of Parallel and Distributed Computing, 68(10):1370-1380, 2008.
[7] S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. Accelerating compute-intensive applications with GPUs and FPGAs. In Proc. 6th IEEE Symp. on Application Specific Processors (SASP), Anaheim, CA, Jun. 2008.
[8] T. Chen, R. Raghavan, J. Dale, and E. Iwata. Cell Broadband Engine Architecture and its first implementation-a performance view. IBM Journal of Research and Development, 51(5):559-572, Sep. 2007.
[9] P. Gepner, D. L. Fraser, and M. F. Kowalik. Second generation quad-core Intel Xeon processors bring 45 nm technology and a new level of performance to HPC applications. Lecture Notes in Computer Science, 5101:417-426, 2008.
[11] E. Ipek, O. Mutlu, J. F. Martinez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. ACM SIGARCH Computer Architecture News, 36(3):39-50, Jun. 2008.
[12] D. Jimenez-Gonzalez, X. Martorell, and A. Ramirez. Performance analysis of Cell Broadband Engine for high memory bandwidth applications. In Proc. 7th IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS), San Jose, CA, Apr. 2007.
[13] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1):81-93, Jun. 1938.
[14] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2):39-55, Mar. 2008.
[15] S. A. McKee, W. A. Wulf, J. H. Aylor, R. H. Klenke, M. H. Salinas, S. I. Hong, and D. A. B. Weikle. Dynamic access ordering for streamed computations. IEEE Transactions on Computers, 49(11):1255-1271, Nov. 2000.
[17] O. Mutlu and T. Moscibroda. Enhancing the performance and fairness of shared DRAM systems with parallelism-aware batch scheduling. In Proc. 35th Ann. Int'l Symp. on Computer Architecture (ISCA), Beijing, China, Jun. 2008.
[18] C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multiprocessor server environment. In Proc. 3rd Workshop on Memory Performance Issues (WMPI), Munich, Germany, Jun. 2004.
[19] J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with CUDA. ACM Queue, 6(2):40-53, Mar. 2008.
[21] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, Mar. 2007.
[22] L. A. Polka, H. Kalyanam, G. Hu, and S. Krishnamoorthy. Package technology to address the memory bandwidth challenge for tera-scale computing. Intel Technology Journal, 11(3), 2007.
[23] N. Rafique, W. T. Lim, and M. Thottethodi. Effective management of DRAM bandwidth in multicore processors. In Proc. 16th Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT), Brasov, Romania, Sep. 2007.
[24] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens. Memory access scheduling. In Proc. 27th Ann. Int'l Symp. on Computer Architecture (ISCA), Vancouver, Canada, Jun. 2000.
[25] S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Salt Lake City, UT, Feb. 2008.
[26] S. Ryoo, C. I. Rodrigues, S. S. Stone, J. A. Stratton, S. Ueng, S. S. Baghsorkhi, and W. W. Hwu. Program optimization carving for GPU computing. Journal of Parallel and Distributed Computing, 68(10):1389-1401, 2008.
[27] T. Saidani, S. Piskorski, L. Lacassagne, and S. Bouaziz. Parallelization schemes for memory optimization on the Cell processor: A case study of image processing algorithm. In Proc. Workshop on memory performance (MEDEA), Brasov, Romania, Sep. 2007.
[28] O. Schenk, M. Christen, and H. Burkhart. Algorithmic performance studies on graphics processing units. Journal of Parallel and Distributed Computing, 68(10):1360-1369, 2008.
[29] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics, 27(3), Aug. 2008.
[31] S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proc. Int'l Conf. on High Performance Computing and Networking (SC), 2007.