-
1
-
-
0042674307
-
The LINPACK benchmark: Past, present and future
-
J. Dongarra, P. Luszczek, A. Petitet, The LINPACK Benchmark: past, present and future, Concurrency and Computation: Practice and Experience 15 (9) (2003) 803-820.
-
(2003)
Concurrency and Computation: Practice and Experience
, vol.15
, Issue.9
, pp. 803-820
-
-
Dongarra, J.1
Luszczek, P.2
Petitet, A.3
-
2
-
-
35348885705
-
LAPACK users' guide
-
E. Anderson, Z. Bai, C. Bischof, LAPACK Users' guide, Vol. 9, Society for Industrial Mathematics, 1999.
-
(1999)
Society for Industrial Mathematics
, vol.9
-
-
Anderson, E.1
Bai, Z.2
Bischof, C.3
-
3
-
-
0026294379
-
The NAS parallel benchmarks summary and preliminary results
-
Proceedings of the 1991 ACM/IEEE Conference on, IEEE
-
D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, P. Frederickson, T. Lasinski, R. Schreiber, et al., The NAS parallel benchmarks summary and preliminary results, in: Supercomputing, 1991. Supercomputing'91. Proceedings of the 1991 ACM/IEEE Conference on, IEEE, 1991, pp. 158-165.
-
(1991)
Supercomputing, 1991. Supercomputing'91
, pp. 158-165
-
-
Bailey, D.1
Barszcz, E.2
Barton, J.3
Browning, D.4
Carter, R.5
Dagum, L.6
Fatoohi, R.7
Frederickson, P.8
Lasinski, T.9
Schreiber, R.10
-
4
-
-
83755206878
-
CaKernel - A parallel application programming framework for heterogenous computing architectures
-
M. Blazewicz, S. Brandt, M. Kierzynka, K. Kurowski, B. Ludwiczak, J. Tao, J. Weglarz, CaKernel - a parallel application programming framework for heterogenous computing architectures, Scientific Programming 19 (4) (2011) 185-197.
-
(2011)
Scientific Programming
, vol.19
, Issue.4
, pp. 185-197
-
-
Blazewicz, M.1
Brandt, S.2
Kierzynka, M.3
Kurowski, K.4
Ludwiczak, B.5
Tao, J.6
Weglarz, J.7
-
5
-
-
85030961442
-
-
LNCS, To appear
-
M. Ciznicki, M. Kierzynka, K. Kurowski, B. Ludwiczak, K. Napierala, J. Palczynski, Efficient isosurface extraction using marching tetrahedra and histogram pyramids on multiple GPUs, LNCS, To appear.
-
Efficient Isosurface Extraction Using Marching Tetrahedra and Histogram Pyramids on Multiple GPUs
-
-
Ciznicki, M.1
Kierzynka, M.2
Kurowski, K.3
Ludwiczak, B.4
Napierala, K.5
Palczynski, J.6
-
6
-
-
79956218804
-
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs
-
(2011)
-
J. Blazewicz, W. Frohmberg, M. Kierzynka, E. Pesch, P. Wojciechowski, Protein alignment algorithms with an efficient backtracking routine on multiple GPUs, BMC Bioinformatics 12 (2011) 181.
-
BMC Bioinformatics
, vol.12
, pp. 181
-
-
Blazewicz, J.1
Frohmberg, W.2
Kierzynka, M.3
Pesch, E.4
Wojciechowski, P.5
-
7
-
-
85030960469
-
G-MSA - GPU-based, fast and accurate algorithm for multiple sequence alignment
-
To appear
-
J. Blazewicz, W. Frohmberg, M. Kierzynka, P. Wojciechowski, G-MSA - GPU-based, fast and accurate algorithm for multiple sequence alignment, Journal of Parallel and Distributed Computing, To appear.
-
Journal of Parallel and Distributed Computing
-
-
Blazewicz, J.1
Frohmberg, W.2
Kierzynka, M.3
Wojciechowski, P.4
-
8
-
-
79958266939
-
Parallel application benchmarks and performance evaluation of the Intel Xeon 7500 family processors
-
P. Kopta, M. Kulczewski, K. Kurowski, T. Piontek, P. Gepner, M. Puchalski, J. Komasa, Parallel application benchmarks and performance evaluation of the Intel Xeon 7500 family processors, Procedia Computer Science 4 (2011) 372-381.
-
(2011)
Procedia Computer Science
, vol.4
, pp. 372-381
-
-
Kopta, P.1
Kulczewski, M.2
Kurowski, K.3
Piontek, T.4
Gepner, P.5
Puchalski, M.6
Komasa, J.7
-
9
-
-
79958284905
-
-
Innovative Computing Laboratory, University of Tennessee, Tech. Rep.
-
R. Nath, S. Tomov, J. Dongarra, An improved MAGMA GEMM for Fermi GPUs, Innovative Computing Laboratory, University of Tennessee, Tech. Rep.
-
An Improved MAGMA GEMM for Fermi GPUs
-
-
Nath, R.1
Tomov, S.2
Dongarra, J.3
-
10
-
-
77954701719
-
FAST: Fast architecture sensitive tree search on modern CPUs and GPUs
-
ACM
-
C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. Nguyen, T. Kaldewey, V. Lee, S. Brandt, P. Dubey, FAST: fast architecture sensitive tree search on modern CPUs and GPUs, in: Proceedings of the 2010 international conference on Management of data, ACM, 2010, pp. 339-350.
-
(2010)
Proceedings of the 2010 International Conference on Management of Data
, pp. 339-350
-
-
Kim, C.1
Chhugani, J.2
Satish, N.3
Sedlar, E.4
Nguyen, A.5
Kaldewey, T.6
Lee, V.7
Brandt, S.8
Dubey, P.9
-
11
-
-
77954743119
-
Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort
-
ACM
-
N. Satish, C. Kim, J. Chhugani, A. Nguyen, V. Lee, D. Kim, P. Dubey, Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort, in: Proceedings of the 2010 international conference on Management of data, ACM, 2010, pp. 351-362.
-
(2010)
Proceedings of the 2010 International Conference on Management of Data
, pp. 351-362
-
-
Satish, N.1
Kim, C.2
Chhugani, J.3
Nguyen, A.4
Lee, V.5
Kim, D.6
Dubey, P.7
-
12
-
-
74049114159
-
Auto-tuning 3-D FFT library for CUDA GPUs
-
ACM
-
A. Nukada, S. Matsuoka, Auto-tuning 3-D FFT library for CUDA GPUs, in: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ACM, 2009, p. 30.
-
(2009)
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
, pp. 30
-
-
Nukada, A.1
Matsuoka, S.2
-
14
-
-
78649781713
-
Accelerating wavelet lifting on graphics hardware using CUDA
-
W. van der Laan, A. Jalba, J. Roerdink, Accelerating wavelet lifting on graphics hardware using CUDA, IEEE Trans. Parallel Distrib. Syst 22 (1) (2011) 132-146.
-
(2011)
IEEE Trans. Parallel Distrib. Syst
, vol.22
, Issue.1
, pp. 132-146
-
-
Van der Laan, W.1
Jalba, A.2
Roerdink, J.3
-
15
-
-
60649110203
-
Computing discrete transforms on the Cell Broadband Engine
-
D. Bader, V. Agarwal, S. Kang, Computing discrete transforms on the Cell Broadband Engine, Parallel Computing 35 (3) (2009) 119-137.
-
(2009)
Parallel Computing
, vol.35
, Issue.3
, pp. 119-137
-
-
Bader, D.1
Agarwal, V.2
Kang, S.3
-
16
-
-
70349160422
-
A parallel implementation of the 2D wavelet transform using CUDA
-
2009 17th Euromicro International Conference on, IEEE
-
J. Franco, G. Bernabé, J. Fernández, M. Acacio, A parallel implementation of the 2D wavelet transform using CUDA, in: Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on, IEEE, 2009, pp. 111-118.
-
(2009)
Parallel, Distributed and Network-based Processing
, pp. 111-118
-
-
Franco, J.1
Bernabé, G.2
Fernández, J.3
Acacio, M.4
-
17
-
-
84895561443
-
Considerations when evaluating microprocessor platforms
-
M. Anderson, B. Catanzaro, J. Chong, E. Gonina, K. Keutzer, C. Lai, M. Murphy, D. Sheffield, B. Su, N. Sundaram, Considerations when evaluating microprocessor platforms, in: Proceedings of the 3rd USENIX conference on Hot topics in parallelism, USENIX Association, 2011, pp. 1-1.
-
(2011)
Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism, USENIX Association
, pp. 1-1
-
-
Anderson, M.1
Catanzaro, B.2
Chong, J.3
Gonina, E.4
Keutzer, K.5
Lai, C.6
Murphy, M.7
Sheffield, D.8
Su, B.9
Sundaram, N.10
-
18
-
-
77954995885
-
Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
ACM
-
V. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, et al., Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, in: ACM SIGARCH Computer Architecture News, Vol. 38, ACM, 2010, pp. 451-460.
-
(2010)
ACM SIGARCH Computer Architecture News
, vol.38
, pp. 451-460
-
-
Lee, V.1
Kim, C.2
Chhugani, J.3
Deisher, M.4
Kim, D.5
Nguyen, A.6
Satish, N.7
Smelyanskiy, M.8
Chennupaty, S.9
Hammarlund, P.10
-
19
-
-
85092761228
-
On the limits of GPU acceleration
-
R. Vuduc, A. Chandramowlishwaran, J. Choi, M. Guney, A. Shringarpure, On the limits of GPU acceleration, in: Proceedings of the 2nd USENIX conference on Hot topics in parallelism, USENIX Association, 2010, pp. 13-13.
-
(2010)
Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism, USENIX Association
, pp. 13-13
-
-
Vuduc, R.1
Chandramowlishwaran, A.2
Choi, J.3
Guney, M.4
Shringarpure, A.5
-
20
-
-
80052336166
-
Lessons learned from exploring the backtracking paradigm on the GPU
-
J. Jenkins, I. Arkatkar, J. Owens, A. Choudhary, N. Samatova, Lessons learned from exploring the backtracking paradigm on the GPU, Euro-Par 2011 Parallel Processing (2011) 425-437.
-
(2011)
Euro-Par 2011 Parallel Processing
, pp. 425-437
-
-
Jenkins, J.1
Arkatkar, I.2
Owens, J.3
Choudhary, A.4
Samatova, N.5
-
22
-
-
78649975505
-
JPEG 2000 compression of medical imagery
-
D. Foos, E. Muka, R. Slone, B. Erickson, M. Flynn, D. Clunie, K. Lloyd Hildebrand, S. Young, JPEG 2000 compression of medical imagery, in: Proc. of SPIE, Vol. 3980, 2002.
-
(2002)
Proc. of SPIE
, vol.3980
-
-
Foos, D.1
Muka, E.2
Slone, R.3
Erickson, B.4
Flynn, M.5
Clunie, D.6
Lloyd Hildebrand, K.7
Young, S.8
-
23
-
-
81155126075
-
GPU implementation of JPEG2000 for hyperspectral image compression
-
M. Ciznicki, K. Kurowski, A. Plaza, GPU implementation of JPEG2000 for hyperspectral image compression, in: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 8183, 2011, p. 12.
-
(2011)
Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series
, vol.8183
, pp. 12
-
-
Ciznicki, M.1
Kurowski, K.2
Plaza, A.3
-
24
-
-
0034229499
-
High performance scalable image compression with EBCOT
-
IEEE transactions on
-
D. Taubman, High performance scalable image compression with EBCOT, Image Processing, IEEE transactions on 9 (7) (2000) 1158-1170.
-
(2000)
Image Processing
, vol.9
, Issue.7
, pp. 1158-1170
-
-
Taubman, D.1
-
25
-
-
34247544326
-
Transform coding techniques for lossy hyperspectral data compression
-
IEEE Transactions on
-
B. Penna, T. Tillo, E. Magli, G. Olmo, Transform coding techniques for lossy hyperspectral data compression, Geoscience and Remote Sensing, IEEE Transactions on 45 (5) (2007) 1408-1421.
-
(2007)
Geoscience and Remote Sensing
, vol.45
, Issue.5
, pp. 1408-1421
-
-
Penna, B.1
Tillo, T.2
Magli, E.3
Olmo, G.4
-
27
-
-
57649208491
-
A performance evaluation of the Nehalem quad-core processor for scientific computing
-
K. Barker, K. Davis, A. Hoisie, D. Kerbyson, M. Lang, S. Pakin, J. Sancho, A performance evaluation of the Nehalem quad-core processor for scientific computing, Parallel Processing Letters 18 (4).
-
Parallel Processing Letters
, vol.18
, Issue.4
-
-
Barker, K.1
Davis, K.2
Hoisie, A.3
Kerbyson, D.4
Lang, M.5
Pakin, S.6
Sancho, J.7
-
28
-
-
77956439996
-
Early performance evaluation of new six-core Intel® Xeon® 5600 family processors for HPC
-
P. Gepner, M. Kowalik, D. Fraser, K. Wackowski, Early performance evaluation of new Six-Core Intel® Xeon® 5600 family processors for HPC, in: Parallel and Distributed Computing (ISPDC), 2010 Ninth International Symposium on, IEEE, 2010, pp. 117-124.
-
(2010)
Parallel and Distributed Computing (ISPDC), 2010 Ninth International Symposium On, IEEE
, pp. 117-124
-
-
Gepner, P.1
Kowalik, M.2
Fraser, D.3
Wackowski, K.4
-
31
-
-
77952295055
-
Parallel GPU implementation of iterative PCA algorithms
-
M. Andrecut, Parallel GPU implementation of iterative PCA algorithms, Journal of Computational Biology 16 (11) (2009) 1593-1599.
-
(2009)
Journal of Computational Biology
, vol.16
, Issue.11
, pp. 1593-1599
-
-
Andrecut, M.1
-
32
-
-
85030958315
-
-
GPU JPEG2K, https://apps.man.poznan.pl/trac/jpeg2k/.
-
GPU JPEG2K
-
-
-
34
-
-
84875579589
-
-
Test images, http://www.imagecompression.info/testimages/.
-
Test Images
-
-
-
35
-
-
33746684281
-
High efficiency EBCOT with parallel coding architecture for JPEG2000
-
(2006)
-
J. Chiang, C. Chang, C. Hsieh, C. Hsia, High efficiency EBCOT with parallel coding architecture for JPEG2000, EURASIP journal on applied signal processing 2006 (2006) 17-17.
-
(2006)
EURASIP Journal on Applied Signal Processing
, pp. 17-17
-
-
Chiang, J.1
Chang, C.2
Hsieh, C.3
Hsia, C.4
-
36
-
-
77950498517
-
Parallelizing motion JPEG 2000 with CUDA
-
ICCEE'09. Second International Conference on IEEE
-
S. Datla, N. Gidijala, Parallelizing motion JPEG 2000 with CUDA, in: Computer and Electrical Engineering, 2009. ICCEE'09. Second International Conference on, Vol. 1, IEEE, 2009, pp. 630-634.
-
(2009)
Computer and Electrical Engineering, 2009
, vol.1
, pp. 630-634
-
-
Datla, S.1
Gidijala, N.2
-
37
-
-
85127697308
-
GPU-based sample-parallel context modeling for EBCOT in JPEG2000
-
Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
-
J. Matela, V. Rusnak, P. Holub, GPU-Based Sample-Parallel Context Modeling for EBCOT in JPEG2000, in: Sixth Doctoral Workshop on Mathematical and Engineering Methods in Computer Science (MEMICS'10)-Selected Papers, Vol. 16, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp. 77-84.
-
Sixth Doctoral Workshop on Mathematical and Engineering Methods in Computer Science (MEMICS'10)-Selected Papers
, vol.16
, pp. 77-84
-
-
Matela, J.1
Rusnak, V.2
Holub, P.3