-
1
-
-
85032750874
-
A survey of multicore architectures
-
G. Blake, R. G. Dreslinski, and T. Mudge, "A survey of multicore architectures," IEEE Signal Processing Mag., vol. 26, no. 6, pp. 26-37, 2009.
-
(2009)
IEEE Signal Processing Mag
, vol.26
, Issue.6
, pp. 26-37
-
-
Blake, G.1
Dreslinski, R.G.2
Mudge, T.3
-
2
-
-
0003474751
-
-
2nd ed. Cambridge, U.K.: Cambridge Univ. Press
-
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
-
(1992)
Numerical Recipes in C: The Art of Scientific Computing
-
-
Press, W.H.1
Flannery, B.P.2
Teukolsky, S.A.3
Vetterling, W.T.4
-
3
-
-
19344368072
-
-
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proc. IEEE (Special Issue on Program Generation, Optimization, and Adaptation), vol. 93, no. 2, pp. 232-275, 2005.
-
-
-
-
4
-
-
67650568215
-
Computer generation of general size linear transform libraries
-
Y. Voronenko, F. de Mesmay, and M. Püschel, "Computer generation of general size linear transform libraries," in Proc. Code Generation and Optimization (CGO), 2009, pp. 102-113.
-
(2009)
Proc. Code Generation and Optimization (CGO)
, pp. 102-113
-
-
Voronenko, Y.1
de Mesmay, F.2
Püschel, M.3
-
6
-
-
20744449792
-
-
M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proc. IEEE (Special Issue on Program Generation, Optimization, and Adaptation), vol. 93, no. 2, pp. 216-231, 2005.
-
-
-
-
8
-
-
85032767349
-
-
M. Frigo and S. G. Johnson, FFTW 3.2 [Online]. Available: www.fftw.org
-
-
-
-
10
-
-
60849099135
-
High performance discrete Fourier transforms on graphics processors
-
N. K. Govindaraju, B. Lloyd, Y. Dotsenko, B. Smith, and J. Manferdelli, "High performance discrete Fourier transforms on graphics processors," in Proc. Supercomputing (SC), 2008, pp. 1-12.
-
(2008)
Proc. Supercomputing (SC)
, pp. 1-12
-
-
Govindaraju, N.K.1
Lloyd, B.2
Dotsenko, Y.3
Smith, B.4
Manferdelli, J.5
-
14
-
-
0025600627
-
A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures
-
J. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," IEEE Trans. Circuits Syst., vol. 9, no. 4, pp. 449-500, 1990.
-
(1990)
IEEE Trans. Circuits Syst
, vol.9
, Issue.4
, pp. 449-500
-
-
Johnson, J.1
Johnson, R.W.2
Rodriguez, D.3
Tolimieri, R.4
-
15
-
-
0023347849
-
Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures
-
A. Norton and A. J. Silberger, "Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures," IEEE Trans. Comput., vol. 36, no. 5, pp. 581-591, 1987.
-
(1987)
IEEE Trans. Comput
, vol.36
, Issue.5
, pp. 581-591
-
-
Norton, A.1
Silberger, A.J.2
-
16
-
-
0040546915
-
Block algorithms for FFTs on vector and parallel computer
-
Amsterdam, The Netherlands: Elsevier
-
M. Hegland, "Block algorithms for FFTs on vector and parallel computer," in Parallel Computing: Trends and Applications. Amsterdam, The Netherlands: Elsevier, 1994, pp. 129-136.
-
(1994)
Parallel Computing: Trends and Applications
, pp. 129-136
-
-
Hegland, M.1
-
17
-
-
0025403252
-
FFTs in external or hierarchical memory
-
Mar
-
D. H. Bailey, "FFTs in external or hierarchical memory," J. Supercomput., vol. 4, no. 1, pp. 23-35, Mar. 1990.
-
(1990)
J. Supercomput
, vol.4
, Issue.1
, pp. 23-35
-
-
Bailey, D.H.1
-
18
-
-
0017725678
-
Vector radix fast Fourier transform
-
D. B. Harris, J. H. McClellan, D. S. K. Chan, and H. W. Schuessler, "Vector radix fast Fourier transform," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 1977, pp. 548-551.
-
(1977)
Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP)
, pp. 548-551
-
-
Harris, D.B.1
McClellan, J.H.2
Chan, D.S.K.3
Schuessler, H.W.4
-
19
-
-
84968470212
-
An algorithm for the machine calculation of complex Fourier series
-
Apr
-
J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
-
(1965)
Math. Comput
, vol.19
, pp. 297-301
-
-
Cooley, J.W.1
Tukey, J.W.2
-
20
-
-
0001316941
-
An adaptation of the fast Fourier transform for parallel processing
-
Apr
-
M. C. Pease, "An adaptation of the fast Fourier transform for parallel processing," J. ACM, vol. 15, no. 2, pp. 252-264, Apr. 1968.
-
(1968)
J. ACM
, vol.15
, Issue.2
, pp. 252-264
-
-
Pease, M.C.1
-
21
-
-
0023380592
-
Multiprocessor FFTs
-
July
-
P. N. Swarztrauber, "Multiprocessor FFTs," Parallel Comput., vol. 5, pp. 197-210, July 1987.
-
(1987)
Parallel Comput
, vol.5
, pp. 197-210
-
-
Swarztrauber, P.N.1
-
23
-
-
18844422753
-
SPL: A language and compiler for DSP algorithms
-
J. Xiong, J. Johnson, R. Johnson, and D. Padua, "SPL: A language and compiler for DSP algorithms," in Proc. Programming Language Design and Implementation (PLDI), 2001, pp. 298-308.
-
(2001)
Proc. Programming Language Design and Implementation (PLDI)
, pp. 298-308
-
-
Xiong, J.1
Johnson, J.2
Johnson, R.3
Padua, D.4
-
24
-
-
0029771732
-
Automatic generation of prime length FFT programs
-
I. W. Selesnick and C. S. Burrus, "Automatic generation of prime length FFT programs," IEEE Trans. Signal Processing, vol. 44, no. 1, pp. 14-24, 1996.
-
(1996)
IEEE Trans. Signal Processing
, vol.44
, Issue.1
, pp. 14-24
-
-
Selesnick, I.W.1
Burrus, C.S.2
-
25
-
-
33745236838
-
Loop merging for signal transforms
-
F. Franchetti, Y. Voronenko, and M. Püschel, "Loop merging for signal transforms," in Proc. Programming Language Design and Implementation (PLDI), 2005, pp. 315-326.
-
(2005)
Proc. Programming Language Design and Implementation (PLDI)
, pp. 315-326
-
-
Franchetti, F.1
Voronenko, Y.2
Püschel, M.3
-
26
-
-
84949653778
-
Automatic performance tuning in the UHFFT library
-
Proc. Int. Conf. Computational Science (ICCS). New York: Springer-Verlag
-
D. Mirković and S. L. Johnsson, "Automatic performance tuning in the UHFFT library," in Proc. Int. Conf. Computational Science (ICCS) (Lecture Notes in Computer Science, vol. 2073). New York: Springer-Verlag, 2001, pp. 71-80.
-
(2001)
Lecture Notes in Computer Science
, vol.2073
, pp. 71-80
-
-
Mirković, D.1
Johnsson, S.L.2
-
28
-
-
51049115051
-
Library generation for linear transforms
-
Ph.D. dissertation, Elect. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA
-
Y. Voronenko, "Library generation for linear transforms," Ph.D. dissertation, Elect. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, 2008.
-
(2008)
-
-
Voronenko, Y.1
-
29
-
-
57049117343
-
How to write fast numerical code: A small introduction
-
Proc. Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE). Berlin: Springer-Verlag
-
S. Chellappa, F. Franchetti, and M. Püschel, "How to write fast numerical code: A small introduction," in Proc. Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE) (Lecture Notes in Computer Science, vol. 5235). Berlin: Springer-Verlag, 2008, pp. 196-259.
-
(2008)
Lecture Notes in Computer Science
, vol.5235
, pp. 196-259
-
-
Chellappa, S.1
Franchetti, F.2
Püschel, M.3
-
30
-
-
0025600627
-
-
J. R. Johnson, R. W. Johnson, D. Rodriguez, and R. Tolimieri, "A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures," IEEE Trans. Circuits, Syst., Signal Processing, vol. 9, no. 4, pp. 449-500, 1990.
-
-
-
-
31
-
-
34548012397
-
Scheduling FFT computation on SMP and multicore systems
-
A. Ali, L. Johnsson, and J. Subhlok, "Scheduling FFT computation on SMP and multicore systems," in Proc. Int. Conf. Supercomputing (ICS), 2007, pp. 293-301.
-
(2007)
Proc. Int. Conf. Supercomputing (ICS)
, pp. 293-301
-
-
Ali, A.1
Johnsson, L.2
Subhlok, J.3
-
32
-
-
85032767023
-
-
OpenMP. (1998). OpenMP C and C++ application program interface, version 1.0 [Online]. Available: www.openmp.org
-
-
-
-
33
-
-
33748881911
-
-
Sebastopol, CA: O'Reilly
-
B. Gallmeister, POSIX.4. Sebastopol, CA: O'Reilly, 1994.
-
(1994)
POSIX.4
-
-
Gallmeister, B.1
-
34
-
-
47249121824
-
Generating SIMD vectorized permutations
-
Proc. Int. Conf. Compiler Construction (CC). Berlin: Springer-Verlag
-
F. Franchetti and M. Püschel, "Generating SIMD vectorized permutations," in Proc. Int. Conf. Compiler Construction (CC) (Lecture Notes in Computer Science, vol. 4959). Berlin: Springer-Verlag, 2008, pp. 116-131.
-
(2008)
Lecture Notes in Computer Science
, vol.4959
, pp. 116-131
-
-
Franchetti, F.1
Püschel, M.2
-
37
-
-
38049144052
-
A rewriting system for the vectorization of signal transforms
-
F. Franchetti, Y. Voronenko, and M. Püschel, "A rewriting system for the vectorization of signal transforms," in Proc. High Performance Computing for Computational Science (VECPAR), 2006, pp. 363-377.
-
(2006)
Proc. High Performance Computing for Computational Science (VECPAR)
, pp. 363-377
-
-
Franchetti, F.1
Voronenko, Y.2
Püschel, M.3
-
38
-
-
51549098228
-
Formal datapath representation and manipulation for implementing DSP transforms
-
P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, "Formal datapath representation and manipulation for implementing DSP transforms," in Proc. Design Automation Conf. (DAC), 2008, pp. 385-390.
-
(2008)
Proc. Design Automation Conf. (DAC)
, pp. 385-390
-
-
Milder, P.A.1
Franchetti, F.2
Hoe, J.C.3
Püschel, M.4
-
39
-
-
84877021547
-
Multi-processor performance on the Tera MTA
-
A. Snavely, L. Carter, J. Boisseau, A. Majumdar, K. S. Gatlin, N. Mitchell, J. Feo, and B. Koblenz, "Multi-processor performance on the Tera MTA," in Proc. Supercomputing (SC), 1998, pp. 1-8.
-
(1998)
Proc. Supercomputing (SC)
, pp. 1-8
-
-
Snavely, A.1
Carter, L.2
Boisseau, J.3
Majumdar, A.4
Gatlin, K.S.5
Mitchell, N.6
Feo, J.7
Koblenz, B.8
-
41
-
-
35948931417
-
Cache-efficient numerical algorithms using graphics hardware
-
N. K. Govindaraju and D. Manocha, "Cache-efficient numerical algorithms using graphics hardware," Parallel Comput., vol. 33, no. 10-11, pp. 663-684, 2007.
-
(2007)
Parallel Comput
, vol.33
, Issue.10-11
, pp. 663-684
-
-
Govindaraju, N.K.1
Manocha, D.2
-
42
-
-
70350725973
-
Bandwidth intensive 3-d FFT kernel for GPUs using CUDA
-
A. Nukada, Y. Ogata, T. Endo, and S. Matsuoka, "Bandwidth intensive 3-d FFT kernel for GPUs using CUDA," in Proc. Supercomputing (SC), 2008, pp. 1-11.
-
(2008)
Proc. Supercomputing (SC)
, pp. 1-11
-
-
Nukada, A.1
Ogata, Y.2
Endo, T.3
Matsuoka, S.4
-
43
-
-
84870629709
-
-
Nvidia Corp., Nvidia CUDA [Online]. Available: www.nvidia.com/cuda
-
Nvidia CUDA
-
-
-
44
-
-
85032772531
-
-
Khronos Group, OpenCL [Online]. Available: www.khronos.org/opencl/
-
OpenCL
-
-
-
45
-
-
70350705613
-
Computer generation of fast Fourier transforms for the cell broadband engine
-
S. Chellappa, F. Franchetti, and M. Püschel, "Computer generation of fast Fourier transforms for the cell broadband engine," in Proc. Int. Conf. Supercomputing (ICS), 2009, pp. 26-35.
-
(2009)
Proc. Int. Conf. Supercomputing (ICS)
, pp. 26-35
-
-
Chellappa, S.1
Franchetti, F.2
Püschel, M.3
-
46
-
-
85032777085
-
-
A. C. Chow, G. C. Fossum, and D. A. Brokenshire, "A programming example: Large FFT on the cell broadband engine," IBM, Tech. Rep., May 2005 [Online]. Available: https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/0AA2394A505EF0FB872570AB005BF0F1/$file/GSPx_FFT_paper_legal_0115.pdf
-
-
-
-
47
-
-
49949095381
-
A parallel 64K complex FFT algorithm for the IBM/Sony/Toshiba cell broadband engine processor
-
J. Greene and R. Cooper, "A parallel 64K complex FFT algorithm for the IBM/Sony/Toshiba cell broadband engine processor," in Proc. Global Signal Processing Expo (GSPx), 2005.
-
(2005)
Proc. Global Signal Processing Expo (GSPx)
-
-
Greene, J.1
Cooper, R.2
-
48
-
-
38349071299
-
Performance and programmability of the IBM/Sony/Toshiba cell broadband engine processor
-
L. Cico, R. Cooper, and J. Greene, "Performance and programmability of the IBM/Sony/Toshiba cell broadband engine processor," in Proc. (EDGE) Workshop, 2006.
-
(2006)
Proc. (EDGE) Workshop
-
-
Cico, L.1
Cooper, R.2
Greene, J.3
-
50
-
-
85032774781
-
-
4DSP Inc., 4DSP [Online]. Available: www.4dsp.com/fft.htm
-
-
-
-
51
-
-
85032753583
-
-
Dillon Engineering, Dillon FFT [Online]. Available: www.dilloneng.com/fft_ip
-
-
-