-
2
-
-
77951472684
-
Direct N-body kernels for multicore platforms
-
N. Arora, A. Shringarpure, and R. W. Vuduc. Direct N-body Kernels for Multicore Platforms. In ICPP, pages 379-387, 2009.
-
(2009)
ICPP
, pp. 379-387
-
-
Arora, N.1
Shringarpure, A.2
Vuduc, R.W.3
-
3
-
-
35648995516
-
The landscape of parallel computing research: A view from Berkeley
-
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-183, 2006.
-
(2006)
Technical Report UCB/EECS-183
-
-
Asanovic, K.1
Bodik, R.2
Catanzaro, B.C.3
Gebis, J.J.4
Husbands, P.5
Keutzer, K.6
Patterson, D.A.7
Plishker, W.L.8
Shalf, J.9
Williams, S.W.10
Yelick, K.A.11
-
4
-
-
63549095070
-
The PARSEC benchmark suite: Characterization and architectural implications
-
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT, pages 72-81, 2008.
-
(2008)
PACT
, pp. 72-81
-
-
Bienia, C.1
Kumar, S.2
Singh, J.P.3
Li, K.4
-
5
-
-
85015692260
-
The pricing of options and corporate liabilities
-
F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637-654, 1973.
-
(1973)
Journal of Political Economy
, vol.81
, Issue.3
, pp. 637-654
-
-
Black, F.1
Scholes, M.2
-
6
-
-
77954942935
-
Low depth cache-oblivious algorithms
-
G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low depth cache-oblivious algorithms. In SPAA, pages 189-199, 2010.
-
(2010)
SPAA
, pp. 189-199
-
-
Blelloch, G.E.1
Gibbons, P.B.2
Simhadri, H.V.3
-
7
-
-
79960806724
-
Can CPUs match GPUs on performance with productivity?: Experiences with optimizing a FLOP-intensive application on CPUs and GPU
-
August
-
R. Bordawekar, U. Bondhugula, and R. Rao. Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU. IBM Research Report, RC25033, August 2010.
-
(2010)
IBM Research Report, RC25033
-
-
Bordawekar, R.1
Bondhugula, U.2
Rao, R.3
-
8
-
-
0031489544
-
The market model of interest rate dynamics
-
A. Brace, D. Gatarek, and M. Musiela. The Market Model of Interest Rate Dynamics. Mathematical Finance, 7(2):127-155, 1997.
-
(1997)
Mathematical Finance
, vol.7
, Issue.2
, pp. 127-155
-
-
Brace, A.1
Gatarek, D.2
Musiela, M.3
-
10
-
-
85184648002
-
-
R. Chandra, R. Menon, L. Dagum, D. Kohr, D. Maydan, and J. McDonald. Parallel Programming in OpenMP, 2010.
-
(2010)
Parallel Programming in OpenMP
-
-
Chandra, R.1
Menon, R.2
Dagum, L.3
Kohr, D.4
Maydan, D.5
McDonald, J.6
-
11
-
-
49249135216
-
Convergence of recognition, mining, and synthesis workloads and its implications
-
Y. K. Chen, J. Chhugani, P. Dubey, C. J. Hughes, D. Kim, S. Kumar, et al. Convergence of recognition, mining, and synthesis workloads and its implications. Proceedings of the IEEE, 96(5):790-807, 2008.
-
(2008)
Proceedings of the IEEE
, vol.96
, Issue.5
, pp. 790-807
-
-
Chen, Y.K.1
Chhugani, J.2
Dubey, P.3
Hughes, C.J.4
Kim, D.5
Kumar, S.6
-
12
-
-
84865096511
-
Efficient implementation of sorting on multi-core SIMD CPU architecture
-
J. Chhugani, A. D. Nguyen, et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB, 1(2):1313-1324, 2008.
-
(2008)
PVLDB
, vol.1
, Issue.2
, pp. 1313-1324
-
-
Chhugani, J.1
Nguyen, A.D.2
-
16
-
-
36949031604
-
A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera
-
P. Dubey. A Platform 2015 Workload Model: Recognition, Mining and Synthesis Moves Computers to the Era of Tera. Intel, 2005.
-
(2005)
Intel
-
-
Dubey, P.1
-
17
-
-
8344245462
-
Vectorization for SIMD architectures with alignment constraints
-
A. E. Eichenberger, P. Wu, and K. O'Brien. Vectorization for SIMD architectures with alignment constraints. In PLDI, pages 82-93, 2004.
-
(2004)
PLDI
, pp. 82-93
-
-
Eichenberger, A.E.1
Wu, P.2
O'Brien, K.3
-
18
-
-
78650646788
-
Joint forces: From multithreaded programming to GPU computing
-
January
-
F. Feinbube, P. Troger, and A. Polze. Joint Forces: From Multithreaded Programming to GPU Computing. IEEE Softw., 28:51-57, January 2011.
-
(2011)
IEEE Softw.
, vol.28
, pp. 51-57
-
-
Feinbube, F.1
Troger, P.2
Polze, A.3
-
20
-
-
0042482650
-
'N-body' problems in statistical learning
-
A. G. Gray and A. W. Moore. 'N-Body' Problems in Statistical Learning. In NIPS, pages 521-527, 2000.
-
(2000)
NIPS
, pp. 521-527
-
-
Gray, A.G.1
Moore, A.W.2
-
21
-
-
56849108794
-
A portable runtime interface for multi-level memory hierarchies
-
M. Houston, J.-Y. Park, M. Ren, T. Knight, K. Fatahalian, A. Aiken, W. Dally, and P. Hanrahan. A portable runtime interface for multi-level memory hierarchies. In PPoPP, pages 143-152, 2008.
-
(2008)
PPoPP
, pp. 143-152
-
-
Houston, M.1
Park, J.-Y.2
Ren, M.3
Knight, T.4
Fatahalian, K.5
Aiken, A.6
Dally, W.7
Hanrahan, P.8
-
24
-
-
85184646781
-
-
Intel. Optimization Notice. http://software.intel.com/en-us/articles/optimization-notice/, 2012.
-
(2012)
Optimization Notice
-
-
-
25
-
-
78650874239
-
Performance evaluation of convolution on the cell broadband engine processor
-
L. Ismail and D. Guerchi. Performance Evaluation of Convolution on the Cell Broadband Engine Processor. IEEE PDS, 22(2):337-351, 2011.
-
(2011)
IEEE PDS
, vol.22
, Issue.2
, pp. 337-351
-
-
Ismail, L.1
Guerchi, D.2
-
27
-
-
77954696758
-
Cache topology aware computation mapping for multicores
-
M. Kandemir, T. Yemliha, S. Muralidhara, S. Srikantaiah, M. Irwin, et al. Cache topology aware computation mapping for multicores. In PLDI, 2010.
-
(2010)
PLDI
-
-
Kandemir, M.1
Yemliha, T.2
Muralidhara, S.3
Srikantaiah, S.4
Irwin, M.5
-
28
-
-
77954701719
-
FAST: Fast architecture sensitive tree search on modern CPUs and GPUs
-
C. Kim, J. Chhugani, N. Satish, et al. FAST: Fast Architecture Sensitive Tree search on modern CPUs and GPUs. In SIGMOD, pages 339-350, 2010.
-
(2010)
SIGMOD
, pp. 339-350
-
-
Kim, C.1
Chhugani, J.2
Satish, N.3
-
29
-
-
84864839397
-
Closing the ninja performance gap through traditional programming and compiler technology
-
C. Kim, N. Satish, J. Chhugani, et al. Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology. Technical report, Intel Labs, 2011.
-
(2011)
Technical Report Intel Labs
-
-
Kim, C.1
Satish, N.2
Chhugani, J.3
-
30
-
-
77954995885
-
Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU
-
V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. ISCA, pages 451-460, 2010.
-
(2010)
ISCA
, pp. 451-460
-
-
Lee, V.W.1
Kim, C.2
Chhugani, J.3
Deisher, M.4
Kim, D.5
Nguyen, A.D.6
Satish, N.7
Smelyanskiy, M.8
Chennupaty, S.9
Hammarlund, P.10
Singhal, R.11
Dubey, P.12
-
31
-
-
78650666949
-
A synergetic approach to throughput computing on x86-based multicore desktops
-
C.-K. Luk, R. Newton, et al. A synergetic approach to throughput computing on x86-based multicore desktops. IEEE Software, 28:39-50, 2011.
-
(2011)
IEEE Software
, vol.28
, pp. 39-50
-
-
Luk, C.-K.1
Newton, R.2
-
32
-
-
0035311079
-
Power: A first-class architectural design constraint
-
T. N. Mudge. Power: A first-class architectural design constraint. IEEE Computer, 34(4):52-58, 2001.
-
(2001)
IEEE Computer
, vol.34
, Issue.4
, pp. 52-58
-
-
Mudge, T.N.1
-
33
-
-
78650806116
-
3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
-
A. Nguyen, N. Satish, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs. In SC10, pages 1-13, 2010.
-
(2010)
SC10
, pp. 1-13
-
-
Nguyen, A.1
Satish, N.2
-
34
-
-
79953275887
-
Multi-platform auto-vectorization
-
D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In CGO, pages 281-294, 2006.
-
(2006)
CGO
, pp. 281-294
-
-
Nuzman, D.1
Henderson, R.2
-
35
-
-
63549093768
-
Outer-loop vectorization: Revisited for short SIMD architectures
-
D. Nuzman and A. Zaks. Outer-loop vectorization: revisited for short SIMD architectures. In PACT, pages 2-11, 2008.
-
(2008)
PACT
, pp. 2-11
-
-
Nuzman, D.1
Zaks, A.2
-
38
-
-
85184635665
-
Black-Scholes option pricing
-
V. Podlozhnyuk. Black-Scholes option pricing. Nvidia, 2007.
-
(2007)
Nvidia
-
-
Podlozhnyuk, V.1
-
39
-
-
79959466764
-
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
-
S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, pages 73-82, 2008.
-
(2008)
PPoPP
, pp. 73-82
-
-
Ryoo, S.1
Rodrigues, C.I.2
Baghsorkhi, S.S.3
Stone, S.S.4
Kirk, D.B.5
Hwu, W.-M.W.6
-
40
-
-
77954743119
-
Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort
-
N. Satish, C. Kim, J. Chhugani, et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351-362, 2010.
-
(2010)
SIGMOD
, pp. 351-362
-
-
Satish, N.1
Kim, C.2
Chhugani, J.3
-
41
-
-
49249086142
-
Larrabee: A many-core x86 architecture for visual computing
-
L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: A Many-Core x86 Architecture for Visual Computing. SIGGRAPH, 27(3), 2008.
-
(2008)
SIGGRAPH
, vol.27
, Issue.3
-
-
Seiler, L.1
Carmean, D.2
Sprangle, E.3
Forsyth, T.4
Abrash, M.5
Dubey, P.6
Junkins, S.7
Lake, A.8
Sugerman, J.9
Cavin, R.10
Espasa, R.11
Grochowski, E.12
Juan, T.13
Hanrahan, P.14
-
43
-
-
70350681243
-
Mapping high-fidelity volume rendering for medical imaging to CPU, GPU and many-core architectures
-
M. Smelyanskiy, D. Holmes, et al. Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures. IEEE Trans. Vis. Comput. Graph., 15(6):1563-1570, 2009.
-
(2009)
IEEE Trans. Vis. Comput. Graph.
, vol.15
, Issue.6
, pp. 1563-1570
-
-
Smelyanskiy, M.1
Holmes, D.2
-
45
-
-
67650998701
-
Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms
-
S. Williams, J. Carter, L. Oliker, J. Shalf, and K. A. Yelick. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms. J. Parallel Distrib. Comput., 69(9):762-777, 2009.
-
(2009)
J. Parallel Distrib. Comput.
, vol.69
, Issue.9
, pp. 762-777
-
-
Williams, S.1
Carter, J.2
Oliker, L.3
Shalf, J.4
Yelick, K.A.5
-
46
-
-
77952554764
-
An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth
-
D. H. Woo, N. H. Seong, D. L. Lewis, and H.-H. S. Lee. An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth. In HPCA, pages 1-12, 2010.
-
(2010)
HPCA
, pp. 1-12
-
-
Woo, D.H.1
Seong, N.H.2
Lewis, D.L.3
Lee, H.-H.S.4
-
47
-
-
77954691442
-
A GPGPU compiler for memory optimization and parallelism management
-
Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, pages 86-97, 2010.
-
(2010)
PLDI
, pp. 86-97
-
-
Yang, Y.1
Xiang, P.2
Kong, J.3
Zhou, H.4
-
48
-
-
77954699806
-
Bamboo: A data-centric, object-oriented approach to many-core software
-
J. Zhou and B. Demsky. Bamboo: a data-centric, object-oriented approach to many-core software. In PLDI, pages 388-399, 2010.
-
(2010)
PLDI
, pp. 388-399
-
-
Zhou, J.1
Demsky, B.2