[2] V. Mekkat, A. Holey, P.-C. Yew, and A. Zhai, "Managing shared last-level cache in a heterogeneous multicore processor," in International Conference on Parallel Architectures and Compilation Techniques, 2013.

[3] M. K. Jeong, M. Erez, C. Sudanthi, and N. Paver, "A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC," in Design Automation Conference (DAC), 2012, pp. 850-855.

[4] R. Ausavarungnirun, K. K.-W. Chang, L. Subramanian, G. H. Loh, and O. Mutlu, "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems," in 39th International Symposium on Computer Architecture (ISCA), 2012, pp. 416-427.

[5] O. Kayiran, N. C. Nachiappan, A. Jog, R. Ausavarungnirun, M. T. Kandemir, G. H. Loh, O. Mutlu, and C. R. Das, "Managing GPU concurrency in heterogeneous architectures," in 47th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2014, pp. 114-126.

[6] J. Lee, S. Li, H. Kim, and S. Yalamanchili, "Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures," ACM Trans. Des. Autom. Electron. Syst., vol. 18, no. 4, pp. 48:1-48:28, 2013.

[9] M. Daga, M. Nutter, and M. Meswani, "Efficient breadth-first search on a heterogeneous processor," in IEEE International Conference on Big Data, Oct. 2014, pp. 373-382.

[10] M. C. Delorme, T. S. Abdelrahman, and C. Zhao, "Parallel radix sort on the AMD Fusion accelerated processing unit," in 42nd International Conference on Parallel Processing, Oct. 2013, pp. 339-348.

[11] J. He, M. Lu, and B. He, "Revisiting co-processing for hash joins on the coupled CPU-GPU architecture," Proc. VLDB Endow., vol. 6, no. 10, pp. 889-900, Aug. 2013.

[12] J. Hestness, S. W. Keckler, and D. A. Wood, "GPU computing pipeline inefficiencies and optimization opportunities in heterogeneous CPU-GPU processors," in IEEE International Symposium on Workload Characterization (IISWC), 2015, pp. 87-97.

[13] W. Liu and B. Vinter, "Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors," Parallel Comput., vol. 49, no. C, pp. 179-193, Nov. 2015.

[18] NVIDIA. (2015) NVIDIA Tegra X1. [Online]. Available: http://www.nvidia.com/object/tegra-x1-processor.html

[19] Compute Cores. Whitepaper, AMD, 2014. [Online]. Available: https://www.amd.com/Documents/Compute-Cores-Whitepaper.pdf

[20] Intel Corporation. (2015) The compute architecture of Intel processor graphics Gen9. [Online]. Available: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf

[22] Exynos 5. Whitepaper, Samsung, 2012. [Online]. Available: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/data/Enjoy-the-Ultimate-WQXGA-Solution-with-Exynos-5-Dual-WP.pdf

[23] L. Lamport, "How to make a multiprocessor computer that correctly executes multiprocess programs," IEEE Trans. Comput., vol. 28, no. 9, pp. 690-691, Sep. 1979.

[24] S. Owens, S. Sarkar, and P. Sewell, "A better x86 memory model: x86-TSO," in 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs), 2009, pp. 391-407.

[25] S. V. Adve and K. Gharachorloo, "Shared memory consistency models: A tutorial," Computer, vol. 29, no. 12, pp. 66-76, Dec. 1996.

[26] J. Goodacre and A. N. Sloss, "Parallelism and the ARM instruction set architecture," Computer, vol. 38, no. 7, pp. 42-50, Jul. 2005.

[28] GCN Architecture. Whitepaper, AMD, 2012. [Online]. Available: https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

[29] J. Power, A. Basu, J. Gu, S. Puthoor, B. M. Beckmann, M. D. Hill, S. K. Reinhardt, and D. A. Wood, "Heterogeneous system coherence for integrated CPU-GPU systems," in 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2013, pp. 457-467.

[30] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, "Adaptive insertion policies for high performance caching," in 34th International Symposium on Computer Architecture (ISCA), 2007, pp. 381-391.

[31] J. Power, J. Hestness, M. S. Orr, M. D. Hill, and D. A. Wood, "gem5-gpu: A heterogeneous CPU-GPU simulator," IEEE Computer Architecture Letters, vol. 14, no. 1, pp. 34-36, Jan. 2015.

[32] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1-7, Aug. 2011.

[33] A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in International Symposium on Performance Analysis of Systems and Software, 2009.

[34] N. Agarwal, T. Krishna, L. S. Peh, and N. K. Jha, "GARNET: A detailed on-chip network model inside a full-system simulator," in International Symposium on Performance Analysis of Systems and Software, 2009.

[36] S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in International Symposium on Workload Characterization, Oct. 2009.

[37] P. Mistry, Y. Ukidave, D. Schaa, and D. Kaeli, "Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). ACM, 2013, pp. 54-65.

[39] University of Rome "La Sapienza", "9th DIMACS Implementation Challenge," 2014, http://www.dis.uniroma1.it/challenge9/index.shtml.

[40] J. Gómez Luna, L.-W. Chang, I.-J. Sung, W.-M. Hwu, and N. Guil, "In-place data sliding algorithms for many-core architectures," in 44th International Conference on Parallel Processing (ICPP), Sep. 2015.

[42] J. Gómez-Luna, J. M. González-Linares, J. I. Benavides, and N. Guil, "An optimized approach to histogram computation on GPU," Machine Vision and Applications, vol. 24, no. 5, pp. 899-908, 2013.

[43] J. Gómez-Luna, H. Endt, W. Stechele, J. M. González-Linares, J. I. Benavides, and N. Guil, "Egomotion compensation and moving objects detection algorithm on GPU," in Applications, Tools and Techniques on the Road to Exascale Computing, ser. Advances in Parallel Computing, vol. 22. IOS Press, 2011, pp. 183-190.

[44] Intel Corporation. (2013) Products (formerly Haswell). [Online]. Available: http://ark.intel.com/products/codename/42174/Haswell

[45] S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron, "A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads," in IEEE International Symposium on Workload Characterization (IISWC), Dec. 2010.