SCOPUS Information Search Platform

Proceedings of the 2016 IEEE International Symposium on Workload Characterization, IISWC 2016

2016, Pages 168-177

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications

Authors (6): García, Víctor a,b; Gómez Luna, Juan c; Grass, Thomas a,b; Rico, Alejandro d; Ayguade, Eduard a,b; Peña, Antonio J b

a UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

b BARCELONA SUPERCOMPUTING CENTER (Spain)

c UNIVERSITY OF CÓRDOBA (Spain)

d ARM Inc (Spain)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER GRAPHICS; ENERGY EFFICIENCY; MEMORY ARCHITECTURE; UBIQUITOUS COMPUTING; VIRTUAL ADDRESSES;

GRAPHICS PROCESSING UNITS; HETEROGENEOUS COMPUTATION; HETEROGENEOUS PROGRAMMING; HETEROGENEOUS SYSTEMS; HIGH PERFORMANCE COMPUTING (HPC); INTEGRATED ARCHITECTURE; LAST-LEVEL CACHES (LLC); VIRTUAL ADDRESS SPACE;

PROGRAM PROCESSORS;

EID: 84994741673
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/IISWC.2016.7581277
Document Type: Conference Paper

Times cited: 22

References (45)

1
- 84860351946
- TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture
- J. Lee and H. Kim, "TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture," in IEEE 18th International Symposium on High-Performance Computer Architecture (HPCA), 2012.
- (2012) IEEE 18th International Symposium on High-Performance Computer Architecture (HPCA)
- Lee, J.¹ Kim, H.²

2
- 84887456430
- Managing shared last-level cache in a heterogeneous multicore processor
- V. Mekkat, A. Holey, P.-C. Yew, and A. Zhai, "Managing shared last-level cache in a heterogeneous multicore processor," in International Conference on Parallel Architectures and Compilation Techniques, 2013.
- (2013) International Conference on Parallel Architectures and Compilation Techniques
- Mekkat, V.¹ Holey, A.² Yew, P.-C.³ Zhai, A.⁴

3
- 84863550145
- A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC
- M. K. Jeong, M. Erez, C. Sudanthi, and N. Paver, "A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC," in Design Automation Conference (DAC), 2012, pp. 850-855.
- (2012) Design Automation Conference (DAC) , pp. 850-855
- Jeong, M.K.¹ Erez, M.² Sudanthi, C.³ Paver, N.⁴

4
- 84864843567
- Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems
- R. Ausavarungnirun, K. K.-W. Chang, L. Subramanian, G. H. Loh, and O. Mutlu, "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems," in 39th International Symposium on Computer Architecture (ISCA), 2012, pp. 416-427.
- (2012) 39th International Symposium on Computer Architecture (ISCA) , pp. 416-427
- Ausavarungnirun, R.¹ Chang, K.K.-W.² Subramanian, L.³ Loh, G.H.⁴ Mutlu, O.⁵

5
- 84937711016
- Managing GPU concurrency in heterogeneous architectures
- O. Kayiran, N. C. Nachiappan, A. Jog, R. Ausavarungnirun, M. T. Kandemir, G. H. Loh, O. Mutlu, and C. R. Das, "Managing GPU concurrency in heterogeneous architectures," in 47th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2014, pp. 114-126.
- (2014) 47th IEEE/ACM International Symposium on Microarchitecture (MICRO) , pp. 114-126
- Kayiran, O.¹ Nachiappan, N.C.² Jog, A.³ Ausavarungnirun, R.⁴ Kandemir, M.T.⁵ Loh, G.H.⁶ Mutlu, O.⁷ Das, C.R.⁸

6
- 84887851142
- Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
- J. Lee, S. Li, H. Kim, and S. Yalamanchili, "Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures," ACM Trans. Des. Autom. Electron. Syst., vol. 18, no. 4, pp. 48:1-48:28, 2013.
- (2013) ACM Trans. Des. Autom. Electron. Syst. , vol.18 , Issue.4 , pp. 48:1-48:28
- Lee, J.¹ Li, S.² Kim, H.³ Yalamanchili, S.⁴

7
- 80054998134
- On the efficacy of a fused CPU+GPU processor (or APU) for parallel computing
- M. Daga, A. M. Aji, and W. Feng, "On the efficacy of a fused CPU+GPU processor (or APU) for parallel computing," in Symposium on Application Accelerators in High-Performance Computing, 2011.
- (2011) Symposium on Application Accelerators in High-Performance Computing
- Daga, M.¹ Aji, A.M.² Feng, W.³

8
- 84876574211
- Exploiting coarse-grained parallelism in B+ tree searches on an APU
- M. Daga and M. Nutter, "Exploiting coarse-grained parallelism in B+ tree searches on an APU," in SC Companion: High Performance Computing, Networking Storage and Analysis (SCC), 2012.
- (2012) SC Companion: High Performance Computing, Networking Storage and Analysis (SCC)
- Daga, M.¹ Nutter, M.²

9
- 84921790112
- Efficient breadth-first search on a heterogeneous processor
- M. Daga, M. Nutter, and M. Meswani, "Efficient breadth-first search on a heterogeneous processor," in IEEE International Conference on Big Data, Oct. 2014, pp. 373-382.
- (2014) IEEE International Conference on Big Data , pp. 373-382
- Daga, M.¹ Nutter, M.² Meswani, M.³

10
- 84893233752
- Parallel radix sort on the AMD Fusion accelerated processing unit
- M. C. Delorme, T. S. Abdelrahman, and C. Zhao, "Parallel radix sort on the AMD Fusion accelerated processing unit," in 42nd International Conference on Parallel Processing, Oct. 2013, pp. 339-348.
- (2013) 42nd International Conference on Parallel Processing , pp. 339-348
- Delorme, M.C.¹ Abdelrahman, T.S.² Zhao, C.³

11
- 84891121339
- Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
- J. He, M. Lu, and B. He, "Revisiting co-processing for hash joins on the coupled CPU-GPU architecture," Proc. VLDB Endow., vol. 6, no. 10, pp. 889-900, Aug. 2013.
- (2013) Proc. VLDB Endow. , vol.6 , Issue.10 , pp. 889-900
- He, J.¹ Lu, M.² He, B.³

12
- 84962320479
- GPU computing pipeline inefficiencies and optimization opportunities in heterogeneous CPU-GPU processors
- J. Hestness, S. W. Keckler, and D. A. Wood, "GPU computing pipeline inefficiencies and optimization opportunities in heterogeneous CPU-GPU processors," in IEEE International Symposium on Workload Characterization (IISWC), 2015, pp. 87-97.
- (2015) IEEE International Symposium on Workload Characterization (IISWC) , pp. 87-97
- Hestness, J.¹ Keckler, S.W.² Wood, D.A.³

13
- 84942000186
- Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
- W. Liu and B. Vinter, "Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors," Parallel Comput., vol. 49, no. C, pp. 179-193, Nov. 2015.
- (2015) Parallel Comput. , vol.49 , Issue.C , pp. 179-193
- Liu, W.¹ Vinter, B.²

14
- 84888133920
- Heterogeneous System Architecture: A Technical Review, AMD, 2012. [Online]. Available: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf
- (2012) Heterogeneous System Architecture: A Technical Review AMD

15
- 84994749699
- W. Hwu, Heterogeneous system architecture. Morgan Kaufmann, 2016.
- (2016) Heterogeneous System Architecture
- Hwu, W.¹

16
- 84994749696
- The OpenCL Specification v2.0, Khronos OpenCL Working Group, 2015. [Online]. Available: https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
- (2015) The OpenCL Specification v2.0, Khronos OpenCL Working Group

17
- 84994697480
- CUDA C Programming Guide, NVIDIA Corporation, 2014. [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/
- (2014) CUDA C Programming Guide, NVIDIA Corporation

18
- 85006870269
- NVIDIA. (2015) NVIDIA Tegra X1. [Online]. Available: http://www.nvidia.com/object/tegra-x1-processor.html
- (2015) NVIDIA Tegra X1

19
- 84994702846
- Compute Cores. Whitepaper, AMD, 2014. [Online]. Available: https://www.amd.com/Documents/Compute-Cores-Whitepaper.pdf
- (2014) Whitepaper AMD

20
- 84991631571
- Intel Corporation. (2015) The compute architecture of Intel processor graphics Gen9. [Online]. Available: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf
- (2015) The Compute Architecture of Intel Processor Graphics Gen9

21
- 84994722404
- Qualcomm. (2013) Snapdragon S4 processors: System on chip solutions for a new mobile age. Whitepaper. [Online]. Available: https://www.qualcomm.com/documents/snapdragon-s4-processorssystem-chip-solutions-new-mobile-age
- (2013) Snapdragon S4 Processors: System on Chip Solutions for A New Mobile Age. Whitepaper

22
- 84994749708
- Exynos 5. Whitepaper, Samsung, 2012. [Online]. Available: http://www.samsung.com/global/business/semiconductor/minisite/Exynos/data/Enjoy-the-Ultimate-WQXGA-Solution-with-Exynos-5-Dual-WP.pdf
- (2012) Whitepaper, Samsung

23
- 0018518477
- How to make a multiprocessor computer that correctly executes multiprocess programs
- L. Lamport, "How to make a multiprocessor computer that correctly executes multiprocess programs," IEEE Trans. Comput., vol. 28, no. 9, pp. 690-691, Sep. 1979.
- (1979) IEEE Trans. Comput. , vol.28 , Issue.9 , pp. 690-691
- Lamport, L.¹

24
- 70350341656
- A better x86 memory model: x86-TSO
- S. Owens, S. Sarkar, and P. Sewell, "A better x86 memory model: x86-TSO," in 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs), 2009, pp. 391-407.
- (2009) 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs) , pp. 391-407
- Owens, S.¹ Sarkar, S.² Sewell, P.³

25
- 0030382365
- Shared memory consistency models: A tutorial
- S. V. Adve and K. Gharachorloo, "Shared memory consistency models: A tutorial," Computer, vol. 29, no. 12, pp. 66-76, Dec. 1996.
- (1996) Computer , vol.29 , Issue.12 , pp. 66-76
- Adve, S.V.¹ Gharachorloo, K.²

26
- 22944444506
- Parallelism and the ARM instruction set architecture
- J. Goodacre and A. N. Sloss, "Parallelism and the ARM instruction set architecture," Computer, vol. 38, no. 7, pp. 42-50, Jul. 2005.
- (2005) Computer , vol.38 , Issue.7 , pp. 42-50
- Goodacre, J.¹ Sloss, A.N.²

27
- 84881167280
- Exploring memory consistency for massively-threaded throughput-oriented processors
- B. A. Hechtman and D. J. Sorin, "Exploring memory consistency for massively-threaded throughput-oriented processors," in 40th International Symposium on Computer Architecture (ISCA), 2013, pp. 201-212.
- (2013) 40th International Symposium on Computer Architecture (ISCA) , pp. 201-212
- Hechtman, B.A.¹ Sorin, D.J.²

28
- 84994697544
- GCN Architecture. Whitepaper, AMD, 2012. [Online]. Available: https://www.amd.com/Documents/GCNArchitecture whitepaper.pdf
- (2012) Whitepaper AMD

29
- 84892508861
- Heterogeneous system coherence for integrated CPU-GPU systems
- J. Power, A. Basu, J. Gu, S. Puthoor, B. M. Beckmann, M. D. Hill, S. K. Reinhardt, and D. A. Wood, "Heterogeneous system coherence for integrated CPU-GPU systems," in 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2013, pp. 457-467.
- (2013) 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) , pp. 457-467
- Power, J.¹ Basu, A.² Gu, J.³ Puthoor, S.⁴ Beckmann, B.M.⁵ Hill, M.D.⁶ Reinhardt, S.K.⁷ Wood, D.A.⁸

30
- 35348920021
- Adaptive insertion policies for high performance caching
- M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, "Adaptive insertion policies for high performance caching," in 34th International Symposium on Computer Architecture (ISCA), 2007, pp. 381-391.
- (2007) 34th International Symposium on Computer Architecture (ISCA) , pp. 381-391
- Qureshi, M.K.¹ Jaleel, A.² Patt, Y.N.³ Steely, S.C.⁴ Emer, J.⁵

31
- 84932617613
- Gem5-GPU: A heterogeneous CPU-GPU simulator
- J. Power, J. Hestness, M. S. Orr, M. D. Hill, and D. A. Wood, "gem5-gpu: A heterogeneous CPU-GPU simulator," IEEE Computer Architecture Letters, vol. 14, no. 1, pp. 34-36, Jan. 2015.
- (2015) IEEE Computer Architecture Letters , vol.14 , Issue.1 , pp. 34-36
- Power, J.¹ Hestness, J.² Orr, M.S.³ Hill, M.D.⁴ Wood, D.A.⁵

32
- 84966338604
- The gem5 simulator
- N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1-7, Aug. 2011.
- (2011) SIGARCH Comput. Archit. News , vol.39 , Issue.2 , pp. 1-7
- Binkert, N.¹ Beckmann, B.² Black, G.³ Reinhardt, S.K.⁴ Saidi, A.⁵ Basu, A.⁶ Hestness, J.⁷ Hower, D.R.⁸ Krishna, T.⁹ Sardashti, S.¹⁰ Sen, R.¹¹ Sewell, K.¹² Shoaib, M.¹³ Vaish, N.¹⁴ Hill, M.D.¹⁵ Wood, D.A.¹⁶

33
- 70349169075
- Analyzing CUDA workloads using a detailed GPU simulator
- A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," in International Symposium on Performance Analysis of Systems and Software, 2009.
- (2009) International Symposium on Performance Analysis of Systems and Software
- Bakhoda, A.¹ Yuan, G.² Fung, W.³ Wong, H.⁴ Aamodt, T.⁵

34
- 70049105948
- GARNET: A detailed on-chip network model inside a full-system simulator
- N. Agarwal, T. Krishna, L. S. Peh, and N. K. Jha, "GARNET: A detailed on-chip network model inside a full-system simulator," in International Symposium on Performance Analysis of Systems and Software, 2009.
- (2009) International Symposium on Performance Analysis of Systems and Software
- Agarwal, N.¹ Krishna, T.² Peh, L.S.³ Jha, N.K.⁴

35
- 84907071495
- N. Muralimanohar and R. Balasubramonian, "Cacti 6.0: A tool to understand large caches," 2007.
- (2007) Cacti 6.0: A Tool to Understand Large Caches
- Muralimanohar, N.¹ Balasubramonian, R.²

36
- 70649092154
- Rodinia: A benchmark suite for heterogeneous computing
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in International Symposium on Workload Characterization, Oct. 2009.
- (2009) International Symposium on Workload Characterization
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.⁵ Lee, S.-H.⁶ Skadron, K.⁷

37
- 84875979403
- Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems
- P. Mistry, Y. Ukidave, D. Schaa, and D. Kaeli, "Valar: A benchmark suite to study the dynamic behavior of heterogeneous systems," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). ACM, 2013, pp. 54-65.
- (2013) Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6) , pp. 54-65
- Mistry, P.¹ Ukidave, Y.² Schaa, D.³ Kaeli, D.⁴

38
- 84966891675
- W.-M. Hwu, Heterogeneous System Architecture: A new compute platform infrastructure. Morgan Kaufmann, 2015.
- (2015) Heterogeneous System Architecture: A New Compute Platform Infrastructure
- Hwu, W.-M.¹

39
- 84994785664
- University of Rome "La Sapienza", "9th DIMACS Implementation Challenge," 2014, http://www.dis.uniroma1.it/challenge9/index.shtml.
- (2014) 9th DIMACS Implementation Challenge

40
- 84976501593
- In-place data sliding algorithms for many-core architectures
- J. Gómez Luna, L.-W. Chang, I.-J. Sung, W.-M. Hwu, and N. Guil, "In-place data sliding algorithms for many-core architectures," in 44th International Conference on Parallel Processing (ICPP), Sep. 2015.
- (2015) 44th International Conference on Parallel Processing (ICPP)
- Gómez Luna, J.¹ Chang, L.-W.² Sung, I.-J.³ Hwu, W.-M.⁴ Guil, N.⁵

41
- 85006986544
- AMD, "AMD accelerated parallel processing (APP) software development kit (SDK) 3.0," http://developer.amd.com/tools-and-sdks/openclzone/amd-accelerated-parallel-processing-app-sdk/, 2016.
- (2016) AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK) 3.0

42
- 84879555900
- An optimized approach to histogram computation on GPU
- J. Gómez-Luna, J. M. González-Linares, J. I. Benavides, and N. Guil, "An optimized approach to histogram computation on GPU," Machine Vision and Applications, vol. 24, no. 5, pp. 899-908, 2013.
- (2013) Machine Vision and Applications , vol.24 , Issue.5 , pp. 899-908
- Gómez-Luna, J.¹ González-Linares, J.M.² Benavides, J.I.³ Guil, N.⁴

43
- 84906545139
- Egomotion compensation and moving objects detection algorithm on GPU
- J. Gómez-Luna, H. Endt, W. Stechele, J. M. González-Linares, J. I. Benavides, and N. Guil, "Egomotion compensation and moving objects detection algorithm on GPU," in Applications, Tools and Techniques on the Road to Exascale Computing, ser. Advances in Parallel Computing, vol. 22. IOS Press, 2011, pp. 183-190.
- (2011) Applications, Tools and Techniques on the Road to Exascale Computing, Ser. Advances in Parallel Computing , vol.22 , pp. 183-190
- Gómez-Luna, J.¹ Endt, H.² Stechele, W.³ González-Linares, J.M.⁴ Benavides, J.I.⁵ Guil, N.⁶

44
- 84994695059
- Intel Corporation. (2013) Products (formerly Haswell). [Online]. Available: http://ark.intel.com/products/codename/42174/Haswell
- (2013) Products (Formerly Haswell)

45
- 78751505898
- A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads
- S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, L. Wang, and K. Skadron, "A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads," in IEEE International Symposium on Workload Characterization (IISWC), Dec. 2010.
- (2010) IEEE International Symposium on Workload Characterization (IISWC)
- Che, S.¹ Sheaffer, J.W.² Boyer, M.³ Szafaryn, L.G.⁴ Wang, L.⁵ Skadron, K.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.