SCOPUS Information Retrieval Platform

Concurrency and Computation: Practice and Experience

Volume 24, Issue 7, 2012, Pages 663-675

A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators

Authors (4): Buchty, Rainer (a); Heuveline, Vincent (a); Karl, Wolfgang (a); Weiss, Jan Philipp (a)

(a) Karlsruhe Institute of Technology (Germany)

Author keywords

accelerators; hardware aware computing; heterogeneity; multicore and manycore processors; numerical simulation; parallel programming

Indexed keywords

COMPUTER ARCHITECTURE; COMPUTER SIMULATION; NUMERICAL MODELS; PARALLEL PROCESSING SYSTEMS; PARALLEL PROGRAMMING; PARTICLE ACCELERATORS; SOFTWARE DESIGN; SUPERCOMPUTERS; SURVEYS; HARDWARE ARCHITECTURE; HARDWARE-AWARE COMPUTING; HETEROGENEITY; HETEROGENEOUS COMPUTING; HETEROGENEOUS ENVIRONMENTS; MULTI-CORE PROCESSOR; MULTICORE AND MANYCORE PROCESSORS; NUMERICAL ALGORITHMS; MULTICORE PROGRAMMING

EID: 84859725414 PISSN: 15320626 EISSN: 15320634 Source Type: Journal
DOI: 10.1002/cpe.1904 Document Type: Conference Paper

Times cited: 34

References (52)

1
- 34548083281
- The free lunch is over - A fundamental turn toward concurrency in software
- Sutter H. The free lunch is over - a fundamental turn toward concurrency in software. Dr. Dobb's Journal 2005; 30 (3): 202-210.
- (2005) Dr. Dobb's Journal , vol.30 , Issue.3 , pp. 202-210
- Sutter, H.¹

2
- 34447569672
- Intel64 and IA-32 architectures software developer's manual
- Intel Corp. Intel64 and IA-32 architectures software developer's manual. http://www.intel.com/products/processor/manuals/ [Oct 5, 2011].
- (2011) Intel64 and IA-32 Architectures Software Developer's Manual

3
- 80052325923
- The AMD Fusion family of APUs
- AMD Inc. The AMD Fusion Family of APUs. http://sites.amd.com/us/fusion/apu/Pages/fusion.aspx/ [Oct 5, 2011].
- (2011) The AMD Fusion Family of APUs

4
- 33646015987
- Synergistic processing in Cell's multicore architecture
- Gschwind M, Hofstee HP, Flachs B, Hopkins M, Watanabe Y, Yamazaki T. Synergistic processing in Cell's multicore architecture. IEEE Micro 2006; 26 (2): 10-24.
- (2006) IEEE Micro , vol.26 , Issue.2 , pp. 10-24
- Gschwind, M.¹ Hofstee, H.P.² Flachs, B.³ Hopkins, M.⁴ Watanabe, Y.⁵ Yamazaki, T.⁶

5
- 84859721294
- OpenCL 1.0 standard
- Khronos OpenCL working group. OpenCL 1.0 Standard. http://www.khronos.org/opencl/ [Oct 5, 2011].
- (2011) OpenCL 1.0 Standard

6
- 2042458649
- A survey of processors with explicit multithreading
- DOI 10.1145/641865.641867
- Ungerer T, Robič B, Šilc J. A survey of processors with explicit multithreading. ACM Computing Surveys 2003; 35 (1): 29-63.
- (2003) ACM Computing Surveys , vol.35 , Issue.1 , pp. 29-63
- Ungerer, T.¹ Robic, B.² Silc, J.³

7
- 62349092536
- Scalable programming models for massively multicore processors
- McCool MD. Scalable programming models for massively multicore processors. Proceedings of the IEEE, 2008; 816-831.
- (2008) Proceedings of the IEEE , pp. 816-831
- McCool, M.D.¹

8
- 56749158843
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In SC '07 Proceedings of the 2007 ACM/IEEE Conference on Supercomputing. ACM: New York, 2007; 1-12.
- (2007) SC '07 Proceedings of the 2007 ACM/IEEE Conference on Supercomputing , pp. 1-12
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

9
- 70350771127
- Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
- Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In SC '08 Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. ACM: New York, 2008; 1-12.
- (2008) SC '08 Proceedings of the 2008 ACM/IEEE Conference on Supercomputing , pp. 1-12
- Datta, K.¹ Murphy, M.² Volkov, V.³ Williams, S.⁴ Carter, J.⁵ Oliker, L.⁶ Patterson, D.⁷ Shalf, J.⁸ Yelick, K.⁹

10
- 51049106193
- Lattice Boltzmann simulation optimization on leading multicore platforms
- Williams S, Carter J, Oliker L, Shalf J, Yelick K. Lattice Boltzmann simulation optimization on leading multicore platforms. Proceedings of International Parallel and Distributed Processing Symposium, 2008.
- (2008) Proceedings of International Parallel and Distributed Processing Symposium
- Williams, S.¹ Carter, J.² Oliker, L.³ Shalf, J.⁴ Yelick, K.⁵

11
- 84885784745
- Evaluating multi-core platforms for HPC data-intensive kernels
- Van Amesfoort A, Varbanescu A, Sips H, Van Nieuwpoort R. Evaluating multi-core platforms for HPC data-intensive kernels. In CF '09 Proceedings of the 6th ACM Conference on Computing Frontiers. ACM: New York, 2009; 207-216.
- (2009) CF '09 Proceedings of the 6th ACM Conference on Computing Frontiers , pp. 207-216
- Van Amesfoort, A.¹ Varbanescu, A.² Sips, H.³ Van Nieuwpoort, R.⁴

12
- 36049051263
- The new landscape of parallel computer architecture
- Shalf J. The new landscape of parallel computer architecture. Journal of Physics: Conference Series 2007; 78: 012066.
- (2007) Journal of Physics: Conference Series , vol.78 , pp. 012066
- Shalf, J.¹

13
- 35648995516
- The landscape of parallel computing research: A view from Berkeley
- Asanovic K, et al. The landscape of parallel computing research: a view from Berkeley. EECS Technical Report, 2006. http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html [Oct 5, 2011].
- (2006) EECS Technical Report
- Asanovic, K.¹

14
- 48249118853
- Amdahl's law in the multicore era
- Hill MD, Marty MR. Amdahl's law in the multicore era. IEEE Computer, 2008.
- (2008) IEEE Computer
- Hill, M.D.¹ Marty, M.R.²

15
- 84859702835
- Virtex-6 FPGA family
- Xilinx, Inc. Virtex-6 FPGA Family. http://www.xilinx.com/products/virtex6/ [Oct 5, 2011].
- (2011) Virtex-6 FPGA Family

16
- 84859700527
- Stratix series high-end FPGAs
- Altera Corp. Stratix series high-end FPGAs. http://www.altera.com/products/devices/stratix-fpgas/about/stx-about.html [Oct 5, 2011].
- (2011) Stratix Series High-end FPGAs

17
- 84879491134
- Intel Single-Chip Cloud Computer
- Intel Single-Chip Cloud Computer. http://download.intel.com/pressroom/pdf/rockcreek/SCC-Announcement-JustinRattner.pdf [Oct 5, 2011].
- (2011) Intel Single-Chip Cloud Computer

18
- 84879804402
- Intel many integrated core architecture
- Intel many integrated core architecture. http://www.intel.com/pressroom/archive/releases/2010/20100531comp.htm [Oct 5, 2011].
- (2011) Intel Many Integrated Core Architecture

19
- 0029200683
- Simultaneous multithreading: Maximizing on-chip parallelism
- Tullsen DM, Eggers SJ, Levy HM. Simultaneous multithreading: maximizing on-chip parallelism. ISCA '95 Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995; 533-544.
- (1995) ISCA '95 Proceedings of the 22nd Annual International Symposium on Computer Architecture , pp. 533-544
- Tullsen, D.M.¹ Eggers, S.J.² Levy, H.M.³

20
- 47149107263
- The impact of speculative execution on SMT processors
- Kang D, Liu C, Gaudiot JL. The impact of speculative execution on SMT processors. International Journal of Parallel Programming 2008; 36 (4): 361-385.
- (2008) International Journal of Parallel Programming , vol.36 , Issue.4 , pp. 361-385
- Kang, D.¹ Liu, C.² Gaudiot, J.L.³

21
- 79959456077
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
- Baskaran MM, Bondhugula U, Krishnamoorthy S, Ramanujam J, Rountev A, Sadayappan P. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. PPoPP '08 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008; 1-10.
- (2008) PPoPP '08 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 1-10
- Baskaran, M.M.¹ Bondhugula, U.² Krishnamoorthy, S.³ Ramanujam, J.⁴ Rountev, A.⁵ Sadayappan, P.⁶

22
- 4243648129
- Little's law and high performance computing
- Bailey DH. Little's law and high performance computing. RNR Technical Report, 1997.
- (1997) RNR Technical Report
- Bailey, D.H.¹

23
- 67650056991
- LU, QR and Cholesky factorizations using vector capabilities of GPUs
- Volkov V, Demmel J. LU, QR and Cholesky factorizations using vector capabilities of GPUs. EECS Technical Report, 2008. http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-49.html [Oct 5, 2011].
- (2008) EECS Technical Report
- Volkov, V.¹ Demmel, J.²

24
- 84859721293
- Intel many integrated core
- Intel Corp. Intel many integrated core. http://newsroom.intel.com/servlet/JiveServlet/download/2152-4-5220/ISC-Intel-MIC-factsheet.pdf [Oct 5, 2011].
- (2011) Intel Many Integrated Core

25
- 84859700528
- K computer: World's No.1 on TOP500 List
- Fujitsu Global. K computer: World's No.1 on TOP500 List. http://www.fujitsu.com/global/about/tech/k/ [Oct 5, 2011].
- (2011) K Computer: World's No.1 on TOP500 List

26
- 84859716320
- Epiphany IV multicore processors
- Adapteva Inc. Epiphany IV multicore processors. http://www.adapteva.com/ [Oct 5, 2011].
- (2011) Epiphany IV Multicore Processors

27
- 0002806690
- OpenMP: An industry-standard API for shared-memory programming
- Dagum L, Menon R. OpenMP: an industry-standard API for shared-memory programming. IEEE Computational Science and Engineering 1998; 5 (1): 46-55.
- (1998) IEEE Computational Science and Engineering , vol.5 , Issue.1 , pp. 46-55
- Dagum, L.¹ Menon, R.²

28
- 84859703334
- MPI: a message-passing interface standard (version 2.2)
- MPI: a message-passing interface standard (version 2.2). http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf [Oct 5, 2011].
- (2011) MPI: A Message-passing Interface Standard

29
- 0343644429
- High Performance Fortran - History, overview and current developments
- Richardson H. High Performance Fortran - history, overview and current developments. TMC-261, 1996. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8497 [Oct 5, 2011].
- (1996) TMC-261
- Richardson, H.¹

30
- 84864154621
- Programming in the partitioned global address space model
- Carlson B, El-Ghazawi T, Numrich R, Yelick K. Programming in the partitioned global address space model. Supercomputing 2003. [Oct 5, 2011].
- (2003) Supercomputing 2003
- Carlson, B.¹ El-Ghazawi, T.² Numrich, R.³ Yelick, K.⁴

31
- 84859716318
- Unified Parallel C
- Unified Parallel C. http://upc.gwu.edu/ [Oct 5, 2011].
- (2011) Unified Parallel C

32
- 84859716175
- Coarray Fortran
- Coarray Fortran. http://caf.rice.edu/ [Oct 5, 2011].
- (2011) Coarray Fortran

33
- 84859731510
- The Chapel parallel programming language
- The Chapel parallel programming language. http://chapel.cray.com/ [Oct 5, 2011].
- (2011) The Chapel Parallel Programming Language

34
- 84859700525
- X10: performance and productivity at scale
- X10: performance and productivity at scale. http://x10-lang.org/ [Oct 5, 2011].
- (2011) X10: Performance and Productivity at Scale

35
- 73449104291
- Programming multiprocessors with explicitly managed memory hierarchies
- Schneider S, Yeom JS, Nikolopoulos DS. Programming multiprocessors with explicitly managed memory hierarchies. IEEE Computer 2009; 42: 28-34.
- (2009) IEEE Computer , vol.42 , pp. 28-34
- Schneider, S.¹ Yeom, J.S.² Nikolopoulos, D.S.³

36
- 70349291808
- Parallel processing with CUDA
- Halfhill T. Parallel processing with CUDA. Microprocessor Report 01/28/08-01, 2008.
- (2008) Microprocessor Report 01/28/08-01
- Halfhill, T.¹

37
- 84859723297
- Intel Parallel Building Blocks (PBB)
- Intel Corp. Intel Parallel Building Blocks (PBB). http://software.intel.com/en-us/articles/intel-parallel-buildingblocks/ [Oct 5, 2011].
- (2011) Intel Parallel Building Blocks (PBB)

38
- 84859719676
- Intel Array Building Blocks (ArBB)
- Intel Corp. Intel Array Building Blocks (ArBB). http://software.intel.com/en-us/articles/intel-array-building-blocks/ [Oct 5, 2011].
- (2011) Intel Array Building Blocks (ArBB)

39
- 67650085808
- EXOCHI: Architecture and programming environment for a heterogeneous multi-core multithreaded system
- DOI 10.1145/1250734.1250753, PLDI'07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation
- Wang P, Collins J, Chinya G, Jiang H, Tian X, Girkar M, Yang N, Lueh GY, Wang H. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. SIGPLAN Not 2007; 42 (6): 156-166.
- (2007) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) , pp. 156-166
- Wang, P.H.¹ Collins, J.D.² Chinya, G.N.³ Jiang, H.⁴ Tian, X.⁵ Girkar, M.⁶ Yang, N.Y.⁷ Lueh, G.-Y.⁸ Wang, H.⁹

40
- 77957759721
- Merge: A programming model for heterogeneous multi-core systems
- DOI 10.1145/1346281.1346318, ASPLOS XIII - Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems
- Linderman M, Collins J, Wang H, Meng T. Merge: a programming model for heterogeneous multi-core systems. ASPLOS XIII, 2008; 287-296.
- (2008) International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS , pp. 287-296
- Linderman, M.D.¹ Collins, J.D.² Wang, H.³ Meng, T.H.⁴

41
- 67650081614
- Liquid Metal: Object-oriented programming across the hardware/software boundary
- Huang S, Hormati A, Bacon D, Rabbah R. Liquid Metal: object-oriented programming across the hardware/software boundary. ECOOP '08 Proceedings of the 22nd European Conference on Object-Oriented Programming.
- ECOOP '08 Proceedings of the 22nd European Conference on Object-Oriented Programming
- Huang, S.¹ Hormati, A.² Bacon, D.³ Rabbah, R.⁴

42
- 76749140917
- Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
- Luk CK, Hong S, Kim H. Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009; 45-55. ISBN 978-1-60558-798-1.
- (2009) Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture , pp. 45-55
- Luk, C.K.¹ Hong, S.² Kim, H.³

43
- 63449095902
- A light-weight approach to dynamical run-time linking supporting heterogenous, parallel, and reconfigurable architectures
- Buchty R, Kramer D, Kicherer M, Karl W. A light-weight approach to dynamical run-time linking supporting heterogenous, parallel, and reconfigurable architectures. In Architecture of Computing Systems, vol. 5467, Lecture Notes in Computer Science, 2009; 60-71.
- (2009) Architecture of Computing Systems , vol.5467 , pp. 60-71
- Buchty, R.¹ Kramer, D.² Kicherer, M.³ Karl, W.⁴

44
- 84859699631
- An embrace-and-extend approach to managing the complexity of future heterogeneous systems
- Buchty R, Kicherer M, Kramer D, Karl W. An embrace-and-extend approach to managing the complexity of future heterogeneous systems. In SAMOS IX, vol. 5657, Lecture Notes in Computer Science, Springer: Berlin/Heidelberg, 2009; 226-235.
- (2009) SAMOS IX , vol.5657 , pp. 226-235
- Buchty, R.¹ Kicherer, M.² Kramer, D.³ Karl, W.⁴

45
- 84859700526
- Delivering guidance information in heterogeneous systems
- Nowak F, Kicherer M, Buchty R, Karl W. Delivering guidance information in heterogeneous systems. ARCS 2010 Workshop Proceedings, Hannover, Germany, February 2010; 279-284. VDE, ISBN 978-3-8007-3222-7.
- (2010) ARCS 2010 Workshop Proceedings , pp. 279-284
- Nowak, F.¹ Kicherer, M.² Buchty, R.³ Karl, W.⁴

46
- 79952974311
- Extending a light-weight runtime system by dynamic instrumentation for performance evaluation
- Kicherer M, Nowak F, Buchty R, Karl W. Extending a light-weight runtime system by dynamic instrumentation for performance evaluation. ARCS 2010 Workshop Proceedings, Hannover, Germany, February 2010; 95-101. VDE, ISBN 978-38007-3322-7.
- (2010) ARCS 2010 Workshop Proceedings , pp. 95-101
- Kicherer, M.¹ Nowak, F.² Buchty, R.³ Karl, W.⁴

47
- 79952903205
- Cost-aware function migration in heterogeneous systems
- Kicherer M, Buchty R, Karl W. Cost-aware function migration in heterogeneous systems. HiPEAC '11 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, 2011.
- (2011) HiPEAC '11 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
- Kicherer, M.¹ Buchty, R.² Karl, W.³

48
- 74049146136
- Minimizing communication in sparse matrix solvers
- Mohiyuddin M, Hoemmen M, Demmel J, Yelick K. Minimizing communication in sparse matrix solvers. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ACM: New York, 2009; 1-11.
- (2009) Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis , pp. 1-11
- Mohiyuddin, M.¹ Hoemmen, M.² Demmel, J.³ Yelick, K.⁴

49
- 70350676807
- Optimized stencil computation using in-place calculation on modern multicore systems
- Augustin W, Heuveline V, Weiss JP. Optimized stencil computation using in-place calculation on modern multicore systems. Proceedings of the 15th International Euro-Par Conference on Parallel Processing, 2009; 772-784.
- (2009) Proceedings of the 15th International Euro-Par Conference on Parallel Processing , pp. 772-784
- Augustin, W.¹ Heuveline, V.² Weiss, J.P.³

50
- 85110493784
- HiFlow3: parallel finite element software
- HiFlow3 - parallel finite element software. http://www.hiflow3.org [Oct 5, 2011].
- (2011) HiFlow3 - Parallel Finite Element Software

51
- 78649815411
- A multi-platform linear algebra toolbox for finite element solvers on heterogeneous clusters
- Heuveline V, Subramanian C, Lukarski D, Weiss JP. A multi-platform linear algebra toolbox for finite element solvers on heterogeneous clusters. PPAAC'10, IEEE Cluster Workshops, 2010; to appear.
- (2010) PPAAC'10, IEEE Cluster Workshops
- Heuveline, V.¹ Subramanian, C.² Lukarski, D.³ Weiss, J.P.⁴

52
- 84859716317
- Enhanced parallel ILU(p)-based preconditioners for multi-core CPUs and GPUs - the power(q)-pattern method
- Heuveline V, Lukarski D, Weiss JP. Enhanced parallel ILU(p)-based preconditioners for multi-core CPUs and GPUs - the power(q)-pattern method, 2011. EMCL Preprint 2011-08, http://www.emcl.kit.edu/preprints/emcl-preprint-2011-08.pdf [Oct 5, 2011].
- (2011) Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs - The Power(q)-pattern Method
- Heuveline, V.¹ Lukarski, D.² Weiss, J.P.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.