SCOPUS Information Retrieval Platform

Proceedings of the International Conference on Supercomputing

2013, Pages 337-348

Holistic run-time parallelism management for time and energy efficiency

Authors (3): Sridharan, Srinath (a); Gupta, Gagan (a); Sohi, Gurindar S. (a)

(a) University of Wisconsin (United States)

Author keywords

autotuning; parallel programming; performance portability; performance tuning; run time optimization

Indexed keywords

AUTOTUNING; DEGREE OF PARALLELISM; HARDWARE AND SOFTWARE; PARALLEL PROGRAMMING MODEL; PERFORMANCE PORTABILITY; PERFORMANCE TUNING; RUNTIME OPTIMIZATION; STATE-OF-THE-ART APPROACH

ENERGY EFFICIENCY; MULTIPROGRAMMING; PARALLEL PROGRAMMING

INTELLIGENT CONTROL

EID: 84879814702; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1145/2464996.2465016; Document Type: Conference Paper

Times cited: 45

References (46)

1
- 84874338610
- Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A and 3B: System Programming Guide, Parts 1 and 2. http://www.intel.com/Assets/PDF/manual/325384.pdf

2
- 84858777927
- M. D. Allen. Data-Driven Decomposition of Sequential Programs for Determinate Parallel Execution. PhD thesis, University of Wisconsin, Madison, 2010.

3
- 67650076849
- M. D. Allen, S. Sridharan, and G. S. Sohi. Serialization sets: a dynamic dependence-based parallel execution model. In PPoPP '09, pages 85-96, New York, NY, USA, 2009.

4
- 70449638442
- A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: findings and implications. SIGMETRICS '09, pages 37-48, New York, NY, USA, 2009.

5
- 20444409262
- T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy. Scheduler activations: effective kernel support for the user-level management of parallelism. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, SOSP '91, pages 95-109, New York, NY, USA, 1991. ACM.

6
- 72249108375
- R. L. Bocchino, Jr., V. S. Adve, D. Dig, S. V. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A type and effect system for Deterministic Parallel Java. In OOPSLA '09, pages 97-116, New York, NY, USA, 2009.

7
- 33846118079
- S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro, 25(6):10-16, 2005.

8
- 85076910730
- S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In OSDI '10, pages 1-8, Berkeley, CA, USA, 2010. USENIX Association.

9
- 34248374123
- M. Curtis-Maury, J. Dzierwa, C. D. Antonopoulos, and D. S. Nikolopoulos. Online power-performance adaptation of multithreaded programs using hardware event-based prediction. ICS '06, pages 157-166, New York, NY, USA, 2006.

10
- 63549125482
- M. Curtis-Maury, A. Shah, F. Blagojevic, D. S. Nikolopoulos, B. R. de Supinski, and M. Schulz. Prediction models for multi-dimensional power-performance optimization on many cores. PACT '08, pages 250-259, New York, NY, USA, 2008.

11
- 0021587237
- D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. A. Wood. Implementation techniques for main memory database systems. SIGMOD '84, pages 1-8, New York, NY, USA, 1984.

12
- 77949693607
- E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt. Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. In ASPLOS '10, pages 335-346, New York, NY, USA, 2010.

13
- 0347507496
- M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI '98, pages 212-223, 1998.

14
- 79959806789
- S. H. Fuller and L. I. Millett, editors. The Future of Computing Performance: Game Over or Next Level? The National Academies Press, 2011.

15
- 16244379679
- M. Gendreau. An Introduction to Tabu Search. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, chapter 2, pages 37-54. Kluwer Academic Publishers, 2003.

16
- 11844281485
- J. Gilchrist. Parallel data compression with bzip2. In ICPDCS '04, pages 559-564, 2004.

17
- 0004215426
- F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1997.

18
- 84858770555
- G. Gupta and G. S. Sohi. Dataflow execution of sequential imperative programs on multicore architectures. In MICRO '11, pages 59-70, New York, NY, USA, 2011.

19
- 84975241853
- R. Illikkal, V. Chadha, A. Herdrich, R. Iyer, and D. Newell. PIRATE: QoS and performance management in CMP architectures. SIGMETRICS Perform. Eval. Rev., 37:3-10, March 2010.

20
- 8344246922
- R. Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. In ICS '04, pages 257-266, New York, NY, USA, 2004.

21
- 36349002905
- R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, and S. Reinhardt. QoS policies and architecture for cache/memory in CMP platforms. SIGMETRICS Perform. Eval. Rev., 35(1):25-36, June 2007 (SIGMETRICS '07). DOI: 10.1145/1269899.1254886.

22
- 0037253062
- J. O. Kephart and D. M. Chess. The vision of autonomic computing. Computer, 36(1):41-50, Jan. 2003.

23
- 10444238444
- S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT '04, pages 111-122, 2004.

24
- 70349191933
- M. Kulkarni, M. Burtscher, K. Pingali, and C. Cascaval. Lonestar: A suite of parallel irregular programs. In ISPASS '09, pages 65-76, April 2009.

25
- 77955001392
- J. Lee, H. Wu, M. Ravichandran, and N. Clark. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications. In ISCA '10, pages 270-279, New York, NY, USA, 2010.

26
- 77953990600
- D. Li, B. de Supinski, M. Schulz, K. Cameron, and D. Nikolopoulos. Hybrid MPI/OpenMP power-aware computing. In IPDPS '10, pages 1-12, April 2010.

27
- 33744504467
- J. Li and J. Martinez. Power-performance implications of thread-level parallelism on chip multiprocessors. In ISPASS '05, pages 124-134, March 2005.

28
- 33748879741
- J. Li and J. Martinez. Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In HPCA '06, pages 77-87, Feb. 2006.

29
- 85092783412
- R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. Tessellation: space-time partitioning in a manycore client OS. In HotPar '09, pages 10-10, Berkeley, CA, USA, 2009.

30
- 0038998034
- J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. TCCA Newsletter, pages 19-25, Dec. 1995.

31
- 0027594835
- C. McCann, R. Vaswani, and J. Zahorjan. A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors. ACM Trans. Comput. Syst., 11(2):146-178, May 1993. DOI: 10.1145/151244.151246.

32
- 33646222013
- P. Mucci, S. Browne, C. Deane, and G. Ho. PAPI: A portable interface to hardware performance counters. In Proc. Dept. of Defense HPCMP Users Group Conference, pages 7-10, 1999.

33
- 47349122373
- O. Mutlu and T. Moscibroda. Stall-time fair memory access scheduling for chip multiprocessors. In MICRO '07, pages 146-160, 2007.

34
- 52649119398
- O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In ISCA '08, pages 63-74, 2008.

35
- 34548050337
- K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith. Fair queuing memory systems. In MICRO '06, pages 208-222, 2006.

36
- 35348816719
- K. J. Nesbit, J. Laudon, and J. E. Smith. Virtual private caches. In ISCA '07, pages 57-68, New York, NY, USA, 2007.

37
- 77957594732
- H. Pan, B. Hindman, and K. Asanović. Composing parallel software efficiently with Lithe. In PLDI '10, pages 376-387, New York, NY, USA, 2010.

38
- 57949083229
- J. Perez, R. Badia, and J. Labarta. A dependency-aware task-based programming environment for multi-core architectures. In 2008 IEEE International Conference on Cluster Computing, pages 142-151, Sept. 29-Oct. 1, 2008.

39
- 79959909380
- A. Raman, H. Kim, T. Oh, J. W. Lee, and D. I. August. Parallelism orchestration using DoPE: the degree of parallelism executive. In PLDI '11, pages 26-37, New York, NY, USA, 2011. ACM.

40
- 84866433289
- A. Raman, A. Zaks, J. W. Lee, and D. I. August. Parcae: a system for flexible parallel execution. In PLDI '12, pages 133-144, New York, NY, USA, 2012.

41
- 34547679939
- C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In HPCA '07, pages 13-24, Washington, DC, USA, 2007.

42
- 43149087461
- J. Reinders. Intel Threading Building Blocks. O'Reilly Media, Inc., 2007.

43
- 84874272738
- U. Richter, M. Mnif, J. Branke, C. Müller-Schloer, and H. Schmeck. Towards a generic observer/controller architecture for organic computing. In C. Hochberger and R. Liskowsky, editors, GI Jahrestagung (1), volume 93 of LNI, pages 112-119. GI, 2006.

44
- 84867557523
- H. Sasaki, T. Tanimoto, K. Inoue, and H. Nakamura. Scalability-based manycore partitioning. In PACT '12, pages 107-116, New York, NY, USA, 2012.

45
- 77957764904
- M. A. Suleman, M. K. Qureshi, and Y. N. Patt. Feedback-driven threading: power-efficient and high-performance execution of multithreaded workloads on CMPs. In ASPLOS '08, pages 277-286, 2008.

46
- 79958138859
- J. Teich, J. Henkel, A. Herkersdorf, D. Schmitt-Landsiedel, W. Schröder-Preikschat, and G. Snelting. Invasive computing: An overview. In Multiprocessor System-on-Chip, pages 241-268, 2011.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.