Volume , Issue , 2013, Pages 337-348

Holistic run-time parallelism management for time and energy efficiency

Author keywords

autotuning; parallel programming; performance portability; performance tuning; run time optimization

Indexed keywords

AUTOTUNING; DEGREE OF PARALLELISM; HARDWARE AND SOFTWARE; PARALLEL PROGRAMMING MODEL; PERFORMANCE PORTABILITY; PERFORMANCE TUNING; RUNTIME OPTIMIZATION; STATE-OF-THE-ART APPROACH;

EID: 84879814702     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2464996.2465016     Document Type: Conference Paper
Times cited : (45)

References (46)
  • 3
    • 67650076849
    • Serialization sets: A dynamic dependence-based parallel execution model
    • New York, NY, USA
    • M. D. Allen, S. Sridharan, and G. S. Sohi. Serialization sets: a dynamic dependence-based parallel execution model. In PPoPP '09, pages 85-96, New York, NY, USA, 2009.
    • (2009) PPoPP '09 , pp. 85-96
    • Allen, M.D.1    Sridharan, S.2    Sohi, G.S.3
  • 4
    • 70449638442
    • Redundancy in network traffic: Findings and implications
    • New York, NY, USA
    • A. Anand, C. Muthukrishnan, A. Akella, and R. Ramjee. Redundancy in network traffic: findings and implications. SIGMETRICS '09, pages 37-48, New York, NY, USA, 2009.
    • (2009) SIGMETRICS '09 , pp. 37-48
    • Anand, A.1    Muthukrishnan, C.2    Akella, A.3    Ramjee, R.4
  • 7
    • 33846118079
    • Designing reliable systems from unreliable components: The challenges of transistor variability and degradation
    • S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. MICRO '05, 25(6):10-16, 2005.
    • (2005) MICRO '05 , vol.25 , Issue.6 , pp. 10-16
    • Borkar, S.1
  • 9
    • 34248374123
    • Online power-performance adaptation of multithreaded programs using hardware event-based prediction
    • New York, NY, USA
    • M. Curtis-Maury, J. Dzierwa, C. D. Antonopoulos, and D. S. Nikolopoulos. Online power-performance adaptation of multithreaded programs using hardware event-based prediction. ICS '06, pages 157-166, New York, NY, USA, 2006.
    • (2006) ICS '06 , pp. 157-166
    • Curtis-Maury, M.1    Dzierwa, J.2    Antonopoulos, C.D.3    Nikolopoulos, D.S.4
  • 10
    • 63549125482
    • Prediction models for multi-dimensional power-performance optimization on many cores
    • New York, NY, USA
    • M. Curtis-Maury, A. Shah, F. Blagojevic, D. S. Nikolopoulos, B. R. de Supinski, and M. Schulz. Prediction models for multi-dimensional power-performance optimization on many cores. PACT '08, pages 250-259, New York, NY, USA, 2008.
    • (2008) PACT '08 , pp. 250-259
    • Curtis-Maury, M.1    Shah, A.2    Blagojevic, F.3    Nikolopoulos, D.S.4    De Supinski, B.R.5    Schulz, M.6
  • 12
    • 77949693607
    • Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems
    • New York, NY, USA
    • E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt. Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. In ASPLOS '10, pages 335-346, New York, NY, USA, 2010.
    • (2010) ASPLOS '10 , pp. 335-346
    • Ebrahimi, E.1    Lee, C.J.2    Mutlu, O.3    Patt, Y.N.4
  • 13
    • 0347507496
    • The implementation of the Cilk-5 multithreaded language
    • M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI '98, pages 212-223, 1998.
    • (1998) PLDI '98 , pp. 212-223
    • Frigo, M.1    Leiserson, C.E.2    Randall, K.H.3
  • 15
    • 16244379679
    • An Introduction to Tabu Search
    • F. Glover and G. Kochenberger, editors, chapter 2, Kluwer Academic Publishers
    • M. Gendreau. An Introduction to Tabu Search. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, chapter 2, pages 37-54. Kluwer Academic Publishers, 2003.
    • (2003) Handbook of Metaheuristics , pp. 37-54
    • Gendreau, M.1
  • 16
    • 11844281485
    • Parallel data compression with bzip2
    • J. Gilchrist. Parallel data compression with bzip2. In ICPDCS '04, pages 559-564, 2004.
    • (2004) ICPDCS '04 , pp. 559-564
    • Gilchrist, J.1
  • 17
    • 0004215426
    • Kluwer Academic Publishers, Norwell, MA, USA
    • F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
    • (1997) Tabu Search
    • Glover, F.1    Laguna, M.2
  • 18
    • 84858770555
    • Dataflow execution of sequential imperative programs on multicore architectures
    • New York, NY, USA
    • G. Gupta and G. S. Sohi. Dataflow execution of sequential imperative programs on multicore architectures. In MICRO '11, pages 59-70, New York, NY, USA, 2011.
    • (2011) MICRO '11 , pp. 59-70
    • Gupta, G.1    Sohi, G.S.2
  • 20
    • 8344246922
    • CQoS: A framework for enabling QoS in shared caches of CMP platforms
    • New York, NY, USA
    • R. Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. In ICS '04, pages 257-266, New York, NY, USA, 2004.
    • (2004) ICS '04 , pp. 257-266
    • Iyer, R.1
  • 21
    • 36349002905
    • QoS policies and architecture for cache/memory in CMP platforms
    • DOI 10.1145/1269899.1254886, SIGMETRICS'07 - Proceedings of the 2007 International Conference on Measurement and Modeling of Computer Systems
    • R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, and S. Reinhardt. QoS policies and architecture for cache/memory in CMP platforms. SIGMETRICS Perform. Eval. Rev., 35(1):25-36, June 2007. (Pubitemid 350158070)
    • (2007) Performance Evaluation Review , vol.35 , Issue.1 , pp. 25-36
    • Iyer, R.1    Zhao, L.2    Guo, F.3    Illikkal, R.4    Makineni, S.5    Newell, D.6    Solihin, Y.7    Hsu, L.8    Reinhardt, S.9
  • 22
    • 0037253062
    • The vision of autonomic computing
    • Jan.
    • J. O. Kephart and D. M. Chess. The vision of autonomic computing. Computer, 36(1):41-50, Jan. 2003.
    • (2003) Computer , vol.36 , Issue.1 , pp. 41-50
    • Kephart, J.O.1    Chess, D.M.2
  • 23
    • 10444238444
    • Fair cache sharing and partitioning in a chip multiprocessor architecture
    • S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT '04, pages 111-122, 2004.
    • (2004) PACT '04 , pp. 111-122
    • Kim, S.1    Chandra, D.2    Solihin, Y.3
  • 24
    • 70349191933
    • Lonestar: A suite of parallel irregular programs
    • April
    • M. Kulkarni, M. Burtscher, K. Pingali, and C. Cascaval. Lonestar: A suite of parallel irregular programs. In ISPASS '09, pages 65-76, April 2009.
    • (2009) ISPASS '09 , pp. 65-76
    • Kulkarni, M.1    Burtscher, M.2    Pingali, K.3    Cascaval, C.4
  • 25
    • 77955001392
    • Thread tailor: Dynamically weaving threads together for efficient, adaptive parallel applications
    • New York, NY, USA
    • J. Lee, H. Wu, M. Ravichandran, and N. Clark. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications. In ISCA '10, pages 270-279, New York, NY, USA, 2010.
    • (2010) ISCA '10 , pp. 270-279
    • Lee, J.1    Wu, H.2    Ravichandran, M.3    Clark, N.4
  • 27
    • 33744504467
    • Power-performance implications of thread-level parallelism on chip multiprocessors
    • March
    • J. Li and J. Martinez. Power-performance implications of thread-level parallelism on chip multiprocessors. In ISPASS '05, pages 124-134, March 2005.
    • (2005) ISPASS '05 , pp. 124-134
    • Li, J.1    Martinez, J.2
  • 28
    • 33748879741
    • Dynamic power-performance adaptation of parallel computation on chip multiprocessors
    • Feb.
    • J. Li and J. Martinez. Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In HPCA '06, pages 77-87, Feb. 2006.
    • (2006) HPCA '06 , pp. 77-87
    • Li, J.1    Martinez, J.2
  • 29
    • 85092783412
    • Tessellation: Space-time partitioning in a manycore client OS
    • Berkeley, CA, USA
    • R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović, and J. Kubiatowicz. Tessellation: space-time partitioning in a manycore client OS. HotPar'09, pages 10-10, Berkeley, CA, USA, 2009.
    • (2009) HotPar'09 , pp. 10-10
    • Liu, R.1    Klues, K.2    Bird, S.3    Hofmeyr, S.4    Asanović, K.5    Kubiatowicz, J.6
  • 30
    • 0038998034
    • Memory bandwidth and machine balance in current high performance computers
    • Dec.
    • J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. TCCA Newsletter, pages 19-25, Dec. 1995.
    • (1995) TCCA Newsletter , pp. 19-25
    • McCalpin, J.D.1
  • 31
    • 0027594835
    • Dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
    • DOI 10.1145/151244.151246
    • C. McCann, R. Vaswani, and J. Zahorjan. A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors. ACM Trans. Comput. Syst., 11(2):146-178, May 1993. (Pubitemid 23668699)
    • (1993) ACM Transactions on Computer Systems , vol.11 , Issue.2 , pp. 146-178
    • McCann, C.1    Vaswani, R.2    Zahorjan, J.3
  • 33
    • 47349122373
    • Stall-time fair memory access scheduling for chip multiprocessors
    • O. Mutlu and T. Moscibroda. Stall-time fair memory access scheduling for chip multiprocessors. In MICRO '07, pages 146-160, 2007.
    • (2007) MICRO '07 , pp. 146-160
    • Mutlu, O.1    Moscibroda, T.2
  • 34
    • 52649119398
    • Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems
    • O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In ISCA '08, pages 63-74, 2008.
    • (2008) ISCA '08 , pp. 63-74
    • Mutlu, O.1    Moscibroda, T.2
  • 36
    • 35348816719
    • Virtual private caches
    • New York, NY, USA
    • K. J. Nesbit, J. Laudon, and J. E. Smith. Virtual private caches. In ISCA '07, pages 57-68, New York, NY, USA, 2007.
    • (2007) ISCA '07 , pp. 57-68
    • Nesbit, K.J.1    Laudon, J.2    Smith, J.E.3
  • 37
    • 77957594732
    • Composing parallel software efficiently with Lithe
    • New York, NY, USA
    • H. Pan, B. Hindman, and K. Asanović. Composing parallel software efficiently with Lithe. In PLDI '10, pages 376-387, New York, NY, USA, 2010.
    • (2010) PLDI '10 , pp. 376-387
    • Pan, H.1    Hindman, B.2    Asanović, K.3
  • 38
    • 57949083229
    • A dependency-aware task-based programming environment for multi-core architectures
    • 29 2008-oct. 1
    • J. Perez, R. Badia, and J. Labarta. A dependency-aware task-based programming environment for multi-core architectures. In Cluster Computing, 2008 IEEE International Conference on, pages 142-151, 29 2008-oct. 1 2008.
    • (2008) Cluster Computing, 2008 IEEE International Conference on , pp. 142-151
    • Perez, J.1    Badia, R.2    Labarta, J.3
  • 39
    • 79959909380
    • Parallelism orchestration using DoPE: The degree of parallelism executive
    • New York, NY, USA, ACM
    • A. Raman, H. Kim, T. Oh, J. W. Lee, and D. I. August. Parallelism orchestration using DoPE: the degree of parallelism executive. In PLDI '11, pages 26-37, New York, NY, USA, 2011. ACM.
    • (2011) PLDI '11 , pp. 26-37
    • Raman, A.1    Kim, H.2    Oh, T.3    Lee, J.W.4    August, D.I.5
  • 40
    • 84866433289
    • Parcae: A system for flexible parallel execution
    • New York, NY, USA
    • A. Raman, A. Zaks, J. W. Lee, and D. I. August. Parcae: a system for flexible parallel execution. In PLDI '12, pages 133-144, New York, NY, USA, 2012.
    • (2012) PLDI '12 , pp. 133-144
    • Raman, A.1    Zaks, A.2    Lee, J.W.3    August, D.I.4
  • 41
    • 34547679939
    • Evaluating MapReduce for multi-core and multiprocessor systems
    • Washington, DC, USA
    • C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. HPCA '07, pages 13-24, Washington, DC, USA, 2007.
    • (2007) HPCA '07 , pp. 13-24
    • Ranger, C.1    Raghuraman, R.2    Penmetsa, A.3    Bradski, G.4    Kozyrakis, C.5
  • 43
    • 84874272738
    • Towards a generic observer/controller architecture for organic computing
    • C. Hochberger and R. Liskowsky, editors, GI Jahrestagung (1), GI
    • U. Richter, M. Mnif, J. Branke, C. Müller-Schloer, and H. Schmeck. Towards a generic observer/controller architecture for organic computing. In C. Hochberger and R. Liskowsky, editors, GI Jahrestagung (1), volume 93 of LNI, pages 112-119. GI, 2006.
    • (2006) LNI , vol.93 , pp. 112-119
    • Richter, U.1    Mnif, M.2    Branke, J.3    Müller-Schloer, C.4    Schmeck, H.5
  • 44
    • 84867557523
    • Scalability-based manycore partitioning
    • New York, NY, USA
    • H. Sasaki, T. Tanimoto, K. Inoue, and H. Nakamura. Scalability-based manycore partitioning. In PACT '12, pages 107-116, New York, NY, USA, 2012.
    • (2012) PACT '12 , pp. 107-116
    • Sasaki, H.1    Tanimoto, T.2    Inoue, K.3    Nakamura, H.4
  • 45
    • 77957764904
    • Feedback-driven threading: Power-efficient and high-performance execution of multithreaded workloads on CMPs
    • M. A. Suleman, M. K. Qureshi, and Y. N. Patt. Feedback-driven threading: power-efficient and high-performance execution of multithreaded workloads on CMPs. In ASPLOS '08, pages 277-286, 2008.
    • (2008) ASPLOS '08 , pp. 277-286
    • Suleman, M.A.1    Qureshi, M.K.2    Patt, Y.N.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.