SCOPUS Information Search Platform

Proceedings of the 2010 CGO - The 8th International Symposium on Code Generation and Optimization

2010, Pages 121-130

Decoupled software pipelining creates parallelization opportunities

(6) Huang, Jialu ᵃ; Raman, Arun ᵃ; Jablin, Thomas B. ᵃ; Zhang, Yun ᵃ; Hung, Tzu Han ᵃ; August, David I. ᵃ

ᵃ Princeton University (United States)

Author keywords

DSWP; enabling transformation; multicore; parallelization; speculation

Indexed keywords

MULTI CORE; PARALLELIZATIONS; PARALLELIZING; PERFORMANCE GAIN; SOFTWARE PIPELINING;

COMPUTER SOFTWARE; NETWORK COMPONENTS;

OPTIMIZATION;

EID: 77954006048 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1772954.1772973 Document Type: Conference Paper

Times cited: 48

References (24)

1
- 0037952146
- Morgan Kaufmann Publishers Inc.
- R. Allen and K. Kennedy. Optimizing compilers for modern architectures: A dependence-based approach. Morgan Kaufmann Publishers Inc., 2002.
- (2002) Optimizing Compilers for Modern Architectures: A Dependence-based Approach
- Allen, R.¹ Kennedy, K.²

2
- 84973836157
- The NAS Parallel Benchmarks
- Fall
- D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, D. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. International Journal of Supercomputer Applications, 5(3):63-73, Fall 1991.
- (1991) International Journal of Supercomputer Applications , vol.5 , Issue.3 , pp. 63-73
- Bailey, D.H.¹ Barszcz, E.² Barton, J.T.³ Browning, D.S.⁴ Carter, R.L.⁵ Dagum, D.⁶ Fatoohi, R.A.⁷ Frederickson, P.O.⁸ Lasinski, T.A.⁹ Schreiber, R.S.¹⁰ Simon, H.D.¹¹ Venkatakrishnan, V.¹² Weeratunga, S.K.¹³

3
- 77949706996
- PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November
- M. J. Bridges. The VELOCITY Compiler: Extracting Efficient Multicore Execution from Legacy Sequential Codes. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November 2008.
- (2008) The VELOCITY Compiler: Extracting Efficient Multicore Execution from Legacy Sequential Codes
- Bridges, M.J.¹

4
- 0022893044
- DOACROSS: Beyond vectorization for multiprocessors
- August
- R. Cytron. DOACROSS: Beyond vectorization for multiprocessors. In Proceedings of the International Conference on Parallel Processing, pages 836-884, August 1986.
- (1986) Proceedings of the International Conference on Parallel Processing , pp. 836-884
- Cytron, R.¹

5
- 84966550525
- The R-LRPD test: Speculative parallelization of partially parallel loops
- F. H. Dang, H. Yu, and L. Rauchwerger. The R-LRPD test: Speculative parallelization of partially parallel loops. In IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, page 318, 2002.
- (2002) IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium , pp. 318
- Dang, F.H.¹ Yu, H.² Rauchwerger, L.³

6
- 26444605254
- Master's thesis, Department of Computer Science, University of Illinois, Urbana, IL, May
- J. R. B. Davies. Parallel loop constructs for multiprocessors. Master's thesis, Department of Computer Science, University of Illinois, Urbana, IL, May 1981.
- (1981) Parallel Loop Constructs for Multiprocessors
- Davies, J.R.B.¹

7
- 0023385308
- The program dependence graph and its use in optimization
- DOI 10.1145/24039.24041
- J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319-349, July 1987.
- (1987) ACM Transactions on Programming Languages and Systems , vol.9 , Issue.3 , pp. 319-349
- Ferrante, J.¹ Ottenstein, K.J.² Warren, J.D.³

8
- 79959411439
- FastForward for efficient pipeline parallelism: A cache-optimized concurrent lock-free queue
- New York, NY, USA, February
- J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 43-52, New York, NY, USA, February 2008.
- (2008) PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , pp. 43-52
- Giacomoni, J.¹ Moseley, T.² Vachharajani, M.³

9
- 35048876693
- Improving parallel irregular reductions using partial array expansion
- (CDROM), New York, NY, USA, ACM
- E. Gutiérrez, O. Plata, and E. L. Zapata. Improving parallel irregular reductions using partial array expansion. In Supercomputing '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pages 38-38, New York, NY, USA, 2001. ACM.
- (2001) Supercomputing '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing , pp. 38-38
- Gutiérrez, E.¹ Plata, O.² Zapata, E.L.³

10
- 84947939650
- Improving compiler and run-time support for irregular reductions using local writes
- London, UK, Springer-Verlag
- H. Han and C.-W. Tseng. Improving compiler and run-time support for irregular reductions using local writes. In LCPC '98: Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing, pages 181-196, London, UK, 1999. Springer-Verlag.
- (1999) LCPC '98: Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing , pp. 181-196
- Han, H.¹ Tseng, C.-W.²

11
- 0025550566
- Loop distribution with arbitrary control flow
- November
- K. Kennedy and K. S. McKinley. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing, pages 407-416, November 1990.
- (1990) Proceedings of Supercomputing , pp. 407-416
- Kennedy, K.¹ McKinley, K.S.²

12
- 35448941890
- Optimistic parallelism requires abstractions
- New York, NY, USA, ACM
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 211-222, New York, NY, USA, 2007. ACM.
- (2007) PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation , pp. 211-222
- Kulkarni, M.¹ Pingali, K.² Walter, B.³ Ramanarayanan, G.⁴ Bala, K.⁵ Chew, L.P.⁶

13
- 47349098275
- Minebench: A benchmark suite for data mining workloads
- 0
- R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. Minebench: A benchmark suite for data mining workloads. IEEE Workload Characterization Symposium, 0:182-188, 2006.
- (2006) IEEE Workload Characterization Symposium , pp. 182-188
- Narayanan, R.¹ Ozisikyilmaz, B.² Zambreno, J.³ Memik, G.⁴ Choudhary, A.⁵

14
- 79959468284
- Software thread-level speculation: An optimistic library implementation
- New York, NY, USA, ACM
- C. E. Oancea and A. Mycroft. Software thread-level speculation: an optimistic library implementation. In IWMSE '08: Proceedings of the 1st International Workshop on Multicore Software Engineering, pages 23-32, New York, NY, USA, 2008. ACM.
- (2008) IWMSE '08: Proceedings of the 1st International Workshop on Multicore Software Engineering , pp. 23-32
- Oancea, C.E.¹ Mycroft, A.²

15
- 33749375700
- Automatic thread extraction with decoupled software pipelining
- November
- G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, pages 105-116, November 2005.
- (2005) Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 105-116
- Ottoni, G.¹ Rangan, R.² Stoler, A.³ August, D.I.⁴

16
- 43449113286
- Parallel-stage decoupled software pipelining
- E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the 2008 International Symposium on Code Generation and Optimization, April 2008.
- Proceedings of the 2008 International Symposium on Code Generation and Optimization, April 2008
- Raman, E.¹ Ottoni, G.² Raman, A.³ Bridges, M.⁴ August, D.I.⁵

17
- 0033076827
- The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization
- L. Rauchwerger and D. A. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 10(2):160-180, 1999.
- (1999) IEEE Transactions on Parallel and Distributed Systems , vol.10 , Issue.2 , pp. 160-180
- Rauchwerger, L.¹ Padua, D.A.²

18
- 77954011474
- Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions
- D. E. Singh, M. J. Martin, and F. F. Rivera. Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions. Int. J. Comput. Sci. Eng., 1(1):1-14, 2005.
- (2005) Int. J. Comput. Sci. Eng. , vol.1 , Issue.1 , pp. 1-14
- Singh, D.E.¹ Martin, M.J.² Rivera, F.F.³

19
- 77954008735
- Standard Performance Evaluation Corporation (SPEC). http://www.spec.org.

20
- 33745198176
- The STAMPede approach to thread-level speculation
- February
- J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread-level speculation. ACM Transactions on Computer Systems, 23(3):253-300, February 2005.
- (2005) ACM Transactions on Computer Systems , vol.23 , Issue.3 , pp. 253-300
- Steffan, J.G.¹ Colohan, C.² Zhai, A.³ Mowry, T.C.⁴

21
- 77953977802
- StreamIt benchmarks. http://compiler.lcs.mit.edu/streamit.
- StreamIt Benchmarks

22
- 41349089872
- Speculative decoupled software pipelining
- N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques, September 2007.
- Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques, September 2007
- Vachharajani, N.¹ Rangan, R.² Raman, E.³ Bridges, M.J.⁴ Ottoni, G.⁵ August, D.I.⁶

23
- 57749168614
- Uncovering hidden loop level parallelism in sequential applications
- H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proc. of the 14th International Symposium on High-Performance Computer Architecture, 2008.
- Proc. of the 14th International Symposium on High-Performance Computer Architecture, 2008
- Zhong, H.¹ Mehrara, M.² Lieberman, S.³ Mahlke, S.⁴

24
- 84948955651
- Master/slave speculative parallelization
- November
- C. Zilles and G. Sohi. Master/slave speculative parallelization. In Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture, pages 85-96, November 2002.
- (2002) Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture , pp. 85-96
- Zilles, C.¹ Sohi, G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.