SCOPUS Information Search Platform

Proceedings of the Annual International Symposium on Microarchitecture, MICRO

2005, Pages 120-129

Exploiting vector parallelism in software pipelined loops

Authors (3): Larsen, Samuel (a); Rabbah, Rodric (a); Amarasinghe, Saman (a)

(a) Massachusetts Institute of Technology (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COST ANALYSIS; DATA LEVEL PARALLELISM; SOFTWARE PIPELINED LOOPS; VECTORIZATION

BENCHMARKING; COMPUTATIONAL METHODS; COMPUTER HARDWARE; EMBEDDED SYSTEMS; OPTIMIZATION; PIPELINE PROCESSING SYSTEMS; PROGRAM COMPILERS; PROGRAM PROCESSORS; SUPERCOMPUTERS

PARALLEL PROCESSING SYSTEMS

EID: 33749373820 PISSN: 10724451 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/MICRO.2005.20 Document Type: Conference Paper

Times cited: 18

References (43)

1
- Codeplay VectorC. http://www.codeplay.com.

2
- IBM XL C/C++ and Fortran compilers. http://www-306.ibm.com/software/awdtools/xlcpp/.

3
- Trimaran Research Infrastructure. http://www.trimaran.org.

4
- VAST-C/AltiVec. http://www.crescentbaysoftware.com.

5
- A. Aletà, J. M. Codina, J. Sánchez, and A. Gonzalez. Graph-Partitioning Based Instruction Scheduling for Clustered Processors. In Proceedings of the 34th Annual International Symposium on Microarchitecture, pages 150-159, Austin, TX, December 2001.

6
- R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco, California, 2001.

7
- A. J. Bik. The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance. Intel Press, Hillsboro, OR, 2004.

8
- J. M. Codina, J. Sánchez, and A. Gonzalez. A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors. In Proceedings of the 10th International Conference on Parallel Architectures and Compilation Techniques, pages 175-184, Barcelona, Spain, September 2001.

9
- A. Darte. On the Complexity of Loop Fusion. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, pages 149-157, Newport Beach, CA, October 1999.

10
- D. J. DeVries. A Vectorizing SUIF Compiler: Implementation and Performance. Master's thesis, University of Toronto, June 1997.

11
- K. Diefendorff, P. K. Dubey, R. Hochsprung, and H. Scales. AltiVec Extension to PowerPC Accelerates Media Processing. IEEE Micro, 20(2):85-95, March 2000.

12
- A. E. Eichenberger and E. S. Davidson. Stage Scheduling: A Technique to Reduce the Register Requirements of a Modulo Schedule. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 180-191, Ann Arbor, MI, November 1995.

13
- A. E. Eichenberger, P. Wu, and K. O'Brien. Vectorization for SIMD Architectures with Alignment Constraints. In Proceedings of the SIGPLAN '04 Conference on Programming Language Design and Implementation, pages 82-93, Washington, DC, June 2004.

14
- J. Fridman and Z. Greenfield. The TigerSHARC DSP Architecture. IEEE Micro, 20(1):66-76, January 2000.

15
- R. A. Huff. Lifetime-Sensitive Modulo Scheduling. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 258-267, Albuquerque, NM, June 1993.

16
- B. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. Bell System Technical Journal, 49:291-307, February 1970.

17
- A. Krall and S. Lelait. Compilation Techniques for Multimedia Processors. International Journal of Parallel Programming, 28(4):347-361, August 2000.

18
- A. Kudriavtsev and P. Kogge. Generation of Permutations for SIMD Processors. In Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pages 147-156, Chicago, IL, June 2005.

19
- M. Lam. Software Pipelining: An Effective Scheduling Technique for VLIW Machines. In Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, pages 318-328, Atlanta, GA, June 1988.

20
- S. Larsen and S. Amarasinghe. Exploiting Superword Level Parallelism with Multimedia Instruction Sets. In Proceedings of the SIGPLAN '00 Conference on Programming Language Design and Implementation, pages 145-156, Vancouver, BC, June 2000.

21
- S. Larsen, E. Witchel, and S. Amarasinghe. Increasing and Detecting Memory Address Congruence. In Proceedings of the 11th International Conference on Parallel Architectures and Compilation Techniques, pages 18-29, Charlottesville, VA, September 2002.

22
- D. M. Lavery and W.-m. W. Hwu. Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 126-137, Paris, France, December 1996.

23
- R. Lee. Subword Parallelism with MAX-2. IEEE Micro, 16(4):51-59, August 1996.

24
- C. McNairy and D. Soltis. Itanium 2 Processor Microarchitecture. IEEE Micro, 23(2):44-55, March 2003.

25
- D. Naishlos. Autovectorization in GCC. In Proceedings of the 2004 GCC Developers Summit, pages 105-118, 2004.

26
- D. Naishlos, M. Biberstein, S. Ben-David, and A. Zaks. Vectorizing for a SIMdD DSP Architecture. In Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pages 2-11, San Jose, CA, October 2003.

27
- E. Nystrom and A. E. Eichenberger. Effective Cluster Assignment for Modulo Scheduling. In Proceedings of the 31st Annual International Symposium on Microarchitecture, pages 103-114, Dallas, TX, December 1998.

28
- I. Pryanishnikov, A. Krall, and N. Horspool. Pointer Alignment Analysis for Processors with SIMD Instructions. In Proceedings of the 5th Workshop on Media and Streaming Processors, pages 50-57, San Diego, CA, December 2003.

29
- S. K. Raman, V. Pentkovski, and J. Keshava. Implementing Streaming SIMD Extensions on the Pentium III Processor. IEEE Micro, 20(4):47-57, July 2000.

30
- B. Rau, M. Lee, P. Tirumalai, and M. Schlansker. Register Allocation for Software Pipelined Loops. In Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 283-299, San Francisco, CA, June 1992.

31
- B. R. Rau. Iterative Modulo Scheduling. Technical Report HPL-94-115, Hewlett Packard Company, November 1995.

32
- B. R. Rau, M. S. Schlansker, and P. Tirumalai. Code Generation Schema for Modulo Scheduled Loops. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 158-169, Portland, OR, December 1992.

33
- J. Shin, M. Hall, and J. Chame. Superword-Level Parallelism in the Presence of Control Flow. In Proceedings of the International Symposium on Code Generation and Optimization, pages 165-175, San Jose, CA, March 2005.

34
- J. Shin, J. Chame, and M. Hall. Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architecture. In Proceedings of the 11th International Conference on Parallel Architectures and Compilation Techniques, pages 45-55, Charlottesville, VA, September 2002.

35
- N. Sreraman and R. Govindarajan. A Vectorizing Compiler for Multimedia Extensions. International Journal of Parallel Programming, 28(4):363-400, August 2000.

36
- R. E. Tarjan. Depth First Search and Linear Graph Algorithms. SIAM Journal on Computing, 1(2):146-160, June 1972.

37
- M. Tremblay, M. O'Connor, V. Narayanan, and L. He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10-20, August 1996.

38
- R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson, S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. ACM SIGPLAN Notices, 29(12):31-37, December 1994.

39
- M. J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, Redwood City, California, 1996.

40
- P. Wu, A. E. Eichenberger, and A. Wang. Efficient SIMD Code Generation for Runtime Alignment and Length Conversion. In Proceedings of the International Symposium on Code Generation and Optimization, pages 153-164, San Jose, CA, March 2005.

41
- P. Wu, A. E. Eichenberger, A. Wang, and P. Zhao. An Integrated Simdization Framework Using Virtual Vectors. In Proceedings of the 19th ACM International Conference on Supercomputing, pages 169-178, Cambridge, MA, June 2005.

42
- J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero. Modulo Scheduling with Integrated Register Spilling for Clustered VLIW Architectures. In Proceedings of the 34th Annual International Symposium on Microarchitecture, pages 160-169, Austin, TX, December 2001.

43
- Y. Zhao and K. Kennedy. Scalarization on Short Vector Machines. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 187-196, Austin, TX, March 2005.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.