SCOPUS Information Retrieval Platform

Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003

Volume , Issue , 2003, Pages

Hierarchical clustered register file organization for VLIW processors

(4) Zalamea, Javier (a); Llosa, Josep (a); Ayguadé, Eduard (a); Valero, Mateo (a)

a UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

Author keywords

[No Author keywords available]

Indexed keywords

DISTRIBUTED PARAMETER NETWORKS; FILE ORGANIZATION; PARALLEL PROCESSING SYSTEMS; SCHEDULING;

CLUSTER SELECTION; COMMUNICATION OPERATION; DESIGN EXPLORATION; INSTRUCTION SCHEDULING; INTERCLUSTER COMMUNICATION; MICROPROCESSOR DESIGNS; REGISTER ALLOCATION; REGISTER FILE ORGANIZATIONS;

VERY LONG INSTRUCTION WORD ARCHITECTURE;

EID: 84947286374 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2003.1213178 Document Type: Conference Paper

Times cited : (14)

References (38)

1
- 0020915645
- Conversion of control dependence to data dependence
- January
- J. Allen, K. Kennedy, and J. Warren. Conversion of control dependence to data dependence. In Proc. of the 10th annual Symposium on Principles of Programming Languages, January 1983.
- (1983) Proc. of the 10th Annual Symposium on Principles of Programming Languages
- Allen, J.¹ Kennedy, K.² Warren, J.³

2
- 3242744876
- ICTINEO: A tool for research on ilp
- E. Ayguadé, C. Barrado, J. Labarta, J. Llosa, D. López, S. Moreno, D. Padua, E. Riera, and M. Valero. ICTINEO: A tool for research on ilp. In Proc. of the Supercomputing'96 (SC'96), Research Exhibit "Polaris at Work", 1996.
- (1996) Proc. of the Supercomputing'96 (SC'96), Research Exhibit "Polaris at Work"
- Ayguadé, E.¹ Barrado, C.² Labarta, J.³ Llosa, J.⁴ López, D.⁵ Moreno, S.⁶ Padua, D.⁷ Riera, E.⁸ Valero, M.⁹

3
- 0003477925
- The perfect club benchmarks: Effective performance evaluation of supercomputers
- November
- M. Berry, D. Chen, P. Koss, and D. Kuck. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Technical Report 827, Center for Supercomputing Research and Development, November 1988.
- (1988) Technical Report 827, Center for Supercomputing Research and Development
- Berry, M.¹ Chen, D.² Koss, P.³ Kuck, D.⁴

4
- 0026138044
- Software prefetching
- April
- D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proc. of the Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 40-52, April 1991.
- (1991) Proc. of the Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV) , pp. 40-52
- Callahan, D.¹ Kennedy, K.² Porterfield, A.³

5
- 0026993183
- Partitioned register files for VLIWs: A preliminary analysis of tradeoffs
- December
- A. Capitanio, N. Dutt, and A. Nicolau. Partitioned register files for VLIWs: A preliminary analysis of tradeoffs. In Proc. of the 25th Int. Symp. on Microarchitecture (MICRO-25), pages 292-300, December 1992.
- (1992) Proc. of the 25th Int. Symp. on Microarchitecture (MICRO-25) , pp. 292-300
- Capitanio, A.¹ Dutt, N.² Nicolau, A.³

6
- 0026157612
- IMPACT: An architectural framework for multiple-instruction-issue processors
- P. Chang, S. Mahlke, W. Chen, N. Warter, and W. Hwu. IMPACT: An architectural framework for multiple-instruction-issue processors. In Proc. of the 18th Int. Symp. on Computer Architecture, pages 266-275, 1991.
- (1991) Proc. of the 18th Int. Symp. on Computer Architecture , pp. 266-275
- Chang, P.¹ Mahlke, S.² Chen, W.³ Warter, N.⁴ Hwu, W.⁵

7
- 0019610938
- An approach to scientific array processing: The architectural design of the AP120B/FPS-164 family
- A. Charlesworth. An approach to scientific array processing: The architectural design of the AP120B/FPS-164 family. Computer, 14(9):18-27, 1981.
- (1981) Computer , vol.14 , Issue.9 , pp. 18-27
- Charlesworth, A.¹

8
- 0035176849
- A unified modulo scheduling and register allocation technique for clustered processors
- September
- J. M. Codina, J. Sánchez, and A. González. A unified modulo scheduling and register allocation technique for clustered processors. In Proc. of the Int. Conf. on Parallel Architecture and Compilation Techniques (PACT'01), pages 175-184, September 2001.
- (2001) Proc. of the Int. Conf. on Parallel Architecture and Compilation Techniques (PACT'01) , pp. 175-184
- Codina, J.M.¹ Sánchez, J.² González, A.³

9
- 0033716803
- Multiple-banked register file architectures
- June
- J. Cruz, A. Gonzalez, M. Valero, and N. Topham. Multiple-banked register file architectures. In Proc. , 27th Annual Internat. Symp. on Computer Architecture, June 2000.
- (2000) Proc. , 27th Annual Internat. Symp. on Computer Architecture
- Cruz, J.¹ Gonzalez, A.² Valero, M.³ Topham, N.⁴

10
- 0027590187
- Compiling for the cydra 5
- May
- J. Dehnert and R. Towle. Compiling for the Cydra 5. The Journal of Supercomputing, 7(1/2):181-228, May 1993.
- (1993) The Journal of Supercomputing , vol.7 , Issue.1-2 , pp. 181-228
- Dehnert, J.¹ Towle, R.²

11
- 0029487619
- Stage scheduling: A technique to reduce the register requirements of a modulo schedule
- November
- A. Eichenberger and E. Davidson. Stage scheduling: A technique to reduce the register requirements of a modulo schedule. In Proc. of the 28th Int. Symp. on Microarchitecture (MICRO-28), pages 338-349, November 1995.
- (1995) Proc. of the 28th Int. Symp. on Microarchitecture (MICRO-28) , pp. 338-349
- Eichenberger, A.¹ Davidson, E.²

12
- 0033703885
- Lx: A technology platform for customizable VLIW embedded processing
- June
- P. Faraboschi, G. Brown, G. Desoli, and F. Homewood. Lx: A technology platform for customizable VLIW embedded processing. In Proc. of the 27th Int. Symp. on Computer Architecture, pages 203-213, June 2000.
- (2000) Proc. of the 27th Int. Symp. on Computer Architecture , pp. 203-213
- Faraboschi, P.¹ Brown, G.² Desoli, G.³ Homewood, F.⁴

13
- 0031650008
- Partitioned schedules for clustered vliw architectures
- March
- M. Fernandes, J. Llosa, and N. Topham. Partitioned schedules for clustered vliw architectures. In Proc. , 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP'1998), pages 386-391, March 1998.
- (1998) Proc. , 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP'1998) , pp. 386-391
- Fernandes, M.¹ Llosa, J.² Topham, N.³

14
- 0020632876
- Very long instruction word architectures and the ELI-512
- June
- J. Fisher. Very long instruction word architectures and the ELI-512. In Proc. of the Tenth Annual Internat. Symp. on Computer Architecture, pages 140-150, June 1983.
- (1983) Proc. of the Tenth Annual Internat. Symp. on Computer Architecture , pp. 140-150
- Fisher, J.¹

15
- 0033888003
- The tigersharc DSP architecture
- January-February
- J. Fridman and Z. Greenfield. The tigersharc DSP architecture. IEEE Micro, pages 66-76, January-February 2000.
- (2000) IEEE Micro , pp. 66-76
- Fridman, J.¹ Greenfield, Z.²

16
- 0003318618
- MAP1000 unfolds at Equator
- December
- P. N. Glaskowsky. MAP1000 unfolds at Equator. Microprocessor Report, 12(16), December 1998.
- (1998) Microprocessor Report , vol.12 , Issue.16
- Glaskowsky, P.N.¹

17
- 0036287089
- The optimal useful logic depth per pipeline stage is 6-8 FO4
- May
- M. Hrishikesh, N. P. Jouppi, K. I. Farkas, D. Burger, S. W. Keckler, and P. Shivakumar. The optimal useful logic depth per pipeline stage is 6-8 FO4. In Proc. , 29th Annual Internat. Symp. on Computer Architecture, pages 14-24, May 2002.
- (2002) Proc. , 29th Annual Internat. Symp. on Computer Architecture , pp. 14-24
- Hrishikesh, M.¹ Jouppi, N.P.² Farkas, K.I.³ Burger, D.⁴ Keckler, S.W.⁵ Shivakumar, P.⁶

18
- 0027870809
- Lifetime-sensitive modulo scheduling
- R. Huff. Lifetime-sensitive modulo scheduling. In Proc. of the 6th Conference on Programming Language, Design and Implementation, pages 258-267, 1993.
- (1993) Proc. of the 6th Conference on Programming Language, Design and Implementation , pp. 258-267
- Huff, R.¹

19
- 0027595384
- The superblock: An effective technique for VLIW and superscalar compilation
- W. Hwu, S. Mahlke, W. Chen, P. Chang, N. Warter, R. Bringmann, R. Ouellette, R. Hank, T. Kiyohara, G. Haab, J. Holm, and D. Lavery. The superblock: An effective technique for VLIW and superscalar compilation. Journal of Supercomputing, 7(1/2):229-248, 1993.
- (1993) Journal of Supercomputing , vol.7 , Issue.1-2 , pp. 229-248
- Hwu, W.¹ Mahlke, S.² Chen, W.³ Chang, P.⁴ Warter, N.⁵ Bringmann, R.⁶ Ouellette, R.⁷ Hank, R.⁸ Kiyohara, T.⁹ Haab, G.¹⁰ Holm, J.¹¹ Lavery, D.¹²

20
- 0032639289
- The alpha 21264 microprocessor
- March
- R. Kessler. The Alpha 21264 microprocessor. IEEE Micro, 19(2):24-36, March 1999.
- (1999) IEEE Micro , vol.19 , Issue.2 , pp. 24-36
- Kessler, R.¹

21
- 0042650298
- Software pipelining: An effective scheduling technique for VLIW machines
- June
- M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the SIGPLAN'88 Conference on Programming Language Design and Implementation, pages 318-328, June 1988.
- (1988) Proceedings of the SIGPLAN'88 Conference on Programming Language Design and Implementation , pp. 318-328
- Lam, M.¹

22
- 0010311891
- Non-consistent dual register files to reduce register pressure
- January
- J. Llosa, M. Valero, and E. Ayguadé. Non-consistent dual register files to reduce register pressure. In 1st Symposium on High Performance Computer Architecture, pages 22-31, January 1995.
- (1995) 1st Symposium on High Performance Computer Architecture , pp. 22-31
- Llosa, J.¹ Valero, M.² Ayguadé, E.³

23
- 0029488251
- Hypernode reduction modulo scheduling
- November
- J. Llosa, M. Valero, E. Ayguadé, and A. González. Hypernode reduction modulo scheduling. In Proc. of the 28th Int. Symp. on Microarchitecture (MICRO-28), pages 350-360, November 1995.
- (1995) Proc. of the 28th Int. Symp. on Microarchitecture (MICRO-28) , pp. 350-360
- Llosa, J.¹ Valero, M.² Ayguadé, E.³ González, A.⁴

24
- 2342562830
- Using Sacks to organize register files in VLIW machines
- September
- J. Llosa, M. Valero, J. Fortes, and E. Ayguadé. Using Sacks to organize register files in VLIW machines. In CONPAR 94-VAPP VI, September 1994.
- (1994) CONPAR 94-VAPP VI
- Llosa, J.¹ Valero, M.² Fortes, J.³ Ayguadé, E.⁴

25
- 0032320834
- Effective cluster assignment for modulo scheduling
- November
- E. Nystrom and E. Eichenberger. Effective cluster assignment for modulo scheduling. In Proc. of the 31st. Int. Symp. on Microarchitecture (MICRO-31), pages 103-114, November 1998.
- (1998) Proc. of the 31st. Int. Symp. on Microarchitecture (MICRO-31) , pp. 103-114
- Nystrom, E.¹ Eichenberger, E.²

26
- 0002017307
- Instruction-level parallel processing: History, overview and perspective
- July
- B. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing, 7(1/2):9-50, July 1993.
- (1993) Journal of Supercomputing , vol.7 , Issue.1-2 , pp. 9-50
- Rau, B.¹ Fisher, J.A.²

27
- 0028768013
- Iterative modulo scheduling: An algorithm for software pipelining loops
- November
- B. R. Rau. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proc. of the 27th Int. Symp. on Microarchitecture (MICRO-27), pages 63-74, November 1994.
- (1994) Proc. of the 27th Int. Symp. on Microarchitecture (MICRO-27) , pp. 63-74
- Rau, B.R.¹

28
- 0034581535
- Register organization for media processing
- January
- S. Rixner, W. Dally, B. Khailany, P. Mattson, U. Kapasi, and J. Owens. Register organization for media processing. In Proc. , 6th High-Performance Computer Architecture (HPCA-6), pages 375-386, January 2000.
- (2000) Proc. , 6th High-Performance Computer Architecture (HPCA-6) , pp. 375-386
- Rixner, S.¹ Dally, W.² Khailany, B.³ Mattson, P.⁴ Kapasi, U.⁵ Owens, J.⁶

29
- 0017922490
- CRAY-1 computer system
- January
- R. Russell. CRAY-1 computer system. In Communications of the ACM, vol 21, pages 63-72, January 1978.
- (1978) Communications of the ACM , vol.21 , pp. 63-72
- Russell, R.¹

30
- 0031333911
- Cache sensitive modulo scheduling
- December
- J. Sánchez and A. González. Cache sensitive modulo scheduling. In Proc. of the 30th Int. Symp. on Microarchitecture (MICRO-30), pages 338-348, December 1997.
- (1997) Proc. of the 30th Int. Symp. on Microarchitecture (MICRO-30) , pp. 338-348
- Sánchez, J.¹ González, A.²

31
- 84949503766
- The effectiveness of loop unrolling for modulo scheduling in clustered vliw architectures
- August
- J. Sánchez and A. González. The effectiveness of loop unrolling for modulo scheduling in clustered vliw architectures. In Proc. of the International Conference on Parallel Processing (ICPP'2000), pages 555-562, August 2000.
- (2000) Proc. of the International Conference on Parallel Processing (ICPP'2000) , pp. 555-562
- Sánchez, J.¹ González, A.²

32
- 0003450887
- CACTI 3.0: An integrated cache timing, power and area model
- August
- P. Shivakumar and N. P. Jouppi. CACTI 3.0: An integrated cache timing, power and area model. Technical Report 2001/2, Compaq Computer Corporation, August 2001.
- (2001) Technical Report 2001/2, Compaq Computer Corporation
- Shivakumar, P.¹ Jouppi, N.P.²

33
- 85009790881
- Hierarchical registers for scientific computers
- July
- J. Swensen and Y. Patt. Hierarchical registers for scientific computers. In International Conference on Supercomputing, pages 346-353, July 1988.
- (1988) International Conference on Supercomputing , pp. 346-353
- Swensen, J.¹ Patt, Y.²

34
- 0003759409
- Texas Instruments Inc
- Texas Instruments Inc. TMS320C62x/67x CPU and Instruction Set Reference Guide. 1998.
- (1998) TMS320C62x/67x CPU and Instruction Set Reference Guide

35
- 2342481489
- POWER2: Next generation of the RISC system/6000 family
- S. White and S. Dhawan. POWER2: Next generation of the RISC System/6000 family. In IBM RISC System/6000 Technology: Volume II. IBM Corporation, 1993.
- (1993) IBM RISC System/6000 Technology: Volume II. IBM Corporation
- White, S.¹ Dhawan, S.²

36
- 0034462834
- Two-level hierarchical register file organization for vliw processors
- December
- J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero. Two-level hierarchical register file organization for vliw processors. In Proc. of the 33rd Int. Symp. on Microarchitecture (MICRO-33), pages 137-146, December 2000.
- (2000) Proc. of the 33rd Int. Symp. on Microarchitecture (MICRO-33) , pp. 137-146
- Zalamea, J.¹ Llosa, J.² Ayguadé, E.³ Valero, M.⁴

37
- 0035691538
- Modulo scheduling with integrated register spilling for clustered VLIW architectures
- December
- J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero. Modulo scheduling with integrated register spilling for clustered VLIW architectures. In Proc. of the 34th Int. Symp. on Microarchitecture (MICRO-34), pages 160-169, December 2001.
- (2001) Proc. of the 34th Int. Symp. on Microarchitecture (MICRO-34) , pp. 160-169
- Zalamea, J.¹ Llosa, J.² Ayguadé, E.³ Valero, M.⁴

38
- 0007993334
- MIRS: Modulo scheduling with integrated register spilling
- August
- J. Zalamea, J. Llosa, E. Ayguadé, and M. Valero. MIRS: Modulo scheduling with integrated register spilling. In Proc. of the 14th Workshop on Languages and Compilers for Parallel Computing (LCPC'2001), August 2001.
- (2001) Proc. of the 14th Workshop on Languages and Compilers for Parallel Computing (LCPC'2001)
- Zalamea, J.¹ Llosa, J.² Ayguadé, E.³ Valero, M.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.