SCOPUS Information Search Platform

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

Volume , Issue , 2009, Pages 348-357

Data layout transformation for enhancing data locality on NUCA chip multiprocessors

(11) Lu, Qingda (a); Alias, Christophe (b); Bondhugula, Uday (a); Henretty, Thomas (a); Krishnamoorthy, Sriram (c); Ramanujam, J (d); Rountev, Atanas (a); Sadayappan, P (a); Chen, Yongjian (e); Ngai, Tin Fook (e); Lin, Haibo (f)

a OHIO STATE UNIVERSITY (United States)

b UNIVERSITÉ DE LYON (France)

c PACIFIC NORTHWEST NATIONAL LABORATORY (United States)

d LOUISIANA STATE UNIVERSITY (United States)

e INTEL CORPORATION (United States)

f IBM CHINA RESEARCH LAB (China)

Author keywords

Data layout optimization; NUCA cache; Polyhedral model

Indexed keywords

ADDRESS SPACE; ARRAY REFERENCES; CHIP MULTIPROCESSOR; COMPILE TIME; DATA ACCESS PATTERNS; DATA LAYOUT OPTIMIZATION; DATA LAYOUTS; DATA LOCALITY; DATA LOCALITY OPTIMIZATION; HOTSPOTS; L2 CACHE; LOCALIZABILITY; MULTI-PROCESSORS; NONLOCAL; POLYHEDRAL MODELS; PROGRAM TRANSFORMATION TECHNIQUES; SIMULATION-BASED; TILED ARCHITECTURE; TILED CMP;

MULTIPROCESSING SYSTEMS; OPTIMIZATION; PARALLEL ARCHITECTURES; SIMULATORS; SYSTEMS ANALYSIS;

MICROPROCESSOR CHIPS;

EID: 70449628310 PISSN: 1089795X EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/PACT.2009.36 Document Type: Conference Paper

Times cited : (67)

References (42)

1
- 3042669130
- IBM POWER5 chip: A dual-core multithreaded processor
- R. N. Kalla, B. Sinharoy, and J. M. Tendler, "IBM POWER5 chip: A dual-core multithreaded processor," IEEE Micro, vol. 24, no. 2, pp. 40-47, 2004.
- (2004) IEEE Micro , vol.24 , Issue.2 , pp. 40-47
- Kalla, R.N.¹ Sinharoy, B.² Tendler, J.M.³

2
- 20344374162
- Niagara: A 32-way multithreaded SPARC processor
- P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way multithreaded SPARC processor," IEEE Micro, vol. 25, no. 2, pp. 21-29, 2005.
- (2005) IEEE Micro , vol.25 , Issue.2 , pp. 21-29
- Kongetira, P.¹ Aingaran, K.² Olukotun, K.³

3
- 0036949388
- An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
- C. Kim, D. Burger, and S. W. Keckler, "An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches," in ASPLOS'02.
- ASPLOS'02
- Kim, C.¹ Burger, D.² Keckler, S.W.³

4
- 21644472427
- Managing wire delay in large chip-multiprocessor caches
- B. M. Beckmann and D. A. Wood, "Managing wire delay in large chip-multiprocessor caches," in MICRO'04.
- MICRO'04
- Beckmann, B.M.¹ Wood, D.A.²

5
- 84956979498
- Lawra: Linear algebra with recursive algorithms
- London, UK: Springer-Verlag
- B. S. Andersen, F. G. Gustavson, A. Karaivanov, M. Marinova, J. Waniewski, and P. Y. Yalamov, "Lawra: Linear algebra with recursive algorithms," in PARA '00. London, UK: Springer-Verlag, 2001, pp. 38-51.
- (2001) PARA '00 , pp. 38-51
- Andersen, B.S.¹ Gustavson, F.G.² Karaivanov, A.³ Marinova, M.⁴ Waniewski, J.⁵ Yalamov, P.Y.⁶

6
- 0026933251
- Some efficient solutions to the affine scheduling problem: I. one-dimensional time
- P. Feautrier, "Some efficient solutions to the affine scheduling problem: I. one-dimensional time," IJPP, vol. 21, no. 5, pp. 313-348, 1992.
- (1992) IJPP , vol.21 , Issue.5 , pp. 313-348
- Feautrier, P.¹

7
- 0032058019
- Constraint-based array dependence analysis
- W. Pugh and D. Wonnacott, "Constraint-based array dependence analysis," ACM Trans. Program. Lang. Syst., vol. 20, no. 3, pp. 635-678, 1998.
- (1998) ACM Trans. Program. Lang. Syst. , vol.20 , Issue.3 , pp. 635-678
- Pugh, W.¹ Wonnacott, D.²

8
- 0029711429
- Minimizing communication while preserving parallelism
- New York, NY, USA: ACM
- W. Kelly and W. Pugh, "Minimizing communication while preserving parallelism," in ICS '96. New York, NY, USA: ACM, 1996, pp. 52-60.
- (1996) ICS '96 , pp. 52-60
- Kelly, W.¹ Pugh, W.²

9
- 0034299275
- Generation of efficient nested loops from polyhedra
- F. Quilleré, S. V. Rajopadhye, and D. Wilde, "Generation of efficient nested loops from polyhedra," Intl. J. of Parallel Programming, vol. 28, no. 5, pp. 469-498, 2000.
- (2000) Intl. J. of Parallel Programming , vol.28 , Issue.5 , pp. 469-498
- Quilleré, F.¹ Rajopadhye, S.V.² Wilde, D.³

10
- 84877711343
- "CLooG: The Chunky Loop Generator," http://www.cloog.org.
- CLooG: The Chunky Loop Generator

11
- 57349167317
- Iterative optimization in the polyhedral model: Part II, multidimensional time
- Tucson, Arizona, June
- L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen, "Iterative optimization in the polyhedral model: Part II, multidimensional time," in PLDI'08, Tucson, Arizona, June 2008.
- (2008) PLDI'08
- Pouchet, L.-N.¹ Bastoul, C.² Cavazos, J.³ Cohen, A.⁴

12
- 57349139452
- A practical automatic polyhedral parallelizer and locality optimizer
- New York, NY, USA: ACM
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A practical automatic polyhedral parallelizer and locality optimizer," in PLDI '08. New York, NY, USA: ACM, 2008, pp. 101-113.
- (2008) PLDI '08 , pp. 101-113
- Bondhugula, U.¹ Hartono, A.² Ramanujam, J.³ Sadayappan, P.⁴

13
- 84869677671
- Version 4.3.2
- "GCC: GCC, the GNU Compiler Collection, Version 4.3.2, " http://gcc.gnu.org.
- GCC: GCC, the GNU Compiler Collection

14
- 33646559059
- Automatic parallelization of loop programs for distributed memory architectures
- University of Passau, Habilitation thesis. [Online]. Available
- M. Griebl, Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004, Habilitation thesis. [Online]. Available: http://www.uni-passau.de/~griebl/habilitation.html
- (2004) FMI
- Griebl, M.¹

15
- 0029181140
- Data and computation transformations for multiprocessors
- J. M. Anderson, S. P. Amarasinghe, and M. S. Lam, "Data and computation transformations for multiprocessors," in PPOPP'95.
- PPOPP'95
- Anderson, J.M.¹ Amarasinghe, S.P.² Lam, M.S.³

16
- 0031622954
- Data transformations for eliminating conflict misses
- G. Rivera and C.-W. Tseng, "Data transformations for eliminating conflict misses," in PLDI'98.
- PLDI'98
- Rivera, G.¹ Tseng, C.-W.²

17
- 0042193410
- Stanford University, PhD dissertation. [Online]. Available
- S. Amarasinghe, Parallelizing Compiler Techniques Based on Linear Inequalities. Stanford University, 1997, PhD dissertation. [Online]. Available: http://suif.stanford.edu/papers/amarasinghe97.ps
- (1997) Parallelizing Compiler Techniques Based on Linear Inequalities
- Amarasinghe, S.¹

18
- 43449109107
- Virtutech AB, "Simics full system simulator," http://www.simics.com.
- Simics Full System Simulator
- Virtutech, A.B.¹

19
- 33748870886
- Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
- M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 92-99, 2005.
- (2005) SIGARCH Comput. Archit. News , vol.33 , Issue.4 , pp. 92-99
- Martin, M.¹ Sorin, D.² Beckmann, B.³ Marty, M.⁴ Xu, M.⁵ Alameldeen, A.⁶ Moore, K.⁷ Hill, M.⁸ Wood, D.⁹

20
- 35348900723
- Virtual hierarchies to support server consolidation
- M. R. Marty and M. D. Hill, "Virtual hierarchies to support server consolidation," in ISCA'07.
- ISCA'07
- Marty, M.R.¹ Hill, M.D.²

21
- 33947595619
- Accelerator: Using data parallelism to program GPUs for general-purpose uses
- D. Tarditi, S. Puri, and J. Oglesby, "Accelerator: using data parallelism to program GPUs for general-purpose uses," in ASPLOS'06.
- ASPLOS'06
- Tarditi, D.¹ Puri, S.² Oglesby, J.³

22
- 0009930394
- ZPL: A machine independent programming language for parallel computers
- B. L. Chamberlain, S.-E. Choi, E. C. Lewis, C. Lin, L. Snyder, and W. D. Weathersby, "ZPL: A machine independent programming language for parallel computers," IEEE TSE, vol. 26, no. 3, pp. 197-211, 2000.
- (2000) IEEE TSE , vol.26 , Issue.3 , pp. 197-211
- Chamberlain, B.L.¹ Choi, S.-E.² Lewis, E.C.³ Lin, C.⁴ Snyder, L.⁵ Weathersby, W.D.⁶

23
- 84872231580
- "LooPo - Loop parallelization in the polytope model," http://www.fmi.uni-passau.de/loopo.
- LooPo - Loop Parallelization in the Polytope Model

24
- 16244422171
- Interconnect-power dissipation in a microprocessor
- N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect-power dissipation in a microprocessor," in SLIP'04, 2004.
- (2004) SLIP'04
- Magen, N.¹ Kolodny, A.² Weiser, U.³ Shamir, N.⁴

25
- 84955452760
- Dynamic voltage scaling with links for power optimization of interconnection networks
- L. Shang, L.-S. Peh, and N. K. Jha, "Dynamic voltage scaling with links for power optimization of interconnection networks," in HPCA'03.
- HPCA'03
- Shang, L.¹ Peh, L.-S.² Jha, N.K.³

26
- 33746085616
- Reducing NoC energy consumption through compiler-directed channel voltage scaling
- G. Chen, F. Li, M. Kandemir, and M. J. Irwin, "Reducing NoC energy consumption through compiler-directed channel voltage scaling," in PLDI'06.
- PLDI'06
- Chen, G.¹ Li, F.² Kandemir, M.³ Irwin, M.J.⁴

27
- 35449000082
- Profile-driven energy reduction in network-on-chips
- F. Li, G. Chen, M. Kandemir, and I. Kolcu, "Profile-driven energy reduction in network-on-chips," in PLDI'07.
- PLDI'07
- Li, F.¹ Chen, G.² Kandemir, M.³ Kolcu, I.⁴

28
- 0033700063
- A case for user-level dynamic page migration
- D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé, "A case for user-level dynamic page migration," in ICS'00.
- ICS'00
- Nikolopoulos, D.S.¹ Papatheodorou, T.S.² Polychronopoulos, C.D.³ Labarta, J.⁴ Ayguadé, E.⁵

29
- 84989342078
- Scheduling and page migration for multiprocessor compute servers
- R. Chandra, S. Devine, B. Verghese, A. Gupta, and M. Rosenblum, "Scheduling and page migration for multiprocessor compute servers," in ASPLOS'94.
- ASPLOS'94
- Chandra, R.¹ Devine, S.² Verghese, B.³ Gupta, A.⁴ Rosenblum, M.⁵

30
- 0003582055
- Dept. Computer Science, University of Washington, Seattle, WA, Tech. Rep. TR-95-09-01
- S. Leung and J. Zahorjan, "Optimizing data locality by array restructuring," Dept. Computer Science, University of Washington, Seattle, WA, Tech. Rep. TR-95-09-01, 1995.
- (1995) Optimizing Data Locality by Array Restructuring
- Leung, S.¹ Zahorjan, J.²

31
- 70449655268
- Non-singular data transformations: Definition, validity, applications
- M. F. P. O'Boyle and P. M. W. Knijnenburg, "Non-singular data transformations: definition, validity, applications," in CPC'96.
- CPC'96
- O'Boyle, M.F.P.¹ Knijnenburg, P.M.W.²

32
- 0033077834
- A linear algebra framework for automatic determination of optimal data layouts
- M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A linear algebra framework for automatic determination of optimal data layouts," IEEE TPDS, vol. 10, no. 2, pp. 115-135, 1999.
- (1999) IEEE TPDS , vol.10 , Issue.2 , pp. 115-135
- Kandemir, M.¹ Choudhary, A.² Shenoy, N.³ Banerjee, P.⁴ Ramanujam, J.⁵

33
- 0035439109
- Static and dynamic locality optimizations using integer linear programming
- M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "Static and dynamic locality optimizations using integer linear programming," IEEE TPDS, vol. 12, no. 9, pp. 922-941, 2001.
- (2001) IEEE TPDS , vol.12 , Issue.9 , pp. 922-941
- Kandemir, M.¹ Banerjee, P.² Choudhary, A.³ Ramanujam, J.⁴ Ayguade, E.⁵

34
- 70449666072
- Improving locality using loop and data transformations in an integrated framework
- M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in MICRO'98.
- MICRO'98
- Kandemir, M.¹ Choudhary, A.² Ramanujam, J.³ Banerjee, P.⁴

35
- 0027311338
- Automatic array alignment in data-parallel programs
- New York, NY, USA: ACM Press
- S. Chatterjee, J. R. Gilbert, R. Schreiber, and S.-H. Teng, "Automatic array alignment in data-parallel programs," in POPL'93. New York, NY, USA: ACM Press, 1993, pp. 16-28.
- (1993) POPL'93 , pp. 16-28
- Chatterjee, S.¹ Gilbert, J.R.² Schreiber, R.³ Teng, S.-H.⁴

36
- 0032630442
- Maps: A compiler-managed memory system for Raw machines
- R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal, "Maps: a compiler-managed memory system for Raw machines," in ISCA'99.
- ISCA'99
- Barua, R.¹ Lee, W.² Amarasinghe, S.³ Agarwal, A.⁴

37
- 3042565514
- Custom data layout for memory parallelism
- B. So, M. W. Hall, and H. E. Ziegler, "Custom data layout for memory parallelism," in CGO'04.
- CGO'04
- So, B.¹ Hall, M.W.² Ziegler, H.E.³

38
- 85088084665
- Recursive array layouts and fast parallel matrix multiplication
- S. Chatterjee, A. R. Lebeck, P. K. Patnala, and M. Thottethodi, "Recursive array layouts and fast parallel matrix multiplication," in SPAA'99.
- SPAA'99
- Chatterjee, S.¹ Lebeck, A.R.² Patnala, P.K.³ Thottethodi, M.⁴

39
- 85088332028
- Nonlinear array layouts for hierarchical memory systems
- S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi, "Nonlinear array layouts for hierarchical memory systems," in ICS'99.
- ICS'99
- Chatterjee, S.¹ Jain, V.V.² Lebeck, A.R.³ Mundhra, S.⁴ Thottethodi, M.⁵

40
- 0033342448
- Cache-efficient matrix transposition
- S. Chatterjee and S. Sen, "Cache-efficient matrix transposition," in HPCA'00.
- HPCA'00
- Chatterjee, S.¹ Sen, S.²

41
- 0032067773
- Maximizing parallelism and minimizing synchronization with affine partitions
- A. W. Lim and M. S. Lam, "Maximizing parallelism and minimizing synchronization with affine partitions," Parallel Computing, vol. 24, no. 3-4, pp. 445-475, 1998.
- (1998) Parallel Computing , vol.24 , Issue.3-4 , pp. 445-475
- Lim, A.W.¹ Lam, M.S.²

42
- 0032662841
- An affine partitioning algorithm to maximize parallelism and minimize communication
- A. W. Lim, G. I. Cheong, and M. S. Lam, "An affine partitioning algorithm to maximize parallelism and minimize communication," in ICS'99, 1999, pp. 228-237.
- (1999) ICS'99 , pp. 228-237
- Lim, A.W.¹ Cheong, G.I.² Lam, M.S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.