2009, Pages 348-357

Data layout transformation for enhancing data locality on NUCA chip multiprocessors

Author keywords

Data layout optimization; NUCA cache; Polyhedral model

Indexed keywords

ADDRESS SPACE; ARRAY REFERENCES; CHIP MULTIPROCESSOR; COMPILE TIME; DATA ACCESS PATTERNS; DATA LAYOUT OPTIMIZATION; DATA LAYOUTS; DATA LOCALITY; DATA LOCALITY OPTIMIZATION; HOTSPOTS; L2 CACHE; LOCALIZABILITY; MULTI-PROCESSORS; NONLOCAL; POLYHEDRAL MODELS; PROGRAM TRANSFORMATION TECHNIQUES; SIMULATION-BASED; TILED ARCHITECTURE; TILED CMP;

EID: 70449628310     PISSN: 1089795X     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/PACT.2009.36     Document Type: Conference Paper
Times cited: 67

References (42)
  • 1
    • 3042669130
    • IBM POWER5 chip: A dual-core multithreaded processor
    • R. N. Kalla, B. Sinharoy, and J. M. Tendler, "IBM POWER5 chip: A dual-core multithreaded processor," IEEE Micro, vol. 24, no. 2, pp. 40-47, 2004.
    • (2004) IEEE Micro , vol.24 , Issue.2 , pp. 40-47
    • Kalla, R.N.1    Sinharoy, B.2    Tendler, J.M.3
  • 2
    • 20344374162
    • Niagara: A 32-way multithreaded SPARC processor
    • P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way multithreaded sparc processor," IEEE Micro, vol. 25, no. 2, pp. 21-29, 2005.
    • (2005) IEEE Micro , vol.25 , Issue.2 , pp. 21-29
    • Kongetira, P.1    Aingaran, K.2    Olukotun, K.3
  • 3
    • 0036949388
    • An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
    • C. Kim, D. Burger, and S. W. Keckler, "An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches," in ASPLOS'02.
    • ASPLOS'02
    • Kim, C.1    Burger, D.2    Keckler, S.W.3
  • 4
    • 21644472427
    • Managing wire delay in large chip-multiprocessor caches
    • B. M. Beckmann and D. A. Wood, "Managing wire delay in large chip-multiprocessor caches," in MICRO'04.
    • MICRO'04
    • Beckmann, B.M.1    Wood, D.A.2
  • 6
    • 0026933251
    • Some efficient solutions to the affine scheduling problem: I. one-dimensional time
    • P. Feautrier, "Some efficient solutions to the affine scheduling problem: I. one-dimensional time," IJPP, vol. 21, no. 5, pp. 313-348, 1992.
    • (1992) IJPP , vol.21 , Issue.5 , pp. 313-348
    • Feautrier, P.1
  • 7
    • 0032058019
    • Constraint-based array dependence analysis
    • W. Pugh and D. Wonnacott, "Constraint-based array dependence analysis," ACM Trans. Program. Lang. Syst., vol. 20, no. 3, pp. 635-678, 1998.
    • (1998) ACM Trans. Program. Lang. Syst. , vol.20 , Issue.3 , pp. 635-678
    • Pugh, W.1    Wonnacott, D.2
  • 8
    • 0029711429
    • Minimizing communication while preserving parallelism
    • New York, NY, USA: ACM
    • W. Kelly and W. Pugh, "Minimizing communication while preserving parallelism," in ICS '96. New York, NY, USA: ACM, 1996, pp. 52-60.
    • (1996) ICS '96 , pp. 52-60
    • Kelly, W.1    Pugh, W.2
  • 11
    • 57349167317
    • Iterative optimization in the polyhedral model: Part II, multidimensional time
    • Tucson, Arizona, June
    • L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen, "Iterative optimization in the polyhedral model: Part II, multidimensional time," in PLDI'08, Tucson, Arizona, June 2008.
    • (2008) PLDI'08
    • Pouchet, L.-N.1    Bastoul, C.2    Cavazos, J.3    Cohen, A.4
  • 12
    • 57349139452
    • A practical automatic polyhedral parallelizer and locality optimizer
    • New York, NY, USA: ACM
    • U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A practical automatic polyhedral parallelizer and locality optimizer," in PLDI '08. New York, NY, USA: ACM, 2008, pp. 101-113.
    • (2008) PLDI '08 , pp. 101-113
    • Bondhugula, U.1    Hartono, A.2    Ramanujam, J.3    Sadayappan, P.4
  • 14
    • 33646559059
    • Automatic parallelization of loop programs for distributed memory architectures
    • University of Passau, habilitation Thesis. [Online]. Available
    • M. Griebl, Automatic Parallelization of Loop Programs for Distributed Memory Architectures. FMI, University of Passau, 2004, habilitation Thesis. [Online]. Available: http://www.uni-passau.de/~griebl/habilitation.html
    • (2004) FMI
    • Griebl, M.1
  • 15
  • 16
    • 0031622954
    • Data transformations for eliminating conflict misses
    • G. Rivera and C.-W. Tseng, "Data transformations for eliminating conflict misses," in PLDI'98.
    • PLDI'98
    • Rivera, G.1    Tseng, C.-W.2
  • 20
    • 35348900723
    • Virtual hierarchies to support server consolidation
    • M. R. Marty and M. D. Hill, "Virtual hierarchies to support server consolidation," in ISCA'07.
    • ISCA'07
    • Marty, M.R.1    Hill, M.D.2
  • 21
    • 33947595619
    • Accelerator: Using data parallelism to program GPUs for general-purpose uses
    • D. Tarditi, S. Puri, and J. Oglesby, "Accelerator: Using data parallelism to program GPUs for general-purpose uses," in ASPLOS'06.
    • ASPLOS'06
    • Tarditi, D.1    Puri, S.2    Oglesby, J.3
  • 22
    • 0009930394
    • ZPL: A machine independent programming language for parallel computers
    • B. L. Chamberlain, S.-E. Choi, E. C. Lewis, C. Lin, L. Snyder, and W. D. Weathersby, "ZPL: A machine independent programming language for parallel computers," IEEE TSE, vol. 26, no. 3, pp. 197-211, 2000.
    • (2000) IEEE TSE , vol.26 , Issue.3 , pp. 197-211
    • Chamberlain, B.L.1    Choi, S.-E.2    Lewis, E.C.3    Lin, C.4    Snyder, L.5    Weathersby, W.D.6
  • 24
    • 16244422171
    • Interconnect-power dissipation in a microprocessor
    • N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect-power dissipation in a microprocessor," in SLIP'04, 2004.
    • (2004) SLIP'04
    • Magen, N.1    Kolodny, A.2    Weiser, U.3    Shamir, N.4
  • 25
    • 84955452760
    • Dynamic voltage scaling with links for power optimization of interconnection networks
    • L. Shang, L.-S. Peh, and N. K. Jha, "Dynamic voltage scaling with links for power optimization of interconnection networks," in HPCA'03.
    • HPCA'03
    • Shang, L.1    Peh, L.-S.2    Jha, N.K.3
  • 26
    • 33746085616
    • Reducing NoC energy consumption through compiler-directed channel voltage scaling
    • G. Chen, F. Li, M. Kandemir, and M. J. Irwin, "Reducing NoC energy consumption through compiler-directed channel voltage scaling," in PLDI'06.
    • PLDI'06
    • Chen, G.1    Li, F.2    Kandemir, M.3    Irwin, M.J.4
  • 27
    • 35449000082
    • Profile-driven energy reduction in network-on-chips
    • F. Li, G. Chen, M. Kandemir, and I. Kolcu, "Profile-driven energy reduction in network-on-chips," in PLDI'07.
    • PLDI'07
    • Li, F.1    Chen, G.2    Kandemir, M.3    Kolcu, I.4
  • 30
  • 31
    • 70449655268
    • Non-singular data transformations: Definition, validity, applications
    • M. F. P. O'Boyle and P. M. W. Knijnenburg, "Non-singular data transformations: definition, validity, applications," in CPC'96.
    • CPC'96
    • O'Boyle, M.F.P.1    Knijnenburg, P.M.W.2
  • 32
    • 0033077834
    • A linear algebra framework for automatic determination of optimal data layouts
    • M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, "A linear algebra framework for automatic determination of optimal data layouts," IEEE TPDS, vol. 10, no. 2, pp. 115-135, 1999.
    • (1999) IEEE TPDS , vol.10 , Issue.2 , pp. 115-135
    • Kandemir, M.1    Choudhary, A.2    Shenoy, N.3    Banerjee, P.4    Ramanujam, J.5
  • 33
    • 0035439109
    • Static and dynamic locality optimizations using integer linear programming
    • M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, and E. Ayguade, "Static and dynamic locality optimizations using integer linear programming," IEEE TPDS, vol. 12, no. 9, pp. 922-941, 2001.
    • (2001) IEEE TPDS , vol.12 , Issue.9 , pp. 922-941
    • Kandemir, M.1    Banerjee, P.2    Choudhary, A.3    Ramanujam, J.4    Ayguade, E.5
  • 34
    • 70449666072
    • Improving locality using loop and data transformations in an integrated framework
    • M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, "Improving locality using loop and data transformations in an integrated framework," in MICRO'98.
    • MICRO'98
    • Kandemir, M.1    Choudhary, A.2    Ramanujam, J.3    Banerjee, P.4
  • 35
    • 0027311338
    • Automatic array alignment in data-parallel programs
    • New York, NY, USA: ACM Press
    • S. Chatterjee, J. R. Gilbert, R. Schreiber, and S.-H. Teng, "Automatic array alignment in data-parallel programs," in POPL'93. New York, NY, USA: ACM Press, 1993, pp. 16-28.
    • (1993) POPL'93 , pp. 16-28
    • Chatterjee, S.1    Gilbert, J.R.2    Schreiber, R.3    Teng, S.-H.4
  • 37
    • 3042565514
    • Custom data layout for memory parallelism
    • B. So, M. W. Hall, and H. E. Ziegler, "Custom data layout for memory parallelism," in CGO'04.
    • CGO'04
    • So, B.1    Hall, M.W.2    Ziegler, H.E.3
  • 40
    • 0033342448
    • Cache-efficient matrix transposition
    • S. Chatterjee and S. Sen, "Cache-efficient matrix transposition," in HPCA'00.
    • HPCA'00
    • Chatterjee, S.1    Sen, S.2
  • 41
    • 0032067773
    • Maximizing parallelism and minimizing synchronization with affine partitions
    • A. W. Lim and M. S. Lam, "Maximizing parallelism and minimizing synchronization with affine partitions," Parallel Computing, vol. 24, no. 3-4, pp. 445-475, 1998.
    • (1998) Parallel Computing , vol.24 , Issue.3-4 , pp. 445-475
    • Lim, A.W.1    Lam, M.S.2
  • 42
    • 0032662841
    • An affine partitioning algorithm to maximize parallelism and minimize communication
    • A. W. Lim, G. I. Cheong, and M. S. Lam, "An affine partitioning algorithm to maximize parallelism and minimize communication," in ICS'99, 1999, pp. 228-237.
    • (1999) ICS'99 , pp. 228-237
    • Lim, A.W.1    Cheong, G.I.2    Lam, M.S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.