SCOPUS Information Retrieval Platform

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

Volume , Issue , 2013, Pages 341-351

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

(3) Govindaraju, Venkatraman (a); Nowatzki, Tony (a); Sankaralingam, Karthikeyan (a)

(a) University of Wisconsin-Madison (United States)

Author keywords

Accelerators; Access Execute Program Dependence Graph; DySER; SIMD; Vectorization

Indexed keywords

DATA-LEVEL PARALLELISM; DYSER; MICRO ARCHITECTURES; MODERN MICROPROCESSOR; PROGRAM DEPENDENCE GRAPH; SHORT VECTORS; SIMD; VECTORIZATION;

COMPUTER ARCHITECTURE; PARALLEL ARCHITECTURES; PARTICLE ACCELERATORS;

PROGRAM COMPILERS;

EID: 84887502088 PISSN: 1089795X EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/PACT.2013.6618830 Document Type: Conference Paper

Times cited: (25)

References (38)

1
- 84871976440
- "The gem5 simulator system, http://www.m5sim.org."
- The gem5 Simulator System

2
- 84887480470
- "Slicer-compiler for dyser. http://research.cs.wisc.edu/veritcal/dysercompiler."
- Slicer-compiler for Dyser

3
- 0000690274
- Conversion of control dependence to data dependence
- J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren, "Conversion of control dependence to data dependence," in POPL'83.
- POPL'83
- Allen, J.R.¹ Kennedy, K.² Porterfield, C.³ Warren, J.⁴

4
- 0023438847
- Automatic translation of fortran programs to vector form
- R. Allen and K. Kennedy, "Automatic translation of fortran programs to vector form," ACM Trans. Program. Lang. Syst. 1987.
- (1987) ACM Trans. Program. Lang. Syst.
- Allen, R.¹ Kennedy, K.²

5
- 24144474794
- Software vectorization handbook
- Intel Press
- A. J. C. Bik, Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance. Intel Press, 2004.
- (2004) Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
- Bik, A.J.C.¹

6
- 34547185000
- Scalable subgraph mapping for acyclic computation accelerators
- N. Clark, A. Hormati, S. Mahlke, and S. Yehia, "Scalable subgraph mapping for acyclic computation accelerators," in CASES'06.
- CASES'06
- Clark, N.¹ Hormati, A.² Mahlke, S.³ Yehia, S.⁴

7
- 0023385308
- The program dependence graph and its use in optimization
- J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The program dependence graph and its use in optimization," ACM Trans. Program. Lang. Syst., 1987.
- (1987) ACM Trans. Program. Lang. Syst.
- Ferrante, J.¹ Ottenstein, K.J.² Warren, J.D.³

8
- 77954724842
- Sams multi-layout memory: Providing multiple views of data to boost simd performance
- C. Gou, G. Kuzmanov, and G. Gaydadjiev, "Sams multi-layout memory: providing multiple views of data to boost simd performance," in ICS'10.
- ICS'10
- Gou, C.¹ Kuzmanov, G.² Gaydadjiev, G.³

9
- 84869168810
- Dyser: Unifying functionality and parallelism specialization for energy efficient computing
- V. Govindaraju, C.-H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim, "Dyser: Unifying functionality and parallelism specialization for energy efficient computing," IEEE Micro, vol. 33, no. 5, 2012.
- (2012) IEEE Micro , vol.33 , Issue.5
- Govindaraju, V.¹ Ho, C.-H.² Nowatzki, T.³ Chhugani, J.⁴ Satish, N.⁵ Sankaralingam, K.⁶ Kim, C.⁷

10
- 79955890625
- Dynamically specialized datapaths for energy efficient computing
- V. Govindaraju, C.-H. Ho, and K. Sankaralingam, "Dynamically specialized datapaths for energy efficient computing," in HPCA 2011.
- (2011) HPCA
- Govindaraju, V.¹ Ho, C.-H.² Sankaralingam, K.³

11
- 84871291822
- Bundled execution of recurring traces for energy-efficient general purpose processing
- S. Gupta, S. Feng, A. Ansari, S. Mahlke, and D. August, "Bundled execution of recurring traces for energy-efficient general purpose processing," in MICRO-44.
- MICRO-44
- Gupta, S.¹ Feng, S.² Ansari, A.³ Mahlke, S.⁴ August, D.⁵

12
- 0031360911
- Garp: A MIPS processor with a reconfigurable coprocessor
- J. R. Hauser and J. Wawrzynek, "Garp: A MIPS Processor with a Reconfigurable Coprocessor," in FCCM'97.
- FCCM'97
- Hauser, J.R.¹ Wawrzynek, J.²

13
- 84863451245
- Dynamic trace-based analysis of vectorization potential of applications
- J. Holewinski, R. Ramamurthi, M. Ravishankar, N. Fauzia, L.-N. Pouchet, A. Rountev, and P. Sadayappan, "Dynamic trace-based analysis of vectorization potential of applications," SIGPLAN Not., 2012.
- (2012) SIGPLAN Not
- Holewinski, J.¹ Ramamurthi, R.² Ravishankar, M.³ Fauzia, N.⁴ Pouchet, L.-N.⁵ Rountev, A.⁶ Sadayappan, P.⁷

14
- 0037952146
- San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- K. Kennedy and J. R. Allen, Optimizing compilers for modern architectures: a dependence-based approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002.
- (2002) Optimizing Compilers for Modern Architectures: A Dependence-based Approach
- Kennedy, K.¹ Allen, J.R.²

15
- 0034446825
- Exploiting superword level parallelism with multimedia instruction sets
- S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," in PLDI'00.
- PLDI'00
- Larsen, S.¹ Amarasinghe, S.²

16
- 80052543989
- Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
- Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart, C. Batten, and K. Asanović, "Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators," in ISCA'11.
- ISCA'11
- Lee, Y.¹ Avizienis, R.² Bishara, A.³ Xia, R.⁴ Lockhart, D.⁵ Batten, C.⁶ Asanović, K.⁷

17
- 84863012838
- An evaluation of vectorizing compilers
- S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua, "An evaluation of vectorizing compilers," in PACT'11.
- PACT'11
- Maleki, S.¹ Gao, Y.² Garzarán, M.J.³ Wong, T.⁴ Padua, D.A.⁵

18
- 84887477162
- Tartan: Evaluating spatial computation for whole program execution
- M. Mishra, T. J. Callahan, T. Chelcea, G. Venkataramani, S. C. Goldstein, and M. Budiu, "Tartan: evaluating spatial computation for whole program execution," in ASPLOS-XII.
- ASPLOS-XII
- Mishra, M.¹ Callahan, T.J.² Chelcea, T.³ Venkataramani, G.⁴ Goldstein, S.C.⁵ Budiu, M.⁶

19
- 77951154340
- The gpu computing era
- Mar.
- J. Nickolls and W. J. Dally, "The gpu computing era," IEEE Micro, vol. 30, no. 2, Mar. 2010.
- (2010) IEEE Micro , vol.30 , Issue.2
- Nickolls, J.¹ Dally, W.J.²

20
- 84883088830
- A general constraint-centric scheduling framework for spatial architectures
- T. Nowatzki, M. Sartin-Tarm, L. De Carli, K. Sankaralingam, C. Estan, and B. Robatmili, "A general constraint-centric scheduling framework for spatial architectures," in PLDI 2013.
- (2013) PLDI
- Nowatzki, T.¹ Sartin-Tarm, M.² De Carli, L.³ Sankaralingam, K.⁴ Estan, C.⁵ Robatmili, B.⁶

21
- 79953275887
- Multi-platform auto-vectorization
- D. Nuzman and R. Henderson, "Multi-platform auto-vectorization," in CGO'06.
- CGO'06
- Nuzman, D.¹ Henderson, R.²

22
- 33746034953
- Auto-vectorization of interleaved data for simd
- D. Nuzman, I. Rosen, and A. Zaks, "Auto-vectorization of interleaved data for simd," in PLDI'06.
- PLDI'06
- Nuzman, D.¹ Rosen, I.² Zaks, A.³

23
- 0022874874
- Advanced compiler optimizations for supercomputers
- D. A. Padua and M. J. Wolfe, "Advanced compiler optimizations for supercomputers," Commun. ACM, 1986.
- (1986) Commun. ACM
- Padua, D.A.¹ Wolfe, M.J.²

24
- 84872059246
- "Parboil benchmark suite, http://impact.crhc.illinois.edu/parboil.php."
- Parboil Benchmark Suite

25
- 84876586321
- Libra: Tailoring simd execution using heterogeneous hardware and dynamic configurability
- Y. Park, J. J. K. Park, H. Park, and S. Mahlke, "Libra: Tailoring simd execution using heterogeneous hardware and dynamic configurability," in MICRO'12.
- MICRO'12
- Park, Y.¹ Park, J.J.K.² Park, H.³ Mahlke, S.⁴

26
- 84863353689
- Simd defragmenter: Efficient ilp realization on data-parallel architectures
- Y. Park, S. Seo, H. Park, H. K. Cho, and S. Mahlke, "Simd defragmenter: efficient ilp realization on data-parallel architectures," in ASPLOS'12.
- ASPLOS'12
- Park, Y.¹ Seo, S.² Park, H.³ Cho, H.K.⁴ Mahlke, S.⁵

27
- 84870653904
- Ispc: A spmd compiler for high-performance cpu programming
- M. Pharr and W. R. Mark, "ispc: A spmd compiler for high-performance cpu programming," in InPar 2012.
- (2012) InPar
- Pharr, M.¹ Mark, W.R.²

28
- 33745222449
- Optimizing data permutations for simd devices
- G. Ren, P. Wu, and D. Padua, "Optimizing data permutations for simd devices," in PLDI'06.
- PLDI'06
- Ren, G.¹ Wu, P.² Padua, D.³

29
- 84944402628
- Universal mechanisms for data-parallel architectures
- December
- K. Sankaralingam, S. W. Keckler, W. R. Mark, and D. Burger, "Universal Mechanisms for Data-Parallel Architectures," in MICRO'03: Proceedings of the 36th Annual International Symposium on Microarchitecture, December 2003, pp. 303-314.
- (2003) MICRO'03: Proceedings of the 36th Annual International Symposium on Microarchitecture , pp. 303-314
- Sankaralingam, K.¹ Keckler, S.W.² Mark, W.R.³ Burger, D.⁴

30
- 84887486539
- Constraint centric scheduling guide
- May
- M. Sartin-Tarm, T. Nowatzki, L. De Carli, K. Sankaralingam, and C. Estan, "Constraint centric scheduling guide," SIGARCH Comput. Archit. News, vol. 41, no. 2, pp. 17-21, May 2013.
- (2013) SIGARCH Comput. Archit. News , vol.41 , Issue.2 , pp. 17-21
- Sartin-Tarm, M.¹ Nowatzki, T.² De Carli, L.³ Sankaralingam, K.⁴ Estan, C.⁵

31
- 84864831385
- Can traditional programming bridge the ninja performance gap for parallel computing applications
- N. Satish, C. Kim, J. Chhugani, H. Saito, R. Krishnaiyer, M. Smelyanskiy, M. Girkar, and P. Dubey, "Can traditional programming bridge the ninja performance gap for parallel computing applications?" in ISCA 2012.
- (2012) ISCA
- Satish, N.¹ Kim, C.² Chhugani, J.³ Saito, H.⁴ Krishnaiyer, R.⁵ Smelyanskiy, M.⁶ Girkar, M.⁷ Dubey, P.⁸

32
- 57649106258
- Larrabee: A many-core x86 architecture for visual computing
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, "Larrabee: a many-core x86 architecture for visual computing," in SIGGRAPH 2008.
- (2008) SIGGRAPH
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

33
- 47849103500
- Introducing control flow into vectorized code
- J. Shin, "Introducing control flow into vectorized code," in PACT'07.
- PACT'07
- Shin, J.¹

34
- 85088882721
- Vector instruction set support for conditional operations
- J. E. Smith, G. Faanes, and R. Sugumar, "Vector instruction set support for conditional operations," in ISCA'00.
- ISCA'00
- Smith, J.E.¹ Faanes, G.² Sugumar, R.³

35
- 84857819522
- Using machine learning to improve automatic vectorization
- K. Stock, L.-N. Pouchet, and P. Sadayappan, "Using machine learning to improve automatic vectorization," TACO 2012.
- (2012) TACO
- Stock, K.¹ Pouchet, L.-N.² Sadayappan, P.³

36
- 70449626135
- Polyhedral-model guided loop-nest auto-vectorization
- K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, "Polyhedral-model guided loop-nest auto-vectorization," in PACT'09.
- PACT'09
- Trifunovic, K.¹ Nuzman, D.² Cohen, A.³ Zaks, A.⁴ Rosen, I.⁵

37
- 84887444440
- Relaxing simd control flow constraints using loop transformations
- R. v. Hanxleden and K. Kennedy, "Relaxing simd control flow constraints using loop transformations," in PLDI'92.
- PLDI'92
- Hanxleden, R.V.¹ Kennedy, K.²

38
- 77952256041
- Conservation cores: Reducing the energy of mature computations
- G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor, "Conservation cores: reducing the energy of mature computations," in ASPLOS'10.
- ASPLOS'10
- Venkatesh, G.¹ Sampson, J.² Goulding, N.³ Garcia, S.⁴ Bryksin, V.⁵ Lugo-Martinez, J.⁶ Swanson, S.⁷ Taylor, M.B.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.