SCOPUS Information Search Platform

IISWC 2014 - IEEE International Symposium on Workload Characterization

Volume , Issue , 2014, Pages 1-12

Performance analysis of the memory management unit under scale-out workloads

(5) Karakostas, Vasileios a,b Unsal, Osman S b Nemirovsky, Mario a Cristal, Adrian a,b Swift, Michael c

a BARCELONA SUPERCOMPUTING CENTER (Spain)

b UNIVERSITAT POLITÈCNICA DE CATALUNYA (Spain)

c UNIVERSITY OF WISCONSIN MADISON (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BUFFER STORAGE; PHYSICAL ADDRESSES;

APPLICATION DATA; APPLICATION PERFORMANCE; CACHE HIERARCHIES; HARDWARE SUPPORTS; PERFORMANCE ANALYSIS; PERFORMANCE COUNTERS; PERFORMANCE GAPS; UPPER BOUND;

MEMORY MANAGEMENT UNITS;

EID: 84946036877 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IISWC.2014.6983034 Document Type: Conference Paper

Times cited : (42)

References (49)

1
- 84946029996
- "Cloudsuite Overview," http://parsa.epfl.ch/cloudsuite/overview.html.
- Cloudsuite Overview

2
- 84906694042
- "Huge Pages Part 1 (Introduction)," http://lwn.net/Articles/374424/.
- Huge Pages Part 1 (Introduction)

3
- 84857058270
- "International Technology Roadmap for Semiconductors: 2012," http://www.itrs.net/Links/2012ITRS/Home2012.htm.
- International Technology Roadmap for Semiconductors: 2012

4
- 84946076235
- "Perf wiki," https://perf.wiki.kernel.org/index.php/Main_Page.
- Perf Wiki

5
- 84894591387
- "SPEC CPU 2006," https://www.spec.org/cpu2006/.
- (2006) SPEC CPU

6
- 84904578672
- "The /proc filesystem," www.kernel.org/doc/Documentation/filesystems/proc.txt.
- The /proc Filesystem

7
- 84937704401
- "Transparent Huge Pages in 2.6.38," http://lwn.net/Articles/423584/.
- Transparent Huge Pages in 2.6.38

8
- 33744484309
- BioBench: A benchmark suite of bioinformatics applications
- K. Albayraktaroglu, A. Jaleel, X. Wu, M. Franklin, B. Jacob, C.-W. Tseng, and D. Yeung, "BioBench: A Benchmark Suite of Bioinformatics Applications," ISPASS, 2005, pp. 2-9.
- (2005) ISPASS , pp. 2-9
- Albayraktaroglu, K.¹ Jaleel, A.² Wu, X.³ Franklin, M.⁴ Jacob, B.⁵ Tseng, C.-W.⁶ Yeung, D.⁷

9
- 0026140567
- The interaction of architecture and operating system design
- T. E. Anderson, H. M. Levy, B. N. Bershad, and E. D. Lazowska, "The interaction of architecture and operating system design," ASPLOS, 1991, pp. 108-120.
- (1991) ASPLOS , pp. 108-120
- Anderson, T.E.¹ Levy, H.M.² Bershad, B.N.³ Lazowska, E.D.⁴

10
- 67650706769
- Investigating cache parameters of x86 family processors
- V. Babka and P. Tuma, "Investigating Cache Parameters of x86 Family Processors," SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking, 2009, pp. 77-96.
- (2009) SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking , pp. 77-96
- Babka, V.¹ Tuma, P.²

11
- 77955012281
- Translation caching: Skip, don't walk (the page table)
- T. W. Barr, A. L. Cox, and S. Rixner, "Translation Caching: Skip, Don't Walk (the Page Table)," ISCA, 2010, pp. 48-59.
- (2010) ISCA , pp. 48-59
- Barr, T.W.¹ Cox, A.L.² Rixner, S.³

12
- 84883331358
- The datacenter as a computer: An introduction to the design of warehouse-scale machines
- second edition, ser
- L. A. Barroso, J. Clidaras, and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition, ser. Synthesis Lectures on Computer Architecture, 2013.
- (2013) Synthesis Lectures on Computer Architecture
- Barroso, L.A.¹ Clidaras, J.² Hölzle, U.³

13
- 84881179047
- Efficient virtual memory for big memory servers
- A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift, "Efficient Virtual Memory for Big Memory Servers," ISCA, 2013, pp. 237-248.
- (2013) ISCA , pp. 237-248
- Basu, A.¹ Gandhi, J.² Chang, J.³ Hill, M.D.⁴ Swift, M.M.⁵

14
- 84864859089
- Reducing memory reference energy with opportunistic virtual caching
- A. Basu, M. D. Hill, and M. M. Swift, "Reducing memory reference energy with opportunistic virtual caching," ISCA, 2012, pp. 297-308.
- (2012) ISCA , pp. 297-308
- Basu, A.¹ Hill, M.D.² Swift, M.M.³

15
- 84892513543
- Large-reach memory management unit caches
- A. Bhattacharjee, "Large-reach Memory Management Unit Caches," MICRO, 2013, pp. 383-394.
- (2013) MICRO , pp. 383-394
- Bhattacharjee, A.¹

16
- 77952252973
- Inter-core cooperative TLB for chip multiprocessors
- A. Bhattacharjee and M. Martonosi, "Inter-core Cooperative TLB for Chip Multiprocessors," ASPLOS, 2010, pp. 359-370.
- (2010) ASPLOS , pp. 359-370
- Bhattacharjee, A.¹ Martonosi, M.²

17
- 70449652917
- Characterizing the TLB behavior of emerging parallel workloads on chip multiprocessors
- A. Bhattacharjee and M. Martonosi, "Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors," PACT, 2009, pp. 29-40.
- (2009) PACT , pp. 29-40
- Bhattacharjee, A.¹ Martonosi, M.²

18
- 79953093822
- Ph.D. dissertation Princeton University, January 2011
- C. Bienia, "Benchmarking Modern Multiprocessors," Ph.D. dissertation, Princeton University, January 2011.
- Benchmarking Modern Multiprocessors
- Bienia, C.¹

19
- 0031237070
- Virtual-address caches part 1: Problems and solutions in uniprocessors
- Sep.
- M. Cekleov and M. Dubois, "Virtual-address caches part 1: Problems and solutions in uniprocessors," IEEE Micro, vol. 17, no. 5, pp. 64-71, Sep. 1997.
- (1997) IEEE Micro , vol.17 , Issue.5 , pp. 64-71
- Cekleov, M.¹ Dubois, M.²

20
- 0031274147
- Virtual-address caches, part 2: Multiprocessor issues
- Nov.
- M. Cekleov and M. Dubois, "Virtual-address caches, part 2: Multiprocessor issues," IEEE Micro, vol. 17, no. 6, pp. 69-74, Nov. 1997.
- (1997) IEEE Micro , vol.17 , Issue.6 , pp. 69-74
- Cekleov, M.¹ Dubois, M.²

21
- 0026865575
- A simulation based study of TLB performance
- J. B. Chen, A. Borg, and N. P. Jouppi, "A Simulation Based Study of TLB Performance," ISCA, 1992, pp. 114-123.
- (1992) ISCA , pp. 114-123
- Chen, J.B.¹ Borg, A.² Jouppi, N.P.³

22
- 84862110045
- Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design
- Y.-T. Chen, J. Cong, H. Huang, B. Liu, C. Liu, M. Potkonjak, and G. Reinman, "Dynamically Reconfigurable Hybrid Cache: An Energy-efficient Last-level Cache Design," DATE, 2012, pp. 45-50.
- (2012) DATE , pp. 45-50
- Chen, Y.-T.¹ Cong, J.² Huang, H.³ Liu, B.⁴ Liu, C.⁵ Potkonjak, M.⁶ Reinman, G.⁷

23
- 0022020051
- Performance of the VAX-11/780 translation buffer: Simulation and measurement
- Feb.
- D. W. Clark and J. S. Emer, "Performance of the VAX-11/780 translation buffer: simulation and measurement," ACM Trans. Comput. Syst., vol. 3, no. 1, pp. 31-62, Feb. 1985.
- (1985) ACM Trans. Comput. Syst. , vol.3 , Issue.1 , pp. 31-62
- Clark, D.W.¹ Emer, J.S.²

24
- 0002634962
- Optimizing the idle task and other MMU tricks
- C. Dougan, P. Mackerras, and V. Yodaiken, "Optimizing the Idle Task and Other MMU Tricks," OSDI, 1999, pp. 229-237.
- (1999) OSDI , pp. 229-237
- Dougan, C.¹ Mackerras, P.² Yodaiken, V.³

25
- 84858791438
- Clearing the clouds: A study of emerging scale-out workloads on modern hardware
- M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, "Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware," ASPLOS, 2012, pp. 37-48.
- (2012) ASPLOS , pp. 37-48
- Ferdman, M.¹ Adileh, A.² Kocberber, O.³ Volos, S.⁴ Alisafaee, M.⁵ Jevdjic, D.⁶ Kaynak, C.⁷ Popescu, A.D.⁸ Ailamaki, A.⁹ Falsafi, B.¹⁰

26
- 2342441476
- 3rd ed.
- J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., 2002.
- (2002) Computer Architecture: A Quantitative Approach
- Hennessy, J.L.¹ Patterson, D.A.²

27
- 70449098063
- April 2012 248966-026
- Intel Corporation, Intel® 64 and IA-32 Architectures Optimization Reference Manual, April 2012, no. 248966-026.
- Intel® 64 and IA-32 Architectures Optimization Reference Manual

28
- 0542404031
- A look at several memory management units, TLB-refill mechanisms, and page table organizations
- B. L. Jacob and T. N. Mudge, "A Look at Several Memory Management Units, TLB-refill Mechanisms, and Page Table Organizations," ASPLOS, 1998, pp. 295-306.
- (1998) ASPLOS , pp. 295-306
- Jacob, B.L.¹ Mudge, T.N.²

29
- 84881191462
- Die-stacked dram caches for servers: Hit ratio, latency, or bandwidth? Have it all with footprint cache
- D. Jevdjic, S. Volos, and B. Falsafi, "Die-stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache," ISCA, 2013, pp. 404-415.
- (2013) ISCA , pp. 404-415
- Jevdjic, D.¹ Volos, S.² Falsafi, B.³

30
- 84893565625
- Characterizing data analysis workloads in data centers
- Z. Jia, L. Wang, J. Zhan, L. Zhang, and C. Luo, "Characterizing Data Analysis Workloads in Data Centers," IISWC, 2013, pp. 66-76.
- (2013) IISWC , pp. 66-76
- Jia, Z.¹ Wang, L.² Zhan, J.³ Zhang, L.⁴ Luo, C.⁵

31
- 0036287598
- Going the distance for TLB prefetching: An application-driven study
- G. B. Kandiraju and A. Sivasubramaniam, "Going the distance for TLB prefetching: an application-driven study," ISCA, 2002, pp. 195-206.
- (2002) ISCA , pp. 195-206
- Kandiraju, G.B.¹ Sivasubramaniam, A.²

32
- 0036039466
- Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
- G. B. Kandiraju and A. Sivasubramaniam, "Characterizing the d-TLB Behavior of SPEC CPU2000 Benchmarks," SIGMETRICS, 2002, pp. 129-139.
- (2002) SIGMETRICS , pp. 129-139
- Kandiraju, G.B.¹ Sivasubramaniam, A.²

33
- 84881178489
- A new perspective for efficient virtual-cache coherence
- S. Kaxiras and A. Ros, "A new perspective for efficient virtual-cache coherence," ISCA, 2013, pp. 535-546.
- (2013) ISCA , pp. 535-546
- Kaxiras, S.¹ Ros, A.²

34
- 84881144734
- Thin servers with smart pipes: Designing SoC accelerators for memcached
- K. Lim, D. Meisner, A. G. Saidi, P. Ranganathan, and T. F. Wenisch, "Thin servers with smart pipes: designing SoC accelerators for memcached," ISCA, 2013, pp. 36-47.
- (2013) ISCA , pp. 36-47
- Lim, K.¹ Meisner, D.² Saidi, A.G.³ Ranganathan, P.⁴ Wenisch, T.F.⁵

35
- 84876533349
- NOC-Out: Microarchitecting a scale-out processor
- P. Lotfi-Kamran, B. Grot, and B. Falsafi, "NOC-Out: Microarchitecting a Scale-Out Processor," MICRO, 2012, pp. 177-187.
- (2012) MICRO , pp. 177-187
- Lotfi-Kamran, P.¹ Grot, B.² Falsafi, B.³

36
- 84864861874
- Scale-out processors
- P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Ozer, and B. Falsafi, "Scale-out processors," ISCA, 2012, pp. 500-511.
- (2012) ISCA , pp. 500-511
- Lotfi-Kamran, P.¹ Grot, B.² Ferdman, M.³ Volos, S.⁴ Kocberber, O.⁵ Picorel, J.⁶ Adileh, A.⁷ Jevdjic, D.⁸ Idgunji, S.⁹ Ozer, E.¹⁰ Falsafi, B.¹¹

37
- 84878619560
- TLB improvements for chip multiprocessors: Inter-core cooperative prefetchers and shared last-level TLBs
- Apr.
- D. Lustig, A. Bhattacharjee, and M. Martonosi, "TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs," ACM Trans. Archit. Code Optim., vol. 10, no. 1, pp. 2:1-2:38, Apr. 2013.
- (2013) ACM Trans. Archit. Code Optim. , vol.10 , Issue.1 , pp. 2:1-2:38
- Lustig, D.¹ Bhattacharjee, A.² Martonosi, M.³

38
- 52249092401
- Investigating the TLB behavior of high-end scientific applications on commodity microprocessors
- C. McCurdy, A. L. Cox, and J. Vetter, "Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors," ISPASS, 2008, pp. 95-104.
- (2008) ISPASS , pp. 95-104
- McCurdy, C.¹ Cox, A.L.² Vetter, J.³

39
- 84866872521
- Evaluating the impact of TLB misses on future HPC systems
- A. Morari, R. Gioiosa, R. W. Wisniewski, B. S. Rosenburg, T. Inglett, and M. Valero, "Evaluating the Impact of TLB Misses on Future HPC Systems," IPDPS, 2012, pp. 1010-1021.
- (2012) IPDPS , pp. 1010-1021
- Morari, A.¹ Gioiosa, R.² Wisniewski, R.W.³ Rosenburg, B.S.⁴ Inglett, T.⁵ Valero, M.⁶

40
- 0027204397
- Design tradeoffs for software-managed TLBs
- D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown, "Design tradeoffs for software-managed TLBs," ISCA, 1993, pp. 27-38.
- (1993) ISCA , pp. 27-38
- Nagle, D.¹ Uhlig, R.² Stanley, T.³ Sechrest, S.⁴ Mudge, T.⁵ Brown, R.⁶

41
- 84903973894
- Increasing TLB reach by exploiting clustering in page translations
- B. Pham, A. Bhattacharjee, Y. Eckert, and G. H. Loh, "Increasing TLB reach by exploiting clustering in page translations," HPCA, 2014, pp. 558-567.
- (2014) HPCA , pp. 558-567
- Pham, B.¹ Bhattacharjee, A.² Eckert, Y.³ Loh, G.H.⁴

42
- 84876544775
- CoLT: Coalesced large-reach TLBs
- B. Pham, V. Vaidyanathan, A. Jaleel, and A. Bhattacharjee, "CoLT: Coalesced Large-Reach TLBs," MICRO, 2012, pp. 258-269.
- (2012) MICRO , pp. 258-269
- Pham, B.¹ Vaidyanathan, V.² Jaleel, A.³ Bhattacharjee, A.⁴

43
- 0034818158
- Towards virtually-addressed memory hierarchies
- X. Qiu and M. Dubois, "Towards Virtually-Addressed Memory Hierarchies," HPCA, 2001, pp. 51-62.
- (2001) HPCA , pp. 51-62
- Qiu, X.¹ Dubois, M.²

44
- 84883540577
- The impact of architectural trends on operating system performance
- M. Rosenblum, E. Bugnion, S. A. Herrod, E. Witchel, and A. Gupta, "The Impact of Architectural Trends on Operating System Performance," SOSP, 1995, pp. 285-298.
- (1995) SOSP , pp. 285-298
- Rosenblum, M.¹ Bugnion, E.² Herrod, S.A.³ Witchel, E.⁴ Gupta, A.⁵

45
- 0033707299
- Recency-based TLB preloading
- A. Saulsbury, F. Dahlgren, and P. Stenström, "Recency-based TLB Preloading," ISCA, 2000, pp. 117-127.
- (2000) ISCA , pp. 117-127
- Saulsbury, A.¹ Dahlgren, F.² Stenström, P.³

46
- 0028305546
- Surpassing the TLB performance of superpages with less operating system support
- M. Talluri and M. D. Hill, "Surpassing the TLB Performance of Superpages with Less Operating System Support," ASPLOS, 1994, pp. 171-182.
- (1994) ASPLOS , pp. 171-182
- Talluri, M.¹ Hill, M.D.²

47
- 0022583630
- An in-cache address translation mechanism
- D. A. Wood, S. J. Eggers, G. Gibson, M. D. Hill, and J. M. Pendleton, "An In-cache Address Translation Mechanism," ISCA, 1986, pp. 358-365.
- (1986) ISCA , pp. 358-365
- Wood, D.A.¹ Eggers, S.J.² Gibson, G.³ Hill, M.D.⁴ Pendleton, J.M.⁵

48
- 79957470794
- Characterization and dynamic mitigation of intra-application cache interference
- C.-J. Wu and M. Martonosi, "Characterization and dynamic mitigation of intra-application cache interference," ISPASS, 2011, pp. 2-11.
- (2011) ISPASS , pp. 2-11
- Wu, C.-J.¹ Martonosi, M.²

49
- 79952037020
- Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)
- Mar.
- W. Xu, H. Sun, X. Wang, Y. Chen, and T. Zhang, "Design of Last-level On-chip Cache Using Spin-torque Transfer RAM (STT RAM)," IEEE Trans. Very Large Scale Integr. Syst., vol. 19, no. 3, pp. 483-493, Mar. 2011.
- (2011) IEEE Trans. Very Large Scale Integr. Syst. , vol.19 , Issue.3 , pp. 483-493
- Xu, W.¹ Sun, H.² Wang, X.³ Chen, Y.⁴ Zhang, T.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.