SCOPUS Information Search Platform

IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium

Volume , Issue , 2009, Pages

Understanding the design trade-offs among current multicore systems for numerical computations

(3) Kang, Seunghwa (a); Bader, David A. (a); Vuduc, Richard (a)

(a) GEORGIA INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ACCELERATOR TECHNOLOGY; BARCELONA; CELL BROADBAND ENGINE; COMPUTATIONAL STATISTICS; COMPUTING SYSTEM; DESIGN TRADEOFF; FUNDAMENTAL DESIGN; MULTI-CORE PROCESSOR; MULTI-CORE SYSTEMS; NUMERICAL COMPUTATIONS; SOFTWARE IMPLEMENTATION; TOSHIBA;

ARCHITECTURAL DESIGN; COMPUTER SOFTWARE; DISTRIBUTED PARAMETER NETWORKS; SOFTWARE ARCHITECTURE;

COMPUTER ARCHITECTURE;

EID: 70449975572 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/IPDPS.2009.5161055 Document Type: Conference Paper

Times cited : (4)

References (31)

1
- 70449949502
- AMD CodeAnalyst, 2009. http://developer.amd.com/cpu/CodeAnalyst.
- (2009)

2
- 70449963065
- PAPI
- PAPI, 2009. http://icl.cs.utk.edu/papi.
- (2009)

3
- 70449827682
- The R project for statistical computing, 2009. http://www.r-project.org/.
- (2009)

4
- 42549111882
- AMD Corporation, 3.06 edition, Apr
- AMD Corporation. Software Optimization Guide for AMD Family 10h Processors, 3.06 edition, Apr. 2008.
- (2008) Software Optimization Guide for AMD Family 10h Processors

5
- 35148835330
- Implementation of a cone-beam backprojection algorithm on the Cell Broadband Engine processor
- San Diego, CA, Feb
- O. Bockenbach, M. Knaup, and M. Kachelriess. Implementation of a cone-beam backprojection algorithm on the Cell Broadband Engine processor. In Proc. SPIE Medical Imaging, San Diego, CA, Feb. 2007.
- (2007) Proc. SPIE Medical Imaging
- Bockenbach, O.¹ Knaup, M.² Kachelriess, M.³

6
- 51449118065
- A performance study of general-purpose applications on graphics processors using CUDA
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron. A performance study of general-purpose applications on graphics processors using CUDA. Journal of Parallel and Distributed Computing, 68(10):1370-1380, 2008.
- (2008) Journal of Parallel and Distributed Computing , vol.68 , Issue.10 , pp. 1370-1380
- Che, S.¹ Boyer, M.² Meng, J.³ Tarjan, D.⁴ Sheaffer, J.W.⁵ Skadron, K.⁶

7
- 52349084750
- Accelerating compute-intensive applications with GPUs and FPGAs
- Anaheim, CA, Jun
- S. Che, J. Li, J.W. Sheaffer, K. Skadron, and J. Lach. Accelerating compute-intensive applications with GPUs and FPGAs. In Proc. 6th IEEE Symp. on Application Specific Processors (SASP), Anaheim, CA, Jun. 2008.
- (2008) Proc. 6th IEEE Symp. on Application Specific Processors (SASP)
- Che, S.¹ Li, J.² Sheaffer, J.W.³ Skadron, K.⁴ Lach, J.⁵

8
- 35648955176
- Cell Broadband Engine Architecture and its first implementation-a performance view
- Sep
- T. Chen, R. Raghavan, J. Dale, and E. Iwata. Cell Broadband Engine Architecture and its first implementation-a performance view. IBM Journal of Research and Development, 51(5):559-572, Sep. 2007.
- (2007) IBM Journal of Research and Development , vol.51 , Issue.5 , pp. 559-572
- Chen, T.¹ Raghavan, R.² Dale, J.³ Iwata, E.⁴

9
- 47749111716
- Second generation quad-core Intel Xeon processors bring 45 nm technology and a new level of performance to HPC applications
- P. Gepner, D. L. Fraser, and M. F. Kowalik. Second generation quad-core Intel Xeon processors bring 45 nm technology and a new level of performance to HPC applications. Lecture Notes in Computer Science, 5101:417-426, 2008.
- (2008) Lecture Notes in Computer Science , vol.5101 , pp. 417-426
- Gepner, P.¹ Fraser, D.L.² Kowalik, M.F.³

10
- 70449098063
- Intel Corporation, Nov
- Intel Corporation. Intel 64 and IA-32 Architectures Optimization Reference Manual, Nov. 2007.
- (2007) Intel 64 and IA-32 Architectures Optimization Reference Manual

11
- 52649148744
- Self-optimizing memory controllers: A reinforcement learning approach
- Jun
- E. Ipek, O. Mutlu, J. F. Martinez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. ACM SIGARCH Computer Architecture News, 36(3):39-50, Jun. 2008.
- (2008) ACM SIGARCH Computer Architecture News , vol.36 , Issue.3 , pp. 39-50
- Ipek, E.¹ Mutlu, O.² Martinez, J.F.³ Caruana, R.⁴

12
- 36949033619
- Performance analysis of Cell Broadband Engine for high memory bandwidth applications
- San Jose, CA, Apr
- D. Jimenez-Gonzalez, X. Martorell, and A. Ramirez. Performance analysis of Cell Broadband Engine for high memory bandwidth applications. In Proc. 7th IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS), San Jose, CA, Apr. 2007.
- (2007) Proc. 7th IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS)
- Jimenez-Gonzalez, D.¹ Martorell, X.² Ramirez, A.³

13
- 0002282074
- A new measure of rank correlation
- Jun
- M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1):81-93, Jun. 1938.
- (1938) Biometrika , vol.30 , Issue.1 , pp. 81-93
- Kendall, M.G.¹

14
- 44849137198
- NVIDIA Tesla: A unified graphics and computing architecture
- Mar
- E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2):39-55, Mar. 2008.
- (2008) IEEE Micro , vol.28 , Issue.2 , pp. 39-55
- Lindholm, E.¹ Nickolls, J.² Oberman, S.³ Montrym, J.⁴

15
- 0034314462
- Dynamic access ordering for streamed computations
- Nov
- S. A. McKee, W. A. Wulf, J. H. Aylor, R. H. Klenke, M. H. Salinas, S. I. Hong, and D. A. B. Weikle. Dynamic access ordering for streamed computations. IEEE Transactions on Computers, 49(11):1255-1271, Nov. 2000.
- (2000) IEEE Transactions on Computers , vol.49 , Issue.11 , pp. 1255-1271
- McKee, S.A.¹ Wulf, W.A.² Aylor, J.H.³ Klenke, R.H.⁴ Salinas, M.H.⁵ Hong, S.I.⁶ Weikle, D.A.B.⁷

16
- 70449827677
- Analysis of a computational biology simulation technique on emerging processing architectures
- Long Beach, CA
- J. S. Meredith, S. R. Alam, and J. S. Vetter. Analysis of a computational biology simulation technique on emerging processing architectures. In Proc. 6th IEEE Int'l Workshop on High Performance Computational Biology (HICOMB), Long Beach, CA, 2007.
- (2007) Proc. 6th IEEE Int'l Workshop on High Performance Computational Biology (HICOMB)
- Meredith, J.S.¹ Alam, S.R.² Vetter, J.S.³

17
- 70449987304
- Enhancing the performance and fairness of shared DRAM systems with parallelism-aware batch scheduling
- Beijing, China, Jun
- O. Mutlu and T. Moscibroda. Enhancing the performance and fairness of shared DRAM systems with parallelism-aware batch scheduling. In Proc. 35th Ann. Int'l Symp. on Computer Architecture (ISCA), Beijing, China, Jun. 2008.
- (2008) Proc. 35th Ann. Int'l Symp. on Computer Architecture (ISCA)
- Mutlu, O.¹ Moscibroda, T.²

18
- 47349089021
- A study of performance impact of memory controller features in multiprocessor server environment
- Munich, Germany, Jun
- C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multiprocessor server environment. In Proc. 3rd Workshop on Memory Performance Issues (WMPI), Munich, Germany, Jun. 2004.
- (2004) Proc. 3rd Workshop on Memory Performance Issues (WMPI)
- Natarajan, C.¹ Christenson, B.² Briggs, F.³

19
- 78651550268
- Scalable parallel programming with CUDA
- Mar
- J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable parallel programming with CUDA. ACM Queue, 6(2):40-53, Mar. 2008.
- (2008) ACM Queue , vol.6 , Issue.2 , pp. 40-53
- Nickolls, J.¹ Buck, I.² Garland, M.³ Skadron, K.⁴

20
- 64549155924
- NVIDIA Corporation, 2.0 edition, Jun
- NVIDIA Corporation. NVIDIA CUDA Compute Unified Device Architecture Programming Guide, 2.0 edition, Jun. 2008.
- (2008) NVIDIA CUDA Compute Unified Device Architecture Programming Guide

21
- 33947588048
- A survey of general-purpose computation on graphics hardware
- Mar
- J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26(1):80-113, Mar. 2007.
- (2007) Computer Graphics Forum , vol.26 , Issue.1 , pp. 80-113
- Owens, J.D.¹ Luebke, D.² Govindaraju, N.³ Harris, M.⁴ Kruger, J.⁵ Lefohn, A.E.⁶ Purcell, T.J.⁷

22
- 47349100893
- Package technology to address the memory bandwidth challenge for tera-scale computing
- L. A. Polka, H. Kalyanam, G. Hu, and S. Krishnamoorthy. Package technology to address the memory bandwidth challenge for tera-scale computing. Intel Technology Journal, 11(3), 2007.
- (2007) Intel Technology Journal , vol.11 , Issue.3
- Polka, L.A.¹ Kalyanam, H.² Hu, G.³ Krishnamoorthy, S.⁴

23
- 47849130815
- Effective management of DRAM bandwidth in multicore processors
- Brasov, Romania, Sep
- N. Rafique, W. T. Lim, and M. Thottethodi. Effective management of DRAM bandwidth in multicore processors. In Proc. 16th Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT), Brasov, Romania, Sep. 2007.
- (2007) Proc. 16th Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT)
- Rafique, N.¹ Lim, W.T.² Thottethodi, M.³

24
- 0033691565
- Memory access scheduling
- Vancouver, Canada, Jun
- S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens. Memory access scheduling. In Proc. 27th Ann. Int'l Symp. on Computer Architecture (ISCA), Vancouver, Canada, Jun. 2000.
- (2000) Proc. 27th Ann. Int'l Symp. on Computer Architecture (ISCA)
- Rixner, S.¹ Dally, W.J.² Kapasi, U.J.³ Mattson, P.⁴ Owens, J.D.⁵

25
- 79959466764
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
- Salt Lake City, UT, Feb
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Salt Lake City, UT, Feb. 2008.
- (2008) Proc. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)
- Ryoo, S.¹ Rodrigues, C.I.² Baghsorkhi, S.S.³ Stone, S.S.⁴ Kirk, D.B.⁵ Hwu, W.W.⁶

26
- 51449112813
- Program optimization carving for GPU computing
- S. Ryoo, C. I. Rodrigues, S. S. Stone, J. A. Stratton, S. Ueng, S. S. Baghsorkhi, and W. W. Hwu. Program optimization carving for GPU computing. Journal of Parallel and Distributed Computing, 68(10):1389-1401, 2008.
- (2008) Journal of Parallel and Distributed Computing , vol.68 , Issue.10 , pp. 1389-1401
- Ryoo, S.¹ Rodrigues, C.I.² Stone, S.S.³ Stratton, J.A.⁴ Ueng, S.⁵ Baghsorkhi, S.S.⁶ Hwu, W.W.⁷

27
- 70450024002
- Parallelization schemes for memory optimization on the Cell processor: A case study of image processing algorithm
- Brasov, Romania, Sep
- T. Saidani, S. Piskorski, L. Lacassagne, and S. Bouaziz. Parallelization schemes for memory optimization on the Cell processor: a case study of image processing algorithm. In Proc. Workshop on memory performance (MEDEA), Brasov, Romania, Sep. 2007.
- (2007) Proc. Workshop on memory performance (MEDEA)
- Saidani, T.¹ Piskorski, S.² Lacassagne, L.³ Bouaziz, S.⁴

28
- 51449090534
- Algorithmic performance studies on graphics processing units
- O. Schenk, M. Christen, and H. Burkhart. Algorithmic performance studies on graphics processing units. Journal of Parallel and Distributed Computing, 68(10):1360-1369, 2008.
- (2008) Journal of Parallel and Distributed Computing , vol.68 , Issue.10 , pp. 1360-1369
- Schenk, O.¹ Christen, M.² Burkhart, H.³

29
- 49249086142
- Larrabee: A many-core x86 architecture for visual computing
- Aug
- L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan. Larrabee: a many-core x86 architecture for visual computing. ACM Transactions on Graphics, 27(3), Aug. 2008.
- (2008) ACM Transactions on Graphics , vol.27 , Issue.3
- Seiler, L.¹ Carmean, D.² Sprangle, E.³ Forsyth, T.⁴ Abrash, M.⁵ Dubey, P.⁶ Junkins, S.⁷ Lake, A.⁸ Sugerman, J.⁹ Cavin, R.¹⁰ Espasa, R.¹¹ Grochowski, E.¹² Juan, T.¹³ Hanrahan, P.¹⁴

30
- 70350771131
- Benchmarking GPUs to tune dense linear algebra
- Austin, TX
- V. Volkov and J. W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Proc. Int'l Conf. on High Performance Computing and Networking (SC), Austin, TX, 2008.
- (2008) Proc. Int'l Conf. on High Performance Computing and Networking (SC)
- Volkov, V.¹ Demmel, J.W.²

31
- 56749158843
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms
- S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Proc. Int'l Conf. on High Performance Computing and Networking (SC), 2007.
- (2007) Proc. Int'l Conf. on High Performance Computing and Networking (SC)
- Williams, S.¹ Oliker, L.² Vuduc, R.³ Shalf, J.⁴ Yelick, K.⁵ Demmel, J.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.