Volume , Issue , 2012, Pages 440-451

Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Author keywords

[No Author keywords available]

Indexed keywords

C++ (PROGRAMMING LANGUAGE); CODES (SYMBOLS); MEMORY ARCHITECTURE; PROGRAM COMPILERS;

EID: 84864831385     PISSN: 10636897     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2366231.2337210     Document Type: Conference Paper
Times cited : (72)

References (48)
  • 2
    • 77951472684
    • Direct N-body kernels for multicore platforms
    • N. Arora, A. Shringarpure, and R. W. Vuduc. Direct N-body Kernels for Multicore Platforms. In ICPP, pages 379-387, 2009.
    • (2009) ICPP , pp. 379-387
    • Arora, N.1    Shringarpure, A.2    Vuduc, R.W.3
  • 4
    • 63549095070
    • The PARSEC benchmark suite: Characterization and architectural implications
    • C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT, pages 72-81, 2008.
    • (2008) PACT , pp. 72-81
    • Bienia, C.1    Kumar, S.2    Singh, J.P.3    Li, K.4
  • 5
    • 85015692260
    • The pricing of options and corporate liabilities
    • F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637-654, 1973.
    • (1973) Journal of Political Economy , vol.81 , Issue.3 , pp. 637-654
    • Black, F.1    Scholes, M.2
  • 6
    • 77954942935
    • Low depth cache-oblivious algorithms
    • G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low depth cache-oblivious algorithms. In SPAA, pages 189-199, 2010.
    • (2010) SPAA , pp. 189-199
    • Blelloch, G.E.1    Gibbons, P.B.2    Simhadri, H.V.3
  • 7
    • 79960806724
    • Can CPUs match GPUs on performance with productivity?: Experiences with optimizing a FLOP-intensive application on CPUs and GPU
    • R. Bordawekar, U. Bondhugula, and R. Rao. Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU. IBM Research Report, RC25033, August 2010.
    • (2010) IBM Research Report, RC25033
    • Bordawekar, R.1    Bondhugula, U.2    Rao, R.3
  • 8
    • 0031489544
    • The market model of interest rate dynamics
    • A. Brace, D. Gatarek, and M. Musiela. The Market Model of Interest Rate Dynamics. Mathematical Finance, 7(2):127-155, 1997.
    • (1997) Mathematical Finance , vol.7 , Issue.2 , pp. 127-155
    • Brace, A.1    Gatarek, D.2    Musiela, M.3
  • 11
    • 49249135216
    • Convergence of recognition, mining, and synthesis workloads and its implications
    • Y. K. Chen, J. Chhugani, P. Dubey, C. J. Hughes, D. Kim, S. Kumar, et al. Convergence of recognition, mining, and synthesis workloads and its implications. Proceedings of the IEEE, 96(5):790-807, 2008.
    • (2008) Proceedings of the IEEE , vol.96 , Issue.5 , pp. 790-807
    • Chen, Y.K.1    Chhugani, J.2    Dubey, P.3    Hughes, C.J.4    Kim, D.5    Kumar, S.6
  • 12
    • 84865096511
    • Efficient implementation of sorting on multi-core simd cpu architecture
    • J. Chhugani, A. D. Nguyen, et al. Efficient implementation of sorting on multi-core simd cpu architecture. PVLDB, 1(2):1313-1324, 2008.
    • (2008) PVLDB , vol.1 , Issue.2 , pp. 1313-1324
    • Chhugani, J.1    Nguyen, A.D.2
  • 16
    • 36949031604
    • A platform 2015 workload model: Recognition, mining and synthesis moves computers to the era of tera
    • P. Dubey. A Platform 2015 Workload Model: Recognition, Mining and Synthesis Moves Computers to the Era of Tera. Intel, 2005.
    • (2005) Intel
    • Dubey, P.1
  • 17
    • 8344245462
    • Vectorization for simd architectures with alignment constraints
    • A. E. Eichenberger, P. Wu, and K. O'Brien. Vectorization for simd architectures with alignment constraints. In PLDI, pages 82-93, 2004.
    • (2004) PLDI , pp. 82-93
    • Eichenberger, A.E.1    Wu, P.2    O'Brien, K.3
  • 18
    • 78650646788
    • Joint forces: From multithreaded programming to GPU computing
    • F. Feinbube, P. Tröger, and A. Polze. Joint Forces: From Multithreaded Programming to GPU Computing. IEEE Softw., 28:51-57, January 2011.
    • (2011) IEEE Softw. , vol.28 , pp. 51-57
    • Feinbube, F.1    Tröger, P.2    Polze, A.3
  • 20
    • 0042482650
    • 'N-body' problems in statistical learning
    • A. G. Gray and A. W. Moore. 'N-Body' Problems in Statistical Learning. In NIPS, pages 521-527, 2000.
    • (2000) NIPS , pp. 521-527
    • Gray, A.G.1    Moore, A.W.2
  • 24
    • 85184646781
    • Intel. Optimization Notice. http://software.intel.com/en-us/articles/optimization-notice/, 2012.
    • (2012) Optimization Notice
  • 25
    • 78650874239
    • Performance evaluation of convolution on the cell broadband engine processor
    • L. Ismail and D. Guerchi. Performance Evaluation of Convolution on the Cell Broadband Engine Processor. IEEE PDS, 22(2):337-351, 2011.
    • (2011) IEEE PDS , vol.22 , Issue.2 , pp. 337-351
    • Ismail, L.1    Guerchi, D.2
  • 28
    • 77954701719
    • FAST: Fast architecture sensitive tree search on modern CPUs and GPUs
    • C. Kim, J. Chhugani, N. Satish, et al. FAST: Fast Architecture Sensitive Tree search on modern CPUs and GPUs. In SIGMOD, pages 339-350, 2010.
    • (2010) SIGMOD , pp. 339-350
    • Kim, C.1    Chhugani, J.2    Satish, N.3
  • 29
    • 84864839397
    • Closing the ninja performance gap through traditional programming and compiler technology
    • C. Kim, N. Satish, J. Chhugani, et al. Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology. Technical report, Intel Labs, 2011.
    • (2011) Technical Report Intel Labs
    • Kim, C.1    Satish, N.2    Chhugani, J.3
  • 31
    • 78650666949
    • A synergetic approach to throughput computing on x86-based multicore desktops
    • C.-K. Luk, R. Newton, et al. A synergetic approach to throughput computing on x86-based multicore desktops. IEEE Software, 28:39-50, 2011.
    • (2011) IEEE Software , vol.28 , pp. 39-50
    • Luk, C.-K.1    Newton, R.2
  • 32
    • 0035311079
    • Power: A first-class architectural design constraint
    • T. N. Mudge. Power: A first-class architectural design constraint. IEEE Computer, 34(4):52-58, 2001.
    • (2001) IEEE Computer , vol.34 , Issue.4 , pp. 52-58
    • Mudge, T.N.1
  • 33
    • 78650806116
    • 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
    • A. Nguyen, N. Satish, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs. In SC10, pages 1-13, 2010.
    • (2010) SC10 , pp. 1-13
    • Nguyen, A.1    Satish, N.2
  • 34
    • 79953275887
    • Multi-platform auto-vectorization
    • D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In CGO, pages 281-294, 2006.
    • (2006) CGO , pp. 281-294
    • Nuzman, D.1    Henderson, R.2
  • 35
    • 63549093768
    • Outer-loop vectorization: Revisited for short simd architectures
    • D. Nuzman and A. Zaks. Outer-loop vectorization: revisited for short simd architectures. In PACT, pages 2-11, 2008.
    • (2008) PACT , pp. 2-11
    • Nuzman, D.1    Zaks, A.2
  • 38
    • 85184635665
    • Black-Scholes option pricing
    • V. Podlozhnyuk. Black-Scholes option pricing. Nvidia, 2007.
    • (2007) Nvidia
    • Podlozhnyuk, V.1
  • 39
    • 79959466764
    • Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
    • S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In PPoPP, pages 73-82, 2008.
    • (2008) PPoPP , pp. 73-82
    • Ryoo, S.1    Rodrigues, C.I.2    Baghsorkhi, S.S.3    Stone, S.S.4    Kirk, D.B.5    Hwu, W.-M.W.6
  • 40
    • 77954743119
    • Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort
    • N. Satish, C. Kim, J. Chhugani, et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351-362, 2010.
    • (2010) SIGMOD , pp. 351-362
    • Satish, N.1    Kim, C.2    Chhugani, J.3
  • 43
    • 70350681243
    • Mapping high-fidelity volume rendering for medical imaging to CPU, GPU and many-core architectures
    • M. Smelyanskiy, D. Holmes, et al. Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures. IEEE Trans. Vis. Comput. Graph., 15(6):1563-1570, 2009.
    • (2009) IEEE Trans. Vis. Comput. Graph. , vol.15 , Issue.6 , pp. 1563-1570
    • Smelyanskiy, M.1    Holmes, D.2
  • 45
    • 67650998701
    • Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms
    • S. Williams, J. Carter, L. Oliker, J. Shalf, and K. A. Yelick. Optimization of a lattice boltzmann computation on state-of-the-art multicore platforms. J. Parallel Distrib. Comput., 69(9):762-777, 2009.
    • (2009) J. Parallel Distrib. Comput. , vol.69 , Issue.9 , pp. 762-777
    • Williams, S.1    Carter, J.2    Oliker, L.3    Shalf, J.4    Yelick, K.A.5
  • 46
    • 77952554764
    • An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth
    • D. H. Woo, N. H. Seong, D. L. Lewis, and H.-H. S. Lee. An optimized 3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth. In HPCA, pages 1-12, 2010.
    • (2010) HPCA , pp. 1-12
    • Woo, D.H.1    Seong, N.H.2    Lewis, D.L.3    Lee, H.-H.S.4
  • 47
    • 77954691442
    • A GPGPU compiler for memory optimization and parallelism management
    • Y. Yang, P. Xiang, J. Kong, and H. Zhou. A GPGPU compiler for memory optimization and parallelism management. In PLDI, pages 86-97, 2010.
    • (2010) PLDI , pp. 86-97
    • Yang, Y.1    Xiang, P.2    Kong, J.3    Zhou, H.4
  • 48
    • 77954699806
    • Bamboo: A data-centric, object-oriented approach to many-core software
    • J. Zhou and B. Demsky. Bamboo: a data-centric, object-oriented approach to many-core software. In PLDI, pages 388-399, 2010.
    • (2010) PLDI , pp. 388-399
    • Zhou, J.1    Demsky, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.