Volume 10, Issue 1, 2015, Pages

Experiences with workflows for automating data-intensive bioinformatics

(18)  Spjuth, Ola a   Bongcam Rudloff, Erik b   Hernández, Guillermo Carrasco a   Forer, Lukas c   Giovacchini, Mario a   Guimera, Roman Valls a   Kallio, Aleksi d   Korpelainen, Eija d   Kańdula, Maciej M e   Krachunov, Milko f   Kreil, David P e   Kulev, Ognyan f   Łabaj, Pawel P e   Lampa, Samuel a   Pireddu, Luca g   Schönherr, Sebastian c   Siretskiy, Alexey h   Vassilev, Dimitar i  


Author keywords

Automation; Big data; Data intensive; High performance computing; Reproducibility; Workflow

Indexed keywords

BIOLOGY; HIGH THROUGHPUT SEQUENCING; INFORMATION PROCESSING; PROCEDURES; REPRODUCIBILITY; WORKFLOW

EID: 84939529358     PISSN: None     EISSN: 17456150     Source Type: Journal    
DOI: 10.1186/s13062-015-0071-8     Document Type: Review
Times cited : (53)

References (51)
  • 1
    • 84878979335
    • Biology: The big challenges of big data
    • Marx V. Biology: The big challenges of big data. Nature. 2013; 498(7453):255-60. doi: 10.1038/498255a .
    • (2013) Nature , vol.498 , Issue.7453 , pp. 255-260
    • Marx, V.1
  • 2
    • 84887279087
    • Parallelization in Scientific Workflow Management Systems
    • Bux M, Leser U. Parallelization in Scientific Workflow Management Systems. ArXiv e-prints. 2013. 1303.7195.
    • (2013) ArXiv e-prints , arXiv:1303.7195
    • Bux, M.1    Leser, U.2
  • 3
    • 10244255212
    • Taverna: a tool for the composition and enactment of bioinformatics workflows
    • Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004; 20(17):3045-54. doi: 10.1093/bioinformatics/bth361 .
    • (2004) Bioinformatics , vol.20 , Issue.17 , pp. 3045-3054
    • Oinn, T.1    Addis, M.2    Ferris, J.3    Marvin, D.4    Senger, M.5    Greenwood, M.6
  • 5
    • 25844449770
    • Galaxy: a platform for interactive large-scale genome analysis
    • Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005; 15(10):1451-5. doi: 10.1101/gr.4086505 .
    • (2005) Genome Res , vol.15 , Issue.10 , pp. 1451-1455
    • Giardine, B.1    Riemer, C.2    Hardison, R.C.3    Burhans, R.4    Elnitski, L.5    Shah, P.6
  • 6
    • 77955801615
    • Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences
    • Goecks J, Nekrutenko A, Taylor J, Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010; 11(8):86. doi: 10.1186/gb-2010-11-8-r86 .
    • (2010) Genome Biol , vol.11 , Issue.8 , pp. 86
    • Goecks, J.1    Nekrutenko, A.2    Taylor, J.3
  • 7
    • 5444235762
    • Kepler: an extensible system for design and execution of scientific workflows
    • Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S. Kepler: an extensible system for design and execution of scientific workflows. In: Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference On: 2004. p. 423-4. doi: 10.1109/SSDM.2004.1311241 .
    • (2004) Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference On , pp. 423-424
    • Altintas, I.1    Berkley, C.2    Jaeger, E.3    Jones, M.4    Ludascher, B.5    Mock, S.6
  • 8
    • 80054003242
    • Chipster: user-friendly analysis software for microarray and other high-throughput data
    • Kallio MA, Tuimala JT, Hupponen T, Klemelä P, Gentile M, Scheinin I, et al. Chipster: user-friendly analysis software for microarray and other high-throughput data. BMC Genomics. 2011; 12:507. doi: 10.1186/1471-2164-12-507 .
    • (2011) BMC Genomics , vol.12 , pp. 507
    • Kallio, M.A.1    Tuimala, J.T.2    Hupponen, T.3    Klemelä, P.4    Gentile, M.5    Scheinin, I.6
  • 9
    • 84861746974
    • Bpipe: a tool for running and managing bioinformatics pipelines
    • Sadedin SP, Pope B, Oshlack A. Bpipe: a tool for running and managing bioinformatics pipelines. Bioinformatics. 2012; 28(11):1525-6. doi: 10.1093/bioinformatics/bts167 .
    • (2012) Bioinformatics , vol.28 , Issue.11 , pp. 1525-1526
    • Sadedin, S.P.1    Pope, B.2    Oshlack, A.3
  • 10
    • 84867306721
    • Snakemake - a scalable bioinformatics workflow engine
    • Köster J, Rahmann S. Snakemake - a scalable bioinformatics workflow engine. Bioinformatics. 2012; 28(19):2520-2. doi: 10.1093/bioinformatics/bts480 .
    • (2012) Bioinformatics , vol.28 , Issue.19 , pp. 2520-2522
    • Köster, J.1    Rahmann, S.2
  • 11
    • 0018456754
    • Make - a program for maintaining computer programs
    • Feldman SI. Make - a program for maintaining computer programs. Softw Pract Experience. 1979; 9(4):255-65. doi: 10.1002/spe.4380090402 .
    • (1979) Softw Pract Experience , vol.9 , Issue.4 , pp. 255-265
    • Feldman, S.I.1
  • 13
    • 77954492012
    • Cloud computing and the DNA data race
    • Schatz M, Langmead B, Salzberg S. Cloud computing and the DNA data race. Nat Biotechnol. 2010; 28:691-3. doi: 10.1038/nbt0710-691 .
    • (2010) Nat Biotechnol , vol.28 , pp. 691-693
    • Schatz, M.1    Langmead, B.2    Salzberg, S.3
  • 14
    • 77954526823
    • The case for cloud computing in genome informatics
    • Stein L. The case for cloud computing in genome informatics. Genome Biol. 2010; 11:207. doi: 10.1186/gb-2010-11-5-207 .
    • (2010) Genome Biol , vol.11 , pp. 207
    • Stein, L.1
  • 16
    • 84939531634
    • Hadoop: The Definitive Guide
    • White T. Hadoop: The Definitive Guide, 1st edn. Sebastopol, CA: O'Reilly; 2009. http://oreilly.com/catalog/9780596521981 .
    • (2009)
    • White, T.1
  • 18
    • 85006201950
    • Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data
    • Lampa S, Dahlö M, Olason PI, Hagberg J, Spjuth O. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. Gigascience. 2013; 2(1):9. doi: 10.1186/2047-217X-2-9 .
    • (2013) Gigascience , vol.2 , Issue.1 , pp. 9
    • Lampa, S.1    Dahlö, M.2    Olason, P.I.3    Hagberg, J.4    Spjuth, O.5
  • 19
    • 84861027751
    • Molecular modelling of G protein-coupled receptors through the web
    • Rodríguez D, Bello X, Gutiérrez-de-Terán H. Molecular modelling of G protein-coupled receptors through the web. Mol Inf. 2012; 31(5):334-41. doi: 10.1002/minf.201100162 .
    • (2012) Mol Inf , vol.31 , Issue.5 , pp. 334-341
    • Rodríguez, D.1    Bello, X.2    Gutiérrez-de-Terán, H.3
  • 21
    • 84979619031
    • A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data
    • Siretskiy A, Sundqvist T, Voznesenskiy M, Spjuth O. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data. Gigascience. 2015; 4:26. doi: 10.1186/s13742-015-0058-5 .
    • (2015) Gigascience , vol.4 , pp. 26
    • Siretskiy, A.1    Sundqvist, T.2    Voznesenskiy, M.3    Spjuth, O.4
  • 22
    • 84919475560
    • HTSeq-Hadoop: Extending HTSeq for massively parallel sequencing data analysis using Hadoop
    • Siretskiy A, Spjuth O. HTSeq-Hadoop: Extending HTSeq for massively parallel sequencing data analysis using Hadoop. In: eScience (eScience), 2014 IEEE 10th International Conference On: 2014.
    • (2014) eScience (eScience)
    • Siretskiy, A.1    Spjuth, O.2
  • 23
    • 84928987900
    • HTSeq - a Python framework to work with high-throughput sequencing data
    • Anders S, Pyl PT, Huber W. HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015; 31(2):166-169. doi: 10.1093/bioinformatics/btu638 .
    • (2015) Bioinformatics , vol.31 , Issue.2 , pp. 166-169
    • Anders, S.1    Pyl, P.T.2    Huber, W.3
  • 24
    • 84920550975
    • A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
    • SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol. 2014; 32(9):903-14. doi: 10.1038/nbt.2957 .
    • (2014) Nat Biotechnol , vol.32 , Issue.9 , pp. 903-914
  • 25
    • 84909587930
    • Detecting and correcting systematic variation in large-scale RNA sequencing data
    • Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, et al. Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol. 2014; 32(9):888-95. doi: 10.1038/nbt.3000 .
    • (2014) Nat Biotechnol , vol.32 , Issue.9 , pp. 888-895
    • Li, S.1    Łabaj, P.P.2    Zumbo, P.3    Sykacek, P.4    Shi, W.5    Shi, L.6
  • 28
    • 77958489718
    • Ruffus: a lightweight Python library for computational pipelines
    • Goodstadt L. Ruffus: a lightweight Python library for computational pipelines. Bioinformatics. 2010; 26(21):2778-9. doi: 10.1093/bioinformatics/btq524 .
    • (2010) Bioinformatics , vol.26 , Issue.21 , pp. 2778-2779
    • Goodstadt, L.1
  • 29
    • 84884994218
    • The Cancer Genome Atlas Pan-Cancer analysis project
    • Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013; 45(10):1113-20. doi: 10.1038/ng.2764 .
    • (2013) Nat Genet , vol.45 , Issue.10 , pp. 1113-1120
    • Weinstein, J.N.1    Collisson, E.A.2    Mills, G.B.3    Shaw, K.R.M.4    Ozenberger, B.A.5
  • 30
    • 84859075799
    • Hadoop-BAM: directly manipulating next generation sequencing data in the cloud
    • Niemenmaa M, Kallio A, Schumacher A, Klemelä P, Korpelainen E, Heljanko K. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud. Bioinformatics. 2012; 28(6):876-7. doi: 10.1093/bioinformatics/bts054 . http://bioinformatics.oxfordjournals.org/content/28/6/876.full.pdf+html .
    • (2012) Bioinformatics , vol.28 , Issue.6 , pp. 876-877
    • Niemenmaa, M.1    Kallio, A.2    Schumacher, A.3    Klemelä, P.4    Korpelainen, E.5    Heljanko, K.6
  • 31
    • 84891355779
    • SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop
    • Schumacher A, Pireddu L, Niemenmaa M, Kallio A, Korpelainen E, Zanetti G, et al. SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop. Bioinformatics. 2014; 30(1):119-20. doi: 10.1093/bioinformatics/btt601 . http://bioinformatics.oxfordjournals.org/content/30/1/119.full.pdf+html .
    • (2014) Bioinformatics , vol.30 , Issue.1 , pp. 119-120
    • Schumacher, A.1    Pireddu, L.2    Niemenmaa, M.3    Kallio, A.4    Korpelainen, E.5    Zanetti, G.6
  • 32
    • 77957944901
    • Computational science: ...Error
    • Merali Z. Computational science: ...Error. Nature. 2010; 467(7317):775-7. doi: 10.1038/467775a .
    • (2010) Nature , vol.467 , Issue.7317 , pp. 775-777
    • Merali, Z.1
  • 33
    • 84884878742
    • Genetic variants regulating immune cell levels in health and disease
    • Orrù V, Steri M, Sole G, Sidore C, Virdis F, Dei M, et al. Genetic variants regulating immune cell levels in health and disease. Cell. 2013; 155(1):242-56. doi: 10.1016/j.cell.2013.08.041 .
    • (2013) Cell , vol.155 , Issue.1 , pp. 242-256
    • Orrù, V.1    Steri, M.2    Sole, G.3    Sidore, C.4    Virdis, F.5    Dei, M.6
  • 34
    • 84881048484
    • Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny
    • Francalacci P, Morelli L, Angius A, Berutti R, Reinier F, Atzeni R, et al. Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny. Science (New York, N.Y.) 2013; 341(6145):565-9. doi: 10.1126/science.1237947 .
    • (2013) Science (New York, N.Y.) , vol.341 , Issue.6145 , pp. 565-569
    • Francalacci, P.1    Morelli, L.2    Angius, A.3    Berutti, R.4    Reinier, F.5    Atzeni, R.6
  • 35
    • 84908632088
    • An automated infrastructure to support high-throughput bioinformatics
    • Cuccuru G, Leo S, Lianas L, Muggiri M, Pinna A, Pireddu L, et al. An automated infrastructure to support high-throughput bioinformatics. In: High Performance Computing Simulation (HPCS), 2014 International Conference On: 2014. p. 600-7. doi: 10.1109/HPCSim.2014.6903742 .
    • (2014) High Performance Computing Simulation (HPCS) , pp. 600-607
    • Cuccuru, G.1    Leo, S.2    Lianas, L.3    Muggiri, M.4    Pinna, A.5    Pireddu, L.6
  • 36
    • 79960410567
    • Seal: a distributed short read mapping and duplicate removal tool
    • Pireddu L, Leo S, Zanetti G. Seal: a distributed short read mapping and duplicate removal tool. Bioinformatics. 2011. doi: 10.1093/bioinformatics/btr325 . http://bioinformatics.oxfordjournals.org/content/early/2011/06/22/bioinformatics.btr325.full.pdfhtml .
    • (2011) Bioinformatics
    • Pireddu, L.1    Leo, S.2    Zanetti, G.3
  • 38
    • 78650459836
    • HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups
    • Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner H, Binna R, Specht G, et al. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat. 2011; 32(1):25-32. doi: 10.1002/humu.21382 .
    • (2011) Hum Mutat , vol.32 , Issue.1 , pp. 25-32
    • Kloss-Brandstätter, A.1    Pacher, D.2    Schönherr, S.3    Weissensteiner, H.4    Binna, R.5    Specht, G.6
  • 40
    • 84866781695
    • Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy
    • Afgan E, Chapman B, Jadan M, Franke V, Taylor J. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy. Curr Protoc Bioinformatics. 2012; Chapter 11:11-9. doi: 10.1002/0471250953.bi1109s38 .
    • (2012) Curr Protoc Bioinformatics , pp. 11-19
    • Afgan, E.1    Chapman, B.2    Jadan, M.3    Franke, V.4    Taylor, J.5
  • 44
    • 25744456134
    • Pgmake: A portable distributed make system
    • Lih A, Zadok E. Pgmake: A portable distributed make system. 1994. Technical report.
    • (1994) Technical report.
    • Lih, A.1    Zadok, E.2
  • 45
    • 84870255888
    • Design and implementation of GXP make - a workflow system based on make
    • Taura K, Matsuzaki T, Miwa M, Kamoshida Y, Yokoyama D, Dun N, et al. Design and implementation of GXP make - a workflow system based on make. Future Gener Comput Syst. 2013; 29(2):662-72. doi: 10.1016/j.future.2011.05.026 .
    • (2013) Future Gener Comput Syst , vol.29 , Issue.2 , pp. 662-672
    • Taura, K.1    Matsuzaki, T.2    Miwa, M.3    Kamoshida, Y.4    Yokoyama, D.5    Dun, N.6
  • 49
    • 84939520107
    • Interoperability With Moby 1.0 - It's Better Than Sharing Your Toothbrush!
    • Wilkinson M. Interoperability With Moby 1.0 - It's Better Than Sharing Your Toothbrush! 2008. Available from Nature Precedings.
    • (2008)
    • Wilkinson, M.1
  • 50
    • 79953305104
    • Conveyor: a workflow engine for bioinformatic analyses
    • Linke B, Giegerich R, Goesmann A. Conveyor: a workflow engine for bioinformatic analyses. Bioinformatics. 2011; 27(7):903-11.
    • (2011) Bioinformatics , vol.27 , Issue.7 , pp. 903-911
    • Linke, B.1    Giegerich, R.2    Goesmann, A.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.