IntOGen - Pipeline

Analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.

Requirements:

IntOGen depends on Python 3.4 or above and some Python libraries. If you don't have Python 3.4 or above installed, the easiest way to install the whole software stack is the Anaconda Python distribution.

Perl 5.10 or above (with the DBI module installed) must also be available on your PATH in order to run the VEP scripts.

By default MutsigCV is disabled. To enable it, first download and install the Matlab Runtime and MutsigCV (https://www.broadinstitute.org/cancer/cga/mutsig), then edit the IntOGen configuration file, which by default is at /.intogen/system.conf (parameters: mutsig_enabled, mutsig_path and matlab_mcr).

Installation:

To install or update to the latest stable version of IntOGen, run:

    $ pip install intogen pandas==0.17

After this, the intogen script is available on your path. If this is the first time you install IntOGen, run the setup to download all the data dependencies. The setup downloads 3.6 GB of data, which needs 9 GB of free space once uncompressed.

    $ intogen --setup

TIP: By default the IntOGen configuration files are in /.intogen. To change this folder, define the environment variable INTOGEN_HOME using the export command. Likewise, all datasets are downloaded by default to /.bgdata. To change this folder, define the environment variable BGDATA_LOCAL.

Run an example:

Download and extract some sample VCF files:

    $ wget https://bitbucket.org/intogen/intogen-pipeline/downloads/intogen-samples.tar.gz
    $ tar xvzf intogen-samples.tar.gz

Run IntOGen using the default task configuration:

    $ intogen -i sample1.vcf -i sample2.vcf -i sample3.vcf -i sample4.vcf

Browse the results in the output folder.

Custom configuration:

The default task configuration values are in /.intogen/task.conf. To run the pipeline with different parameters, either change the default values or create a .smconfig file for each project. A .smconfig file is a copy of /.intogen/task.conf with two added parameters, id and files. The id is the name of the project, and files is a comma-separated list of all the files (MAF, VCF or tab format) that contain samples for that project.

You can create a .smconfig file like this:

    $ echo -e "id = allsamples\nfiles = sample1.vcf,sample2.vcf,sample3.vcf,sample4.vcf\n" > allsamples.smconfig
    $ cat /.intogen/task.conf >> allsamples.smconfig

To run it again, delete or move the previous output and run with the .smconfig file as input:

    $ rm -rf output
    $ intogen -i allsamples.smconfig

To run multiple projects at once, create several .smconfig files in one folder and give that folder as input.
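When there are many projects, writing each .smconfig file by hand gets tedious. The following is a minimal Python sketch of the same two steps as the echo and cat commands in the custom configuration instructions: write the id and files parameters, then append a copy of the task configuration. The helper names build_smconfig and write_smconfig are hypothetical and are not part of IntOGen, and the stand-in task.conf contents in the demo are made up for illustration.

```python
import os
import tempfile


def build_smconfig(project_id, files, task_conf_text):
    """Return the text of a .smconfig file (hypothetical helper, not part of IntOGen).

    The result is the id and files parameters, a blank line, and then the
    task configuration text, matching what the echo and cat commands produce.
    """
    if not files:
        raise ValueError("at least one input file is required")
    header = "id = {}\nfiles = {}\n\n".format(project_id, ",".join(files))
    return header + task_conf_text


def write_smconfig(project_id, files, task_conf_path, out_dir="."):
    """Write <project_id>.smconfig into out_dir and return its path."""
    with open(task_conf_path) as handle:
        task_conf_text = handle.read()
    out_path = os.path.join(out_dir, "{}.smconfig".format(project_id))
    with open(out_path, "w") as handle:
        handle.write(build_smconfig(project_id, files, task_conf_text))
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real task.conf; this option name is invented.
        task_conf = os.path.join(tmp, "task.conf")
        with open(task_conf, "w") as handle:
            handle.write("example_option = 1\n")
        path = write_smconfig(
            "allsamples", ["sample1.vcf", "sample2.vcf"], task_conf, tmp
        )
        with open(path) as handle:
            print(handle.read())
```

In real use, task_conf_path would point at your own task.conf and out_dir at the folder you later pass to intogen with -i. Calling write_smconfig once per project fills that folder with one .smconfig file each.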

Identifier
DOI https://doi.org/10.34810/data407
Related Identifier IsCitedBy https://doi.org/10.1038/nmeth.2642
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data407
Provenance
Creator González-Pérez, Abel; Pérez Llamas, Christian, 1976-; Tamborero Noguera, David; Schroeder, Michael Philipp, 1986- (ORCID: 0000-0002-7563-509X); Jené i Sanz, Alba, 1984-; Santos, Alberto; López Bigas, Núria; Déu Pons, Jordi
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Rights Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data407
OpenAccess true
Representation
Resource Type Program source code; Dataset
Format text/x-python; application/x-ipynb+json; application/octet-stream; text/plain; charset=US-ASCII; text/plain; charset=UTF-8; text/plain; text/markdown
Size 1245; 13377; 66; 6876; 682; 6171; 3070; 1840; 1669; 3255; 1017; 1609; 13618; 302; 64076; 4819; 7961; 362; 10182; 15615; 631; 7004; 6021; 2074; 11113; 10141; 322; 1812; 0; 699; 16; 1; 20; 22; 14810; 1279; 11607; 1950; 4706; 847; 4649; 27; 656; 412; 2383; 7249; 12254; 7673; 3858; 2637; 562; 13553; 211; 30; 3437; 8207; 7166; 2163; 113; 3912; 3672; 3937; 3919; 1863; 440; 3381; 5302; 1228; 2222; 2448; 2001; 2202; 1271; 2312; 2017; 1877; 5222; 4945; 819; 798; 3737; 8075; 7587; 15104; 3685; 6210; 3057
Version 1.0
Discipline Life Sciences; Medicine