This repository contains the datasets associated with the manuscript "Prediction of Causal Genes at GWAS Loci with Pleiotropic Gene Regulatory Effects Using Correlated Instrumental Variable Sets." The datasets support the development and validation of a Multivariable Mendelian Randomization (MVMR) method, a statistical technique that uses sets of genetic instruments (SNPs) to estimate the direct causal effects of multiple exposures (genes) on an outcome, here Coronary Artery Disease (CAD).
The method is validated using summary statistics from a Genome-Wide Association Study (GWAS) of CAD and from expression Quantitative Trait Loci (eQTL) analyses of gene expression. The primary goal is to understand the genetic basis of CAD through pleiotropic gene regulatory effects. All files in this dataset were generated from the following GWAS summary statistics and gene expression studies.
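To make the idea of MVMR with correlated instruments concrete, the following is a minimal sketch of a generic generalised least squares (GLS) estimator. It is not taken from the manuscript and may differ from the method developed there. It assumes NumPy is installed, and the function name `mvmr_gls`, its argument names, and all numbers are hypothetical. The SNP-outcome effects are regressed on the SNP-gene effects, weighting by a covariance matrix built from the outcome standard errors and an LD correlation matrix between the instruments.

```python
import numpy as np

def mvmr_gls(B, gamma, se, ld):
    """Generic GLS sketch of multivariable MR with correlated instruments.

    B     : (n_snps, n_genes) SNP-to-gene (exposure) effect estimates
    gamma : (n_snps,) SNP-to-outcome effect estimates
    se    : (n_snps,) standard errors of gamma
    ld    : (n_snps, n_snps) LD correlation matrix between the SNPs
    Returns the direct-effect estimates and their standard errors.
    """
    B = np.asarray(B, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    se = np.asarray(se, dtype=float)
    ld = np.asarray(ld, dtype=float)

    # Covariance of the outcome effects: Omega_ij = se_i * se_j * ld_ij
    omega = np.outer(se, se) * ld

    # Solve linear systems instead of forming the inverse of Omega explicitly
    omega_inv_B = np.linalg.solve(omega, B)
    omega_inv_gamma = np.linalg.solve(omega, gamma)

    cov = np.linalg.inv(B.T @ omega_inv_B)   # (B' Omega^-1 B)^-1
    theta = cov @ (B.T @ omega_inv_gamma)    # GLS point estimates
    return theta, np.sqrt(np.diag(cov))

# Synthetic, noise-free illustration (all values are made up)
B = np.array([[0.40, 0.10],
              [0.30, 0.00],
              [0.05, 0.35],
              [0.10, 0.25]])
true_theta = np.array([0.5, -0.2])
gamma = B @ true_theta
se = np.array([0.010, 0.012, 0.011, 0.009])
idx = np.arange(4)
ld = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # AR(1)-style LD pattern

theta, theta_se = mvmr_gls(B, gamma, se, ld)
```

Because the synthetic outcome effects are constructed without noise, the estimator recovers the effects used to generate them. With real data, the LD matrix would have to come from a reference panel matched to the study population.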
GWAS Summary Data (ebi-a-GCST003116):
- Trait: Coronary Artery Disease (CAD)
- Association Analysis: Instruments (SNPs) to Outcome (CAD)
- Year: 2015
- Population: European
- Source: TwoSampleMR Package
A GWAS summary data file for a trait such as CAD typically reports genetic variants across the entire genome together with their associations with the trait of interest. Common fields include the SNP ID (a unique identifier for each variant, e.g. an rsID), the chromosome and position (genomic location of the variant), the alleles of each variant, the effect size (beta or odds ratio), the standard error of the effect size, the p-value, the minor allele frequency (MAF), and the sample size.
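As an illustration of how these fields relate to one another, the sketch below parses a small summary table and derives the z-statistic, a two-sided p-value, and the odds ratio from the effect size and its standard error. The column names, SNP identifiers, and values are made up for the example and may not match the headers used in the actual files. The odds ratio assumes the effect size is on the log-odds scale.

```python
import csv
import io
import math

# Hypothetical header and made-up rows; real files may use different names.
EXAMPLE_CSV = """SNP,chr,pos,effect_allele,other_allele,beta,se,eaf,samplesize
rs_example_1,1,1000000,A,G,0.050,0.010,0.30,100000
rs_example_2,2,2000000,T,C,-0.020,0.020,0.12,100000
"""

def two_sided_p(beta, se):
    """Two-sided p-value of the Wald statistic beta/se under a standard normal."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def read_summary(handle):
    """Read summary rows and add derived quantities to each record."""
    rows = []
    for rec in csv.DictReader(handle):
        beta, se = float(rec["beta"]), float(rec["se"])
        rows.append({
            "snp": rec["SNP"],
            "beta": beta,
            "se": se,
            "z": beta / se,
            "p": two_sided_p(beta, se),
            "odds_ratio": math.exp(beta),  # assumes beta is a log odds ratio
        })
    return rows

rows = read_summary(io.StringIO(EXAMPLE_CSV))
for row in rows:
    print(row["snp"], round(row["z"], 2), row["p"])
```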
eQTL Analysis Summary Data:
- Source: STARNET/GTEx
- Association Analysis: Instruments (SNPs) to Exposures (Genes)
- Validation Data: GTEx
- Population: European-American subjects
- Validation Data Download Link: GTEx Portal
Nature and Scope:
The eQTL analysis summary data follow the same format as the GWAS data file but describe the expression levels of genes across seven tissues: atherosclerotic aortic root (Aor), blood, atherosclerotic-lesion-free internal mammary artery (Mam), subcutaneous fat (Sf), visceral abdominal fat (Vaf), skeletal muscle (Sklm), and liver (Liv). Each entry gives the association between a genetic variant (SNP) and a gene identified by a gene ID (e.g., Ensembl ID), with the effect size representing the magnitude and direction of the association.
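Because the eQTL and GWAS files share a format, SNP-gene and SNP-CAD effects can be paired by SNP ID before any causal analysis. The sketch below shows one simple way to do this, flipping the sign of the eQTL effect when its effect allele is the GWAS other allele. The field names (`ea`, `oa`, and so on), the function `harmonise`, and every record are hypothetical. Strand flips and palindromic SNPs are deliberately not handled.

```python
def harmonise(eqtl_rows, gwas_by_snp):
    """Align eQTL effects to the GWAS effect allele; skip SNPs that do not match."""
    out = []
    for e in eqtl_rows:
        g = gwas_by_snp.get(e["snp"])
        if g is None:
            continue  # SNP absent from the GWAS summary data
        if (e["ea"], e["oa"]) == (g["ea"], g["oa"]):
            beta_exposure = e["beta"]
        elif (e["ea"], e["oa"]) == (g["oa"], g["ea"]):
            beta_exposure = -e["beta"]  # alleles swapped: flip the sign
        else:
            continue  # allele mismatch (e.g. strand issue), not resolved here
        out.append({
            "snp": e["snp"],
            "gene": e["gene"],
            "tissue": e["tissue"],
            "beta_exposure": beta_exposure,
            "se_exposure": e["se"],
            "beta_outcome": g["beta"],
            "se_outcome": g["se"],
        })
    return out

# Made-up records for illustration only
gwas_by_snp = {
    "rs_example_1": {"ea": "A", "oa": "G", "beta": 0.05, "se": 0.01},
    "rs_example_2": {"ea": "T", "oa": "C", "beta": -0.02, "se": 0.02},
}
eqtl_rows = [
    {"snp": "rs_example_1", "gene": "GENE_X", "tissue": "Aor",
     "ea": "A", "oa": "G", "beta": 0.40, "se": 0.05},
    {"snp": "rs_example_2", "gene": "GENE_X", "tissue": "Aor",
     "ea": "C", "oa": "T", "beta": 0.30, "se": 0.06},
    {"snp": "rs_example_3", "gene": "GENE_X", "tissue": "Liv",
     "ea": "A", "oa": "C", "beta": 0.10, "se": 0.04},
]

pairs = harmonise(eqtl_rows, gwas_by_snp)
```

In this example the third eQTL record is dropped because its SNP has no GWAS entry, and the second has its effect sign flipped to match the GWAS effect allele.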
The dataset comprises several files, including .csv files that provide information on SNPs, gene expression, and outcome effects, and files that present the results of the causal analyses.
A detailed overview of the files is provided in 0_ReadMe.txt.
Python, 3.8.10
R, 4.3.2