Title: | Bootstrapping and statistical analysis for TFBS-disrupting SNPs |
---|---|
Description: | Bootstrap motifbreakR results for statistical analysis on TFBS-disrupting SNPs in upstream regions of a set of genes of interest. |
Authors: | Stephen Turner [aut, cre] , Colossal Biosciences [cph] |
Maintainer: | Stephen Turner <[email protected]> |
License: | file LICENSE |
Version: | 1.0.1 |
Built: | 2024-11-11 05:35:14 UTC |
Source: | https://github.com/colossal-compsci/tfboot |
Get upstream intervals from a GRanges object.
get_upstream(gr, width = 5000L)
get_upstream(gr, width = 5000L)
gr |
A GenomicRanges object. |
width |
Width of the upstream interval. Default 5000. |
A GRanges object with intervals of width specified by width
upstream of the start position of the gr
input GRanges object.
gr <- data.frame(seqnames = "chr1", strand = c("+", "-"), start = c(42001, 67001), width = c(1000, 1000), gene = c("A", "B")) |> plyranges::as_granges() gr get_upstream(gr)
gr <- data.frame(seqnames = "chr1", strand = c("+", "-"), start = c(42001, 67001), width = c(1000, 1000), gene = c("A", "B")) |> plyranges::as_granges() gr get_upstream(gr)
Get SNPs upstream of gene regions. Input SNPs read in with read_vcf and an appropriate TxDb object.
get_upstream_snps(snps, txdb, level = "genes", ...)
get_upstream_snps(snps, txdb, level = "genes", ...)
snps |
SNPs read in with read_vcf. |
txdb |
A GenomicFeatures::TxDb object with transcript annotations for the organism of interest. Must match the organism specified in the |
level |
Currently only |
... |
Additional arguments passed to get_upstream (e.g., |
A GRanges object containing SNPs in regions upstream of intervals in the specified TxDb. See get_upstream.
## Not run: library(BSgenome.Ggallus.UCSC.galGal6) library(TxDb.Ggallus.UCSC.galGal6.refGene) bs <- BSgenome.Ggallus.UCSC.galGal6 tx <- TxDb.Ggallus.UCSC.galGal6.refGene vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE) snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6) upstreamsnps <- get_upstream_snps(snps=snps, txdb=tx, level="genes") upstreamsnps ## End(Not run)
## Not run: library(BSgenome.Ggallus.UCSC.galGal6) library(TxDb.Ggallus.UCSC.galGal6.refGene) bs <- BSgenome.Ggallus.UCSC.galGal6 tx <- TxDb.Ggallus.UCSC.galGal6.refGene vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE) snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6) upstreamsnps <- get_upstream_snps(snps=snps, txdb=tx, level="genes") upstreamsnps ## End(Not run)
Get statistics (p-values) on your gene set's motifbreakR results compared to the bootstrapped empirical null distribution.
mb_bootstats(mbsmry, mbboot)
mb_bootstats(mbsmry, mbboot)
mbsmry |
Results from running mb_summarize on motifbreakR results from your gene set of interest. |
mbboot |
Results from running mb_bootstrap on a full background of all genes. Typically this should be performed once, with results read in from a file. |
A tibble with each metric from your bootstrap resampling (see mb_summarize), and p-value comparing the actual value of your gene set (stat
) against the empirical null distribution (bootdist
).
data(vignettedata) mbres <- vignettedata$mbres mball <- vignettedata$mball mbsmry <- mb_summarize(mbres) mbsmry set.seed(42) mbboot <- mb_bootstrap(mball, ngenes=5, boots = 100) mbboot mb_bootstats(mbsmry, mbboot)
data(vignettedata) mbres <- vignettedata$mbres mball <- vignettedata$mball mbsmry <- mb_summarize(mbres) mbsmry set.seed(42) mbboot <- mb_bootstrap(mball, ngenes=5, boots = 100) mbboot mb_bootstats(mbsmry, mbboot)
Bootstrap motifbreakR results. Takes motifbreakR results as a tibble (from
mb_to_tibble), draws boots
random samples of ngenes
genes, and returns
as a list (1) a wide tibble with results for each bootstrap, and (2) another
tibble with the distribution of each metric in a listcol. See examples.
mb_bootstrap(mbtibble, ngenes, boots = 100, key_col = "gene_id")
mb_bootstrap(mbtibble, ngenes, boots = 100, key_col = "gene_id")
mbtibble |
A tibble of motifbreakR results from mb_to_tibble. |
ngenes |
The number of genes to sample with each bootstrap resample. |
boots |
The number of bootstrap resamples. |
key_col |
The name of the column used to key the txdb. Default |
Typically, you want to run this function on a full set of all genes to create an empirical null distribution. You should run motifbreakR once on all genes, save this as an RData object, and read it in during bootstrap resampling. See vignettes.
A list of two tibbles. See description and examples.
data(vignettedata) vignettedata$mball mb_bootstrap(vignettedata$mball, ngenes=5, boots=5)
data(vignettedata) vignettedata$mball mb_bootstrap(vignettedata$mball, ngenes=5, boots=5)
Summarizes motifbreakR results as a tibble from mb_to_tibble. See details.
mb_summarize(mbtibble, key_col = "gene_id")
mb_summarize(mbtibble, key_col = "gene_id")
mbtibble |
motifbreakR results summarized with mb_to_tibble. |
key_col |
The name of the column used to key the txdb. Default |
Summarizes motifbreakR results. Returns a tibble with columns indicating:
ngenes
: The number of genes in the SNP set.
nsnps
: The number of SNPs total.
nstrong
: The number of SNPs with a "strong" effect.
alleleDiffAbsMean
The mean of the absolute values of the alleleDiff
scores.
alleleDiffAbsSum
The sum of the absolute values of the alleleDiff
scores.
alleleEffectSizeAbsMean
The mean of the absolute values of the alleleEffectSize
scores.
alleleEffectSizeAbsSum
The sum of the absolute values of the alleleEffectSize
scores.
A tibble. See description.
data(vignettedata) vignettedata$mbres mb_summarize(vignettedata$mbres)
data(vignettedata) vignettedata$mbres mb_summarize(vignettedata$mbres)
Make a compact tibble with only select columns from motifbreakR results GRanges objects.
mb_to_tibble(mb, key_col = "gene_id")
mb_to_tibble(mb, key_col = "gene_id")
mb |
motifbreakR results GRanges object. |
key_col |
The name of the column used to key the txdb. Default |
It's a good idea to run motifbreakR once on the background set of all genes and save this as an RData (or .rds) file, and read in when you need them.
A tibble containing the key column (usually gene_id
), and a select number of other columns needed for downstream statistical analysis.
Plot bootstrap distributions of motifbreakR results with your critical value highlighted by a vertical red line.
plot_bootstats(bootstats)
plot_bootstats(bootstats)
bootstats |
Output from running mb_bootstats on your results and a bootstrap resampling. |
A ggplot2 plot object. See description and examples.
data(vignettedata) mbres <- vignettedata$mbres mball <- vignettedata$mball mbsmry <- mb_summarize(mbres) set.seed(42) mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250) bootstats <- mb_bootstats(mbsmry, mbboot) plot_bootstats(bootstats)
data(vignettedata) mbres <- vignettedata$mbres mball <- vignettedata$mball mbsmry <- mb_summarize(mbres) set.seed(42) mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250) bootstats <- mb_bootstats(mbsmry, mbboot) plot_bootstats(bootstats)
Helper function to read in SNP data from a VCF file.
read_vcf(file, bsgenome)
read_vcf(file, bsgenome)
file |
File path to a VCF file. See details. |
bsgenome |
An object of class BSgenome for the species you are interrogating; see BSgenome::available.genomes for a list of species. |
Note that the VCF must be filtered to only contain variant sites (i.e.,
no 0/0
), or only homozygous alt sites if you choose (0/1
or 1/1
). This
can be accomplished with bcftools:
# Filter to any variant sites:
bcftools view -i 'GT="alt"' ...
# Filter to homozygous alt sites:
bcftools view -i 'GT="AA"' ...
A GRanges object containing SNP_id
, REF
, and ALT
columns.
## Not run: library(BSgenome.Ggallus.UCSC.galGal6) vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE) snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6) snps ## End(Not run)
## Not run: library(BSgenome.Ggallus.UCSC.galGal6) vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE) snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6) snps ## End(Not run)
Splits a GRanges object into a GRangesList by a column (typically gene_id
).
This function is deprecated and generally has no good use case. Originally
written to split up a GRanges object into a list to iterate over using
furrr::future_map()
, but deprecated in favor of using built-in
parallelization in motifbreakR.
split_gr_by_id(gr, key_col = "gene_id")
split_gr_by_id(gr, key_col = "gene_id")
gr |
A GRanges object returned by get_upstream_snps. |
key_col |
The name of the column in |
A list of genomic ranges split by split_col
.
## Not run: gr <- data.frame(seqnames=rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), start=1:10, width=1, gene_id = rep(c("gene1", "gene2", "gene3", "gene4"), c(4, 2, 1, 3))) |> plyranges::as_granges() gr split_gr_by_id(gr, key_col="gene_id") ## End(Not run)
## Not run: gr <- data.frame(seqnames=rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), start=1:10, width=1, gene_id = rep(c("gene1", "gene2", "gene3", "gene4"), c(4, 2, 1, 3))) |> plyranges::as_granges() gr split_gr_by_id(gr, key_col="gene_id") ## End(Not run)
Pre-cooked data used by the vignette. This data is precomputed to speed vignette compilation time and is used throughout the examples. Contains a list of two objects:
mbres
: Results from running motifbreakR on five randomly selected genes, then run through mb_to_tibble.
mball
: Results from running motifbreakR on all genes, then run through mb_to_tibble.
vignettedata
vignettedata
An object of class list
of length 2.