Package 'tfboot' reference manual

Title:	Bootstrapping and statistical analysis for TFBS-disrupting SNPs
Description:	Bootstrap motifbreakR results for statistical analysis on TFBS-disrupting SNPs in upstream regions of a set of genes of interest.
Authors:	Stephen Turner [aut, cre] , Colossal Biosciences [cph]
Maintainer:	Stephen Turner <[email protected]>
License:	file LICENSE
Version:	1.0.1
Built:	2025-03-11 05:51:04 UTC
Source:	https://github.com/colossal-compsci/tfboot

Get upstream intervals

Description

Get upstream intervals from a GRanges object.

Usage

get_upstream(gr, width = 5000L)
get_upstream(gr, width = 5000L)

Arguments

`gr`	A GenomicRanges object.
`width`	Width of the upstream interval. Default 5000.

Value

A GRanges object with intervals of width specified by width upstream of the start position of the gr input GRanges object.

Examples

gr <- data.frame(seqnames = "chr1",
                 strand = c("+", "-"),
                 start = c(42001, 67001),
                 width = c(1000, 1000),
                 gene = c("A", "B")) |>
  plyranges::as_granges()
gr
get_upstream(gr)
gr <- data.frame(seqnames = "chr1",
                 strand = c("+", "-"),
                 start = c(42001, 67001),
                 width = c(1000, 1000),
                 gene = c("A", "B")) |>
  plyranges::as_granges()
gr
get_upstream(gr)

Get SNPs in upstream regions

Description

Get SNPs upstream of gene regions. Input SNPs read in with read_vcf and an appropriate TxDb object.

Usage

get_upstream_snps(snps, txdb, level = "genes", ...)
get_upstream_snps(snps, txdb, level = "genes", ...)

Arguments

`snps`	SNPs read in with read_vcf.
`txdb`	A GenomicFeatures::TxDb object with transcript annotations for the organism of interest. Must match the organism specified in the `bsgenome` argument of read_vcf.
`level`	Currently only `"genes"` and `"transcripts"` are supported, which run GenomicFeatures::genes and GenomicFeatures::transcripts, respectively.
`...`	Additional arguments passed to get_upstream (e.g., `⁠width=⁠`).

Value

A GRanges object containing SNPs in regions upstream of intervals in the specified TxDb. See get_upstream.

Examples

## Not run: 
library(BSgenome.Ggallus.UCSC.galGal6)
library(TxDb.Ggallus.UCSC.galGal6.refGene)
bs <- BSgenome.Ggallus.UCSC.galGal6
tx <- TxDb.Ggallus.UCSC.galGal6.refGene
vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE)
snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6)
upstreamsnps <- get_upstream_snps(snps=snps, txdb=tx, level="genes")
upstreamsnps

## End(Not run)
## Not run: 
library(BSgenome.Ggallus.UCSC.galGal6)
library(TxDb.Ggallus.UCSC.galGal6.refGene)
bs <- BSgenome.Ggallus.UCSC.galGal6
tx <- TxDb.Ggallus.UCSC.galGal6.refGene
vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE)
snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6)
upstreamsnps <- get_upstream_snps(snps=snps, txdb=tx, level="genes")
upstreamsnps

## End(Not run)

Bootstrap statistics

Description

Get statistics (p-values) on your gene set's motifbreakR results compared to the bootstrapped empirical null distribution.

Usage

mb_bootstats(mbsmry, mbboot)
mb_bootstats(mbsmry, mbboot)

Arguments

`mbsmry`	Results from running mb_summarize on motifbreakR results from your gene set of interest.
`mbboot`	Results from running mb_bootstrap on a full background of all genes. Typically this should be performed once, with results read in from a file.

Value

A tibble with each metric from your bootstrap resampling (see mb_summarize), and p-value comparing the actual value of your gene set (stat) against the empirical null distribution (bootdist).

Examples

data(vignettedata)
mbres <- vignettedata$mbres
mball <- vignettedata$mball
mbsmry <- mb_summarize(mbres)
mbsmry
set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 100)
mbboot
mb_bootstats(mbsmry, mbboot)
data(vignettedata)
mbres <- vignettedata$mbres
mball <- vignettedata$mball
mbsmry <- mb_summarize(mbres)
mbsmry
set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 100)
mbboot
mb_bootstats(mbsmry, mbboot)

Bootstrap motifbreakR results. Takes motifbreakR results as a tibble (from mb_to_tibble), draws boots random samples of ngenes genes, and returns as a list (1) a wide tibble with results for each bootstrap, and (2) another tibble with the distribution of each metric in a listcol. See examples.

Usage

mb_bootstrap(mbtibble, ngenes, boots = 100, key_col = "gene_id")
mb_bootstrap(mbtibble, ngenes, boots = 100, key_col = "gene_id")

Arguments

`mbtibble`	A tibble of motifbreakR results from mb_to_tibble.
`ngenes`	The number of genes to sample with each bootstrap resample.
`boots`	The number of bootstrap resamples.
`key_col`	The name of the column used to key the txdb. Default `gene_id`. May be `transcript_id` or otherwise if you use a different value of `level` in get_upstream_snps.

Details

Typically, you want to run this function on a full set of all genes to create an empirical null distribution. You should run motifbreakR once on all genes, save this as an RData object, and read it in during bootstrap resampling. See vignettes.

Value

A list of two tibbles. See description and examples.

Examples

data(vignettedata)
vignettedata$mball
mb_bootstrap(vignettedata$mball, ngenes=5, boots=5)
data(vignettedata)
vignettedata$mball
mb_bootstrap(vignettedata$mball, ngenes=5, boots=5)

Summarize motifbreakR results

Description

Summarizes motifbreakR results as a tibble from mb_to_tibble. See details.

Usage

mb_summarize(mbtibble, key_col = "gene_id")
mb_summarize(mbtibble, key_col = "gene_id")

Arguments

`mbtibble`	motifbreakR results summarized with mb_to_tibble.
`key_col`	The name of the column used to key the txdb. Default `gene_id`. May be `transcript_id` or otherwise if you use a different value of `level` in get_upstream_snps.

Details

Summarizes motifbreakR results. Returns a tibble with columns indicating:

ngenes: The number of genes in the SNP set.
nsnps: The number of SNPs total.
nstrong: The number of SNPs with a "strong" effect.
alleleDiffAbsMean The mean of the absolute values of the alleleDiff scores.
alleleDiffAbsSum The sum of the absolute values of the alleleDiff scores.
alleleEffectSizeAbsMean The mean of the absolute values of the alleleEffectSize scores.
alleleEffectSizeAbsSum The sum of the absolute values of the alleleEffectSize scores.

Value

A tibble. See description.

Examples

data(vignettedata)
vignettedata$mbres
mb_summarize(vignettedata$mbres)
data(vignettedata)
vignettedata$mbres
mb_summarize(vignettedata$mbres)

motifbreakR results to tibble

Description

Make a compact tibble with only select columns from motifbreakR results GRanges objects.

Usage

mb_to_tibble(mb, key_col = "gene_id")
mb_to_tibble(mb, key_col = "gene_id")

Arguments

`mb`	motifbreakR results GRanges object.
`key_col`	The name of the column used to key the txdb. Default `gene_id`. May be `transcript_id` or otherwise if you use a different value of `level` in get_upstream_snps.

Details

It's a good idea to run motifbreakR once on the background set of all genes and save this as an RData (or .rds) file, and read in when you need them.

Value

A tibble containing the key column (usually gene_id), and a select number of other columns needed for downstream statistical analysis.

Plot bootstrap distributions

Description

Plot bootstrap distributions of motifbreakR results with your critical value highlighted by a vertical red line.

Usage

plot_bootstats(bootstats)
plot_bootstats(bootstats)

Arguments

bootstats

Output from running mb_bootstats on your results and a bootstrap resampling.

Value

A ggplot2 plot object. See description and examples.

Examples

data(vignettedata)
mbres <- vignettedata$mbres
mball <- vignettedata$mball
mbsmry <- mb_summarize(mbres)
set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250)
bootstats <- mb_bootstats(mbsmry, mbboot)
plot_bootstats(bootstats)
data(vignettedata)
mbres <- vignettedata$mbres
mball <- vignettedata$mball
mbsmry <- mb_summarize(mbres)
set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250)
bootstats <- mb_bootstats(mbsmry, mbboot)
plot_bootstats(bootstats)

Read in SNPs from a VCF

Description

Helper function to read in SNP data from a VCF file.

Usage

read_vcf(file, bsgenome)
read_vcf(file, bsgenome)

Arguments

`file`	File path to a VCF file. See details.
`bsgenome`	An object of class BSgenome for the species you are interrogating; see BSgenome::available.genomes for a list of species.

Details

Note that the VCF must be filtered to only contain variant sites (i.e., no 0/0), or only homozygous alt sites if you choose (0/1 or 1/1). This can be accomplished with bcftools:

⁠# Filter to any variant sites:⁠
⁠bcftools view -i 'GT="alt"' ...⁠
⁠# Filter to homozygous alt sites:⁠
⁠bcftools view -i 'GT="AA"' ...⁠

Value

A GRanges object containing SNP_id, REF, and ALT columns.

Examples

## Not run: 
library(BSgenome.Ggallus.UCSC.galGal6)
vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE)
snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6)
snps

## End(Not run)
## Not run: 
library(BSgenome.Ggallus.UCSC.galGal6)
vcf_file <- system.file("extdata", "galGal6-chr33.vcf.gz", package="tfboot", mustWork = TRUE)
snps <- read_vcf(vcf_file, BSgenome.Ggallus.UCSC.galGal6)
snps

## End(Not run)

Split GRanges by gene

Description

Splits a GRanges object into a GRangesList by a column (typically gene_id). This function is deprecated and generally has no good use case. Originally written to split up a GRanges object into a list to iterate over using furrr::future_map(), but deprecated in favor of using built-in parallelization in motifbreakR.

Usage

split_gr_by_id(gr, key_col = "gene_id")
split_gr_by_id(gr, key_col = "gene_id")

Arguments

`gr`	A GRanges object returned by get_upstream_snps.
`key_col`	The name of the column in `gr` to split by (default `gene_id`).

Value

A list of genomic ranges split by split_col.

Examples

## Not run: 
gr <- data.frame(seqnames=rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
           start=1:10,
           width=1,
           gene_id = rep(c("gene1", "gene2", "gene3", "gene4"), c(4, 2, 1, 3))) |>
  plyranges::as_granges()
gr
split_gr_by_id(gr, key_col="gene_id")

## End(Not run)
## Not run: 
gr <- data.frame(seqnames=rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
           start=1:10,
           width=1,
           gene_id = rep(c("gene1", "gene2", "gene3", "gene4"), c(4, 2, 1, 3))) |>
  plyranges::as_granges()
gr
split_gr_by_id(gr, key_col="gene_id")

## End(Not run)

Vignette data

Description

Pre-cooked data used by the vignette. This data is precomputed to speed vignette compilation time and is used throughout the examples. Contains a list of two objects:

mbres: Results from running motifbreakR on five randomly selected genes, then run through mb_to_tibble.
mball: Results from running motifbreakR on all genes, then run through mb_to_tibble.

Usage

vignettedata
vignettedata

Format

An object of class list of length 2.

Package 'tfboot'

Help Index

Get upstream intervals

Description

Usage

Arguments

Value

Examples

Get SNPs in upstream regions

Description

Usage

Arguments

Value

Examples

Bootstrap statistics

Description

Usage

Arguments

Value

Examples

Bootstrap motifbreakR results

Description

Usage

Arguments

Details

Value

Examples

Summarize motifbreakR results

Description

Usage

Arguments

Details

Value

Examples

motifbreakR results to tibble

Description

Usage

Arguments

Details

Value

Plot bootstrap distributions

Description

Usage

Arguments

Value

Examples

Read in SNPs from a VCF

Description

Usage

Arguments

Details

Value

Examples

Split GRanges by gene

Description

Usage

Arguments

Value

Examples

Vignette data

Description

Usage

Format