Title: | Turner Miscellaneous |
---|---|
Description: | Miscellaneous utility functions for data manipulation, data tidying, and working with gene expression data and biological sequence data. |
Authors: | Stephen Turner [aut, cre] |
Maintainer: | Stephen Turner <[email protected]> |
License: | GPL-3 |
Version: | 1.1.0 |
Built: | 2024-11-01 04:54:24 UTC |
Source: | https://github.com/stephenturner/Tmisc |
Returns a logical vector indicating which elements of x match the regular expression pattern.
x %like% pattern
x | a vector (numeric, character, factor) |
pattern | a regular expression to match against x |
A logical vector the same length as x, indicating which elements of x are like pattern.
(Name <- c("Mary","George","Martha"))
Name %in% c("Mary")
Name %like% "^Mar"
Name %nin% c("George")
Name %nlike% "^Mar"
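A minimal base-R sketch of how a regex-matching infix operator can work. It assumes the operator is a thin wrapper around grepl(). The name %like_sketch% is hypothetical and is not part of Tmisc.

```r
# Hypothetical stand-in for %like%: TRUE where an element of x matches the regex.
# grepl() coerces factors to their labels and returns FALSE for NA elements.
`%like_sketch%` <- function(x, pattern) {
  grepl(pattern, x)
}

Name <- c("Mary", "George", "Martha")
Name %like_sketch% "^Mar"
```

The negated form, in the spirit of %nlike%, is `!(x %like_sketch% pattern)`.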
Returns a logical vector indicating which elements of x are not in table.
x %nin% table
x | a vector (numeric, character, factor) |
table | a vector (numeric, character, factor), matching the mode of x |
A logical vector the same length as x, indicating which elements of x are not in table.
1:10 %nin% seq(from=2, to=10, by=2)
c("a", "b", "c") %nin% c("a", "b")
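A "not in" operator can be sketched as the element-wise negation of base R's %in%. The name %nin_sketch% is hypothetical and is used here only for illustration.

```r
# Hypothetical stand-in for %nin%: negate %in% element by element.
# %in% is built on match(), so an NA in x counts as "in" table only if table has an NA.
`%nin_sketch%` <- function(x, table) {
  !(x %in% table)
}

1:10 %nin_sketch% seq(from = 2, to = 10, by = 2)
c("a", "b", "c") %nin_sketch% c("a", "b")
```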
Returns a logical vector indicating which elements of x do not match the regular expression pattern.
x %nlike% pattern
x | a vector (numeric, character, factor) |
pattern | a regular expression to match against x |
A logical vector the same length as x, indicating which elements of x are not like pattern.
(Name <- c("Mary","George","Martha"))
Name %in% c("Mary")
Name %like% "^Mar"
Name %nin% c("George")
Name %nlike% "^Mar"
Call these functions as RStudio addins to insert the desired text at the cursor position. After installing Tmisc, open the Addins menu and optionally assign a keyboard shortcut (e.g., Command+Shift+I or Alt+-).
Are all the elements of a numeric vector (approximately) equal?
are_all_equal(x, na.rm = FALSE)
x | A numeric vector. |
na.rm | Logical; remove missing values before comparing? FALSE by default, in which case any NA in x makes the result NA. |
Logical: whether all elements of x are (approximately) equal.
are_all_equal(c(5,5,5))
are_all_equal(c(5,5,5,6))
are_all_equal(c(5,5,5,NA,6))
are_all_equal(c(5,5,5,NA,6), na.rm=TRUE)
5==5.000000001
identical(5, 5.000000001)
are_all_equal(c(5L, 5, 5.000000001))
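One way to implement "approximately equal" is to compare the spread of x (max minus min) against a small absolute tolerance. This is only a sketch. The function name and the tol argument are hypothetical. The tolerance value borrows all.equal()'s default, although all.equal() applies that value to a relative difference rather than an absolute one.

```r
# Hypothetical sketch: all elements count as "equal" when max - min is below tol.
# sqrt(.Machine$double.eps) (about 1.5e-8) is all.equal()'s default tolerance.
all_equal_sketch <- function(x, na.rm = FALSE, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = na.rm)  # c(NA, NA) if x contains NA and na.rm = FALSE
  abs(rng[2] - rng[1]) < tol
}

all_equal_sketch(c(5, 5, 5, NA, 6), na.rm = TRUE)
all_equal_sketch(c(5L, 5, 5.000000001))
```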
Prints the first n rows and columns of a data frame or matrix.
corner(x, n = 5)
x | A data frame or matrix. |
n | The number of rows/columns to print. |
The top-left corner of x: its first n rows and first n columns.
corner(mtcars)
corner(iris, n=4)
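The "corner" can be taken with ordinary two-dimensional indexing. The following is a minimal sketch using the hypothetical name corner_sketch. It caps n at the object's dimensions so that small inputs do not produce out-of-range indices.

```r
# Hypothetical sketch: return the top-left n x n block of a data frame or matrix.
# min() guards against objects with fewer than n rows or columns.
corner_sketch <- function(x, n = 5) {
  x[seq_len(min(n, nrow(x))), seq_len(min(n, ncol(x))), drop = FALSE]
}

corner_sketch(mtcars)
corner_sketch(iris, n = 4)
```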
Takes a count matrix and a vector of gene lengths and returns an FPKM matrix, optionally log2-transformed. Modified from edgeR.
counts2fpkm(x, length, log = FALSE, prior.count = 0.25)
x | a matrix of counts. |
length | a vector of gene lengths in bases, one per row of x. |
log | logical; if TRUE, log2-transformed values are returned. |
prior.count | average count to be added to each observation to avoid taking the log of zero. Used only if log = TRUE. |
A matrix of FPKM values.
set.seed(123)
genecounts <- matrix(sample(c(rep(0, 50), 1:100), 30), nrow=10)
lengths <- sample(1000:10000, 10)
counts2fpkm(genecounts, lengths)
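FPKM divides each count by the gene length in kilobases and by the sample's library size in millions of reads. The following is a minimal sketch of that formula with hypothetical names. It uses raw column totals as library sizes and leaves out the log = TRUE / prior.count path. It also omits any normalization factors edgeR may apply, so it is not the package's code.

```r
# Hypothetical sketch of the FPKM formula (not the package's code):
#   FPKM[g, s] = counts[g, s] / (gene_length[g] / 1e3) / (lib_size[s] / 1e6)
# Library size is the raw column total; no normalization factors, no log2.
fpkm_sketch <- function(counts, gene_length) {
  counts <- as.matrix(counts)
  stopifnot(nrow(counts) == length(gene_length))
  lib_size <- colSums(counts)                            # reads per sample
  per_million <- sweep(counts, 2, lib_size / 1e6, "/")   # counts per million
  per_million / (gene_length / 1e3)                      # recycles down columns: row g / length_kb[g]
}

counts <- matrix(c(10, 30, 20, 20), nrow = 2)
fpkm_sketch(counts, gene_length = c(1000, 2000))
```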
Prints the specified number of rows of a data frame, followed by a row of ellipses. Useful for piping to knitr::kable() to print a truncated table in a markdown document.
ellipses(df, n = 5L)
df | A data frame. |
n | The number of rows to show before the row of ellipses. |
A data frame truncated by a row of ellipses.
ellipses(mtcars, 5)
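One way to build such a table is to keep the first n rows, convert every column to character so it can hold "...", and append one extra row. The following is a sketch under that assumption. The name ellipses_sketch is hypothetical.

```r
# Hypothetical sketch: first n rows, columns as character, then one row of "...".
ellipses_sketch <- function(df, n = 5L) {
  top <- head(df, n)
  top[] <- lapply(top, as.character)  # character columns can hold "..."
  dots <- top[1, , drop = FALSE]      # a one-row template with the same columns
  dots[1, ] <- "..."
  rownames(dots) <- "..."
  rbind(top, dots)
}

ellipses_sketch(mtcars, 5)
```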
Uses Fisher's method to combine p-values from different tests.
fisherp(x)
x | A vector of p-values between 0 and 1. |
The combined p-value.
Fisher, R.A. (1925). Statistical Methods for Research Workers.
https://en.wikipedia.org/wiki/Fisher%27s_method.
fisherp(c(.042, .02, .001, 0.01, .89))
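Fisher's method combines k independent p-values through the statistic X2 = -2 * sum(log(p_i)). Under the joint null hypothesis, this statistic follows a chi-squared distribution with 2k degrees of freedom. The combined p-value is its upper-tail probability. The following is a minimal sketch that assumes the tests are independent. The name fisher_sketch is hypothetical.

```r
# Hypothetical sketch of Fisher's method (assumes independent tests):
#   X2 = -2 * sum(log(p)) ~ chi-squared with 2k degrees of freedom under H0
fisher_sketch <- function(p) {
  stopifnot(all(p >= 0 & p <= 1))
  x2 <- -2 * sum(log(p))
  pchisq(x2, df = 2 * length(p), lower.tail = FALSE)
}

fisher_sketch(c(.042, .02, .001, 0.01, .89))
```

With a single p-value, the method returns that p-value unchanged. This is because -2 ln p is chi-squared with 2 degrees of freedom when p is uniform.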
Plots missing data as holes on a black canvas.
gg_na(df)
df | A data frame. |
Emulates ggplot2's default hues: colors equally spaced around the color wheel, starting at a hue of 15.
gghues(n, start = 15)
n | Numeric; the number of hues to generate. |
start | Numeric; the place on the color wheel to start. The ggplot2 default is 15. |
A character vector of n hex color codes.
n <- 10
gghues(3)
barplot(rep(1,n), col=gghues(n), names=gghues(n))
barplot(rep(1,n), col=gghues(n, start=15+180), names=gghues(n, start=15+180))
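Equally spaced hues can be generated by splitting the 360-degree wheel into n steps. The resulting angles are then passed to grDevices::hcl(). The following is a sketch with a hypothetical name. The chroma of 100 and luminance of 65 are assumed to match ggplot2's defaults.

```r
# Hypothetical sketch of ggplot2-style hues. The wheel wraps around, so we
# generate n + 1 angles from start to start + 360 and drop the last one,
# which would repeat the first hue. c = 100 and l = 65 are assumptions.
gghues_sketch <- function(n, start = 15) {
  hues <- seq(start, start + 360, length.out = n + 1)
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]
}

gghues_sketch(3)
```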
Gets a two-letter genotype from a VCF GT field. The current implementation is quick and dirty and only accepts 0/0, 0/1, or 1/1; any other value of gt returns a missing value.
gt2refalt(gt, ref, alt)
gt | The genotype field (must be 0/0, 0/1, or 1/1). |
ref | The reference allele. |
alt | The alternate allele. |
A two-letter genotype, or NA if gt is not 0/0, 0/1, or 1/1.
gt2refalt(gt="0/0", ref="R", alt="A")
gt2refalt(gt="0/1", ref="R", alt="A")
gt2refalt(gt="1/1", ref="R", alt="A")
gt2refalt(gt="0/2", ref="R", alt="A")
gt2refalt(gt="./.", ref="R", alt="A")
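A vectorized way to express the same rule is to look up each GT code among the three supported values with match(). The match result then picks the corresponding two-letter call. The following is a sketch, not the package's implementation. The name is hypothetical, and writing the reference allele first for heterozygotes ("RA") is an assumption.

```r
# Hypothetical vectorized sketch: unsupported codes (e.g. "0/2", "./.") have
# no match, and an NA row in a matrix index yields NA in the result.
gt2refalt_sketch <- function(gt, ref, alt) {
  n <- length(gt)
  ref <- rep_len(ref, n)
  alt <- rep_len(alt, n)
  calls <- cbind(paste0(ref, ref), paste0(ref, alt), paste0(alt, alt))
  idx <- match(gt, c("0/0", "0/1", "1/1"))
  calls[cbind(seq_len(n), idx)]
}

gt2refalt_sketch(c("0/0", "0/1", "1/1", "0/2", "./."), ref = "R", alt = "A")
```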
Calculates a distance matrix from a matrix of probability distributions using Jensen-Shannon divergence. Adapted from https://enterotype.embl.de/.
jsd(M, pseudocount = 1e-06, normalizeCounts = FALSE)
M | a probability distribution matrix, e.g., normalized transcript compatibility counts. |
pseudocount | a small number to avoid division by zero errors. |
normalizeCounts | logical, whether to attempt to normalize by dividing by the column sums. Set to TRUE if M is not already normalized (e.g., raw counts; see the examples). |
A Jensen-Shannon divergence-based distance matrix.
https://web.archive.org/web/20240131141033/https://enterotype.embl.de/enterotypes.html#dm.
set.seed(42)
M <- matrix(rpois(100, lambda=100), ncol=5)
colnames(M) <- paste0("sample", 1:5)
rownames(M) <- paste0("gene", 1:20)
Mnorm <- apply(M, 2, function(x) x/sum(x))
Mjsd <- jsd(Mnorm)
# equivalently
Mjsd <- jsd(M, normalizeCounts=TRUE)
Mjsd
plot(hclust(Mjsd))
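For two distributions P and Q with mixture M = (P + Q)/2, the Jensen-Shannon divergence is JSD = 0.5 * KL(P || M) + 0.5 * KL(Q || M). Its square root is a true metric. The following is a minimal sketch that computes that square root for every pair of columns. It assumes the columns already sum to 1, which corresponds to normalizeCounts = FALSE. The function name is hypothetical, and using the square root as the "divergence-based distance" is an assumption.

```r
# Hypothetical sketch of a Jensen-Shannon distance matrix between columns.
# Zeros are replaced by a pseudocount so that log(p / q) stays finite.
jsd_sketch <- function(M, pseudocount = 1e-6) {
  M[M == 0] <- pseudocount
  kld <- function(p, q) sum(p * log(p / q))        # Kullback-Leibler divergence
  js_dist <- function(p, q) {
    m <- (p + q) / 2                               # mixture distribution
    sqrt(0.5 * kld(p, m) + 0.5 * kld(q, m))        # square root of JS divergence
  }
  k <- ncol(M)
  D <- matrix(0, k, k, dimnames = list(colnames(M), colnames(M)))
  for (i in seq_len(k)) for (j in seq_len(k)) D[i, j] <- js_dist(M[, i], M[, j])
  stats::as.dist(D)
}

P <- cbind(a = c(0.5, 0.5), b = c(0.5, 0.5), c = c(0.2, 0.8))
jsd_sketch(P)
```

Because natural logs are used, every distance is bounded by sqrt(log(2)).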
Sometimes you want to plot p-values (e.g., in a volcano plot or MA-plot), but a statistical test that returns a p-value of zero causes problems when visualizing on the log scale. This function returns a vector in which zero values are replaced by the smallest nonzero value in the vector.
lowestnonzero(x)
x | A vector of p-values between 0 and 1. |
A vector of p-values in which zero values are replaced by the lowest nonzero p-value in the original vector.
lowestnonzero(c(.042, .02, 0, .001, 0, .89))
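The replacement can be done in two steps. First, find the smallest strictly positive value. Second, assign it to the positions that are exactly zero. The following is a sketch with the hypothetical name lowestnonzero_sketch. It also shows that the result survives a -log10 transform.

```r
# Hypothetical sketch: replace exact zeros with the smallest strictly positive value.
lowestnonzero_sketch <- function(x) {
  floor_value <- min(x[x > 0], na.rm = TRUE)
  x[which(x == 0)] <- floor_value   # which() skips NAs produced by the comparison
  x
}

p <- c(.042, .02, 0, .001, 0, .89)
lowestnonzero_sketch(p)
-log10(lowestnonzero_sketch(p))   # finite, unlike -log10(p)
```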
Improved list of objects. Sorts by size by default. Adapted from https://stackoverflow.com/q/1358003/654296.
lsa(pos = 1, pattern, order.by = "Size", decreasing = TRUE, head = FALSE, n = 10)
pos | numeric. Position in the stack. |
pattern | Regex to filter the objects by. |
order.by | character. Either 'Type', 'Size', 'PrettySize', 'Rows', or 'Columns'. Determines how the output is ordered. |
decreasing | logical. Should the output be displayed in decreasing order? |
head | logical. Use head on the output? |
n | numeric. Number of objects to display if head is TRUE. |
A data.frame with the type, size in bytes, human-readable size, rows, and columns of every object in the environment.
a <- rnorm(100000)
b <- matrix(1, 1000, 100)
lsa()
Turns a distance matrix into a data frame of pairwise distances.
mat2df(M)
M | a square pairwise matrix (e.g., of distances). |
A data frame with pairwise distances.
M <- matrix(1:25, nrow=5, dimnames=list(letters[1:5], letters[1:5]))
M
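A square matrix can be turned into a long table by selecting cells with a logical mask. row() and col() then recover each cell's labels. The following is a sketch with the hypothetical name mat2df_sketch. Keeping only the upper triangle (one row per unordered pair, diagonal excluded) is an assumption. The text above does not say which cells the package keeps.

```r
# Hypothetical sketch: one row per unordered pair from the upper triangle.
mat2df_sketch <- function(M) {
  stopifnot(nrow(M) == ncol(M))
  keep <- upper.tri(M)                        # TRUE above the diagonal only
  data.frame(
    from = rownames(M)[row(M)[keep]],
    to = colnames(M)[col(M)[keep]],
    value = M[keep],
    stringsAsFactors = FALSE
  )
}

M <- matrix(1:25, nrow = 5, dimnames = list(letters[1:5], letters[1:5]))
mat2df_sketch(M)
```

For a dist object, call as.matrix() on it first.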
Returns the mode of a vector. First in a tie wins (see examples).
Mode(x, na.rm = FALSE)
x | A vector. |
na.rm | Remove missing values before calculating the mode (FALSE by default). Otherwise NAs are counted like any other element, so an NA in the vector does not necessarily make the result NA. See the first example. |
The mode of the input vector.
Mode(c(1,2,2,3,3,3, NA))
Mode(c(1,2,2,3,3,3, NA), na.rm=TRUE)
Mode(c(1,2,2,3,3,3, NA, NA, NA, NA))
Mode(c(1,2,2,3,3,3, NA, NA, NA, NA), na.rm=TRUE)
Mode(c("A", "Z", "Z", "B", "B"))
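The "first in a tie wins" and "NAs are counted" behaviors follow naturally from a unique() / match() / tabulate() approach. The following is a sketch of that approach under the hypothetical name mode_sketch. It is not necessarily how the package implements Mode().

```r
# Hypothetical sketch: count each distinct value (match() treats NA as a value
# that can be matched) and return the most frequent one. unique() keeps
# first-appearance order, and which.max() returns the first maximum,
# so ties go to the value seen first.
mode_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

mode_sketch(c(1, 2, 2, 3, 3, 3, NA, NA, NA, NA))
mode_sketch(c("A", "Z", "Z", "B", "B"))
```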
Get names and class of all columns in a data frame in a friendly format.
nn(df)
df | A data frame. |
A data frame with index and class.
nn(iris)
Opens the current working directory on a Mac.
o()
## Not run:
o()
## End(Not run)
Returns a character vector containing the first n lines of a file.
peek(x, n = 5)
x | a filename |
n | the number of lines to return |
A character vector of the first n lines of the file.
## Not run:
filename <- tempfile()
x <- matrix(round(rnorm(10^4), 2), 1000, 10)
colnames(x) <- letters[1:10]
write.table(x, file = filename, row.names = FALSE, quote = FALSE)
peek(filename)
## End(Not run)
A tidy version of the built-in Anscombe's quartet data (anscombe): four datasets that have nearly identical linear regression properties yet look very different when graphed.
quartet
A data frame with three columns: set, x, and y.
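A long data frame of this shape can be built from the wide anscombe data set, whose columns are x1 to x4 and y1 to y4. Fitting one regression per set then shows the "nearly identical" fits. The following is a sketch only. The object names and the set labels "I" to "IV" are hypothetical.

```r
# Hypothetical sketch: reshape the wide `anscombe` data into columns set, x, y.
sets <- c("I", "II", "III", "IV")
quartet_sketch <- data.frame(
  set = rep(sets, each = nrow(anscombe)),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# Per-set least-squares fits: intercepts and slopes are nearly identical.
fits <- sapply(split(quartet_sketch, quartet_sketch$set),
               function(d) coef(lm(y ~ x, data = d)))
round(fits, 2)
```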
Reverse complements a sequence.
revcomp(x)
x | A sequence to reverse complement. |
The sequence, reverse complemented.
revcomp("GATTACA")
sapply(c("GATTACA", "CATATTAC"), revcomp)
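A reverse complement takes two base-R steps. First, chartr() swaps each base for its complement. Second, the characters are reversed. The following is a sketch under the hypothetical name revcomp_sketch. It is vectorized over x and handles only A/C/G/T.

```r
# Hypothetical sketch: complement with chartr(), then reverse each string.
# Only A/C/G/T (either case) are complemented; other characters, such as N
# or IUPAC ambiguity codes, pass through unchanged.
revcomp_sketch <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  vapply(strsplit(comp, NULL),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

revcomp_sketch(c("GATTACA", "CATATTAC"))
```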
Allows you to rename objects as you save them. See https://stackoverflow.com/a/21248218/654296.
saveit(..., file = stop("'file' must be specified"))
... | Objects to save. |
file | Filename/path where data will be saved. |
## Not run:
foo <- 1
saveit(bar=foo, file="foobar.Rdata")
## End(Not run)
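This sketch shows the renaming trick as we understand the linked answer. The named arguments are placed in a temporary environment with list2env(). save() then writes those names from that environment. The name saveit_sketch is hypothetical, and this sketch makes file a plain required argument.

```r
# Hypothetical sketch: the argument names become the saved object names.
saveit_sketch <- function(..., file) {
  objs <- list(...)
  save(list = names(objs), file = file, envir = list2env(objs))
}

foo <- 1
path <- tempfile(fileext = ".Rdata")
saveit_sketch(bar = foo, file = path)
loaded <- new.env()
load(path, envir = loaded)
ls(loaded)
```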
Writes the output of sessionInfo() to the clipboard. Only works on a Mac.
sicb()
## Not run:
sicb()
## End(Not run)
Alphabetically sorts characters in a string. Vectorized over x.
strSort(x)
x | A string to sort. |
A sorted string.
strSort("cba")
strSort("zyxcCbB105.a")
strSort(c("cba", "zyx"))
strSort(c("cba", NA))
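Sorting the characters of each string takes three steps: split the string into single characters, sort them, and paste them back together. The following is a sketch with the hypothetical name strsort_sketch. Keeping NA inputs as NA is an assumption. sort() on characters follows the current locale's collation, so mixed-case or digit-containing strings such as "zyxcCbB105.a" may sort differently across systems.

```r
# Hypothetical sketch: sort the characters within each string, vectorized over x.
strsort_sketch <- function(x) {
  vapply(strsplit(x, NULL), function(ch) {
    if (anyNA(ch)) return(NA_character_)   # keep NA inputs as NA
    paste(sort(ch), collapse = "")
  }, character(1))
}

strsort_sketch(c("cba", "zyx", NA))
```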
Plot a histogram with either a normal distribution or density curve overlay.
Thist(x, overlay = "normal", col = "gray80", ...)
x | A numeric vector. |
overlay | Either "normal" (default) or "density", indicating whether a normal distribution or a density curve should be plotted on top of the histogram. |
col | Color of the histogram bars. |
... | Other arguments to be passed to hist(). |
set.seed(42)
x <- rnorm(1000, mean=5, sd=2)
Thist(x)
Thist(x, overlay="density")
Thist(x^2)
Thist(x^2, overlay="density", breaks=50, col="lightblue2")
A matrix of scatter plots with rugged histograms, correlations, and significance stars. Much of the functionality is borrowed from PerformanceAnalytics::chart.Correlation().
Tpairs(x, histogram = TRUE, gap = 0, ...)
x | A numeric matrix or data.frame. |
histogram | Overlay a histogram on the diagonals? |
gap | Distance between subplots, in margin lines. |
... | Arguments to be passed to or from other methods. |
Tpairs(iris[-5])
Tpairs(iris[-5], pch=21, bg=gghues(3)[factor(iris$Species)], gap=1)