Package 'Tmisc' reference manual

Title:	Turner Miscellaneous
Description:	Miscellaneous utility functions for data manipulation, data tidying, and working with gene expression data and biological sequence data.
Authors:	Stephen Turner [aut, cre]
Maintainer:	Stephen Turner <[email protected]>
License:	GPL-3
Version:	1.1.0
Built:	2025-03-03 06:36:38 UTC
Source:	https://github.com/stephenturner/Tmisc

x like y

Description

Returns a logical vector indicating which elements of x match the regular expression pattern.

Usage

x %like% pattern

Arguments

`x`	a vector (numeric, character, factor)
`pattern`	a regular expression (character string) to match against the elements of x

Value

A logical vector the same length as x, TRUE for each element of x that matches pattern.

Examples

(Name <- c("Mary","George","Martha"))
Name %in% c("Mary")
Name %like% "^Mar"
Name %nin% c("George")
Name %nlike% "^Mar"
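
The matching operators can be approximated in base R. The following is a minimal sketch, assuming the match is a plain `grepl()` call; `like_sketch` and `nlike_sketch` are hypothetical names, not the package source, and details such as case sensitivity may differ in Tmisc.

```r
# Hypothetical base-R stand-ins for %like% and %nlike% (not the Tmisc source).
like_sketch <- function(x, pattern) {
  # as.character() lets factors be matched on their labels
  grepl(pattern, as.character(x))
}
nlike_sketch <- function(x, pattern) !like_sketch(x, pattern)

Name <- c("Mary", "George", "Martha")
like_sketch(Name, "^Mar")   # TRUE FALSE TRUE
nlike_sketch(Name, "^Mar")  # FALSE TRUE FALSE
```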

x not in y

Description

Returns a logical vector indicating which elements of x are not in table.

Usage

x %nin% table

Arguments

`x`	a vector (numeric, character, factor)
`table`	a vector (numeric, character, factor), matching the mode of x

Value

A logical vector the same length as x, TRUE for each element of x that is not in table.

Examples

1:10 %nin% seq(from=2, to=10, by=2)
c("a", "b", "c") %nin% c("a", "b")
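
The behavior described above is the negation of base R's `%in%`. A minimal sketch, with `nin_sketch` as a hypothetical name rather than the package source:

```r
# Hypothetical stand-in for %nin%: negate the result of %in%.
nin_sketch <- function(x, table) !(x %in% table)

nin_sketch(1:10, seq(from = 2, to = 10, by = 2))  # TRUE for the odd numbers
nin_sketch(c("a", "b", "c"), c("a", "b"))         # FALSE FALSE TRUE
```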

x not like y

Description

Returns a logical vector indicating which elements of x do not match the regular expression pattern.

Usage

x %nlike% pattern

Arguments

`x`	a vector (numeric, character, factor)
`pattern`	a regular expression (character string) to match against the elements of x

Value

A logical vector the same length as x, TRUE for each element of x that does not match pattern.

Examples

(Name <- c("Mary","George","Martha"))
Name %in% c("Mary")
Name %like% "^Mar"
Name %nin% c("George")
Name %nlike% "^Mar"

Insert text at current position.

Description

Call these functions as addins to insert the desired text at the cursor position. After installing Tmisc, open the Addins menu and optionally assign a keyboard shortcut, e.g., Command+Shift+I, Alt+-, etc.

Are all equal?

Description

Are all the elements of a numeric vector (approximately) equal?

Usage

are_all_equal(x, na.rm = FALSE)

Arguments

`x`	A numeric vector.
`na.rm`	Remove missing values (FALSE by default; NAs in x will return NA).

Value

Logical, whether all elements of a numeric vector are equal.

Examples

are_all_equal(c(5,5,5))
are_all_equal(c(5,5,5,6))
are_all_equal(c(5,5,5,NA,6))
are_all_equal(c(5,5,5,NA,6), na.rm=TRUE)
5==5.000000001
identical(5, 5.000000001)
are_all_equal(c(5L, 5, 5.000000001))
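
One way to test approximate equality is to compare the range of the values against a small tolerance. The sketch below assumes the same default tolerance as `all.equal()`; the function name `all_equal_sketch` and the `tol` argument are hypothetical, and the tolerance actually used by Tmisc is not stated here.

```r
# Hypothetical sketch: all elements are "equal" if max - min is below a tolerance.
all_equal_sketch <- function(x, na.rm = FALSE, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(x))
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x)) return(NA)       # missing values propagate unless removed
  diff(range(x)) < tol
}

all_equal_sketch(c(5, 5, 5))                       # TRUE
all_equal_sketch(c(5, 5, 5, 6))                    # FALSE
all_equal_sketch(c(5, 5, 5, NA, 6))                # NA
all_equal_sketch(c(5, 5, 5, NA, 6), na.rm = TRUE)  # FALSE
all_equal_sketch(c(5L, 5, 5.000000001))            # TRUE
```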

Print the top left corner of a data frame

Description

Prints the first n rows and columns of a data frame or matrix.

Usage

corner(x, n = 5)

Arguments

`x`	A data frame or matrix.
`n`	The number of rows/columns to print.

Value

The corner of the data frame

Examples

corner(mtcars)
corner(iris, n=4)
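
Taking the top-left corner amounts to subsetting the first n rows and columns. A minimal base-R sketch, with `corner_sketch` as a hypothetical name; it caps n at the actual dimensions, which is an assumption about how short inputs should be handled.

```r
# Hypothetical sketch: first n rows and first n columns, capped at the real size.
corner_sketch <- function(x, n = 5) {
  rows <- seq_len(min(n, nrow(x)))
  cols <- seq_len(min(n, ncol(x)))
  x[rows, cols, drop = FALSE]
}

dim(corner_sketch(mtcars))       # 5 5
dim(corner_sketch(iris, n = 4))  # 4 4
```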

Fragments per kilobase per million

Description

Takes a count matrix and a vector of gene lengths and returns an optionally log2-transformed FPKM matrix. Modified from edgeR.

Usage

counts2fpkm(x, length, log = FALSE, prior.count = 0.25)

Arguments

`x`	a matrix of counts.
`length`	a vector of length `nrow(x)` giving length in bases.
`log`	logical, if `TRUE`, then `log2` values are returned.
`prior.count`	average count to be added to each observation to avoid taking log of zero. Used only if `log=TRUE`.

Value

A matrix of FPKM values.

Examples

set.seed(123)
genecounts <- matrix(sample(c(rep(0, 50), 1:100), 30), nrow=10)
lengths <- sample(1000:10000, 10)
counts2fpkm(genecounts, lengths)
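
For orientation, the untransformed FPKM of a gene in a sample is its count divided by the gene length in kilobases and by the sample's library size in millions. The sketch below implements only that plain formula, using column sums as library sizes; `fpkm_sketch` is a hypothetical name, and the edgeR-derived code in Tmisc may differ (for example in the `log` and `prior.count` handling, which is omitted here).

```r
# Hypothetical sketch of the plain FPKM formula (no log transform, no prior count).
fpkm_sketch <- function(x, length) {
  stopifnot(is.matrix(x), length(length) == nrow(x))
  lib_size <- colSums(x)             # total counts per sample (column)
  cpm <- t(t(x) / lib_size) * 1e6    # counts per million, column by column
  cpm / (length / 1000)              # divide row i by gene i's length in kb
}

counts <- matrix(c(10, 20, 30, 40), nrow = 2)  # 2 genes x 2 samples
fpkm_sketch(counts, length = c(1000, 2000))
```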
  
Truncate a data frame with ellipses.

Description

Prints the specified number of rows of a data frame, followed by a row of ellipses. Useful for piping to knitr::kable() for printing a truncated table in a markdown document.

Usage

ellipses(df, n = 5L)

Arguments

`df`	A data frame.
`n`	The number of rows to show before an ellipses row.

Value

A data frame truncated by a row of ellipses.

Examples

ellipses(mtcars, 5)
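
One way to get this kind of truncated table is to convert the first n rows to character and append a row of "..." strings. This is a sketch under that assumption; `ellipses_sketch` is a hypothetical name and the package may build the result differently.

```r
# Hypothetical sketch: first n rows as character, followed by a row of "...".
ellipses_sketch <- function(df, n = 5L) {
  top <- head(df, n)
  top[] <- lapply(top, as.character)   # every column must hold text for "..."
  dots <- as.data.frame(
    matrix("...", nrow = 1, ncol = ncol(top),
           dimnames = list("...", names(top))),
    stringsAsFactors = FALSE
  )
  rbind(top, dots)
}

ellipses_sketch(mtcars, 5)
```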

Fisher's method to combine p-values

Description

Uses Fisher's method to combine p-values from different tests.

Usage

fisherp(x)

Arguments

`x`	A vector of p-values between 0 and 1.

Value

The combined p-value.

References

Fisher, R.A. (1925). Statistical Methods for Research Workers.

https://en.wikipedia.org/wiki/Fisher%27s_method.

Examples

fisherp(c(.042, .02, .001, 0.01, .89))
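
Fisher's method sums the log p-values: for k independent tests, the statistic -2 * sum(log(p)) follows a chi-squared distribution with 2k degrees of freedom under the null. The sketch below is a direct base-R rendering of that formula; `fisher_sketch` is a hypothetical name, and rejecting zero p-values is a choice made for this example, not a statement about the package.

```r
# Hypothetical sketch of Fisher's combined probability test.
fisher_sketch <- function(p) {
  stopifnot(is.numeric(p), all(p > 0 & p <= 1))
  stat <- -2 * sum(log(p))                              # chi-squared statistic
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)  # upper-tail p-value
}

fisher_sketch(c(.042, .02, .001, 0.01, .89))
fisher_sketch(0.05)  # a single p-value is returned unchanged (up to rounding)
```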

Plot missing data

Description

Plots missing data as holes on a black canvas.

Usage

gg_na(df)

Arguments

`df`	A data frame

Emulate ggplot2 default hues

Description

This will emulate ggplot2's hues, which are equally spaced hues around the color wheel, starting from 15.

Usage

gghues(n, start = 15)

Arguments

`n`	Numeric; the number of hues to generate.
`start`	Numeric; the place on the color wheel to start. ggplot2 default is 15.

Value

A vector of hues

Examples

n <- 10
gghues(3)
barplot(rep(1,n), col=gghues(n), names=gghues(n))
barplot(rep(1,n), col=gghues(n, start=15+180), names=gghues(n, start=15+180))
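
Equally spaced hues can be generated with `grDevices::hcl()`. The sketch below assumes a chroma of 100 and a luminance of 65, values commonly used to reproduce ggplot2's default discrete palette; `gghues_sketch` is a hypothetical name and these constants are assumptions, not taken from the package source.

```r
# Hypothetical sketch: n hues equally spaced around the color wheel.
gghues_sketch <- function(n, start = 15) {
  # n + 1 points so the last hue (start + 360) can be dropped as a repeat of the first
  hues <- seq(start, start + 360, length.out = n + 1)
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]
}

gghues_sketch(3)
```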

Two-letter genotype from VCF GT

Description

Get a two-letter genotype from a VCF GT field. Current implementation is quick and dirty, and only accepts 0/0, 0/1, or 1/1. Any other input to gt will return a missing value.

Usage

gt2refalt(gt, ref, alt)

Arguments

`gt`	The genotype field (must be 0/0, 0/1, or 1/1).
`ref`	The reference allele.
`alt`	The alternate allele.

Value

The two-letter genotype built from ref and alt, or a missing value if gt is not one of 0/0, 0/1, or 1/1.

Examples

gt2refalt(gt="0/0", ref="R", alt="A")
gt2refalt(gt="0/1", ref="R", alt="A")
gt2refalt(gt="1/1", ref="R", alt="A")
gt2refalt(gt="0/2", ref="R", alt="A")
gt2refalt(gt="./.", ref="R", alt="A")
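
The mapping described above can be written as a small lookup. This is a sketch only: `gt_sketch` is a hypothetical name, and the allele order for heterozygotes (reference first) is an assumption.

```r
# Hypothetical sketch: map a VCF GT string to a two-letter genotype.
gt_sketch <- function(gt, ref, alt) {
  ifelse(gt == "0/0", paste0(ref, ref),
  ifelse(gt == "0/1", paste0(ref, alt),
  ifelse(gt == "1/1", paste0(alt, alt),
         NA_character_)))   # anything else, e.g. "0/2" or "./.", is missing
}

gt_sketch("0/0", ref = "R", alt = "A")  # "RR"
gt_sketch("0/1", ref = "R", alt = "A")  # "RA"
gt_sketch("1/1", ref = "R", alt = "A")  # "AA"
gt_sketch("./.", ref = "R", alt = "A")  # NA
```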

Jensen-Shannon divergence

Description

Calculates a distance matrix from a matrix of probability distributions using Jensen-Shannon divergence. Adapted from https://enterotype.embl.de/.

Usage

jsd(M, pseudocount = 1e-06, normalizeCounts = FALSE)

Arguments

`M`	a probability distribution matrix, e.g., normalized transcript compatibility counts.
`pseudocount`	a small number to avoid division by zero errors.
`normalizeCounts`	logical, whether to attempt to normalize by dividing by the column sums. Set to `TRUE` if this is, e.g., a count matrix.

Value

A Jensen-Shannon divergence-based distance matrix.

References

https://web.archive.org/web/20240131141033/https://enterotype.embl.de/enterotypes.html#dm.

Examples

set.seed(42)
M <- matrix(rpois(100, lambda=100), ncol=5)
colnames(M) <- paste0("sample", 1:5)
rownames(M) <- paste0("gene", 1:20)
Mnorm <- apply(M, 2, function(x) x/sum(x))
Mjsd <- jsd(Mnorm)
# equivalently
Mjsd <- jsd(M, normalizeCounts=TRUE)
Mjsd
plot(hclust(Mjsd))
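
For two distributions p and q with midpoint m = (p + q) / 2, the Jensen-Shannon divergence is 0.5 * KL(p || m) + 0.5 * KL(q || m), and its square root is a distance. The sketch below computes that distance between all pairs of columns, assuming the columns are already normalized and that zeros are replaced by the pseudocount; `jsd_sketch` is a hypothetical name, not the package source.

```r
# Hypothetical sketch: pairwise Jensen-Shannon distance between columns of M.
jsd_sketch <- function(M, pseudocount = 1e-6) {
  M[M == 0] <- pseudocount                      # avoid log(0) and division by zero
  kld <- function(p, q) sum(p * log(p / q))     # Kullback-Leibler divergence
  n <- ncol(M)
  D <- matrix(0, n, n, dimnames = list(colnames(M), colnames(M)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m <- (M[, i] + M[, j]) / 2
      js <- 0.5 * kld(M[, i], m) + 0.5 * kld(M[, j], m)
      D[i, j] <- sqrt(max(js, 0))               # guard against tiny negative rounding
    }
  }
  stats::as.dist(D)
}

P <- cbind(s1 = c(0.5, 0.5), s2 = c(0.9, 0.1), s3 = c(0.1, 0.9))
jsd_sketch(P)
```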
  
Lowest nonzero values

Description

Sometimes you want to plot p-values (e.g., in a volcano plot or MA-plot), but a zero p-value returned by a statistical test causes problems when visualizing on the log scale. This function returns a vector in which the zero values are replaced by the smallest nonzero value in the vector.

Usage

lowestnonzero(x)

Arguments

`x`	A vector of p-values between 0 and 1.

Value

A vector of p-values where zero values are exchanged for the lowest non-zero p-value in the original vector.

Examples

lowestnonzero(c(.042, .02, 0, .001, 0, .89))
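
The replacement described above takes one line of base R. A minimal sketch, with `lnz_sketch` as a hypothetical name; ignoring NAs when finding the minimum is an assumption made for this example.

```r
# Hypothetical sketch: replace zeros with the smallest strictly positive value.
lnz_sketch <- function(x) {
  smallest <- min(x[x > 0], na.rm = TRUE)
  x[which(x == 0)] <- smallest   # which() skips NAs in the comparison
  x
}

lnz_sketch(c(.042, .02, 0, .001, 0, .89))  # 0.042 0.020 0.001 0.001 0.001 0.890
```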

Improved list of objects

Description

Improved list of objects. Sorts by size by default. Adapted from https://stackoverflow.com/q/1358003/654296.

Usage

lsa(
  pos = 1,
  pattern,
  order.by = "Size",
  decreasing = TRUE,
  head = FALSE,
  n = 10
)

Arguments

`pos`	numeric. Position in the stack.
`pattern`	Regex to filter the objects by.
`order.by`	character. Either 'Type', 'Size', 'PrettySize', 'Rows', or 'Columns'. This will dictate how the output is ordered.
`decreasing`	logical. Should the output be displayed in decreasing order?
`head`	logical. Use head on the output?
`n`	numeric. Number of objects to display if head is TRUE.

Value

A data.frame with type, size in bytes, human-readable size, rows, and columns of every object in the environment.

Examples

a <- rnorm(100000)
b <- matrix(1, 1000, 100)
lsa()
  
Matrix to pairwise data frame

Description

Turns a distance matrix into a data frame of pairwise distances.

Usage

mat2df(M)

Arguments

`M`	a square pairwise matrix (e.g., of distances).

Value

Data frame with pairwise distances.

Examples

M <- matrix(1:25, nrow=5, dimnames=list(letters[1:5], letters[1:5]))
M
mat2df(M)

Mode

Description

Returns the mode of a vector. First in a tie wins (see examples).

Usage

Mode(x, na.rm = FALSE)

Arguments

`x`	A vector.
`na.rm`	Remove missing values before calculating the mode (FALSE by default). When FALSE, NAs are counted just like any other element, so an NA in the vector will not necessarily cause NA to be returned. See the first example.

Value

The mode of the input vector.

Examples

Mode(c(1,2,2,3,3,3, NA))
Mode(c(1,2,2,3,3,3, NA), na.rm=TRUE)
Mode(c(1,2,2,3,3,3, NA, NA, NA, NA))
Mode(c(1,2,2,3,3,3, NA, NA, NA, NA), na.rm=TRUE)
Mode(c("A", "Z", "Z", "B", "B"))
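
A common base-R idiom for the mode counts each unique value and picks the first maximum, which gives the "first in a tie wins" behavior. The sketch below uses that idiom; `mode_sketch` is a hypothetical name and is not claimed to be the package's implementation.

```r
# Hypothetical sketch: most frequent value, earliest value winning ties.
mode_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)                       # unique values in order of appearance
  counts <- tabulate(match(x, ux))      # NA matches NA, so NAs are counted too
  ux[which.max(counts)]                 # which.max() returns the first maximum
}

mode_sketch(c(1, 2, 2, 3, 3, 3, NA))                            # 3
mode_sketch(c(1, 2, 2, 3, 3, 3, NA, NA, NA, NA))                # NA
mode_sketch(c(1, 2, 2, 3, 3, 3, NA, NA, NA, NA), na.rm = TRUE)  # 3
mode_sketch(c("A", "Z", "Z", "B", "B"))                         # "Z"
```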

Get names and class of all columns in a data frame

Description

Get names and class of all columns in a data frame in a friendly format.

Usage

nn(df)

Arguments

`df`	A data frame.

Value

A data frame with index and class.

Examples

nn(iris)

Open the current working directory on Mac

Description

Opens the current working directory on a Mac.

Usage

o()

Examples

## Not run: 
o()

## End(Not run)

Peek at the top of a text file

Description

This returns a character vector which shows the top n lines of a file.

Usage

peek(x, n = 5)

Arguments

`x`	a filename
`n`	the number of lines to return

Value

A character vector of the first n lines of the file.

Examples

## Not run: 
filename <- tempfile()
x <- matrix(round(rnorm(10^4), 2), 1000, 10)
colnames(x) <- letters[1:10]
write.table(x, file = filename, row.names = FALSE, quote = FALSE)
peek(filename)

## End(Not run)
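
Reading only the first lines of a file is available in base R through the `n` argument of `readLines()`. A minimal sketch, with `peek_sketch` as a hypothetical name:

```r
# Hypothetical sketch: return the first n lines of a text file.
peek_sketch <- function(x, n = 5) {
  readLines(x, n = n)
}

f <- tempfile()
writeLines(as.character(1:20), f)   # a 20-line file: "1", "2", ..., "20"
peek_sketch(f)                      # "1" "2" "3" "4" "5"
peek_sketch(f, n = 2)               # "1" "2"
```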

Anscombe's Quartet data (tidy)

Description

Tidy version of built-in Anscombe's Quartet data. Four datasets that have nearly identical linear regression properties, yet appear very different when graphed.

Usage

quartet

Format

Data frame with three columns, set, x, y.

Reverse complement

Description

Reverse complements a sequence.

Usage

revcomp(x)

Arguments

`x`	A sequence to reverse complement

Value

The sequence, reverse complemented

Examples

revcomp("GATTACA")
sapply(c("GATTACA", "CATATTAC"), revcomp)
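
Reverse complementing a DNA string means swapping each base for its complement and reversing the order. The sketch below handles a single string of A, C, G, and T (upper or lower case); `revcomp_sketch` is a hypothetical name, and ambiguity codes are left untouched as a simplification.

```r
# Hypothetical sketch: reverse complement of one DNA string.
revcomp_sketch <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)          # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "") # reverse the characters
}

revcomp_sketch("GATTACA")                                           # "TGTAATC"
sapply(c("GATTACA", "CATATTAC"), revcomp_sketch, USE.NAMES = FALSE) # "TGTAATC" "GTAATATG"
```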

Rename objects while saving.

Description

Allows you to rename objects as you save them. See https://stackoverflow.com/a/21248218/654296.

Usage

saveit(..., file = stop("'file' must be specified"))

Arguments

`...`	Objects to save.
`file`	Filename/path where data will be saved.

Examples

## Not run: 
foo <- 1
saveit(bar=foo, file="foobar.Rdata")

## End(Not run)
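
One way to rename objects while saving is to collect the named arguments into a temporary environment and call `save()` on that environment. The sketch below follows that idea; `saveit_sketch` is a hypothetical name and the check for argument names is an addition for this example.

```r
# Hypothetical sketch: save objects under the names given in the call.
saveit_sketch <- function(..., file) {
  objs <- list(...)
  stopifnot(!is.null(names(objs)), all(nzchar(names(objs))))  # every object needs a name
  save(list = names(objs), file = file, envir = list2env(objs))
}

foo <- 1
path <- tempfile(fileext = ".RData")
saveit_sketch(bar = foo, file = path)

restored <- new.env()
load(path, envir = restored)
ls(restored)  # "bar"
```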
  
Write sessionInfo to the clipboard

Description

Writes output of sessionInfo() to the clipboard. Only works on Mac.

Usage

sicb()

Examples

## Not run: 
sicb()

## End(Not run)

Sort characters in a string

Description

Alphabetically sorts characters in a string. Vectorized over x.

Usage

strSort(x)

Arguments

`x`	A string to sort.

Value

A sorted string.

Examples

strSort("cba")
strSort("zyxcCbB105.a")
strSort(c("cba", "zyx"))
strSort(c("cba", NA))
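
Sorting the characters of each string can be done by splitting, sorting, and pasting back together. In the sketch below, `strsort_sketch` is a hypothetical name; keeping NA inputs as NA is an assumption, and the order of mixed-case or non-letter characters depends on the locale used by `sort()`.

```r
# Hypothetical sketch: sort the characters within each element of x.
strsort_sketch <- function(x) {
  out <- vapply(strsplit(x, ""),
                function(ch) paste(sort(ch), collapse = ""),
                character(1))
  out[is.na(x)] <- NA_character_   # keep missing inputs missing
  out
}

strsort_sketch("cba")              # "abc"
strsort_sketch(c("cba", "zyx"))    # "abc" "xyz"
strsort_sketch(c("cba", NA))       # "abc" NA
```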

Histograms with overlays

Description

Plot a histogram with either a normal distribution or density curve overlay.

Usage

Thist(x, overlay = "normal", col = "gray80", ...)

Arguments

`x`	A numeric vector.
`overlay`	Either "normal" (default) or "density" indicating whether a normal distribution or density curve should be plotted on top of the histogram.
`col`	Color of the histogram bars.
`...`	Other arguments to be passed to `hist()`.

Examples

set.seed(42)
x <- rnorm(1000, mean=5, sd=2)
Thist(x)
Thist(x, overlay="density")
Thist(x^2)
Thist(x^2, overlay="density", breaks=50, col="lightblue2")

Better scatterplot matrices

Description

A matrix of scatter plots with rugged histograms, correlations, and significance stars. Much of the functionality borrowed from PerformanceAnalytics::chart.Correlation().

Usage

Tpairs(x, histogram = TRUE, gap = 0, ...)

Arguments

`x`	A numeric matrix or data.frame.
`histogram`	Overlay a histogram on the diagonals?
`gap`	distance between subplots, in margin lines.
`...`	arguments to be passed to or from other methods.

Examples

Tpairs(iris[-5])
Tpairs(iris[-5], pch=21, bg=gghues(3)[factor(iris$Species)], gap=1)
