--- title: "Intro to the qqman package" author: "Stephen D. Turner" date: '`r Sys.Date()`' output: html_document: toc: yes --- ```{r, include=FALSE} library(qqman) library(knitr) opts_chunk$set(comment=NA, fig.width=12, fig.height=9, message=FALSE, tidy=TRUE, dpi=75) ``` # Intro to the **qqman** package ```{r generatedata, eval=FALSE, echo=FALSE} # This code used to generate the test data. Runs slow, but does the job. chrstats <- data.frame(chr=1:22, nsnps=1500) chrstats$nsnps <- with(chrstats, round(nsnps/chr^(1/3))) chrstats d <- data.frame(SNP=rep(NA, sum(chrstats$nsnps)), CHR=rep(NA, sum(chrstats$nsnps)), BP=rep(NA, sum(chrstats$nsnps)), P=rep(NA, sum(chrstats$nsnps))) snpi <- 1 set.seed(42) for (i in chrstats$chr) { for (j in 1:chrstats[i, 2]) { d[snpi, ]$SNP=paste0("rs", snpi) d[snpi, ]$CHR=i d[snpi, ]$BP=j d[snpi, ]$P=runif(1) snpi <- snpi+1 } } divisor <- c(seq(2,50,2), seq(50,2,-2)) divisor <- divisor^4 length(divisor) d[3026:3075, ]$P <- d[3026:3075, ]$P/divisor snpsOfInterest <- paste0("rs", 3001:3100) qq(d$P) manhattan(d, highlight=snpsOfInterest) gwasResults <- d save(gwasResults, file="data/gwasResults.RData") ``` The **qqman** package includes functions for creating manhattan plots and q-q plots from GWAS results. The `gwasResults` data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data: ```{r} str(gwasResults) head(gwasResults) tail(gwasResults) ``` How many SNPs on each chromosome? ```{r} as.data.frame(table(gwasResults$CHR)) ``` ## Creating manhattan plots Now, let's make a basic manhattan plot. ```{r} manhattan(gwasResults) ``` We can also pass in other graphical parameters. Let's add a title (`main=`), increase the y-axis limit (`ylim=`), reduce the point size to 60% (`cex=`), and reduce the font size of the axis labels to 90% (`cex.axis=`). While we're at it, let's change the colors (`col=`), remove the suggestive and genome-wide significance lines, and supply our own labels for the chromosomes: ```{r} manhattan(gwasResults, main="Manhattan Plot", ylim=c(0,10), cex=0.6, cex.axis=0.9, col=c("blue4", "orange3"), suggestiveline=F, genomewideline=F, chrlabs=c(1:20, "P", "Q")) ``` Now, let's look at a single chromosome: ```{r} manhattan(subset(gwasResults, CHR==1)) ``` Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called `snpsOfInterest`. You'll get a warning if you try to highlight SNPs that don't exist. ```{r} str(snpsOfInterest) manhattan(gwasResults, highlight=snpsOfInterest) ``` We can combine highlighting and limiting to a single chromosome, and use the `xlim` graphical parameter to zoom in on a region of interest (between position 200-500): ```{r} manhattan(subset(gwasResults, CHR==3), highlight=snpsOfInterest, xlim=c(200, 500), main="Chr 3") ``` We can also annotate SNPs based on their p-value. By default, this only annotates the top SNP per chromosome that exceeds the `annotatePval` threshold. ```{r} manhattan(gwasResults, annotatePval=0.01) ``` We can also annotate all SNPs that meet a threshold: ```{r} manhattan(gwasResults, annotatePval=0.005, annotateTop=FALSE) ``` Finally, the `manhattan` function can be used to plot any value, not just p-values. Here, we'll simply call the function passing to the `p=` argument the name of the column we want to plot instead of the default "P" column. In this example, let's create a test statistic ("zscore"), plot that instead of p-values, change the y-axis label, and remove the default log transformation. We'll also remove the genomewide and suggestive lines because these are only meaningful if you're plotting -log10(p-values). ```{r} # Add test statistics gwasResults <- transform(gwasResults, zscore=qnorm(P/2, lower.tail=FALSE)) head(gwasResults) # Make the new plot manhattan(gwasResults, p="zscore", logp=FALSE, ylab="Z-score", genomewideline=FALSE, suggestiveline=FALSE, main="Manhattan plot of Z-scores") ``` A few notes on creating manhattan plots: * Run `str(gwasResults)`. Notice that the `gwasResults` data.frame has SNP, chromosome, position, and p-value columns named `SNP`, `CHR`, `BP`, and `P`. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to the `chr=`, `bp=`, `p=`, and `snp=` arguments. See `help(manhattan)` for details. * The chromosome column must be numeric. If you have "X," "Y," or "MT" chromosomes, you'll need to rename these 23, 24, 25, etc. You can modify the source code (e.g., `fix(manhattan)`) to change the line designating the axis tick labels (`labs <- unique(d$CHR)`) to set this to whatever you'd like it to be. * If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for `col="blue"`, `col="red"`, or `col="green3"` to modify the suggestive line, genomewide line, and highlight colors, respectively. ## Creating Q-Q plots Creating Q-Q plots is straightforward - simply supply a vector of p-values to the `qq()` function. ```{r} qq(gwasResults$P) ``` We can optionally supply many other graphical parameters. ```{r} qq(gwasResults$P, main="Q-Q plot of GWAS p-values", xlim=c(0,7), ylim=c(0,12), pch=18, col="blue4", cex=1.5, las=1) ```