---
title: "Intro to the qqman package"
author: "Stephen D. Turner"
date: '`r Sys.Date()`'
output:
  html_document:
    toc: yes
---

<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Intro to the qqman package}
-->

```{r, include=FALSE}
library(qqman)
library(knitr)
opts_chunk$set(comment=NA, fig.width=12, fig.height=9, message=FALSE, tidy=TRUE, dpi=75)
```

# Intro to the **qqman** package

```{r generatedata, eval=FALSE, echo=FALSE}
# This code used to generate the test data. Runs slow, but does the job.
chrstats <- data.frame(chr=1:22, nsnps=1500)
chrstats$nsnps <- with(chrstats, round(nsnps/chr^(1/3)))
chrstats

d <- data.frame(SNP=rep(NA, sum(chrstats$nsnps)), 
                CHR=rep(NA, sum(chrstats$nsnps)), 
                BP=rep(NA, sum(chrstats$nsnps)), 
                P=rep(NA, sum(chrstats$nsnps)))
snpi <- 1
set.seed(42)
for (i in chrstats$chr) {
    for (j in 1:chrstats[i, 2]) {
        d[snpi, ]$SNP=paste0("rs", snpi)
        d[snpi, ]$CHR=i
        d[snpi, ]$BP=j
        d[snpi, ]$P=runif(1)
        snpi <- snpi+1
    }
}

divisor <- c(seq(2,50,2), seq(50,2,-2))
divisor <- divisor^4
length(divisor)
d[3026:3075, ]$P <- d[3026:3075, ]$P/divisor
snpsOfInterest <- paste0("rs", 3001:3100)
qq(d$P)
manhattan(d, highlight=snpsOfInterest)
gwasResults <- d
save(gwasResults, file="data/gwasResults.RData")
```

The **qqman** package includes functions for creating manhattan plots and q-q plots from GWAS results. The `gwasResults` data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:

```{r}
str(gwasResults)
head(gwasResults)
tail(gwasResults)
```

How many SNPs on each chromosome?

```{r}
as.data.frame(table(gwasResults$CHR))
```

## Creating manhattan plots

Now, let's make a basic manhattan plot. 

```{r}
manhattan(gwasResults)
```

We can also pass in other graphical parameters. Let's add a title (`main=`), increase the y-axis limit (`ylim=`), reduce the point size to 60% (`cex=`), and reduce the font size of the axis labels to 90% (`cex.axis=`). While we're at it, let's change the colors (`col=`), remove the suggestive and genome-wide significance lines, and supply our own labels for the chromosomes:

```{r}
manhattan(gwasResults, main="Manhattan Plot", ylim=c(0,10), cex=0.6, cex.axis=0.9, col=c("blue4", "orange3"), suggestiveline=F, genomewideline=F, chrlabs=c(1:20, "P", "Q"))
```

Now, let's look at a single chromosome:

```{r}
manhattan(subset(gwasResults, CHR==1))
```

Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called `snpsOfInterest`. You'll get a warning if you try to highlight SNPs that don't exist.

```{r}
str(snpsOfInterest)
manhattan(gwasResults, highlight=snpsOfInterest)
```

We can combine highlighting and limiting to a single chromosome, and use the `xlim` graphical parameter to zoom in on a region of interest (between position 200-500):

```{r}
manhattan(subset(gwasResults, CHR==3), highlight=snpsOfInterest, xlim=c(200, 500), main="Chr 3")
```

We can also annotate SNPs based on their p-value. By default, this only annotates the top SNP per chromosome that exceeds the `annotatePval` threshold.

```{r}
manhattan(gwasResults, annotatePval=0.01)
```

We can also annotate all SNPs that meet a threshold:

```{r}
manhattan(gwasResults, annotatePval=0.005, annotateTop=FALSE)
```

Finally, the `manhattan` function can be used to plot any value, not just p-values. Here, we'll simply call the function passing to the `p=` argument the name of the column we want to plot instead of the default "P" column. In this example, let's create a test statistic ("zscore"), plot that instead of p-values, change the y-axis label, and remove the default log transformation. We'll also remove the genomewide and suggestive lines because these are only meaningful if you're plotting -log10(p-values).

```{r}
# Add test statistics
gwasResults <- transform(gwasResults, zscore=qnorm(P/2, lower.tail=FALSE))
head(gwasResults)

# Make the new plot
manhattan(gwasResults, p="zscore", logp=FALSE, ylab="Z-score", genomewideline=FALSE, suggestiveline=FALSE, main="Manhattan plot of Z-scores")
```

A few notes on creating manhattan plots:

* Run `str(gwasResults)`. Notice that the `gwasResults` data.frame has SNP, chromosome, position, and p-value columns named `SNP`, `CHR`, `BP`, and `P`. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to the `chr=`, `bp=`, `p=`, and `snp=` arguments. See `help(manhattan)` for details.
* The chromosome column must be numeric. If you have "X," "Y," or "MT" chromosomes, you'll need to rename these 23, 24, 25, etc. You can modify the source code (e.g., `fix(manhattan)`) to change the line designating the axis tick labels (`labs <- unique(d$CHR)`) to set this to whatever you'd like it to be.
* If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for `col="blue"`, `col="red"`, or `col="green3"` to modify the suggestive line, genomewide line, and highlight colors, respectively.

## Creating Q-Q plots

Creating Q-Q plots is straightforward - simply supply a vector of p-values to the `qq()` function. 

```{r}
qq(gwasResults$P)
```

We can optionally supply many other graphical parameters.

```{r}
qq(gwasResults$P, main="Q-Q plot of GWAS p-values",
   xlim=c(0,7), ylim=c(0,12), pch=18, col="blue4", cex=1.5, las=1)
```