--- title: "Using GxEScanR" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Using GxEScanR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This vignette shows some examples of using GxEScanR to perform genome-wide association study (GWAS) and genome-wide by environment interaction study (GWEIS) scans using all the options available to the user. ```{r setup} library(GxEScanR) ``` # Introduction With growing number of SNPs that can be imputed it is necessary to have software that can efficiently perform GWAS and GWEIS scans. GxEScanR can do this using files that were saved in the BinaryDosage format. The BinaryDosage package can convert VCF and GEN files into the BinaryDosage format. The BinaryDosage format was designed to keep the file with the genetic data small with fast read times. GxEScanR uses this and efficient large scale regression routines to perform GWAS and GWEIS scans quickly. # Example Files The examples below use three sample files. The first contains a data frame that has subject data. The second file is a genetic data file in the BinaryDosage format. The last file contains the information returned by the BinaryDosage::getbdinfo routine that returns information about the binary dosage file that makes reading it fast. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} covdatafile <- system.file("extdata", "covdata.rds", package = "GxEScanR") covdata <- readRDS(covdatafile) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(covdata[1:5,], caption = "First 5 Subjects") ``` To load the binary dosage information file, it is necessary to update the file name since the file has been moved from its original location during the installation process. The following loads the binary dosage information and updates the file name. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} bdinfofile <- system.file("extdata", "pdata_4_1.bdinfo", package = "GxEScanR") bdinfo <- readRDS(bdinfofile) bdinfo$filename <- system.file("extdata", "pdata_4_1.bdose", package = "GxEScanR") ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} modeldf <- readRDS(system.file("extdata", "models.rds", package = "GxEScanR")) knitr::kable(modeldf, caption = "Models Fit") ``` # Examples ## Linear Regression GWAS The simplest scan to do is a linear regression GWAS. The following model is first when doing a linear regression GWAS. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(modeldf[1,], caption = "Model Fit") ``` In the example data set, the phenotype, y, is coded 0,1. When GxEScanR sees the phenotype codes this way it assumes the outcome is binary and uses logistic regression. To perform a linear regression GWAS the binary option needs to be set to FALSE. The following shows how to do a linear GWAS along with the results. The routine outputs the number of subjects used in the analysis and returns a data frame with the results. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} lingwas1 <- gwas(data = covdata, bdinfo = bdinfo, binary = FALSE) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(lingwas1, caption = "Linear Regression GWAS") ``` The output can be redirected to output file that produces a plain test version of the results in a tab delimited file that can be read into R using the read.table routine. In this case, the gwas routine returns a value of 0. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} outfile <- tempfile() lingwas2 <- gwas(data = covdata, bdinfo = bdinfo, outfile = outfile, binary = FALSE) lingwas2 lingwas2 <- read.table(outfile, header = TRUE, sep ='\t') ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(lingwas2, caption = "Linear Regression GWAS") ``` ## Linear Regression GWEIS The gweis routine takes the same parameters as the gwas function but performs additional tests. The models fit for a linear regression GWAS are. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(modeldf[1:2,], caption = "Models Fit") ``` Note: When doing a GWEIS the interaction covariate is in the last column of the subject data frame. In this test the minmaf option was used. When minmaf is specified the minor allele for a SNP must exceed minmaf to be test. Notice that only 5 SNPs are in the output data frame. This is because one of the SNPs has a minor allele frequency below 0.2. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} lingweis1 <- gweis(data = covdata, bdinfo = bdinfo, minmaf = 0.2, binary = FALSE) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(lingweis1, caption = "Linear Regression GWEIS") ``` If the user is interested in see what happened to SNPs that weren't included in the data frame, the skipfile option can be used. The skipfile value is the name of a file to write the skipped SNPs to. The skipfile option can be used along with the outfile option. The skip file is in the same format as the output file. Below is an example using the skipfile option. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} skipfile = tempfile() lingweis2 <- gweis(data = covdata, bdinfo = bdinfo, skipfile = skipfile, minmaf = 0.2, binary = FALSE) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(lingweis2, caption = "Linear Regression GWEIS") skipsnps <- read.table(skipfile, header = TRUE, sep = '\t') ``` ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} knitr::kable(skipsnps, caption = "Skipped SNPs") ``` The following table lists the reasons SNPs were skipped given the skipped value. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} reasondf <- readRDS(system.file("extdata", "skipreason.rds", package = "GxEScanR")) knitr::kable(reasondf, caption = "Skipped Reasons") ``` ## Logistic Regression GWAS In this example, the phenotype is coded (0,1). The gwas and gweis routines check for this an will run logistic regressions if the outcome is coded (0,1) unless binary is set to FALSE. If the use wants to make sure the outcome is coded (0,1), the user may set binary to TRUE. In this case, if the outcome is not coded (0,1) an error is produced. The following model is fit when doing a logistic regression GWAS. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(modeldf[1,], caption = "Model Fit") ``` ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} loggwas1 <- gwas(data = covdata, bdinfo = bdinfo, blksize = 2, binary = TRUE) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(loggwas1, caption = "Logistic Regression GWAS", digits = 4) ``` In this example, the option blksize is used. When an analysis is run several SNPs are read in at one time. This saves disk time. The following are the default values for given the number of subjects. These values were chosen to keep the program running using less than 4GB of RAM. The user is allowed to specify a value up to twice the default value. Little performance gain is seen going with larger values. If the user enters 0 for blksize, the default value is used. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} defaultdf <- readRDS(system.file("extdata", "defaultblk.rds", package = "GxEScanR")) knitr::kable(defaultdf, caption = "Default blksize") ``` ## Logistic Regression GWEIS A logistic regression GWEIS fits an additional 4 models that produce 7 more tests. 3 of these models use the the interaction covariate as the outcome. The following show all the models fit in a logistic regression GWEIS. The following model is fit when doing a logistic regression GWAS. ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(modeldf, caption = "Models Fit") ``` Note: When doing a GWEIS the interaction covariate is in the last column of the data frame. ### Logistic Regression GWEIS with Binary Covariate In the example subject data, the covariate is coded (0,1). In this case, the gweis routine will use logistic regression to fit the last 3 models. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} loggweis1 <- gweis(data = covdata, bdinfo = bdinfo, snps = 1:2, binary = TRUE) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(loggweis1, caption = "Logistic Regression GWEIS", digits = 4) ``` In this example the snps options was used. The snps option can either be a vector of indices indicating what SNPs to include or a list of SNPs by SNP ID. A vector of indices was used in this example. ### Logistic Regression GWEIS with a Continuous Covariate In the example subject data, the covariate is coded (0,1) which the gweis routine sees a binary covariate to make the routine do a linear regression 1 can be added to the interaction covariate. This will change the coding to (1,2) which the routine sees as a continuous covariate. ``` {r, eval = T, echo = T, message = F, warning = F, tidy = T} covdata2 <- covdata covdata2$e <- covdata2$e + 1 loggweis2 <- gweis(data = covdata2, bdinfo = bdinfo, snps = c("1:10001", "1:10002")) ``` ``` {r, eval = T, echo = F, message = F, warning = F, tidy = T} knitr::kable(loggweis2, caption = "Logistic Regression GWEIS", digits = 4) ``` In this example the snps options was used with the SNP IDs.