Merging Files

library(BinaryDosage)

Subjects often have their genotypes imputed in batches. The files returned by each imputation run can be converted into binary dosage files. If these binary dosage files contain the same SNPs but different subjects, the bdmerge routine can merge them into a single file.

bdmerge

The bdmerge routine takes the following parameters:

  • mergefiles - A character vector of the binary dosage file, family file, and map file names for the merged output
  • format - Integer value indicating which format of the binary dosage file should be used for the merged files
  • subformat - Integer value indicating which subformat should be used for the merged files
  • bdfiles - A character vector of the binary dosage files to merge
  • famfiles - Character vector of the family files associated with the binary dosage files to merge
  • mapfiles - Character vector of the map files associated with the binary dosage files to merge
  • onegroup - Logical value indicating if the binary dosage file saves SNP summary information about each merged file
  • bdoptions - Character vector indicating which SNP information should be calculated for the merged files. This cannot be used if onegroup is set to FALSE.
  • snpjoin - Character value indicating if an inner or outer join is done for the SNPs

The following code merges vcf1a.bdose and vcf1b.bdose into one binary dosage file. It then displays the number of subjects in each file.

bd1afile <- system.file("extdata", "vcf1a.bdose", package = "BinaryDosage")
bd1bfile <- system.file("extdata", "vcf1b.bdose", package = "BinaryDosage")
bd1file <- tempfile()

bdmerge(mergefiles = bd1file, bdfiles = c(bd1afile, bd1bfile))

bd1ainfo <- getbdinfo(bd1afile)
bd1binfo <- getbdinfo(bd1bfile)
bd1info <- getbdinfo(bd1file)

nrow(bd1ainfo$samples)
#> [1] 60
nrow(bd1binfo$samples)
#> [1] 40
nrow(bd1info$samples)
#> [1] 100
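
The merge above can also be checked programmatically rather than by reading the printed counts. The following is a minimal sketch, not part of the original example. It assumes that snpjoin accepts the string "inner", and that the list returned by getbdinfo has a snps data frame alongside the samples data frame used above. The variable names merged, ainfo, binfo, minfo, and n_expected are introduced here for illustration only.

```r
library(BinaryDosage)

# Example files shipped with the package (the same ones used above)
bd1afile <- system.file("extdata", "vcf1a.bdose", package = "BinaryDosage")
bd1bfile <- system.file("extdata", "vcf1b.bdose", package = "BinaryDosage")
merged <- tempfile()

# Spell out the join type. Assumption: "inner" keeps only the SNPs
# that are present in every input file.
bdmerge(mergefiles = merged,
        bdfiles = c(bd1afile, bd1bfile),
        snpjoin = "inner")

ainfo <- getbdinfo(bd1afile)
binfo <- getbdinfo(bd1bfile)
minfo <- getbdinfo(merged)

# Subjects from the two batches should add up in the merged file
n_expected <- nrow(ainfo$samples) + nrow(binfo$samples)
stopifnot(nrow(minfo$samples) == n_expected)

# Assumption: the info list has a snps data frame. With an inner join,
# the merged file cannot contain more SNPs than either input file.
stopifnot(nrow(minfo$snps) <= min(nrow(ainfo$snps), nrow(binfo$snps)))
```

If either stopifnot call fails, the inputs probably do not meet the "same SNPs, different subjects" requirement described at the top of this section.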