Using Binary Dosage files

library(BinaryDosage)

The following routines are available for accessing information contained in binary dosage files

  • getbdinfo
  • bdapply
  • getsnp

getbdinfo

The getbdinfo routine returns information about a binary dosage file. For more information about the data returned see Genetic File Information. This information needs to be passed to bdapplygetsnp routines so it can read the binary dosage file.

The only parameter used by getbdinfo is bdfiles. This parameter is a character vector. If the format of the binary dosage file is 1, 2, or 3, this must be a character vector of length 3 with the following values, binary dosage file name, family file name, and map file name. If the format of the binary dosage file is 4 then the parameter value is a single character value with the name of the binary dosage file.

The following code gets the information about the binary dosage file vcf1a.bdose.

bd1afile <- system.file("extdata", "vcf1a.bdose", package = "BinaryDosage")
bd1ainfo <- getbdinfo(bdfiles = bd1afile)

bdapply

The bdapply routine applies a function to all SNPs in the binary dosage file. The routine returns a list with length equal to the number of SNPs in the binary dosage file. Each element in the list is the value returned by the user supplied function. The routine takes the following parameters.

  • bdinfo - list with information about the binary dosage file returned by getbdinfo.
  • func - user supplied function to be applied to each SNP in the VCF file.
  • - additional parameters needed by the user supplied function

The user supplied function must have the following parameters.

  • dosage - A numeric vector with the dosage values for each subject.
  • p0 - A numeric vector with the probabilities the subject has no alternate alleles for each subject.
  • p1 - A numeric vector with the probabilities the subject has one alternate allele for each subject.
  • p2 - A numeric vector with the probabilities the subject has two alternate alleles for each subject.

The user supplied function can have other parameters. These parameters need to passed to the bdapply routine.

There is a function in the BinaryDosage package named getaaf that calculates the alternate allele frequencies and is the format needed by vcfapply routine. The following uses getaaf to calculate the alternate allele frequency for each SNP in the vcf1a.bdose file using the bdapply routine.

aaf <- unlist(bdapply(bdinfo = bd1ainfo, func = getaaf))

altallelefreq <- data.frame(SNP = bd1ainfo$snps$snpid, aafcalc = aaf)
knitr::kable(altallelefreq, caption = "Information vs Calculated aaf", digits = 3)
Information vs Calculated aaf
SNP aafcalc
1:10000:C:A 0.347
1:11000:T:C 0.016
1:12000:T:C 0.265
1:13000:T:C 0.297
1:14000:G:C 0.204
1:15000:A:C 0.524
1:16000:G:A 0.425
1:17000:C:A 0.443
1:18000:C:G 0.265
1:19000:T:G 0.245

getsnp

The getsnp routine return the dosage and genotype probabilities for each subject for a given SNP in a binary dosage file.

The routine takes the following parameters.

  • bdinfo - list with information about the binary dosage file returned by getbdinfo.
  • snp - the SNP to return information about. This can either be the index of the SNP in the binary dosage file or its SNP ID.
  • dosageonly - a logical value indicating if only the dosage values are returned without the genotype probabilities. The default value is TRUE indicating that only the dosage values are returned.

The following code returns the dosage values and the genotype probabilities for SNP 1:12000:T:C from the *vcf1a.bdose” binary dosage file.

snp3 <- data.frame(getsnp(bdinfo = bd1ainfo, "1:12000:T:C", FALSE))

knitr::kable(snp3[1:20,], caption = "SNP 1:12000:T:C", digits = 3)
SNP 1:12000:T:C
dosage p0 p1 p2
1.000 0.000 1.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
1.016 0.012 0.959 0.029
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 1.000 0.000 0.000
1.002 0.000 0.998 0.002
0.000 1.000 0.000 0.000
0.808 0.192 0.808 0.000
0.168 0.832 0.168 0.000
0.000 1.000 0.000 0.000
1.025 0.000 0.975 0.025
0.000 1.000 0.000 0.000
1.000 0.000 1.000 0.000