A common way to measure the imputation $r^2$ is to calculate the variance of the imputed allele probabilities and divide it by the variance the alleles would have if they were perfectly imputed. An allele is perfectly imputed if $\Pr(a_i = 1)$ equals 0 or 1 for all $i$.
If the alleles were perfectly imputed, their variance would be $q(1 - q)$, where $q$ is the alternate allele frequency. We do not know the value of $q$ in the general population, but we can estimate it from the imputation data using each subject's dosage:

$$ \hat q = \sum_{i = 1}^{N}\frac{d_i}{2N} $$

where $N$ is the number of subjects and the dosage $d_i$, the expected number of alternate alleles carried by subject $i$, is

$$ d_i = \Pr(g_i = 1) + 2\Pr(g_i = 2) $$

Another problem with the dosage data is that we do not have the probabilities for each allele. Instead we have $\Pr(g_i = 0)$, $\Pr(g_i = 1)$, and $\Pr(g_i = 2)$. If we assume that a subject's two allele probabilities, $q_1$ and $q_2$, are imputed independently, then (dropping the subscript $i$)

$$ q_1(1 - q_2) + (1 - q_1)q_2 = \Pr(g = 1) \qquad \text{and} \qquad q_1 q_2 = \Pr(g = 2) $$

Adding twice the second equation to the first gives $q_1 + q_2 = \Pr(g = 1) + 2\Pr(g = 2) = d$, so $q_1$ and $q_2$ are the two roots of $x^2 - dx + \Pr(g = 2) = 0$:

$$ q_1 = \frac{d - \sqrt{d^2 - 4\Pr(g = 2)}}{2}, \qquad q_2 = \frac{d + \sqrt{d^2 - 4\Pr(g = 2)}}{2} $$

These equations can cause problems in practice, because the value inside the radical is sometimes negative. This can be caused by roundoff error; if the value is negative and close to zero, it can be set to zero.
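The per-subject steps above can be sketched in Python. This is a minimal sketch, not the implementation used by any particular package: the function names `allele_probabilities` and `estimate_q` are hypothetical, and the roundoff tolerance `tol` (along with the choice to raise an error when the radicand is more negative than `tol`) is an assumption, since the text only says that small negative values can be set to zero.

```python
import math


def allele_probabilities(p1, p2, tol=1e-6):
    """Split one subject's genotype probabilities into two allele probabilities.

    Hypothetical helper. p1 = Pr(g = 1), p2 = Pr(g = 2). Assumes the two
    alleles were imputed independently, so q1 + q2 = d and q1 * q2 = p2.
    """
    d = p1 + 2.0 * p2            # dosage: expected number of alternate alleles
    radicand = d * d - 4.0 * p2  # value inside the square root
    if radicand < 0.0:
        # Assumed tolerance: small negatives are treated as roundoff and set
        # to zero; anything more negative is reported as an error.
        if radicand < -tol:
            raise ValueError(f"radicand {radicand} is too negative to be roundoff")
        radicand = 0.0
    root = math.sqrt(radicand)
    return (d - root) / 2.0, (d + root) / 2.0


def estimate_q(dosages):
    """Estimate the alternate allele frequency as q_hat = sum(d_i) / (2N)."""
    return sum(dosages) / (2.0 * len(dosages))
```

For example, a subject whose allele probabilities are 0.2 and 0.9 has $\Pr(g = 1) = 0.2 \cdot 0.1 + 0.8 \cdot 0.9 = 0.74$ and $\Pr(g = 2) = 0.18$, and `allele_probabilities(0.74, 0.18)` recovers approximately `(0.2, 0.9)`.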
Note: The documentation for minimac and Impute 2 indicates that the two alleles are imputed independently.
Since each subject has two alleles, we can let $q_1, \ldots, q_N$ represent the first allele of each subject and $q_{N+1}, \ldots, q_{2N}$ represent the second allele, so that $q_i$ and $q_{i+N}$ both belong to subject $i$. Given this, we can calculate all the $q$'s as follows:

$$ q_i = \left\{\begin{array}{ll} \frac{d_i - \sqrt{d_i^2 - 4\Pr(g_i = 2)}}{2} & 0 < i \leq N \\ \frac{d_{i-N} + \sqrt{d_{i-N}^2 - 4\Pr(g_{i-N} = 2)}}{2} & N < i \leq 2N \end{array}\right. $$

Because $q_i + q_{i+N} = d_i$, the mean of the $2N$ values $q_i$ is exactly $\hat q$. Once the $q$'s have been calculated, the imputation $r^2$ can be estimated as their variance divided by $\hat q(1 - \hat q)$:
$$ \hat r^2 = \frac{\sum_{i = 1}^{2N}\frac{(q_i - \hat q)^2}{2N}}{\hat q(1 - \hat q)} $$
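Putting the pieces together, the estimate can be sketched as a single Python function. This is again a hypothetical sketch under the independence assumption: the name `imputation_r2`, its inputs (parallel lists of $\Pr(g_i = 1)$ and $\Pr(g_i = 2)$), the roundoff tolerance `tol`, and the error raised when $\hat q(1 - \hat q) = 0$ are illustrative choices, not part of the method description above.

```python
import math


def imputation_r2(p1s, p2s, tol=1e-6):
    """Estimate the imputation r^2 for one SNP (hypothetical helper).

    p1s[i] = Pr(g_i = 1) and p2s[i] = Pr(g_i = 2) for subject i.
    Assumes each subject's two alleles were imputed independently.
    """
    n = len(p1s)
    if n == 0 or n != len(p2s):
        raise ValueError("need equal-length, non-empty probability lists")

    # Dosage d_i = Pr(g_i = 1) + 2 Pr(g_i = 2); q_hat = sum(d_i) / (2N).
    dosages = [p1 + 2.0 * p2 for p1, p2 in zip(p1s, p2s)]
    q_hat = sum(dosages) / (2.0 * n)
    if not 0.0 < q_hat < 1.0:
        # The denominator q_hat * (1 - q_hat) would be zero.
        raise ValueError("r^2 is undefined when q_hat is 0 or 1")

    first, second = [], []  # q_1..q_N and q_{N+1}..q_{2N}
    for d, p2 in zip(dosages, p2s):
        radicand = d * d - 4.0 * p2
        if radicand < -tol:  # assumed cutoff between roundoff and bad input
            raise ValueError("probabilities are inconsistent with independent alleles")
        root = math.sqrt(max(radicand, 0.0))  # small negatives set to zero
        first.append((d - root) / 2.0)
        second.append((d + root) / 2.0)

    # Variance of the 2N allele probabilities around q_hat, over q_hat(1 - q_hat).
    q = first + second
    variance = sum((qi - q_hat) ** 2 for qi in q) / (2.0 * n)
    return variance / (q_hat * (1.0 - q_hat))
```

With perfectly imputed genotypes (every probability 0 or 1) the estimate is 1, and when every allele probability equals $\hat q$ the estimate is 0, matching the interpretation of $r^2$ as the fraction of the perfectly imputed variance that is retained.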