A javascript implementation of the non-central version of Fisher’s exact test

A javascript implementation of the non-central version of Fisher’s exact test

In a previous post, I presented a case for choosing a non-central version of Fisher’s exact test for most of bioinformatics’ uses of this test. I will now present an implementation of this test in javascript that could easily be embedded in web interfaces.

Although javascript is probably the least likely language to implement statistical methods, I hope this article will fill in as many details as possible to make it trivial to port it to other languages if the need arises. At the very least, it should demystify the inner working of this extremely useful test. I’ll limit myself to the context of $2\times 2$ contingency tables. Finally, I must admit that I was pleasantly surprised by the performance of this code!

These first functions compute various expressions needed by the non-central hypergeometric distribution. More specifically, they use log-probabilities to avoid manipulating small numbers that would risk producing an underflow.

  function lngamm (z) {
     // Reference: "Lanczos, C. 'A precision approximation
     // of the gamma function', J. SIAM Numer. Anal., B, 1, 86-96, 1964."
     // Translation of  Alan Miller's FORTRAN-implementation
     // See http://lib.stat.cmu.edu/apstat/245
     
     var x = 0;
     x += 0.1659470187408462e-06 / (z + 7);
     x += 0.9934937113930748e-05 / (z + 6);
     x -= 0.1385710331296526     / (z + 5);
     x += 12.50734324009056      / (z + 4);
     x -= 176.6150291498386      / (z + 3);
     x += 771.3234287757674      / (z + 2);
     x -= 1259.139216722289      / (z + 1);
     x += 676.5203681218835      / (z);
     x += 0.9999999999995183;
     return (Math.log (x) - 5.58106146679532777 - z + (z - 0.5) * Math.log (z + 6.5));
  }
  function lnfact (n) {
     if (n <= 1) return (0);
     return (lngamm (n + 1));
  }
   
  function lnbico (n, k) {
     return (lnfact (n) - lnfact (k) - lnfact (n - k));
  }

The “lnbico” (ln of the binomial coefficient) function returns the log of the number of combinations for choosing $k$ in $n$:

\begin{eqnarray*}
\binom{n}{k} & = & \frac{n!}{k!(n-k)!}\\
\mbox{lnfact}(n) & = & \ln(n!)\\
\mbox{lnbico}(n,k) & = & \ln\binom{n}{k}\\
& = & \mbox{lnfact}(n) – (\mbox{lnfact}(k) + \mbox{lnfact}(n – k))
\end{eqnarray*}

The probability distribution of the non-central hypergeometric is as follows (with $\omega$ being the odds ratio threshold):

\[
\frac{\binom{m_1}{x} \binom{m_2}{n-x} \omega^x}{
\sum_{y=x_{\min}}^{x_{\max}} \binom{m_1}{y} \binom{m_2}{n – y} \omega^y
}\]

Where,

\[\begin{eqnarray*}
x_{\min} & = & \max(0, n – m_2)\\
x_{\max} & = & \min(n, m_1)
\end{eqnarray*}\]

Notation used in the code for contingency matrix and marginals:

\[\begin{array}{|cc||c}\hline
\bf n_{11} & \bf n_{12} & n \\
\bf n_{21} & \bf n_{22} &  \\\hline\hline
m_1 & m_2 &  \\
\end{array}\]

For computing the test’s p-value, counts must be rearranged and I’ve used the following notation:

\[\begin{array}{|cc||c}\hline
x & ? & n \\
? & ? &  \\\hline\hline
m_1 & m_2 &  \\
\end{array}\]

Here, deciding on an $x$ allows to fill the table using the sums. The p-value of Fisher’s exact test is then computed by adding the probabilities for scenarios where $x$ is greater than the observed $n_{11}$.  This is stored as “den_sum” and computed by the last “for” loop of the function.  Notice that all sums are done using scaled probabilities (- max_l in log-space), this avoids summing over very small numbers and risking an underflow.  Since the final p-value is the result of a ratio, the scaling factor cancels itself out in Math.exp (den_sum – sum_l). Finally, “w” is used to pass the odds ratio threshold ($\omega$) required.

 function exact_nc (n11, n12, n21, n22, w) {
   var x = n11;
   var m1 = n11 + n21;
   var m2 = n12 + n22;
   var n = n11 + n12;
   var x_min = Math.max (0, n - m2);
   var x_max = Math.min (n, m1);
   var l = [];

   for (var y = x_min; y <= x_max; y++) {
     l[y - x_min] = (lnbico (m1, y) + lnbico (m2, n - y) + y * Math.log (w));
   }
   var max_l = Math.max.apply (Math, l);
 
   var sum_l = l.map (function (x) { return Math.exp (x - max_l); }).reduce (function (a, b) {
     return a + b; }, 0);
   sum_l = Math.log (sum_l);
 
   var den_sum = 0;
   for (var y = x; y <= x_max; y++) {
     den_sum += Math.exp (l[y - x_min] - max_l);
   }
   den_sum = Math.log (den_sum);
   return Math.exp (den_sum - sum_l);
 };

Please be advised that this code has been only very lightly tested. I confirmed that it returns identical results to the R implementation (fisher.exact) on a limited number of non-trivial test cases. If you find any issue, I’d be happy to investigate!

By | 2017-01-13T11:01:46+00:00 January 13, 2017|Categories: Javascript, Statistics, Test|0 Comments

About the Author:

Enjoys turning raw (big, noisy and convoluted!) biological data into knowledge… using any tools that informatics has to offer!

Leave A Comment