Fisher’s exact test is widely applied in bioinformatics (it is the core computation in gene-set or pathway enrichment analysis). I won’t introduce the test itself, as others have done so several times (here); instead, I will point to a disconnect between what it does and what is often needed.
In Fisher’s exact test, the null hypothesis is that there is no association between the two variables studied, that is, no enrichment (an odds ratio of 1). When the test is used with large numbers (such as the number of genes in the human genome, or the number of categories in the gene ontology), it is common to find very slight enrichments that the test flags as highly significant. The outcome of the test then becomes very difficult to interpret (how do you deal with a 2% enrichment returning a p-value of 0.0001?).
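To make the sample-size effect concrete, here is a minimal sketch in R. The counts are made up for illustration: the two tables have identical proportions, and the second is simply ten times larger. The helper `odds_ratio` is a hypothetical convenience function, not part of R.

```r
# Hypothetical counts: the same proportions at two sample sizes.
small <- matrix(c(110, 100, 890, 900), nrow = 2)
large <- small * 10

# Sample odds ratio (cross-product ratio); identical for both tables.
odds_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

# One-sided test against the usual null hypothesis (odds ratio = 1).
p_small <- fisher.test(small, alternative = "greater")$p.value
p_large <- fisher.test(large, alternative = "greater")$p.value

print(c(or_small = odds_ratio(small), or_large = odds_ratio(large)))
print(c(p_small = p_small, p_large = p_large))
```

The enrichment is the same in both tables, yet only the larger one falls below the usual 0.05 cutoff: the p-value reflects how much data you have as much as how strong the effect is.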
There is a variation on the test that changes the null hypothesis to state that the enrichment is no greater than a given threshold. It makes use of Fisher’s non-central hypergeometric distribution and is implemented in R (using the ‘or’ parameter of fisher.test, see documentation here). When you apply it, you ask whether the data significantly support an enrichment above a threshold that you assume is biologically meaningful. Here is a fictitious example to which both versions of the test are applied:
    > data
         [,1] [,2]
    [1,] 1100 8900
    [2,] 1000 9000

    > fisher.test (data, alt="greater")

        Fisher's Exact Test for Count Data

    data:  data
    p-value = 0.01119
    alternative hypothesis: true odds ratio is greater than 1
    95 percent confidence interval:
     1.029899      Inf
    sample estimates:
    odds ratio
      1.112353

    > fisher.test (data, or=1.25, alt="greater")

        Fisher's Exact Test for Count Data

    data:  data
    p-value = 0.9946
    alternative hypothesis: true odds ratio is greater than 1.25
    95 percent confidence interval:
     1.029899      Inf
    sample estimates:
    odds ratio
      1.112353
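For scripted use, the same comparison can be run without reading the printed report. The sketch below rebuilds the fictitious table from the example above and pulls the p-value out of the object returned by fisher.test. The values in `thresholds` are arbitrary choices for illustration, not recommendations.

```r
# Fictitious 2x2 table from the example above (matrix() fills column by column).
data <- matrix(c(1100, 1000, 8900, 9000), nrow = 2)

# Odds-ratio thresholds to test against; these values are illustrative assumptions.
thresholds <- c(1, 1.05, 1.1, 1.25)

# One-sided p-value for each null hypothesis "odds ratio = threshold",
# against the alternative "odds ratio > threshold".
p_values <- sapply(thresholds, function(t) {
  fisher.test(data, or = t, alternative = "greater")$p.value
})

print(data.frame(threshold = thresholds, p_value = signif(p_values, 4)))
```

Scanning a few thresholds this way shows how quickly the evidence fades as the required enrichment grows, which can help when there is no single obvious cutoff for "biologically meaningful".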
This type of thresholded test provides a great response to the problem stated in a previous post.