Factorial and Log Factorial

Factorial and Log Factorial


When you need to calculate n!, you have several solutions. 

  1. The “rush” solution: using a loop or a recursive function: 
    def factorial_for(n):
      r = 1
      for i in range(2, n + 1):
        r *= i
    def factorial_rec(n):
      if n > 1:
        return(n * factorial_rec(n - 1))

    Here, the multiplication of the numbers sequentially will create a huge number very quickly. This is good, but computers are faster when 2 small numbers (120×30240) are involved in a multiplication versus the same multiplication with one huge number (362880×10).

  2. The “smart” solution: Here, the idea is to split the multiplication sequence in two recursively, taking into account the characteristics of computers.
    10! = 1x2x3x4x5x6x7x8x9 x 10 = 362880 x 10 (rush solution)
    = 1x2x3x4x5 x 6x7x8x9x10 = 120 x 30240 (1st split)
    = 1x2x3 x 4×5 x 6x7x8 x 9×10 = 6 x 20 x 336 x 90 (2nd split)
    def range_prod(lo, hi):
      if lo + 1 < hi:
        mid = (hi + lo) // 2
        return(range_prod(lo, mid) * range_prod(mid + 1, hi))
      if lo == hi:
      return(lo * hi)
    def factorial_tree(n):
      if n < 2:
      return(range_prod(1, n))
  3. The “lazy” or “very smart one” (it depends on the point of view): calculating an estimate with Stirling’s approximation .
    import math
    def stirling(n):
      return(math.sqrt(2 * math.pi * n) * (n / math.e) ** n)

    This implementation is limited to n<170, but as you can see below at this number the result is huge. For this reason we don’t work with the factorial directly. We prefer the log factorial.

    [01]: stirling(170) Out[01]: 7.253858934542908e+306 In [02]: stirling(171) Out[02]: inf


Log Factorial:

For the two first solutions, you just need to add math.log() (with an import math):

def log_fact_rec(n):
def log_fact_tree(n):
def log_fact_for(n):

And the log factorial with Stirling’s approximation is implemented like this:

def log_stirling(n):
 fn = float(n)
 fn = n * math.log(fn) + math.log(2 * math.pi * fn) / 2 - fn + \
      (fn ** -1) / 12 - (fn ** -3) / 360 + (fn ** -5) / 1260 - \
      (fn ** -7) / 1680 + (fn ** -9) / 1188

In the case where you have multiple log factorials to calculate, it may be useful to save or use previous results (10! = 6!x7x8x9x10) like this:

def log_fact_exact_saved(n, value_saved):
  select = max((i for i in value_saved.keys() if i <= n))
  if select != n:
    value_saved[n] = range_prod(select + 1, n) * value_saved[select]

def log_fact_approx_saved(n, value_saved):
  if n not in value_saved:
    value_saved[n] = log_stirling(n)


Now we can test each solution and compare it with the library math and scipy:

import sys
sys.setrecursionlimit(1000000) // increase the depth 
import math
import scipy.special
def test_saved(saved, approx):
    value = [10, 50, 100, 1000, 1000, 2000, 5000, 5000, 5000,
             5000, 10000, 11000, 11000, 11000, 20000, 20000]
    if approx:
        if saved:
            value_saved = {1: 1}
            for num in value:
                log_fact_approx_saved(num, value_saved)
            for num in value:
        if saved:
            value_saved = {1: 1}
            for num in value:
                log_fact_exact_saved(num, value_saved)
            for num in value:

def log_fact_math(n):

def log_fact_scipy(n, exact=False):
    return(math.log(scipy.special.factorial(n, exact)))

import timeit
num = 10000
rep = 200
print("rec:" + str(timeit.timeit("log_fact_rec(" + str(num) + ")",
      setup="from __main__ import log_fact_rec", number=rep)))
print("for:" + str(timeit.timeit("log_fact_for(" + str(num) + ")",
      setup="from __main__ import log_fact_for", number=rep)))
print("scipy exact:" + str(timeit.timeit("log_fact_scipy(" + str(num) + ", True)",
      setup="from __main__ import log_fact_scipy", number=rep)))
print("math:" + str(timeit.timeit("log_fact_math(" + str(num) + ")",
      setup="from __main__ import log_fact_math", number=rep)))
print("tree:" + str(timeit.timeit("log_fact_tree(" + str(num) + ")",
      setup="from __main__ import log_fact_tree", number=rep)))

print("scipy approx:" + str(timeit.timeit("log_fact_scipy(" + str(num) + ")",
      setup="from __main__ import log_fact_scipy", number=rep)))
print("log_stirling:" + str(timeit.timeit("log_stirling(" + str(num) + ")",
      setup="from __main__ import log_stirling", number=rep)))

print("value log(100!) tree:" + str(log_fact_tree(100)))
print("value log(100!) scipy approx:" + str(log_fact_scipy(100)))
print("value log(100!) log_stirling:" + str(log_stirling(100)))

print("severals factorial:" + str(timeit.timeit("test_saved(False, False)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial saved:" + str(timeit.timeit("test_saved(True, False)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial approx:" + str(timeit.timeit("test_saved(False, True)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial approx saved:" + str(timeit.timeit("test_saved(True, True)",
      setup="from __main__ import test_saved", number=rep)))


With python 2.7.6 and scipy 0.17.0, we have:

scipy exact:4.08975815773
You can see that the “smart” solution is faster, even versus scipy
scipy approx:0.00166201591492
 Again scipy is slowest
value log(100!) tree:363.739375556
value log(100!) scipy approx:363.739375556
value log(100!) log_stirling:363.739375556
The log transformation estimate simulations are equal with n>=100
several factorials:14.3231370449
several factorials saved:3.51164102554
several factorials approx:0.00481510162354
several factorials approx saved:0.00360107421875
Saving previous results can save a lot of time, but it will take more memory.

I hope this blog post will help you to choose the best solution to compute factorial or log factorial, according to your needs.

By | 2016-11-08T09:30:06+00:00 February 22, 2016|Categories: Bioinformatics, Data Analysis, Performance, Python|0 Comments

Share This Story, Choose Your Platform!

About the Author:

I’ve started as a computer scientist, then I have quickly realised that bioinformatics are saturated by puzzles to solve. As in the "The Summit of the Gods" (Jirō Taniguchi), there are always a new mountain to climb or a path more straightforward.

Leave A Comment