Factorial and Log Factorial

Factorial:

When you need to calculate n!, you have several solutions.

The “rush” solution: using a loop or a recursive function:
def factorial_for(n): r = 1 for i in range(2, n + 1): r *= i return(r)
def factorial_rec(n): if n > 1: return(n * factorial_rec(n - 1)) else: return(1)
Here, the multiplication of the numbers sequentially will create a huge number very quickly. This is good, but computers are faster when 2 small numbers (120×30240) are involved in a multiplication versus the same multiplication with one huge number (362880×10).
The “smart” solution: Here, the idea is to split the multiplication sequence in two recursively, taking into account the characteristics of computers.

10! = 1x2x3x4x5x6x7x8x9 x 10 = 362880 x 10 (rush solution)

= 1x2x3x4x5 x 6x7x8x9x10 = 120 x 30240 (1st split)

= 1x2x3 x 4×5 x 6x7x8 x 9×10 = 6 x 20 x 336 x 90 (2nd split)
```
def range_prod(lo, hi):
  if lo + 1 < hi:
    mid = (hi + lo) // 2
    return(range_prod(lo, mid) * range_prod(mid + 1, hi))
  if lo == hi:
    return(lo)
  return(lo * hi)
```
```
def factorial_tree(n):
  if n < 2:
    return(1)
  return(range_prod(1, n))
```
The “lazy” or “very smart one” (it depends on the point of view): calculating an estimate with Stirling’s approximation .
```
import math
def stirling(n):
  return(math.sqrt(2 * math.pi * n) * (n / math.e) ** n)
```
This implementation is limited to n<170, but as you can see below at this number the result is huge. For this reason we don’t work with the factorial directly. We prefer the log factorial.
```
In [01]: stirling(170)
Out[01]: 7.253858934542908e+306
In [02]: stirling(171)
Out[02]: inf
```

Log Factorial:

For the two first solutions, you just need to add math.log() (with an import math):

def log_fact_rec(n):
  return(math.log(factorial_rec(n)))

def log_fact_tree(n):
  return(math.log(factorial_tree(n)))

def log_fact_for(n):
  return(math.log(factorial_for(n)))

And the log factorial with Stirling’s approximation is implemented like this:

def log_stirling(n):
 fn = float(n)
 fn = n * math.log(fn) + math.log(2 * math.pi * fn) / 2 - fn + \
      (fn ** -1) / 12 - (fn ** -3) / 360 + (fn ** -5) / 1260 - \
      (fn ** -7) / 1680 + (fn ** -9) / 1188
 return(fn)

Note:
In the case where you have multiple log factorials to calculate, it may be useful to save or use previous results (10! = 6!x7x8x9x10) like this:

def log_fact_exact_saved(n, value_saved):
  select = max((i for i in value_saved.keys() if i <= n))
  if select != n:
    value_saved[n] = range_prod(select + 1, n) * value_saved[select]
  return(math.log(value_saved[n]))

def log_fact_approx_saved(n, value_saved):
  if n not in value_saved:
    value_saved[n] = log_stirling(n)
  return(value_saved[n])

Test:

Now we can test each solution and compare it with the library math and scipy:


import sys
sys.setrecursionlimit(1000000) // increase the depth 
import math
import scipy.special
def test_saved(saved, approx):
    value = [10, 50, 100, 1000, 1000, 2000, 5000, 5000, 5000,
             5000, 10000, 11000, 11000, 11000, 20000, 20000]
    if approx:
        if saved:
            value_saved = {1: 1}
            for num in value:
                log_fact_approx_saved(num, value_saved)
        else:
            for num in value:
                log_stirling(num)
    else:
        if saved:
            value_saved = {1: 1}
            for num in value:
                log_fact_exact_saved(num, value_saved)
        else:
            for num in value:
                log_fact_tree(num)

def log_fact_math(n):
    return(math.log(math.factorial(n)))

def log_fact_scipy(n, exact=False):
    return(math.log(scipy.special.factorial(n, exact)))

import timeit
num = 10000
rep = 200
print("rec:" + str(timeit.timeit("log_fact_rec(" + str(num) + ")",
      setup="from __main__ import log_fact_rec", number=rep)))
print("for:" + str(timeit.timeit("log_fact_for(" + str(num) + ")",
      setup="from __main__ import log_fact_for", number=rep)))
print("scipy exact:" + str(timeit.timeit("log_fact_scipy(" + str(num) + ", True)",
      setup="from __main__ import log_fact_scipy", number=rep)))
print("math:" + str(timeit.timeit("log_fact_math(" + str(num) + ")",
      setup="from __main__ import log_fact_math", number=rep)))
print("tree:" + str(timeit.timeit("log_fact_tree(" + str(num) + ")",
      setup="from __main__ import log_fact_tree", number=rep)))

print("scipy approx:" + str(timeit.timeit("log_fact_scipy(" + str(num) + ")",
      setup="from __main__ import log_fact_scipy", number=rep)))
print("log_stirling:" + str(timeit.timeit("log_stirling(" + str(num) + ")",
      setup="from __main__ import log_stirling", number=rep)))

print("value log(100!) tree:" + str(log_fact_tree(100)))
print("value log(100!) scipy approx:" + str(log_fact_scipy(100)))
print("value log(100!) log_stirling:" + str(log_stirling(100)))

print("severals factorial:" + str(timeit.timeit("test_saved(False, False)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial saved:" + str(timeit.timeit("test_saved(True, False)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial approx:" + str(timeit.timeit("test_saved(False, True)",
      setup="from __main__ import test_saved", number=rep)))
print("severals factorial approx saved:" + str(timeit.timeit("test_saved(True, True)",
      setup="from __main__ import test_saved", number=rep)))

With python 2.7.6 and scipy 0.17.0, we have:

rec:4.52898597717 for:4.09089803696 scipy exact:4.08975815773 math:4.02955198288 tree:1.2138531208	You can see that the “smart” solution is faster, even versus `scipy`
scipy approx:0.00166201591492 log_stirling:0.000301837921143	Again `scipy` is slowest
value log(100!) tree:363.739375556 value log(100!) scipy approx:363.739375556 value log(100!) log_stirling:363.739375556	The log transformation estimate simulations are equal with n>=100
several factorials:14.3231370449 several factorials saved:3.51164102554 several factorials approx:0.00481510162354 several factorials approx saved:0.00360107421875	Saving previous results can save a lot of time, but it will take more memory.

I hope this blog post will help you to choose the best solution to compute factorial or log factorial, according to your needs.