7 Minute Read

Published Dec 2022

This theorem was essential to my understanding of activation functions in the hidden layer: how they work and why they are necessary. I wrote this to deepen my own knowledge while helping others understand it the way I did.



Neural networks have universality: no matter how simple or complicated the continuous function f(x), and regardless of the number of inputs and outputs, there exists a network that can approximate it to any desired accuracy. But how can we know for certain that this is true?
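Before digging into the setup, it helps to make “a network” concrete: the smallest shape the theorem covers is a single hidden layer with an activation function between two linear layers. Here is a quick PyTorch sketch of that shape; the hidden width of 50 is an arbitrary choice for illustration, not something the theorem prescribes.

import torch
from torch import nn

# the minimal shape the theorem talks about:
# linear layer -> activation function -> linear layer
net = nn.Sequential(
    nn.Linear(1, 50),   # 1 input -> 50 hidden units (width chosen arbitrarily)
    nn.ReLU(),          # the activation function in the hidden layer
    nn.Linear(50, 1),   # 50 hidden units -> 1 output
)

x = torch.linspace(-2, 2, steps=20)[:,None]   # rank-2 tensor of sample inputs
print(net(x).shape)   # torch.Size([20, 1])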

Setup

To explain this, let’s start with a quadratic, since most people have an intuition for quadratic functions.

from functools import partial

# general form of a quadratic: a*x**2 + b*x + c
def quad(a, b, c, x): return a*x**2 + b*x + c
# fix a, b, and c to get a quadratic as a function of x alone
def mk_quad(a,b,c): return partial(quad, a,b,c)

f2 = mk_quad(2,2,4)
plot_function(f2, "$2x^2 + 2x + 4$")
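plot_function isn’t defined above; it’s the kind of small plotting helper the fastai course notebooks provide. If you don’t have it handy, a minimal stand-in (assuming matplotlib) could look like this:

import numpy as np
import matplotlib.pyplot as plt

# minimal stand-in for plot_function: evaluate f on a grid and plot it
def plot_function(f, title=None, min=-2.1, max=2.1):
    x = np.linspace(min, max, 100)
    plt.plot(x, f(x))
    if title is not None: plt.title(title)
    plt.show()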

[Figure: plot of $2x^2 + 2x + 4$]

Since the goal is to approximate this function from data, we can take points on the quadratic and add noise to them.

import numpy as np
import torch

# Gaussian noise matching x's shape, then combined multiplicative and additive noise
def noise(x, scale): return np.random.normal(scale=scale, size=x.shape)
def add_noise(x, mult, add): return x * (1+noise(x,mult)) + noise(x,add)

noise(torch.linspace(-2, 2, steps=20)[:,None], 2)

We can then apply this noise to the function’s outputs and plot the result as a scatterplot.

# deterministic results
np.random.seed(22)

# 20 evenly spaced values between -2 and 2
# [:,None] turns the vector into a matrix with one column, also known as a rank-2 tensor
x = torch.linspace(-2, 2, steps=20)[:,None]
y = add_noise(f2(x), 0.15, 1.5)
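The plotting call itself isn’t shown above; a minimal sketch, assuming matplotlib, would be:

import matplotlib.pyplot as plt

# the noisy samples drawn around the quadratic
plt.scatter(x, y)
plt.show()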

[Figure: scatterplot of the noisy samples]

We are only able to plot the actual function because we know the values of a, b, and c; a model trying to approximate it would have to work from the noisy points alone.
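To see how close the noise stays to the underlying curve, the true quadratic can be drawn over the scatter (again a sketch assuming matplotlib):

import matplotlib.pyplot as plt

plt.scatter(x, y, label="noisy samples")
xs = torch.linspace(-2, 2, steps=100)[:,None]
plt.plot(xs, f2(xs), color="red", label="$2x^2 + 2x + 4$")
plt.legend()
plt.show()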