7 Minute Read
Published Dec 2022
This theorem was essential to my understanding of activation functions in the hidden layers: how they work, and why they are necessary. I wrote this to deepen my own knowledge while helping others understand it the way I did.
Neural networks have universality: no matter how simple or complicated the network, and regardless of the number of inputs and outputs, for any continuous function f(x) there exists a network that can approximate it. But how can we know for certain that this is true?
To explain this, let’s start with a quadratic function since most people have an intuition for them.
# general version of a quadratic
from functools import partial

def quad(a, b, c, x): return a*x**2 + b*x + c

# pass in any a, b, c to create a specific quadratic
def mk_quad(a, b, c): return partial(quad, a, b, c)
f2 = mk_quad(2,2,4)
# plot_function is a plotting helper from the fastai notebooks
plot_function(f2, "$2x^2 + 2x + 4$")
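As a quick sanity check (my own addition, not part of the original notebook), we can confirm that f2 really evaluates the quadratic 2x² + 2x + 4:

```python
from functools import partial

def quad(a, b, c, x): return a*x**2 + b*x + c
def mk_quad(a, b, c): return partial(quad, a, b, c)

f2 = mk_quad(2, 2, 4)
# 2*(1.5**2) + 2*1.5 + 4 = 4.5 + 3 + 4 = 11.5
print(f2(1.5))  # 11.5
```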
Since the goal is to approximate this function, we can add noise to points on this quadratic.
def noise(x, scale): return np.random.normal(scale=scale, size=x.shape)
def add_noise(x, mult, add): return x * (1+noise(x,mult)) + noise(x,add)
noise(torch.linspace(-2, 2, steps=20)[:,None], 2)
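To see what this call returns, here is a small self-contained sketch (the imports are my addition; np.random.normal accepts the tensor's shape because torch.Size is a subclass of tuple):

```python
import numpy as np
import torch

# same noise helper as above: Gaussian noise shaped like x
def noise(x, scale): return np.random.normal(scale=scale, size=x.shape)

x = torch.linspace(-2, 2, steps=20)[:, None]
n = noise(x, 2)
print(n.shape)  # (20, 1) - one noise value per sample point
```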
which we can plug into the function to get a scatterplot.
# deterministic results
np.random.seed(22)
# 20 evenly spaced values between -2 and 2
# [:,None] turns the vector into a 20x1 matrix, also known as a rank-2 tensor
x = torch.linspace(-2, 2, steps=20)[:,None]
y = add_noise(f2(x), 0.15, 1.5)
We are able to plot the actual function because we know the values for a, b, and c.
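Putting the pieces together, here is one way to draw the noisy points and the true quadratic on the same axes (a plain-matplotlib sketch, since plot_function is a fastai notebook helper rather than a standard library function):

```python
from functools import partial
import numpy as np
import torch
import matplotlib.pyplot as plt

def quad(a, b, c, x): return a*x**2 + b*x + c
def mk_quad(a, b, c): return partial(quad, a, b, c)
def noise(x, scale): return np.random.normal(scale=scale, size=x.shape)
def add_noise(x, mult, add): return x * (1 + noise(x, mult)) + noise(x, add)

np.random.seed(22)
f2 = mk_quad(2, 2, 4)
x = torch.linspace(-2, 2, steps=20)[:, None]
y = add_noise(f2(x), 0.15, 1.5)        # noisy samples of the quadratic

plt.scatter(x, y)                      # the scatterplot of noisy points
xs = torch.linspace(-2, 2, steps=100)
plt.plot(xs, f2(xs), color="orange")   # the true function, known a, b, c
plt.title("$2x^2 + 2x + 4$")
plt.show()
```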