Perceptron

Jan 15, 2025 | LLM Concepts

Perceptron = the simplest artificial neuron that takes multiple inputs, multiplies them by weights, adds a bias, and turns the result into a yes/no (or score) output.

  • Inputs: a vector of features (x1,x2,…,xn)
  • Parameters:
    • One weight per input (w1,w2,…,wn)
    • An overall bias (b) → sometimes written as w0 paired with a constant input x0 = 1
  • Computation: the weighted sum z = w1·x1 + w2·x2 + … + wn·xn + b
  • Non-linearity (activation function) that turns z into the output. The classic perceptron uses a step (threshold) function: fire if z crosses 0, stay silent otherwise. Modern deep networks more often use ReLU. A minimal sketch follows after this list.
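
A minimal sketch of that forward pass in Python (the weights, bias, and inputs here are made-up numbers, purely for illustration):

```python
def step(z):
    # Classic perceptron activation: fire (1) if the sum crosses the threshold, else stay silent (0).
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    # z = w1*x1 + w2*x2 + ... + wn*xn + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return step(z)

# Made-up example: two inputs, hand-picked weights and bias.
print(perceptron([1.0, 0.5], w=[0.6, -0.4], b=-0.1))  # 1 (fires)
print(perceptron([0.2, 0.9], w=[0.6, -0.4], b=-0.1))  # 0 (stays silent)
```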

Basic perceptron diagram

Intuition

  • Think of a perceptron as a tiny yes/no rule that looks at several numeric input signals and then fires (1) or stays silent (0).
  • Each weight is how much that input “votes” for firing (positive weight) or against (negative).
  • The bias is how “easy” it is to fire even when inputs are small; it shifts the decision boundary.
  • In the simple case where there are only 2 input features, you can literally plot the input data points on an xy-plane. The perceptron’s decision boundary is then a straight line (the points where w1·x1 + w2·x2 + b = 0) that splits them into ‘fire’ vs ‘don’t fire’ based on their combination of the two features.
  • In 3D it’s a flat plane. In higher dimensions it’s the same idea: the boundary is a hyperplane in n dimensions, always carving the input space into two half-spaces (fire vs not fire).
  • Just one perceptron on its own is a linear classifier. If the data needs a curved boundary or multiple disjoint regions (like XOR), a lone perceptron cannot represent it; that’s when we stack perceptrons to get non-linear decision boundaries (MLPs, or Multi-Layer Perceptrons). That’s what makes up the layers of deep learning. See [[Layers]] and the sketch after this list.
  • ChatGPT (GPT-3.5) had on the order of 100M of these artificial neurons
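
Here’s a small sketch of that stacking idea with hand-picked (not learned) weights: AND, OR, and NAND each fit on a single perceptron, and stacking two layers of them gives XOR, which no single perceptron can represent:

```python
def step(z):
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hand-picked weights: each of these is a single straight-line (linear) boundary.
def OR(x1, x2):   return perceptron([x1, x2], w=[1, 1],   b=-0.5)
def NAND(x1, x2): return perceptron([x1, x2], w=[-1, -1], b=1.5)
def AND(x1, x2):  return perceptron([x1, x2], w=[1, 1],   b=-1.5)

# XOR needs two disjoint 'fire' regions, so one perceptron can't do it,
# but a two-layer stack (a tiny MLP) can: XOR(a, b) = AND(OR(a, b), NAND(a, b)).
def XOR(x1, x2):  return AND(OR(x1, x2), NAND(x1, x2))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", XOR(a, b))  # prints 0, 1, 1, 0
```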

Training Perceptron 3D

Complex Stuff

  • There is a mathematical proof, the Perceptron Convergence Theorem, which states that if your data can be perfectly separated by a straight line (a hyperplane), the perceptron learning algorithm is guaranteed to find some separating line after a finite number of weight updates. A sketch of the learning rule follows below.
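
A minimal sketch of the perceptron learning rule in Python, on a tiny made-up 2D dataset that is linearly separable (so the theorem applies and the loop is guaranteed to terminate):

```python
def step(z):
    return 1 if z > 0 else 0

def train_perceptron(data, lr=0.1, max_epochs=100):
    # Start with all-zero weights and bias; nudge them on every mistake.
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            y_hat = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = y - y_hat              # 0 if correct, +1 or -1 if wrong
            if error != 0:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:                  # a full pass with no errors means
            break                          # the data is perfectly separated
    return w, b

# Made-up, linearly separable 2D points: label 1 if roughly "upper right", else 0.
data = [([2.0, 3.0], 1), ([3.0, 2.5], 1), ([1.0, 0.5], 0), ([0.5, 1.5], 0)]
w, b = train_perceptron(data)
print(w, b)  # some separating line; exact values depend on lr and data order
```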

Primary Resources

  • 3Blue1Brown – But what is a Neural Network? The first 6ish minutes cover neurons (perceptrons) and layers.
  • The first 15ish minutes of Lecture 5 of Harvard CS50’s Introduction to Artificial Intelligence with Python (2020) are a simple way to understand perceptrons, and they naturally extend into layers and gradient descent. It was my first intro to the basic building blocks of LLMs.
  • Lecture 1: Introduction to Neural Networks and Deep Learning, Hands-On Deep Learning course by MIT OpenCourseWare. Prof. Rama Ramakrishnan delivers it well.
  • Karthik Vedula has an interesting interactive explanation of the Perceptron Learning Algorithm.
  • I wrote a post comparing biological and artificial neurons, cross-walking their conceptual overlaps.