Note: I had stopped writing posts in 2017. Slowly getting back into it in 2024, mostly for AI.

Curious historical connection between psychology and LLMs

Feb 12, 2024 | Concepts

A few months ago, my curiosity about how LLMs are ‘learning’ took me down the rabbit hole of AI and psychology history, and I ended up finding a string of very interesting, related developments from the last 120 years:

1905: Harvard-graduate psychologist Edward L. Thorndike published his ‘Law of Effect’, which basically says that animal behaviors are shaped by their consequences. That is, behaviors that result in a pleasant consequence are likely to be repeated, and those that result in an unpleasant consequence are less likely to recur.

1930s: B.F. Skinner, a prominent American psychologist, expanded upon Thorndike’s ideas and developed the theory of ‘operant conditioning’. It describes a learning process in which voluntary behaviors are modified by association with a reward or punishment. Think of it like this: imagine you’re teaching a dog to sit. Every time the dog sits when you say “sit,” you give it a treat. The dog learns that sitting leads to a reward, so it’s more likely to sit again when asked. Skinner famously developed the ‘Skinner Box’, a controlled environment (with levers, traps, etc.) in which animals performed specific behaviors, such as pressing a lever, to obtain rewards (such as food pellets). Skinner used these boxes to study how animals learned and adapted their behavior. During this time, the term “reinforcement” started being used in the context of learning, not just by Skinner but by Thorndike himself. The term also appears in the English translation of Ivan Pavlov’s book on his classical conditioning procedure.

1940s-50s: During World War II, B.F. Skinner worked on “Project Pigeon,” an attempt to develop a pigeon-guided bomb. The bombs were never deployed, but Skinner succeeded in training the pigeons and, along the way, discovered the concept of ‘shaping’: gradually molding a desired behavior by reinforcing successive approximations of it. That means not rewarding just the final behavior, but encouraging each small step towards it. The technique was so strikingly effective that two of Skinner’s researchers left academic research in 1943 to start an animal-training company called Animal Behavior Enterprises (ABE). Over the next few decades, ABE became the largest company of its kind in the world, training over 15,000 animals across 150 species. Clients included theme parks, oceanariums, government agencies, TV agencies, and many others.

So where does this all fit in with LLMs?

Here is how the dots seem to connect for me: LLMs are built on neural network (NN) architectures, and NNs are made up of layers of neurons. Much like its biological counterpart, an artificial neuron receives inputs, processes them, and then produces an output based on those inputs.

When the brain forms memories or learns a new task, it encodes the new information by tuning connections between neurons, strengthening or weakening them. Similar to that biological tuning, each input to an artificial neuron is assigned a weight that signifies its importance. These weights are adjustable parameters that influence the final output, and adjusting them is how NNs are trained (i.e. ‘learn’).
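To make the idea of weighted inputs a bit more concrete, here is a minimal sketch of a single artificial neuron in Python. The function name `neuron`, the example numbers, and the choice of a sigmoid to squash the result are my own assumptions for illustration; real networks use many neurons and a variety of activation functions.

```python
import math

def neuron(inputs, weights, bias):
    """Hypothetical single neuron: weighted sum of inputs, then a sigmoid."""
    # Each input is scaled by its weight; a larger weight means more influence.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid (an assumed activation) squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

inputs = [1.0, 0.5]

# Same inputs, different weights: only the weights change the output.
weak = neuron(inputs, weights=[0.1, 0.1], bias=0.0)
strong = neuron(inputs, weights=[2.0, 2.0], bias=0.0)

print(f"weak connections:   {weak:.3f}")
print(f"strong connections: {strong:.3f}")
```

The inputs are identical in both calls. Strengthening the connections alone is enough to push the output higher, which is the knob that training turns.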

When training an NN, the initial weights are not prescribed; they are randomly chosen. From that random start, training runs through innumerable cycles of calculating the output, measuring the error against the desired outcome, and nudging each weight in the direction that reduces that error (an optimization technique called gradient descent). Conceptually, this sounded very much like shaping to me. Of course, that’s not the only key aspect of LLMs, just the one that stood out for me.
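For the curious, the training cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how LLMs are actually trained: it fits a single weight `w` so that `w * x` matches a target `y`, and the function name `train`, the learning rate, the step count, and the sample data are all made up for illustration.

```python
import random

def train(samples, learning_rate=0.1, steps=200, seed=0):
    """Toy gradient descent on one weight, minimizing mean squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # the weight starts out random, not prescribed
    history = []
    for _ in range(steps):
        # How far off is the current output from the desired outcome?
        error = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        # Gradient of that error with respect to w: which way is "uphill".
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        history.append(error)
        # Take a small step in the opposite (downhill) direction.
        w -= learning_rate * grad
    return w, history

# Hypothetical data where the desired behavior is y = 3 * x.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w, history = train(samples)

print(f"learned weight: {w:.4f}")
print(f"error at start: {history[0]:.4f}, at end: {history[-1]:.2e}")
```

No single step gets the weight right. Each one only moves it a little closer to the target, which is the part that reminded me of reinforcing successive approximations.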