Encoder For each input word, we encode its position (in the input text). Since a Transformer is not an RNN, the sequence aspect can get lost. So instead, here...
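A minimal sketch of the sinusoidal positional encoding used by the original Transformer (the function name is my own; real implementations vectorize this):

```python
import math

def positional_encoding(pos, d_model):
    # Even dimensions use sin, odd use cos; wavelengths form a
    # geometric progression, so each position gets a unique pattern
    # the model can use to recover word order.
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```

Position 0 comes out as alternating 0s and 1s (sin(0)=0, cos(0)=1), and every other position gets a distinct vector.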
A lot of data in the world has a sequence, i.e. a temporal aspect, e.g. music or spoken language. By treating each data point independently (in a feed-forward-only mode)...
A Transformer is the major breakthrough of recent times: a neural network architecture that uses attention to process all tokens in parallel instead of step-by-step like an...
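The "all tokens in parallel" part can be sketched as scaled dot-product attention, the core operation inside a Transformer (a toy list-based version; real implementations use matrix libraries):

```python
import math

def attention(Q, K, V):
    # Each query token scores itself against every key token at once,
    # then takes a softmax-weighted average of the value vectors.
    d_k = len(K[0])
    out = []
    for qi in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k)
                  for kj in K]
        # softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted mix of the value vectors
        out.append([sum(w * v[d] for w, v in zip(weights, V))
                    for d in range(len(V[0]))])
    return out
```

Note there is no loop over time steps: every token's output depends on every other token in one shot, which is what makes the computation parallelizable.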
Gradient descent is an iterative algorithm for minimizing a loss function by moving the model parameters in the direction that most rapidly decreases the loss. Process: Start initially...
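The iteration above as a toy one-variable sketch (function names are illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient; for a suitable learning
    # rate this walks downhill toward a local minimum of the loss.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Starting from 0, the iterate converges to the true minimum at x = 3.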
Backpropagation is the main algorithm used for training neural networks with hidden layers. The key idea is that you can calculate an estimate of how much the error...
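A minimal sketch of one backprop training step for a one-hidden-neuron network, assuming a sigmoid hidden unit, a linear output, and squared error (all names illustrative):

```python
import math

def train_step(w1, w2, x, target, lr=0.1):
    # Forward pass: x -> hidden (sigmoid) -> output (linear)
    h = 1 / (1 + math.exp(-(w1 * x)))
    y = w2 * h
    loss = (y - target) ** 2
    # Backward pass: push d(loss)/d(y) back through each layer
    dy = 2 * (y - target)
    dw2 = dy * h              # how the error changes with w2
    dh = dy * w2              # error signal arriving at the hidden unit
    dw1 = dh * h * (1 - h) * x  # sigmoid'(z) = h * (1 - h)
    return w1 - lr * dw1, w2 - lr * dw2, loss
```

Running this repeatedly on one example drives the loss down, which is the whole point of the error estimate per weight.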
Given an input and its correct (target) output, a loss function compares the model’s prediction to the target and returns a single number measuring how wrong the prediction...
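Mean squared error is one common concrete example of such a function (a toy sketch):

```python
def mse_loss(predictions, targets):
    # Average of squared differences: a single number that is 0 for
    # a perfect prediction and grows as predictions get more wrong.
    assert len(predictions) == len(targets)
    return sum((p - t) ** 2
               for p, t in zip(predictions, targets)) / len(predictions)
```

Squaring makes the loss penalize large errors disproportionately and keeps it non-negative.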
E.g. Dropout: randomly and temporarily remove/shut down some of the hidden (interstitial) neurons, so that activations only flow through a subset. This builds new pathways and forces the neural...
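A sketch of "inverted" dropout, the common variant where survivors are rescaled during training so nothing changes at inference time (hypothetical function):

```python
import random

def dropout(activations, p=0.5, training=True):
    # During training, zero each activation with probability p and
    # scale the survivors by 1/(1-p) so the expected value of each
    # activation stays the same. At inference, pass everything through.
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p)
            for a in activations]
```

With p=0.5, surviving activations are doubled, so on average the downstream layer sees the same signal whether dropout is on or off.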
When I first learned about Gradient Descent about two years ago, I pictured it in the most obvious 3D way, where one imagines two input variables (as...
Chain Rule = how you take derivatives when a value depends on another value, which itself depends on another value (i.e. a composition of functions). Intuition: If A...
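A quick numeric sanity check of the rule: if A = f(B) and B = g(C), then dA/dC = (dA/dB) * (dB/dC). The example functions here are my own:

```python
import math

# A = sin(B), B = C**2  =>  dA/dC = cos(C**2) * 2*C
def chain_rule_derivative(c):
    return math.cos(c ** 2) * (2 * c)

# Finite-difference check of the same derivative
def numeric_derivative(c, eps=1e-6):
    f = lambda c: math.sin(c ** 2)
    return (f(c + eps) - f(c - eps)) / (2 * eps)
```

The analytic chain-rule answer and the numeric estimate agree to several decimal places, which is a handy way to debug any hand-derived gradient.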
“Embeddings” emphasizes the notion of representing data in a meaningful and structured way, while “[[Vectors]]” refers to the numerical representation itself. ‘Vector embeddings’ is a way to represent...
Conceptually, for two vectors x and y, x.y is defined as the magnitude of x multiplied by the projection of y onto x (think of it as the shadow cast by...
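Computationally the dot product is just a sum of element-wise products; the geometric reading above is |x| * |y| * cos(theta). A small sketch:

```python
import math

def dot(x, y):
    # Algebraic definition: sum of element-wise products
    return sum(a * b for a, b in zip(x, y))

def magnitude(x):
    # |x| = sqrt(x . x)
    return math.sqrt(dot(x, x))

def cos_angle(x, y):
    # Rearranging x.y = |x| * |y| * cos(theta) gives the angle term
    return dot(x, y) / (magnitude(x) * magnitude(y))
```

For example, dotting [3, 4] with [1, 0] projects it onto the x-axis, giving 3, its shadow's length.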
Vectors have two properties: 1. Magnitude (length), 2. Direction. From a computer science perspective, a vector is just an ordered list of numbers. Vectors can be added and multiplied (= scaled). A...
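The "ordered list of numbers" view makes the two operations one-liners (helper names are my own):

```python
def vec_add(x, y):
    # Component-wise addition
    return [a + b for a, b in zip(x, y)]

def vec_scale(c, x):
    # Multiplying by a scalar scales the magnitude; direction is kept
    # (or flipped if c is negative)
    return [c * a for a in x]
```

E.g. scaling [1, 2] by 2 doubles its length but keeps it pointing the same way.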
Layers = groups of perceptrons (see [[Perceptron]]) stacked so that each layer’s outputs become the next layer’s inputs, letting the network learn increasingly abstract features. Stacked/layered perceptrons create...
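The stacking can be sketched as a forward pass through dense layers (a toy version with ReLU activations; all names illustrative):

```python
def layer_forward(inputs, weights, biases):
    # One dense layer: each neuron computes a weighted sum of all
    # inputs plus a bias, passed through ReLU.
    return [max(0.0, sum(w, ) if False else
            max(0.0, sum(w * x for w, x in zip(neuron_w, inputs)) + b))
            for neuron_w, b in zip(weights, biases)]

def mlp_forward(x, layers):
    # Each layer's outputs become the next layer's inputs
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x
```

Each `(weights, biases)` pair is one layer; chaining them is all "stacking" means mechanically.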
Perceptron = the simplest artificial neuron that takes multiple inputs, multiplies them by weights, adds a bias, and turns the result into a yes/no (or score) output. Inputs:...
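The description above, line for line, as code (weights for the AND example are my own choice):

```python
def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, thresholded to a yes/no output
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# A single perceptron can implement an AND gate:
# fires only when both inputs are on (1 + 1 - 1.5 > 0).
AND = lambda a, b: perceptron([a, b], weights=[1.0, 1.0], bias=-1.5)
```

The bias shifts the decision threshold; with -1.5, one active input (sum 1.0) is not enough to fire, but two (sum 2.0) are.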
Alright, OpenAI o1 is out. If you are anything like me, you first chuckled at the description that it was “designed to spend more time thinking before they...