It happens to most of us multiple times a week: someone emails asking for a good time to meet, and before you know it, you're stuck in a back-and-forth scheduling spiral. It's a mundane friction that adds up. Four weekends ago, I decided to vibe-code my way to a solution. The result? A fragile but functional AI-native assistant I called Munshi (Hindi for secretary, pronounced moon-she). You can...
Think about local minima in thousands of dimensions
When I first learned about Gradient Descent about two years ago, I pictured it in the most obvious 3D way: two input variables as the x and y axes of a plane, with the loss as the third (z) axis. As for 'local minima', I imagined the model getting stuck in a "false bottom" of this bowl-shaped landscape, unable to reach the true minimum, the lowest point. But this...
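The "false bottom" idea can be sketched in one dimension. The function below is a made-up example chosen only because it has two valleys; depending on where it starts, gradient descent settles in either one.

```python
# 1-D sketch of a "false bottom": f(x) = x**4 - 3*x**2 + x has two valleys.
def grad(x):
    # derivative of f: f'(x) = 4x^3 - 6x + 1
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    # repeatedly step downhill, against the gradient
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(1.0))    # settles near x ≈ 1.13  (the shallower, "false" bottom)
print(descend(-1.0))   # settles near x ≈ -1.30 (the true minimum)
```

The starting point alone decides which valley you end up in; neither run ever "sees" the other minimum.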
Chain Rule of Calculus
Chain Rule = how you take derivatives when a value depends on another value, which itself depends on another value (i.e. a composition of functions). Intuition: if A affects B and B affects C, then A affects C through B. The chain rule just says: total sensitivity of the whole chain = (how sensitive C is to B) × (how sensitive B is to A). A neural network is a long chain of computations...
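The A → B → C intuition can be checked numerically. The functions below are made up for illustration (B = 3·A, C = B²); the chain-rule product matches a direct finite-difference slope.

```python
# A -> B -> C with hypothetical functions: B = 3*A, C = B**2.

def B(a):
    return 3 * a

def C(b):
    return b ** 2

def dC_dA(a):
    # (how sensitive C is to B) * (how sensitive B is to A)
    dC_dB = 2 * B(a)   # derivative of b**2 with respect to b
    dB_dA = 3          # derivative of 3*a with respect to a
    return dC_dB * dB_dA

# Finite-difference check: nudge A and measure how C moves
a, eps = 2.0, 1e-6
numeric = (C(B(a + eps)) - C(B(a - eps))) / (2 * eps)
print(dC_dA(a), round(numeric, 3))  # both ≈ 36.0
```

The two numbers agree: multiplying the local sensitivities really does give the end-to-end sensitivity.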
How Diffusion Models Power AI Videos: An Incredible Visual Explanation
I first wrapped my head around diffusion models in 2023, thanks to the MIT 6.S191 lecture on 'Deep Learning New Frontiers'. The idea of reverse-denoising just clicked for me—it reminded me of how our brains pick out shapes and objects in clouds or random mosaics. Yesterday my 3Blue1Brown subscription surfaced 'But how do AI videos actually work?', a guest video by @WelchLabsVideo. That video...
Embedding
“Embeddings” emphasizes the notion of representing data in a meaningful and structured way, while “[[Vectors]]” refers to the numerical representation itself. ‘Vector embeddings’ are a way to represent different data types (like words, sentences, and articles) as points in a multidimensional space. OpenAI’s vector embedding model is called text-embedding-ada-002 (read their Dec 2022 post announcing it). There...
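A tiny sketch of "points in a multidimensional space": the 3-d vectors below are made up for illustration (real models like text-embedding-ada-002 produce vectors with thousands of dimensions), but they show how closeness between points can stand in for closeness in meaning.

```python
import math

def cosine_similarity(u, v):
    # angle-based closeness: 1.0 = same direction, 0.0 = unrelated
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings -- the numbers are invented for this sketch
cat   = [0.9, 0.8, 0.1]
dog   = [0.8, 0.9, 0.2]
stock = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, dog))    # high: related concepts sit close together
print(cosine_similarity(cat, stock))  # low: unrelated concepts sit far apart
```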
Three Years of Learning AI: Resources That Shaped My Intuition
This weekend I finally finished reading Why Machines Learn: The Elegant Math Behind AI (by Anil Ananthaswamy). It took me seven months—an unusually long time for a 500-page book. But the detour was worth it: the book kept sending me down side paths, like brushing up on linear algebra and derivatives—topics I hadn’t revisited in nearly three decades. Now that I'm done, the book feels like a...
Dot Products
Conceptually, for two vectors x and y, x.y is defined as the magnitude of x multiplied by the projection of y onto x (think of the projection as the shadow cast by y onto x). If x and y are at right angles (orthogonal), x.y will be zero, regardless of the length of either of them. Complex stuff: the set of weights in a neuron is nothing but a vector (w1, w2, ..). That weight vector is orthogonal to the line that is...
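The orthogonality claim can be checked with concrete numbers. The weights below are made up for illustration: with w = (2, 1) and zero bias, the neuron's boundary is the line 2·x1 + 1·x2 = 0, and the dot product of w with any direction along that line is zero.

```python
# Sketch: a neuron's weight vector is orthogonal to its decision boundary.
w = (2.0, 1.0)  # hypothetical weights; boundary is 2*x1 + 1*x2 = 0

# Any step along the boundary is a multiple of (1, -2), since 2*1 + 1*(-2) = 0
direction_along_boundary = (1.0, -2.0)

dot = w[0] * direction_along_boundary[0] + w[1] * direction_along_boundary[1]
print(dot)  # 0.0 -> w is at a right angle to the boundary
```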
Vectors
Vectors have two properties: 1. Magnitude (length), 2. Direction. From a computer science perspective, a vector is just an ordered list of numbers. Vectors can be added and multiplied by a number (= scaled). A key operation on vectors is the dot product. Conceptually the dot product a.b is defined as the magnitude of vector a multiplied by the projection of vector b onto a. Projection can be thought of as the “shadow...
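The "magnitude times shadow" definition is easy to verify with toy numbers. The vectors below are chosen so the shadow is obvious: a lies along the x-axis, so b's shadow on a is just b's x-coordinate.

```python
import math

# Sketch: dot product a.b = |a| * (projection of b onto a). Toy vectors.
a = (3.0, 0.0)   # points along the x-axis, length 3
b = (2.0, 2.0)   # its "shadow" on the x-axis has length 2

dot = sum(ai * bi for ai, bi in zip(a, b))    # componentwise: 3*2 + 0*2 = 6
mag_a = math.sqrt(sum(ai * ai for ai in a))   # |a| = 3
projection_of_b_onto_a = dot / mag_a          # the shadow's length, 2

print(dot, mag_a * projection_of_b_onto_a)    # both 6.0: the definitions agree
```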
Layers
Layers = groups of perceptrons (see [[Perceptron]]) stacked so that each layer’s outputs become the next layer’s inputs, letting the network learn increasingly abstract features. Stacked/layered perceptrons create a feed-forward network (i.e. signals go from the input layer to the output layer, never backwards). The layers between input and output are called hidden layers because they are the intermediate...
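"Each layer's outputs become the next layer's inputs" can be sketched as a tiny forward pass. All the weights below are invented for illustration; a layer is just a list of (weights, bias) pairs.

```python
# Minimal feed-forward sketch: input -> hidden layer -> output layer.
def neuron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, z)  # ReLU activation

def layer(x, neurons):
    return [neuron(x, w, b) for (w, b) in neurons]

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output
hidden = [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.1)]
output = [([1.0, 1.0], -0.2)]

x = [2.0, 1.0]
h = layer(x, hidden)   # the hidden layer's outputs...
y = layer(h, output)   # ...are fed forward as the output layer's inputs
print(h, y)
```

The signal only ever flows forward: x produces h, and h produces y.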
Perceptron
Perceptron = the simplest artificial neuron: it takes multiple inputs, multiplies them by weights, adds a bias, and turns the result into a yes/no (or score) output. Inputs: a vector of features (x1, x2, ..., xn). Parameters: one weight per input (w1, w2, ..., wn) and an overall bias (b), sometimes written as w0 paired with a constant input. Computation: a weighted sum, then a non-linearity (activation function) that...
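The whole definition fits in a few lines. This is a minimal sketch with a step activation; the weights implementing logical AND are a classic choice, picked here for illustration.

```python
# Minimal perceptron: weighted sum plus bias, then a step activation.
def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum + bias
    return 1 if z > 0 else 0                      # yes/no output

# Hypothetical weights implementing logical AND over binary inputs:
# z is positive only when both inputs are 1 (1 + 1 - 1.5 = 0.5 > 0)
w, b = [1.0, 1.0], -1.5
print(perceptron([1, 1], w, b))  # 1
print(perceptron([1, 0], w, b))  # 0
print(perceptron([0, 0], w, b))  # 0
```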