Encoder: For each input word, we encode its position (in the input text). Since a Transformer is not an RNN, the sequence ordering would otherwise be lost, so the position is encoded explicitly. Evaluate 'self-attention' - which other input words might matter. Evaluate...
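The "encode its position explicitly" step can be sketched with the sinusoidal scheme from the original Transformer paper. This is a minimal illustration, assuming NumPy; real implementations add these values to the token embeddings.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: each position gets a unique
    pattern of sines/cosines so the model can recover token order
    even though attention itself is order-agnostic."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims: sine
    pe[:, 1::2] = np.cos(angles)             # odd dims: cosine
    return pe

pe = positional_encoding(4, 8)               # 4 positions, 8-dim encoding
```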
RNN
A lot of data in the world is sequential, i.e. has a temporal aspect (e.g. music, spoken language). By treating each data point independently (in a feed-forward-only mode), we lose the ability to model that sequential aspect. In a Recurrent Neural Network, computation of...
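The recurrence idea can be sketched in a few lines: the same weights are reused at every time step, and the hidden state carries context forward. A minimal sketch assuming NumPy and made-up dimensions:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrence step: the new hidden state mixes the current
    input with the previous hidden state, carrying sequence context."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
Wx = rng.normal(size=(3, 5))                 # input-to-hidden weights
Wh = rng.normal(size=(5, 5))                 # hidden-to-hidden weights
b = np.zeros(5)
h = np.zeros(5)                              # initial hidden state
sequence = rng.normal(size=(4, 3))           # 4 time steps, 3 features each
for x_t in sequence:
    h = rnn_step(x_t, h, Wx, Wh, b)          # same weights reused each step
```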
Transformer
A transformer is a major breakthrough of recent years: a neural network architecture that uses attention to process all tokens in parallel instead of step by step like an RNN. Processes sequences in parallel. Built from stacked blocks of attention + feedforward...
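The "attention in parallel" claim can be made concrete with scaled dot-product self-attention, the core operation inside each block. A minimal single-head sketch, assuming NumPy and random toy weights:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token attends to every
    other token in one matrix multiply; no step-by-step recurrence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ V, weights

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))                      # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Note that the whole sequence is processed at once; contrast this with the RNN loop, which must visit tokens in order.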
Redesigning Apprenticeship for the AI Era
I first heard Ethan Mollick on The Ezra Klein Show in April 2024 (“How Should I Be Using A.I. Right Now?”). He offered sensible, practical ways to use AI without the hype. Shortly after, I read Co-Intelligence and have followed his writing and talks since. In a recent...
Faster writing, different thinking with AI Voice Dictation
AI voice dictation is having a moment. These tools do more than transcribe—they read context, add punctuation, and learn your style. Many creators say they work two to three times faster. Two weeks ago I started using Wispr Flow Pro. Here is what I found. The Good...
Gradient Descent
Gradient descent is an iterative algorithm for minimizing a loss function by moving the model parameters in the direction that most rapidly decreases the loss. Process: Start with random weights for all inputs. Compute the loss. Calculate the gradient of...
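The process above can be sketched on the smallest possible problem: fitting a single weight by least squares. The data point, starting weight, and learning rate here are made up for illustration.

```python
# Gradient descent on a 1-D least-squares problem:
# minimize loss(w) = (w * x - y)^2 for a single data point.
x, y = 2.0, 6.0          # the weight that fits exactly is 3.0
w = 0.5                  # start from an arbitrary weight
lr = 0.05                # learning rate (step size)
for _ in range(200):
    pred = w * x
    grad = 2 * (pred - y) * x   # d(loss)/dw
    w -= lr * grad              # step against the gradient
```

After enough iterations, `w` settles near 3.0, the value that drives the loss to zero.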
Backpropagation
Backpropagation is the main algorithm used for training neural networks with hidden layers. The key idea is that the error at the output can be attributed backward: you can estimate how much each earlier node contributed to the output error, in proportion to the weights connecting them. It does so by:...
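A minimal sketch of this backward attribution on a one-hidden-unit network, assuming NumPy; the inputs, targets, and starting weights are made up for illustration:

```python
import numpy as np

# Tiny 1-hidden-layer network: y_hat = w2 * tanh(w1 * x).
# The output error is propagated backward through the chain rule to
# estimate how much each earlier weight contributed to it.
x, y = 1.5, 0.5
w1, w2 = 0.8, -0.3
lr = 0.1
for _ in range(500):
    h = np.tanh(w1 * x)          # forward pass
    y_hat = w2 * h
    err = y_hat - y              # error at the output node
    # backward pass: chain rule distributes the error to each weight
    dw2 = 2 * err * h
    dw1 = 2 * err * w2 * (1 - h**2) * x
    w1 -= lr * dw1
    w2 -= lr * dw2
```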
Metaprompting
Dharmesh's post made me realize there’s a name for something I’ve been doing implicitly for a while—using AI to help me write better prompts. Strictly speaking, that’s AI-assisted prompt refinement. There’s a closely related idea called metaprompting—writing prompts...
Loss
Given an input and its correct (target) output, a loss function compares the model’s prediction to the target and returns a single number measuring how wrong the prediction is. That number is called the loss. A larger loss means a worse prediction; zero loss means a...
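One common choice, mean squared error, makes the definition concrete; this is just one example of a loss function, not the only one:

```python
def mse_loss(prediction, target):
    """Mean squared error: one number summarizing how wrong we are."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

perfect = mse_loss([1.0, 2.0], [1.0, 2.0])  # prediction matches target
worse = mse_loss([3.0, 0.0], [1.0, 2.0])    # bigger error, bigger loss
```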
Model Training Techniques
Eg. Dropout: randomly and temporarily shut down some of the interstitial (hidden) neurons, so that activations flow only through a subset. This builds new pathways and forces the neural network not to depend on any particular neurons. Do it repeatedly, each time...
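The dropout mechanism can be sketched as masking a layer's activations. This uses the common "inverted dropout" variant, assuming NumPy; the drop probability here is arbitrary:

```python
import numpy as np

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero out a random subset of neurons and scale
    the survivors by 1/(1 - p_drop) so the expected activation is
    unchanged. A fresh mask each pass stops the network from relying
    on any single neuron."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(42)
acts = np.ones(1000)                 # pretend layer activations
dropped = dropout(acts, 0.5, rng)    # roughly half get zeroed out
```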