I keep a list of LLM-driven startups in healthcare. Most of what I track falls into two buckets: Operations (scheduling, billing, prior auth) and Clinical (scribes, summarization, care...
Encoder: For each input word, we encode its position (in the input text). Since a Transformer is not an RNN, the sequence aspect can get lost. So instead, here...
A lot of data in the world is sequential, i.e. it has a temporal aspect, e.g. music and spoken language. By treating each data point independently (in a feed-forward-only mode)...
A transformer is the major breakthrough of recent times – a neural network architecture that uses attention to process all tokens in parallel instead of step-by-step like an...
I first heard Ethan Mollick on The Ezra Klein Show in April 2024 (“How Should I Be Using A.I. Right Now?”). He offered sensible, practical ways to use...
AI voice dictation is having a moment. These tools do more than transcribe—they read context, add punctuation, and learn your style. Many creators say they work two to...
Gradient descent is an iterative algorithm for minimizing a loss function by moving the model parameters in the direction that most rapidly decreases the loss. Process: Start initially...
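The update loop described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the post's own code: the function name `gradient_descent`, the example loss `(x - 3)^2`, the learning rate, and the step count are all assumptions chosen for the demo.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move in the direction that decreases the loss most rapidly.
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    """Gradient of the hypothetical loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


# Start far from the minimum and let the updates walk towards x = 3.
x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)
```

With this loss, each step shrinks the distance to the minimum by a constant factor, so the result lands very close to 3. A learning rate that is too large would overshoot instead.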
Backpropagation is the main algorithm used for training neural networks with hidden layers. The main idea is that you can calculate an estimate for how much the error...
Dharmesh’s post made me realize there’s a name for something I’ve been doing implicitly for a while—using AI to help me write better prompts. Strictly speaking, that’s AI-assisted...
Given an input and its correct (target) output, a loss function compares the model’s prediction to the target and returns a single number measuring how wrong the prediction...
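As a concrete instance of the definition above, here is a small sketch of one common loss function, mean squared error. The excerpt does not say which loss the post uses, so the choice of MSE and the name `mean_squared_error` are assumptions for illustration.

```python
def mean_squared_error(predictions, targets):
    """Return one number: the average squared gap between prediction and target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# A perfect prediction gives zero loss; worse predictions give larger values.
perfect = mean_squared_error([1.0, 2.0], [1.0, 2.0])
worse = mean_squared_error([2.0, 4.0], [1.0, 2.0])
print(perfect, worse)
```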
E.g. Dropout: Randomly and temporarily remove (shut down) some of the interstitial (hidden) neurons, so that the signal only flows through a subset. This builds new pathways and forces the neural...
It happens to most of us multiple times a week: someone emails asking for a good time to meet, and before you know it, you’re stuck in a...
When I first learned about gradient descent some two years ago, I pictured it in the most obvious 3D way – where one imagines two input variables (as...
Chain Rule = how you take derivatives when a value depends on another value, which itself depends on another value (i.e. a composition of functions). Intuition: If A...
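A quick numeric check can make the chain rule above tangible. This sketch assumes two hypothetical functions, `g(x) = 3x + 1` and `f(y) = y^2`, which are not from the post. It compares the chain-rule derivative with a finite-difference estimate.

```python
def g(x):
    """Inner function: y = 3x + 1, so dy/dx = 3."""
    return 3.0 * x + 1.0


def f(y):
    """Outer function: z = y^2, so dz/dy = 2y."""
    return y ** 2


def chain_rule_derivative(x):
    """dz/dx = (dz/dy) * (dy/dx), evaluated at y = g(x)."""
    y = g(x)
    return (2.0 * y) * 3.0


def numeric_derivative(x, h=1e-6):
    """Central-difference estimate of d/dx f(g(x))."""
    return (f(g(x + h)) - f(g(x - h))) / (2.0 * h)


analytic = chain_rule_derivative(2.0)
numeric = numeric_derivative(2.0)
print(analytic, numeric)
```

At `x = 2` the inner value is `y = 7`, so the chain rule gives `2 * 7 * 3 = 42`, and the finite-difference estimate should agree to several decimal places.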
I first wrapped my head around diffusion models in 2023, thanks to the MIT 6.S191 lecture on ‘Deep Learning New Frontiers’. The idea of reverse-denoising just clicked for me: it...