Note: I stopped writing posts in 2017 and started again in late 2024, mostly about AI.

Metaprompting

Dharmesh's post made me realize there’s a name for something I’ve been doing implicitly for a while—using AI to help me write better prompts. Strictly speaking, that’s AI-assisted prompt refinement. There’s a closely related idea called metaprompting—writing prompts that generate other prompts—which also makes a real difference, especially for deeper research. These omniscient models have been...

read more
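
To make the idea in the excerpt above concrete, here is a minimal sketch of a metaprompt, a prompt whose job is to produce another prompt. The task and wording below are generic examples written for illustration; they are not taken from Dharmesh's post.

```python
# A minimal sketch (a generic metaprompt written for illustration; the post's
# own examples may differ): instead of asking the model for an answer, you ask
# it to write the prompt you should use.
task = "summarize a 20-page clinical study for a non-technical executive"

metaprompt = (
    "You are a prompt engineer. Write a detailed prompt I can give to an AI "
    f"assistant to accomplish this task: {task}. "
    "Include the role the assistant should take, the output format, "
    "and two clarifying questions it should ask me first."
)

print(metaprompt)  # feed this to the model, then use the prompt it returns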

Think about local minima in thousands of dimensions

When I first learned about Gradient Descent about two years ago, I pictured it in the most obvious 3D way: two input variables as the x and y axes of a plane, with the loss as the third (z) axis. A 'local minimum', in that picture, was the model getting stuck in a "false bottom" of this bowl-shaped landscape, unable to reach the true minimum, the lowest point. But this...

read more
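
As a companion to the gradient-descent excerpt above, here is a toy experiment (my own, not from the post) that hints at why the two-variable "bowl" picture can mislead: a random symmetric Hessian in high dimensions almost always has some negative eigenvalues, i.e. directions that still lead downhill, so true "false bottoms" are much rarer than the 3D intuition suggests.

```python
# A minimal sketch (toy experiment, not from the post): in high dimensions a
# random critical point is almost never a true "false bottom". We draw a
# random symmetric "Hessian" and count its negative eigenvalues; any negative
# eigenvalue means there is still a downhill direction.
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    A = rng.normal(size=(dim, dim))
    H = (A + A.T) / 2                      # random symmetric matrix
    eigvals = np.linalg.eigvalsh(H)
    negative = int(np.sum(eigvals < 0))
    print(f"dim={dim:5d}  negative curvature directions: {negative}/{dim}")
```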

How Diffusion Models Power AI Videos: An Incredible Visual Explanation

I first wrapped my head around diffusion models in 2023, thanks to the MIT 6.S191 lecture on 'Deep Learning New Frontiers'. The idea of reverse-denoising just clicked for me; it reminded me of how our brains pick out shapes and objects in clouds or random mosaics. Yesterday my 3Blue1Brown subscription surfaced 'But how do AI videos actually work?', a guest video by @WelchLabsVideo. That video...

read more
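
For readers who want the mechanics behind "reverse-denoising", here is a minimal sketch in a toy setting of my own (not from the video): the forward process noises a 1-D signal with the standard closed-form formula, and the reverse step recovers it, with the true noise standing in as an oracle for the neural network a real diffusion model would train.

```python
# A minimal sketch (toy setup, not from the video): a diffusion model's forward
# process gradually noises a signal, and generation runs the process in
# reverse. The "denoiser" here is an oracle that knows the true noise.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))  # a clean 1-D "image"

# forward process: jump straight to step t with the closed-form formula
t = 600
eps = rng.normal(size=x0.shape)
x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

# reverse step: predict x0 from x_t using the (oracle) noise estimate
x0_hat = (x_t - np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])

print("reconstruction error:", np.max(np.abs(x0_hat - x0)))  # ~0 with the oracle
```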

Three Years of Learning AI: Resources That Shaped My Intuition

This weekend I finally finished reading Why Machines Learn: The Elegant Math Behind AI (by Anil Ananthaswamy). It took me seven months—an unusually long time for a 500-page book. But the detour was worth it: the book kept sending me down side paths, like brushing up on linear algebra and derivatives—topics I hadn’t revisited in nearly three decades. Now that I'm done, the book feels like a...

read more

Exploring the Basics: Biological vs. Artificial Neurons

Alright, OpenAI o1 is out. If you are anything like me, you first chuckled at the description that it was "designed to spend more time thinking before they respond". But once I delved deeper, it quickly became mind-blowing. (By the way, Ethan Mollick offers an excellent explanation of the power of dedicating more computational resources to “thinking.”) Developments like this deepen my admiration...

read more

Generative AI and Healthcare: An ongoing list of application areas

It's easy to sense the immense transformational potential of Generative AI as a solution. And healthcare has no shortage of problems to solve. The real insight lies in figuring out viable application areas and use cases. Things are becoming a bit clearer on that front, and it's worthwhile to keep an ongoing list of where applying Gen AI makes sense in healthcare. This post is always under...

read more

Three simple examples of LLM confabulations

Large Language Models (LLMs) like ChatGPT can handle two aspects of communication very well: plausibility and fluency. Given an input context, they determine the most probable sequence of words and string them together in a way that is superbly eloquent. That makes the output very convincing. But it's no secret that LLMs can provide entirely false outputs - they can confabulate. Not hallucinate...

read more
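
A tiny numerical sketch of the point in the excerpt above (the candidate words and scores are made-up toy numbers, not from the post): the model only scores how plausible a continuation is, nothing checks truth, so the most probable answer can be confidently wrong.

```python
# A minimal sketch (toy numbers, not from the post): a language model scores
# plausibility, not truth, so a fluent continuation can still be false.
import numpy as np

# toy next-token scores after the prompt "The capital of Australia is"
candidates = ["Sydney", "Canberra", "Melbourne", "Vienna"]
logits = np.array([3.2, 2.9, 1.5, -4.0])        # plausibility scores, not facts
probs = np.exp(logits) / np.exp(logits).sum()   # softmax

print(dict(zip(candidates, probs.round(3))))
best = candidates[int(np.argmax(probs))]
print("most plausible continuation:", best)     # "Sydney": fluent, confident, wrong
```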

Curious historical connection between psychology and LLMs

A few months ago my curiosity around how-are-LLMs-'learning' took me down the rabbit hole of AI and psychology history, and I ended up finding a string of very interesting, related developments from the last 120 years. 1905: Harvard-graduate psychologist Edward L. Thorndike published his 'Law of Effect', which basically says that animal behaviors are shaped by their consequences. That is, behaviors...

read more

Language Models and GPT’s evolution

As explained in this Stanford CS50 tech talk, Language Models (LMs) are basically a probability distribution over some vocabulary. Given the words we feed an LM, it can determine the most probable word to come next. It's trained to predict the Nth word, given the previous N-1 words. If that sounds like a simple probability calculation, you are not realizing that predicting the next word...

read more
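
Here is a minimal sketch of "a probability distribution over some vocabulary", using a toy bigram model written for illustration; real LMs like GPT condition on much longer contexts with a neural network, but the predict-the-next-word framing is the same.

```python
# A minimal sketch (a toy bigram model for illustration, nothing like GPT's
# transformer): a language model is a probability distribution over a
# vocabulary, conditioned on the words that came before. Here the "context"
# is just the previous word, counted from a tiny corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# count how often each word follows each previous word
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_distribution(prev):
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_word_distribution("the"))   # {'cat': 0.5, 'mat': 0.5}
```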

Vector embeddings

These seemed like the core ideas, so I wanted to clarify them conceptually. "Embeddings" emphasizes the notion of representing data in a meaningful and structured way, while "vectors" refers to the numerical representation itself. 'Vector embeddings' is a way to represent different data types (like words, sentences, articles, etc.) as points in a multidimensional space. Somewhat regrettably, both...

read more
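
As a concrete illustration of the excerpt above (with made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions), here is a minimal sketch of treating items as points in space and using cosine similarity as a stand-in for semantic closeness.

```python
# A minimal sketch (toy 3-dimensional vectors, made up for illustration):
# each item becomes a point in space, and geometric closeness stands in for
# semantic similarity.
import numpy as np

embeddings = {
    "cat":   np.array([0.9, 0.8, 0.1]),
    "dog":   np.array([0.8, 0.9, 0.2]),
    "stock": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("cat vs dog:  ", round(cosine_similarity(embeddings["cat"], embeddings["dog"]), 3))
print("cat vs stock:", round(cosine_similarity(embeddings["cat"], embeddings["stock"]), 3))
```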