
Your AI Tools Should Get Better Every Time You Use Them

February 27, 2026
Most of them don't. Every session starts from zero. You fix the same mistakes, re-explain the same conventions, watch the same bad defaults play out. All that correction work disappears when you close the chat window.

Context management is the most underrated skill in AI right now, and it's the reason most people think these tools aren't good enough. They're not wrong. Out of the box, they're generic. But you can change that, and the work you put into changing it compounds in a way most people haven't experienced yet.

I've been building this pattern across data engineering, content workflows, and personal projects. The examples here are from Claude Code, but the principle works regardless of tool. If your AI can read persistent instructions, you can build this.

It didn't start with engineering. It started with podcast transcripts.

I co-host a data podcast called Data Renegades, and we were using Claude to turn episode transcripts into clips, social posts, and blog content. My teammate Karen had actually been the first to set up a content repo with Claude. She's an ESL speaker, so she used it for voice checking and swapping out phrasing that didn't land naturally. Basic edits, tone consistency. It worked for that.

When she told me she was writing and editing blog posts in Claude Code, I thought she was insane. Who edits documents by reading raw markdown in a terminal? She was working in an IDE with the terminal open. Turns out she was ahead of all of us on that.

Then Jenneivere, our marketing consultant (and a friend and mentor to me!), started building out the podcast workflow. She found through iteration that it worked better to stagger different prompts and feed the output of one into the next. We ended up with four prompt stages, each running as a separate chat in Claude.ai: extract the key moments, draft social posts from those moments, adapt for each platform, review everything for voice. Each stage had its own prompt that Jenneivere kept refining to improve the outputs.
It worked, but it was manual. Every session meant opening four chats, re-pasting the right prompts, and feeding outputs between them. And every improvement Jenneivere made lived in a Notion doc.

That's when it clicked for me. Karen had a content repo for basic edits. Jenneivere had a multi-stage workflow held together by manually chained prompts. What if the prompts just became skills in a repo? Not just voice checking and term substitution, but actual workflows with persistent context: files that Claude reads automatically, that reference each other, that improve over time.

When I moved to Claude Code, that's exactly what happened. The four chat stages became a single pipeline skill. The ad-hoc preferences became configuration files. The "remember to do this" notes became persistent context that loaded every time. And because it all lived in a repo, Claude could propose updates to the skills themselves based on what worked and what didn't. I review the changes, and the ones that make sense persist for every future session.

The podcast workflow went from taking a full day to about two hours, including clips, scheduling, posting, and edits before final posts. And every episode run makes the next one tighter, because the context keeps improving.

The next iteration was realizing that skills don't have to be monolithic. Inside that podcast pipeline were smaller skills that could be reused across completely different workflows. The skill that extracts key moments from a transcript also works for pulling highlights from a conference talk or a meeting recording. The skill that adapts content for a specific platform gets called by the podcast pipeline, the blog repurposing workflow, and the one-off "turn this thought into a post" flow.

Think of them like functions or macros: different triggers, same underlying skill. You write the logic once, and any workflow that needs that capability references it.
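As a rough sketch, the function analogy maps onto a repo layout something like this. The paths and names here are illustrative, not my actual files (Claude Code conventionally keeps skills under a `.claude/` directory, but the shape is what matters):

```
.claude/
  skills/
    extract-key-moments/   # shared: podcast episodes, talks, meeting recordings
    adapt-for-platform/    # shared: podcast pipeline, blog repurposing, one-off posts
    podcast-pipeline/      # orchestrates the shared skills above
    blog-repurposing/      # also calls adapt-for-platform
  context/
    voice-preferences.md   # loaded by content skills
    platform-rules.md      # loaded by adapt-for-platform
```

Each workflow skill references the shared skills by name instead of duplicating their instructions, which is what makes the next point work.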
When you improve that skill, every workflow that uses it gets better at the same time.

This is where having the LLM help you build the system gets powerful. But "help" is the key word. I'll draft a new skill, then run an AI engineer review pass on it. Does this duplicate something that already exists? Could this be decomposed further? There is almost always something: a redundant instruction buried in two different files, a pattern that should be extracted into shared configuration, a workflow step that could reference an existing skill instead of reinventing it.

The AI can surface these because I've built a repo structure where inconsistencies are findable. Small files, clear naming, scoped responsibilities. That's not the model being a reliable auditor; that's the organization making problems visible. And I'm still the one deciding which suggestions are right. I reprompt when the output drifts. I give conversational feedback mid-session to steer it back. I've built dedicated skills whose entire job is to reinforce conventions that would otherwise fade across long sessions. I run subagents to review other subagents' output, and then I review all of it. The system compounds, but only because someone is actively maintaining it.

I'm building out the analytics warehouse at Recce. Production databases flow through S3 into Snowflake, with dbt-core on top: the usual star schema and medallion architecture. I wanted to see how much Claude could handle if I set up the context right. Skills for naming conventions, primary key patterns, and model structure. Example outputs so it had a reference for what good looked like.

Then I started a fresh session where the only guidance was the context files, no hand-holding from me. It followed conventions, wrote clean CTEs, and made certain tables incremental without being told. That last one wasn't the model being clever. It was the context doing its job.
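For flavor, here's a hypothetical fragment of the kind of conventions file I mean. It's not my actual file, but it shows the signal density that lets a fresh session make calls like "this table should be incremental" on its own (`unique_key` is dbt's standard incremental-model config):

```markdown
## Model Structure
- Staging models are views. Large event tables are incremental models
  with a `unique_key` on the grain's primary key.
- One CTE per source, a final `select` CTE, no nested subqueries.

## Naming
- Staging: stg_<source>__<entity>
- Facts: fct_<event>
- Dimensions: dim_<entity>
```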
The example outputs and naming patterns I'd set up gave it enough signal to make the right call on its own.

But it also ignored our existing dim_dates table entirely. Instead of referencing it, Claude did UNION ALLs over a bunch of date-related tables every time, then a final CTE before the model output. It rebuilt date logic from scratch when a perfectly good dimension table was sitting right there.

Worse: it defaulted to inner joins everywhere, without any testing. If we had null org_ids entering the pipeline, I would never know. The inner join silently dropped those rows. No test, no flag, no comment. An AI made a data quality decision that should have been a human decision, and it didn't tell me it was making it.

I fixed the joins, fixed the date logic, then had Claude update the conventions file itself:
```markdown
## Join Conventions
- Default to LEFT JOIN unless the business logic explicitly requires
  an inner join. Add a comment explaining why.
- Never silently filter rows. If a key field is null, flag it as a
  potential data quality issue for the data team to investigate.

## Date Logic
- Always reference dim_dates for date logic. Never rebuild date
  dimensions from source tables.
```
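Applied in a model, those rules look something like this. This is a sketch with made-up table and column names, not a model from our warehouse; `ref()` is dbt's standard way of referencing another model:

```sql
with events as (
    select * from {{ ref('stg_app__events') }}
),

orgs as (
    select * from {{ ref('dim_orgs') }}
),

-- reference dim_dates instead of rebuilding date logic from sources
dates as (
    select * from {{ ref('dim_dates') }}
)

select
    events.event_id,
    dates.date_day,
    orgs.org_name,
    -- flag null keys instead of silently dropping the rows
    orgs.org_id is null as missing_org_flag
from events
-- LEFT JOIN by default; inner join only with a documented reason
left join orgs on events.org_id = orgs.org_id
left join dates on cast(events.created_at as date) = dates.date_day
```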
Those rules live in the repo now. The next time Claude builds models from a different set of production tables, it won't make those mistakes. Not because I'll remember to tell it, but because the context already knows. And nothing I write goes to production without me reviewing it first.

Everyone who talks about AI context is talking about engineering conventions. Naming patterns, code style, linting rules. That's table stakes. The real value is baking in business and product context: the why behind the thing you're building.

Why is a table structured the way it is? Because there was a conversation between the data team and product about what metrics matter for the next board deck. Why do we filter out certain org types? Because sales and finance agreed those are internal test accounts, not real customers. Why does this model join three sources that seem redundant? Because the PM needed a view that reconciles billing data with usage data for a CSM workflow.

That context lives in people's heads, in Slack threads, in meeting notes, in the institutional memory of whoever has been at the company longest. When an engineer leaves, it leaves with them. When an AI starts writing code without it, you get technically correct output that misses the point. The engineering conventions tell Claude how to build. The business context tells it why something exists. And the why is what keeps it from making technically correct decisions that are wrong for the business.

At the base level you have a persistent instruction file (CLAUDE.md or equivalent). It tells the AI how the project works, what conventions to follow, what behavioral rules to apply every session. Think of it as the constitution of the project.

On top of that, skills: reusable workflows that can reference each other and pull from shared configuration. A skill for dbt model structure. A skill for data validation. A skill that takes a podcast transcript and produces platform-ready content.
Then configuration: preferences, conventions, and patterns that evolve over time. These get updated most often because they capture what I learn session by session.

The key insight is that the AI can propose updates to all of these. I built a handoff workflow that runs at the end of substantive sessions. It reviews what happened, identifies what was learned, and suggests changes to the relevant context files. I review the proposals, approve what makes sense, and those improvements persist.

It also flags stale context for removal: decisions that were one-time, temporary patterns that expired, information that's been superseded. If you only add and never prune, the context gets noisy and the AI starts weighting outdated information alongside current information.

The fix isn't one giant memory file. It's multiple memory files scoped to specific domains, loaded only by the skills that need them. Voice preferences load for content work. Business logic loads for engineering. Story examples load when the content calls for them. Fresh, relevant context shows up where it matters instead of everything loading every time.

I've also built review panels. Instead of one AI checking work, I run specialized review passes. One checks for convention compliance. One reviews the analytical logic. One looks at it from a data consumer's perspective. Each pulls from the same context layer but evaluates from a different angle.

I ran this very post through a three-agent panel: an AI engineer checking technical claims, an editor checking voice, and a social media VP checking positioning. The AI engineer caught that I was overcrediting the model and underselling my own repo structure. The editor flagged sections that read like documentation instead of narrative. Those changes are in the version you're reading now. You can build exactly the review process that catches the things you actually get wrong.

Prompting is what you type into the chat box.
Context is what's already loaded before you type anything. The difference is whether the work you did last session carries forward. A good prompt gets you a good response once. Good context gets you good responses every time.

Everything I'm describing lives in your repos, in your files, on your machine. There's no platform learning your preferences and using them to sell you something. No vendor lock-in on your context. No moat except yours. The patterns that are universal (how I think, how I write, what I care about) travel with me across projects. The patterns that are project-specific stay in the project. When I start a new project, I bring the portable context and start building the specific context from day one.

There's a side effect worth mentioning: the context files get personal fast. My personal repo captures how I think and write; my work repos capture strategy and business logic that's genuinely proprietary. I wanted to screenshot the handoff workflow for this post, but the files were too revealing. Which is actually the point: the context is valuable because it's yours.

You don't need to build the full architecture to start getting value. Start a CLAUDE.md in one project you work on regularly. After every session where you fix something the AI got wrong, add one line: "Don't do X. Do Y instead, because Z." Include the why. "Default to left joins" is useful. "Default to left joins because inner joins silently filter rows and hide data quality issues" is 10x more useful, because it lets the AI apply the principle to situations you haven't anticipated yet.

After a couple of weeks, your AI sessions will feel noticeably different. Not because the model got better, but because the context did. Then build from there. Let it compound.

I'm building a workshop on this for people who aren't software engineers but want to stop starting from scratch every session. Data people, GTM teams, anyone doing repeated knowledge work where the AI should already know how you work.
If that's you, reach out.

What's missing from most people's AI setup is the context layer that makes the tools theirs. And the only person who can build that is you.