Data Debug Dispatch #6: Building on Moving Targets
February 17, 2026
Feb 17th was Data Debug SF #6 (technically 7, but the first one was just a happy hour). It was the first one where it felt like I've really been building a solid community. Besides having the largest number of RSVPs (110), attendees (31), and returning attendees (8), I had 5 attendees hanging out with me before the meetup from 4pm onwards. Chilling in the Heavybit basement, listening to EDM, and sitting at the same long table with Souvla fries and computers out, it felt like a cool study session in college. All 5 of them are also some flavor of business partner with Recce, so it was cool to deepen those working relationships too.

Now on to a recap of the talks! I've had a few people ask about recording talks, so I finally started with January's meetup. But for February, I figured writing a recap would be a good writing exercise for me, as well as a way to internalize more of the technical ideas.
The Builder Stack
Elvis Kahoro kicked things off, making the case for open source and smaller companies as the path forward against market consolidation. The gist: the tooling exists, the ecosystem is there, but the integration story has to be seamless. Fragmented open source tools that don't talk to each other lose to consolidated platforms every time. Developer experience is the differentiator, not the technology itself. In a room full of people who build with these tools every day, that argument carried weight.
Duck Lake, Without DuckDB
This was Zac Farrell (founder at Hotdata), and heads up, it's fairly technical.

Duck Lake is a new lakehouse table format from the DuckDB folks. If you know Iceberg, the concept is familiar. The fun fact that surprised the room: Duck Lake is a spec. No dependency on DuckDB itself. You can use Duck Lake without DuckDB entirely.

Zac's talk was about building a native integration for Apache DataFusion (a Rust query engine), and the interesting part wasn't really the technology, it was what happens when you build against a spec you don't own. The docs were user-focused, not implementer-focused. The docs and the code didn't always agree. No centralized test harness. You end up building your own validation wrapper around the reference implementation and guessing which features matter most.

That challenge ended up being one of the threads the whole night kept coming back to.
Context Graphs, Explained
Daniel Davis (founder of TrustGraph) got into what a context graph actually is. Getting to talk with him before and after was a treat. He's deeply knowledgeable about the space and his passion is infectious. It was great to learn more about a subject I wasn't super deep in.

The setup: enterprise AI POCs failed their year-end reviews in late 2025. The post-mortem came down to three failure patterns: the AI sounds right but is wrong, nobody could measure business value, and the compliance teams said stop. The C-suite response, almost universally, was some version of "we need context."

A context graph is a subset of knowledge graphs optimized for AI. The foundation is triples: subject, predicate, object. Alice is mother of Bob. Fred is a cat. Fred has 4 legs. Where it gets interesting is reification, which is statements about statements. If an AI agent performs a task and generates new information, you want to capture the decision trace: who asserted it, when, based on what.

The part that got the most back-and-forth: AI-generated content becomes the new context for whatever comes next. The loop closes. Was the initial context wrong? Has ground truth changed? Was it ever ground truth? Daniel put it this way: "The closest we can get to ground truth is something that always was and always will be. Everything else, yeah." That landed.

I've been thinking about "contextual observability" as I've gotten more experience with AI development, and this talk gave me a useful frame for it.

After 2 years building TrustGraph (open source), his answer to "where does this fit in the existing data stack?" was honest: it doesn't. The entire enterprise data estate is built around row-columnar data. Graphs don't fit neatly into that. They'll probably exist in parallel to everything else.
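To make the triples-and-reification idea concrete, here's a minimal sketch in plain Python. This is not TrustGraph's actual API, and the `Triple`/`Assertion` names and the agent id are my own illustrative inventions; it just shows how a bare fact differs from a fact wrapped with its decision trace.

```python
from dataclasses import dataclass

# A plain triple: subject, predicate, object.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# Reification: a statement *about* a statement. We wrap a triple
# with provenance: who asserted it, when, and based on what.
@dataclass(frozen=True)
class Assertion:
    statement: Triple
    asserted_by: str
    asserted_at: str
    basis: str

facts = [
    Triple("Alice", "is_mother_of", "Bob"),
    Triple("Fred", "is_a", "cat"),
    Triple("Fred", "has_legs", "4"),
]

# An AI agent derives a new fact; instead of storing the bare triple,
# we capture the decision trace alongside it.
trace = Assertion(
    statement=Triple("Bob", "is_child_of", "Alice"),
    asserted_by="agent-42",  # hypothetical agent id
    asserted_at="2026-02-17T18:30:00Z",
    basis="inverse of (Alice, is_mother_of, Bob)",
)

# Querying the graph is just filtering triples: everything about Fred.
fred_facts = [t for t in facts if t.subject == "Fred"]
```

The point of the `Assertion` wrapper is exactly the loop Daniel described: when AI-generated content becomes context for the next step, the trace is what lets you later ask "was this ever ground truth, and who said so?"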
The Throughline
If there was a throughline: building against moving targets. A spec that's brand new (Duck Lake) and a spec from the '90s (RDF) presenting the same challenge: the ground shifts under you. Enterprise AI that failed because the context wasn't there. Open source tooling that has to earn its integration story to survive.

Big thanks to Zac, Elvis, and Daniel for the talks. And to everyone who came out. Roger Magoulas, who CL and I had on Data Renegades recently, pulled up too. He's a great conversationalist and I loved getting another chat in person. Practitioners talking to practitioners. That's the whole point of these.

If you missed it, two of the talks are on YouTube:
Zac on Duck Lake + DataFusion: https://youtu.be/VtvjyMKYPEA
Daniel on context graphs: https://youtu.be/cgPw4SSl4Ew
We do this monthly. Next one Tuesday 3/24 at Mux near Montgomery Street BART. 5:30pm networking, 6:15pm talks, 7:00pm more networking. Come through. RSVP: https://luma.com/lo8ogbub