7 min read
Output is not Signal

Big fan of OpenClaw. The solutions are interesting. The creativity is through the roof. And it’s a great showcase of what can be achieved by democratizing access to AI.

I’ll take this in a different direction though. Rather than go for “epistemic common sense” or “claw make toy”, it’s interesting to think about a structural issue OpenClaw is surfacing, and to tie it back to the idea of collapsing domain logic and how much of that we can afford to do.

The training data for whatever agent, be it Claude, something OpenClaw pulls through OpenRouter, and so on, is the visible surface of human expertise: documentation, public writing, coding procedures, decisions coupled with first-principles thinking. But it doesn’t see the iceberg. Most people haven’t been confronted with having to create a flowchart of their day-to-day, or with writing a reflection about a 4pm fire drill in their domain (that the machine later has access to). They live it.

This is the exact gap I’m referring to when talking about context graphs, and this substrate that agents need to do meaningful work. And it has a few implications.

The Agent Doesn’t Know Wikipedia Isn’t a Source

I’m suspicious of claims like “I just made my Claw do x” where x is a domain-specific task. Take, for instance, a rather benign example of a daily “finance news” feed delivered via email or instant messaging. At first glance, all the ingredients are in place: Perplexity has rich APIs that can scour context, enrich data, even summarize; stock market prices are readily available through APIs with generous usage limits. All that’s left is to add a bot key to your config and have OpenClaw set up a daily job that synthesizes this information. “I just made my Claw give me actionable financial insights, WSJ is in ruins”.

For Claw (I use this endearingly, and I don’t think it matters what foundation model it uses in the background) to give you something useful and relevant for your areas of investment (commodities, crypto, and so on), here’s what would have to happen in the background, in no particular order, and not exhaustively by any means:

  1. Claw figures out that single open-ended calls randomly cover 2-3 assets, so it decides to build some sort of universe file with canonical mappings from category to asset, normalized so that NVDA, nvda, Nvidia, etc. all map to the same row.
  2. Claw figures out the difference between structural and tactical horizons. A one-shot call produces open-ended prose. Is Gold on a 30-year run, or has it spiked in the past week? Same asset, different horizons, different conclusions. Which one does the agent give you?
  3. Open-ended prose isn’t particularly useful if you can’t map it to prior state, so Claw would have to figure out a way to hold on to state and a delta function. An investor reading about market bullishness would care whether XAU has been on a run for 30 days or 7.
  4. Citation quality, coming back to my Wikipedia jab. Each asset, and each bucket, has its own reputable sources. When sourcing, Claw would have to encode this in its universe file.
  5. Claw would have to think about confidence. Perplexity search_results are (as of writing this post) non-deterministic across calls. If the job runs once per day, the conclusions could be wildly different depending on what surfaces. Cross-run citation analysis would, if anything, teach us more about Perplexity’s search behaviour than about the underlying reporting.
  6. Event calendars. You want those to surface. How do you balance that with the narrowed aperture from sourcing?

The catch is that each architectural choice pulls the next one in. Deterministic coverage forces structured output, forces validation, forces structured rendering. Claw can’t figure this out. It has a vague idea of what the user wants and is missing a deeper intention on how to bring it all together so that it’s more useful than, say, a TradingView watchlist digest.
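To make the cascade concrete, here’s a hedged sketch of what “structured output forces validation” might look like; `DigestEntry`, `validate`, and every field name are assumptions of mine, not part of any real pipeline:

```python
from dataclasses import dataclass

# Hypothetical structured record: once coverage must be deterministic,
# free prose stops being enough and each digest item needs a schema.
@dataclass
class DigestEntry:
    ticker: str          # canonical ticker from the universe file
    horizon: str         # "tactical" or "structural"
    claim: str           # one-sentence takeaway
    sources: list[str]   # citations, checked against a reputable-source list

ALLOWED_HORIZONS = {"tactical", "structural"}

def validate(entry: DigestEntry, universe: set[str], reputable: set[str]) -> list[str]:
    """A schema, in turn, forces validation before anything is rendered."""
    errors = []
    if entry.ticker not in universe:
        errors.append(f"unknown ticker: {entry.ticker}")
    if entry.horizon not in ALLOWED_HORIZONS:
        errors.append(f"bad horizon: {entry.horizon}")
    if not entry.sources or not all(s in reputable for s in entry.sources):
        errors.append("unsourced or non-reputable citations")
    return errors
```

Each rule in `validate` is a decision Claw would have to make on its own; the schema doesn’t exist until someone externalizes it.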

All of this is buildable, of course, and whether a 50-line script or a 2000-line script is doing the heavy lifting in the background is immaterial. There are ultimately two takeaways. First, it’s unlikely that all of the above happened by chatting with Claw over WhatsApp while you were picking up the kids from school. But, more importantly, there’s a clean practical implication: the machine is unlikely to surface what you need for it to be truly useful, especially in an autonomous regime such as Claw’s.

A Tale of Two Strategies

OpenClaw’s version of addressing this challenge is the growing ecosystem of skills. There are probably arguments to be made about what you can conceivably generalize and wrap as an .md file of sorts, but the point to take home is that OpenClaw, by virtue of adoption, hype, enthusiasm, whatever you want to call it, is running a breadth-first search across the user space to populate the agent’s understanding of how the world works. It’s elegant in a way, because it bypasses the challenge of finding a domain expert who happens to be able to articulate what that 4pm fire drill looks like. This person will likely emerge as a user and contribute eventually. It’s a long bet, and it assumes the incentive structures around publishing tacit knowledge hold steady.

Context graphs face a mirror-image version of this problem; both are responses to the gap between training data and useful action. Skills bet on broadcast; graphs bet on depth, on the idea that the organization’s iceberg can be captured. Both are about telling the machine what matters, with graphs turning this into a fidelity problem in a highly specific, high-stakes environment. New facets arise here, along with a chicken-and-egg problem to unpack. Graphs have a cold-start problem: they’re only as useful as the information contained therein. Without information, they are useless, a container to be filled, an interesting piece of technical infrastructure. They rely on someone bootstrapping the substrate, on frictionless UX designed for people who have never been asked to build a sequence diagram of their work, and on deep intentionality about what gets compressed and what actually lands in the graph.

The question is which incompleteness you can live with, and how much latent structural truth must be externalized before an agent stops being theatrically useful and becomes trustworthy.

This is an old problem in new clothes. Apprenticeship was a thing precisely because of the lossy nature of writing knowledge down. Cookbooks went from scorching, searing, browning to counting minutes, thermometers, scientific-sounding videos about Maillard reactions. More precise each time, but never quite the act of cooking.

There’s a recursive (and kind of funny) flavor of this: if there are verticals or niches where the cost of the pain offsets the cost of populating the graph, that’s something the machine, as of now, will get infinitely close to but won’t quite be able to tell you.