Four threads, one shared premise: the interesting problems in AI right now aren't inside the model — they're in the coordination layer around it. Between providers. Between the agent and the person authorizing it. Between the model's confidence and the human's trust. Between the simulation and the experiment. The gap between what a model can do and what it can do for you is where I spend my time.
No single model provider runs a 48-hour research job well. Claude has the reasoning, Gemini has the context window, local models have the rate-limit headroom, a specialized model has the domain knowledge. The actual research workflow inevitably crosses provider boundaries, and every crossing today is a manual handoff — copy a transcript, paste it into the next interface, re-explain the goal, hope the next model catches the thread.
The Hermitage is my attempt to do that coordination programmatically. A dedicated VM running an Agent Manager UI that dispatches tasks to Claude, Gemini, and local models based on which one is best for the subtask, and a Covenant dashboard that tracks which agent is holding which piece of state. It's a crude first draft of something that, done right, would be worth the whole current model-wrapper app category put together.
The interesting technical question isn't “which model is smartest.” It's: how does state — memory, open questions, partial conclusions, sources, corrections — survive a handoff between providers that don't know about each other? Whoever builds that layer cleanly gets to run research jobs the scale of a small lab out of a single laptop.
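One shape that coordination layer could take is a provider-neutral state object that serializes between handoffs and renders itself into the next model's context. A minimal sketch — the class, field names, and methods here are my own illustration, not anything The Hermitage actually ships:

```python
from dataclasses import asdict, dataclass, field
import json

@dataclass
class ResearchState:
    # Everything a follow-on model needs to catch the thread:
    # the goal, what's unresolved, what's settled, and what was wrong.
    goal: str
    open_questions: list = field(default_factory=list)
    partial_conclusions: list = field(default_factory=list)
    sources: list = field(default_factory=list)
    corrections: list = field(default_factory=list)

    def dumps(self) -> str:
        # Serialize to JSON that no provider's API needs to understand;
        # it only has to survive the trip.
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, blob: str) -> "ResearchState":
        return cls(**json.loads(blob))

    def handoff_preamble(self) -> str:
        # Rendered into the next provider's system or context prompt,
        # replacing the manual "re-explain the goal" step.
        lines = [f"GOAL: {self.goal}"]
        lines += [f"OPEN: {q}" for q in self.open_questions]
        lines += [f"SO FAR: {c}" for c in self.partial_conclusions]
        lines += [f"CORRECTION: {c}" for c in self.corrections]
        return "\n".join(lines)
```

The design point is that the state round-trips losslessly as data but degrades gracefully into plain prompt text, so a provider that knows nothing about the scheme still receives a coherent briefing.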
Existing auth was built assuming a human is at the endpoint. OAuth screens, SMS 2FA, “click approve on your phone” — the consent ceremony is sized to match a single human action. Agents break that in both directions: they need credentials to do useful work, and they need to prove to third parties that they're acting inside a scope the user authorized.
My first startup, LBR, was identity and access work. The problems I hit with agents now are the same category, one abstraction layer up. I run my personal agents on a Compute Engine VM via Chrome Remote Desktop specifically to sidestep credential delegation — the agent acts through my actual logged-in browser, so there's no “give Claude my API token” moment. That's not a solution; it's evidence that the tooling is bad enough that operating a remote desktop is the least-bad available option.
What I'm watching for: scoped, revocable, time-bounded credentials built for agents instead of humans. Tokens that say “this agent can read my Gmail for the next 4 hours, cannot send, cannot delete, and emits an audit trail I can review.” That primitive doesn't exist yet, and it's the thing that unlocks everything downstream — agent-to-agent delegation, marketplaces, third-party agent integrations — without the whole system collapsing into a trust bankruptcy the first time an agent gets prompt-injected.
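Since the primitive doesn't exist, here's the rough shape it would have to take — a grant that is scoped, expiring, revocable, and audited in one object. All names and fields are hypothetical:

```python
import time
from dataclasses import dataclass, field

@dataclass
class AgentGrant:
    agent_id: str
    scopes: frozenset       # e.g. {"gmail:read"} -- no send, no delete
    expires_at: float       # unix seconds; the "next 4 hours" bound
    revoked: bool = False
    audit: list = field(default_factory=list)

    def allows(self, action: str, now=None) -> bool:
        # Every check is logged whether it passes or not, so the
        # user can review what the agent tried, not just what it did.
        now = time.time() if now is None else now
        ok = (not self.revoked) and now < self.expires_at and action in self.scopes
        self.audit.append((now, action, ok))
        return ok
```

The point of keeping denial in-band rather than raising is that a prompt-injected agent probing outside its scope produces audit evidence instead of a crash, which is exactly the review trail the paragraph above asks for.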
Most AI UX today is a chat box wrapped around a model. That's not design; it's exposure. Real design shows up in the moments the chat box handles badly: when the model is uncertain and doesn't say so, when the user wants to interrupt and there's no way, when the answer is partially right and the interface offers no tool for correction, when trust is violated and there's no recovery path.
Trust calibration is the through-line. I wrote about it in trust-calibration-ai-ux because it's the UX problem I kept hitting at Manatt — lawyers needed document-AI outputs to be useful and distrustable in the right proportion. Too confident, and they stopped reading critically; too hedged, and they stopped using it. The interface has to carry signals the model itself doesn't know how to emit.
The /explore routes on this site are me poking at adjacent shapes: what if the narrative context for AI-generated content were spatial? What if the reader could walk around inside the essay instead of scrolling past it? Not a claim these are the answer — a claim that the chat-box default deserves actual competition.
The current build is glimPSE — a WebGPU-powered web app for LAMMPS molecular dynamics visualization. Drag a dump file into the browser, first frame renders in under two seconds, rotate around millions of atoms at 60fps, export a 4K publication image or an MP4 without installing anything. The competitive field right now is OVITO (desktop, paid-per-seat for the features researchers actually need for papers), VMD (1990s-era UI that routinely explodes to 220GB of RAM on a 4GB trajectory), and 50-line matplotlib scripts that only produce 2D plots. Nobody else has built a WebGPU-native molecular visualization tool for materials science, and they should have.
glimPSE is the wedge for glim, a longer-horizon open-source platform meant to unify DFT (VASP-compatible plane-wave PAW), classical and reactive molecular dynamics (LAMMPS-compatible), and an ML interatomic potential pipeline into one stack. The current research tooling landscape is a chain of specialized command-line programs with incompatible file formats, duplicated UIs, and paywalls at the quality-matters boundary. It's the same meta-problem from the first section: the capability already exists; the coordination layer around it is missing.
The materials-science version of the coordination problem is especially tangible — a DFT simulation produces training data for an ML potential, which enables a much larger MD simulation, which reveals a structural motif worth refining with more DFT. Nothing about that loop has to be hand-wired by a grad student, and nothing about the visualization layer has to cost per-seat per-year. Live at lupine.science; code, product plan, and research notes are public in the lupine repo.
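That loop is the standard active-learning pattern, and the whole claim is that it can be a function instead of a grad student. A sketch with every engine stubbed out as a callable — `dft`, `train`, `run_md`, and `select_uncertain` are placeholders for real codes, not glim APIs:

```python
def active_learning_loop(structures, rounds, dft, train, run_md, select_uncertain):
    # DFT labels structures -> ML potential trains on them ->
    # MD at scale with the potential -> uncertain motifs go back to DFT.
    dataset = []
    potential = None
    for _ in range(rounds):
        dataset += [dft(s) for s in structures]   # expensive, small, accurate
        potential = train(dataset)                # fit the interatomic potential
        trajectory = run_md(potential)            # cheap, large-scale dynamics
        structures = select_uncertain(trajectory) # motifs worth refining
        if not structures:
            break                                 # potential covers what MD saw
    return potential, dataset
```

The exit condition is the interesting part: the loop terminates itself when MD stops visiting configurations the potential is unsure about, which is the hand-wired judgment call this sketch is trying to automate.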
Interested in any of these threads? I'm always open to meaningful conversations.