Scientific Compute as a Public Utility
I want scientific computing to feel like turning on a tap. Reliable, not glamorous. You expect clean water at pressure, within a known range of temperature and impurities, with a meter you can read and a bill you can dispute. Our pipelines should work the same way: available, audited, boring to operate, and clear about what they can’t do. Right now too much of the work depends on personalities and folklore. If progress depends on whoever remembers which flag produced the figure, the system fails the moment that person is unavailable.
The utility mindset starts with provenance. Every step becomes a node in a DAG with content-addressed inputs and outputs: model@sha, data@sha, code@sha, env@sha. If any hash changes, the graph knows what to rebuild. You can ask “what produced this number?” and get a path, not a story. Byte-identical reproduction is how collaborators disagree safely. If I can’t regenerate your result by pinning the same addresses, the argument will drift into taste.
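As a rough sketch of the idea, here is one way such a graph could be modeled. The step names, the `GRAPH` layout, and the helpers `address`, `resolve`, `stale`, and `lineage` are all hypothetical, chosen for illustration; the sketch also assumes the graph is acyclic and that every input is either a pinned leaf or another step.

```python
import hashlib
import json


def address(payload: bytes) -> str:
    """Content address of raw bytes: the SHA-256 hex digest."""
    return hashlib.sha256(payload).hexdigest()


def resolve(graph: dict, leaves: dict) -> dict:
    """Return {node: address} for every pinned leaf and every step.

    A step's address hashes its name plus the addresses of its inputs,
    so a change in any upstream hash changes every downstream address.
    """
    resolved = dict(leaves)

    def visit(node: str) -> str:
        if node not in resolved:
            # A KeyError here means an input is neither pinned nor a step.
            inputs = {dep: visit(dep) for dep in graph[node]}
            canonical = json.dumps({"step": node, "inputs": inputs}, sort_keys=True)
            resolved[node] = address(canonical.encode("utf-8"))
        return resolved[node]

    for node in graph:
        visit(node)
    return resolved


def stale(graph: dict, old_leaves: dict, new_leaves: dict) -> list:
    """Steps whose address differs between two sets of pinned leaves."""
    old, new = resolve(graph, old_leaves), resolve(graph, new_leaves)
    return sorted(step for step in graph if old[step] != new[step])


def lineage(graph: dict, node: str) -> list:
    """Everything upstream of `node`, dependencies first, ending with `node`."""
    seen = []

    def visit(current: str) -> None:
        for dep in graph.get(current, []):
            visit(dep)
        if current not in seen:
            seen.append(current)

    visit(node)
    return seen


# Hypothetical three-step pipeline; each step lists its inputs.
GRAPH = {
    "train": ["model", "data", "code", "env"],
    "evaluate": ["train", "eval_data"],
    "figure": ["evaluate"],
}
# Placeholder pins; in practice these would be hashes of real artifacts.
PINNED = {
    name: address(name.encode("utf-8"))
    for name in ["model", "data", "code", "env", "eval_data"]
}
```

With these definitions, re-pinning only `eval_data` makes `stale(GRAPH, PINNED, changed)` report `evaluate` and `figure` but not `train`, and `lineage(GRAPH, "figure")` returns the path behind the figure rather than a story about it.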
Uptime and latency belong here too. Pipelines should publish an SLO the way services do: p95 time-to-reproduce, p95 time-to-ingest, error budgets that burn visibly. When a workflow exceeds its SLO, it should degrade in a defined way (smaller sample, lower fidelity, cached artifact with a warning) rather than timing out and wasting the day. The point is to align expectations so decisions don’t stall on invisible friction. When compute behaves like a utility, planning becomes possible.
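A minimal sketch of how a pipeline might track such an SLO and pick a fallback. The nearest-rank percentile, the 5% miss allowance, the mode names, and the per-mode time estimates are assumptions made for illustration, not a prescribed policy.

```python
# Ordered from most to least faithful; names are illustrative.
DEGRADATION_LADDER = ["full", "smaller_sample", "lower_fidelity", "cached_with_warning"]


def p95(samples: list) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def budget_remaining(samples: list, slo_seconds: float, allowed_miss_fraction: float = 0.05) -> float:
    """Allowed SLO misses minus observed misses; negative means the budget is spent."""
    misses = sum(1 for s in samples if s > slo_seconds)
    return allowed_miss_fraction * len(samples) - misses


def choose_mode(estimated_seconds: dict, slo_seconds: float) -> str:
    """Pick the most faithful mode whose estimated runtime fits the SLO.

    Falls back to the last rung (a cached artifact with a warning)
    when nothing fits, instead of letting the run time out.
    """
    for mode in DEGRADATION_LADDER:
        if estimated_seconds[mode] <= slo_seconds:
            return mode
    return DEGRADATION_LADDER[-1]
```

For example, with estimates of 900, 240, 60, and 1 seconds for the four modes and a 300-second SLO, `choose_mode` returns `"smaller_sample"`: the run degrades in a defined way and says so.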
Interfaces matter more than implementations. A pipeline that accepts typed, unit-annotated inputs and emits typed, unit-annotated outputs with a schema version is making a promise. Pre-conditions fail fast, and post-conditions are checked in the same run. Logs need to be records, not console graffiti: timestamps, seeds, device info, and a minimal audit trail for which human touched what. None of this is heavy. Most of it fits in a dozen lines of YAML and a small wrapper. I do it to reduce the places where personality can hide.
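To make the promise concrete, here is a small hedged sketch of a wrapper in that spirit. The `Quantity` type, the `SCHEMA_VERSION` value, the `mean_flow_rate` step, and the log fields are hypothetical; a real pipeline would likely lean on an established units library rather than string comparison.

```python
import json
import math
import platform
import time
from dataclasses import dataclass

SCHEMA_VERSION = "1.0"  # hypothetical version of the output schema


@dataclass(frozen=True)
class Quantity:
    """A number that carries its unit."""
    value: float
    unit: str


def require_unit(q: Quantity, unit: str, name: str) -> None:
    """Pre-condition helper: fail fast on the wrong unit."""
    if q.unit != unit:
        raise ValueError(f"{name}: expected unit {unit!r}, got {q.unit!r}")


def mean_flow_rate(volume: Quantity, duration: Quantity, seed: int = 0, operator: str = "unknown"):
    """Return (result, log_record) for a toy volume-over-time step."""
    # Pre-conditions: checked before any work is done.
    require_unit(volume, "L", "volume")
    require_unit(duration, "s", "duration")
    if volume.value < 0 or duration.value <= 0:
        raise ValueError("volume must be >= 0 and duration must be > 0")

    result = Quantity(volume.value / duration.value, "L/s")

    # Post-condition: checked in the same run that produced the value.
    if not (math.isfinite(result.value) and result.value >= 0):
        raise RuntimeError("post-condition failed: flow rate must be finite and non-negative")

    # A log record, not console output: structured and serializable.
    record = {
        "schema_version": SCHEMA_VERSION,
        "timestamp": time.time(),
        "seed": seed,
        "device": platform.machine(),
        "operator": operator,
        "output": {"value": result.value, "unit": result.unit},
    }
    return result, record


result, record = mean_flow_rate(Quantity(10.0, "L"), Quantity(4.0, "s"), seed=42, operator="alice")
log_line = json.dumps(record, sort_keys=True)  # one JSON line per run
```

Passing a duration in minutes raises a `ValueError` before anything is computed, which is the fail-fast behavior described above.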
Fairness is a utility question too: who gets water when demand spikes. Shared infrastructure that exposes price signals (dollars, queue time, grams CO₂ per run) allows teams to schedule work like adults. Concessions become policy, not favors. Batch jobs shift to off-peak. Heavy runs require a justification tied to impact. Long-lived artifacts are archived with a retention plan. It’s easier to be generous when the meter is honest. It’s also easier to say no without drama.
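One way an honest meter and a simple admission policy might look, as a sketch. The `Slot` fields, the threshold, the per-GPU power draw, and every rate shown are invented for illustration and are not measurements of any real cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    """A scheduling window with its published price signals."""
    name: str
    dollars_per_gpu_hour: float
    expected_queue_minutes: float
    grams_co2_per_kwh: float


def meter(slot: Slot, gpu_hours: float, kw_per_gpu: float) -> dict:
    """What a run would cost in this slot, in all three currencies."""
    return {
        "dollars": slot.dollars_per_gpu_hour * gpu_hours,
        "queue_minutes": slot.expected_queue_minutes,
        "grams_co2": slot.grams_co2_per_kwh * kw_per_gpu * gpu_hours,
    }


def schedule(slots: list, gpu_hours: float, deferrable: bool,
             justification: Optional[str] = None,
             heavy_threshold_gpu_hours: float = 100.0) -> Slot:
    """Apply the policy: heavy runs need a justification, batch work goes off-peak."""
    if gpu_hours > heavy_threshold_gpu_hours and not justification:
        raise PermissionError("heavy run requires a justification tied to impact")
    if deferrable:
        # Batch jobs take the cheapest slot, whatever the wait.
        return min(slots, key=lambda s: s.dollars_per_gpu_hour)
    # Interactive work takes the shortest queue.
    return min(slots, key=lambda s: s.expected_queue_minutes)


# Illustrative numbers only.
PEAK = Slot("peak", dollars_per_gpu_hour=4.0, expected_queue_minutes=5.0, grams_co2_per_kwh=500.0)
OFF_PEAK = Slot("off_peak", dollars_per_gpu_hour=1.0, expected_queue_minutes=90.0, grams_co2_per_kwh=300.0)
```

Because the policy is a function rather than a favor, saying no to an unjustified heavy run is just a raised exception with a reason attached.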
I’ve learned to like boredom in this context. Boring means the same command works tomorrow. Boring means a new collaborator can rerun a figure on their machine without summoning a wizard. Boring frees the interesting parts (the ideas, the models, the measurements) from glue code and guesswork. The romance is in the question we can finally ask because the build no longer gets in the way, not in the build itself.
What I want to see is two curves that improve together: time-to-reproduce trending down, and provenance coverage trending up. When those move in step across projects and institutions, we’ll have something close to a utility: scientific compute that you can trust with public work because it behaves like public infrastructure. Not a tap that gushes novelty, just one that runs clear.