Energy as a Public Metric
I used to treat energy like the room temperature: note it, complain about it, then ignore it. Accuracy and latency were the “real” numbers. Watt-hours were logistics. The turn for me was realizing that energy isn’t just background, but the cost function we pretend not to optimize. If a system only shines when electricity is cheap and invisible, it is not strong. Cheap conditions are doing part of the work.
When I say “energy as a metric,” I don’t mean an after-the-fact footnote. I mean a number that stands next to accuracy and latency in the first line of a result. The simplest one I trust is a ratio: quality per kilowatt-hour. Pick the quality that actually governs the task (mAP, success rate, control stability) and divide by measured kWh for a fixed protocol.
That single fraction forces honest trade-offs. Raise batch size, lower precision, change the planner. If the ratio goes up, we improved the run. If it goes down, we bought a headline.
The protocol matters more than the instrument. I like short, repeatable windows because repeatability beats grand measurement: sixty seconds of work, synchronized counters, stable workload. Report Joules, not guesses. Use on-board power sensors where possible. Otherwise use an external meter and align it carefully with the log.
Then keep the units plain enough for non-experts. Joules become Wh. Wh become grams of CO₂e using a grid-specific factor. It isn’t perfect, but it’s more honest than pretending “energy” is a property of the model rather than the world it runs in.
Energy also changes what we mean by “real-time.” The right question isn’t “Can it run at X Hz?” It’s “What cadence meets the task while staying inside the budget?” Some loops can tolerate bounded jitter if the planner knows its envelope. Some workloads should wait ten minutes for cleaner power and then finish with the same quality at a lower cost per answer. Treating time, accuracy, and energy as a Pareto surface in daily design, instead of only in a paper figure, makes the work less theatrical and more civil.
There are familiar companions to the ratio. Energy-Delay Product tells you when you traded too much latency for thrift. Rank stability checks whether your energy conclusions survive small protocol nudges. Selective reporting is a smell. I should abstain from comparisons I can’t instrument properly. And remember that energy is not a moral category, but a way to keep systems accountable to the infrastructure they depend on like hospitals, homes, and other people’s work.
What I want is cultural. A rising accuracy curve without an energy trace should look unfinished. “Quality per kWh” should be a first question, not an appendix. The code should make the measurement boring. The realization I needed to say out loud, mostly to myself, is simple: energy is not a constraint you bolt on after you’ve “won.” It is a lens that changes what winning means. I’ll know this has stuck when my dashboards feel incomplete without that ratio, and when decisions start moving because the line did.