Research That Does Not Become a Repository

2024-10-13

I used to think serious research had to leave behind a public object. A repository, a preprint, a clean dataset, a figure someone could rerun from their laptop. That instinct is mostly good. Public artifacts discipline the work. They punish hand-waving. They force enough structure that someone else can disagree with you without needing to inherit your entire life.

Industrial R&D complicates that story. Some of the most serious work I’ve done cannot be made public in the form I wish it could. The model is tied to proprietary chemistry. The dataset is entangled with plant assumptions, vendor information, or internal design choices. The useful script only makes sense inside a workflow that belongs to a company. That does not make the result fake. It just answers to a different chain of custody.

Molecular simulation taught me this early. A trajectory is atoms moving in a box, and also a stack of choices: force field, charge assignment, equilibration protocol, boundary conditions, sampling window, thermostat, barostat, convergence check, post-processing rule. A diffusion coefficient or adsorption trend looks like one number when it leaves the workflow, but inside the work it is a record of every assumption that survived enough scrutiny to be trusted downstream.
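One way to keep that record attached to the number is to never let the number travel alone. The sketch below is a minimal illustration, not a description of any real workflow: the class names `Protocol` and `ReportedValue`, the field names, and every value in the usage example are hypothetical placeholders chosen to mirror the list of choices above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Protocol:
    """The stack of choices behind a trajectory (hypothetical field names)."""
    force_field: str
    charge_assignment: str
    equilibration: str
    boundary_conditions: str
    sampling_window_ns: float
    thermostat: str
    barostat: str
    convergence_check: str
    post_processing: str


@dataclass(frozen=True)
class ReportedValue:
    """A number that carries its uncertainty and its assumptions with it."""
    name: str
    value: float
    uncertainty: float
    units: str
    protocol: Protocol

    def to_json(self) -> str:
        # asdict() recurses into the nested Protocol, so the serialized
        # record contains every choice alongside the value itself.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Usage with placeholder values only; none of these are real results.
example = ReportedValue(
    name="diffusion_coefficient",
    value=1.0,
    uncertainty=0.1,
    units="placeholder-units",
    protocol=Protocol(
        force_field="placeholder-ff",
        charge_assignment="placeholder-charges",
        equilibration="placeholder-equilibration",
        boundary_conditions="periodic",
        sampling_window_ns=10.0,
        thermostat="placeholder-thermostat",
        barostat="placeholder-barostat",
        convergence_check="placeholder-check",
        post_processing="placeholder-rule",
    ),
)
record = example.to_json()
```

The frozen dataclasses are a deliberate choice in this sketch: once a value has been handed downstream, nobody can quietly edit the assumptions it was computed under.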

That was the useful lesson. The research object was not the plot. It was the capability.

A capability is harder to show and easier to underrate. It is the difference between “I ran a simulation” and “this team now has a repeatable way to ask molecular questions and pass the answers into larger engineering models.” The same question can be asked again when the material changes, when the temperature changes, when the downstream computational fluid dynamics (CFD) boundary condition needs a number that is not decorative. Uncertainty has somewhere to go instead of being hidden in a slide note.

The repository version of this work would have been cleaner. A small example system, a few scripts, a notebook with pretty trajectories. Useful, maybe. Honest, only partly. The real work was uglier: environment setup, HPC friction, file formats, failed assumptions, internal validation, and the repeated act of turning atomistic information into something an engineering workflow could actually use. That kind of work does not always become a public package. Sometimes it becomes institutional memory with tests attached.

I don’t like using confidentiality as a shield for sloppy work. Private research still owes itself the same standards: pinned environments, documented parameters, provenance, versioned inputs, logs that survive the person who wrote them. If anything, private work needs more discipline because it does not get the corrective pressure of strangers. A public repo can be embarrassed into clarity. A private workflow has to choose it.
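As a rough sketch of what those standards can look like in practice, the snippet below writes a small provenance manifest next to a run: content hashes of the input files, the parameters used, and the interpreter and platform that produced the result. It uses only the Python standard library. The function names (`sha256_of`, `build_manifest`, `write_manifest`) and the manifest layout are assumptions made for illustration, and recording the Python version is weaker than a fully pinned environment, which would also need a lock file or container image.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, parameters) -> dict:
    """Collect what a later reader needs to know about how a run was made."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": dict(parameters),
        # Versioned inputs: the hash changes whenever the file content does.
        "inputs": {str(p): sha256_of(Path(p)) for p in inputs},
    }


def write_manifest(manifest: dict, out_path) -> None:
    """Store the manifest as sorted, indented JSON so diffs stay readable."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A typical call would be `write_manifest(build_manifest(input_paths, params), "manifest.json")` at the end of a run, where `input_paths` and `params` are whatever the workflow already has in hand. The point is less the format than the habit: the log is produced by the run itself, not reconstructed from memory afterward.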

There is also a strange emotional adjustment. Public work gives you a visible receipt. Industrial research often gives you a quieter one: a team that stops guessing, a design decision that becomes less vague, a tool that lets the next question start one level higher. That can feel less satisfying because it is harder to see from the outside. But usefulness is not always photogenic.

The question I trust is simple: did the work reduce uncertainty in a way someone else can reuse? If yes, it was research. If the reusable thing is a method, a protocol, a calibrated workflow, or a set of constraints that prevented a bad decision, it still counts. Not every serious artifact becomes a repository. Some become machinery inside the walls. The obligation is to make that machinery traceable, boring, and honest enough that it can outlive the person who built it.