I use independent projects to test ideas that need room outside day-to-day work: small benchmarks, failure
cases, reproducible studies, and code that makes a technical claim easier to check.
Projects
Costgate: CI Cost Regression Gate for LLM Inference (2026)
Status: manuscript in preparation / released code · LLM Evaluation Tooling
A CI-native tool for catching LLM inference cost regressions before they reach production.
Why - LLM changes can pass quality checks while becoming slower, longer, or
more expensive.
Approach - Runs fixed prompt suites, records token/cost/latency metrics, and
compares pull-request runs against a saved baseline.
Outcome - Produces pass/fail gates and reviewable Markdown/JSON reports for
cost-per-success, token drift, and latency regressions.
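The baseline comparison can be illustrated with a minimal sketch. This is not Costgate's actual code or API: the `SuiteMetrics` fields, the `gate` function, and the tolerance values are all hypothetical, chosen only to show how a pull-request run might be checked against a saved baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteMetrics:
    """Aggregate metrics for one run of a fixed prompt suite (hypothetical schema)."""
    total_cost_usd: float
    successes: int
    mean_output_tokens: float
    p95_latency_s: float

    @property
    def cost_per_success(self) -> float:
        # A run with no successes gets infinite cost so it can never pass the gate.
        return self.total_cost_usd / self.successes if self.successes else float("inf")


# Assumed tolerances: maximum allowed relative increase over the baseline.
TOLERANCES = {
    "cost_per_success": 0.05,
    "mean_output_tokens": 0.10,
    "p95_latency_s": 0.20,
}


def gate(baseline: SuiteMetrics, candidate: SuiteMetrics, tolerances=TOLERANCES) -> list[str]:
    """Return one message per metric that regressed beyond its tolerance."""
    failures = []
    for name, tol in tolerances.items():
        base = getattr(baseline, name)
        cand = getattr(candidate, name)
        limit = base * (1 + tol)
        if cand > limit:
            failures.append(
                f"{name}: {cand:.4g} exceeds limit {limit:.4g} "
                f"(baseline {base:.4g}, +{tol:.0%} allowed)"
            )
    return failures


baseline = SuiteMetrics(total_cost_usd=2.0, successes=100,
                        mean_output_tokens=200.0, p95_latency_s=1.5)
candidate = SuiteMetrics(total_cost_usd=2.5, successes=100,
                         mean_output_tokens=205.0, p95_latency_s=1.6)

failures = gate(baseline, candidate)
print("FAIL" if failures else "PASS")
for message in failures:
    print(" -", message)
```

In this made-up run, tokens and latency stay within tolerance but cost-per-success rises by 25%, so the gate fails on that metric alone. A CI job would typically turn a non-empty failure list into a non-zero exit code.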
Benchmark/diagnostic for learned equations of state. Four identity checks (monotonicity, stability,
Clapeyron slope, acoustic speed), JSON scoring, and guardrails near critical regions.
Why - Accuracy metrics can hide violations of basic physical identities.
Approach - Minimal API; per-check assertions emit structured failures that map
directly to the violated identity.
Outcome - CI-ready tests that block physically impossible predictions before
deployment.
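As an illustration of a per-check assertion that emits a structured failure, here is a small sketch of one identity check, mechanical stability, i.e. (dP/dv)_T < 0. It is not the benchmark's real API: the `Failure` record, the function names, and the finite-difference step are assumptions, and a reduced van der Waals equation stands in for a learned model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Failure:
    """Structured record naming the violated identity (hypothetical schema)."""
    check: str
    T: float
    v: float
    value: float  # the offending derivative


def vdw_pressure(v: float, T: float) -> float:
    """Reduced van der Waals pressure; a stand-in for a learned equation of state."""
    return 8.0 * T / (3.0 * v - 1.0) - 3.0 / v**2


def check_mechanical_stability(pressure, T, volumes, h=1e-4):
    """Flag state points where (dP/dv)_T >= 0, using a central finite difference."""
    failures = []
    for v in volumes:
        dpdv = (pressure(v + h, T) - pressure(v - h, T)) / (2 * h)
        if dpdv >= 0:
            failures.append(Failure("mechanical_stability", T, v, dpdv))
    return failures


def to_json(failures) -> str:
    """Serialize failures so a CI job can archive or diff them."""
    return json.dumps([asdict(f) for f in failures], indent=2)


volumes = [0.5, 1.0, 2.0, 3.0]
supercritical = check_mechanical_stability(vdw_pressure, T=1.2, volumes=volumes)
subcritical = check_mechanical_stability(vdw_pressure, T=0.9, volumes=volumes)
print(to_json(subcritical))
```

Above the critical temperature (reduced T = 1.2) every sampled point passes. At T = 0.9 the point v = 1.0 lies inside the unstable loop and is reported, with the check name attached so the failure maps back to the identity it violates.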
Assessing the Limits of Graph Neural Networks for Vapor-Liquid Equilibrium (2025)
Status: preprint · Negative Results · VLE
Negative-results study showing that seemingly accurate GNNs violate global thermodynamic consistency;
proposes a hybrid fallback to classical libraries.
Why - Small pointwise error ≠ integrable or identity-preserving fields.
Approach - Stress tests with identity checks; trigger a classical fallback
policy when tests fail.
Outcome - The useful result was not a better GNN; it was a set of failure cases
and a fallback rule for when a learned property model should hand off to a classical library.
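A hand-off rule of this kind might look like the sketch below. It is an assumption-laden illustration, not the study's implementation: `with_fallback`, the toy linear "learned" model, the placeholder "classical" correlation, and the single positivity check are all hypothetical, and the check runs per prediction rather than as a full stress-test suite.

```python
import math
from typing import Callable, Sequence, Tuple

Predictor = Callable[[float], float]
# A check takes (input, prediction) and returns True when the prediction is consistent.
Check = Callable[[float, float], bool]


def with_fallback(learned: Predictor, classical: Predictor,
                  checks: Sequence[Check]) -> Callable[[float], Tuple[float, str]]:
    """Wrap a learned model so any failed check hands off to the classical one."""
    def predict(x: float) -> Tuple[float, str]:
        y = learned(x)
        if all(check(x, y) for check in checks):
            return y, "learned"
        return classical(x), "classical"
    return predict


# Toy stand-ins (not real property models): a linear fit that goes negative
# at low temperature, and a smooth positive placeholder correlation.
def learned_pressure(T: float) -> float:
    return 0.05 * (T - 300.0)


def classical_pressure(T: float) -> float:
    return math.exp(10.0 - 4000.0 / T)


def pressure_is_positive(T: float, p: float) -> bool:
    return p > 0.0


predict = with_fallback(learned_pressure, classical_pressure, [pressure_is_positive])

for T in (350.0, 280.0):
    value, source = predict(T)
    print(f"T={T}: p={value:.4g} from {source} model")
```

Returning the source alongside the value keeps the hand-off visible to downstream code, so fallback frequency can itself be logged as a diagnostic.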