The Secret to Reproducible and Portable Optimization: ORPilot’s Intermediate Representation (IR)



In my previous post, I introduced four core innovations that make ORPilot a production-oriented open-source LLM-for-OR tool: the interview agent, the data collection agent, the parameter computation agent, and the intermediate representation (IR). Of the four, the IR matters most. It is what separates ORPilot from an academic prototype and gives it the potential to become a production-level tool, because it addresses the two properties a production environment cares about most: reproducibility and portability. In this post, I will take a deep dive into the structure of ORPilot’s IR.

What Is IR?

There is a problem that almost nobody talks about when discussing AI-generated optimization models: what happens after the first solve?

You get your model working. You get an optimal solution. And then, three weeks later, you need to re-run it with updated demand data. Or your colleague on a different machine needs to reproduce the result. Or your company decides to switch from Gurobi to an open-source solver because of licensing costs. Or you want to ask “what if we increase the capacity of a facility by 20%?” With most existing LLM-for-OR tools, the answer to all of these questions is the same: start over, call the LLM again, pay the API cost again, generate the solver code again, and hope to get the same model structure. ORPilot, an open-source AI optimization modeling agent, takes a different approach to this problem: an Intermediate Representation (IR).

The IR is a solver-agnostic, typed JSON schema that captures the complete mathematical structure of an optimization model. Not the optimization code, but the model itself, expressed in a form that is independent of any particular solver.

ORPilot’s IR structure has five top-level sections.

(1) Sets: named collections of entities, such as Workers, Tasks, Plants, Periods. Each set knows where its members come from: a CSV file, a scalar count, or a hardcoded list.

(2) Parameters: indexed numerical data from CSV files, each linked to its domain (which sets index it) and to the exact column names needed to load it.

(3) Variables: decision variables with type (continuous, binary, integer), domain, bounds, and structural flags.

(4) Objective: a symbolic expression tree over variables and parameters — sums, differences, products, indexed sums in solver-neutral form.

(5) Constraints: named symbolic constraints with domains, expression trees, and sense (<= or = or >=). Every constraint is a complete, self-describing object.

Let’s make this concrete by looking at a specific worker task assignment problem below.

Worker-Task Assignment Problem Example

In this problem, four workers must be assigned to four tasks, one task per worker, one worker per task. Each (worker, task) pair has a cost from a CSV file. We try to minimize the total assignment cost. This is a classic assignment problem, which is an integer program.

The data lives in two files:
(1) sets.csv (all set members in one place):
set_name,element
workers,w1
workers,w2
workers,w3
workers,w4
tasks,t1
tasks,t2
tasks,t3
tasks,t4
(2) assignment_costs.csv (the cost matrix):
worker_id,task_id,cost
w1,t1,2.0
w1,t2,4.0
…
Here is the full IR for this problem:

{
    "problem_class": "AssignmentProblem",
    "model_type": "Mixed Integer Program",
    "sense": "minimize",

    "sets": {
      "Workers": {
        "size": null,
        "index_symbol": "w",
        "source": "sets.csv",
        "column": "element",
        "filter_column": "set_name",
        "filter_value": "workers",
        "ordered": false
      },
      "Tasks": {
        "size": null,
        "index_symbol": "t",
        "source": "sets.csv",
        "column": "element",
        "filter_column": "set_name",
        "filter_value": "tasks",
        "ordered": false
      }
    },

    "parameters": {
      "assignment_cost": {
        "domain": ["Workers", "Tasks"],
        "type": "float",
        "source": "assignment_costs.csv",
        "column": "cost",
        "index_columns": ["worker_id", "task_id"],
        "missing_default": "inf"
      }
    },

    "variables": {
      "assign": {
        "description": "1 if worker w is assigned to task t, 0 otherwise",
        "label": "assignments",
        "domain": ["Workers", "Tasks"],
        "type": "binary",
        "lower_bound": 0,
        "upper_bound": 1,
        "upper_bound_set": null,
        "exclude_diagonal": false,
        "domain_filter": null
      }
    },

    "constraints": {
      "one_task_per_worker": {
        "domain": ["Workers"],
        "expression": {
          "operation": "indexed_sum",
          "over": ["Tasks:t"],
          "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
        },
        "sense": "=",
        "rhs": {"type": "constant", "value": 1}
      },
      "one_worker_per_task": {
        "domain": ["Tasks"],
        "expression": {
          "operation": "indexed_sum",
          "over": ["Workers:w"],
          "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
        },
        "sense": "=",
        "rhs": {"type": "constant", "value": 1}
      }
    },

    "objective": {
      "sense": "minimize",
      "expression": {
        "operation": "indexed_sum",
        "over": ["Workers:w", "Tasks:t"],
        "body": {
          "operation": "multiply",
          "left":  {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
          "right": {"type": "variable",  "name": "assign",          "indices": ["w", "t"]}
        }
      }
    }
  }
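Because the IR is plain JSON, its internal cross-references can be checked mechanically before anything is compiled. The sketch below is a hypothetical helper written for this post, not ORPilot’s validator. It assumes the IR has been loaded into a Python dict (for example with json.load on output/ir.json), that leaf nodes carry a “type” key, and that operator nodes keep their children under “body”, “left” and “right”, as in the example above. EXAMPLE_IR is a trimmed copy of that example.

```python
def iter_leaves(node):
    """Yield every leaf node (a node with a "type" key) of an IR expression tree."""
    if "type" in node:
        yield node
        return
    for key in ("body", "left", "right"):
        if key in node:
            yield from iter_leaves(node[key])


def check_references(ir):
    """Return a list of problems; an empty list means every reference resolves."""
    problems = []
    sets = ir["sets"]
    declared = {"parameter": ir["parameters"], "variable": ir["variables"]}

    # Every domain must name sets declared in the "sets" section.
    for section in ("parameters", "variables", "constraints"):
        for name, entry in ir[section].items():
            for set_name in entry["domain"]:
                if set_name not in sets:
                    problems.append(f"{section}.{name}: unknown set {set_name!r}")

    # Every parameter/variable leaf in an expression tree must be declared.
    trees = [("objective", ir["objective"]["expression"])]
    for name, con in ir["constraints"].items():
        trees += [(name, con["expression"]), (name, con["rhs"])]
    for owner, tree in trees:
        for leaf in iter_leaves(tree):
            table = declared.get(leaf["type"])  # constants have no table and are skipped
            if table is not None and leaf["name"] not in table:
                problems.append(f"{owner}: unknown {leaf['type']} {leaf['name']!r}")
    return problems


EXAMPLE_IR = {
    "sets": {"Workers": {"index_symbol": "w"}, "Tasks": {"index_symbol": "t"}},
    "parameters": {"assignment_cost": {"domain": ["Workers", "Tasks"]}},
    "variables": {"assign": {"domain": ["Workers", "Tasks"]}},
    "constraints": {
        "one_task_per_worker": {
            "domain": ["Workers"],
            "expression": {"operation": "indexed_sum", "over": ["Tasks:t"],
                           "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}},
            "sense": "=",
            "rhs": {"type": "constant", "value": 1},
        }
    },
    "objective": {
        "sense": "minimize",
        "expression": {
            "operation": "indexed_sum", "over": ["Workers:w", "Tasks:t"],
            "body": {"operation": "multiply",
                     "left": {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
                     "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]}},
        },
    },
}

print(check_references(EXAMPLE_IR))  # []
```

A misspelled variable name or an undeclared set in any domain shows up as a named problem instead of surfacing later as a failure in the generated solver code.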

Let’s walk through what each section is doing and why the design decisions were made.

Sets

The “sets” section declares each set and where its members come from. The most important design decision here is the data source convention: ORPilot requires all set members to live in a single file called sets.csv, using a two-column format, “set_name” and “element”. Every set, whether an entity set (workers, tasks, plants) or a time set (periods, months), is a filtered slice of this file. In this problem, the “Workers” entry says: load members from sets.csv, read the “element” column, and keep only rows where the “set_name” column equals “workers”. At compile time, this yields Workers = [“w1”, “w2”, “w3”, “w4”].

This convention has two benefits. First, all master data is in one place: adding a worker means adding a row to sets.csv, not modifying multiple files. Second, the “filter_value” field is verified against the actual distinct values in sets.csv at IR-generation time, which catches typos before the solver code ends up with empty sets.

The “index_symbol” field (“w” for Workers, “t” for Tasks) is the loop variable name that will appear in the compiled solver code, e.g., “for w in Workers, for t in Tasks”. It must be chosen to avoid symbol conflicts across nested loops (see the shadow rule below). The “ordered” field is false for both sets here, but it becomes critical for time-indexed models: an ordered set supports temporal lag references, e.g., referencing inventory[t-1] from within a period-t constraint.
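The sets.csv convention can be sketched with the standard-library csv module. This is an illustrative reading of the “column”, “filter_column” and “filter_value” fields, not ORPilot’s compiler code; load_set is a hypothetical name, and the CSV is passed in as text (rather than opened via “source”) so the example runs without files on disk.

```python
import csv
import io

SETS_CSV = """set_name,element
workers,w1
workers,w2
workers,w3
workers,w4
tasks,t1
tasks,t2
tasks,t3
tasks,t4
"""


def load_set(spec, csv_text):
    """Return the members of one set: the rows where filter_column == filter_value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Generation-time check: a misspelled filter_value would otherwise give an empty set.
    known = {row[spec["filter_column"]] for row in rows}
    if spec["filter_value"] not in known:
        raise ValueError(
            f"filter_value {spec['filter_value']!r} not found; available: {sorted(known)}"
        )
    return [row[spec["column"]] for row in rows
            if row[spec["filter_column"]] == spec["filter_value"]]


workers_spec = {"source": "sets.csv", "column": "element",
                "filter_column": "set_name", "filter_value": "workers"}
print(load_set(workers_spec, SETS_CSV))  # ['w1', 'w2', 'w3', 'w4']
```

With “filter_value” set to “worker” instead of “workers”, the sketch raises an error listing the values that do exist, rather than silently returning an empty list.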

Parameters

The “parameters” field links data to the model. The “assignment_cost” parameter has six structural fields.

(1) “domain”: [“Workers”, “Tasks”] — this parameter is indexed by both sets, producing a 2D table.

(2) “type”: “float” — the data type of this parameter is float.

(3) “source”: “assignment_costs.csv” — the exact filename (with extension) that holds the data.

(4) “column”: “cost” — the CSV column that holds the numeric values to load.

(5) “index_columns”: [“worker_id”, “task_id”] — the CSV columns that serve as keys, in the same order as “domain”. The “index_columns” field is one of the most consequential pieces of the IR. Without it, the compiler cannot determine which columns in the CSV correspond to which domain sets. Historically, a common failure mode was the compiler guessing the wrong key column name and silently loading the wrong data. The IR enforces that the correct column names are always supplied explicitly.

(6) “missing_default”: “inf” — tells the compiler that any (worker, task) pair not present in the CSV should be treated as having infinite cost, meaning that assignment is unavailable. These are the correct semantics for cost and penalty parameters.
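Here is a minimal sketch of how “index_columns” and “missing_default” could be applied when loading a parameter, assuming the compiler builds a Python dict keyed by index tuples. The function name is hypothetical. Only the first two cost rows come from the example above; the w2 row is made up, and most pairs are deliberately left out to exercise the default.

```python
import csv
import io
import itertools

COSTS_CSV = """worker_id,task_id,cost
w1,t1,2.0
w1,t2,4.0
w2,t1,3.0
"""  # the w2 row is hypothetical; the (w2, t2) pair is intentionally missing


def load_parameter(spec, sets, csv_text):
    """Build {index_tuple: value} for one IR parameter."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # index_columns line up positionally with domain: worker_id -> Workers, task_id -> Tasks.
    loaded = {
        tuple(row[c] for c in spec["index_columns"]): float(row[spec["column"]])
        for row in rows
    }
    default = float(spec["missing_default"])  # float("inf") parses the "inf" string
    # Fill every domain combination that the CSV does not mention.
    domain_sets = [sets[name] for name in spec["domain"]]
    return {key: loaded.get(key, default) for key in itertools.product(*domain_sets)}


sets = {"Workers": ["w1", "w2"], "Tasks": ["t1", "t2"]}
spec = {"domain": ["Workers", "Tasks"], "type": "float", "column": "cost",
        "index_columns": ["worker_id", "task_id"], "missing_default": "inf"}
cost = load_parameter(spec, sets, COSTS_CSV)
print(cost[("w1", "t2")], cost[("w2", "t2")])  # 4.0 inf
```

Because the keys are built from the explicitly named columns, a CSV whose columns appear in a different order still loads correctly, which is the failure mode “index_columns” is meant to rule out.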

Variables

The “variables” field defines the decisions to be made in the optimization model. The “assign” variable is binary and indexed over “domain”: [“Workers”, “Tasks”], so at compile time the compiler builds the following (assuming the PuLP backend):

import pulp
assign = {(w, t): pulp.LpVariable(f"assign_{w}_{t}", cat="Binary") for w in Workers for t in Tasks}

Some key structural flags not used here but worth understanding are “exclude_diagonal”, “domain_filter” and “upper_bound_set”.

For variables indexed over the same set twice, like “arc[Location, Location]” in a routing model, setting “exclude_diagonal=true” tells the compiler to skip the (i, i) diagonal. No location travels to itself. The compiler emits an

if l1 == l2:
    continue

guard and uses “.get(key, 0)” for all accesses so missing keys never cause “KeyError”.

When a cost table has fewer rows than the full Cartesian product of its domain sets (e.g. only valid routes exist in the CSV), setting “domain_filter” to that parameter’s name restricts the variable to only those combinations. The compiler emits the comprehension with “if (i, j) in transport_cost” so non-existent routes are never created as variables.

For integer variables whose natural upper bound is the cardinality of a set (e.g. MTZ position variables in subtour elimination), setting “upper_bound_set” to “Customers” causes the compiler to emit “len(Customers)” as the upper bound, keeping the model data-agnostic even when the set size varies between runs.
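The three flags can be pictured as rules for which index tuples get a variable and what bound it receives. The sketch below is hypothetical: it only enumerates keys and bounds in plain Python (no solver library), the function names are invented, and the Locations, transport_cost and Customers data are made up to match the examples in the text.

```python
import itertools


def variable_keys(var, sets, params):
    """Enumerate the index tuples a variable is created for, honoring the flags."""
    domain_sets = [sets[name] for name in var["domain"]]
    keys = []
    for key in itertools.product(*domain_sets):
        # exclude_diagonal: skip keys that repeat an element, i.e. (i, i) for arc[Location, Location].
        if var.get("exclude_diagonal") and len(set(key)) < len(key):
            continue
        # domain_filter: keep only combinations present in the named sparse parameter.
        filt = var.get("domain_filter")
        if filt is not None and key not in params[filt]:
            continue
        keys.append(key)
    return keys


def upper_bound(var, sets):
    """upper_bound_set replaces a fixed bound with the current size of the named set."""
    if var.get("upper_bound_set"):
        return len(sets[var["upper_bound_set"]])
    return var.get("upper_bound")


sets = {"Locations": ["A", "B", "C"], "Customers": ["c1", "c2", "c3", "c4"]}
params = {"transport_cost": {("A", "B"): 5.0, ("B", "C"): 7.0, ("C", "A"): 4.0}}

arc = {"domain": ["Locations", "Locations"], "exclude_diagonal": True}
route = {"domain": ["Locations", "Locations"], "domain_filter": "transport_cost"}
position = {"domain": ["Customers"], "type": "integer", "upper_bound_set": "Customers"}

print(len(variable_keys(arc, sets, params)))  # 6 (9 pairs minus 3 diagonal ones)
print(variable_keys(route, sets, params))     # [('A', 'B'), ('B', 'C'), ('C', 'A')]
print(upper_bound(position, sets))            # 4
```

Adding a fifth customer to the data changes the bound to 5 without any edit to the variable definition, which is the point of “upper_bound_set”.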

Constraints

The “constraints” section is where the IR diverges most sharply from a code file: constraints are stored not as strings or code but as expression trees. Each constraint has: (1) “domain”: the sets the compiler loops over to generate one constraint instance per combination; for example, “domain”: [“Workers”] means one constraint per worker. (2) “expression”: the left-hand side, as a recursive tree of nodes. (3) “sense”: the relation for this constraint, “=”, “<=” or “>=”. (4) “rhs”: the right-hand side, also an expression tree, but one containing only constants and parameters; any variables must be moved to the left-hand side. Let’s look at the “one_task_per_worker” constraint closely.

"one_task_per_worker": {
        "domain": ["Workers"],
        "expression": {
          "operation": "indexed_sum",
          "over": ["Tasks:t"],
          "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}
        },
        "sense": "=",
        "rhs": {"type": "constant", "value": 1}
      },

In the “expression” node above, the “over” field uses the alias “Tasks:t” to name the loop variable of this inner sum explicitly. The constraint’s domain contains only Workers, so the outer loop puts “w” in scope but not “t”; the alias is what binds “t” inside the sum, so that the body assign[w, t] can resolve. The alias becomes essential when a set in “over” also appears in the constraint’s domain with the same index_symbol. In that case, the inner sum must use an alias with a different symbol; otherwise the inner loop variable shadows the outer one, and the sum always computes the diagonal term (e.g. assign[t, t]) rather than the intended sum.
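The shadowing failure is easy to reproduce in plain Python, because a generator expression that reuses the outer loop name binds its own variable. The sketch below is hypothetical generated code over a made-up arc table keyed by (i, j); it contrasts reusing the outer symbol with giving the inner sum its own alias symbol.

```python
# Made-up data: arc[(i, j)] = 10 * i + j, so every entry is distinct.
Locations = [1, 2, 3]
arc = {(i, j): 10 * i + j for i in Locations for j in Locations}


def row_sums_shadowed():
    # Inner sum reuses the outer symbol "i": the generator's own i hides the
    # outer loop variable, so every term read is the diagonal entry arc[i, i].
    result = {}
    for i in Locations:
        result[i] = sum(arc[i, i] for i in Locations)
    return result


def row_sums_aliased():
    # With an alias such as "Locations:j", the inner loop has its own symbol.
    result = {}
    for i in Locations:
        result[i] = sum(arc[i, j] for j in Locations)
    return result


print(row_sums_shadowed())  # {1: 66, 2: 66, 3: 66}  (always 11 + 22 + 33)
print(row_sums_aliased())   # {1: 36, 2: 66, 3: 96}
```

The shadowed version runs without any error, which is why this class of bug is worth ruling out at the IR level rather than hoping to spot it in generated code.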

Objective

In the IR, the objective is written as below.

"objective": {
    "sense": "minimize",
    "expression": {
      "operation": "indexed_sum",
      "over": ["Workers:w", "Tasks:t"],
      "body": {
        "operation": "multiply",
        "left":  {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
        "right": {"type": "variable",  "name": "assign",          "indices": ["w", "t"]}
      }
    }
}

The outer “indexed_sum” iterates over both Workers and Tasks simultaneously, using aliases “Workers:w” and “Tasks:t” to name both loop variables explicitly. The body is a multiply node, parameter × variable, which is the only form of multiplication the IR allows in a linear model. The result is one term per (worker, task) pair, summed into the total cost.

This is the simplest objective shape: a single indexed sum. More complex objectives combine multiple indexed sums using subtract. Say the model had both assignment cost and a bonus for certain assignments: maximize sum(bonus[w,t] × assign[w,t]) – sum(cost[w,t] × assign[w,t]). That would be encoded as:

subtract(
    indexed_sum(over Workers, Tasks: bonus[w,t] × assign[w,t]),
    indexed_sum(over Workers, Tasks: cost[w,t] × assign[w,t])
)

One critical rule about subtract: never nest a subtract on the right side of another subtract. Because subtract is a binary operation (left minus right), putting another subtract on the right flips the sign of that inner subtract’s right-hand term:

subtract(A, subtract(B, C))
= A – (B – C)
= A – B + C ← C was supposed to be subtracted but ends up ADDED

Say the objective is revenue – shipping_cost – holding_cost. A common LLM failure mode is to group the two costs together on the right:

subtract(revenue, subtract(shipping_cost, holding_cost))
= revenue – (shipping_cost – holding_cost)
= revenue – shipping_cost + holding_cost

This is wrong: the holding cost is now counted as revenue. The model still runs and the solver still returns “optimal”, but the objective value is inflated by 2 × holding_cost. The correct form is a flat left-to-right chain:

subtract(subtract(revenue, shipping_cost), holding_cost)
= (revenue – shipping_cost) – holding_cost
= revenue – shipping_cost – holding_cost
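To see the sign flip numerically, here is a small recursive evaluator for the expression-tree shapes used in this post. It is a sketch, not ORPilot’s code: it assumes a subtract node uses the same “left”/“right” fields as the multiply node shown earlier, and revenue, shipping_cost and holding_cost are hypothetical scalar constants.

```python
import itertools


def evaluate(node, values, sets, env=None):
    """Recursively evaluate an IR expression node to a number.

    values maps parameter/variable names to dicts keyed by index tuples;
    env maps loop symbols (e.g. "w") to the current set element.
    """
    env = env or {}
    kind = node.get("type")
    if kind == "constant":
        return node["value"]
    if kind in ("parameter", "variable"):
        return values[node["name"]][tuple(env[s] for s in node["indices"])]
    op = node["operation"]
    if op == "indexed_sum":
        pairs = [item.split(":") for item in node["over"]]  # "Tasks:t" -> ["Tasks", "t"]
        total = 0
        for combo in itertools.product(*(sets[name] for name, _ in pairs)):
            inner = {**env, **{sym: elem for (_, sym), elem in zip(pairs, combo)}}
            total += evaluate(node["body"], values, sets, inner)
        return total
    left = evaluate(node["left"], values, sets, env)
    right = evaluate(node["right"], values, sets, env)
    if op == "multiply":
        return left * right
    if op == "subtract":
        return left - right
    raise ValueError(f"unsupported operation: {op}")


def const(v):
    return {"type": "constant", "value": v}


def sub(a, b):
    return {"operation": "subtract", "left": a, "right": b}


revenue, shipping, holding = const(100), const(30), const(10)  # hypothetical values
nested = sub(revenue, sub(shipping, holding))  # revenue - (shipping - holding)
flat = sub(sub(revenue, shipping), holding)    # (revenue - shipping) - holding
print(evaluate(nested, {}, {}), evaluate(flat, {}, {}))  # 80 60
```

The gap of 20 is exactly 2 × holding_cost, matching the algebra above.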

ORPilot has an IR semantic validator that catches the right-side nesting pattern before compilation and names the specific term whose sign was flipped, so the LLM can fix the chain ordering.
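A check for this pattern only needs to walk the tree and inspect the right child of every subtract node. The sketch below is a hypothetical re-implementation for illustration, not ORPilot’s validator. It assumes subtract nodes use “left”/“right” fields and reports the leaf names under the nested node’s right branch, which are the terms whose sign flips; the message wording is invented.

```python
def leaf_names(node):
    """Collect parameter/variable names (or constant values) under a node."""
    if "type" in node:
        return [node.get("name", str(node.get("value")))]
    names = []
    for key in ("body", "left", "right"):
        if key in node:
            names += leaf_names(node[key])
    return names


def find_right_nested_subtracts(node, path="objective"):
    """Return one message per subtract whose right operand is itself a subtract."""
    if "type" in node:
        return []
    messages = []
    if node.get("operation") == "subtract" and node["right"].get("operation") == "subtract":
        flipped = leaf_names(node["right"]["right"])
        messages.append(
            f"{path}: subtract nested on the right side; the sign of "
            f"{', '.join(flipped)} is flipped. Rewrite as a flat left-to-right chain."
        )
    for key in ("body", "left", "right"):
        if key in node:
            messages += find_right_nested_subtracts(node[key], f"{path}.{key}")
    return messages


def term(name):
    return {"type": "parameter", "name": name, "indices": []}


bad = {"operation": "subtract", "left": term("revenue"),
       "right": {"operation": "subtract", "left": term("shipping_cost"),
                 "right": term("holding_cost")}}
flat = {"operation": "subtract",
        "left": {"operation": "subtract", "left": term("revenue"), "right": term("shipping_cost")},
        "right": term("holding_cost")}

print(find_right_nested_subtracts(bad))   # one message naming holding_cost
print(find_right_nested_subtracts(flat))  # []
```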

From IR to Solver Code

The IR compiler is a deterministic piece of software — no LLM involved. Given the same ir.json and the same CSV data files, it always produces identical solver code. Always. The compiler currently supports five backends: PuLP, Pyomo, OR-Tools, Gurobi and CPLEX. Switching backends requires zero model changes. The IR is the same; only the compilation target changes. This means you can archive ir.json alongside your data and reproduce any past result exactly, without making a single API call. You can switch from Gurobi to PuLP by running: orpilot compile-ir output/ir.json --solver pulp --run. One command, zero LLM calls, same model structure. You can run CI/CD validation on solver outputs by committing ir.json and running the compiler in your pipeline. You can share ir.json with a colleague on a different machine and they can solve the same model without needing your LLM API key or even understanding the problem from scratch.
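To make “deterministic, no LLM involved” concrete, here is a toy compiler that turns the assignment IR into PuLP source text. It is a hypothetical sketch, not ORPilot’s compiler. It handles only the node types used in this post, emits no data-loading code (Workers, Tasks and assignment_cost are assumed to exist when the generated script runs), and assumes every “over” entry uses the “Set:symbol” alias form. The emitted calls (pulp.LpProblem, pulp.LpVariable with cat="Binary", pulp.lpSum, and prob += constraint, name) are standard PuLP usage; the builtin compile() only syntax-checks the result, so PuLP does not need to be installed to run the sketch.

```python
import json

SENSE_TO_PY = {"=": "==", "<=": "<=", ">=": ">="}
VAR_CATEGORY = {"binary": "Binary", "integer": "Integer", "continuous": "Continuous"}


def emit_expr(node):
    """Translate one IR expression node into PuLP source text."""
    if "type" in node:
        if node["type"] == "constant":
            return repr(node["value"])
        return f'{node["name"]}[{", ".join(node["indices"])}]'  # e.g. assign[w, t]
    op = node["operation"]
    if op == "indexed_sum":
        pairs = [item.split(":") for item in node["over"]]  # "Tasks:t" -> ["Tasks", "t"]
        loops = " ".join(f"for {sym} in {name}" for name, sym in pairs)
        return f"pulp.lpSum({emit_expr(node['body'])} {loops})"
    symbol = {"multiply": "*", "subtract": "-"}[op]
    return f"({emit_expr(node['left'])} {symbol} {emit_expr(node['right'])})"


def compile_ir(ir):
    """Emit a PuLP script skeleton from an IR dict (data loading is omitted)."""
    sym = {name: spec["index_symbol"] for name, spec in ir["sets"].items()}
    sense = "pulp.LpMinimize" if ir["objective"]["sense"] == "minimize" else "pulp.LpMaximize"
    out = ["import pulp", f'prob = pulp.LpProblem("{ir["problem_class"]}", {sense})']
    for vname, var in sorted(ir["variables"].items()):
        idx = [sym[s] for s in var["domain"]]
        label = vname + "".join("_{" + i + "}" for i in idx)  # assign_{w}_{t}
        loops = " ".join(f"for {sym[s]} in {s}" for s in var["domain"])
        out.append(f'{vname} = {{({", ".join(idx)}): pulp.LpVariable(f"{label}", '
                   f'cat="{VAR_CATEGORY[var["type"]]}") {loops}}}')
    out.append(f"prob += {emit_expr(ir['objective']['expression'])}")
    for cname, con in sorted(ir["constraints"].items()):
        indent = ""
        for s in con["domain"]:  # one nested loop per domain set
            out.append(f"{indent}for {sym[s]} in {s}:")
            indent += "    "
        suffix = "".join("_{" + sym[s] + "}" for s in con["domain"])
        lhs, rhs = emit_expr(con["expression"]), emit_expr(con["rhs"])
        out.append(f'{indent}prob += {lhs} {SENSE_TO_PY[con["sense"]]} {rhs}, f"{cname}{suffix}"')
    return "\n".join(out) + "\n"


ASSIGNMENT_IR = {
    "problem_class": "AssignmentProblem",
    "sets": {"Workers": {"index_symbol": "w"}, "Tasks": {"index_symbol": "t"}},
    "variables": {"assign": {"domain": ["Workers", "Tasks"], "type": "binary"}},
    "constraints": {
        "one_task_per_worker": {
            "domain": ["Workers"], "sense": "=", "rhs": {"type": "constant", "value": 1},
            "expression": {"operation": "indexed_sum", "over": ["Tasks:t"],
                           "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}},
        },
        "one_worker_per_task": {
            "domain": ["Tasks"], "sense": "=", "rhs": {"type": "constant", "value": 1},
            "expression": {"operation": "indexed_sum", "over": ["Workers:w"],
                           "body": {"type": "variable", "name": "assign", "indices": ["w", "t"]}},
        },
    },
    "objective": {
        "sense": "minimize",
        "expression": {
            "operation": "indexed_sum", "over": ["Workers:w", "Tasks:t"],
            "body": {"operation": "multiply",
                     "left": {"type": "parameter", "name": "assignment_cost", "indices": ["w", "t"]},
                     "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]}},
        },
    },
}

script = compile_ir(json.loads(json.dumps(ASSIGNMENT_IR)))  # as if read back from ir.json
compile(script, "<generated>", "exec")  # syntax check only; nothing is executed
print(script)
```

Because the function is a pure string transformation and visits variables and constraints in sorted order, the same IR always yields byte-for-byte identical text in this sketch, regardless of key order in the input file. Targeting another backend would mean swapping in different emitters while the IR stays the same.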

The IR Compilation Pipeline

Once you have a validated ir.json, ORPilot offers a lightweight compilation pipeline: ir.json + CSV Data → IR Compiler → Solver Code → Code Execution. This pipeline involves zero LLM calls end to end. It is fast, cheap, and fully deterministic. The only LLM call in the whole workflow was the one that produced the ir.json in the first place. The CLI command is: orpilot compile-ir output/ir.json --run. That compiles the IR, executes the model, and generates a solution report. To switch solvers: orpilot compile-ir output/ir.json --solver pyomo --run.

The IR Semantic Validator

Before an IR is saved and compiled, ORPilot runs a semantic validator that catches modeling errors that are structurally valid JSON but mathematically wrong. The validator currently catches three major categories, all of which were common LLM failure modes during experiments.

1. Inventory balance sign errors. It detects when all flow variables in a balance constraint end up on the same side (e.g. inv = inflow + outflow instead of inv = inflow – outflow). The correct identity is: ending_inv = beginning_inv + inflow – outflow. Violations of this produce models that are either infeasible (the over-constrained case) or unbounded (the under-constrained case), and the sign error is almost impossible to spot in compiled code.

2. Missing init constraint. If a temporal-lag balance constraint exists, the validator requires a corresponding “_init” variant representing the constraint in the initial time period. A missing init constraint could leave the first period unconstrained, producing an unbounded model even when the subsequent-period constraint is correct.

3. Nested subtract in objective. Sometimes the IR builder LLM would write subtract(A, subtract(B, C)) while it intends to sequentially subtract cost B and C from revenue A. However, mathematically this expression evaluates to A – (B – C) = A – B + C, flipping C’s sign from cost to revenue. The model still solves to “optimal” but the objective value is inflated by 2 × C. The validator detects right-side nesting and names the affected term so the LLM can rewrite the objective as a flat left-to-right chain.
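The first two checks can be sketched in simplified form. This is hypothetical: the post does not show how balance constraints or lag references are encoded in ORPilot’s IR, so the sketch assumes (a) a balance constraint can be reduced to {variable: coefficient} with every term moved to the left-hand side, plus a made-up roles annotation marking inflow and outflow variables, and (b) a previous-period reference appears as an index string such as “t-1”, mirroring the inventory[t-1] notation used earlier. All function names are invented.

```python
def check_balance_signs(name, coefficients, roles):
    """Flag a balance constraint whose inflow and outflow terms carry the same sign.

    coefficients: {variable: coefficient} after moving every term to the left-hand side.
    roles: {variable: "inflow" or "outflow"}, a hypothetical annotation for this sketch.
    """
    inflow = {c > 0 for v, c in coefficients.items() if roles.get(v) == "inflow"}
    outflow = {c > 0 for v, c in coefficients.items() if roles.get(v) == "outflow"}
    if inflow & outflow:  # some inflow and some outflow share a sign
        return [f"{name}: inflow and outflow terms have the same sign; "
                "outflow should be subtracted, not added"]
    return []


def uses_lag(node):
    """True if any leaf refers to the previous period (assumed notation: "t-1")."""
    if "type" in node:
        return any(str(i).endswith("-1") for i in node.get("indices", []))
    return any(uses_lag(node[k]) for k in ("body", "left", "right") if k in node)


def check_init_constraints(constraints):
    """Require a "<name>_init" sibling for every constraint that uses a temporal lag."""
    return [f"{name}: temporal-lag constraint has no '{name}_init' variant"
            for name, con in constraints.items()
            if uses_lag(con["expression"]) and f"{name}_init" not in constraints]


# Correct form: ending_inv - beginning_inv - inflow + outflow = 0
roles = {"production": "inflow", "discharge": "outflow"}
right = {"inventory": 1, "inventory_prev": -1, "production": -1, "discharge": 1}
wrong = {"inventory": 1, "inventory_prev": -1, "production": -1, "discharge": -1}
print(check_balance_signs("inventory_balance", wrong, roles))

lagged = {"operation": "subtract",
          "left": {"type": "variable", "name": "inventory", "indices": ["t"]},
          "right": {"type": "variable", "name": "inventory", "indices": ["t-1"]}}
init_expr = {"type": "variable", "name": "inventory", "indices": ["t"]}
print(check_init_constraints({"inventory_balance": {"expression": lagged}}))
```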

When validation fails, the specific error message is fed back to the LLM as a targeted retry prompt. Instead of a generic “invalid IR”, the LLM sees a message like “inventory_balance sign error: variable discharge appears to be negative (coefficient -1) but should be subtracted from inflow, not added to it.”
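The feedback loop can be sketched independently of any particular LLM API. In the sketch below, generate_ir stands in for the LLM call and validate for the semantic validator; both are hypothetical callables, and the retry prompt wording is invented for illustration.

```python
from typing import Callable, List


def build_ir_with_retries(prompt: str,
                          generate_ir: Callable[[str], dict],
                          validate: Callable[[dict], List[str]],
                          max_attempts: int = 3) -> dict:
    """Request an IR, validate it, and feed the specific errors back until it passes."""
    errors: List[str] = []
    current = prompt
    for _ in range(max_attempts):
        ir = generate_ir(current)
        errors = validate(ir)
        if not errors:
            return ir
        # Targeted retry: quote the validator's exact messages instead of a bare "invalid IR".
        current = (prompt + "\n\nThe previous IR failed validation:\n"
                   + "\n".join(f"- {e}" for e in errors)
                   + "\nFix only these issues and return the full IR.")
    raise RuntimeError(f"IR still invalid after {max_attempts} attempts: {errors}")


prompts_seen = []


def fake_llm(prompt):
    """Stand-in for the LLM: returns a 'fixed' IR only after seeing validator feedback."""
    prompts_seen.append(prompt)
    return {"fixed": "failed validation" in prompt}


def fake_validator(ir):
    return [] if ir["fixed"] else ["objective: the sign of holding_cost is flipped"]


result = build_ir_with_retries("Build the IR for the assignment problem.", fake_llm, fake_validator)
print(len(prompts_seen), result)  # 2 {'fixed': True}
```

Rebuilding the retry prompt from the original request plus the latest errors keeps it short, instead of accumulating every previous failure.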

Why IR Matters For What-If Analysis

The IR’s reproducibility and portability properties have a natural extension: systematic what-if analysis. Once a model is solved and its IR is saved, a business user typically wants to explore how the optimal solution changes under different assumptions. What if demand increases by 20% in Q3? What if the cost of raw material rises to $15 per unit? What if we add a constraint that no single supplier accounts for more than 40% of total procurement? The IR structure makes two categories of what-if queries trivially cheap.

The first category is data changes. If the question only modifies parameter values and leaves the model structure intact, you only need to update the CSV files; the IR JSON is unchanged. Run the compiler against the new data and re-solve. This is a zero-LLM-call operation, so you can run hundreds of scenarios this way with no API cost.
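A data-only scenario sweep can be sketched without touching the IR at all: only the cost data changes between runs. To stay self-contained, the sketch brute-forces the 4×4 assignment with itertools.permutations instead of compiling and calling a solver, and the cost matrix is made up (the article shows only the first two rows of assignment_costs.csv). The scenario name and its override are hypothetical.

```python
import csv
import io
import itertools

# Hypothetical data in the assignment_costs.csv layout (values are made up).
BASE_CSV = """worker_id,task_id,cost
w1,t1,2.0
w1,t2,4.0
w1,t3,6.0
w1,t4,5.0
w2,t1,3.0
w2,t2,2.0
w2,t3,5.0
w2,t4,7.0
w3,t1,6.0
w3,t2,5.0
w3,t3,1.0
w3,t4,4.0
w4,t1,4.0
w4,t2,6.0
w4,t3,3.0
w4,t4,2.0
"""
WORKERS, TASKS = ["w1", "w2", "w3", "w4"], ["t1", "t2", "t3", "t4"]


def read_costs(csv_text):
    return {(r["worker_id"], r["task_id"]): float(r["cost"])
            for r in csv.DictReader(io.StringIO(csv_text))}


def solve_by_enumeration(cost):
    """Stand-in for 'compile the IR and solve': try every one-to-one assignment."""
    best = min(itertools.permutations(TASKS),
               key=lambda perm: sum(cost[w, t] for w, t in zip(WORKERS, perm)))
    return sum(cost[w, t] for w, t in zip(WORKERS, best)), dict(zip(WORKERS, best))


scenarios = {"base": {}, "w3_t3_cost_rises_to_8": {("w3", "t3"): 8.0}}
results = {}
for name, overrides in scenarios.items():
    cost = {**read_costs(BASE_CSV), **overrides}  # only the data changes; the IR does not
    results[name] = solve_by_enumeration(cost)
    print(name, *results[name])  # totals: 7.0 for base, 11.0 for the scenario
```

In the second scenario, w3 moves to t4 and w4 to t3; the model structure, and therefore the IR, is identical across both runs.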

The second category is structural changes. If the question modifies a constraint, adds a new one, or changes the objective, you edit the IR JSON directly. Because the IR is a typed, schema-validated document with a well-defined expression tree, such edits are localized. Adding a constraint is a matter of appending a new constraint object, not searching through hundreds of lines of solver-specific code for the place to make the change.
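A structural what-if is a local edit to the JSON document. The sketch below adds a hypothetical per-worker hours limit to the assignment IR. The task_hours and max_hours parameters, their CSV files and column names, and the function name are invented for illustration, the full set of fields ORPilot’s schema requires may differ, and the new constraint reuses only node types that appear in this post.

```python
import json


def add_worker_hours_limit(ir):
    """Return a copy of the IR with a hypothetical per-worker hours constraint appended."""
    new_ir = json.loads(json.dumps(ir))  # deep copy; the original IR stays untouched
    new_ir["parameters"]["task_hours"] = {
        "domain": ["Workers", "Tasks"], "type": "float", "source": "task_hours.csv",
        "column": "hours", "index_columns": ["worker_id", "task_id"],
    }
    new_ir["parameters"]["max_hours"] = {
        "domain": ["Workers"], "type": "float", "source": "worker_hours.csv",
        "column": "max_hours", "index_columns": ["worker_id"],
    }
    # sum over t of task_hours[w, t] * assign[w, t] <= max_hours[w], one instance per worker
    new_ir["constraints"]["worker_hours_limit"] = {
        "domain": ["Workers"],
        "expression": {
            "operation": "indexed_sum",
            "over": ["Tasks:t"],
            "body": {
                "operation": "multiply",
                "left": {"type": "parameter", "name": "task_hours", "indices": ["w", "t"]},
                "right": {"type": "variable", "name": "assign", "indices": ["w", "t"]},
            },
        },
        "sense": "<=",
        "rhs": {"type": "parameter", "name": "max_hours", "indices": ["w"]},
    }
    return new_ir


# Typical use: load output/ir.json with json.load, edit, write it back with json.dump,
# then recompile. A trimmed stand-in keeps this example self-contained.
minimal_ir = {"parameters": {"assignment_cost": {}}, "constraints": {"one_task_per_worker": {}}}
edited = add_worker_hours_limit(minimal_ir)
print(sorted(edited["constraints"]))  # ['one_task_per_worker', 'worker_hours_limit']
```

The existing constraints, variables and objective are untouched; the edit is two parameter entries and one constraint object.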

This is a qualitatively different relationship with your optimization model than what any other existing tool offers. Instead of a one-shot artifact, you have a living, editable model structure that you can interrogate and modify independently of the LLM.

The Bigger Picture

The IR addresses something fundamental about the relationship between AI and production software: AI outputs need to be verifiable, portable, and durable. A solver code file generated by an LLM is an opaque blob. If something is wrong, you need the LLM to fix it. If you want to change something, you either understand solver API syntax well enough to edit it yourself, or you call the LLM again. The model lives only as code. The IR decouples the modeling intelligence (which requires an LLM) from the computational step (which does not require an LLM). The LLM’s job is to produce a clean, structured JSON artifact. Once that artifact exists and is validated, it is owned by you, not by the LLM. This design choice, more than anything else in ORPilot, is what makes it suitable for production deployment rather than academic demonstration.


