Case Study | Top 10 Biopharma
Operationalizing a digital twin by closing the Ambr250 data gap.
A global biopharmaceutical company had invested heavily in a bioprocess digital twin for upstream process development. The model worked. But it couldn't run in production, because the live run data it required wasn't accessible.
The Challenge
The company had committed significant resources to building a bioprocess digital twin for upstream process development: over $2M in Ambr250 hardware, dedicated data science headcount, and six months of modeling work. The digital twin was functional. The data infrastructure required to operate it was not.
The problem had two layers: there was no automated pipeline to the company's centralized data platform, and parameter naming was inconsistent across instruments and scales — the digital twin required a unified namespace to correctly apply incoming data to its models.
An isolated system at the center of process development
Most other bioreactors in the fleet fed data automatically through a PI historian, but the Ambr250 didn't. It ran on its own isolated control PC, connected to the broader network only through a separate bridge PC, with no direct path to the company's data infrastructure.
Manual data transfer blocking the digital twin
Without an automated pipeline, scientists manually extracted and sent data snippets to the data science team multiple times per day. Each snippet required reconciliation: matching time-series data to the correct batch, associating offline instrument results, and aligning experimental context from the ELN.
Inconsistent data structure was a second blocker
Even when data arrived on time, the digital twin couldn't reliably consume it. The model had been built around specific parameter definitions, but those same parameters were labeled differently across instruments and bioreactor scales. Without a harmonized data layer, deploying the model against new run data required manual tag reconciliation every time.
How Invert Solved It
Invert solved both problems faster than a PI integration could: a direct OPC and file-based connection to the Ambr250 that captured more data than PI could access; offline instrument results and ELN metadata unified into a harmonized, batch-centric dataset; and everything exposed via API for the centralized data platform to consume in near real time. The integration went live in days.
Direct Ambr250 integration, stood up in days
Invert connected to the Ambr250 via a hybrid OPC and file-based integration, capturing not just time-series process parameters but also temporary files, system logs, and event data that PI could not access. No changes to the Ambr250 system were required.
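The file-based half of that hybrid integration amounts to parsing instrument exports into tagged time-series records. A minimal sketch of that step, using an illustrative CSV layout (the column names here are assumptions, not the real Ambr250 schema):

```python
import csv
import io
from datetime import datetime

# Hypothetical Ambr250-style export; the actual instrument file format differs.
SAMPLE_EXPORT = """timestamp,vessel,parameter,value
2024-03-01T08:00:00Z,V1,DO,45.2
2024-03-01T08:00:00Z,V1,pH,7.02
"""

def parse_export(text):
    """Parse a file-based export into tagged time-series records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            # Normalize the trailing "Z" so fromisoformat accepts it on older Pythons.
            "ts": datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00")),
            "vessel": row["vessel"],
            "tag": row["parameter"],
            "value": float(row["value"]),
        })
    return records

records = parse_export(SAMPLE_EXPORT)
print(len(records))  # 2
```

In the real pipeline this parsing would sit behind a file watcher alongside the OPC subscription, so both sources land in one record shape.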
A unified namespace the digital twin could consume
Invert's ontology manager mapped parameters across all instruments and bioreactor scales to the canonical definitions the model was trained on — dissolved oxygen was dissolved oxygen, regardless of which system generated it. Manual tag reconciliation on every run was eliminated.
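The core of a mapping like this is an alias table that resolves every instrument-specific tag to one canonical name. A minimal sketch, with illustrative aliases (the names below are assumptions, not Invert's actual ontology):

```python
# Hypothetical alias table: each canonical parameter lists the
# instrument-specific names observed across systems and scales.
CANONICAL_ALIASES = {
    "dissolved_oxygen": {"DO", "dO2", "DOT [%]", "pO2"},
    "ph": {"pH", "PH_VALUE"},
    "temperature": {"Temp", "TEMPERATURE_C"},
}

# Invert the table once for O(1) lookup of any incoming tag.
TAG_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CANONICAL_ALIASES.items()
    for alias in aliases
}

def harmonize(tag):
    """Resolve an instrument tag to its canonical name; fail loudly on
    unmapped tags so namespace gaps surface immediately."""
    try:
        return TAG_TO_CANONICAL[tag]
    except KeyError:
        raise KeyError(f"Unmapped tag: {tag!r}; add it to the alias table")

print(harmonize("dO2"))  # dissolved_oxygen
```

Failing loudly on unmapped tags matters: a silently passed-through alias would reintroduce exactly the per-run reconciliation the mapping exists to eliminate.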
Batch-centric unification across instruments and systems
The digital twin required more than Ambr250 time-series data. It needed offline instrument results and experimental context joined to each batch. Invert automated that linkage — as soon as a run was created, records across the Ambr250, offline instruments, and the ELN were joined without manual reconciliation.
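Conceptually, that linkage is a join across three sources on a shared batch identifier. A minimal sketch, assuming each source's records carry a `batch_id` field (the record shapes are illustrative):

```python
from collections import defaultdict

def join_by_batch(timeseries, offline_results, eln_records):
    """Join time-series, offline assay results, and ELN context on batch id."""
    batches = defaultdict(lambda: {"timeseries": [], "offline": [], "eln": None})
    for rec in timeseries:
        batches[rec["batch_id"]]["timeseries"].append(rec)
    for rec in offline_results:
        batches[rec["batch_id"]]["offline"].append(rec)
    for rec in eln_records:
        batches[rec["batch_id"]]["eln"] = rec
    return dict(batches)

joined = join_by_batch(
    [{"batch_id": "B-001", "tag": "dissolved_oxygen", "value": 45.2}],
    [{"batch_id": "B-001", "assay": "titer", "value": 1.8}],
    [{"batch_id": "B-001", "experiment": "DOE-12"}],
)
```

Keying everything on the batch, rather than on timestamps alone, is what lets offline results that arrive hours later still attach to the right run automatically.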
API-First Architecture
Feeding the centralized data lake
The company didn't want Invert to replace its centralized data platform — it wanted Invert to feed it. Invert's API gave the team a programmatic interface to pull structured, harmonized data — parent metrics, formulas, and unit conversions — directly into their modeling environment on an automated basis.
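On the consuming side, that pattern looks like a small client that pulls a harmonized batch dataset and down-selects the fields the model needs. A minimal sketch; the path and payload shape are assumptions for illustration, not Invert's actual API contract, and the fetch callable stands in for an authenticated HTTP client:

```python
import json

def pull_batch_dataset(fetch, batch_id):
    """Pull a harmonized batch dataset via an API client.

    `fetch` is any callable returning a JSON string for a path; in
    production it would wrap the platform's authenticated HTTP client.
    """
    payload = json.loads(fetch(f"/v1/batches/{batch_id}/dataset"))
    # Keep only the fields the digital twin consumes.
    return {
        "batch_id": payload["batch_id"],
        "metrics": payload["metrics"],
    }

# Stubbed fetch standing in for the real HTTP client.
def fake_fetch(path):
    return json.dumps({
        "batch_id": "B-001",
        "metrics": {"dissolved_oxygen": [45.2]},
    })

dataset = pull_batch_dataset(fake_fetch, "B-001")
```

Injecting the fetch callable keeps the modeling-environment code testable without network access, which is useful when the pull runs on an automated schedule.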
Six historical Ambr250 experiments were also ingested during implementation, creating a structured archive that could be queried, compared, and used for model training alongside incoming live data.
Results
The digital twin program is now operational — giving the data science team the proof points needed to secure additional funding and resources.
Integration went live in under a month, compared to the six-plus months a PI or systems integration route would have required.
Data science FTEs shifted from daily manual data reconciliation to model development.
End user adoption is increasing as scientists engage directly with the digital twin for the first time, realizing the benefits of a program years in the making.
Process optimization is accelerating as the digital twin receives higher-quality, real-time data with every run.