Campaign Foundations · Foundation 2
How campaigns define the biological objective, choose a molecular strategy, specify success, and allocate effort across a narrowing candidate pool.
A protein-engineering campaign does not begin with a generative model. It begins with a biological objective: activate or inhibit a pathway, block a molecular interaction, replace a missing function, redirect an existing activity, or create an entirely new one.
Depending on the organization, the target may already have been selected by disease-biology, target-validation, translational, or strategic teams. In other settings, protein engineers contribute directly to target assessment and modality selection. Either way, the computational campaign requires a precise design brief: which molecular state should be engaged, which surface or mechanism should be modulated, where the molecule must function, and which product constraints it must ultimately satisfy.
The target alone does not define the solution. The same biological problem might be approached with an antibody, VHH, mini-protein, peptide, enzyme, receptor trap, or multispecific construct. Each modality introduces different structural constraints, expression systems, assay requirements, developability risks, and manufacturing paths.
Campaign architecture begins by translating the biological objective into a target product profile, selecting a plausible modality, and defining the evidence required to advance a candidate.
The first applied volume uses PD-L1 to demonstrate these decisions. PD-L1 is structurally well characterized, biologically important, and sufficiently challenging to expose the tradeoffs involved in interface design. The campaign principles are general; the chosen epitope, hotspot residues, molecular geometry, and assays are specific to this target.
PD-L1 is an immune-checkpoint ligand expressed by multiple cell types and frequently upregulated by tumors. Its interaction with PD-1 on activated T cells suppresses immune signaling. Blocking the PD-1–PD-L1 interaction is therefore an established immunotherapy strategy.
The interface is captured in multiple co-complex structures, including PDB 4ZQK at 2.45 Å resolution, and presents a comparatively broad, shallow surface containing both polar and hydrophobic contacts. These features make it a useful case study for examining target preparation, hotspot selection, binding-site geometry, and interface validation.
A brief such as “inhibit PD-L1” does not specify how the intervention should be achieved. An antibody, VHH, peptide, receptor-derived construct, or de novo mini-protein could each engage the target, but each would produce a different design and development campaign.
Volume 1 selects a de novo mini-protein binder targeting the PD-1-binding face of PD-L1. This modality is well suited to demonstrating modern generative design because compact backbones can be generated, sequence-designed, and structurally assessed using accessible computational infrastructure.
Volume 2 returns to the same target using a scaffold-constrained VHH, allowing the consequences of modality choice to be compared directly.
Before candidates are generated, the campaign needs an explicit definition of success. These criteria should connect the intended biological function to measurable properties such as affinity, specificity, stability, expression, oligomeric state, and manufacturability.
Thresholds are not universal. They depend on the modality, intended use, assay format, development stage, and acceptable tradeoffs. Early-round criteria are often permissive because the objective is to identify usable starting points rather than finished products.
| Criterion | Illustrative target | Assay |
|---|---|---|
| Binding | Detectable target-dependent binding | BLI or SPR |
| Affinity | KD < 100 nM in Round 1; < 10 nM after optimization | SPR kinetics or equilibrium BLI |
| Functional blocking | Inhibits the PD-1–PD-L1 interaction | Competition BLI or blocking ELISA |
| Specificity | Minimal binding to defined counter-screen proteins | BLI, SPR, or plate-based counter-screen |
| Oligomeric state | Predominantly monomeric | SEC |
| Thermal stability | Tm > 50°C | DSF |
| Soluble yield | > 1 mg/L recovered from E. coli | A280 or BCA after purification |
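
One way to keep these criteria explicit is to encode them as data before any candidates are generated. The sketch below is illustrative only: the measurement keys (`kd_nM`, `tm_celsius`, and so on), the boolean simplification of "minimal" counter-screen binding, and the 0.8 monomer-fraction cutoff standing in for "predominantly monomeric" are all assumptions, not values taken from the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: "predominantly monomeric" is read as >= 80% monomer by SEC peak area.
MONOMER_FRACTION_MIN = 0.8


@dataclass(frozen=True)
class Criterion:
    name: str
    assay: str
    passes: Callable[[dict], bool]  # takes one candidate's measurements


# Round 1 thresholds mirror the illustrative table; measurement keys are hypothetical.
ROUND1_CRITERIA: List[Criterion] = [
    Criterion("Binding", "BLI or SPR", lambda m: m["binds_target"]),
    Criterion("Affinity", "SPR kinetics or equilibrium BLI", lambda m: m["kd_nM"] < 100.0),
    Criterion("Functional blocking", "Competition BLI or blocking ELISA", lambda m: m["blocks_pd1"]),
    Criterion("Specificity", "Counter-screen", lambda m: not m["binds_counter_screen"]),
    Criterion("Oligomeric state", "SEC", lambda m: m["monomer_fraction"] >= MONOMER_FRACTION_MIN),
    Criterion("Thermal stability", "DSF", lambda m: m["tm_celsius"] > 50.0),
    Criterion("Soluble yield", "A280 or BCA", lambda m: m["yield_mg_per_L"] > 1.0),
]


def evaluate(measurements: dict, criteria: List[Criterion] = ROUND1_CRITERIA) -> Dict[str, bool]:
    """Return a pass/fail verdict for every criterion."""
    return {c.name: bool(c.passes(measurements)) for c in criteria}


def failed_criteria(measurements: dict, criteria: List[Criterion] = ROUND1_CRITERIA) -> List[str]:
    """List the names of criteria the candidate does not meet, in table order."""
    return [name for name, ok in evaluate(measurements, criteria).items() if not ok]


# Hypothetical measurements for one candidate.
example = {
    "binds_target": True,
    "kd_nM": 45.0,
    "blocks_pd1": True,
    "binds_counter_screen": False,
    "monomer_fraction": 0.92,
    "tm_celsius": 62.0,
    "yield_mg_per_L": 3.5,
}
print(failed_criteria(example))
```

Keeping thresholds in one structure makes it straightforward to swap in a stricter list for later rounds, such as the post-optimization affinity target.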
Every campaign is shaped by one structural reality: each stage downstream is generally more expensive, slower, and lower throughput than the stage before it. This cost gradient underpins the screening funnel. The campaign begins broadly, using inexpensive and approximate evidence, then progressively narrows the candidate pool while increasing the information collected per candidate.
Before examining the individual computational stages, it helps to place computational design within the larger experimental screening hierarchy. Computational methods may identify promising scaffolds, poses, sequence families, mutable positions, or individual candidates. Depending on the campaign, these outputs may be tested directly or expanded into larger experimental libraries.
Figure 1. A generic campaign-level screening funnel. Computational design identifies promising scaffolds, interfaces, sequence families, mutable positions, or individual candidates. These hypotheses may be tested directly or expanded combinatorially. Subsequent stages progressively reduce candidate numbers while increasing the cost and information obtained per candidate.
The fundamental tension at every stage is between throughput and information depth. Near the top of the funnel, filters are fast and cheap but rely on incomplete or approximate evidence. Near the bottom, assays measure properties closer to the intended function, but each candidate requires more time, material, and money.
Filters that are too strict may eliminate unusual candidates that could have succeeded experimentally. Filters that are too permissive consume downstream capacity on avoidable failures. There is no universally correct threshold: stringency depends on budget, downstream throughput, confidence in the available measurements, and the relative cost of false negatives and false positives.
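The dependence of stringency on error costs can be made concrete with a small sketch. It assumes a pilot set of candidates that have both a computational score and a known experimental outcome, and it assigns hypothetical costs to each false positive (a failure carried downstream) and each false negative (a discarded candidate that would have worked). The function names and all numbers are illustrative.

```python
from typing import Sequence


def expected_cost(
    scores: Sequence[float],
    is_true_hit: Sequence[bool],
    threshold: float,
    cost_false_positive: float,
    cost_false_negative: float,
) -> float:
    """Total cost of filtering at `threshold` (candidates with score >= threshold pass)."""
    false_pos = sum(1 for s, hit in zip(scores, is_true_hit) if s >= threshold and not hit)
    false_neg = sum(1 for s, hit in zip(scores, is_true_hit) if s < threshold and hit)
    return false_pos * cost_false_positive + false_neg * cost_false_negative


def best_threshold(
    scores: Sequence[float],
    is_true_hit: Sequence[bool],
    candidate_thresholds: Sequence[float],
    cost_false_positive: float,
    cost_false_negative: float,
) -> float:
    """Pick the candidate threshold with the lowest total cost on the pilot set."""
    return min(
        candidate_thresholds,
        key=lambda t: expected_cost(
            scores, is_true_hit, t, cost_false_positive, cost_false_negative
        ),
    )


# Hypothetical pilot data: filter scores and whether each design later worked.
scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9]
worked = [False, False, True, False, True, True]
thresholds = [0.3, 0.5, 0.65]

# Missing a real hit is expensive: a more permissive threshold wins.
print(best_threshold(scores, worked, thresholds, cost_false_positive=1, cost_false_negative=5))
# Downstream capacity is expensive: a stricter threshold wins.
print(best_threshold(scores, worked, thresholds, cost_false_positive=10, cost_false_negative=1))
```

The same pilot data yields different thresholds under different cost assumptions, which is the point: the cutoff is a budgeting decision as much as a scientific one.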
Figure 2. Candidate count and cost per candidate across a generic screening hierarchy. As the pool narrows, each surviving candidate typically receives more expensive and information-rich evaluation.
The Volume 1 campaign is designed to run on accessible Colab infrastructure. Hundreds of candidates can pass through inexpensive structural filters, while a smaller subset receives more detailed interface analysis, simulation, and independent complex assessment. This scale is sufficient to demonstrate the workflow and the reasoning behind each decision gate.
A larger campaign may generate thousands to millions of computational or experimental variants, depending on the modality and screening platform. Inexpensive filters are applied broadly, while progressively more expensive analyses are reserved for smaller subsets. The funnel logic remains the same even when its scale and exact stages change.
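The funnel logic can be sketched as simple bookkeeping: each stage has a cost per candidate and passes some fraction of its input to the next stage. The stage names below echo the computational stages mentioned above, but the starting pool, the per-candidate costs (in arbitrary units), and the pass fractions are hypothetical placeholders rather than measured values.

```python
from typing import Dict, List, Tuple

# (stage name, cost per candidate in arbitrary units, fraction passed on) -- all hypothetical
Stage = Tuple[str, float, float]


def funnel_summary(n_start: int, stages: List[Stage]) -> List[Dict[str, object]]:
    """Track candidate counts and total spend at each stage of a screening funnel."""
    rows = []
    n_in = n_start
    for name, cost_per_candidate, pass_fraction in stages:
        n_out = round(n_in * pass_fraction)
        rows.append(
            {
                "stage": name,
                "n_in": n_in,
                "n_out": n_out,
                "stage_cost": n_in * cost_per_candidate,
            }
        )
        n_in = n_out
    return rows


stages = [
    ("structural filters", 0.01, 0.20),
    ("interface analysis", 0.50, 0.25),
    ("simulation", 5.00, 0.40),
]

for row in funnel_summary(1000, stages):
    print(row)
```

In this toy example the later stages see far fewer candidates yet account for most of the total spend, which is why inexpensive filters are applied first.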
Computational design methods primarily assess properties that can be inferred from sequence and structure: geometric plausibility, sequence–structure compatibility, fold recovery, interface organization, energetic proxies, and prediction confidence. These outputs do not directly establish soluble expression, aggregation resistance, thermal stability, biological function, or manufacturability.
A well-designed high-throughput workflow can move from DNA design to assay-ready protein in approximately three to four weeks for a 96-well plate of constructs. Each step has its own failure modes, from codon optimization and expression-host choice through purification and concentration.
The goal is not merely to manufacture protein. It is to create a parallelized filter that reveals which designs are tractable enough to justify more expensive binding and functional assays.
Developability is the collection of biophysical and chemical properties that determine whether a protein can be manufactured, formulated, stored, and used reliably. Relevant risks include aggregation propensity, chemical liabilities, unpaired cysteines, surface hydrophobicity, unfavorable charge distributions, and concentration-dependent instability.
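Some of these risks can be flagged crudely from sequence alone. The sketch below is a rough first pass, not a developability assessment. It assumes an odd cysteine count hints at an unpaired cysteine, uses whole-sequence hydrophobic content as a weak stand-in for surface hydrophobicity (which really requires a structure), approximates net charge by counting K/R against D/E, and checks two commonly cited liability motifs (NG for deamidation, DG for isomerization). The cutoffs are hypothetical defaults.

```python
from typing import List

# Assumption: one common grouping of hydrophobic residues; other groupings exist.
HYDROPHOBIC = set("AVILMFWY")


def developability_flags(
    sequence: str,
    max_hydrophobic_fraction: float = 0.45,  # hypothetical cutoff
    max_abs_net_charge: int = 10,            # hypothetical cutoff
) -> List[str]:
    """Return coarse sequence-level warning flags for a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    # Crude net charge near neutral pH: ignores His, termini, and pKa shifts.
    net_charge = seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")

    flags = []
    if seq.count("C") % 2 == 1:
        flags.append("odd_cysteine_count")
    if "NG" in seq:
        flags.append("deamidation_motif_NG")
    if "DG" in seq:
        flags.append("isomerization_motif_DG")
    if hydrophobic_fraction > max_hydrophobic_fraction:
        flags.append("high_hydrophobic_fraction")
    if abs(net_charge) > max_abs_net_charge:
        flags.append("extreme_net_charge")
    return flags


# Made-up sequences for illustration only.
print(developability_flags("MKTEELLKKAEELAKRG"))
print(developability_flags("ACDGNGLLVVIIFFWW"))
```

Flags like these are prompts for closer inspection rather than grounds for rejection on their own; a buried motif or a designed disulfide pair can make a flagged sequence perfectly acceptable.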
A protein can perform well in a binding assay and still fail as a development candidate. Campaigns should therefore identify developability problems as early as practical, rather than after extensive characterization of a molecule that cannot be produced or handled reliably.
A ranked computational shortlist enters the remaining stages of the Design-Build-Test-Learn cycle.
Build: Synthesize DNA, clone into expression vectors, transform the selected host, express the candidates, and purify them. Each operation is both a manufacturing step and a source of biological information.
Test: Apply decision gates in a deliberate order: expression, solubility, monomeric purification, detectable binding, dose response, functional activity, specificity, and stability. Expensive assays should not be consumed by candidates that have already failed more basic requirements.
Learn: Link experimental outcomes back to computational features in a structured data system. Which generation conditions produced tractable proteins? Which structural metrics correlated with expression or binding? Which failure modes were predictable, and which were not? These answers make the next design round better informed.
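The Test and Learn steps can be sketched together: gates run in the stated order and stop at the first failure, and each outcome is stored alongside a computational feature so it can be summarized later. Everything beyond the gate order is an assumption made for illustration, including the record fields, the `generation_condition` feature, and the stand-in "assays" that simply read pre-recorded results.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Gate order as listed in the Test step.
GATE_ORDER = [
    "expression", "solubility", "monomeric_purification", "detectable_binding",
    "dose_response", "functional_activity", "specificity", "stability",
]


def run_gates(candidate: dict, assays: Dict[str, Callable[[dict], bool]]) -> dict:
    """Run assays in order and stop at the first failure, so later assays are never consumed."""
    assays_run: List[str] = []
    for gate in GATE_ORDER:
        assays_run.append(gate)
        if not assays[gate](candidate):
            return {"failed_at": gate, "assays_run": assays_run}
    return {"failed_at": None, "assays_run": assays_run}


def advance_rate_by(records: List[dict], feature: str) -> Dict[str, float]:
    """Fraction of candidates passing every gate, grouped by one recorded feature."""
    totals, advanced = defaultdict(int), defaultdict(int)
    for record in records:
        totals[record[feature]] += 1
        if record["failed_at"] is None:
            advanced[record[feature]] += 1
    return {key: advanced[key] / totals[key] for key in totals}


def lookup_assay(gate: str) -> Callable[[dict], bool]:
    # Stand-in for a real assay: reads a hypothetical pre-recorded outcome.
    return lambda candidate: candidate["lab"].get(gate, False)


ASSAYS = {gate: lookup_assay(gate) for gate in GATE_ORDER}

# Hypothetical candidates from two generation conditions, "A" and "B".
candidates = [
    {"id": "d1", "generation_condition": "A", "lab": {g: True for g in GATE_ORDER}},
    {"id": "d2", "generation_condition": "A", "lab": {"expression": True, "solubility": False}},
    {"id": "d3", "generation_condition": "B", "lab": {"expression": False}},
]

records = [
    {"id": c["id"], "generation_condition": c["generation_condition"], **run_gates(c, ASSAYS)}
    for c in candidates
]
print(records[1])
print(advance_rate_by(records, "generation_condition"))
```

Recording where each candidate failed, not just whether it failed, is what lets a later round ask which failure modes were predictable from computational features.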