Campaign Foundations · Foundation 2

Target Selection, Modality & Screening Funnels

How campaigns define the biological objective, choose a molecular strategy, specify success, and allocate effort across a narrowing candidate pool.

From biological objective to design brief

A protein-engineering campaign does not begin with a generative model. It begins with a biological objective: activate or inhibit a pathway, block a molecular interaction, replace a missing function, redirect an existing activity, or create an entirely new one.

Depending on the organization, the target may already have been selected by disease-biology, target-validation, translational, or strategic teams. In other settings, protein engineers contribute directly to target assessment and modality selection. Either way, the computational campaign requires a precise design brief: which molecular state should be engaged, which surface or mechanism should be modulated, where the molecule must function, and which product constraints it must ultimately satisfy.

The target alone does not define the solution. The same biological problem might be approached with an antibody, VHH, mini-protein, peptide, enzyme, receptor trap, or multispecific construct. Each modality introduces different structural constraints, expression systems, assay requirements, developability risks, and manufacturing paths.

Campaign architecture begins by translating the biological objective into a target product profile, selecting a plausible modality, and defining the evidence required to advance a candidate.

Worked example: PD-L1

The first applied volume uses PD-L1 to demonstrate these decisions. PD-L1 is structurally well characterized, biologically important, and sufficiently challenging to expose the tradeoffs involved in interface design. The campaign principles are general; the chosen epitope, hotspot residues, molecular geometry, and assays are specific to this target.

Biological and structural context

PD-L1 is an immune-checkpoint ligand expressed by multiple cell types and frequently upregulated by tumors. Its interaction with PD-1 on activated T cells dampens T-cell signaling and effector function. Blocking the PD-1–PD-L1 interaction is therefore an established immunotherapy strategy.

The interface is captured in multiple co-complex structures, including PDB 4ZQK at 2.45 Å resolution, and presents a comparatively broad, shallow surface containing both polar and hydrophobic contacts. These features make it a useful case study for examining target preparation, hotspot selection, binding-site geometry, and interface validation.

Worked modality decision: de novo mini-protein binder

A brief such as “inhibit PD-L1” does not specify how the intervention should be achieved. An antibody, VHH, peptide, receptor-derived construct, or de novo mini-protein could each engage the target, but each would produce a different design and development campaign.

Volume 1 selects a de novo mini-protein binder targeting the PD-1-binding face of PD-L1. This modality is well suited to demonstrating modern generative design because compact backbones can be generated, sequence-designed, and structurally assessed using accessible computational infrastructure.

Volume 2 returns to the same target using a scaffold-constrained VHH, allowing the consequences of modality choice to be compared directly.

Success criteria

Before candidates are generated, the campaign needs an explicit definition of success. These criteria should connect the intended biological function to measurable properties such as affinity, specificity, stability, expression, oligomeric state, and manufacturability.

Thresholds are not universal. They depend on the modality, intended use, assay format, development stage, and acceptable tradeoffs. Early-round criteria are often permissive because the objective is to identify usable starting points rather than finished products.

Illustrative target product profile for the PD-L1 case study

Criterion | Illustrative target | Assay
Binding | Detectable target-dependent binding | BLI or SPR
Affinity | KD < 100 nM in Round 1; < 10 nM after optimization | SPR kinetics or equilibrium BLI
Functional blocking | Inhibits the PD-1–PD-L1 interaction | Competition BLI or blocking ELISA
Specificity | Minimal binding to defined counter-screen proteins | BLI, SPR, or plate-based counter-screen
Oligomeric state | Predominantly monomeric | SEC
Thermal stability | Tm > 50 °C | DSF
Soluble yield | > 1 mg/L recovered from E. coli | A280 or BCA after purification

Scope of the PD-L1 case study
The computational campaign produces a ranked shortlist predicted to satisfy parts of this product profile. Expression, binding, blocking activity, specificity, and developability remain experimental questions. The output of the first design round is therefore a structured and testable hypothesis set, not a finished product.
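One way to make a product profile like the one above operational is to encode each criterion as an explicit check. The sketch below is a minimal illustration in Python, not part of the case-study code. The `Measurement` fields, the `evaluate` helper, and the 0.5 monomer-fraction cutoff are assumptions introduced here; the numeric thresholds are the illustrative Round 1 values from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Experimental readouts for one candidate. None means not yet measured."""
    binds_target: Optional[bool] = None          # BLI or SPR
    kd_nm: Optional[float] = None                # SPR kinetics or equilibrium BLI
    blocks_pd1: Optional[bool] = None            # competition BLI or blocking ELISA
    counter_screen_clean: Optional[bool] = None  # counter-screen result
    monomer_fraction: Optional[float] = None     # SEC, between 0 and 1
    tm_celsius: Optional[float] = None           # DSF
    yield_mg_per_l: Optional[float] = None       # A280 or BCA after purification


def below(value, threshold):
    return None if value is None else value < threshold


def above(value, threshold):
    return None if value is None else value > threshold


# Illustrative Round 1 thresholds. The 0.5 cutoff is an assumed stand-in
# for "predominantly monomeric"; the source table gives no number for it.
ROUND1_CRITERIA = {
    "binding": lambda m: m.binds_target,
    "affinity": lambda m: below(m.kd_nm, 100.0),
    "functional_blocking": lambda m: m.blocks_pd1,
    "specificity": lambda m: m.counter_screen_clean,
    "oligomeric_state": lambda m: above(m.monomer_fraction, 0.5),
    "thermal_stability": lambda m: above(m.tm_celsius, 50.0),
    "soluble_yield": lambda m: above(m.yield_mg_per_l, 1.0),
}


def evaluate(m: Measurement) -> dict:
    """Return 'pass', 'fail', or 'untested' for every criterion."""
    report = {}
    for name, check in ROUND1_CRITERIA.items():
        result = check(m)
        if result is None:
            report[name] = "untested"
        else:
            report[name] = "pass" if result else "fail"
    return report


if __name__ == "__main__":
    candidate = Measurement(binds_target=True, kd_nm=40.0, tm_celsius=47.0)
    for criterion, status in evaluate(candidate).items():
        print(f"{criterion}: {status}")
```

Keeping "untested" distinct from "fail" matters in practice: a computational shortlist starts with every experimental criterion untested, which is the sense in which it is a hypothesis set rather than a result.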

Build the screening funnel

Every campaign is shaped by one structural reality: each stage downstream is generally more expensive, slower, and lower throughput than the stage before it. This cost gradient underpins the screening funnel. The campaign begins broadly, using inexpensive and approximate evidence, then progressively narrows the candidate pool while increasing the information collected per candidate.

The campaign-level funnel

Before examining the individual computational stages, it helps to place computational design within the larger experimental screening hierarchy. Computational methods may identify promising scaffolds, poses, sequence families, mutable positions, or individual candidates. Depending on the campaign, these outputs may be tested directly or expanded into larger experimental libraries.

Computational design: scaffolds, poses, sequence families, mutations, or individual candidates
Library construction and ultra-high-throughput screening: display, droplets, or pooled selection (10⁶–10⁸ candidates)
High-throughput screening: plate-based (10³–10⁵ candidates)
Mid-throughput screening: biophysical characterization (10² candidates)
Hit identification (10¹ candidates)
Lead validation

Figure 1. A generic campaign-level screening funnel. Computational design identifies promising scaffolds, interfaces, sequence families, mutable positions, or individual candidates. These hypotheses may be tested directly or expanded combinatorially. Subsequent stages progressively reduce candidate numbers while increasing the cost and information obtained per candidate.
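The relationship between candidate count and spend per stage can be sketched with a few lines of arithmetic. In the example below, the stage names and candidate counts follow Figure 1 (one value picked from each range), while the cost per candidate is entirely hypothetical and expressed in arbitrary units. The `summarize` helper is introduced here for illustration only.

```python
# (stage name, candidates entering the stage, cost per candidate)
# Counts follow Figure 1; costs are hypothetical arbitrary units.
STAGES = [
    ("Ultra-high-throughput screening", 10**6, 1),
    ("High-throughput screening", 10**4, 100),
    ("Mid-throughput screening", 10**2, 10_000),
    ("Hit identification", 10**1, 100_000),
]


def summarize(stages):
    """Compute total spend per stage and the fraction advanced to the next one."""
    rows = []
    for i, (name, n_in, unit_cost) in enumerate(stages):
        n_next = stages[i + 1][1] if i + 1 < len(stages) else None
        rows.append({
            "stage": name,
            "candidates": n_in,
            "cost_per_candidate": unit_cost,
            "stage_spend": n_in * unit_cost,
            "fraction_advanced": None if n_next is None else n_next / n_in,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(STAGES):
        print(row)
```

With these made-up costs, every stage consumes the same total budget even though the cost per candidate rises 100,000-fold from top to bottom. Real campaigns will not balance this neatly, but the calculation shows why a small number of late-stage candidates can dominate spend.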

Candidate count versus information depth

The fundamental tension at every stage is between throughput and information depth. Near the top of the funnel, filters are fast and cheap but rely on incomplete or approximate evidence. Near the bottom, assays measure properties closer to the intended function, but each candidate requires more time, material, and money.

Filters that are too strict may eliminate unusual candidates that could have succeeded experimentally. Filters that are too permissive consume downstream capacity on avoidable failures. There is no universally correct threshold: stringency depends on budget, downstream throughput, confidence in the available measurements, and the relative cost of false negatives and false positives.
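The dependence of filter stringency on the relative cost of false negatives and false positives can be made concrete with a toy calculation. The sketch below assumes a retrospective set of candidates with a computational score and a known experimental outcome. The scores, labels, and cost values are invented for illustration, and `misclassification_cost` and `best_threshold` are hypothetical helper names.

```python
def misclassification_cost(scores, is_good, threshold, cost_fp, cost_fn):
    """Cost of a pass/fail cutoff: candidates with score >= threshold advance."""
    false_pos = sum(1 for s, g in zip(scores, is_good) if s >= threshold and not g)
    false_neg = sum(1 for s, g in zip(scores, is_good) if s < threshold and g)
    return false_pos * cost_fp + false_neg * cost_fn


def best_threshold(scores, is_good, thresholds, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        thresholds,
        key=lambda t: misclassification_cost(scores, is_good, t, cost_fp, cost_fn),
    )


# Invented retrospective data: higher score means the filter likes the design more.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
IS_GOOD = [True, True, False, True, False, False, True, False]
THRESHOLDS = [0.25, 0.45, 0.65, 0.85]

if __name__ == "__main__":
    # Missing a real hit is expensive: a permissive cutoff wins.
    print(best_threshold(SCORES, IS_GOOD, THRESHOLDS, cost_fp=1, cost_fn=5))
    # Downstream capacity is expensive: a strict cutoff wins.
    print(best_threshold(SCORES, IS_GOOD, THRESHOLDS, cost_fp=5, cost_fn=1))
```

The same scores and outcomes produce opposite recommendations once the cost ratio flips, which is the practical meaning of "no universally correct threshold."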

[Chart. Horizontal axis: funnel stage, top → bottom (Comp. design, uHTS, HTS, Mid-throughput, Hit ID, Lead val.). Vertical axis: candidate count and cost per candidate. Series: Candidate count; Cost per candidate.]

Figure 2. Candidate count and cost per candidate across a generic screening hierarchy. As the pool narrows, each surviving candidate typically receives more expensive and information-rich evaluation.

Two scales

📓 Portfolio case-study scale

The portfolio case study is designed to run on accessible Colab infrastructure. Hundreds of candidates can pass through inexpensive structural filters, while a smaller subset receives more detailed interface analysis, simulation, and independent complex assessment. This scale is sufficient to demonstrate the workflow and the reasoning behind each decision gate.

🏭 Illustrative production scale

A larger campaign may generate thousands to millions of computational or experimental variants, depending on the modality and screening platform. Inexpensive filters are applied broadly, while progressively more expensive analyses are reserved for smaller subsets. The funnel logic remains the same even when its scale and exact stages change.

What happens after computational design

Computational design methods primarily assess properties that can be inferred from sequence and structure: geometric plausibility, sequence–structure compatibility, fold recovery, interface organization, energetic proxies, and prediction confidence. These outputs do not directly establish soluble expression, aggregation resistance, thermal stability, biological function, or manufacturability.

Expression and purification

A well-designed high-throughput workflow can move from DNA design to assay-ready protein in approximately three to four weeks for a 96-well plate of constructs. Each step has its own failure modes, from codon optimization and expression-host choice through purification and concentration.

The goal is not merely to manufacture protein. It is to create a parallelized filter that reveals which designs are tractable enough to justify more expensive binding and functional assays.

Developability

Developability is the collection of biophysical and chemical properties that determine whether a protein can be manufactured, formulated, stored, and used reliably. Relevant risks include aggregation propensity, chemical liabilities, unpaired cysteines, surface hydrophobicity, unfavorable charge distributions, and concentration-dependent instability.

A protein can perform well in a binding assay and still fail as a development candidate. Campaigns should therefore identify developability problems as early as practical, rather than after extensive characterization of a molecule that cannot be produced or handled reliably.
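Some of these risks can be flagged from sequence alone before any protein is made. The sketch below is a rough first-pass scan, not a validated developability tool. The motif set (NG for deamidation, DG for isomerization, methionine for oxidation) is a commonly used heuristic chosen here for illustration rather than taken from this text, and `scan_liabilities` is a hypothetical helper name. The hydrophobic fraction is computed over the whole sequence, so it is only a crude proxy for surface hydrophobicity, which requires a structure.

```python
import re

# Heuristic sequence motifs; an assumption of this sketch, not an exhaustive list.
MOTIFS = {
    "deamidation_NG": re.compile("NG"),
    "isomerization_DG": re.compile("DG"),
    "met_oxidation": re.compile("M"),
}

# Residues counted as hydrophobic for the whole-sequence fraction (assumed set).
HYDROPHOBIC = set("AILMFVW")


def scan_liabilities(sequence):
    """Return 1-based motif positions and simple composition flags."""
    seq = sequence.upper()
    report = {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }
    n_cys = seq.count("C")
    report["cysteine_count"] = n_cys
    # An odd number of cysteines guarantees at least one cannot be disulfide-paired.
    report["odd_cysteine_count"] = n_cys % 2 == 1
    report["hydrophobic_fraction"] = (
        sum(aa in HYDROPHOBIC for aa in seq) / len(seq) if seq else 0.0
    )
    return report


if __name__ == "__main__":
    # Toy sequence for demonstration only.
    print(scan_liabilities("MKNGDGCAL"))
```

A flag from a scan like this is a prompt for closer inspection, not a rejection: whether a motif matters depends on its structural context and on the intended formulation and storage conditions.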

What does attrition actually look like?
A first-round campaign usually casts a wide net. Some constructs will not express detectably. Others will express but remain insoluble, aggregate during purification, or fail to adopt the expected oligomeric state. A further subset may be soluble and well behaved but show no measurable interaction with the target.

The exact attrition rate depends on the modality, protein family, expression system, design method, and screening strategy. The general pattern is consistent: significant losses occur before functional performance is measured. This is why even a modest number of confirmed starting hits can be valuable. A weak but tractable binder can often be optimized; a molecule that cannot be expressed, purified, or handled is much harder to rescue.
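Attrition compounds multiplicatively, which has direct consequences for how many designs to order. The sketch below assumes each pass rate is conditional on surviving the previous stage. The three rates are hypothetical placeholders, not measured values, and `expected_hits` and `designs_needed` are illustrative helper names.

```python
import math

# Hypothetical conditional pass rates; real values vary widely by campaign.
PASS_RATES = {
    "expresses_detectably": 0.5,
    "soluble_and_monomeric": 0.5,
    "measurable_binding": 0.25,
}


def expected_hits(n_designs, pass_rates):
    """Expected number of designs surviving every stage."""
    return n_designs * math.prod(pass_rates)


def designs_needed(target_hits, pass_rates):
    """Smallest number of designs whose expected yield reaches target_hits."""
    overall = math.prod(pass_rates)
    if overall <= 0:
        raise ValueError("every stage needs a pass rate above zero")
    # Round before taking the ceiling to absorb floating-point noise.
    return math.ceil(round(target_hits / overall, 9))


if __name__ == "__main__":
    rates = list(PASS_RATES.values())
    print(expected_hits(96, rates))   # expected hits from one 96-well plate
    print(designs_needed(3, rates))   # designs to order for about 3 expected hits
```

Under these assumed rates only about 6% of designs survive all three stages, so a single 96-well plate would be expected to yield a handful of confirmed binders. The estimate is an expectation, not a guarantee, and says nothing about the variance around it.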

Decision gates

A ranked computational shortlist enters the remaining stages of the Design-Build-Test-Learn cycle.

Build: Synthesize DNA, clone into expression vectors, transform the selected host, express the candidates, and purify them. Each operation is both a manufacturing step and a source of biological information.

Test: Apply decision gates in a deliberate order: expression, solubility, monomeric purification, detectable binding, dose response, functional activity, specificity, and stability. Expensive assays should not be consumed by candidates that have already failed more basic requirements.

Learn: Link experimental outcomes back to computational features in a structured data system. Which generation conditions produced tractable proteins? Which structural metrics correlated with expression or binding? Which failure modes were predictable, and which were not? These answers make the next design round better informed.
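The ordered gates described under Test can be sketched as a short-circuiting check: a candidate is evaluated gate by gate, and evaluation stops at the first failure or the first gate that has not yet been run. The gate order is taken from the text above. The `gate_status` function and the dictionary format for results are assumptions made for this illustration.

```python
# Gate order as listed in the Test step.
GATE_ORDER = [
    "expression",
    "solubility",
    "monomeric_purification",
    "detectable_binding",
    "dose_response",
    "functional_activity",
    "specificity",
    "stability",
]


def gate_status(results):
    """Walk the gates in order and stop at the first one that blocks progress.

    results maps gate name -> True/False for gates already assayed.
    Returns (status, gate): status is 'failed', 'pending', or 'passed_all'.
    """
    for gate in GATE_ORDER:
        if gate not in results:
            return ("pending", gate)   # next assay to run
        if not results[gate]:
            return ("failed", gate)    # do not spend on later gates
    return ("passed_all", None)


if __name__ == "__main__":
    print(gate_status({"expression": True}))
    print(gate_status({"expression": True, "solubility": False}))
    print(gate_status({gate: True for gate in GATE_ORDER}))
```

Because later results are ignored once an earlier gate fails, a binding signal from an insoluble preparation cannot rescue the candidate. Whether that rule is right for a given campaign is a judgment call, but encoding the order explicitly makes the choice visible.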

The feedback loop is the actual product of a modern design campaign: not the individual predictions or the individual assay results, but the structured connection between computational features and experimental outcomes that makes each round better informed than the last.
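A minimal version of that structured connection is a table with one record per candidate, joining computational features to experimental outcomes, plus a query that groups one by the other. The sketch below assumes such records already exist. The field names (`generation_condition`, `expressed`) and the data are invented, and `success_rate_by` is a hypothetical helper.

```python
from collections import defaultdict


def success_rate_by(records, feature, outcome):
    """Fraction of candidates with a truthy outcome, grouped by a feature value."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [successes, total]
    for record in records:
        bucket = counts[record[feature]]
        bucket[0] += int(bool(record[outcome]))
        bucket[1] += 1
    return {value: wins / total for value, (wins, total) in counts.items()}


# Invented records: one row per candidate, computational feature plus outcome.
RECORDS = [
    {"id": "d001", "generation_condition": "A", "expressed": True},
    {"id": "d002", "generation_condition": "A", "expressed": True},
    {"id": "d003", "generation_condition": "A", "expressed": False},
    {"id": "d004", "generation_condition": "A", "expressed": False},
    {"id": "d005", "generation_condition": "B", "expressed": True},
    {"id": "d006", "generation_condition": "B", "expressed": False},
    {"id": "d007", "generation_condition": "B", "expressed": False},
    {"id": "d008", "generation_condition": "B", "expressed": False},
]

if __name__ == "__main__":
    print(success_rate_by(RECORDS, "generation_condition", "expressed"))
```

With only a handful of candidates per group, differences like these are suggestive at best. The value of the record structure is that the same query can be rerun as each round adds data.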