Part 4: Energy Minimization & MD Simulation

Do the interfaces survive when we let the atoms move?

Previously, in Part 3: We fingerprinted residue-level contacts between all 31 passing designs and PD-L1. The designs converge on a shared surface patch centered on residues 119–122, but hotspot recovery is partial, interfaces are small (mean BSA ~648 Å²), and no sidechain–sidechain nearest contacts appeared in the unrelaxed models. This page asks the next question: when we energy-minimize these complexes, solvate them, and run molecular dynamics, do the interfaces hold up or fall apart?

Page 1 — Pipeline overview Page 2 — Setup & parameter sensitivity Page 3 — Structure & interface analysis Page 4 — Energy minimization & MD

What we found

All three binders retain their fold over 1 ns of MD: backbone RMSD stays below 2 Å for two of three systems, and the radius of gyration is stable across all three.
Interface contacts are maintained or increase during simulation. The Distributed design sustains ~40 residue contacts with 11 at 100% occupancy. Cluster B starts with 33 and finishes with 39.
Persistent contact pairs include candidate salt bridges, polar contacts, and aromatic contacts — interaction types that were topological labels on unrelaxed structures in Part 3 but now show sustained proximity under thermal motion.
Hotspot coverage under MD is stronger than static analysis suggested: Cluster B contacts 4 of 5 specified hotspot residues with ≥50% occupancy; Distributed contacts 5 of 8.
The full pipeline — from sequence threading through 1 ns production MD and analysis — runs in about 1–1.5 hours per system on a free Colab T4 GPU.

Why run MD on these complexes?

In Part 3, every interface analysis came with a caveat: the structures were not energy-minimized, sidechains were not packed in the context of PD-L1, and interaction labels were topological approximations rather than confirmed chemistry. Molecular dynamics addresses those caveats directly. Energy minimization resolves steric clashes. Equilibration relaxes sidechains in their binding-site context. Production MD then asks whether the interface survives thermal motion at 300 K — whether contacts persist, whether the binder stays engaged, and whether the fold itself remains intact.

What is molecular dynamics?

Molecular dynamics (MD) is a computational method that simulates the physical movements of atoms over time. Each atom in the system (protein, water, ions) is assigned a starting position and velocity. At each timestep, the forces on every atom are computed from a potential energy function called a force field — which encodes bonded terms (bond stretches, angle bends, torsions) and nonbonded terms (van der Waals interactions, electrostatics). Newton’s equations of motion are then integrated numerically to update positions and velocities, advancing the simulation by a tiny time increment (here, 1 femtosecond = 10⁻¹⁵ seconds).

The result is a trajectory: a time series of atomic coordinates that captures how the system evolves under thermal motion. This lets us ask dynamic questions that a single static structure cannot answer. Does the binder stay folded? Do interface contacts persist, or do they break and reform? Does the complex drift apart? MD does not predict binding affinity, but it can identify designs that are immediately unstable — and distinguish persistent contacts from transient ones.

The force field we use here, AMBER14 (ff14SB), is parameterized for proteins and nucleic acids. Water is modeled explicitly as individual TIP3P molecules rather than as a continuum, which captures solvation effects at the cost of more computation. The system is maintained at constant temperature (300 K) and pressure (1 bar) using a Langevin thermostat and Monte Carlo barostat, mimicking experimental conditions.

This is not a binding-affinity measurement. One nanosecond of MD cannot determine whether a design will bind PD-L1 experimentally. What it can do is eliminate designs that fall apart immediately: binders that unfold, interfaces that drift, contacts that vanish. Surviving MD is a necessary but not sufficient condition for binding.

Selecting systems & a note on individual vs. average behavior

We picked three designs, one from each hotspot configuration (Cluster A, Distributed, Cluster B), to run through the full MD pipeline. The goal was to test the simulation workflow on each design strategy, not to find the single best design.

Individual designs vs. cluster averages

In Part 3, we reported average behavior across all designs within a hotspot configuration: Cluster B designs as a group showed fewer contacts and lower hotspot coverage than Distributed. The individual Cluster B design simulated here may differ from that group average — and in fact performs quite well, with 32 mean contacts and 4/5 hotspot residues engaged. This is expected: the Part 3 averages describe the central tendency across ~10 designs per condition, while MD tests one specific design in detail. Readers should not be surprised if the behavior of an individual design departs from the average of its cluster.

The MD protocol

What is energy minimization?

Before running dynamics, we minimize the potential energy of the system. Predicted structures contain steric clashes (atoms too close together) and unresolved sidechain conformations that would cause the simulation to crash or produce artifacts. Energy minimization iteratively adjusts atom positions to find a nearby local energy minimum — resolving these clashes without fundamentally changing the protein’s conformation. Think of it as letting the structure “relax” into a physically plausible geometry before introducing thermal motion.

What does equilibration do?

After minimization, the system needs to equilibrate: atoms are assigned random velocities consistent with the target temperature (300 K), and the simulation is run with position restraints on the protein backbone. This allows the solvent (water and ions) to reorganize around the protein, the box volume to adjust to the target pressure, and sidechain conformations to relax — all without allowing the protein backbone to drift from its starting geometry. We run two stages: NVT (constant volume, 100 ps, strong restraints) to stabilize temperature, then NPT (constant pressure, 200 ps, lighter restraints) to equilibrate density. Only after equilibration do we release all restraints for production MD.

What “sequence threading” means here

The input to MD is not the raw RFdiffusion backbone (which has poly-glycine binder residues). Instead, the ProteinMPNN-designed sequence was threaded onto the RFdiffusion docked geometry: binder backbone atoms are kept in place, residue names are replaced with the designed sequence, and ESMFold-predicted sidechains are superimposed by Cα alignment. This gives the forcefield real residue identities and plausible sidechain conformations to work with during minimization.

Sequence-threaded complex → PDBFixer (missing atoms, pH 7) → Solvate (TIP3P, 0.15 M NaCl) → Energy minimize → NVT equilibration (100 ps) → NPT equilibration (200 ps) → Production MD (1 ns)

Parameter	Value
Force field	AMBER14 (ff14SB) + TIP3P explicit water
Temperature	300 K (Langevin thermostat)
Pressure	1 bar (Monte Carlo barostat)
Timestep	1 fs (conservative; HBonds constraints allow 2 fs)
Ionic strength	0.15 M NaCl
Solvent padding	1.0 nm
Production length	1 ns (100 frames saved at 10 ps intervals)
Platform	Google Colab T4 GPU (OpenCL)

Compute cost per system

Each solvated system contains ~185 protein residues and roughly 35,000–40,000 total atoms including water and ions. On a Google Colab T4 GPU, the full pipeline takes approximately 1–1.5 hours per design:

Stage	Duration	Steps
Solvation & preparation	~2 min	—
Energy minimization	~3–5 min	~1,000
NVT equilibration (100 ps)	~5–10 min	100,000
NPT equilibration (200 ps)	~10–15 min	200,000
Production MD (1 ns)	~30–60 min	1,000,000
Analysis & trajectory extraction	~5 min	—
Total per system	~1–1.5 hr	1,300,000

Three systems ran sequentially in a single Colab session (roughly 3–5 hours total). The notebook is restart-safe: if Colab disconnects mid-run, completed stages are detected on disk and skipped on restart, so only the interrupted stage needs to re-run.

Compute context across the pipeline

For comparison: RFdiffusion generates ~50 backbones per condition in minutes, ProteinMPNN sequences each backbone in seconds, and ESMFold predicts a structure in ~10–30 seconds per design. MD is by far the most expensive step: one nanosecond for one design costs more GPU time than the entire upstream pipeline for a full condition. This is why MD is used selectively on prioritized candidates rather than applied to all 31 passing designs.

The three test systems

Design	Hotspot config	Condition	pLDDT	Pre-MD RMSD (Å)
Cluster A `design_19_0_rank0`	18, 20, 120, 122	len70, noise 0.5	94.56	0.24
Distributed `design_17_0_rank1`	18, 20, 26, 56, 113, 120, 122, 125	len70, noise 0	94.33	0.33
Cluster B `design_19_0_rank1`	26, 56, 113, 123, 125	len70, noise 0	93.97	0.44

All three are 70-residue helical binders with high ESMFold confidence (pLDDT > 93) and low pre-MD alignment RMSD (< 0.5 Å), confirming good fold recapitulation before simulation begins.

Simulation trajectory

The viewer below shows the Distributed design’s production trajectory — 100 frames spanning 1 ns of simulation at 300 K. PD-L1 (gray) is held in a consistent orientation; the binder (green) moves freely. Use the playback controls to step through or animate the trajectory.

Loading trajectory…

Frame 0 / 99 — 0.00 ns

Figure 1. Production MD trajectory of the Distributed design (green) docked to PD-L1 (gray), 100 frames over 1 ns. The binder remains engaged with the target surface throughout. The multi-model PDB was generated by extracting protein-only coordinates from the solvated trajectory and aligning each frame on PD-L1 Cα atoms (target-locked view).

Do the binders hold their fold?

The first question is structural: does each binder maintain its predicted fold under thermal motion, or does it unfold? Backbone RMSD (aligned to the binder’s own starting structure) tracks fold deviation over time, while the radius of gyration (R_g) monitors compactness.

What are RMSD and R_g?

RMSD (root mean square deviation) measures how far a structure has moved from a reference. For binder fold stability, we align each MD frame’s binder Cα atoms onto the starting frame’s binder Cα atoms, then compute the average displacement. An RMSD of 1.5 Å means the backbone has shifted by about 1.5 Å on average from where it started — typical of a protein maintaining its fold with some thermal breathing. An RMSD climbing above 4–5 Å would suggest unfolding.

R_g (radius of gyration) is the mass-weighted average distance of atoms from the molecule’s center of mass. It measures how compact the protein is. A stable R_g means the binder is neither expanding (unfolding) nor collapsing. For a 70-residue helical bundle, we expect R_g around 16–17 Å.

Figure 2. Binder backbone RMSD and radius of gyration over 1 ns of production MD. All three binders remain folded. The Distributed design shows the largest RMSD excursion (~2.8 Å final) but its R_g is stable, suggesting local rearrangement rather than unfolding.

All three binders survive MD intact. Cluster A is the most rigid (mean RMSD 1.40 Å), Cluster B is intermediate (1.49 Å), and Distributed shows more motion (1.93 Å) but no sign of unfolding. R_g standard deviations are ≤0.18 Å across all three — these are compact helical bundles that remain compact.

Does the interface persist?

Binder fold stability is necessary but not sufficient. The binder also needs to stay engaged with PD-L1. Residue contact count over time (using a 4.5 Å heavy-atom cutoff) tracks whether the interface is maintained, growing, or eroding.

How are contacts counted?

For each frame of the trajectory, we identify all binder–PD-L1 residue pairs where any heavy atom (non-hydrogen) on the binder residue is within 4.5 Å of any heavy atom on the PD-L1 residue. This is computed using MDTraj’s compute_contacts with the “closest-heavy” scheme, which reports the minimum heavy-atom distance for each residue pair. A pair is “in contact” if that distance is below the cutoff. The count of such pairs per frame gives the residue contact number; summing all sub-cutoff atom pairs gives the atom contact number.

Figure 3. Interface metrics over 1 ns. Residue and atom contact counts track interface engagement; buried surface area (BSA) measures the total solvent-excluded area at the interface; minimum interface distance reports the single closest heavy-atom pair across chains (useful as a dissociation detector but dominated by one atom pair). The interfaces do not dissociate over this trajectory.

The interfaces remain engaged over 1 ns. Distributed maintains ~40 residue contacts throughout (started 38, ended 38) and has the largest BSA (mean 925 Å²). Cluster B gains contacts during simulation (33 → 39) with a BSA of 825 Å². Cluster A has the fewest contacts (mean ~20) and smallest BSA (470 Å²), consistent with its compact helical interface. At the atom level, the Distributed design shows a decline in atom contacts (255 → 175 over the trajectory) even as residue-level contacts hold steady — this likely reflects interface rearrangement rather than dissociation. Proteins sample conformational substates across multiple timescales even while remaining bound, and one should not expect contact counts to grow unidirectionally; fluctuations and rearrangements are normal features of a dynamic interface.

Which contacts persist?

Not all contacts are equal. A residue pair in contact for 100% of the trajectory is more informative than one that flickers on and off. Contact persistence analysis assigns each binder–PD-L1 residue pair an occupancy (fraction of frames in contact) and a maximum residence time (longest continuous stretch). This is where the “candidate” interaction labels from Part 3 start to gain credibility: a salt bridge candidate that maintains sub-3.5 Å distance for the entire nanosecond is much more plausible than one that appeared once in a static snapshot.

What are “candidate” interaction types?

In Part 3, we classified contacts by the chemical identity of the residue pair: ARG–ASP is labeled “candidate salt bridge” because one carries a positive charge and the other negative. GLN–ALA near a backbone is “candidate polar contact.” PHE–TYR is “candidate aromatic contact.” These labels reflect residue-class compatibility, not confirmed bond geometry. MD occupancy adds a second line of evidence: if a candidate salt bridge pair stays within bonding distance for 100% of a 1 ns trajectory, the interaction is substantially more plausible — though formally confirming it still requires checking donor–acceptor distances (<3.2 Å), angles (>120°), and charge states.

Figure 4. Top 20 most persistent binder–PD-L1 contact pairs by occupancy. Sustained proximity under MD increases confidence in the candidate interaction labels assigned in Part 3, but does not confirm bond formation without geometric validation.

Charged and polar contacts dominate the persistent interface. The Distributed design has 11 pairs at 100% occupancy, including GLU24–ARG113 and GLU24–ARG125 (candidate salt bridges at 2.73 and 2.77 Å mean distance). Cluster B maintains PHE19–TYR56 (candidate aromatic contact, 100% occupancy) and ASP50–ARG113 (candidate salt bridge, 100%). These sustained sub-3.5 Å contacts under thermal motion are substantially more credible than the static-snapshot labels from Part 3, though geometric validation (bond angles, donor–acceptor distances) is still needed to confirm specific interaction types.

Hotspot contacts under MD

In Part 3, hotspot recovery was partial: the average Cluster B design showed lower coverage than Distributed. When we examine the specific designs simulated here and look at which hotspot residues are contacted with ≥50% occupancy during MD, the picture is different — and more encouraging.

Figure 5. Hotspot residues contacted with ≥50% occupancy during 1 ns MD. Cluster B contacts 4 of 5 specified hotspots (Tyr56, Arg113, Tyr123, Arg125); Distributed contacts 5 of 8 (Tyr56, Arg113, Gly120, Asp122, Arg125). Cluster A contacts only Gly120 among its specified hotspots.

MD recovers hotspot contacts that static analysis missed. The improvement is most dramatic for the Cluster B design tested here. This is consistent with sidechains relaxing into more favorable geometries during simulation — the effect we predicted when noting that pre-relaxation fingerprinting was insufficient for helical binders. It also illustrates why individual designs can outperform their cluster’s average behavior.

Simulation health

Before interpreting MD results, we need to confirm the simulation itself ran correctly. A simulation that crashes, produces unphysical temperatures, or fails to equilibrate density will generate meaningless trajectories. This is the computational equivalent of checking that your instrument is calibrated before reading the data.

What does “simulation health” mean?

OpenMM’s StateDataReporter logs system properties at every reporting interval: temperature (should fluctuate around 300 K), potential energy (should stabilize after minimization, not drift), and density (should converge to ~1.0 g/cm³ for a properly solvated aqueous system during NPT equilibration). Anomalies in any of these indicate force field errors, insufficient equilibration, or numerical instability. We also check that no atoms have “exploded” — coordinates diverging to infinity, which would indicate a timestep that is too large or unresolved clashes.

Figure 6. Simulation health metrics during production MD. Temperature should fluctuate around the thermostat setpoint (300 K). Potential energy should be stable (no drift). Density should remain near 1.0 g/cm³. Data from OpenMM StateDataReporter production logs.

Summary scorecard

Metric	Cluster A	Distributed	Cluster B	What it means
Binder RMSD, mean (Å)	1.40	1.93	1.49	Fold stability
Binder RMSD, final (Å)	1.44	2.83	1.45	Final fold deviation
R_g std (Å)	0.18	0.11	0.12	Compactness stability
Contacts, mean	19.9	39.8	32.2	Interface size (contacts)
BSA, mean (Å²)	470	925	825	Interface size (buried area)
Contacts at 100% occupancy	7	11	7	Core persistent contacts
Contacts at ≥80% occupancy	14	33	22	Stable interface shell
Hotspots contacted (≥50%)	1 / 4	5 / 8	4 / 5	Intended epitope retention
RMSF mean (Å)	0.86	1.12	0.95	Per-residue flexibility

Bottom line

Short MD simulations moved these three designs from static structural predictions to dynamically relaxed complexes. Over 1 ns, all three binders remain compact and folded, and none of the interfaces dissociates after minimization, solvation, equilibration, and unrestrained production dynamics. Persistent contact analysis identifies several residue pairs that remain proximal throughout the trajectory, including charged, polar, hydrophobic, and aromatic contact candidates. Hotspot recovery under MD is substantially better than static fingerprinting predicted, especially for the Cluster B design tested here.

This is encouraging but deliberately limited evidence. A 1 ns trajectory cannot establish binding affinity, specificity, off-rate, or long-timescale interface stability. Contact persistence does not confirm hydrogen bonds, salt bridges, or aromatic stacking without explicit geometric validation. The value of this notebook is that it adds a stress-test layer to the computational design workflow: it demonstrates fold stability assessment, interface persistence tracking, trajectory handling, restart-safe Colab MD, and contact analysis — tools that would be applied at larger scale in a real design campaign.

Caveats

1 ns is very short. Many biologically relevant conformational changes occur on microsecond-to-millisecond timescales. A 1 ns simulation catches immediate instability (unfolding, interface dissociation, steric clashes), but it cannot sample slow binding/unbinding events, large domain motions, or solvent reorganization. A stable 1 ns trajectory is necessary but not sufficient evidence for a good interface.

Three systems are too few for general conclusions. We tested one representative per hotspot configuration. This demonstrates the MD workflow and asks whether individual complexes are physically plausible. It is not a statistical comparison of design strategies.

Force-field accuracy is approximate. AMBER14/TIP3P is a well-validated combination for protein simulation, but forcefield artifacts are real. We report stability metrics (RMSD, contact persistence), not binding energies, because the latter carry systematic errors that can exceed the differences between designs.

Contact occupancy is not bond confirmation. A “candidate salt bridge” at 100% occupancy means the charged residues stay within 4.5 Å throughout the trajectory. Confirming the salt bridge requires checking charge signs, donor–acceptor distances (<3.2 Å), and geometry. Similarly, H-bond candidates need angle validation. The MD persistence data raises confidence but does not replace geometric bond analysis.

Additional interface metrics would strengthen the analysis. Interface RMSD (binder displacement relative to PD-L1 after target alignment), binder–target center-of-mass distance, BSA over time, and fraction of original contacts retained would provide a more complete picture of interface behavior. These analyses are implemented in the MD extensions notebook (08b) and can be applied to future design batches.

Where this sits in the workflow

Generate backbones → Design sequences → Structure filter → Interface fingerprinting → Minimize & MD → Affinity scoring → Experimental validation

Appendix: definitions & methodology

Binder RMSD: backbone Cα RMSD of the binder chain aligned to the binder’s own starting-frame coordinates. Measures fold stability, not interface drift.

Radius of gyration (R_g): mass-weighted R_g of the binder chain. Monitors compactness; an unfolding event would produce a rising R_g.

Residue contact count: number of binder–PD-L1 residue pairs with any heavy atom within 4.5 Å, using MDTraj compute_contacts with scheme="closest-heavy" and periodic boundary conditions.

Contact occupancy: fraction of production frames in which a given residue pair is in contact (<4.5 Å). Occupancy of 1.0 means the contact is present in every saved frame.

Maximum residence time: longest continuous stretch (in ps) during which a contact remains unbroken.

Interaction type labels: residue-class compatibility labels inherited from Part 3. MD occupancy increases plausibility but does not confirm bond formation without geometric validation.

References

Eastman, P. et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Comput. Biol. 13, e1005659 (2017).
Maier, J.A. et al. ff14SB: Improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
McGibbon, R.T. et al. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109, 1528–1532 (2015).
PDB 4ZQK: native PD-1/PD-L1 interface reference.