Do the interfaces survive when we let the atoms move?
In Part 3, every interface analysis came with a caveat: the structures were not energy-minimized, sidechains were not packed in the context of PD-L1, and interaction labels were topological approximations rather than confirmed chemistry. Molecular dynamics addresses those caveats directly. Energy minimization resolves steric clashes. Equilibration relaxes sidechains in their binding-site context. Production MD then asks whether the interface survives thermal motion at 300 K — whether contacts persist, whether the binder stays engaged, and whether the fold itself remains intact.
Molecular dynamics (MD) is a computational method that simulates the physical movements of atoms over time. Each atom in the system (protein, water, ions) is assigned a starting position and velocity. At each timestep, the forces on every atom are computed from a potential energy function called a force field — which encodes bonded terms (bond stretches, angle bends, torsions) and nonbonded terms (van der Waals interactions, electrostatics). Newton’s equations of motion are then integrated numerically to update positions and velocities, advancing the simulation by a tiny time increment (here, 1 femtosecond = 10−15 seconds).
The result is a trajectory: a time series of atomic coordinates that captures how the system evolves under thermal motion. This lets us ask dynamic questions that a single static structure cannot answer. Does the binder stay folded? Do interface contacts persist, or do they break and reform? Does the complex drift apart? MD does not predict binding affinity, but it can identify designs that are immediately unstable — and distinguish persistent contacts from transient ones.
The force field we use here, AMBER14 (ff14SB), is parameterized for proteins and nucleic acids. Water is modeled explicitly as individual TIP3P molecules rather than as a continuum, which captures solvation effects at the cost of more computation. The system is maintained at constant temperature (300 K) and pressure (1 bar) using a Langevin thermostat and Monte Carlo barostat, mimicking experimental conditions.
This is not a binding-affinity measurement. One nanosecond of MD cannot determine whether a design will bind PD-L1 experimentally. What it can do is eliminate designs that fall apart immediately: binders that unfold, interfaces that drift, contacts that vanish. Surviving MD is a necessary but not sufficient condition for binding.
We picked three designs, one from each hotspot configuration (Cluster A, Distributed, Cluster B), to run through the full MD pipeline. The goal was to test the simulation workflow on each design strategy, not to find the single best design.
In Part 3, we reported average behavior across all designs within a hotspot configuration: Cluster B designs as a group showed fewer contacts and lower hotspot coverage than Distributed. The individual Cluster B design simulated here may differ from that group average — and in fact performs quite well, with 32 mean contacts and 4/5 hotspot residues engaged. This is expected: the Part 3 averages describe the central tendency across ~10 designs per condition, while MD tests one specific design in detail. Readers should not be surprised if the behavior of an individual design departs from the average of its cluster.
Before running dynamics, we minimize the potential energy of the system. Predicted structures contain steric clashes (atoms too close together) and unresolved sidechain conformations that would cause the simulation to crash or produce artifacts. Energy minimization iteratively adjusts atom positions to find a nearby local energy minimum — resolving these clashes without fundamentally changing the protein’s conformation. Think of it as letting the structure “relax” into a physically plausible geometry before introducing thermal motion.
After minimization, the system needs to equilibrate: atoms are assigned random velocities consistent with the target temperature (300 K), and the simulation is run with position restraints on the protein backbone. This allows the solvent (water and ions) to reorganize around the protein, the box volume to adjust to the target pressure, and sidechain conformations to relax — all without allowing the protein backbone to drift from its starting geometry. We run two stages: NVT (constant volume, 100 ps, strong restraints) to stabilize temperature, then NPT (constant pressure, 200 ps, lighter restraints) to equilibrate density. Only after equilibration do we release all restraints for production MD.
The input to MD is not the raw RFdiffusion backbone (which has poly-glycine binder residues). Instead, the ProteinMPNN-designed sequence was threaded onto the RFdiffusion docked geometry: binder backbone atoms are kept in place, residue names are replaced with the designed sequence, and ESMFold-predicted sidechains are superimposed by Cα alignment. This gives the forcefield real residue identities and plausible sidechain conformations to work with during minimization.
| Parameter | Value |
|---|---|
| Force field | AMBER14 (ff14SB) + TIP3P explicit water |
| Temperature | 300 K (Langevin thermostat) |
| Pressure | 1 bar (Monte Carlo barostat) |
| Timestep | 1 fs (conservative; HBonds constraints allow 2 fs) |
| Ionic strength | 0.15 M NaCl |
| Solvent padding | 1.0 nm |
| Production length | 1 ns (100 frames saved at 10 ps intervals) |
| Platform | Google Colab T4 GPU (OpenCL) |
Each solvated system contains ~185 protein residues and roughly 35,000–40,000 total atoms including water and ions. On a Google Colab T4 GPU, the full pipeline takes approximately 1–1.5 hours per design:
| Stage | Duration | Steps |
|---|---|---|
| Solvation & preparation | ~2 min | — |
| Energy minimization | ~3–5 min | ~1,000 |
| NVT equilibration (100 ps) | ~5–10 min | 100,000 |
| NPT equilibration (200 ps) | ~10–15 min | 200,000 |
| Production MD (1 ns) | ~30–60 min | 1,000,000 |
| Analysis & trajectory extraction | ~5 min | — |
| Total per system | ~1–1.5 hr | 1,300,000 |
Three systems ran sequentially in a single Colab session (roughly 3–5 hours total). The notebook is restart-safe: if Colab disconnects mid-run, completed stages are detected on disk and skipped on restart, so only the interrupted stage needs to re-run.
For comparison: RFdiffusion generates ~50 backbones per condition in minutes, ProteinMPNN sequences each backbone in seconds, and ESMFold predicts a structure in ~10–30 seconds per design. MD is by far the most expensive step: one nanosecond for one design costs more GPU time than the entire upstream pipeline for a full condition. This is why MD is used selectively on prioritized candidates rather than applied to all 31 passing designs.
| Design | Hotspot config | Condition | pLDDT | Pre-MD RMSD (Å) |
|---|---|---|---|---|
Cluster Adesign_19_0_rank0 |
18, 20, 120, 122 | len70, noise 0.5 | 94.56 | 0.24 |
Distributeddesign_17_0_rank1 |
18, 20, 26, 56, 113, 120, 122, 125 | len70, noise 0 | 94.33 | 0.33 |
Cluster Bdesign_19_0_rank1 |
26, 56, 113, 123, 125 | len70, noise 0 | 93.97 | 0.44 |
All three are 70-residue helical binders with high ESMFold confidence (pLDDT > 93) and low pre-MD alignment RMSD (< 0.5 Å), confirming good fold recapitulation before simulation begins.
The viewer below shows the Distributed design’s production trajectory — 100 frames spanning 1 ns of simulation at 300 K. PD-L1 (gray) is held in a consistent orientation; the binder (green) moves freely. Use the playback controls to step through or animate the trajectory.
Figure 1. Production MD trajectory of the Distributed design (green) docked to PD-L1 (gray), 100 frames over 1 ns. The binder remains engaged with the target surface throughout. The multi-model PDB was generated by extracting protein-only coordinates from the solvated trajectory and aligning each frame on PD-L1 Cα atoms (target-locked view).
The first question is structural: does each binder maintain its predicted fold under thermal motion, or does it unfold? Backbone RMSD (aligned to the binder’s own starting structure) tracks fold deviation over time, while the radius of gyration (Rg) monitors compactness.
RMSD (root mean square deviation) measures how far a structure has moved from a reference. For binder fold stability, we align each MD frame’s binder Cα atoms onto the starting frame’s binder Cα atoms, then compute the average displacement. An RMSD of 1.5 Å means the backbone has shifted by about 1.5 Å on average from where it started — typical of a protein maintaining its fold with some thermal breathing. An RMSD climbing above 4–5 Å would suggest unfolding.
Rg (radius of gyration) is the mass-weighted average distance of atoms from the molecule’s center of mass. It measures how compact the protein is. A stable Rg means the binder is neither expanding (unfolding) nor collapsing. For a 70-residue helical bundle, we expect Rg around 16–17 Å.
Figure 2. Binder backbone RMSD and radius of gyration over 1 ns of production MD. All three binders remain folded. The Distributed design shows the largest RMSD excursion (~2.8 Å final) but its Rg is stable, suggesting local rearrangement rather than unfolding.
Binder fold stability is necessary but not sufficient. The binder also needs to stay engaged with PD-L1. Residue contact count over time (using a 4.5 Å heavy-atom cutoff) tracks whether the interface is maintained, growing, or eroding.
For each frame of the trajectory, we identify all binder–PD-L1 residue pairs where any heavy atom (non-hydrogen) on the binder residue is within 4.5 Å of any heavy atom on the PD-L1 residue. This is computed using MDTraj’s compute_contacts with the “closest-heavy” scheme, which reports the minimum heavy-atom distance for each residue pair. A pair is “in contact” if that distance is below the cutoff. The count of such pairs per frame gives the residue contact number; summing all sub-cutoff atom pairs gives the atom contact number.
Figure 3. Interface metrics over 1 ns. Residue and atom contact counts track interface engagement; buried surface area (BSA) measures the total solvent-excluded area at the interface; minimum interface distance reports the single closest heavy-atom pair across chains (useful as a dissociation detector but dominated by one atom pair). The interfaces do not dissociate over this trajectory.
Not all contacts are equal. A residue pair in contact for 100% of the trajectory is more informative than one that flickers on and off. Contact persistence analysis assigns each binder–PD-L1 residue pair an occupancy (fraction of frames in contact) and a maximum residence time (longest continuous stretch). This is where the “candidate” interaction labels from Part 3 start to gain credibility: a salt bridge candidate that maintains sub-3.5 Å distance for the entire nanosecond is much more plausible than one that appeared once in a static snapshot.
In Part 3, we classified contacts by the chemical identity of the residue pair: ARG–ASP is labeled “candidate salt bridge” because one carries a positive charge and the other negative. GLN–ALA near a backbone is “candidate polar contact.” PHE–TYR is “candidate aromatic contact.” These labels reflect residue-class compatibility, not confirmed bond geometry. MD occupancy adds a second line of evidence: if a candidate salt bridge pair stays within bonding distance for 100% of a 1 ns trajectory, the interaction is substantially more plausible — though formally confirming it still requires checking donor–acceptor distances (<3.2 Å), angles (>120°), and charge states.
Figure 4. Top 20 most persistent binder–PD-L1 contact pairs by occupancy. Sustained proximity under MD increases confidence in the candidate interaction labels assigned in Part 3, but does not confirm bond formation without geometric validation.
In Part 3, hotspot recovery was partial: the average Cluster B design showed lower coverage than Distributed. When we examine the specific designs simulated here and look at which hotspot residues are contacted with ≥50% occupancy during MD, the picture is different — and more encouraging.
Figure 5. Hotspot residues contacted with ≥50% occupancy during 1 ns MD. Cluster B contacts 4 of 5 specified hotspots (Tyr56, Arg113, Tyr123, Arg125); Distributed contacts 5 of 8 (Tyr56, Arg113, Gly120, Asp122, Arg125). Cluster A contacts only Gly120 among its specified hotspots.
Before interpreting MD results, we need to confirm the simulation itself ran correctly. A simulation that crashes, produces unphysical temperatures, or fails to equilibrate density will generate meaningless trajectories. This is the computational equivalent of checking that your instrument is calibrated before reading the data.
OpenMM’s StateDataReporter logs system properties at every reporting interval: temperature (should fluctuate around 300 K), potential energy (should stabilize after minimization, not drift), and density (should converge to ~1.0 g/cm³ for a properly solvated aqueous system during NPT equilibration). Anomalies in any of these indicate force field errors, insufficient equilibration, or numerical instability. We also check that no atoms have “exploded” — coordinates diverging to infinity, which would indicate a timestep that is too large or unresolved clashes.
Figure 6. Simulation health metrics during production MD. Temperature should fluctuate around the thermostat setpoint (300 K). Potential energy should be stable (no drift). Density should remain near 1.0 g/cm³. Data from OpenMM StateDataReporter production logs.
| Metric | Cluster A | Distributed | Cluster B | What it means |
|---|---|---|---|---|
| Binder RMSD, mean (Å) | 1.40 | 1.93 | 1.49 | Fold stability |
| Binder RMSD, final (Å) | 1.44 | 2.83 | 1.45 | Final fold deviation |
| Rg std (Å) | 0.18 | 0.11 | 0.12 | Compactness stability |
| Contacts, mean | 19.9 | 39.8 | 32.2 | Interface size (contacts) |
| BSA, mean (Ų) | 470 | 925 | 825 | Interface size (buried area) |
| Contacts at 100% occupancy | 7 | 11 | 7 | Core persistent contacts |
| Contacts at ≥80% occupancy | 14 | 33 | 22 | Stable interface shell |
| Hotspots contacted (≥50%) | 1 / 4 | 5 / 8 | 4 / 5 | Intended epitope retention |
| RMSF mean (Å) | 0.86 | 1.12 | 0.95 | Per-residue flexibility |
Short MD simulations moved these three designs from static structural predictions to dynamically relaxed complexes. Over 1 ns, all three binders remain compact and folded, and none of the interfaces dissociates after minimization, solvation, equilibration, and unrestrained production dynamics. Persistent contact analysis identifies several residue pairs that remain proximal throughout the trajectory, including charged, polar, hydrophobic, and aromatic contact candidates. Hotspot recovery under MD is substantially better than static fingerprinting predicted, especially for the Cluster B design tested here.
This is encouraging but deliberately limited evidence. A 1 ns trajectory cannot establish binding affinity, specificity, off-rate, or long-timescale interface stability. Contact persistence does not confirm hydrogen bonds, salt bridges, or aromatic stacking without explicit geometric validation. The value of this notebook is that it adds a stress-test layer to the computational design workflow: it demonstrates fold stability assessment, interface persistence tracking, trajectory handling, restart-safe Colab MD, and contact analysis — tools that would be applied at larger scale in a real design campaign.
1 ns is very short. Many biologically relevant conformational changes occur on microsecond-to-millisecond timescales. A 1 ns simulation catches immediate instability (unfolding, interface dissociation, steric clashes), but it cannot sample slow binding/unbinding events, large domain motions, or solvent reorganization. A stable 1 ns trajectory is necessary but not sufficient evidence for a good interface.
Three systems are too few for general conclusions. We tested one representative per hotspot configuration. This demonstrates the MD workflow and asks whether individual complexes are physically plausible. It is not a statistical comparison of design strategies.
Force-field accuracy is approximate. AMBER14/TIP3P is a well-validated combination for protein simulation, but forcefield artifacts are real. We report stability metrics (RMSD, contact persistence), not binding energies, because the latter carry systematic errors that can exceed the differences between designs.
Contact occupancy is not bond confirmation. A “candidate salt bridge” at 100% occupancy means the charged residues stay within 4.5 Å throughout the trajectory. Confirming the salt bridge requires checking charge signs, donor–acceptor distances (<3.2 Å), and geometry. Similarly, H-bond candidates need angle validation. The MD persistence data raises confidence but does not replace geometric bond analysis.
Additional interface metrics would strengthen the analysis. Interface RMSD (binder displacement relative to PD-L1 after target alignment), binder–target center-of-mass distance, BSA over time, and fraction of original contacts retained would provide a more complete picture of interface behavior. These analyses are implemented in the MD extensions notebook (08b) and can be applied to future design batches.
Binder RMSD: backbone Cα RMSD of the binder chain aligned to the binder’s own starting-frame coordinates. Measures fold stability, not interface drift.
Radius of gyration (Rg): mass-weighted Rg of the binder chain. Monitors compactness; an unfolding event would produce a rising Rg.
Residue contact count: number of binder–PD-L1 residue pairs with any heavy atom within 4.5 Å, using MDTraj compute_contacts with scheme="closest-heavy" and periodic boundary conditions.
Contact occupancy: fraction of production frames in which a given residue pair is in contact (<4.5 Å). Occupancy of 1.0 means the contact is present in every saved frame.
Maximum residence time: longest continuous stretch (in ps) during which a contact remains unbroken.
Interaction type labels: residue-class compatibility labels inherited from Part 3. MD occupancy increases plausibility but does not confirm bond formation without geometric validation.