The PLAS 5K dataset comprises 5,000 protein-ligand complexes selected from the Protein Data Bank (PDB). For each complex, it provides binding affinity data along with the individual energy components (electrostatic, van der Waals, polar solvation, and non-polar solvation energies), calculated from molecular dynamics (MD) simulations using the MM-PBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The dataset is available in two variations.
This work has been published in Scientific Data: https://doi.org/10.1038/s41597-022-01631-9
Link to datasets
An additional 15,000 protein-ligand complexes were added to the PLAS 5K dataset to create the PLAS 20K dataset.
This work was also published in Scientific Data: https://doi.org/10.1038/s41597-023-02872-y
Link to datasets
In a collaborative effort between IIIT Hyderabad, Intel, AWS, and Insilico Medicine, we have conducted physics-based calculations (molecular dynamics simulations) on approximately 20,000 protein-ligand complexes. The resulting dataset includes molecular dynamics snapshots, binding affinities calculated using the MM-PBSA method, and individual energy components such as electrostatic and van der Waals interactions.
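As a rough illustration of how the individual energy components relate to an MM-PBSA binding affinity, the sketch below sums the four terms named above. It assumes the common simplification in which the entropy contribution is omitted and all values are in kcal/mol. The class name, field names, and numbers are hypothetical and are not taken from the dataset files.

```python
from dataclasses import dataclass


@dataclass
class MMPBSAComponents:
    """Energy differences (complex - receptor - ligand).

    Units are assumed to be kcal/mol; the field names are hypothetical
    and do not reflect the column names of the published dataset.
    """
    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float


def binding_energy(c: MMPBSAComponents) -> float:
    """Sum the components into an MM-PBSA binding energy estimate.

    Assumption: the entropy term is neglected, as is often done when
    only relative affinities are of interest.
    """
    gas_phase = c.electrostatic + c.van_der_waals
    solvation = c.polar_solvation + c.nonpolar_solvation
    return gas_phase + solvation


# Made-up illustrative values, not real data.
example = MMPBSAComponents(
    electrostatic=-30.0,
    van_der_waals=-45.0,
    polar_solvation=50.0,
    nonpolar_solvation=-5.0,
)
total = binding_energy(example)
print(f"Estimated binding energy: {total:.1f} kcal/mol")
```

Keeping the components separate rather than storing only the total makes it possible to train or validate models on individual terms as well as on the overall affinity.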
License
A dataset of ligand-unbound (apo) protein conformations was created for machine learning applications in de novo drug design. Ligand-free structural equivalents were obtained for 10,599 of the 16,608 protein-ligand complexes sourced from the PDBbind v.2019 database. The aim is to facilitate the computation and large-scale validation of structure-based drug design methods using apo structures. The ligand-free structures were identified by mining the PDB for apo protein structures that show strong sequence and structural alignment with the proteins found in PDBbind.
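The matching criterion described above (strong sequence and structural alignment) can be pictured with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and that an RMSD value in angstroms is available from a separate structural superposition. The function names and both thresholds are hypothetical placeholders, not the values used to build the dataset.

```python
def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over the ungapped aligned columns.

    Both inputs must be pre-aligned strings of equal length that use
    '-' for gaps. Columns with a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def is_apo_candidate(
    identity: float,
    rmsd: float,
    min_identity: float = 0.95,  # hypothetical threshold
    max_rmsd: float = 2.0,       # hypothetical threshold, in angstroms
) -> bool:
    """Accept a ligand-free structure only if both criteria are met."""
    return identity >= min_identity and rmsd <= max_rmsd


# Toy sequences for illustration only.
identity_gapped = sequence_identity("ACDE-G", "ACDEFG")
identity_mismatch = sequence_identity("ACDE", "ACDF")
print(identity_gapped, is_apo_candidate(identity_gapped, rmsd=1.2))
print(identity_mismatch, is_apo_candidate(identity_mismatch, rmsd=1.2))
```

Requiring both criteria at once guards against accepting a close sequence match whose conformation differs substantially from the ligand-bound protein, and vice versa.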
Link to dataset