The data is released under the CC-BY-NC-ND license. Under this license, users may download and distribute the unmodified data (with proper attribution), but may not use it commercially.
There are 5 datasets available for download:
1. PMechRP Datasets
This folder contains all datasets used in the PMechRP paper, "Mechanism-Aware Deep Learning for Polar Reaction Prediction", including train/val/test splits for the Chemformer, Graph2Smiles, Molecular Transformer, T5Chem, and Two-Stage/ArrowFinder models. Each model has 5 folds for both the manually curated only (mc_only) and the mixed datasets; the mixed sets additionally contain the combinatorial proton transfer steps.
The Pathway subdirectory includes the human benchmark dataset (350 textbook pathways), model-generated results, and plausibility analyses. It also contains ORD USPTO transformations, results, and plausibility evaluations.
Additional information is available in the README file.
Preprint available on arXiv: "Interpretable Deep Learning for Polar Mechanistic Reaction Prediction".
2. PMechRP Model Checkpoints
Contains model checkpoints used in the PMechRP paper.
3. Proton Transfer Dataset
Contains all 48M proton transfer steps and supporting files used to generate reactions.
Preprint available on ChemRxiv: "A Data Set of Plausible Proton Transfer Steps for Arrow-Pushing Mechanisms".
4. PMechDB Dataset
Includes all datasets associated with the JCIM manuscript "PMechDB: A Public Database of Elementary Polar Reaction Steps".
5. OLD PMechRP Proton Transfer Dataset
Includes the original 48M proton transfer steps sampled for the mixed data splits in the PMechRP paper. The steps have since been updated, and we recommend downloading the "Proton Transfer Dataset" above for future work.
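As a rough orientation for the PMechRP Datasets folder (item 1), the split structure described above can be enumerated with a short sketch. It assumes that every fold carries its own train/val/test split and that folds are numbered 0 to 4. The lowercase model identifiers are hypothetical labels chosen for illustration, not the actual directory names, so consult the README for the real layout.

```python
from itertools import product

# Hypothetical identifiers for the five model families named in the description.
MODELS = ["chemformer", "graph2smiles", "molecular_transformer", "t5chem", "two_stage"]
# The two dataset variants: manually curated only, and mixed (with proton transfer steps).
VARIANTS = ["mc_only", "mixed"]
# Assumption: folds are numbered 0..4; the real archive may number them differently.
FOLDS = range(5)
# Assumption: each fold has its own train/val/test split.
SPLITS = ["train", "val", "test"]


def enumerate_splits():
    """Return one record per (model, variant, fold, split) combination."""
    return [
        {"model": m, "variant": v, "fold": f, "split": s}
        for m, v, f, s in product(MODELS, VARIANTS, FOLDS, SPLITS)
    ]


if __name__ == "__main__":
    combos = enumerate_splits()
    print(f"{len(combos)} split files expected under these assumptions")
```

Under these assumptions, a complete download of item 1 would cover 5 models x 2 variants x 5 folds x 3 splits; a count that differs from this may simply mean the real layout differs from the sketch.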
Select a dataset below and submit your request to download.