PMechDB overview
The polar mechanism is the most common mechanism in organic and organometallic chemistry. Polar reactions result from the heterolytic cleavage of bonds and the formation of charged intermediates. Because polar reactants are highly reactive, many of these reactions can occur under standard and physiological conditions. Polar reactions are therefore frequently studied and used in organic, inorganic, organometallic, biological, and environmental chemistry. They are crucial to many of the most important biological processes, such as the synthesis of the peptide bonds that form the backbone of proteins, and they are invaluable to the drug design process. Hence, polar organic reactions are of great importance, and their mechanisms and outcomes are worthy of investigation.
PMechDB is a live platform for aggregating, curating, and distributing chemical reactions in the form of elementary polar steps to accelerate research in chemoinformatics and polar reaction modeling. The PMechDB platform is designed to facilitate training deep learning and other AI models in data-driven workflows using its tabular data, with no need for additional pre-processing steps. It provides a unified model that should facilitate data sharing, model building, dissemination, and publications. We encourage the community to explore and use the PMechDB data and functionalities, and to contribute to its expansion.
How to use PMechDB
Search by reaction
Note: The system does not currently search for alternate resonance structures. In order to find a match within the database, the exact resonance structure must be specified.
The search by reaction interface offers two search types:
- Exact search
- Similarity search
To perform an exact search, the search query must be a complete polar reaction step including both reactants and products. The result of this search is a list of all the polar reaction steps containing both the query reactants and the query products.
Input examples:
- C1[CH2:10][CH:11]1[CH2:20][N+:21]#N>>[CH2:10]1C[CH+:11][CH2:20]1.N#[N:21]
- [CH2:20]C1=CC=CC=C1.[N+:10](=O)[O-]>>c1ccc(cc1)C[N+](=O)[O-]
To perform a similarity search, the input query must be a complete polar reaction step including both reactants and products. Similarity search uses Tanimoto similarity. The result of this search is a list of all the polar reaction steps in the PMechDB database, sorted by their similarity to the query reaction step.
Input examples:
- CC(C)C.[N+](=O)[O-]>>CC(C)(C)[N+](=O)[O-]
- [CH2:20]C1=CC=CC=C1.[N+:10](=O)[O-]>>c1ccc(cc1)C[N+](=O)[O-]
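To make the ranking idea concrete, here is a minimal sketch of Tanimoto similarity, T(A, B) = |A ∩ B| / |A ∪ B|, applied to sets of features. The documentation does not say which fingerprint PMechDB computes for a reaction step, so the feature sets and step names below are hypothetical placeholders, not PMechDB data.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical feature sets standing in for reaction fingerprints.
query = {"f1", "f2", "f3", "f4"}
candidates = {
    "step_a": {"f1", "f2", "f3", "f4"},
    "step_b": {"f1", "f2", "f5"},
    "step_c": {"f6"},
}

# Sort candidate steps from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: tanimoto(query, candidates[name]),
    reverse=True,
)
```

A score of 1.0 means the two feature sets are identical, and 0.0 means they share no features.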
*** For all the aforementioned search types, the order of the molecules in the reactants or the products does not affect the search results.
*** For all the aforementioned search types, the numbering of the atoms does not affect the search results.
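The two notes above can be illustrated with a small text-level normalization. This is only a sketch of the idea, not PMechDB's implementation: the function names are hypothetical, and the code does not canonicalize SMILES. It removes atom-map numbers and sorts the molecules on each side, so two queries that differ only in molecule order or atom numbering compare equal.

```python
import re

# Matches an atom-map suffix such as ":10" directly before the closing bracket.
_MAP_RE = re.compile(r":\d+(?=\])")


def normalize_side(side: str) -> tuple:
    """Strip atom-map numbers and sort the molecules of one reaction side."""
    molecules = [_MAP_RE.sub("", mol) for mol in side.split(".") if mol]
    return tuple(sorted(molecules))


def normalize_reaction(reaction: str) -> tuple:
    """Return (reactants, products) in an order- and numbering-independent form."""
    reactants, sep, products = reaction.partition(">>")
    if not sep:
        raise ValueError("a reaction step needs '>>' between reactants and products")
    return normalize_side(reactants), normalize_side(products)


a = "[CH2:20]C1=CC=CC=C1.[N+:10](=O)[O-]>>c1ccc(cc1)C[N+](=O)[O-]"
# Same step with the reactants swapped and different (arbitrary) atom numbers.
b = "[N+:3](=O)[O-].[CH2:7]C1=CC=CC=C1>>c1ccc(cc1)C[N+](=O)[O-]"
same = normalize_reaction(a) == normalize_reaction(b)
```

A real matcher would also need to recognize different SMILES spellings of the same molecule, which normally requires a cheminformatics toolkit.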
Search by compound
The search by compound interface offers three different search types using three entities:
- Molecule
- Reactive atom
- Substructure
For molecule and substructure search, the following rules apply. Each molecule/substructure is separated by the "." character, and reactants are separated from products by the ">>" character. If users want to search for reactants only, they may specify ">>" after the list of molecules/substructures. If users want to search for products only, they may specify ">>" before the list of molecules/substructures. If ">>" is omitted, the search will look for reactions where the molecules/substructures are contained on either side of the reaction. After validating the SMILES input, the platform displays all elementary steps in its database that contain the specified molecules.
To perform a molecule search, the input query must be a valid SMILES string containing the molecules of interest. The result of this search is a list of all the polar reaction steps containing the input molecule(s) on their specified side of the reaction. Please note that the numbering of atoms within the input SMILES string does not affect the search results.
Input examples:
- CC(C)(C)CO[N+](=O)[O-]
- >>CC(C)(C)[N+](=O)[O-]
- CC(C)(C)[N+](=O)[O-]>>
- CC(C)(C)CO[N+](=O)[O-]>>CC(C)(C)[N+](=O)[O-]
- CC(C)(C)CO[N+](=O)[O-]>>CC(C)(C)[N+](=O)[O-].CC(C)(C)[N+](=O)[O-]
- CC(C)(C)[N+](=O)[O-]
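The side rules for ">>" and "." described above can be sketched as a small parser. The function name and the returned dictionary keys are assumptions made for illustration; the sketch only splits the query text and does not validate the SMILES.

```python
def parse_compound_query(query: str) -> dict:
    """Split a compound query into reactant-side, product-side, or either-side terms."""

    def split(part: str) -> list:
        # "." separates molecules/substructures; empty pieces are ignored.
        return [term for term in part.split(".") if term]

    if ">>" not in query:
        # No ">>": the terms may appear on either side of the reaction.
        return {"reactants": [], "products": [], "either": split(query)}

    left, _, right = query.partition(">>")
    return {"reactants": split(left), "products": split(right), "either": []}


products_only = parse_compound_query(">>CC(C)(C)[N+](=O)[O-]")
reactants_only = parse_compound_query("CC(C)(C)[N+](=O)[O-]>>")
either_side = parse_compound_query("CC(C)(C)[N+](=O)[O-]")
```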
To perform a reactive atom search, the search query must be a valid SMILES string of a molecule with the reactive atom labeled with an integer. The integer label must be between 1 and 10, and exactly one atom within the molecule must be labeled.
Input examples:
- C[C:9](C)C
- CC(C)(C)[CH2:7][O]
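Before submitting a reactive atom query, the labeling rule can be checked locally. The sketch below is a hypothetical helper, not part of PMechDB. It assumes that "between 1 and 10" is inclusive, and it inspects only the atom labels, not the validity of the SMILES itself.

```python
import re

# Captures the integer in an atom label such as "[C:9]" or "[CH2:7]".
_LABEL_RE = re.compile(r":(\d+)\]")


def reactive_atom_label(smiles: str) -> int:
    """Return the single atom label, or raise ValueError if the rule is broken."""
    labels = [int(n) for n in _LABEL_RE.findall(smiles)]
    if len(labels) != 1:
        raise ValueError(f"expected exactly one labeled atom, found {len(labels)}")
    if not 1 <= labels[0] <= 10:  # assumption: the bounds are inclusive
        raise ValueError(f"label {labels[0]} is outside the range 1 to 10")
    return labels[0]


first = reactive_atom_label("C[C:9](C)C")
second = reactive_atom_label("CC(C)(C)[CH2:7][O]")
```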
To perform a substructure search, the input query must be a valid SMARTS string containing the chemical substructures of interest. The result of this search is a list of all the polar reaction steps with at least one molecule on their specified side of the reaction that contains the input substructure(s).
Input examples:
- [CX4]
- C1CCCCC1
- C1CCCCC1>>
- C1CCCCC1>>C1CCCCC1
- >>C1CCCCC1
- >>C1CCCCC1.[CX4]
To download the PMechDB data set, the user must enter their information and email address. Upon reading and agreeing to the terms of the CC-BY-NC-ND license, the user will receive an email containing the PMechDB data set. The PMechDB data set is a directory with five comma-separated value (csv) files:
- manually_curated_train.csv: All the manually curated polar elementary steps for training in the PMechDB data set.
- manually_curated_test_challenging.csv: The challenging manually curated polar elementary steps used to test trained models.
- combinatorial_all.csv: All the combinatorially generated polar elementary steps in the PMechDB data set.
- combinatorial_train.csv: The training split of combinatorial elementary steps.
- combinatorial_test.csv: The testing split of combinatorial elementary steps.
For the manually curated data, each csv file has one column containing the SMIRKS and arrow codes of the elementary step, and one column containing the orbital pair classification. For the combinatorial data, each csv file has five columns:
- 1. The SMIRKS and arrow codes of the elementary step.
- 2. The solvent of the elementary step.
- 3. The temperature (K) of the elementary step.
- 4. The sN(N+E) value.
- 5. The orbital pair classification.
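A minimal sketch of loading the five-column combinatorial files with Python's standard csv module follows. The field names, the has_header flag, and the sample row are all assumptions made for illustration: the description above gives only the column order, not the header names, whether a header row exists, or how the SMIRKS and arrow codes are joined within the first column. The sN(N+E) value is assumed to be numeric.

```python
import csv
import io
from typing import NamedTuple


class CombinatorialStep(NamedTuple):
    # Hypothetical field names; only the column order comes from the description.
    smirks_and_arrows: str
    solvent: str
    temperature_k: float
    sn_n_plus_e: float
    orbital_pair: str


def read_combinatorial(handle, has_header: bool = False) -> list:
    """Read five-column rows into CombinatorialStep records."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    steps = []
    for row in reader:
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}")
        text, solvent, temperature, sn_value, orbital = row
        steps.append(
            CombinatorialStep(text, solvent, float(temperature), float(sn_value), orbital)
        )
    return steps


# Made-up placeholder row, not real PMechDB data. The first field is quoted
# because arrow codes contain commas.
sample = io.StringIO('"<step text, with commas>",water,298.15,0.0,<orbital pair>\n')
steps = read_combinatorial(sample)
```

For a downloaded file, the same function could be called on an open file handle, for example `open("combinatorial_train.csv", newline="")`, after checking whether the file starts with a header row.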
Read this documentation on how to use Ketcher to draw molecules and reactions.
Upload multiple reaction steps
You must create a comma separated value (.csv) file containing all the steps. Each row of the file must represent a reaction with four columns:
- 1. Reaction SMIRKS
- 2. Arrow codes
- 3. Original source
- 4. Auxiliary information (e.g. initial energy, special conditions such as low pressure)
Here is a file sample that could be uploaded to PMechDB:
Arrow Codes Description:
- 1. The "=" character represents where the electrons flow by separating the initial position of the electrons from the final position of the electrons after an arrow is applied.
- 2. Any time we have a "," character between two atoms, this represents a bond between the two atoms. For example, "20,21" means a bond between atom 20 and atom 21.
- 3. The ";" character separates one arrow from the next arrow in the arrow pushing mechanism.
Here is an example of a PMechDB reaction in the correct format:
Atom Mapping Description:
All polar reaction steps involve formation of a new bond between a nucleophilic functional group and an electrophilic functional group. Resonance interconversions are treated as reaction steps. The NEW bond between the functional groups is formed between an atom mapped as 10 in the nucleophile and 20 in the electrophile. All atoms in the nucleophile that are involved in arrow-pushing are mapped sequentially, with respect to electron flow specification, as 10, 11, 12, 13, etc. All atoms in the electrophile that are involved in arrow-pushing are mapped sequentially, with respect to electron flow specification, as 20, 21, 22, 23, etc. The electron flow represented with curved arrows should follow a chain of mapped atoms that looks like, for example, 15>14>13>12>11>10>20>21>22>23>24>25.
If you could not upload the file you prepared, or the reaction data you collected does not fit the format mentioned above, you can send the file via email to:
- Pierre Baldi, Email: pfbaldi [at] uci [dot] edu.
- Ryan Miller, Email: rjmille3 [at] uci [dot] edu.
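When preparing a file for upload, the arrow code rules from the Arrow Codes Description ("=" between the initial and final electron positions, "," between the two atoms of a bond, ";" between arrows) can be checked with a small parser. This is a sketch under assumptions: the function names are hypothetical, and the arrow code string in the example was written for illustration from those three rules rather than taken from PMechDB.

```python
def _atoms(text: str) -> tuple:
    """Parse "10" into (10,) (a single atom) or "20,21" into (20, 21) (a bond)."""
    atoms = tuple(int(part) for part in text.split(","))
    if len(atoms) not in (1, 2):
        raise ValueError(f"expected one atom or a two-atom bond, got {text!r}")
    return atoms


def parse_arrow_codes(codes: str) -> list:
    """Return one (source, sink) pair per arrow, in the order given."""
    arrows = []
    for arrow in codes.split(";"):
        arrow = arrow.strip()
        if not arrow:
            continue  # tolerate a trailing ";"
        source, sep, sink = arrow.partition("=")
        if not sep:
            raise ValueError(f"arrow {arrow!r} has no '=' separator")
        arrows.append((_atoms(source), _atoms(sink)))
    return arrows


# Illustrative string: electrons on atom 10 form the new 10-20 bond,
# then the 20-21 bond breaks and its electrons move onto atom 21.
arrows = parse_arrow_codes("10=10,20;20,21=21")
```

In this example the first arrow creates the new bond between atoms 10 and 20, which matches the atom mapping convention that the nucleophile atom is mapped as 10 and the electrophile atom as 20.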