Run With SMILES
Use this guide when your dataset contains SMILES strings and you want to run the RDKit Morgan Tanimoto workflow.
Required inputs
Your CSV should contain:
an endpoint column
a SMILES column
If no molecule identifier column is supplied, psma will generate one.
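Before launching a run, it can help to confirm that the CSV header actually contains the columns you plan to pass. The following is a minimal sketch using only the Python standard library. The helper name `missing_columns` is hypothetical and not part of psma, and the column names are simply the ones used in the CLI example in this guide.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Illustrative in-memory header only; for a real file use
# open("input.csv", newline="") instead of io.StringIO.
sample = io.StringIO("canonical_smiles,low_solubility\n")
print(missing_columns(sample, ["low_solubility", "canonical_smiles"]))
```

An empty list means both columns were found. This only checks the header; it does not inspect the values in either column.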
CLI example
pixi run psma run input.csv \
  --output-dir outputs/run1 \
  --y-col low_solubility \
  --label-threshold 0.5 \
  --label-direction ge \
  --similarity-method rdkit_morgan_tanimoto \
  --smiles-col canonical_smiles
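The `--label-threshold` and `--label-direction` options are not described further in this guide. One plausible reading, stated here as an assumption rather than as documented psma behavior, is that the endpoint column is binarized by comparing each value with the threshold, with `ge` meaning "greater than or equal to". A minimal sketch of that interpretation, using a hypothetical `binarize` helper and illustrative values:

```python
import operator

# Only "ge" appears in this guide; any other direction is rejected here
# rather than guessed.
_COMPARATORS = {"ge": operator.ge}


def binarize(values, threshold, direction="ge"):
    """Assumed semantics: label 1 when the comparison holds, else 0."""
    try:
        compare = _COMPARATORS[direction]
    except KeyError:
        raise ValueError(f"direction not covered by this sketch: {direction!r}")
    return [1 if compare(value, threshold) else 0 for value in values]


# Illustrative endpoint values, not taken from any real dataset.
print(binarize([0.2, 0.5, 0.9], threshold=0.5))  # [0, 1, 1]
```

If psma's actual labeling rule differs, the tool's own help output is the authority.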
Notes
invalid SMILES strings will fail validation during similarity construction
RDKit must be available for this backend
you can compare split behavior by changing --split-method
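Because invalid SMILES fail validation, it may be worth screening the SMILES column before starting a run. The sketch below is not psma's own validation, which may be stricter. It relies only on the fact that RDKit's `Chem.MolFromSmiles` returns `None` for a string it cannot parse. The helper name `find_invalid_smiles` and its `parse` argument are hypothetical; `parse` exists so the function can be exercised without RDKit installed.

```python
def find_invalid_smiles(smiles, parse=None):
    """Return (index, value) pairs for entries that do not parse as SMILES.

    `parse` defaults to RDKit's Chem.MolFromSmiles, which returns None
    when a string cannot be parsed.
    """
    if parse is None:
        # Imported lazily so the helper can be loaded without RDKit.
        from rdkit import Chem

        parse = Chem.MolFromSmiles
    invalid = []
    for index, value in enumerate(smiles):
        text = (value or "").strip()
        # Treat empty cells as invalid: they carry no structure to fingerprint.
        if not text or parse(text) is None:
            invalid.append((index, value))
    return invalid
```

With RDKit installed, calling `find_invalid_smiles(["CCO", "C("])` should flag the second entry, and RDKit will also log a parse error for it.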