Run With SMILES

Use this guide when your dataset contains SMILES strings and you want to run the RDKit Morgan fingerprint Tanimoto similarity workflow (--similarity-method rdkit_morgan_tanimoto).
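For orientation, the Tanimoto similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below illustrates only that ratio on hand-written sets of on-bit indices. It is not psma's implementation: the tanimoto helper and the example bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a choice made here. In the actual workflow the bits come from RDKit Morgan fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: treated as dissimilar here (an assumption of this sketch).
        return 0.0
    return len(a & b) / union


# Hypothetical on-bit indices standing in for two Morgan fingerprints.
fp_a = {3, 17, 42, 128}
fp_b = {17, 42, 256}

# 2 shared bits out of 5 distinct bits.
print(tanimoto(fp_a, fp_b))  # 0.4
```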

Required inputs

Your CSV must contain:

an endpoint column, passed with --y-col
a SMILES column, passed with --smiles-col

If no molecule identifier column is supplied, psma will generate one.
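Before launching a run, it can help to confirm that the CSV header actually contains the columns you plan to pass. The following is a minimal sketch using only the Python standard library; the missing_columns helper and the sample data are hypothetical, and the column names are simply the ones used in the CLI example in this guide. It says nothing about how psma itself validates its input.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows at all
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical two-row dataset with a SMILES column and an endpoint column.
sample = "canonical_smiles,low_solubility\nCCO,0.8\nc1ccccc1,0.2\n"

print(missing_columns(sample, ["low_solubility", "canonical_smiles"]))  # []
print(missing_columns(sample, ["low_solubility", "smiles"]))  # ['smiles']
```

For a file on disk, read the text with open(path, newline="") and pass its contents to the same function.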

CLI example

pixi run psma run input.csv \
  --output-dir outputs/run1 \
  --y-col low_solubility \
  --label-threshold 0.5 \
  --label-direction ge \
  --similarity-method rdkit_morgan_tanimoto \
  --smiles-col canonical_smiles

Notes

Invalid SMILES strings fail validation during similarity construction.
RDKit must be installed in the environment for this backend.
To compare split behavior, rerun with a different --split-method value.
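Because invalid SMILES only surface once similarity construction starts, a quick pre-screen can save a failed run. The sketch below checks whether RDKit is importable and lists the strings that Chem.MolFromSmiles cannot parse. The helper names are hypothetical, and whether psma applies exactly the same parsing criteria is an assumption; treat this as a rough pre-check, not a replica of psma's validation.

```python
import importlib.util


def rdkit_available():
    """Return True if the rdkit package can be found in this environment."""
    return importlib.util.find_spec("rdkit") is not None


def find_invalid_smiles(smiles_list):
    """Return (index, smiles) pairs that are empty or that RDKit cannot parse."""
    # Imported here so that rdkit_available() still works when RDKit is missing.
    from rdkit import Chem, RDLogger

    RDLogger.DisableLog("rdApp.*")  # silence RDKit's per-molecule parse messages
    invalid = []
    for i, smi in enumerate(smiles_list):
        text = (smi or "").strip()
        # MolFromSmiles returns None on a parse failure; an empty string
        # yields an empty molecule rather than None, so check it explicitly.
        if not text or Chem.MolFromSmiles(text) is None:
            invalid.append((i, smi))
    return invalid


if rdkit_available():
    # "C1CC" has an unclosed ring bond, so RDKit rejects it.
    print(find_invalid_smiles(["CCO", "c1ccccc1", "C1CC", ""]))
else:
    print("RDKit is not installed; the rdkit_morgan_tanimoto backend needs it.")
```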