Run With SMILES

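Use this guide when your dataset contains SMILES strings and you want to run the RDKit Morgan Tanimoto workflow (--similarity-method rdkit_morgan_tanimoto), which compares molecules by the Tanimoto similarity of their RDKit Morgan fingerprints.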

Required inputs

Your CSV must contain:

  • an endpoint column, passed with --y-col

  • a SMILES column, passed with --smiles-col

If no molecule identifier column is supplied, psma will generate one.
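If you want to check a file before handing it to psma, the sketch below verifies that the endpoint and SMILES columns exist and, when no identifier column is named, adds sequential IDs. This is an illustrative pre-check, not psma's own code: the generated column name molecule_id, the mol_0, mol_1, ... format, and the function check_and_fill_ids are all hypothetical, since this guide does not say how psma names or formats the IDs it generates.

```python
import csv
import io

# Hypothetical name for the generated identifier column; psma's actual
# column name and ID format are not documented in this guide.
GENERATED_ID_COL = "molecule_id"


def check_and_fill_ids(csv_text, y_col, smiles_col, id_col=None):
    """Verify required columns exist; add sequential IDs if no ID column is named."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # The endpoint and SMILES columns are both required.
    missing = [c for c in (y_col, smiles_col) if c not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    rows = list(reader)
    if id_col is not None:
        # An identifier column was named, so it must actually be present.
        if id_col not in header:
            raise ValueError(f"identifier column not found: {id_col}")
        return rows

    # Hypothetical ID scheme: mol_0, mol_1, ... in row order.
    for i, row in enumerate(rows):
        row[GENERATED_ID_COL] = f"mol_{i}"
    return rows


sample = "canonical_smiles,low_solubility\nCCO,0.2\nc1ccccc1,0.9\n"
rows = check_and_fill_ids(sample, "low_solubility", "canonical_smiles")
print([r[GENERATED_ID_COL] for r in rows])  # ['mol_0', 'mol_1']
```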

CLI example

pixi run psma run input.csv \
  --output-dir outputs/run1 \
  --y-col low_solubility \
  --label-threshold 0.5 \
  --label-direction ge \
  --similarity-method rdkit_morgan_tanimoto \
  --smiles-col canonical_smiles
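The --label-threshold and --label-direction options turn the continuous endpoint into a binary label. The sketch below assumes, from the flag name only, that ge means "label is 1 when the endpoint value is greater than or equal to the threshold"; check psma's own help output to confirm. The function binarize is hypothetical, and only the ge direction used in this guide is modelled.

```python
import operator

# Only "ge" appears in this guide; it is assumed here to mean
# "label is 1 when the endpoint value is >= the threshold".
# Other --label-direction values psma may accept are not modelled.
_DIRECTIONS = {"ge": operator.ge}


def binarize(values, threshold=0.5, direction="ge"):
    """Turn continuous endpoint values into 0/1 labels (illustrative sketch)."""
    try:
        compare = _DIRECTIONS[direction]
    except KeyError:
        raise ValueError(f"unsupported direction in this sketch: {direction!r}") from None
    # float() accepts numbers or numeric strings read from a CSV.
    return [int(compare(float(v), threshold)) for v in values]


print(binarize([0.2, 0.5, 0.9]))  # [0, 1, 1]
```

Under this assumption, a low_solubility value of exactly 0.5 counts as positive in the example command.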

Notes

  • invalid SMILES strings will fail validation during similarity construction

  • RDKit must be installed in the environment to use this backend

  • you can compare split behavior by changing --split-method
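For intuition about the similarity step, Tanimoto similarity between two binary fingerprints is the number of shared on bits divided by the number of bits set in either: T(A, B) = |A ∩ B| / |A ∪ B|. The RDKit-free sketch below uses toy sets of on-bit indices as stand-ins for Morgan fingerprints. In an RDKit environment one would typically parse each SMILES with Chem.MolFromSmiles (which returns None for an invalid string, the kind of input that fails validation here), generate Morgan fingerprints, and compare them with DataStructs.TanimotoSimilarity; this guide does not say whether psma does exactly that internally. The helpers tanimoto and similarity_matrix are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        # RDKit's handling of this edge case may differ.
        return 0.0
    return len(a & b) / len(union)


def similarity_matrix(fps):
    """Dense pairwise Tanimoto matrix for a list of on-bit sets."""
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Toy on-bit indices, not real Morgan fingerprints.
fp_1 = {1, 5, 9, 12}
fp_2 = {1, 5, 7}
print(tanimoto(fp_1, fp_2))  # 0.4 (2 shared bits out of 5 set in either)
```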