Input Data

The package accepts a main dataframe plus, for one backend, an optional triples dataframe.

Main dataframe

Always required:

  • endpoint column selected by y_col

Optional but commonly used:

  • mol_id

  • SMILES column

  • fingerprint column

  • embedding column

Backend-specific requirements

rdkit_morgan_tanimoto

Requires either:

  • a SMILES column

  • or a fingerprint column

embedding_cosine

Requires:

  • an embedding column containing vector-like values

nams_triples_import

Requires:

  • the main dataframe

  • a triples dataframe or CSV with pairwise similarity rows

Label semantics

The package thresholds y_col into binary labels using:

  • label_threshold

  • label_direction

For already-binary endpoints, a threshold such as 0.5 with label_direction="ge" preserves the existing 0/1 semantics.