Input Data
The package accepts a main dataframe and, for the nams_triples_import backend only, an additional triples dataframe.
Main dataframe
Always required:
the endpoint column, selected by y_col
Optional but commonly used:
mol_id
SMILES column
fingerprint column
embedding column
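As a rough illustration, a main dataframe carrying all of the columns listed above might be built with pandas as follows. The column names `smiles`, `fingerprint`, `embedding`, and `activity` are hypothetical choices for this sketch, as is the representation of fingerprints as lists of on-bit indices; the names and value types the package actually expects should be checked against its API.

```python
import pandas as pd

# Hypothetical column names; the package may expect different ones.
main_df = pd.DataFrame(
    {
        "mol_id": ["m1", "m2", "m3"],
        "smiles": ["CCO", "c1ccccc1", "CC(=O)O"],
        # Fingerprints shown here as lists of on-bit indices (an assumption).
        "fingerprint": [[1, 5, 9], [2, 5, 7], [1, 3, 9]],
        # Embeddings as fixed-length numeric vectors.
        "embedding": [[0.1, 0.9], [0.8, 0.2], [0.2, 0.7]],
        # Endpoint column; its name would be passed as y_col.
        "activity": [6.2, 4.8, 7.1],
    }
)

y_col = "activity"
assert y_col in main_df.columns  # the one column that is always required
```

Only the endpoint column is mandatory; the other columns matter depending on which backend is selected.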
Backend-specific requirements
rdkit_morgan_tanimoto
Requires either:
a SMILES column
or a fingerprint column
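For orientation, the Tanimoto coefficient that this backend is named after is the number of on-bits two fingerprints share divided by the number of on-bits set in either. The sketch below computes it on plain Python collections of on-bit indices. That representation and the function name `tanimoto` are assumptions made for illustration; this is not the package's own implementation, which, going by the backend name, builds on RDKit Morgan fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: defined here as 0.0 by convention (an assumption).
        return 0.0
    return len(a & b) / len(union)


fp_1 = [1, 5, 9, 12]
fp_2 = [1, 5, 7]
similarity = tanimoto(fp_1, fp_2)  # 2 shared bits out of 5 distinct bits
```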
embedding_cosine
Requires:
an embedding column containing vector-like values
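To make "vector-like values" concrete, here is a minimal sketch of cosine similarity between two embeddings stored as plain numeric sequences. The function name `cosine_similarity` and the handling of zero vectors are assumptions for this example; the package may accept other containers (for instance NumPy arrays) and may treat edge cases differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric sequences."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Zero vectors have no direction; return 0.0 by convention (an assumption).
        return 0.0
    return dot / (norm_u * norm_v)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Every row's embedding needs the same length for pairwise comparison to be defined, which is worth checking before handing the dataframe to this backend.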
nams_triples_import
Requires:
the main dataframe
a triples dataframe or CSV with pairwise similarity rows
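The text does not specify the triples layout beyond "pairwise similarity rows", so the following is only a sketch under assumed column names (`mol_id_a`, `mol_id_b`, `similarity`). It parses such a CSV with the standard library and checks that each pair refers to molecules known to the main dataframe; whether the package performs or requires such a check is also an assumption.

```python
import csv
import io

# Hypothetical layout: one row per molecule pair with a similarity score.
triples_csv = io.StringIO(
    "mol_id_a,mol_id_b,similarity\n"
    "m1,m2,0.31\n"
    "m1,m3,0.78\n"
    "m2,m3,0.12\n"
)

triples = [
    (row["mol_id_a"], row["mol_id_b"], float(row["similarity"]))
    for row in csv.DictReader(triples_csv)
]

# IDs present in the main dataframe (hard-coded here to keep the sketch standalone).
main_ids = {"m1", "m2", "m3"}

# Pairs referencing molecules missing from the main dataframe.
unknown = [(a, b) for a, b, _ in triples if a not in main_ids or b not in main_ids]
```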
Label semantics
The package thresholds y_col into binary labels using:
label_threshold
label_direction
For already-binary endpoints, a threshold such as 0.5 with
label_direction="ge" preserves the existing 0/1 semantics.
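The thresholding rule can be sketched in a few lines. The helper name `binarize` is hypothetical, and only the "ge" direction is taken from the text above; the "le" branch is an assumed counterpart included for illustration, and the package may name or define its other directions differently.

```python
def binarize(values, label_threshold, label_direction="ge"):
    """Map numeric endpoint values to 0/1 labels."""
    if label_direction == "ge":
        # Positive when the value is at or above the threshold.
        return [1 if v >= label_threshold else 0 for v in values]
    if label_direction == "le":
        # Hypothetical counterpart: positive at or below the threshold.
        return [1 if v <= label_threshold else 0 for v in values]
    raise ValueError(f"unsupported label_direction: {label_direction!r}")


already_binary = [0, 1, 1, 0]
labels = binarize(already_binary, label_threshold=0.5, label_direction="ge")
# labels == [0, 1, 1, 0], so the original 0/1 values are preserved
```

Any threshold strictly between 0 and 1 would have the same effect on a 0/1 endpoint; 0.5 is simply the conventional midpoint.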