# Input Data The package accepts a main dataframe plus, for one backend, an optional triples dataframe. ## Main dataframe Always required: - endpoint column selected by `y_col` Optional but commonly used: - `mol_id` - SMILES column - fingerprint column - embedding column ## Backend-specific requirements ### `rdkit_morgan_tanimoto` Requires either: - a SMILES column - or a fingerprint column ### `embedding_cosine` Requires: - an embedding column containing vector-like values ### `nams_triples_import` Requires: - the main dataframe - a triples dataframe or CSV with pairwise similarity rows ## Label semantics The package thresholds `y_col` into binary labels using: - `label_threshold` - `label_direction` For already-binary endpoints, a threshold such as `0.5` with `label_direction="ge"` preserves the existing `0/1` semantics.