# Choose Random vs Butina

Use this guide to choose between the available train/test split strategies.

## Random split

Use `random` when:

- you want a simple baseline split
- the dataset is large enough that a random partition is acceptable
- you are doing quick iteration

Tradeoff:

- a random split may not stress scaffold or chemotype generalisation

## Butina split

Use `butina` when:

- you want cluster-aware splitting
- you care more about chemical dissimilarity between train and test
- you want a more demanding validation setting

Tradeoff:

- the split behavior depends on the distance cutoff
- cluster-based splitting can produce a harder task

## Practical recommendation

For exploratory work:

- start with `random`

For more realistic evaluation:

- compare `random` and `butina`

The NCATS-sol example script renders both so you can inspect the difference directly.
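
To make the difference between the two strategies concrete, here is a minimal, standard-library-only sketch. It is not the implementation behind the `random` and `butina` options: the function names, the toy fingerprints (sets of "on" bits), the `cutoff` and `test_fraction` parameters, and the policy of sending the largest clusters to train first are all assumptions made for illustration.

```python
import random


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two sets of 'on' bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def random_split(n_items, test_fraction=0.3, seed=0):
    """Shuffle indices and cut once; chemical similarity is ignored."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_items * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def butina_clusters(fingerprints, cutoff=0.5):
    """Taylor-Butina style clustering on a list of bit sets."""
    n = len(fingerprints)
    # Two items are neighbours when their distance is within the cutoff.
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto_distance(fingerprints[i], fingerprints[j]) <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)
    # Visit items with the most neighbours first; each unassigned item becomes
    # a centroid and claims its still-unassigned neighbours.
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbours[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def butina_split(fingerprints, test_fraction=0.3, cutoff=0.5):
    """Keep every cluster whole; largest clusters fill the train set first."""
    clusters = sorted(butina_clusters(fingerprints, cutoff), key=len, reverse=True)
    n_train_target = len(fingerprints) - round(len(fingerprints) * test_fraction)
    train, test = [], []
    for members in clusters:
        (train if len(train) < n_train_target else test).extend(members)
    return sorted(train), sorted(test)


# Hypothetical toy data: three chemical "series" plus one singleton.
fingerprints = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {1, 2, 4, 5},
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 13, 14},
    {20, 21, 22, 23}, {20, 21, 22, 24},
    {30, 31, 32},
]

print("random:", random_split(len(fingerprints)))
print("butina:", butina_split(fingerprints))
```

On this toy data the cluster-aware split keeps each series entirely on one side of the split, while the random split is free to place members of the same series in both train and test. Raising `cutoff` merges more compounds into each cluster, which is why the resulting split depends on that value.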