# Choosing between Random and Butina splits

Use this guide to choose between the available train/test split
strategies.

## Random split

Use `random` when:

- you want a simple baseline split
- the dataset is large and diverse enough that a random partition gives a representative test set
- you are doing quick iteration

Tradeoff:

- a random split may not stress scaffold or chemotype generalisation: close
  analogues of test molecules can land in the training set, so scores may
  look optimistic
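
As a rough illustration of what a random split does, here is a minimal sketch using only the Python standard library. The function name `random_split` and its parameters are hypothetical and are not this project's API; the point is that membership is decided by a seeded shuffle alone, with no knowledge of chemical structure.

```python
import random


def random_split(n_items, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) from a seeded shuffle of range(n_items).

    Hypothetical helper for illustration; not part of any library.
    """
    indices = list(range(n_items))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_items * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = random_split(10, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

Because the shuffle ignores structure, two near-identical molecules are as likely to be separated across train and test as any other pair.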

## Butina split

Use `butina` when:

- you want cluster-aware splitting
- you want train and test molecules to be chemically dissimilar
- you want a more demanding validation setting

Tradeoff:

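- the clusters, and therefore the split, depend on the distance cutoff
- cluster-based splitting can produce a harder task, so scores are usually
  lower than with a random split and should not be compared with them directly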
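
To make the cutoff dependence concrete, here is a minimal pure-Python sketch of Butina-style clustering followed by a cluster-aware split. Everything in it is an assumption made for illustration: the toy fingerprints are sets of on-bit indices, and `butina_clusters`, `cluster_split`, and the "smallest clusters go to test first" policy are hypothetical and may differ from what the `butina` strategy does internally.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def butina_clusters(fps, cutoff):
    """Greedy Butina-style clustering; returns a list of index lists."""
    n = len(fps)
    # Neighbours of i are all items within the cutoff (i itself included).
    neighbors = [
        {j for j in range(n) if tanimoto_distance(fps[i], fps[j]) <= cutoff}
        for i in range(n)
    ]
    # Visit candidates with the most neighbours first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        # The centroid claims every neighbour not already in a cluster.
        members = sorted(neighbors[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def cluster_split(clusters, test_fraction=0.2):
    """Send whole clusters to test, smallest first, until the target is met."""
    n_target = round(sum(len(c) for c in clusters) * test_fraction)
    train, test = [], []
    for members in sorted(clusters, key=len):
        (test if len(test) < n_target else train).extend(members)
    return sorted(train), sorted(test)


# Toy data: two families of close analogues plus one unrelated item.
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14},
    {20, 21, 22, 23},
]

tight = butina_clusters(fps, cutoff=0.3)   # every item ends up alone
loose = butina_clusters(fps, cutoff=0.5)   # analogues are grouped together
train_idx, test_idx = cluster_split(loose, test_fraction=0.3)
print(len(tight), len(loose), train_idx, test_idx)
```

With the tight cutoff every item is its own cluster, so the split behaves much like an arbitrary partition; with the looser cutoff whole families move together and never straddle train and test. Because clusters are moved whole, the test set can overshoot the requested fraction.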

## Practical recommendation

For exploratory work:

- start with `random`

For more realistic evaluation:

- compare `random` and `butina`
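
One way to put a number on the comparison is to measure, for each test item, the distance to its nearest training item. The sketch below is an assumption-laden illustration: the fingerprints are toy sets of on-bit indices, the two splits are written by hand to stand in for a random-like and a cluster-aware outcome, and `nearest_train_distance` is a hypothetical helper.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def nearest_train_distance(fps, train_idx, test_idx):
    """Map each test index to the distance of its closest training item."""
    return {
        t: min(tanimoto_distance(fps[t], fps[r]) for r in train_idx)
        for t in test_idx
    }


# Toy data: two families of close analogues plus one unrelated item.
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14},
    {20, 21, 22, 23},
]

# Hand-written splits, for illustration only.
random_like = nearest_train_distance(fps, train_idx=[0, 1, 3, 5], test_idx=[2, 4])
cluster_aware = nearest_train_distance(fps, train_idx=[0, 1, 2], test_idx=[3, 4, 5])
print(random_like)
print(cluster_aware)
```

In the random-like split each test item has a close analogue in training, while in the cluster-aware split no test item shares any bits with the training set. Larger nearest-neighbour distances indicate a test set that demands more extrapolation.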

The NCATS-sol example script renders both so you can inspect the
difference directly.