Similarity Backends

psma currently supports three ways to build the similarity matrix.

RDKit Morgan Tanimoto

This backend builds Morgan fingerprints from SMILES strings, or takes fingerprints you already have, and then computes pairwise Tanimoto similarity.

Use it when:

  • you have SMILES

  • you want a standard cheminformatics baseline
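
As a rough illustration of the Tanimoto step, the sketch below computes pairwise Tanimoto similarity with NumPy from fingerprints that are already bit vectors. It is not psma's implementation: the function name `tanimoto_matrix` is hypothetical, the toy 4-bit fingerprints stand in for real Morgan fingerprints (which RDKit would generate from SMILES), and scoring an empty union as 0 is an assumed convention.

```python
import numpy as np


def tanimoto_matrix(fingerprints):
    """Pairwise Tanimoto similarity for binary fingerprints (one row per molecule).

    Hypothetical helper for illustration; not part of psma.
    """
    fps = np.asarray(fingerprints, dtype=bool).astype(np.int64)
    # |A and B| for every pair: dot product of 0/1 rows.
    intersection = fps @ fps.T
    # |A| for every molecule: number of on-bits per row.
    counts = fps.sum(axis=1)
    # |A or B| = |A| + |B| - |A and B|.
    union = counts[:, None] + counts[None, :] - intersection
    # Assumption: pairs with an empty union (two all-zero rows) score 0.
    return np.divide(
        intersection,
        union,
        out=np.zeros(union.shape, dtype=float),
        where=union > 0,
    )


# Toy 4-bit fingerprints standing in for Morgan fingerprints.
toy_fps = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
tanimoto_sim = tanimoto_matrix(toy_fps)
```

Here the first two molecules share one on-bit out of three distinct on-bits, so their similarity is 1/3, while the first and third share none and score 0.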

Embedding cosine

This backend uses dense vector embeddings and computes cosine similarity.

Use it when:

  • you already have learned molecular embeddings

  • you want to compare representation-learning workflows
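
To make the cosine step concrete, here is a minimal NumPy sketch that turns a table of embeddings into a pairwise cosine similarity matrix. The function name `cosine_similarity_matrix` and the toy 2-dimensional embeddings are illustrative assumptions, not psma's API, and treating all-zero embeddings as having similarity 0 is an assumed convention.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between rows of an embedding table.

    Hypothetical helper for illustration; not part of psma.
    """
    X = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: all-zero rows stay zero (similarity 0) instead of producing NaN.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = X / safe_norms
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T


# Toy 2-dimensional embeddings for three molecules.
toy_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
cosine_sim = cosine_similarity_matrix(toy_embeddings)
```

The first two rows are orthogonal, so their similarity is 0, and each has similarity 1/sqrt(2) with the third row.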

Imported triples

This backend reconstructs the full similarity matrix from a sparse triples table containing:

  • first molecule id

  • second molecule id

  • similarity score

Use it when:

  • the similarities were computed elsewhere

  • you want the package to consume a precomputed similarity definition
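
The reconstruction from triples can be sketched as follows. This is a guess at the general idea rather than psma's actual behavior: the function name `matrix_from_triples` and the molecule ids are hypothetical, and the sketch assumes that similarities are symmetric, that each molecule has similarity 1 with itself, and that pairs missing from the table default to 0.

```python
import numpy as np


def matrix_from_triples(triples, default=0.0):
    """Build a dense similarity matrix from (id_a, id_b, score) triples.

    Hypothetical helper for illustration; not part of psma.
    """
    # Collect every molecule id and fix an ordering for rows and columns.
    ids = sorted({mol_id for a, b, _ in triples for mol_id in (a, b)})
    index = {mol_id: k for k, mol_id in enumerate(ids)}

    n = len(ids)
    # Assumption: pairs absent from the table get the default score.
    sim = np.full((n, n), default, dtype=float)
    # Assumption: each molecule is fully similar to itself.
    np.fill_diagonal(sim, 1.0)

    for a, b, score in triples:
        i, j = index[a], index[b]
        # Assumption: similarity is symmetric, so fill both entries.
        sim[i, j] = score
        sim[j, i] = score
    return ids, sim


# Toy triples table: first molecule id, second molecule id, similarity score.
toy_triples = [
    ("mol_a", "mol_b", 0.8),
    ("mol_b", "mol_c", 0.25),
]
triple_ids, triple_sim = matrix_from_triples(toy_triples)
```

Returning the ordered ids alongside the matrix keeps the row and column labels recoverable, which matters because the triples table itself carries no ordering.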

Choosing between them

Start with the backend that matches the representation you already have. The downstream PSMA surface workflow is the same once the similarity matrix exists.