psma package
PSMA surface modeling package.
- class psma.PsmaMetrics(auc, auc_defined, mcc, threshold, confusion, metric_notes)
Bases: object
Summary metrics for PSMA test-set predictions.
- Parameters:
auc (float | None)
auc_defined (bool)
mcc (float)
threshold (float)
confusion (dict[str, int])
metric_notes (dict[str, str])
- auc: float | None
- auc_defined: bool
- mcc: float
- threshold: float
- confusion: dict[str, int]
- metric_notes: dict[str, str]
- class psma.PsmaParams(y_col, label_threshold, label_direction, similarity_method, embed_method, distance_k, tsne_perplexity, tsne_max_iter, grid_n, grid_lims_scale, eps, split_method, test_fraction, random_state, butina_distance_cutoff, mcc_prob_threshold, morgan_radius, morgan_n_bits)
Bases: object
Stable parameter payload describing one PSMA workflow run.
- Parameters:
y_col (str)
label_threshold (float)
label_direction (str)
similarity_method (str)
embed_method (str)
distance_k (float)
tsne_perplexity (float)
tsne_max_iter (int)
grid_n (int)
grid_lims_scale (float)
eps (float)
split_method (str)
test_fraction (float)
random_state (int)
butina_distance_cutoff (float)
mcc_prob_threshold (float)
morgan_radius (int)
morgan_n_bits (int)
- y_col: str
- label_threshold: float
- label_direction: str
- similarity_method: str
- embed_method: str
- distance_k: float
- tsne_perplexity: float
- tsne_max_iter: int
- grid_n: int
- grid_lims_scale: float
- eps: float
- split_method: str
- test_fraction: float
- random_state: int
- butina_distance_cutoff: float
- mcc_prob_threshold: float
- morgan_radius: int
- morgan_n_bits: int
- class psma.PsmaResult(indices, similarity, distance, coords, labels, grid, projection, prob_test, metrics, params, artifacts_dir, frames)
Bases: object
Top-level typed result returned by the PSMA workflow.
- Parameters:
indices (PsmaIndices)
similarity (PsmaSimilarity)
distance (PsmaDistance)
coords (PsmaCoordinates)
labels (PsmaLabels)
grid (PsmaGrid)
projection (ProjectionDiagnostics)
prob_test (ndarray)
metrics (PsmaMetrics)
params (PsmaParams)
artifacts_dir (Path | None)
frames (PsmaFrames)
- indices: PsmaIndices
- similarity: PsmaSimilarity
- distance: PsmaDistance
- coords: PsmaCoordinates
- labels: PsmaLabels
- projection: ProjectionDiagnostics
- prob_test: ndarray
- metrics: PsmaMetrics
- params: PsmaParams
- artifacts_dir: Path | None
- frames: PsmaFrames
- psma.compute_psma_surface(df, *, y_col, label_threshold, label_direction, similarity_method='rdkit_morgan_tanimoto', embed_method='pcoa', smiles_col='smiles', fp_col=None, emb_col=None, mol_id_col='mol_id', triples_df=None, distance_k=0.382, tsne_perplexity=30, tsne_max_iter=500, grid_n=128, grid_lims_scale=2.0, eps=1e-12, split_method='random', test_fraction=0.2, random_state=42, butina_distance_cutoff=0.4, mcc_prob_threshold=0.5, morgan_radius=2, morgan_n_bits=2048)
Run the PSMA computation path without writing artifacts.
- Parameters:
df (DataFrame) – Input dataframe containing endpoint and feature columns.
y_col (str) – Continuous endpoint column name.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
similarity_method (str) – Similarity backend name.
embed_method (str) – Embedding backend name.
smiles_col (str | None) – Optional SMILES column for RDKit similarity.
fp_col (str | None) – Optional precomputed fingerprint column.
emb_col (str | None) – Optional embedding column for cosine similarity.
mol_id_col (str | None) – Optional molecule identifier column.
triples_df (DataFrame | None) – Optional triples dataframe for imported similarities.
distance_k (float) – Convex distance-transform parameter.
tsne_perplexity (float) – t-SNE perplexity when that backend is used.
tsne_max_iter (int) – t-SNE iteration cap when that backend is used.
grid_n (int) – Number of grid steps per axis.
grid_lims_scale (float) – Scale factor for the posterior grid bounds.
eps (float) – Numerical stabilizer used in posterior scoring.
split_method (str) – Split strategy, either 'random' or 'butina'.
test_fraction (float) – Fraction of compounds assigned to the test set.
random_state (int) – Random seed used across the workflow.
butina_distance_cutoff (float) – Distance threshold used by Butina clustering.
mcc_prob_threshold (float) – Probability threshold used for MCC.
morgan_radius (int) – Morgan fingerprint radius for RDKit similarity.
morgan_n_bits (int) – Morgan fingerprint size for RDKit similarity.
- Returns:
Typed PSMA result payload containing workflow outputs, diagnostics, parameters, and data frames.
- Return type:
PsmaResult
- psma.run_psma_surface(df, *, y_col, label_threshold, label_direction, similarity_method='rdkit_morgan_tanimoto', embed_method='pcoa', smiles_col='smiles', fp_col=None, emb_col=None, mol_id_col='mol_id', triples_df=None, distance_k=0.382, tsne_perplexity=30, tsne_max_iter=500, grid_n=128, grid_lims_scale=2.0, eps=1e-12, split_method='random', test_fraction=0.2, random_state=42, butina_distance_cutoff=0.4, mcc_prob_threshold=0.5, morgan_radius=2, morgan_n_bits=2048, output_dir=None)
Run the PSMA workflow and optionally persist standard artifacts.
- Parameters:
df (DataFrame) – Input dataframe containing endpoint and feature columns.
y_col (str) – Continuous endpoint column name.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
similarity_method (str) – Similarity backend name.
embed_method (str) – Embedding backend name.
smiles_col (str | None) – Optional SMILES column for RDKit similarity.
fp_col (str | None) – Optional precomputed fingerprint column.
emb_col (str | None) – Optional embedding column for cosine similarity.
mol_id_col (str | None) – Optional molecule identifier column.
triples_df (DataFrame | None) – Optional triples dataframe for imported similarities.
distance_k (float) – Convex distance-transform parameter.
tsne_perplexity (float) – t-SNE perplexity when that backend is used.
tsne_max_iter (int) – t-SNE iteration cap when that backend is used.
grid_n (int) – Number of grid steps per axis.
grid_lims_scale (float) – Scale factor for the posterior grid bounds.
eps (float) – Numerical stabilizer used in posterior scoring.
split_method (str) – Split strategy, either 'random' or 'butina'.
test_fraction (float) – Fraction of compounds assigned to the test set.
random_state (int) – Random seed used across the workflow.
butina_distance_cutoff (float) – Distance threshold used by Butina clustering.
mcc_prob_threshold (float) – Probability threshold used for MCC.
morgan_radius (int) – Morgan fingerprint radius for RDKit similarity.
morgan_n_bits (int) – Morgan fingerprint size for RDKit similarity.
output_dir (str | Path | None) – Optional output directory. When provided, standard artifacts are written to disk.
- Returns:
Typed PSMA result payload. The artifacts_dir field is set only when artifact writing is requested.
- Return type:
PsmaResult
Submodules
psma.cli module
Command-line interface for reproducible PSMA workflow runs.
- psma.cli.main(argv=None)
Run the PSMA command-line interface.
- Parameters:
argv (Sequence[str] | None) – Optional argument vector excluding the executable name.
- Returns:
Process exit code where 0 indicates success.
- Return type:
int
psma.distance module
Similarity-to-distance transform utilities.
- psma.distance.similarity_to_distance(similarity, *, distance_k=0.382)
Convert similarities to convex-transformed distances.
- Parameters:
similarity (ndarray) – Similarity matrix or rectangular similarity block.
distance_k (float) – Convex transform parameter.
- Returns:
Distance matrix or block with the same shape as the input.
- Return type:
ndarray
psma.embed module
2D embedding utilities for PSMA workflows.
- psma.embed.pcoa_embed(distance_matrix, *, n_components=2)
Compute a classical multidimensional scaling embedding.
- Parameters:
distance_matrix (ndarray) – Square distance matrix for the training set.
n_components (int) – Number of embedding dimensions to keep.
- Returns:
Embedded coordinates with shape (n_samples, n_components).
- Return type:
ndarray
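As an illustration of what a classical multidimensional scaling (PCoA) embedding involves, the following is a minimal NumPy sketch of the textbook algorithm. It is not taken from the package source; the function name `classical_mds` and the sample points are hypothetical.

```python
import numpy as np


def classical_mds(distance_matrix: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Textbook classical MDS sketch (hypothetical helper, not psma.embed.pcoa_embed)."""
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues, which can occur for non-Euclidean distances.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Four corners of a 3 x 4 rectangle; their distances are exactly Euclidean in 2D.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For Euclidean input distances the embedding reproduces the pairwise distances up to rotation and reflection, which is why `recovered` matches `dist` here.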
- psma.embed.tsne_embed(distance_matrix, *, perplexity=30.0, max_iter=500, random_state=42)
Compute a 2D t-SNE embedding from precomputed distances.
- Parameters:
distance_matrix (ndarray) – Square precomputed distance matrix.
perplexity (float) – t-SNE perplexity parameter.
max_iter (int) – Maximum number of t-SNE iterations.
random_state (int) – Random seed for reproducible fitting.
- Returns:
Embedded coordinates with shape (n_samples, 2).
- Return type:
ndarray
- psma.embed.embed_train(distance_matrix, *, method, tsne_perplexity, tsne_max_iter, random_state)
Dispatch the requested embedding backend for training distances.
- Parameters:
distance_matrix (ndarray) – Square training distance matrix.
method (str) – Embedding backend name.
tsne_perplexity (float) – t-SNE perplexity used when method='tsne'.
tsne_max_iter (int) – t-SNE iteration cap used when method='tsne'.
random_state (int) – Random seed for embedding backends that use one.
- Returns:
Embedded training coordinates.
- Raises:
ValueError – If method is not a supported embedding backend.
- Return type:
ndarray
psma.io module
Input/output helpers for PSMA surface workflows.
- psma.io.ensure_output_dir(base_dir='data/processed/psma_surface')
Create and return the output directory.
- Parameters:
base_dir (str | Path) – Relative or absolute path to the output directory.
- Returns:
Created output directory path.
- Return type:
Path
- psma.io.validate_input_dataframe(df, *, y_col, smiles_col, fp_col, emb_col, similarity_method)
Validate dataframe columns required by the selected similarity backend.
- Parameters:
df (DataFrame) – Input dataframe for one PSMA workflow run.
y_col (str) – Name of the continuous endpoint column.
smiles_col (str | None) – Name of the SMILES column when using RDKit similarity.
fp_col (str | None) – Name of the fingerprint column when fingerprints are precomputed.
emb_col (str | None) – Name of the embedding column for cosine similarity.
similarity_method (str) – Selected similarity backend name.
- Raises:
ValueError – If the selected backend is unknown or required columns are missing.
- Return type:
None
- psma.io.extract_core_arrays(df, *, y_col, mol_id_col, smiles_col, fp_col, emb_col)
Extract aligned arrays used by the modeling pipeline.
- Parameters:
df (DataFrame) – Input dataframe for one PSMA workflow run.
y_col (str) – Name of the continuous endpoint column.
mol_id_col (str | None) – Optional molecule identifier column.
smiles_col (str | None) – Optional SMILES column.
fp_col (str | None) – Optional precomputed fingerprint column.
emb_col (str | None) – Optional embedding column.
- Returns:
Dictionary of aligned core arrays used across the workflow.
- Raises:
ValueError – If the embedding column does not contain 1D vectors.
- Return type:
dict[str, Any]
- psma.io.save_json(path, payload)
Save a JSON payload with stable formatting.
- Parameters:
path (Path) – Target JSON file path.
payload (dict[str, Any]) – JSON-serializable mapping to write.
- Return type:
None
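The reference does not say what "stable formatting" means in detail. One common reading, sketched below with the standard library only, is sorted keys plus fixed indentation so that repeated runs produce byte-identical files. The helper name `save_json_sketch` and the choice of `indent=2` are assumptions, not the package's behavior.

```python
import json
import tempfile
from pathlib import Path


def save_json_sketch(path: Path, payload: dict) -> None:
    """Write JSON with sorted keys and fixed indentation (assumed conventions)."""
    text = json.dumps(payload, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "params.json"
    # Keys are given out of order on purpose; the file stores them sorted.
    save_json_sketch(target, {"grid_n": 128, "distance_k": 0.382})
    written = target.read_text(encoding="utf-8")
```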
- psma.io.save_csv(path, frame)
Save a dataframe to CSV without an index column.
- Parameters:
path (Path) – Target CSV file path.
frame (DataFrame) – Dataframe to persist.
- Return type:
None
- psma.io.write_psma_artifacts(results, *, output_dir)
Write standard PSMA artifacts for a computed result payload.
- Parameters:
results (PsmaResult) – Computed PSMA result payload containing frames, grids, params, and metrics.
output_dir (str | Path) – Relative or absolute target directory for persisted artifacts.
- Returns:
Created output directory containing the written artifacts.
- Return type:
Path
psma.kde module
Kernel density estimation helpers for PSMA posterior surfaces.
- class psma.kde.KDEResult
Bases: TypedDict
Typed output from class-conditional KDE fitting.
- hx: float
- hy: float
- kde_pos: KernelDensity
- kde_neg: KernelDensity
- c_train_scaled: ndarray
- class psma.kde.DensityGrid
Bases: TypedDict
Typed output for evaluated density grids.
- grid_x: ndarray
- grid_y: ndarray
- pos_density_z: ndarray
- neg_density_z: ndarray
- psma.kde.fit_class_kdes(c_train, y_train_bin)
Fit class-conditional KDEs in bandwidth-scaled coordinate space.
- Parameters:
c_train (ndarray) – Embedded training coordinates with shape (n_samples, 2).
y_train_bin (ndarray) – Binary training labels aligned to c_train.
- Returns:
Fitted KDE models, bandwidths, and scaled coordinates.
- Raises:
ValueError – If the training labels do not contain both a positive and a negative class.
- Return type:
KDEResult
- psma.kde.evaluate_density_grid(kde_pos, kde_neg, c_train, *, hx, hy, grid_n, grid_lims_scale)
Evaluate class density surfaces on a regular 2D grid.
- Parameters:
kde_pos (KernelDensity) – KDE fitted on positive-class training coordinates.
kde_neg (KernelDensity) – KDE fitted on negative-class training coordinates.
c_train (ndarray) – Embedded training coordinates used to define grid limits.
hx (float) – Horizontal bandwidth used to scale coordinates.
hy (float) – Vertical bandwidth used to scale coordinates.
grid_n (int) – Number of grid steps per axis.
grid_lims_scale (float) – Scale factor applied to the training-coordinate bounding box.
- Returns:
Evaluated positive and negative density grids.
- Return type:
DensityGrid
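To make the roles of grid_n and grid_lims_scale concrete, here is a minimal sketch of building a regular 2D grid from training coordinates. It assumes the bounding box is expanded symmetrically around its center by grid_lims_scale; the reference does not confirm that this is how the package applies the factor, and `make_grid` is a hypothetical name.

```python
import numpy as np


def make_grid(c_train: np.ndarray, grid_n: int = 128, grid_lims_scale: float = 2.0):
    """Regular grid over a scaled bounding box (assumed scaling rule)."""
    lo = c_train.min(axis=0)
    hi = c_train.max(axis=0)
    center = (lo + hi) / 2.0
    # Half-widths of the bounding box, stretched by the scale factor.
    half = (hi - lo) / 2.0 * grid_lims_scale
    xs = np.linspace(center[0] - half[0], center[0] + half[0], grid_n)
    ys = np.linspace(center[1] - half[1], center[1] + half[1], grid_n)
    # meshgrid yields two (grid_n, grid_n) arrays of x and y coordinates.
    return np.meshgrid(xs, ys)


c_train_demo = np.array([[0.0, 0.0], [2.0, 4.0]])
grid_x, grid_y = make_grid(c_train_demo, grid_n=5, grid_lims_scale=2.0)
```

Each KDE would then be evaluated at the stacked `(grid_x, grid_y)` points, after the same bandwidth scaling that was used at fit time.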
psma.labels module
Label thresholding utilities for continuous endpoints.
- class psma.labels.LabelResult
Bases: TypedDict
Typed output for thresholded labels and class priors.
- y_train_bin: ndarray
- y_test_bin: ndarray
- prior_pos: float
- prior_neg: float
- psma.labels.validate_thresholded_training_labels(y_all_bin, y_train_bin, *, label_threshold, label_direction, split_method)
Validate that thresholding produced a trainable class structure.
- Parameters:
y_all_bin (ndarray) – Thresholded labels for the full dataset.
y_train_bin (ndarray) – Thresholded labels for the training split.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
split_method (str) – Split strategy used to construct the training set.
- Raises:
ValueError – If thresholding collapses the full dataset to one class or if the selected split produces a single-class training set.
- Return type:
None
- psma.labels.threshold_labels(y_train, y_test, *, label_threshold, label_direction)
Threshold continuous values into binary labels and class priors.
- Parameters:
y_train (ndarray) – Continuous training endpoint values.
y_test (ndarray) – Continuous test endpoint values.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
- Returns:
Thresholded train/test labels and class priors.
- Raises:
ValueError – If label_direction is not supported.
- Return type:
LabelResult
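The thresholding step can be sketched in a few lines of NumPy. This sketch assumes that 'le' means "positive when value <= threshold", that 'ge' means "positive when value >= threshold", and that the priors are the class fractions of the training labels. None of these details is spelled out in the reference, and `threshold_sketch` is a hypothetical name.

```python
import numpy as np


def threshold_sketch(y_train, y_test, *, label_threshold, label_direction):
    """Binarize continuous endpoints and derive training priors (assumed semantics)."""
    if label_direction not in ("le", "ge"):
        raise ValueError(f"Unsupported label_direction: {label_direction!r}")

    def to_bin(values):
        values = np.asarray(values, dtype=float)
        if label_direction == "le":
            return (values <= label_threshold).astype(int)
        return (values >= label_threshold).astype(int)

    y_train_bin = to_bin(y_train)
    y_test_bin = to_bin(y_test)
    # Priors are taken from the training split only.
    prior_pos = float(y_train_bin.mean())
    return {
        "y_train_bin": y_train_bin,
        "y_test_bin": y_test_bin,
        "prior_pos": prior_pos,
        "prior_neg": 1.0 - prior_pos,
    }


labels = threshold_sketch(
    [1.0, 5.0, 7.0, 9.0], [4.0, 6.0], label_threshold=5.0, label_direction="le"
)
```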
psma.main module
Main orchestration entrypoint for PSMA surface modeling.
- psma.main.compute_psma_surface(df, *, y_col, label_threshold, label_direction, similarity_method='rdkit_morgan_tanimoto', embed_method='pcoa', smiles_col='smiles', fp_col=None, emb_col=None, mol_id_col='mol_id', triples_df=None, distance_k=0.382, tsne_perplexity=30, tsne_max_iter=500, grid_n=128, grid_lims_scale=2.0, eps=1e-12, split_method='random', test_fraction=0.2, random_state=42, butina_distance_cutoff=0.4, mcc_prob_threshold=0.5, morgan_radius=2, morgan_n_bits=2048)
Run the PSMA computation path without writing artifacts.
- Parameters:
df (DataFrame) – Input dataframe containing endpoint and feature columns.
y_col (str) – Continuous endpoint column name.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
similarity_method (str) – Similarity backend name.
embed_method (str) – Embedding backend name.
smiles_col (str | None) – Optional SMILES column for RDKit similarity.
fp_col (str | None) – Optional precomputed fingerprint column.
emb_col (str | None) – Optional embedding column for cosine similarity.
mol_id_col (str | None) – Optional molecule identifier column.
triples_df (DataFrame | None) – Optional triples dataframe for imported similarities.
distance_k (float) – Convex distance-transform parameter.
tsne_perplexity (float) – t-SNE perplexity when that backend is used.
tsne_max_iter (int) – t-SNE iteration cap when that backend is used.
grid_n (int) – Number of grid steps per axis.
grid_lims_scale (float) – Scale factor for the posterior grid bounds.
eps (float) – Numerical stabilizer used in posterior scoring.
split_method (str) – Split strategy, either 'random' or 'butina'.
test_fraction (float) – Fraction of compounds assigned to the test set.
random_state (int) – Random seed used across the workflow.
butina_distance_cutoff (float) – Distance threshold used by Butina clustering.
mcc_prob_threshold (float) – Probability threshold used for MCC.
morgan_radius (int) – Morgan fingerprint radius for RDKit similarity.
morgan_n_bits (int) – Morgan fingerprint size for RDKit similarity.
- Returns:
Typed PSMA result payload containing workflow outputs, diagnostics, parameters, and data frames.
- Return type:
PsmaResult
- psma.main.run_psma_surface(df, *, y_col, label_threshold, label_direction, similarity_method='rdkit_morgan_tanimoto', embed_method='pcoa', smiles_col='smiles', fp_col=None, emb_col=None, mol_id_col='mol_id', triples_df=None, distance_k=0.382, tsne_perplexity=30, tsne_max_iter=500, grid_n=128, grid_lims_scale=2.0, eps=1e-12, split_method='random', test_fraction=0.2, random_state=42, butina_distance_cutoff=0.4, mcc_prob_threshold=0.5, morgan_radius=2, morgan_n_bits=2048, output_dir=None)
Run the PSMA workflow and optionally persist standard artifacts.
- Parameters:
df (DataFrame) – Input dataframe containing endpoint and feature columns.
y_col (str) – Continuous endpoint column name.
label_threshold (float) – Threshold used to define the positive class.
label_direction (str) – Threshold direction, either 'le' or 'ge'.
similarity_method (str) – Similarity backend name.
embed_method (str) – Embedding backend name.
smiles_col (str | None) – Optional SMILES column for RDKit similarity.
fp_col (str | None) – Optional precomputed fingerprint column.
emb_col (str | None) – Optional embedding column for cosine similarity.
mol_id_col (str | None) – Optional molecule identifier column.
triples_df (DataFrame | None) – Optional triples dataframe for imported similarities.
distance_k (float) – Convex distance-transform parameter.
tsne_perplexity (float) – t-SNE perplexity when that backend is used.
tsne_max_iter (int) – t-SNE iteration cap when that backend is used.
grid_n (int) – Number of grid steps per axis.
grid_lims_scale (float) – Scale factor for the posterior grid bounds.
eps (float) – Numerical stabilizer used in posterior scoring.
split_method (str) – Split strategy, either 'random' or 'butina'.
test_fraction (float) – Fraction of compounds assigned to the test set.
random_state (int) – Random seed used across the workflow.
butina_distance_cutoff (float) – Distance threshold used by Butina clustering.
mcc_prob_threshold (float) – Probability threshold used for MCC.
morgan_radius (int) – Morgan fingerprint radius for RDKit similarity.
morgan_n_bits (int) – Morgan fingerprint size for RDKit similarity.
output_dir (str | Path | None) – Optional output directory. When provided, standard artifacts are written to disk.
- Returns:
Typed PSMA result payload. The artifacts_dir field is set only when artifact writing is requested.
- Return type:
PsmaResult
psma.metrics module
Evaluation metrics for PSMA predictions.
- psma.metrics.compute_metrics(y_true, prob, *, threshold=0.5)
Compute AUC, MCC, and confusion entries for test predictions.
- Parameters:
y_true (ndarray) – Binary ground-truth test labels.
prob (ndarray) – Predicted positive-class probabilities.
threshold (float) – Probability threshold used for hard predictions.
- Returns:
Metric summary containing AUC, MCC, threshold, and confusion counts.
- Return type:
dict[str, Any]
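For readers unfamiliar with MCC, the sketch below computes confusion counts and the Matthews correlation coefficient from thresholded probabilities using the standard formula. The dictionary keys, the `>=` comparison at the threshold, and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration; `mcc_sketch` is a hypothetical name.

```python
import numpy as np


def mcc_sketch(y_true, prob, threshold=0.5):
    """Confusion counts and MCC from probabilities (assumed conventions)."""
    truth = np.asarray(y_true).astype(bool)
    pred = np.asarray(prob, dtype=float) >= threshold
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # MCC is undefined when any marginal is empty; 0.0 is used here by convention.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {
        "mcc": float(mcc),
        "threshold": threshold,
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }


summary = mcc_sketch([1, 1, 0, 0, 1, 0], [0.9, 0.4, 0.2, 0.6, 0.8, 0.1])
```

In this example two positives and two negatives are classified correctly, with one false positive and one false negative, giving an MCC of (4 - 1) / 9.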
psma.plot module
Notebook-friendly plotting helpers for PSMA surfaces.
- psma.plot.plot_posterior_2d(ax, *, grid_x, grid_y, posterior_z, pos_density_z, c_test, y_test_bin, posterior_palette='rdylgn_reversed', contour_levels=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), point_size=12, point_alpha=0.75)
Plot a 2D posterior surface with test points.
- Parameters:
ax (Axes) – Matplotlib axes used for rendering.
grid_x (ndarray) – X-axis coordinates of the posterior grid.
grid_y (ndarray) – Y-axis coordinates of the posterior grid.
posterior_z (ndarray) – Posterior probability surface values.
pos_density_z (ndarray) – Positive-class density surface values.
c_test (ndarray) – Projected test coordinates.
y_test_bin (ndarray) – Binary test labels aligned to c_test.
posterior_palette (str) – Palette selector for the posterior surface.
contour_levels (tuple[float, ...]) – Explicit posterior levels used for contour lines.
point_size (int) – Size of the projected test-point glyphs.
point_alpha (float) – Opacity for the projected test-point glyphs.
- Returns:
The input axes after rendering.
- Raises:
ValueError – If the posterior palette name is unsupported.
- Return type:
Axes
- psma.plot.plot_posterior_3d(*, grid_x, grid_y, posterior_z, c_test, prob_test, posterior_palette='rdylgn_reversed')
Plot a 3D posterior surface with scored test points.
- Parameters:
grid_x (ndarray) – X-axis coordinates of the posterior grid.
grid_y (ndarray) – Y-axis coordinates of the posterior grid.
posterior_z (ndarray) – Posterior probability surface values.
c_test (ndarray) – Projected test coordinates.
prob_test (ndarray) – Posterior probabilities for the test coordinates.
posterior_palette (str) – Palette selector for the posterior surface.
- Returns:
Tuple of the created figure and 3D axes.
- Raises:
ValueError – If the posterior palette name is unsupported.
- psma.plot.plot_posterior_2d_interactive(*, grid_x, grid_y, posterior_z, pos_density_z, c_test, y_test_bin, prob_test=None, mol_ids=None, smiles=None, width=950, height=720, title='Interactive PSMA Posterior (toggle layers via legend)', posterior_palette='rdylgn_reversed', heatmap_palette=None, contour_levels=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), contour_width=5.0, contour_alpha=1.0, contour_color=None, point_size=4, point_fill_alpha=0.5, point_line_alpha=0.8, point_line_width=0.5)
Build an interactive Bokeh version of the 2D posterior plot.
- Parameters:
grid_x (ndarray) – X-axis coordinates of the posterior grid.
grid_y (ndarray) – Y-axis coordinates of the posterior grid.
posterior_z (ndarray) – Posterior probability surface values.
pos_density_z (ndarray) – Positive-class density surface values.
c_test (ndarray) – Projected test coordinates.
y_test_bin (ndarray) – Binary test labels aligned to c_test.
prob_test (ndarray | None) – Optional posterior probabilities for the test points.
mol_ids (ndarray | None) – Optional molecule identifiers aligned to c_test.
smiles (ndarray | None) – Optional SMILES strings aligned to c_test for hover depictions.
width (int) – Width of the interactive plot in pixels.
height (int) – Height of the interactive plot in pixels.
title (str) – Figure title shown above the interactive plot.
posterior_palette (str) – Palette selector for the posterior heatmap. Use "rdylgn_reversed" for notebook-style parity or "viridis" for the previous default.
heatmap_palette (str | None) – Backwards-compatible alias for posterior_palette.
contour_levels (tuple[float, ...]) – Explicit posterior levels used for contour lines.
contour_width (float) – Width of the contour lines.
contour_alpha (float) – Opacity applied to the contour layer.
contour_color (str | None) – Optional fixed contour color. When None, contour levels are color-mapped instead.
point_size (int) – Size of the projected test-point glyphs.
point_fill_alpha (float) – Fill opacity for the projected test points.
point_line_alpha (float) – Outline opacity for the projected test points.
point_line_width (float) – Outline width for the projected test points.
- Returns:
Configured Bokeh figure object.
- Raises:
ImportError – If Bokeh is not installed or if SMILES hover depictions are requested without RDKit.
ValueError – If the heatmap palette name is unsupported.
- Return type:
Any
psma.project module
Projection utilities for mapping test points into reference embedding space.
- psma.project.project_test_coordinates(d_train: ndarray, d_test_train: ndarray, c_train: ndarray, *, return_diagnostics: Literal[False] = False) → ndarray
- psma.project.project_test_coordinates(d_train: ndarray, d_test_train: ndarray, c_train: ndarray, *, return_diagnostics: Literal[True]) → tuple[ndarray, ProjectionDiagnostics]
Project test distances into the 2D embedding using a pseudoinverse map.
- Parameters:
d_train – Square training distance matrix.
d_test_train – Rectangular test-vs-train distance block.
c_train – Embedded training coordinates.
return_diagnostics – Whether to return numerical stability diagnostics alongside projected coordinates.
- Returns:
Projected test coordinates, optionally with diagnostics.
- Raises:
ValueError – If projection inputs are malformed.
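One way to read "pseudoinverse map" is as a linear map M chosen so that d_train @ M reproduces c_train, which is then applied to the test-vs-train distance block. The sketch below follows that reading and also reports a rank and condition number similar in spirit to ProjectionDiagnostics. The exact construction used by the package is not documented here, so treat the formula and the name `project_with_pinv` as assumptions.

```python
import numpy as np


def project_with_pinv(d_train, d_test_train, c_train):
    """Out-of-sample projection via a pseudoinverse linear map (assumed form)."""
    d_train = np.asarray(d_train, dtype=float)
    # Least-squares solution of d_train @ mapping = c_train.
    mapping = np.linalg.pinv(d_train) @ np.asarray(c_train, dtype=float)
    c_test = np.asarray(d_test_train, dtype=float) @ mapping
    diagnostics = {
        "train_rank": int(np.linalg.matrix_rank(d_train)),
        "train_condition_number": float(np.linalg.cond(d_train)),
    }
    return c_test, diagnostics


c_train_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
d_train_demo = np.linalg.norm(
    c_train_demo[:, None, :] - c_train_demo[None, :, :], axis=-1
)
# Sanity check: projecting the training rows themselves returns the training coordinates.
c_back, info = project_with_pinv(d_train_demo, d_train_demo, c_train_demo)
```

A large condition number signals that small changes in the distances can move projected points substantially, which is presumably what the is_ill_conditioned flag reports.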
psma.psma module
Posterior surface calculation functions.
- psma.psma.compute_posterior(pos_density, neg_density, *, prior_pos, prior_neg, eps=1e-12)
Compute posterior probabilities from class densities and priors.
- Parameters:
pos_density (ndarray) – Positive-class density values.
neg_density (ndarray) – Negative-class density values.
prior_pos (float) – Positive-class prior probability.
prior_neg (float) – Negative-class prior probability.
eps (float) – Numerical stabilizer added to the denominator.
- Returns:
Posterior positive-class probabilities.
- Return type:
ndarray
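The parameter descriptions suggest Bayes' rule over the two class densities with eps guarding the denominator. A minimal sketch under that assumption follows; `posterior_sketch` is a hypothetical name and the exact placement of eps in the package may differ.

```python
import numpy as np


def posterior_sketch(pos_density, neg_density, *, prior_pos, prior_neg, eps=1e-12):
    """P(pos | x) = pi_pos * f_pos / (pi_pos * f_pos + pi_neg * f_neg + eps)."""
    weighted_pos = prior_pos * np.asarray(pos_density, dtype=float)
    weighted_neg = prior_neg * np.asarray(neg_density, dtype=float)
    # eps keeps the ratio finite where both densities vanish.
    return weighted_pos / (weighted_pos + weighted_neg + eps)


post = posterior_sketch(
    [0.2, 0.0, 0.1], [0.2, 0.0, 0.3], prior_pos=0.5, prior_neg=0.5
)
```

With equal priors, equal densities give a posterior near 0.5, and a point where both densities are zero evaluates to 0.0 rather than NaN.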
psma.results module
Public typed result models for PSMA workflow outputs.
- class psma.results.PsmaParams(y_col, label_threshold, label_direction, similarity_method, embed_method, distance_k, tsne_perplexity, tsne_max_iter, grid_n, grid_lims_scale, eps, split_method, test_fraction, random_state, butina_distance_cutoff, mcc_prob_threshold, morgan_radius, morgan_n_bits)
Bases: object
Stable parameter payload describing one PSMA workflow run.
- Parameters:
y_col (str)
label_threshold (float)
label_direction (str)
similarity_method (str)
embed_method (str)
distance_k (float)
tsne_perplexity (float)
tsne_max_iter (int)
grid_n (int)
grid_lims_scale (float)
eps (float)
split_method (str)
test_fraction (float)
random_state (int)
butina_distance_cutoff (float)
mcc_prob_threshold (float)
morgan_radius (int)
morgan_n_bits (int)
- y_col: str
- label_threshold: float
- label_direction: str
- similarity_method: str
- embed_method: str
- distance_k: float
- tsne_perplexity: float
- tsne_max_iter: int
- grid_n: int
- grid_lims_scale: float
- eps: float
- split_method: str
- test_fraction: float
- random_state: int
- butina_distance_cutoff: float
- mcc_prob_threshold: float
- morgan_radius: int
- morgan_n_bits: int
- class psma.results.PsmaIndices(train, test)
Bases: object
Train and test split indices for one PSMA workflow run.
- Parameters:
train (ndarray)
test (ndarray)
- train: ndarray
- test: ndarray
- class psma.results.PsmaSimilarity(s_full, s_train, s_test_train)
Bases: object
Similarity matrices produced during the PSMA workflow.
- Parameters:
s_full (ndarray)
s_train (ndarray)
s_test_train (ndarray)
- s_full: ndarray
- s_train: ndarray
- s_test_train: ndarray
- class psma.results.PsmaDistance(d_train, d_test_train)
Bases: object
Distance matrices derived from workflow similarity matrices.
- Parameters:
d_train (ndarray)
d_test_train (ndarray)
- d_train: ndarray
- d_test_train: ndarray
- class psma.results.PsmaCoordinates(c_train, c_test)
Bases: object
Embedded train and projected test coordinates.
- Parameters:
c_train (ndarray)
c_test (ndarray)
- c_train: ndarray
- c_test: ndarray
- class psma.results.PsmaLabels(y_train_bin, y_test_bin, prior_pos, prior_neg)
Bases: object
Thresholded labels and derived training priors.
- Parameters:
y_train_bin (ndarray)
y_test_bin (ndarray)
prior_pos (float)
prior_neg (float)
- y_train_bin: ndarray
- y_test_bin: ndarray
- prior_pos: float
- prior_neg: float
- class psma.results.PsmaGrid(grid_x, grid_y, pos_density_z, neg_density_z, posterior_z)
Bases: object
Evaluated density and posterior surfaces on the PSMA grid.
- Parameters:
grid_x (ndarray)
grid_y (ndarray)
pos_density_z (ndarray)
neg_density_z (ndarray)
posterior_z (ndarray)
- grid_x: ndarray
- grid_y: ndarray
- pos_density_z: ndarray
- neg_density_z: ndarray
- posterior_z: ndarray
- class psma.results.ProjectionDiagnostics(train_rank, train_condition_number, is_ill_conditioned, diagnostics)
Bases: object
Numerical diagnostics for pseudoinverse-based projection.
- Parameters:
train_rank (int)
train_condition_number (float)
is_ill_conditioned (bool)
diagnostics (list[str])
- train_rank: int
- train_condition_number: float
- is_ill_conditioned: bool
- diagnostics: list[str]
- class psma.results.PsmaMetrics(auc, auc_defined, mcc, threshold, confusion, metric_notes)
Bases: object
Summary metrics for PSMA test-set predictions.
- Parameters:
auc (float | None)
auc_defined (bool)
mcc (float)
threshold (float)
confusion (dict[str, int])
metric_notes (dict[str, str])
- auc: float | None
- auc_defined: bool
- mcc: float
- threshold: float
- confusion: dict[str, int]
- metric_notes: dict[str, str]
- class psma.results.PsmaFrames(train, test)
Bases: object
Notebook-friendly output frames for train and test compounds.
- Parameters:
train (DataFrame)
test (DataFrame)
- train: DataFrame
- test: DataFrame
- class psma.results.PsmaResult(indices, similarity, distance, coords, labels, grid, projection, prob_test, metrics, params, artifacts_dir, frames)
Bases: object
Top-level typed result returned by the PSMA workflow.
- Parameters:
indices (PsmaIndices)
similarity (PsmaSimilarity)
distance (PsmaDistance)
coords (PsmaCoordinates)
labels (PsmaLabels)
grid (PsmaGrid)
projection (ProjectionDiagnostics)
prob_test (ndarray)
metrics (PsmaMetrics)
params (PsmaParams)
artifacts_dir (Path | None)
frames (PsmaFrames)
- indices: PsmaIndices
- similarity: PsmaSimilarity
- distance: PsmaDistance
- coords: PsmaCoordinates
- labels: PsmaLabels
- projection: ProjectionDiagnostics
- prob_test: ndarray
- metrics: PsmaMetrics
- params: PsmaParams
- artifacts_dir: Path | None
- frames: PsmaFrames
psma.score module
Direct test-point posterior scoring using fitted KDEs.
- psma.score.score_test_points(c_test, *, kde_pos, kde_neg, hx, hy, prior_pos, prior_neg, eps=1e-12)
Score test points with Bayesian posterior probabilities.
- Parameters:
c_test (ndarray) – Projected test coordinates.
kde_pos (KernelDensity) – KDE fitted on positive-class training coordinates.
kde_neg (KernelDensity) – KDE fitted on negative-class training coordinates.
hx (float) – Horizontal bandwidth used to scale coordinates.
hy (float) – Vertical bandwidth used to scale coordinates.
prior_pos (float) – Positive-class prior probability.
prior_neg (float) – Negative-class prior probability.
eps (float) – Numerical stabilizer added to the denominator.
- Returns:
Posterior positive-class probabilities for the test points.
- Return type:
ndarray
psma.similarity module
Similarity matrix construction backends.
- psma.similarity.build_similarity_from_triples(*, n_samples, triples_df, mol_id_to_index, molid1_col='molid1', molid2_col='molid2', score_col='Sscore')
Build a full similarity matrix from a sparse triples table.
- Parameters:
n_samples (int) – Number of samples expected in the full matrix.
triples_df (DataFrame) – Sparse triples dataframe containing pair scores.
mol_id_to_index (dict[str, int]) – Mapping from molecule id to matrix row index.
molid1_col (str) – Name of the first molecule id column.
molid2_col (str) – Name of the second molecule id column.
score_col (str) – Name of the similarity score column.
- Returns:
Symmetric similarity matrix with unit diagonal.
- Return type:
ndarray
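The idea of expanding a sparse pair table into a symmetric matrix with a unit diagonal can be sketched without pandas. This version takes plain `(id1, id2, score)` tuples instead of a DataFrame and leaves unlisted pairs at 0.0. Both choices, and the name `similarity_from_triples`, are assumptions for illustration.

```python
import numpy as np


def similarity_from_triples(n_samples, triples, mol_id_to_index):
    """Fill a symmetric similarity matrix from (id1, id2, score) tuples."""
    s = np.zeros((n_samples, n_samples), dtype=float)
    for mol1, mol2, score in triples:
        i = mol_id_to_index[mol1]
        j = mol_id_to_index[mol2]
        # Mirror each score so the matrix stays symmetric.
        s[i, j] = s[j, i] = float(score)
    # Every molecule is fully similar to itself.
    np.fill_diagonal(s, 1.0)
    return s


index = {"m1": 0, "m2": 1, "m3": 2}
s_full = similarity_from_triples(3, [("m1", "m2", 0.8), ("m3", "m1", 0.25)], index)
```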
- psma.similarity.build_similarity_rdkit_morgan(*, smiles=None, fps=None, radius=2, n_bits=2048)
Build a full Tanimoto similarity matrix using Morgan fingerprints.
- Parameters:
smiles (list[str] | None) – Optional SMILES strings used to generate fingerprints.
fps (list[Any] | None) – Optional precomputed fingerprints.
radius (int) – Morgan fingerprint radius.
n_bits (int) – Morgan fingerprint size.
- Returns:
Full pairwise Tanimoto similarity matrix.
- Raises:
ValueError – If neither SMILES nor fingerprints are provided, or if an invalid SMILES string is encountered.
ImportError – If RDKit is required but not installed.
- Return type:
ndarray
- psma.similarity.build_similarity_embedding_cosine(embeddings, *, map_to_unit=True)
Build a similarity matrix from cosine similarities of embeddings.
- Parameters:
embeddings (ndarray) – Dense embedding matrix with one row per sample.
map_to_unit (bool) – Whether to map cosine similarity from [-1, 1] to [0, 1].
- Returns:
Pairwise similarity matrix clipped to [0, 1].
- Return type:
ndarray
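A minimal sketch of the cosine backend follows. It assumes the [-1, 1] to [0, 1] mapping is the affine rescaling (s + 1) / 2, which is the usual choice but is not confirmed by the reference; `cosine_similarity_sketch` is a hypothetical name.

```python
import numpy as np


def cosine_similarity_sketch(embeddings, map_to_unit=True):
    """Pairwise cosine similarity, optionally rescaled, then clipped to [0, 1]."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against zero-length rows before normalizing.
    unit = x / np.clip(norms, 1e-12, None)
    s = unit @ unit.T
    if map_to_unit:
        s = (s + 1.0) / 2.0  # assumed affine map from [-1, 1] to [0, 1]
    return np.clip(s, 0.0, 1.0)


emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
s_mapped = cosine_similarity_sketch(emb, map_to_unit=True)
s_clipped = cosine_similarity_sketch(emb, map_to_unit=False)
```

With the mapping enabled, orthogonal vectors score 0.5 and opposite vectors score 0.0. With it disabled, negative cosines are simply clipped to 0.0, so orthogonal and opposite vectors become indistinguishable.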
psma.split module
Train/test splitting utilities.
- psma.split.make_train_test_split(n_samples, *, split_method='random', test_fraction=0.2, random_state=42, similarity_matrix=None, butina_distance_cutoff=0.4)
Create train/test index arrays for the selected split strategy.
- Parameters:
n_samples (int) – Total number of samples available for splitting.
split_method (str) – Split strategy, either 'random' or 'butina'.
test_fraction (float) – Fraction of samples assigned to the test split.
random_state (int) – Random seed used by the splitter.
similarity_matrix (ndarray | None) – Full similarity matrix required by Butina splitting.
butina_distance_cutoff (float) – Distance threshold used by Butina clustering.
- Returns:
Sorted training and test index arrays.
- Raises:
ValueError – If split configuration is unsupported or cannot produce a non-empty train/test split.
- Return type:
tuple[ndarray, ndarray]
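The 'random' strategy can be illustrated with a seeded permutation. The sketch below covers only that case (no Butina clustering), and its rounding rule for the test-set size is an assumption; `random_split_sketch` is a hypothetical name.

```python
import numpy as np


def random_split_sketch(n_samples, test_fraction=0.2, random_state=42):
    """Seeded random split returning sorted train and test index arrays."""
    n_test = int(round(n_samples * test_fraction))
    if n_test < 1 or n_test >= n_samples:
        raise ValueError("Split would leave the train or test set empty.")
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_samples)
    test_idx = np.sort(perm[:n_test])
    train_idx = np.sort(perm[n_test:])
    return train_idx, test_idx


train_idx, test_idx = random_split_sketch(10, test_fraction=0.2, random_state=42)
train_again, test_again = random_split_sketch(10, test_fraction=0.2, random_state=42)
```

Reusing the same seed reproduces the same partition, which is the property a workflow-wide random_state parameter is meant to provide.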