API
- add_descriptors(examples, descriptor_type='MACCS', mols=None)
Add descriptors to passed examples
- Parameters:
examples (List[Example]) – List of examples
descriptor_type (str) – Kind of descriptors to return; choose between ‘Classic’, ‘ECFP’, or ‘MACCS’. Default is ‘MACCS’.
mols (List[Any]) – Can be used if you already have RDKit Mols computed.
- Return type:
List[Example]
- Returns:
List of examples with added descriptors
- cf_explain(examples, nmols=3, filter_nondrug=None)
From given Examples, find closest counterfactuals (see Getting Started)
- Parameters:
examples (List[Example]) – Output from sample_space()
nmols (int) – Desired number of molecules
filter_nondrug (Optional[bool]) – Whether or not to filter out non-drug molecules. Default is True if input passes filter
- Return type:
List[Example]
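The idea behind cf_explain() can be illustrated with a minimal sketch. It assumes a classification setting, that the base molecule is the first item in the list, and that "closest" means highest Tanimoto similarity. ToyExample and closest_counterfactuals are hypothetical names used only for this illustration; the library function may do more than this, for example drug-likeness filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyExample:
    """Hypothetical stand-in for Example with only the fields used here."""
    smiles: str
    similarity: float  # Tanimoto similarity relative to base
    yhat: float        # model output, treated as a class label in this sketch
    is_origin: bool = False


def closest_counterfactuals(examples: List[ToyExample], nmols: int = 3) -> List[ToyExample]:
    """Return the base followed by the nmols most similar examples with a different label."""
    base = examples[0]  # assumption: the base molecule comes first
    flipped = [e for e in examples[1:] if e.yhat != base.yhat]
    flipped.sort(key=lambda e: e.similarity, reverse=True)
    return [base] + flipped[:nmols]


space = [
    ToyExample("CCO", 1.0, 1, is_origin=True),
    ToyExample("CCN", 0.6, 0),
    ToyExample("CCC", 0.8, 1),
    ToyExample("CCCl", 0.4, 0),
    ToyExample("CO", 0.7, 0),
]
cfs = closest_counterfactuals(space, nmols=2)
print([e.smiles for e in cfs])
```

In this toy data, "CCC" is the most similar molecule but keeps the base label, so it is not a counterfactual.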
- check_multiple_aromatic_rings(mol)
- clear_descriptors(examples)
Clears all descriptors from examples
- get_basic_alphabet()
Returns set of interpretable SELFIES tokens.
Generated by removing P and most ionization states from selfies.get_semantic_robust_alphabet()
- Return type:
Set[str]
- Returns:
Set of interpretable SELFIES tokens
- lime_explain(examples, descriptor_type='MACCS', return_beta=True)
From given Examples, find descriptor t-statistics (see index)
- Parameters:
examples (List[Example]) – Output from sample_space()
descriptor_type (str) – Desired descriptors; choose from ‘Classic’, ‘ECFP’, or ‘MACCS’
return_beta (bool) – Whether or not the function should return regression coefficient values
- merge_text_explains(*args, filter=None)
Merge multiple text explanations into one and sort.
- Return type:
List[Tuple[str, float]]
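A minimal sketch of merging and sorting (text, value) explanations follows. It assumes that sorting is by absolute value in descending order and that the filter argument acts as a magnitude threshold; neither detail is stated above. merge_explanations, its threshold parameter, and the explanation strings are hypothetical.

```python
from typing import List, Optional, Tuple

Explanation = Tuple[str, float]


def merge_explanations(*args: List[Explanation],
                       threshold: Optional[float] = None) -> List[Explanation]:
    """Concatenate explanation lists, optionally drop weak ones, sort by magnitude."""
    merged = [item for explanations in args for item in explanations]
    if threshold is not None:
        # Assumption: the filter keeps entries whose magnitude reaches the threshold
        merged = [item for item in merged if abs(item[1]) >= threshold]
    return sorted(merged, key=lambda item: abs(item[1]), reverse=True)


# Placeholder explanations, e.g. one list per descriptor type
maccs = [("Is there an alcohol? Yes.", 4.2), ("Is there a halogen? No.", -1.1)]
ecfp = [("Is there an amide? Yes.", -6.3)]
merged = merge_explanations(maccs, ecfp, threshold=2.0)
print(merged)
```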
- name_morgan_bit(m, bitInfo, key)
Get the name of a Morgan bit using a SMARTS dictionary
- Parameters:
m (Any) – RDKit molecule
bitInfo (Dict[Any, Any]) – bitInfo dictionary from rdkit.Chem.AllChem.GetMorganFingerprint
key (int) – bit key corresponding to the fingerprint you want to have named
- Return type:
str
- plot_cf(exps, fig=None, figure_kwargs=None, mol_size=(200, 200), mol_fontsize=10, nrows=None, ncols=None)
Draw the given set of Examples in a grid
- Parameters:
exps (List[Example]) – Small list of Example which will be drawn
fig (Any) – Figure to plot onto
figure_kwargs (Dict) – kwargs to pass to plt.figure
mol_size (Tuple[int, int]) – size of RDKit molecule rendering, in pixels
mol_fontsize (int) – minimum font size passed to RDKit
nrows (int) – number of rows to draw in grid
ncols (int) – number of columns to draw in grid
- plot_descriptors(examples, output_file=None, fig=None, figure_kwargs=None, title=None, return_svg=False)
Plot descriptor attributions from given set of Examples.
- Parameters:
examples (List[Example]) – Output from sample_space()
output_file (str) – Output file name to save the plot; optional except for ECFP
fig (Any) – Figure to plot onto
figure_kwargs (Dict) – kwargs to pass to plt.figure
title (str) – Title for the plot
return_svg (bool) – Whether to return SVG for plot
- plot_space(examples, exps, figure_kwargs=None, mol_size=(200, 200), highlight_clusters=False, mol_fontsize=8, offset=0, ax=None, cartoon=False, rasterized=False)
Plot chemical space around example and annotate given examples.
- Parameters:
examples (List[Example]) – Large list of Example which make up points
exps (List[Example]) – Small list of Example which will be annotated
figure_kwargs (Dict) – kwargs to pass to plt.figure
mol_size (Tuple[int, int]) – size of RDKit molecule rendering, in pixels
highlight_clusters (bool) – if True, cluster indices are rendered instead of Example.yhat
mol_fontsize (int) – minimum font size passed to RDKit
offset (int) – offset annotations to allow colorbar or other elements to fit into plot
ax (Any) – axis onto which to plot
cartoon (bool) – whether to draw a cartoon outline on points
rasterized (bool) – whether to rasterize the scatter
- rcf_explain(examples, delta=(-1, 1), nmols=4, filter_nondrug=None)
From given Examples, find closest counterfactuals (see Getting Started). This version works with regression, so an example counts as a counterfactual if its prediction is higher or lower than that of the base.
- Parameters:
examples (List[Example]) – Output from sample_space()
delta (Union[Any, Tuple[float, float]]) – float or tuple of hi/lo indicating margin for what is counterfactual
nmols (int) – Desired number of molecules
filter_nondrug (Optional[bool]) – Whether or not to filter out non-drug molecules. Default is True if input passes filter
- Return type:
List[Example]
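The role of delta in rcf_explain() can be sketched as a margin test on the difference between each prediction and the base prediction. This is only an illustration under stated assumptions: comparisons are inclusive, and a scalar delta is treated as a symmetric margin. split_by_delta is a hypothetical helper, not part of the library.

```python
from typing import List, Sequence, Tuple, Union


def split_by_delta(base_yhat: float,
                   yhats: Sequence[float],
                   delta: Union[float, Tuple[float, float]] = (-1, 1)
                   ) -> Tuple[List[int], List[int]]:
    """Return indices of predictions lying below / above the base by the margin."""
    if isinstance(delta, (int, float)):
        # Assumption: a single number means the same margin in both directions
        delta = (-abs(delta), abs(delta))
    lo, hi = min(delta), max(delta)
    lower = [i for i, y in enumerate(yhats) if y - base_yhat <= lo]
    higher = [i for i, y in enumerate(yhats) if y - base_yhat >= hi]
    return lower, higher


lower, higher = split_by_delta(2.0, [2.0, 3.5, 0.5, 2.4, 1.0])
print(lower, higher)
```

Predictions that differ from the base by less than the margin (here 2.0 and 2.4) fall in neither group and would not be treated as counterfactuals.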
- run_chemed(origin_smiles, num_samples, similarity=0.1, fp_type='ECFP4', _pbar=None)
This method is similar to STONED but works by querying PubChem
- Parameters:
origin_smiles (str) – Base SMILES
num_samples (int) – Minimum number of returned molecules. May return fewer due to network timeout or exhausting tree
similarity (float) – Tanimoto similarity to use in query (float between 0 and 1)
fp_type (str) – Fingerprint type
- Return type:
Tuple[List[str], List[float]]
- Returns:
SMILES and SCORES
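For readers unfamiliar with the similarity parameter, Tanimoto similarity between two fingerprints is the number of shared on-bits divided by the number of on-bits in either. The sketch below uses plain Python sets of bit indices rather than real fingerprints; the bit values are invented, and returning 0.0 for two empty sets is a convention chosen only for this sketch.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention for this sketch only
    return len(bits_a & bits_b) / union


base_bits = {1, 2, 3, 4}
candidates = {
    "mol_a": {3, 4, 5},     # shares 2 of 5 distinct bits with the base
    "mol_b": {10, 11, 12},  # shares nothing with the base
}
threshold = 0.1
kept = [name for name, bits in candidates.items()
        if tanimoto(base_bits, bits) >= threshold]
print(kept)
```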
- run_custom(origin_smiles, data, fp_type='ECFP4', _pbar=None, **kwargs)
This method is similar to STONED but uses a custom dataset provided by the user
- Parameters:
origin_smiles (str) – Base SMILES
data (List[Union[str, Mol]]) – List of SMILES or RDKit molecules
fp_type (str) – Fingerprint type
- Return type:
Tuple[List[str], List[float]]
- Returns:
SMILES and SCORES
- run_stoned(start_smiles, fp_type='ECFP4', num_samples=2000, max_mutations=2, min_mutations=1, alphabet=None, return_selfies=False, _pbar=None)
Run the STONED SELFIES algorithm. Typically not used; call sample_space() instead.
- Parameters:
start_smiles (str) – SMILES string to start from
fp_type (str) – Fingerprint type
num_samples (int) – Number of total molecules to generate
max_mutations (int) – Maximum number of mutations
min_mutations (int) – Minimum number of mutations
alphabet (Union[List[str], Set[str]]) – Alphabet to use for mutations, typically from get_basic_alphabet()
return_selfies (bool) – If SELFIES should be returned as well
- Return type:
Union[Tuple[List[str], List[float]], Tuple[List[str], List[str], List[float]]]
- Returns:
SELFIES, SMILES, and SCORES generated or SMILES and SCORES generated
- sample_space(origin_smiles, f, batched=True, preset='medium', data=None, method_kwargs=None, num_samples=None, stoned_kwargs=None, quiet=False, use_selfies=False, sanitize_smiles=True)
Sample chemical space around given SMILES
This will evaluate the given function and run the run_stoned() function over chemical space around molecule. num_samples will be set to 3,000 by default if using STONED and 150 if using chemed. If using custom then num_samples will be set to the length of the data list. If using synspace then num_samples will be set to 1,000. See run_stoned() and run_chemed() for more details. synspace comes from the package synspace <https://github.com/whitead/synspace>. It generates synthetically feasible molecules from a given SMILES.
- Parameters:
origin_smiles (str) – starting SMILES
f (Union[Callable[[str, str], List[float]], Callable[[str], List[float]], Callable[[List[str], List[str]], List[float]], Callable[[List[str]], List[float]]]) – A function which takes in SMILES or SELFIES and returns predicted value. Assumed to work with lists of SMILES/SELFIES unless batched = False
batched (bool) – If f is batched
preset (str) – Can be “wide”, “medium”, “narrow”, “chemed”, “custom”, or “synspace”. Determines how far across chemical space is sampled. Try “chemed” preset to only sample PubChem compounds.
data (List[Union[str, Mol]]) – If not None and preset is “custom”, will use this data instead of generating new ones.
method_kwargs (Dict) – More control over STONED, CHEMED and CUSTOM can be set here. See run_stoned(), run_chemed() and run_custom()
num_samples (int) – Number of desired samples. Can be set in method_kwargs (overrides) or here. None means default for preset
stoned_kwargs (Dict) – Backwards compatible alias for method_kwargs
quiet (bool) – If True, will not print progress bar
use_selfies (bool) – If True, will use SELFIES instead of SMILES for f
sanitize_smiles (bool) – If True, will sanitize all SMILES
- Return type:
List[Example]
- Returns:
List of generated Example
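Two points in the sample_space() description lend themselves to a short sketch: the per-preset default for num_samples, and the difference between a batched and a non-batched model function f. The helpers below are hypothetical and only restate the defaults listed above; treating "wide", "medium", and "narrow" as the STONED presets is an assumption, and the toy model is not chemically meaningful.

```python
from typing import Callable, List, Optional, Sequence


def default_num_samples(preset: str, data: Optional[Sequence] = None) -> int:
    """Defaults as described for sample_space(); the function name is hypothetical."""
    if preset == "chemed":
        return 150
    if preset == "custom":
        if data is None:
            raise ValueError("preset 'custom' needs a data list")
        return len(data)
    if preset == "synspace":
        return 1000
    # Assumption: 'wide', 'medium', and 'narrow' are the STONED presets
    return 3000


def as_batched(f: Callable[[str], float]) -> Callable[[List[str]], List[float]]:
    """Wrap a one-molecule model so it accepts a list of SMILES."""
    def batched_f(smiles_list: List[str]) -> List[float]:
        return [f(s) for s in smiles_list]
    return batched_f


def toy_model(smiles: str) -> float:
    """Placeholder model: string length stands in for a real prediction."""
    return float(len(smiles))


batched_model = as_batched(toy_model)
print(batched_model(["CCO", "C"]))
print(default_num_samples("custom", data=["CCO", "C"]))
```

With a one-molecule function such as toy_model, the documented alternative to wrapping is to pass it directly with batched=False.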
- text_explain(examples, descriptor_type='maccs', count=5, presence_thresh=0.2, include_weak=None)
Take an example and convert t-statistics into text explanations
- Parameters:
examples (List[Example]) – Output from sample_space()
descriptor_type (str) – Type of descriptor, either “maccs” or “ecfp”.
count (int) – Number of text explanations to return
presence_thresh (float) – Threshold for presence of descriptor in examples
include_weak (Optional[bool]) – Include weak descriptors. If not set, the function will first run with this set to False and, if no descriptors are found, will set it to True and re-run
- Return type:
List[Tuple[str, float]]
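The fallback behaviour of include_weak in text_explain() can be sketched as a two-pass call: try without weak descriptors first, and retry with them only if nothing was found. explain_with_fallback and toy_explain are hypothetical names, and the explanation text is a placeholder.

```python
from typing import Callable, List, Optional, Tuple

Explanation = Tuple[str, float]


def explain_with_fallback(explain: Callable[[bool], List[Explanation]],
                          include_weak: Optional[bool] = None) -> List[Explanation]:
    """Call explain(include_weak); if include_weak is unset, try False, then True."""
    if include_weak is not None:
        return explain(include_weak)
    result = explain(False)
    if not result:
        # Nothing found among strong descriptors, so retry including weak ones
        result = explain(True)
    return result


def toy_explain(include_weak: bool) -> List[Explanation]:
    """Placeholder explainer that only finds something when weak descriptors are allowed."""
    return [("weak descriptor", 0.5)] if include_weak else []


print(explain_with_fallback(toy_explain))
```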
- text_explain_generate(text_explanations, property_name, llm_model='gpt-4o', single=True)
Insert text explanations into template, and generate explanation.
- Parameters:
text_explanations – List of text explanations.
property_name – Name of property.
llm_model – Language model to use.
single – Whether to use a prompt about a single molecule or multiple molecules.
- Return type:
str
- class Descriptors(descriptor_type, descriptors, descriptor_names, plotting_names=(), tstats=())
Molecular descriptors
- descriptor_names: tuple
- descriptor_type: str
Descriptor type
- descriptors: tuple
Descriptor values
- plotting_names: tuple = ()
- tstats: tuple = ()
- class Example(smiles, selfies, similarity, yhat, index, position=<factory>, is_origin=False, cluster=0, label=None, descriptors=None)
Example of a molecule
- cluster: int = 0
Index of cluster, can be -1 for no cluster
- descriptors: Descriptors = None
Descriptors for this example
- index: int
Index relative to other examples
- is_origin: bool = False
True if base
- label: str = None
Label for this example
- selfies: str
SELFIES for molecule, as output from selfies.encoder()
- similarity: float
Tanimoto similarity relative to base
- smiles: str
SMILES string for molecule
- yhat: float
Output of model function
- insert_svg(exps, mol_size=(200, 200), mol_fontsize=10)
Replace rasterized image files with SVG versions of molecules
- Parameters:
exps (List[Example]) – The molecules for which images should be replaced. Typically just counterfactuals or some small set
mol_size (Tuple[int, int]) – If mol_size was specified, it needs to be re-specified here
- Return type:
str
- Returns:
SVG string that can be saved or displayed in a Jupyter notebook
- moldiff(template, query)
Compare the two rdkit molecules.
- Parameters:
template – template molecule
query – query molecule
- Return type:
Tuple[List[int], List[int]]
- Returns:
list of modified atoms in query, list of modified bonds in query
- plot_space_by_fit(examples, exps, beta, mol_size=(200, 200), mol_fontsize=8, offset=0, ax=None, figure_kwargs=None, cartoon=False, rasterized=False)
Plot chemical space around example by LIME fit and annotate given examples. Adapted from plot_space().
- Parameters:
examples (List[Example]) – Large list of Example which make up points
exps (List[Example]) – Small list of Example which will be annotated
beta (List) – beta output from lime_explain()
mol_size (Tuple[int, int]) – size of RDKit molecule rendering, in pixels
mol_fontsize (int) – minimum font size passed to RDKit
offset (int) – offset annotations to allow colorbar or other elements to fit into plot
ax (Any) – axis onto which to plot
figure_kwargs (Dict) – kwargs to pass to plt.figure
cartoon (bool) – whether to draw a cartoon outline on points
rasterized (bool) – whether to rasterize the scatter
- similarity_map_using_tstats(example, mol_size=(300, 200), return_svg=False)
Create similarity map for example molecule using descriptor t-statistics. Only works for ECFP descriptors
- Parameters:
example (Example) – Example object
mol_size (Tuple[int, int]) – size of molecule image
return_svg (bool) – return SVG instead of saving to file
- Return type:
Optional[str]
- Returns:
SVG if return_svg is True, else None
- trim(im)
Implementation of whitespace trim
credit: https://stackoverflow.com/a/10616717
- Parameters:
im – PIL image
- Returns:
PIL image