Tutorial
We’ll show here how to explain molecular property prediction tasks without access to the gradients or any properties of a molecule. To set up this activity, we need a black box model. We’ll use something simple here: the model is a classifier that says if a molecule has an alcohol group (1) or not (0). Let’s implement this model first.
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
# set-up rdkit drawing preferences
IPythonConsole.ipython_useSVG = True
IPythonConsole.drawOptions.drawMolsSameScale = False
def model(smiles):
    # black box: returns 1 if the molecule contains an O-H group, else 0
    mol = Chem.MolFromSmiles(smiles)
    match = mol.GetSubstructMatches(Chem.MolFromSmarts("[O;!H0]"))
    return 1 if match else 0
Let’s now try it out on some molecules:
smi = "CCCCCCO"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 1
smi = "OCCCCCCO"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 1
smi = "c1ccccc1"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 0
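Note that the SMARTS pattern [O;!H0] matches any oxygen with at least one hydrogen, so phenols and carboxylic acids count as “alcohols” too. As a quick sanity check, we can run the model over a few extra molecules (an illustrative batch, not part of the original tutorial):

for s in ["CCO", "CC(=O)O", "c1ccccc1O", "CCC"]:
    print(s, model(s))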
Counterfactual explanations
Let’s now explain the model using counterfactuals, pretending we don’t know how it works:
import exmol
instance = "CCCCCCO"
space = exmol.sample_space(instance, model, batched=False)
cfs = exmol.cf_explain(space, 1)
exmol.plot_cf(cfs)
We can see that removing the alcohol is the smallest change that flips the prediction for this molecule. Let’s view the sampled chemical space and see where these counterfactuals lie:
exmol.plot_space(space, cfs)
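We can also inspect the counterfactuals programmatically. cf_explain returns a list of exmol Example objects, with the base molecule first; a minimal sketch, assuming the smiles, similarity, and yhat fields that exmol’s Example dataclass provides:

# print each example with its similarity to the base molecule
for ex in cfs:
    print(ex.smiles, round(ex.similarity, 2), "f(s) =", ex.yhat)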
Explain using substructures
Now we’ll try to explain our model using substructures.
exmol.lime_explain(space)
exmol.plot_descriptors(space)
SMARTS annotations for MACCS descriptors were created using SMARTSviewer (smartsview.zbh.uni-hamburg.de, Copyright: ZBH, Center for Bioinformatics Hamburg) developed by K. Schomburg et al. (J. Chem. Inf. Model. 2010, 50, 9, 1529–1535).
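Beyond the plot, the fitted coefficients are available as numbers: lime_explain stores its results in place on the base molecule. A small sketch, assuming space[0].descriptors carries parallel descriptor_names and tstats arrays (as in recent exmol versions):

import numpy as np

d = space[0].descriptors
# rank descriptors by absolute t-statistic, largest effect first
order = np.argsort(-np.abs(np.nan_to_num(d.tstats)))
for i in order[:5]:
    print(d.descriptor_names[i], d.tstats[i])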
This seems like a pretty clear explanation. Let’s now take a look at using substructures that are actually present in the molecule:
import skunk
exmol.lime_explain(space, descriptor_type="ECFP")
svg = exmol.plot_descriptors(space, return_svg=True)
skunk.display(svg)
svg = exmol.plot_utils.similarity_map_using_tstats(space[0], return_svg=True)
skunk.display(svg)
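Because these calls return raw SVG strings, you can also save them to disk for reuse; this is plain file I/O, nothing exmol-specific:

# write the similarity map out for embedding in a paper or webpage
with open("similarity_map.svg", "w") as f:
    f.write(svg)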
We can see that most of the model’s prediction is explained by the presence of the alcohol group, as expected.
Text
We can prepare a natural language summary of these results using exmol:
exmol.lime_explain(space, descriptor_type="ECFP")
e = exmol.text_explain(space)
for ei in e:
    print(ei[0], end="")
Is there primary alcohol? Yes and this is positively correlated with property. This is very important for the property
To prepare the natural language summary, we need to convert these statements into a prompt that a model like GPT-3 can parse. You can insert that output into a language model yourself to get a summary.
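For example, you could assemble the prompt by hand from the text_explain output. The template below is hypothetical, written for illustration rather than taken from exmol:

# join the explanation sentences into one prompt string
prompt = (
    "Here are observations about why a molecule has the property 'active':\n"
    + "".join(ei[0] for ei in e)
    + "\nSummarize these observations in one paragraph."
)
print(prompt)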
Or you can pass it directly, by installing the langchain package and setting up an OpenAI key:
print(exmol.text_explain_generate(e, property_name="active"))
Running this without credentials raises an OpenAIError: the api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable.
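If you hit that error, set a key before the call, for example via the environment variable the openai client reads (the key below is a placeholder, use your own):

import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder key
print(exmol.text_explain_generate(e, property_name="active"))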