Tutorial

We’ll show here how to explain molecular property prediction tasks without access to the gradients or any properties of a molecule. To set up this activity, we need a black box model. We’ll use something simple here: the model is a classifier that says whether a molecule has an alcohol (1) or not (0). Let’s implement this model first.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole

# set-up rdkit drawing preferences
IPythonConsole.ipython_useSVG = True
IPythonConsole.drawOptions.drawMolsSameScale = False


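def model(smiles):
    # parse the SMILES string into an RDKit molecule
    mol = Chem.MolFromSmiles(smiles)
    # [O;!H0] matches any oxygen atom that carries at least one hydrogen
    match = mol.GetSubstructMatches(Chem.MolFromSmarts("[O;!H0]"))
    # 1 if at least one such oxygen is present, otherwise 0
    return 1 if match else 0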

Let’s now try it out on some molecules:

smi = "CCCCCCO"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 1
smi = "OCCCCCCO"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 1
smi = "c1ccccc1"
print("f(s)", model(smi))
Chem.MolFromSmiles(smi)
f(s) 0
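The molecules above are all valid SMILES, but a black box model may also be handed strings that do not parse. The following is a minimal sketch, not part of the tutorial's model: `safe_model` and `HYDROXYL` are hypothetical names, and returning `None` for unparsable input is an assumption about how you might want to handle that case. It also shows that the SMARTS pattern matches any oxygen bearing a hydrogen, so the O-H of a carboxylic acid counts as well.

```python
from rdkit import Chem

# Same SMARTS as the tutorial's model: an oxygen with at least one hydrogen
HYDROXYL = Chem.MolFromSmarts("[O;!H0]")


def safe_model(smiles):
    """Hypothetical variant of model() that returns None for unparsable SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when parsing fails
        return None
    return 1 if mol.HasSubstructMatch(HYDROXYL) else 0


print(safe_model("CCCCCCO"))   # hexanol
print(safe_model("c1ccccc1"))  # benzene
print(safe_model("CC(=O)O"))   # acetic acid: its O-H also matches the pattern
print(safe_model("C1CC"))      # unclosed ring, not a valid SMILES
```

Using `HasSubstructMatch` instead of `GetSubstructMatches` is a small design choice: the model only needs to know whether a match exists, not where.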

Counterfactual explanations

Let’s now explain the model using counterfactuals, pretending we don’t know how it works:

import exmol

instance = "CCCCCCO"
space = exmol.sample_space(instance, model, batched=False)
cfs = exmol.cf_explain(space, 1)
exmol.plot_cf(cfs)
[Figure: output of exmol.plot_cf(cfs), the instance molecule next to its counterfactuals]
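Conceptually, a counterfactual is a molecule close to the instance whose prediction differs from the instance's. The sketch below illustrates only that selection idea on hand-written toy data; `Candidate` and `nearest_counterfactuals` are hypothetical names, the similarity values are made up, and this is not a description of how exmol implements `cf_explain`.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    smiles: str
    similarity: float  # similarity to the instance, 1.0 means identical
    prediction: int    # output of the black box model


def nearest_counterfactuals(
    instance_pred: int, candidates: List[Candidate], n: int = 2
) -> List[Candidate]:
    """Return the n most similar candidates whose prediction differs from the instance's."""
    flipped = [c for c in candidates if c.prediction != instance_pred]
    return sorted(flipped, key=lambda c: c.similarity, reverse=True)[:n]


# Toy values for illustration only (not computed by exmol or RDKit)
toy_space = [
    Candidate("CCCCCCO", 1.00, 1),  # the instance itself
    Candidate("CCCCCCCO", 0.80, 1),
    Candidate("CCCCCC", 0.60, 0),
    Candidate("CCCCCCN", 0.55, 0),
    Candidate("c1ccccc1", 0.05, 0),
]

toy_cfs = nearest_counterfactuals(1, toy_space)
print([c.smiles for c in toy_cfs])
```

In this toy space the two selected molecules are the ones that lose or replace the hydroxyl group while staying close to the instance.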

We can see that removing the alcohol is the smallest change that flips the prediction for this molecule. Let’s see the space and look at where these counterfactuals are.

exmol.plot_space(space, cfs)
[Figure: output of exmol.plot_space(space, cfs), the sampled space with the counterfactuals marked]

Explain using substructures

Now we’ll try to explain our model using substructures.

exmol.lime_explain(space)
exmol.plot_descriptors(space)
SMARTS annotations for MACCS descriptors were created using SMARTSviewer (smartsview.zbh.uni-hamburg.de, Copyright: ZBH, Center for Bioinformatics Hamburg) developed by K. Schomburg et. al. (J. Chem. Inf. Model. 2010, 50, 9, 1529–1535)
[Figure: output of exmol.plot_descriptors(space), the descriptor attributions]

This seems like a pretty clear explanation. Let’s now look at substructures that are present in the molecule itself:

import skunk

exmol.lime_explain(space, descriptor_type="ECFP")
svg = exmol.plot_descriptors(space, return_svg=True)
skunk.display(svg)
svg = exmol.plot_utils.similarity_map_using_tstats(space[0], return_svg=True)
skunk.display(svg)

We can see that most of the model’s behavior is explained by the presence of the alcohol group, as expected.
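The idea behind this kind of explanation is a local surrogate: fit a simple linear model to the black box outputs over the sampled molecules, weighting each sample by its similarity to the instance. The sketch below is a one-descriptor-at-a-time simplification on hand-written toy data; `weighted_slope` is a hypothetical helper, and exmol's actual fitting procedure and statistics may differ.

```python
from typing import Sequence


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Slope b of the weighted least-squares line y ~ a + b*x."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    return cov / var if var > 0 else 0.0


# Toy data, one entry per sampled molecule (made up, not produced by exmol)
has_hydroxyl = [1, 1, 0, 0, 1, 0]  # descriptor: hydroxyl substructure present?
has_ring = [0, 1, 0, 1, 1, 0]      # an unrelated descriptor
prediction = [1, 1, 0, 0, 1, 0]    # black box output f(s)
similarity = [1.0, 0.7, 0.6, 0.3, 0.2, 0.5]  # weight: closeness to the instance

print("hydroxyl:", weighted_slope(has_hydroxyl, prediction, similarity))
print("ring:", weighted_slope(has_ring, prediction, similarity))
```

Because the toy prediction equals the hydroxyl descriptor, that descriptor gets a slope close to 1, while the ring descriptor gets a much smaller one.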

Text

We can prepare a natural language summary of these results using exmol:

exmol.lime_explain(space, descriptor_type="ECFP")
e = exmol.text_explain(space)
for ei in e:
    print(ei[0], end="")
Is there primary alcohol? Yes and this is positively correlated with property. This is very important for the property

To prepare the natural language summary, we need to convert these statements into a prompt that a model like GPT-3 can parse. Insert the output above into a language model to get a summary.
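As a rough sketch of what such a prompt could look like, the snippet below joins the explanation sentences into one string. `build_prompt` and its wording are hypothetical and are not the template exmol uses; the only thing taken from the tutorial is that each item of `e` holds its text at index 0.

```python
def build_prompt(text_explanations, property_name):
    """Join explanation sentences into a single prompt (hypothetical template)."""
    facts = "".join(item[0] for item in text_explanations).strip()
    return (
        "The following are observations about what affects the property "
        f"'{property_name}' of a molecule:\n{facts}\n"
        "Summarize these observations in one or two sentences."
    )


# Stand-in for the list returned by exmol.text_explain. Only index 0 (the text)
# is used; the second field is a made-up placeholder.
e_example = [
    (
        "Is there primary alcohol? Yes and this is positively correlated "
        "with property. This is very important for the property\n",
        1.0,
    )
]
prompt = build_prompt(e_example, "active")
print(prompt)
```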

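Or you can pass it directly to a language model. This requires installing the langchain package and setting up an OpenAI API key through the OPENAI_API_KEY environment variable. Without a key, the call fails with the OpenAIError shown below.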

print(exmol.text_explain_generate(e, property_name="active"))
---------------------------------------------------------------------------
OpenAIError                               Traceback (most recent call last)
Cell In[11], line 1
----> 1 print(exmol.text_explain_generate(e, property_name="active"))

(...)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
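The error above comes from a missing `OPENAI_API_KEY`. One way to avoid the traceback is to check for the key before making the call. This is a minimal sketch; `openai_key_available` is a hypothetical helper, not part of exmol or the openai package.

```python
import os


def openai_key_available(environ=os.environ):
    """Return True if a non-empty OPENAI_API_KEY is present in the given mapping."""
    return bool(environ.get("OPENAI_API_KEY", "").strip())


if openai_key_available():
    print("Key found: the text_explain_generate call can be attempted.")
else:
    print("No OPENAI_API_KEY set: skipping the language model summary.")
```

Passing the environment as an argument keeps the check easy to test without touching the real environment variables.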