23. Modern Molecular NNs

We have seen two chapters about equivariances: Input Data & Equivariances and Equivariant Neural Networks. We have also seen one chapter on treating molecules as objects with permutation equivariance: Graph Neural Networks. We will now combine these ideas and create neural networks that can treat arbitrary molecules as point clouds with permutation equivariance. We already saw that SchNet can do this by working with an invariant point cloud representation (distances to atoms), but modern networks mix in ideas from Equivariant Neural Networks. This is a highly active research area, especially for predicting energies, forces, and relaxed structures of molecules.

Audience & Objectives

This chapter assumes you have read Input Data & Equivariances, Equivariant Neural Networks, and Graph Neural Networks. You should be able to

  • Categorize a task (features/labels) by equivariance

  • Understand body-ordered expansions

  • Differentiate models based on their message passing, message type, and body-ordering

Warning

This chapter is in progress

24. Expressiveness

The equivariant SO(3) ideas from Equivariant Neural Networks will not work on variable-sized molecules because the layers are not permutation equivariant. We also know that graph neural networks (GNNs) have permutation equivariance and, with the correct choice of edge features, rotation and translation invariance. So why go beyond GNNs?

One reason is that standard GNNs cannot distinguish certain types of graphs relevant to chemistry [Weisfeiler-Lehman Test], like decalin and bicyclopentyl, which indeed have different properties. These can be distinguished if we also have (and use) their Cartesian coordinates.
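To make this concrete, here is a minimal sketch of 1-Weisfeiler-Lehman color refinement run on the carbon skeletons of decalin and bicyclopentyl (hydrogens omitted). The function name `wl_histogram`, the adjacency lists, and the iteration count are choices made just for this illustration; the point is that both graphs end with identical color histograms, so a standard message-passing GNN without coordinates cannot tell them apart.

```python
from collections import Counter

def wl_histogram(adj, n_iter=5):
    """1-Weisfeiler-Lehman color refinement; returns a histogram of node colors.

    Standard message-passing GNNs are at most as expressive as this test:
    if two graphs get the same histogram, a GNN without coordinates
    produces the same readout for both.
    """
    colors = {v: "C" for v in adj}  # every heavy atom is a carbon, so all start identical
    for _ in range(n_iter):
        # New color = (own color, sorted multiset of neighbor colors)
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
    return Counter(colors.values())

# Carbon skeletons (hydrogens omitted) as adjacency lists
decalin = {0: [1, 5, 9], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
           5: [0, 4, 6], 6: [5, 7], 7: [6, 8], 8: [7, 9], 9: [0, 8]}
bicyclopentyl = {0: [1, 4, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3],
                 5: [0, 6, 9], 6: [5, 7], 7: [6, 8], 8: [7, 9], 9: [5, 8]}

print(wl_histogram(decalin) == wl_histogram(bicyclopentyl))  # True: 1-WL cannot distinguish them
```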

There is also a common example called the “Picasso Test”, which is that rotationally invariant image neural networks cannot tell if a human eye is rotated [].

In the end though, most work on molecular neural networks is for neural potentials. These are neural networks that predict energy and forces given atom positions and elements. We know that the force on each atom is given by

(24.1)\[\begin{equation} F\left(\vec{r}\right) = -\nabla U\left(\vec{r}\right) \end{equation}\]

where \(U\left(\vec{r}\right)\) is the rotation invariant potential given all atom positions \(\vec{r}\). So if we’re predicting a translation, rotation, and permutation invariant potential, why use equivariance? Performance. Models like SchNet or ANI are invariant and are not as accurate as models like TensorNet or Cormorant, which have equivariances in their internal layers but an invariant readout.
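To see how Eq. (24.1) is used in practice, here is a minimal sketch in JAX that differentiates a toy invariant potential to get forces. The `lj_potential` function is just a stand-in for a trained neural potential (it is not from any particular model); the key property is the same: because it depends only on interatomic distances, it is invariant to rotations, translations, and permutations, while its negative gradient (the forces) is automatically rotation equivariant.

```python
import jax
import jax.numpy as jnp

def lj_potential(positions):
    """Toy invariant potential: a Lennard-Jones-like sum over all atom pairs.

    A trained neural potential would replace this function, but the key
    property is the same: it depends only on interatomic distances, so it
    is invariant to rotations, translations, and permutations of atoms.
    """
    n = positions.shape[0]
    diffs = positions[:, None, :] - positions[None, :, :]
    r2 = jnp.sum(diffs**2, axis=-1) + jnp.eye(n)     # pad the diagonal to avoid dividing by zero
    inv6 = 1.0 / r2**3
    return jnp.sum(jnp.triu(inv6**2 - inv6, k=1))    # count each pair once

# Eq. (24.1): the force on every atom is the negative gradient of the potential
forces = jax.grad(lambda r: -lj_potential(r))

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.1, 0.0, 0.0],
                       [0.0, 1.2, 0.3]])
print(lj_potential(positions))   # scalar, invariant energy
print(forces(positions))         # shape (3, 3): one force vector per atom, rotates with the molecule
```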

24.1. The Elements of Modern Molecular NNs

Beginning in 2022, a categorization has emerged of the main elements of modern molecular NNs (molnets): atomic cluster expansions (ACE), the body-order of the messages, and the architecture of the message passing neural network (MPNN). This categorization can also be viewed within GNN theory as node features (ACE), message creation and aggregation (body-order), and node update (MPNN details). We’ll detail these different elements below, with the most focus on ACEs. See Graph Neural Networks for more details on MPNNs.

24.1.1. Atomic Cluster Expansions

An ACE is a per-atom tensor. The main idea of ACE is to encode the local environment of an atom into a feature tensor that describes its neighborhood of nearby atoms. This is like distinguishing between an oxygen in an alcohol group vs. an oxygen in an ether. Both are oxygens, but we expect them to behave differently. ACE is the same idea, but for nearby atoms in space instead of just on the molecular graph.
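As a warm-up before the full equation, here is a simplified sketch of this idea: an invariant (angular order \(l=0\), two-body) per-atom descriptor built by summing Gaussian radial basis functions over each atom's neighbors within a cutoff. The function name, basis choice, and cutoff values here are illustrative only; a real ACE adds angular information, element channels, learnable weights, and higher body-order products.

```python
import jax.numpy as jnp

def invariant_neighborhood_features(positions, r_cut=5.0, n_basis=8, width=0.5):
    """Simplified invariant (l = 0, two-body) atomic environment descriptor.

    For each atom, sum Gaussian radial basis functions over its neighbors
    within a cutoff. Rows are per-atom, so permuting atoms permutes rows
    (permutation equivariant); only distances enter, so the values are
    rotation and translation invariant.
    """
    n = positions.shape[0]
    diffs = positions[:, None, :] - positions[None, :, :]              # (n, n, 3)
    r = jnp.sqrt(jnp.sum(diffs**2, axis=-1) + jnp.eye(n))              # pad diagonal to avoid sqrt(0)
    f_cut = 0.5 * (jnp.cos(jnp.pi * r / r_cut) + 1.0) * (r < r_cut)    # smooth cosine cutoff
    mask = (1.0 - jnp.eye(n)) * f_cut                                  # drop the self-interaction
    centers = jnp.linspace(0.0, r_cut, n_basis)
    rbf = jnp.exp(-((r[..., None] - centers) ** 2) / (2 * width**2))   # (n, n, n_basis)
    return jnp.sum(mask[..., None] * rbf, axis=1)                      # (n, n_basis): one vector per atom

positions = jnp.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.1, 1.1, 0.0]])
print(invariant_neighborhood_features(positions).shape)                # (3, 8)
```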

The general equation for ACE (assuming O(3) equivariance) is [cite]:

(24.2)\[\begin{equation} A^{(t)}_{i, kl_3m_3} = \sum_{l_1m_1,l_2m_2}C_{l_1m_1,l_2m_2}^{l_3m_3}\sum_{j \in \mathcal{N}(i)} R^{(t)}_{kl_1l_2l_3}\left(r_{ji}\right)Y_{l_1}^{m_1}\left(\hat{\mathbf{r}}_{ji}\right)\mathcal{W}^{(t)}_{kl_2}h_{j,l_2m_2}^{(t)} \end{equation}\]

Wow! What an expression. Let’s go through this carefully, starting with the output. \(A^{(t)}_{i, kl_3m_3}\) are the feature tensor values for atom \(i\) at layer \(t\). There are channels indexed by \(k\) and the spherical harmonic indices \(l_3m_3\). The right-hand side is nearly identical to the G-equivariant neural network layer equation from Equivariant Neural Networks. We have the input features \(h_{j,l_2m_2}^{(t)}\) of the neighboring atoms \(j\), multiplied by learnable weights \(\mathcal{W}^{(t)}_{kl_2}\), a learnable radial function \(R^{(t)}_{kl_1l_2l_3}\) of the distance \(r_{ji}\), spherical harmonics \(Y_{l_1}^{m_1}\) of the unit vector \(\hat{\mathbf{r}}_{ji}\) between atoms \(i\) and \(j\), and Clebsch-Gordan coefficients \(C_{l_1m_1,l_2m_2}^{l_3m_3}\) that couple the angular components so the output transforms correctly under rotation.
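To connect the symbols to code, here is a sketch of just the inner neighbor sum of Eq. (24.2) for \(l=0\) and \(l=1\), dropping the learnable weights \(\mathcal{W}\) and the Clebsch-Gordan contraction and taking the per-atom features \(h_j\) to be scalars. The function name, radial basis, and cutoff are illustrative choices, not any particular model; the \(l=0\) output is rotation invariant while the \(l=1\) output rotates with the molecule.

```python
import jax.numpy as jnp

def neighbor_sum_l0_l1(positions, h, r_cut=5.0, n_basis=8, width=0.5):
    """Inner neighbor sum of Eq. (24.2) for l = 0 and l = 1 (sketch only).

    Keeps sum_j R_k(r_ji) Y_l^m(r_hat_ji) h_j with scalar per-atom features
    h, and omits the learnable weights W and the Clebsch-Gordan contraction.
    """
    n = positions.shape[0]
    diffs = positions[None, :, :] - positions[:, None, :]              # (n, n, 3) displacements
    r = jnp.sqrt(jnp.sum(diffs**2, axis=-1) + jnp.eye(n))              # pad diagonal to avoid sqrt(0)
    rhat = diffs / r[..., None]
    mask = (1.0 - jnp.eye(n)) * (r < r_cut)                            # neighbors only, within the cutoff
    centers = jnp.linspace(0.0, r_cut, n_basis)
    R = jnp.exp(-((r[..., None] - centers) ** 2) / (2 * width**2))     # radial basis, (n, n, n_basis)
    Y0 = jnp.full_like(r, 0.5 / jnp.sqrt(jnp.pi))                      # l = 0 real spherical harmonic (constant)
    Y1 = jnp.sqrt(3.0 / (4.0 * jnp.pi)) * rhat                         # l = 1 real spherical harmonics ~ r_hat
    w = mask * h[None, :]                                              # weight each neighbor j by its feature h_j
    A0 = jnp.einsum('ijk,ij,ij->ik', R, Y0, w)                         # (n, n_basis): invariant
    A1 = jnp.einsum('ijk,ijm,ij->ikm', R, Y1, w)                       # (n, n_basis, 3): rotates like a vector
    return A0, A1

positions = jnp.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.1, 1.1, 0.0]])
h = jnp.array([1.0, 0.5, -0.2])                                        # example scalar atom features
A0, A1 = neighbor_sum_l0_l1(positions, h)
print(A0.shape, A1.shape)                                              # (3, 8) (3, 8, 3)
```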

How is this different from an MPNN?