\n",
" \n",
"

\n",
"\n",
"Let's start by loading the trajectory and defining/testing our centering function. The trajectory is a tensor that is shape `time, point number, xy -> (2048, 12, 2)`. I'll examine two centering functions: align to center of mass and align to point 0 to see what the two look like."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-cell"
]
},
"outputs": [],
"source": [
"# new imports\n",
"import matplotlib.pyplot as plt\n",
"import urllib\n",
"import dmol"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"urllib.request.urlretrieve(\n",
" \"https://github.com/whitead/dmol-book/raw/master/data/paths.npz\", \"paths.npz\"\n",
")\n",
"paths = np.load(\"paths.npz\")[\"arr\"]\n",
"# plot the first point\n",
"plt.title(\"First Frame\")\n",
"plt.plot(paths[0, :, 0], paths[0, :, 1], \"o-\")\n",
"plt.xticks([])\n",
"plt.yticks([])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def center_com(paths):\n",
" \"\"\"Align paths to COM at each frame\"\"\"\n",
" coms = np.mean(paths, axis=-2, keepdims=True)\n",
" return paths - coms\n",
"\n",
"\n",
"def center_point(paths):\n",
" \"\"\"Align paths to particle 0\"\"\"\n",
" return paths - paths[:, :1]\n",
"\n",
"\n",
"ccpaths = center_com(paths)\n",
"cppaths = center_point(paths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare, we'll draw a sample of frames on top of one another to see the now translationally invariant coordinates. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, axs = plt.subplots(ncols=3, squeeze=True, figsize=(16, 4))\n",
"\n",
"axs[0].set_title(\"No Center\")\n",
"axs[1].set_title(\"COM Center\")\n",
"axs[2].set_title(\"Point 0 Center\")\n",
"cmap = plt.get_cmap(\"cool\")\n",
"for i in range(0, 2048, 16):\n",
" axs[0].plot(paths[i, :, 0], paths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048))\n",
" axs[1].plot(\n",
" ccpaths[i, :, 0], ccpaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048)\n",
" )\n",
" axs[2].plot(\n",
" cppaths[i, :, 0], cppaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048)\n",
" )\n",
"for i in range(3):\n",
" axs[i].set_xticks([])\n",
" axs[i].set_yticks([])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The color indicates time. You can see that having no alignment makes the spatial coordinates depend on time implicitly because the points drift over time. Both aligning to COM or point 0 removes this effect. The COM implicitly removes 2 degrees of freedom. Point 0 alignment makes point 0 have no variance (also remove degrees of freedom), which could affect how you design or interpret your model.\n",
"\n",
"Now we will align by rotation. We need to define a unique rotation. A simple way is to choose 1 (or 2 in 3D) vectors that define our coordinate system directions. For example, we could choose that the vector from point 0 to point 1 defines the positive direction of the x-axis. A more sophisticated way is to find the principal axes of our points and align along these. For 2D, we only need to align to one of them. Again, this implicitly removes a degree of freedom. We will examine both. Computing principle axes requires an eigenvalue decomposition, so it's a bit more numerically intense {cite}`foote2000relation`. \n",
"\n",
"```{warning}\n",
"For all rotation alignment methods, you must have already centered the points. \n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_2drot(angle):\n",
" mats = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])\n",
" # swap so batch axis is first\n",
" return np.swapaxes(mats, 0, -1)\n",
"\n",
"\n",
"def align_point(paths):\n",
" \"\"\"Align to 0-1 vector assuming 2D data\"\"\"\n",
" vecs = paths[:, 0, :] - paths[:, 1, :]\n",
" # find angle to rotate so these are pointed towards pos x\n",
" cur_angle = np.arctan2(vecs[:, 1], vecs[:, 0])\n",
" rot_angle = -cur_angle\n",
" rot_mat = make_2drot(rot_angle)\n",
" # to mat mult at each frame\n",
" return paths @ rot_mat\n",
"\n",
"\n",
"def find_principle_axis(points, naxis=2):\n",
" \"\"\"Compute single principle axis for points\"\"\"\n",
" inertia = points.T @ points\n",
" evals, evecs = np.linalg.eig(inertia)\n",
" order = np.argsort(evals)[::-1]\n",
" # return largest vectors\n",
" return evecs[:, order[:naxis]].T\n",
"\n",
"\n",
"def align_principle(paths, axis_finder=find_principle_axis):\n",
" # someone should tell me how to vectorize this in numpy\n",
" vecs = [axis_finder(p) for p in paths]\n",
" vecs = np.array(vecs)\n",
" # find angle to rotate so these are pointed towards pos x\n",
" cur_angle = np.arctan2(vecs[:, 0, 1], vecs[:, 0, 0])\n",
" cross = np.cross(vecs[:, 0], vecs[:, 1])\n",
" rot_angle = -cur_angle - (cross < 0) * np.pi\n",
" rot_mat = make_2drot(rot_angle)\n",
" # rotate at each frame\n",
" rpaths = paths @ rot_mat\n",
" return rpaths\n",
"\n",
"\n",
"appaths = align_point(cppaths)\n",
"apapaths = align_principle(ccpaths)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"fig, axs = plt.subplots(ncols=3, squeeze=True, figsize=(16, 4))\n",
"\n",
"axs[0].set_title(\"No Align\")\n",
"axs[1].set_title(\"COM Align\")\n",
"axs[2].set_title(\"Point 0 Align\")\n",
"for i in range(0, 2048, 16):\n",
" axs[0].plot(paths[i, :, 0], paths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048))\n",
" axs[1].plot(\n",
" apapaths[i, :, 0], apapaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048)\n",
" )\n",
" axs[2].plot(\n",
" appaths[i, :, 0], appaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / 2048)\n",
" )\n",
"for i in range(3):\n",
" axs[i].set_xticks([])\n",
" axs[i].set_yticks([])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see how points far away on the chain from 0 have much more variance in the point 0 align, whereas the COM alignment looks better spread. Remember, to apply these methods you must do them to your both your training data and any prediction points. Thus, they should be viewed as part of your neural network. We can now check that rotating has no effect on these. The plots below have the trajectory rotated by 1 radian and you can see that both alignment methods have no change (the lines are overlapping)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"rot_paths = paths @ make_2drot(1)\n",
"rot_appaths = align_point(center_point(rot_paths))\n",
"rot_apapaths = align_principle(center_com(rot_paths))\n",
"fig, axs = plt.subplots(ncols=3, squeeze=True, figsize=(16, 4))\n",
"\n",
"axs[0].set_title(\"No Align\")\n",
"axs[1].set_title(\"COM Align\")\n",
"axs[2].set_title(\"Point 0 Align\")\n",
"for i in range(0, 500, 20):\n",
" axs[0].plot(paths[i, :, 0], paths[i, :, 1], \".-\", alpha=1, color=\"C1\")\n",
" axs[1].plot(apapaths[i, :, 0], apapaths[i, :, 1], \".-\", alpha=1, color=\"C1\")\n",
" axs[2].plot(appaths[i, :, 0], appaths[i, :, 1], \".-\", alpha=1, color=\"C1\")\n",
" axs[0].plot(rot_paths[i, :, 0], rot_paths[i, :, 1], \".-\", alpha=0.2, color=\"C2\")\n",
" axs[1].plot(\n",
" rot_apapaths[i, :, 0], rot_apapaths[i, :, 1], \".-\", alpha=0.2, color=\"C2\"\n",
" )\n",
" axs[2].plot(rot_appaths[i, :, 0], rot_appaths[i, :, 1], \".-\", alpha=0.2, color=\"C2\")\n",
"# plot again to get handles\n",
"axs[0].plot(np.nan, np.nan, \".-\", alpha=1, color=\"C2\", label=\"rotated\")\n",
"axs[0].plot(np.nan, np.nan, \".-\", alpha=1, color=\"C1\", label=\"non-rotated\")\n",
"axs[0].legend()\n",
"for i in range(3):\n",
" axs[i].set_xticks([])\n",
" axs[i].set_yticks([])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now which method is better? Aligning based on arbitrary points is indeed easier, but it creates an unusual new variance in your features. For example, let's see what happens if we make a small perturbation to one conformation. The code is hidden for simplicity. We try changing point 1, then point 0, then point 11 to see the effects of perturbations along the chain. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"NP = 16\n",
"\n",
"\n",
"def perturb_paths(perturb_point):\n",
" perturbation = np.zeros_like(paths[:NP])\n",
" perturbation[:, perturb_point, 0] = np.linspace(0, 0.2, NP)\n",
" test_paths = paths[0:1] - perturbation\n",
"\n",
" # compute aligned trajs\n",
" appaths = align_point(center_point(test_paths))\n",
" apapaths = align_principle(center_com(test_paths))\n",
"\n",
" fig, axs = plt.subplots(ncols=3, squeeze=True, figsize=(16, 4))\n",
" axs[0].set_title(f\"Perturb {perturb_point} - No Align\")\n",
" axs[1].set_title(f\"Perturb {perturb_point} - COM Align\")\n",
" axs[2].set_title(f\"Perturb {perturb_point} - Point 0 Align\")\n",
" for i in range(NP):\n",
" axs[0].plot(\n",
" test_paths[i, :, 0],\n",
" test_paths[i, :, 1],\n",
" \".-\",\n",
" alpha=0.2,\n",
" color=cmap(i / NP),\n",
" )\n",
" axs[1].plot(\n",
" apapaths[i, :, 0], apapaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / NP)\n",
" )\n",
" axs[2].plot(\n",
" appaths[i, :, 0], appaths[i, :, 1], \".-\", alpha=0.2, color=cmap(i / NP)\n",
" )\n",
" for i in range(3):\n",
" axs[i].set_xticks([])\n",
" axs[i].set_yticks([])\n",
"\n",
"\n",
"perturb_paths(0)\n",
"perturb_paths(1)\n",
"perturb_paths(11)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, perturbing one point alters all others after alignment. This makes these transformed features sensitive to noise, especially aligning to point 0 or 1. More importantly, this effect is uneven in the alignment to point 0. This can in-turn make training quite difficult. Of course, neural networks are universal approximators so in theory this should not matter. However, I expect that using the COM alignment approach will give better training because the network will not need to account for this unusual variance structure. \n",
"\n",
"```{margin}\n",
"The alignment changes due to small changes in input points is described by the Jacobian of our transform which measures how changes to one input dimension affects all output dimension.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"from moviepy.editor import VideoClip\n",
"from moviepy.video.io.bindings import mplfig_to_npimage\n",
"from matplotlib.collections import LineCollection\n",
"\n",
"pas = [find_principle_axis(p, 2) for p in ccpaths]\n",
"coms = np.mean(paths, axis=1)\n",
"# convert to plottable lines\n",
"pvecs = [\n",
" ([c[0] - p[0, 0], c[0], c[0] + p[1, 0]], [c[1] - p[0, 1], c[1], c[1] + p[1, 1]])\n",
" for c, p in zip(coms, pas)\n",
"]\n",
"\n",
"\n",
"def make_segments(data, time_index):\n",
" points = np.array([data[time_index, :, 0], data[time_index, :, 1]]).T.reshape(\n",
" -1, 1, 2\n",
" )\n",
" segments = np.concatenate([points[:-1], points[1:]], axis=1)\n",
" return segments\n",
"\n",
"\n",
"dpi = 96\n",
"fig, axs = plt.subplots(ncols=3, figsize=(1920 / dpi, 1080 / dpi / 2), dpi=dpi)\n",
"fps = 60\n",
"all_paths = [paths, apapaths, appaths]\n",
"fronts = []\n",
"for i, p in enumerate(all_paths):\n",
" fronts.append(\n",
" axs[i].plot(p[-1][:, 0], p[-1][:, 1], \"o\", zorder=0, color=\"C0\", markersize=12)[\n",
" 0\n",
" ]\n",
" )\n",
" axs[i].set_xlim(np.min(p[..., 0]), np.max(p[..., 0]))\n",
" axs[i].set_ylim(np.min(p[..., 1]), np.max(p[..., 1]))\n",
" axs[i].set_xticks([])\n",
" axs[i].set_yticks([])\n",
"\n",
"paplot = axs[0].plot(*pvecs[0], \"-\", alpha=1, color=\"C1\", label=\"Principle Axes\")[0]\n",
"axs[0].legend(loc=\"upper left\")\n",
"\n",
"axs[0].set_title(\"Trajectory\", fontsize=28)\n",
"axs[1].set_title(\"COM/Princ Axes Aligned\", fontsize=28)\n",
"axs[2].set_title(\"Point 0, Vec 0-1 Aligned\", fontsize=28)\n",
"\n",
"plt.tight_layout()\n",
"\n",
"T = paths.shape[0]\n",
"line_collections = []\n",
"line_segments = [[] for _ in all_paths]\n",
"aline_segments = []\n",
"for i, p in enumerate(all_paths):\n",
" for j in range(T):\n",
" seg = make_segments(p, j)\n",
" line_segments[i].append(seg)\n",
"\n",
"\n",
"def make_frame(t):\n",
" frame = int(fps * t)\n",
" if len(line_collections) == 0:\n",
" for i, p in enumerate(all_paths):\n",
" lc = LineCollection(\n",
" line_segments[i][frame], color=\"C0\", norm=plt.Normalize(0, 1), alpha=0.2\n",
" )\n",
" axs[i].add_collection(lc)\n",
" line_collections.append(lc)\n",
" j = min(frame, T - 1)\n",
" # Set the values used for colormapping\n",
" for i, p in enumerate(all_paths):\n",
" line_collections[i].set_segments(line_segments[i][j])\n",
" fronts[i].set_data(p[j][:, 0], p[j][:, 1])\n",
" paplot.set_data(*pvecs[j])\n",
" return mplfig_to_npimage(fig)\n",
"\n",
"\n",
"duration = T / fps\n",
"animation = VideoClip(make_frame, duration=duration)\n",
"animation.write_videofile(filename=\"../_static/images/pas_traj.mp4\", fps=fps)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final analysis shows a video of the two frames. One thing you'll note are the jumps, when the principle axes swap direction. You can see that the ambiguity caused by these can create artifacts.\n",
"\n",
"\n",
"\n",
" \n",
"

\n",
"\n",
"\n",
"```{margin}\n",
"Some people refer to principle axes finding as a kind of PCA. It is. But when we discuss PCA, we mean identifying the sources of variances across the trajectory, not across points in a single frame. You could do PCA in a single frame and use that to define your trans/rot invariant frame. It's mathematically equivalent to principle axes finding.\n",
"```\n",
"\n",
"### Using Unsupervised Methods for Alignment\n",
"\n",
"There are additional methods for aligning trajectories. You could define one frame as the \"reference\" and find the translation and rotations that best align with that reference. This could give some interpretability to your rotation and translation alignment. A tempting option is to use dimensionality reduction (like PCA), but these are not rotation invariant. This is confusing at first, because remember PCA should remove translation and rotation. It removes it though from the training data and not an arbitrary frame because it examines motion along a trajectory. You can easily see this getting principle components and then trying to align a new frame to them and a rotated version of the new frame. You'll get different results. Another important consideration is if the unsupervised method can handle new data. Manifold embeddings do not provide a linear transform that can handle new data for inference. Manifold embeddings are also not necessarily rotation invariant or equivariant. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"def pca(paths, keep=3):\n",
" # get covariance matrix along dimensions of system\n",
" x = paths.reshape(paths.shape[0], -1)\n",
" # Get eigs\n",
" cmat = np.cov(x.T)\n",
" evals, evecs = np.linalg.eigh(cmat)\n",
" # sort by size\n",
" order = np.argsort(evals)[::-1]\n",
" # only keep a few\n",
" return evecs[:, order[-keep:]], evals[order[-keep:]]\n",
"\n",
"\n",
"def align_pca_naive(paths, a):\n",
" # Remove degrees of freedom from each conformation\n",
" offsets = paths.reshape(-1, len(a)) @ a\n",
" print(offsets.shape)\n",
" trans = offsets @ a.T\n",
" new_paths = paths - trans.reshape(paths.shape)\n",
" return new_paths\n",
"\n",
"\n",
"def align_pca(paths, a):\n",
" def get_pca_axis(p, a=a):\n",
" return np.sum(a.reshape(-1, 2) * p, axis=1)\n",
"\n",
" return align_principle(paths, get_pca_axis)\n",
"\n",
"\n",
"a, e = pca(paths, 3)\n",
"print(e)\n",
"pcapaths = align_pca_naive(paths, a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"pcapaths2 = align_pca_naive(paths @ make_2drot(1), a)\n",
"print(pcapaths2.shape)\n",
"for i in range(0, 2048, 256):\n",
" plt.plot(pcapaths[i, :, 0], pcapaths[i, :, 1], \".-\", alpha=1, color=\"C0\", zorder=1)\n",
" plt.plot(\n",
" pcapaths2[i, :, 0], pcapaths2[i, :, 1], \".-\", alpha=1, color=\"C1\", zorder=0\n",
" )\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distance Features\n",
"\n",
"One more topic on parsing data is treating distances. As we saw above, pairwise distance is a wise transformation because it is translation and rotation invariant. However, we may want to sometimes transform this further. One obvious choice is to use $1 / r$ as the input to our neural network. This is because most properties of atoms (and thus molecules) are affected by their closest nearby atoms. An oxygen nearby a carbon is much more important than an oxygen that is 100 nanometers away. Choosing $1 / r$ as an input makes it easier for a neural network to train because it encodes this physical insight about local interactions being the most important. Of course, a neural network could learn to turn $r$ into $1/r$ because they are universal approximators. Yet this approach means we do not need to waste training data and weights on learning to change $r$ into $1 / r$. *This approach will not work on non-local or mildly-non-local effects like electrostatics or multi-scale phenomena.*\n",
"\n",
"Another detail on distances is that we often want to \"featurize\" them; we'd like to go from one a single number like $r = 3.2$ to a vector of reals. Why? Well that's just how neural networks learn. Hidden-layers need to have more than 1 dimension to be expressive enough to model any function. This seems like an obvious point though: if you used $r$ in a neural network it would obviously get larger as it goes through hidden layers. However, there are a few \"standard\" ways that people like to do this process. There are valid reasons, like making it smoothly differentiable, that you might choose one of these special \"featurizing\" functions.\n",
"\n",
"### Repeating\n",
"\n",
"The first approach is to just repeat $r$ up to the desired hidden dimension. This is representable as a dense neural network with no activation.\n",
"\n",
"### Binning\n",
"\n",
"As explored in Gilmer et al. {cite}`gilmer2017neural` and others, you can bin your distances to be a one-hot vector. Essentially, you histogram $r$ into fixed bins so that you only have one bin being \"hot\". Each bin will represent a segment of positions (e.g., 4.5-5.0). This has discontinuous derivatives with respect to distance and is rarely used.\n",
"\n",
"### Radial Basis Functions\n",
"\n",
"Radial basis functions are a commonly used procedure for converting a scalar into a fixed number of features and were first used in interpolation{cite}`powell1977restart`. Radial basis functions use the following equation:\n",
"\n",
"\\begin{equation}\n",
" e_i = \\exp\\left[{-\\left(r - d_i\\right)^2 / w}\\right]\n",
"\\end{equation}\n",
"\n",
"where $d_i$ is an equally spaced vector of distances (e.g., $[1.5, 3.0, 4.5, 6.0]$) and $w$ is a trainable (or hyper) parameter. This computes a Gaussian kernel between $r$ and all distances $d_i$. What is nice about this expression is the smooth well-behaved derivatives with respect to $r$ and lack of trainable parameters. You can (almost) represent this with a dense layer and a softmax activation.\n",
"\n",
"\n",
"### Sub NN\n",
"\n",
"Another strategy used in Gilmer et al. {cite}`gilmer2017neural` is to just put your distances through a series of dense layers to get features. For example, if you're going to use the distance in a graph neural network you could run it through three dense layers first to get a larger feature dimension. Remember that repeating and radial basis functions are equivalent to dense layers (assuming correct activation choice), so this strategy can be a simple solution to the above choice."
]
},
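{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the Sub NN strategy (the weights below are random just to show the shapes; in practice these layers would be trained with the rest of the network):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"w1, w2 = rng.normal(size=(1, 16)), rng.normal(size=(16, 16))\n",
"r = np.array([[1.5], [3.2], [2.1]])  # distances as a column\n",
"h = np.tanh(r @ w1)  # first dense layer\n",
"feats = np.tanh(h @ w2)  # shape (3, 16)\n",
"```"
]
},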
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chapter Summary\n",
"\n",
"* Machine learning models that work with molecules must be permutation invariant, such that if the atoms are rearranged, the output will not change. \n",
"* Translational invariance of molecular coordinates is when the coordinates are shifted and the resulting output does not change. \n",
"* Rotational invariance is similar, except the molecular coordinates are rotated.\n",
"* Data augmentation is when you try to teach your model the various types of equivariances by rotating and translating your training data to create additional examples.\n",
"* There are various techniques, such as eigendecomposition or pairwise distance to make molecular coordinates invariant.\n",
"* A one-hidden layer dense neural network is an example of a model with no equivariances. \n",
"* You can try alignment for trajectories, where each training example has the same ordering and number of atoms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cited References\n",
"\n",
"```{bibliography}\n",
":style: unsrtalpha\n",
":filter: docname in docnames\n",
"```"
]
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}