{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tensors and Shapes\n",
    "\n",
    "This textbook will draw upon linear algebra, vector calculus, and probability theory. Each chapter states the expected background in an `Audience & Objectives` admonition (see example below). This math review fills in details missing from those typical classes: working with tensors. If you would like a chemistry-focused background on those topics, you can read the online book [*Mathematical Methods for Molecule Sciences* by John Straub](https://unitcirclepress.com/){cite}`straub2020mathematical`.\n",
    "\n",
    "Tensors are the generalization of vectors (rank 1) and matrices (rank 2) to arbitrary **rank**. Rank can be defined as the number of indices required to get individual elements of a tensor. A matrix requires two indices (row, column), and is thus a rank 2 tensor. We may say in normal conversation that a matrix is a \"two-dimensional object\" because it has rows and columns, but this is ambiguous because the row could be 6 dimensions and the columns could be 1 dimension. Always use the word rank to distinguish vectors, matrices, and higher-order tensors. The components that make up rank are called **axes** (plural of **axis**). The **dimension** is how many elements are in a particular axis. The **shape** of a tensor combines all of these. A shape is a tuple (ordered list of numbers) whose length is the rank and elements are the dimension of each axis.\n",
    "\n",
    "```{admonition} Audience & Objectives\n",
    "You should have a background in linear algebra and basic Python programming to read this chapter. After reading, you should be able to\n",
    "\n",
    "  * Define a tensor and specify one in Python\n",
    "  * Modify tensor rank, shape, and axes\n",
    "  * Use Einstein notation to define equations/expressions of tensors\n",
    "  * Understand and use broadcasting rules to work with tensors  \n",
    "```\n",
    "\n",
    "\n",
    "```{margin} Rank\n",
    "[Tensor rank](https://mathworld.wolfram.com/TensorRank.html) and [matrix rank](https://mathworld.wolfram.com/MatrixRank.html) are two different concepts. Matrix rank\n",
    "is the number of linearly independent columns and has nothing to do with tensor rank. Some authors may use *order* to refer to tensor rank to distinguish the two terms.\n",
    "```\n",
    "\n",
    "Let's practice our new vocabulary. A Euclidean vector $(x, y, z)$ is a rank 1 tensor whose 0th axis is dimension 3. Its shape is $(3)$. Beautiful. A 5 row, 4 column matrix is now called a rank 2 tensor whose axes are dimension 5 and 4. Its shape is $(5, 4)$. The scalar (real number) 3.2 is a rank 0 tensor whose shape is $()$.\n",
    "\n",
    "TensorFlow has a [nice visual guide to tensors](https://www.tensorflow.org/guide/tensor).\n",
    "\n",
    "```{note}\n",
    "Array and tensor are synonyms. Array is the preferred word\n",
    "in numpy and often used when describing tensors in Python. Tensor is the mathematic\n",
    "equivalent.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Einstein Notation\n",
    "\n",
    "Einstein notation is the way tensor operations can be written out. We'll be using a simplified version, based on the `einsum` function available in many numerical libraries ({obj}`np.einsum<numpy.einsum>`, {obj}`tf.einsum`, {obj}`jnp.einsum<jax.numpy.einsum>`). It's relatively simple. Each tensor is written as a lower case variable with explicit indices, like $a_{ijk}$ for a rank 3 tensor. The reason the variable name is written in lower case is because if you fill in the indices $a_{023}$, you get a scalar.  A variable without an index, $b$, is a scalar. There is one rule for this notation: if an index doesn't appear on both sides of the equation, it is summed over on the one side in which it appears. Einstein notation requires both sides of the equation to be written-out, so that its clear what the input/output shapes of the operation are.\n",
    "\n",
    "```{warning}\n",
    "The concept of Tensors from physics involves a more complex picture of connecting algebraic sets of objects, typically vectors in a space. Here we just treat tensors as a synonym for multidimensional array. Be aware that looking-up tensor on wikipedia will bring you to the physics picture.\n",
    "```\n",
    "\n",
    "Here are some examples of writing tensor operations in Einstein notation.\n",
    "\n",
    "**Total Sum**\n",
    "\n",
    "Sum all elements of a rank 4 tensor. In Einstein notation this is:\n",
    "\n",
    "\\begin{equation}\n",
    "  a_{ijkl} = b\n",
    "\\end{equation}\n",
    "\n",
    "in normal mathematic notation, this would be\n",
    "\n",
    "\\begin{equation}\n",
    "\\sum_i \\sum_j \\sum_k \\sum_l a_{ijkl} = b\n",
    "\\end{equation}\n",
    "\n",
    "\n",
    "```{margin}\n",
    "There even is a framework-independent Einstein notation library that\n",
    "enables you to use this notation across multiple frameworks for neural network layers.\n",
    "It is called [einops](https://einops.rocks/)\n",
    "```\n",
    "\n",
    "**Sum Specific Axis**\n",
    "\n",
    "Sum over last axis\n",
    "\n",
    "\\begin{equation}\n",
    "  a_{ijkl} = b_{ijk}\n",
    "\\end{equation}\n",
    "\n",
    "\n",
    "In normal notation:\n",
    "\n",
    "\\begin{equation}\n",
    " \\sum_l a_{ijkl} = b_{ijk}\n",
    "\\end{equation}\n",
    "\n",
    "**Dot Product**\n",
    "\n",
    "In Einstein notation: \n",
    "\n",
    "\\begin{equation}\n",
    "  a_{i} b_{i} = c\n",
    "\\end{equation}\n",
    "\n",
    "In normal notation:\n",
    "\n",
    "\\begin{equation}\n",
    "  \\sum_i a_{i} b_{i} = c\n",
    "\\end{equation}\n",
    "\n",
    "Notice that $a_i$ and $b_i$ must have the same dimension in their 0th axis in order for the sum in the dot product to be valid. This makes sense, since to compute a dot product the vectors must be the same dimension. In general, if two tensors share the same index ($b_{ij}$, $a_{ik}$), then that axis must be the same dimension.\n",
    "\n",
    "Can you write the following out in Einstein notation?\n",
    "\n",
    "**Matrix Multiplication**\n",
    "\n",
    "The matrix product of 2 tensors, where each tensor is rank 2. \n",
    "\n",
    "```{admonition} Answer\n",
    ":class: dropdown\n",
    "\\begin{equation}\n",
    "    a_{ij} b_{jk} = c_{ik}\n",
    "\\end{equation}\n",
    "```\n",
    "\n",
    "**Matrix Vector Product**\n",
    "\n",
    "Apply matrix $a$ to column vector $b$ by multiplication. $\\mathbf{A}\\vec{b}$ in linear algebra notation.\n",
    "\n",
    "```{admonition} Answer\n",
    ":class: dropdown\n",
    "\\begin{equation}\n",
    "    a_{ij} b_{j} = c_{i}\n",
    "\\end{equation}\n",
    "```\n",
    "\n",
    "**Matrix Transpose**\n",
    "\n",
    "Swap the values in a matrix to make it a transpose.\n",
    "\n",
    "```{admonition} Answer\n",
    ":class: dropdown\n",
    "\\begin{equation}\n",
    "    a_{ij} = t_{ji}\n",
    "\\end{equation}\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "## Tensor Operations\n",
    "\n",
    "```{margin} Why Tensors?\n",
    "Tensors are the main building block\n",
    "of modern deep learning. Nearly all\n",
    "variables in equations are actually tensors.\n",
    "Being able to understand how shape affects them\n",
    "is the key to understanding how algorithms work. \n",
    "```\n",
    "\n",
    "\n",
    "Although you can specify operations in Einstein notation, it is typically not expressive enough. How would you write this operation: sum the last axis of a tensor? Without knowing the rank, you do not know how many indices you should indicate in the expression. Maybe like this?\n",
    "\n",
    "\\begin{equation}\n",
    "a_{i_0, i_1, \\ldots i_N} = a_{i_0, i_1, \\ldots i_{N - 1}}\n",
    "\\end{equation}\n",
    "\n",
    "Well, that's good but what if your operation has two arguments: which axis to sum and the tensor. That would also be clumsy to write. Einstein notation is useful and we'll see it more, but we need to think about **tensor operations** as analogies to functions. Tensor operations take in 1 or more tensors and output 1 or more tensors and the output shape depends on the input shape.\n",
    "\n",
    "One of the difficult things about tensors is understanding how shape is treated in equations. For example, consider this equation:\n",
    "\n",
    "\\begin{equation}\n",
    "    g = \\exp\\left(a - b\\right)^2\n",
    "\\end{equation}\n",
    "\n",
    "Seems like a reasonable enough equation. But what if $a$ is rank 3 and $b$ is rank 1? Is $g$ rank 1 or 3 then? Actually, this is taken from a real example where $g$ was rank 4. You subtract each element of $b$ from each element of $a$. You could write this in Einstein notation:\n",
    "\n",
    "\\begin{equation}\n",
    "    g_{ijkl} = \\exp\\left(a_{ijk} - b_l\\right)^2\n",
    "\\end{equation}\n",
    "\n",
    "except this function should work on arbitrary ranked $a$ and always output $g$ being the rank of $a + 1$. Typically, the best way to express this is explicitly stating how rank and shape are treated. \n",
    "\n",
    "### Reduction Operations\n",
    "\n",
    "Reduction operations reduce the rank of an input tensor. {obj}`np.sum(a, axis=0)<numpy.sum>` is an example. The axis argument means that we're summing over the 0th axis so that it will be removed. If `a` is a rank 1 vector, this would leave us with a scalar. If `a` is a matrix, this would remove the rows so that only columns are left over. That means we would be left with *column sums*. You can also specify a tuple of axes to be removed, which will be done in that order `np.sum(a, axis=(0,1) )`.\n",
    "\n",
    "In addition to {obj}`np.sum<numpy.sum>`, there are {obj}`np.minimum<numpy.minimum>`, {obj}`np.maximum<numpy.maximum>`, {obj}`np.any<numpy.any>` (logical or), and more. Let's see some examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "a_shape = (4, 3, 2)\n",
    "a_len = 4 * 3 * 2\n",
    "\n",
    "a = np.arange(a_len).reshape(a_shape)\n",
    "print(a.shape)\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try to guess the shape of the output tensors using `a` in the below code based on what you've learned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [],
   "source": [
    "b = np.sum(a, axis=0)\n",
    "print(b.shape)\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [],
   "source": [
    "c = np.any(a > 4, axis=1)\n",
    "print(c.shape)\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-output"
    ]
   },
   "outputs": [],
   "source": [
    "d = np.product(a, axis=(2, 1))\n",
    "print(d.shape)\n",
    "print(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Element Operations\n",
    "\n",
    "Default operations in Python, like `+` `-` `*` `/` `^` , are also tensor operations. They preserve shape so that the output shape is the same as the inputs'. The input tensors must have the same shape or be able to become the same shape through broadcasting, which is defined in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = np.ones((4, 3, 2))\n",
    "b.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = a * b\n",
    "c.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting\n",
    "\n",
    "One of the difficulties with the elementary operations is that they require the input tensors to have the same shape. For example, you cannot multiply a scalar (rank 0) and a vector (rank 1). Of course, if you're familiar with `numpy` this is common. It is done with **broadcasting** comes in. Broadcasting increases the rank of one of the input tensors to be compatible with another. Broadcasting works at the last axis and works its way forward. Let's see an example\n",
    "\n",
    "```{margin} Broadcasting order\n",
    "Broadcasting starts at the last axis and \n",
    "goes forward because getting an element\n",
    "at the last axis gives a scalar (rank 0) \n",
    "no matter what rank. This makes it possible to \n",
    "copy to fill up axis to align shapes.\n",
    "```\n",
    "\n",
    "\\begin{equation}\n",
    "    A + B\n",
    "\\end{equation}\n",
    "\n",
    "**Input A**\n",
    "\n",
    "Rank 2, shape is (2, 3)\n",
    "```\n",
    "A:\n",
    " 4  3  2 \n",
    "-1  2  4\n",
    "```\n",
    "\n",
    "**Input B**\n",
    "\n",
    "Rank 1, shape is (3), a vector:\n",
    "\n",
    "```\n",
    "B: 3  0  1\n",
    "```\n",
    "\n",
    "Now let's see how the broadcasting works. Broadcasting starts by lining up the shapes from the end of the tensors\n",
    "\n",
    "**Step 1: align on last axis**\n",
    "```\n",
    "tensor        shape\n",
    "A:             2  3\n",
    "B:                3\n",
    "broadcasted B: .  .\n",
    "```\n",
    "\n",
    "**Step 2: process last axis**\n",
    "\n",
    "Now broadcasting looks at the last axis (axis 1) and if one tensor has axis dimension 1, its value is copied to match the others. In our case, they agree.\n",
    "\n",
    "```\n",
    "tensor        shape\n",
    "A:             2  3\n",
    "B:                3\n",
    "broadcasted B: .  3\n",
    "```\n",
    "\n",
    "**Step 3: process next axis**\n",
    "\n",
    "Now we examine the next axis, axis 0. B has no axis there, because its rank is too low. Broadcasting will insert a new axis by (i) inserting a new axis with dimension 1 and (ii) copying the value at this new axis until its dimension matches.\n",
    "\n",
    "**Step 3i:**\n",
    "\n",
    "Add new axis of dimension 1. This is like making $B$ have 1 row and 3 columns:\n",
    "\n",
    "```\n",
    "B:\n",
    " 3  0  1\n",
    "```\n",
    "\n",
    "**Step 3ii:**\n",
    "\n",
    "Now we copy the values of this axis until its dimension matches $A$'s axis 0 dimension. We're basically copying $b_{0j}$ to $b_{1j}$. \n",
    "\n",
    "```\n",
    "B:\n",
    " 3  0  1\n",
    " 3  0  1\n",
    "```\n",
    "\n",
    "**Final**\n",
    "```\n",
    "tensor        shape\n",
    "A:             2  3\n",
    "B:                3\n",
    "broadcasted B: 2  3\n",
    "```\n",
    "\n",
    "Now, we compute the result by addition elementwise.\n",
    "\n",
    "```\n",
    "A + B\n",
    "  4 + 3  3 + 0  2 + 1  =   7  3  3\n",
    " -1 + 3  2 + 0  4 + 1      2  2  5\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "Let's see some more examples, but only looking at the input/output shape\n",
    "\n",
    "|    A Shape   |  B Shape |  Output Shape |\n",
    "| :----------- | :------: |  -----------: |\n",
    "|     (4,2)    |  (4,1)   |    (4,2)      |\n",
    "|     (4,2)    |  (2,)    |    (4,2)      |\n",
    "|     (16,1,3) |  (4,3)   |    (16,4,3)   |\n",
    "|     (16,3,3) |  (4,1)   |    ``Error``  |"
   ]
  },
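  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rows of the table above can be checked numerically. This is a minimal sketch assuming `numpy`; `broadcast_shape` is a hypothetical helper written for this illustration, not a library function. It adds two arrays of zeros and reports the resulting shape, relying on `numpy` raising a `ValueError` when the shapes are incompatible:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def broadcast_shape(shape_a, shape_b):\n",
    "    # hypothetical helper: add two dummy arrays and report the output shape\n",
    "    try:\n",
    "        return (np.zeros(shape_a) + np.zeros(shape_b)).shape\n",
    "    except ValueError:\n",
    "        # numpy raises ValueError when shapes cannot be broadcast together\n",
    "        return 'Error'\n",
    "\n",
    "\n",
    "pairs = [((4, 2), (4, 1)), ((4, 2), (2,)), ((16, 1, 3), (4, 3)), ((16, 3, 3), (4, 1))]\n",
    "for shape_a, shape_b in pairs:\n",
    "    print(shape_a, shape_b, broadcast_shape(shape_a, shape_b))\n",
    "```"
   ]
  },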
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try some for yourself!\n",
    "\n",
    "|    A Shape   |  B Shape |  Output Shape |\n",
    "| :----------- | :------: |  -----------: |\n",
    "|     (7,4,3)    |  (1,)   |    ?     |\n",
    "|     (16, 16, 3) |  (3,)    | ?      |\n",
    "|     (2,4,5) |  (5,4,1)   |  ?   |\n",
    "|     (1,4) |  (16,)   |    ?  |\n",
    "\n",
    "```{admonition} Answer\n",
    ":class: dropdown\n",
    "|    A Shape   |  B Shape |  Output Shape |\n",
    "| :----------- | :------: |  -----------: |\n",
    "|     (7,4,3)    |  (1,)   |    (7,4,3)      |\n",
    "|     (16, 16, 3) |  (3,)    |    (16,16,3)      |\n",
    "|     (2,4,5) |  (5,4,1)   |    ``Error``  |\n",
    "|     (1,4) |  (16,)   |    ``Error``  |\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Suggested Reading for Broadcasting\n",
    "\n",
    "You can read more about broadcastnig in the [numpy tutorial](https://numpy.org/doc/stable/user/basics.broadcasting.html) or the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modifying Rank\n",
    "\n",
    "```{margin} newaxis\n",
    "``newaxis`` slices like ``a[np.newaxis]`` are possible\n",
    "    in tensorflow, jax, and numpy. In PyTorch there is `unsqueeze`.\n",
    "You can also use ``reshape`` and ignore newaxis\n",
    "```\n",
    "\n",
    "The last example we saw brings up an interesting questions: what if we want to add a (1,4) and (16,) to end up with a (4,16) tensor? We could insert a new axis at the end of $B$ to make its shape (16, 1). This can be done using the {obj}`np.newaxis<numpy.newaxis>` syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.ones((1, 4))\n",
    "b = np.ones(\n",
    "    16,\n",
    ")\n",
    "result = a + b[:, np.newaxis]\n",
    "result.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Just as newaxis can increase rank, we can decrease rank. One way is to just slice, like ``a[0]``. A more general way is to {obj}`np.squeeze<numpy.squeeze>` which removes any axes that are dimension 1 without needing to know the specific axes that are dimension 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.ones((1, 32, 4, 1))\n",
    "print(\"before squeeze:\", a.shape)\n",
    "a = np.squeeze(a)\n",
    "print(\"after squeeze:\", a.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It turns out that ``np.newaxis`` and ``tf.newaxis`` are actually defined as ``None``. Some programmers will exploit this to save some keystrokes and use ``None`` instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.ones((1, 4))\n",
    "b = np.ones(\n",
    "    16,\n",
    ")\n",
    "result = a + b[:, None]\n",
    "result.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I recommend against this because it can be a bit confusing and it's really not saving that many keystrokes. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reshaping\n",
    "\n",
    "The most general way of changing rank and shape is through {obj}`np.reshape<numpy.reshape>`. This allows you to reshape a tensor, as long as the number of elements remains the same. You could make a (4, 2) into an (8,). You could make a (4, 3) into a (1, 4, 3, 1). Thus it can accomplish the two tasks done by {obj}`np.squeeze<numpy.squeeze>` and {obj}`np.newaxis<numpy.newaxis>`. \n",
    "\n",
    "There is one special syntax element to shaping:  A ``-1`` dimension. ``-1`` can appear once in a reshape command and means to have the computer figure out what goes there. We know the number of elements doesn't change in a reshape, so the computer can infer what goes in the dimension marked as ``-1``. Let's see some examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(32)\n",
    "new_a = np.reshape(a, (4, 8))\n",
    "new_a.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_a = np.reshape(a, (4, -1))\n",
    "new_a.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "new_a = np.reshape(a, (1, 2, 2, -1))\n",
    "new_a.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rank Slicing\n",
    "\n",
    "Hopefully you're familiar with slicing in numpy/Python. Review at the [Python Tutorial](https://docs.python.org/3/tutorial/introduction.html#lists) and the [numpy tutorial](https://numpy.org/doc/stable/reference/arrays.indexing.html) for a refresher if you need it. **Rank Slicing** is just my terminology for slicing without knowing the rank of a tensor. Use the ``...`` (ellipsis) keyword. This allows you to account for unknown rank when slicing. Examples:\n",
    "\n",
    "* Access last axis: ``a[...,:]``\n",
    "* Access last 2 axes: ``a[...,:,:]``\n",
    "* Add new axis to end ``a[...,np.newaxis]``\n",
    "* Add new axis to beginning ``a[np.newaxis,...]``\n",
    "\n",
    "---\n",
    "\n",
    "Let's see if we can put together our skills to implement the equation example from above, \n",
    "\n",
    "\\begin{equation}\n",
    "    g = \\exp\\left(a - b\\right)^2\n",
    "\\end{equation}\n",
    "\n",
    "for arbitrary rank $a$. Recall $b$ is a rank 1 tensor and we want $g$ to be the rank of $a + 1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def eq(a, b):\n",
    "    return np.exp((a[..., np.newaxis] - b) ** 2)\n",
    "\n",
    "\n",
    "b = np.ones(4)\n",
    "a1 = np.ones((4, 3))\n",
    "a2 = np.ones((4, 3, 2, 1))\n",
    "\n",
    "g1 = eq(a1, b)\n",
    "print(\"input a1:\", a1.shape, \"output:\", g1.shape)\n",
    "\n",
    "g2 = eq(a2, b)\n",
    "print(\"input a2:\", a2.shape, \"output:\", g2.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View vs Copy\n",
    "\n",
    "```{margin}\n",
    "In most machine learning frameworks, it is actually not possible to modify an array because element assignment interferes with automatic differentiation. So this distinction of a view vs a copy is irrelevant. \n",
    "```\n",
    "\n",
    "Most slicing and reshaping operations produce a **view** of the original array. That means no copy operation is done. This is default behavior in all frameworks to reduce required memory -- you can slice as much as you want without increasing memory use. This typically has no consequences for how we program; it is more of an optimization detail. However, if you modify elements in a view, you will also modify the original array from which the view was constructed. Sometimes this can be unexpected. You should not rely on this behavior though, because in `numpy` a copy may be returned for certain {obj}`np.reshape<numpy.reshape>` and [slicing commands](https://numpy.org/doc/stable/reference/arrays.indexing.html#advanced-indexing). Thus, I recommend being aware that views may be returned as an optimization, but not assume that is always the case. If you actually want a copy you should explicitly create a copy, like with {obj}`np.copy<numpy.copy>`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chapter Summary\n",
    "\n",
    "* Tensors are the building blocks of machine learning. A tensor has a rank and shape that specifies how many elements it has and how they are arranged. An axis describes each element in the shape. \n",
    "\n",
    "* A euclidean vector is a rank 1 tensor with shape (3). It has 1 axis of dimension 3. A matrix is a rank 2 tensor. It has two axes. \n",
    "\n",
    "* Equations that describe operating on 1 or more tensors can be written using Einstein notation. Einstein notation uses indices to indicate the shape of tensors, how things are summed, and which axes must match up.\n",
    "\n",
    "* There are operations that reduce ranks of tensors, like ``sum`` or ``mean``.\n",
    "\n",
    "* Broadcasting is an automatic tool in programming languages that modifies shapes of tensors with different shapes to be compatible with operations like addition or division. \n",
    "\n",
    "* Tensors can be reshaped or have rank modified by ``newaxis``, ``reshape``, and ``squeeze``. These are not standardized among the various numeric libraries in Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "### Einstein notation\n",
    "\n",
    "Write out the following in Einstein notation:\n",
    "\n",
    "1. Matrix product of two matrices\n",
    "2. Trace of a matrix\n",
    "3. [Outer product](https://en.wikipedia.org/wiki/Outer_product) of two Euclidean vectors\n",
    "4. $\\mathbf{A}$ is a rank 3 tensor whose last axis is dimension 3 and contains Euclidean vectors. $\\mathbf{B}$ is Euclidean vector. Compute the dot product of each of the vectors in $\\mathbf{A}$ with B. So if $\\mathbf{A}$ is shape (11, 7, 3), it contains 11 $\\times$ 7 vectors and the output should be shape (11,7). $\\mathbf{B}$ is shape (3)\n",
    "\n",
    "###  Reductions\n",
    "\n",
    "Answer the following with Python code with reductions. Write your code to be as general as possible -- being able to take arbitrary rank tensors unless it is specified that something is a vector.\n",
    "\n",
    "1. Normalize a vector so that the sum of its elements is 1. Note the rank of the vector should be unchanged. \n",
    "\n",
    "2. Normalize the last axis of a tensor\n",
    "\n",
    "3. Compute the mean squared error between two tensors\n",
    "\n",
    "4. Compute the mean squared error between the last axis of tensor $\\mathbf{A}$ and vector $\\vec{b}$\n",
    "\n",
    "\n",
    "### Broadcasting and Shapes\n",
    "\n",
    "1. Consider two vectors $\\vec{a}$ and $\\vec{b}$. Using reshaping and broadcasting alone, write python code to compute their outer product. \n",
    "\n",
    "2. Why is the code ``a.reshape((-1, 3, -1))`` invalid?\n",
    "\n",
    "3. You have a tensor of unknown rank $\\mathbf{A}$ and would like to subtract both 3.5 and 2.5 from every element, giving two outputs for every input. Your output will be a new tensor $\\mathbf{B}$ with rank $\\textrm{rank}(\\mathbf{A}) + 1$. The last axis of $\\mathbf{B}$ should be dimension 2. Here is the example:\n",
    "\n",
    "```py\n",
    "a = np.array([10])\n",
    "f(a)\n",
    "# prints [[6.5, 7.5]]\n",
    "\n",
    "b = np.array([[5, 3, 0], [0, 2, 6]])\n",
    "f(b)\n",
    "# [[[ 1.5  2.5]\n",
    "#  [-0.5  0.5]\n",
    "#  [-3.5 -2.5]]\n",
    "\n",
    "# [[-3.5 -2.5]\n",
    "#  [-1.5 -0.5]\n",
    "#  [ 2.5  3.5]]]\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cited References\n",
    "\n",
    "```{bibliography}\n",
    ":style: unsrtalpha\n",
    ":filter: docname in docnames\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}