1. Tensors and Shapes#

This textbook will draw upon linear algebra, vector calculus, and probability theory. Each chapter states the expected background in an Audience & Objectives admonition (see example below). This math review fills in a detail usually missing from those classes: working with tensors. If you would like a chemistry-focused background on those topics, you can read the online book Mathematical Methods for Molecular Science by John Straub[Str20].

Tensors are the generalization of vectors (rank 1) and matrices (rank 2) to arbitrary rank. Rank can be defined as the number of indices required to get individual elements of a tensor. A matrix requires two indices (row, column) and is thus a rank 2 tensor. We may say in normal conversation that a matrix is a “two-dimensional object” because it has rows and columns, but this is ambiguous because the row axis could have dimension 6 and the column axis dimension 1. Always use the word rank to distinguish vectors, matrices, and higher-order tensors. The components that make up the rank are called axes (plural of axis). The dimension is how many elements are in a particular axis. The shape of a tensor combines all of these: a shape is a tuple (ordered list of numbers) whose length is the rank and whose elements are the dimensions of each axis.

Audience & Objectives

You should have a background in linear algebra and basic Python programming to read this chapter. After reading, you should be able to

  • Define a tensor and specify one in Python

  • Modify tensor rank, shape, and axes

  • Use Einstein notation to define equations/expressions of tensors

  • Understand and use broadcasting rules to work with tensors

Let’s practice our new vocabulary. A Euclidean vector \((x, y, z)\) is a rank 1 tensor whose 0th axis is dimension 3. Its shape is \((3)\). Beautiful. A 5 row, 4 column matrix is now called a rank 2 tensor whose axes are dimension 5 and 4. Its shape is \((5, 4)\). The scalar (real number) 3.2 is a rank 0 tensor whose shape is \(()\).

TensorFlow has a nice visual guide to tensors.

Note

Array and tensor are synonyms. Array is the preferred word in numpy and is often used when describing tensors in Python. Tensor is the mathematical equivalent.
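To make this vocabulary concrete, here is a minimal sketch in numpy (the values are arbitrary). The shape attribute gives the shape tuple and ndim gives the rank.

import numpy as np

scalar = np.array(3.2)  # rank 0 tensor, shape ()
vector = np.array([1.0, -2.0, 0.5])  # rank 1 tensor, shape (3,)
matrix = np.zeros((5, 4))  # rank 2 tensor, shape (5, 4)

print(scalar.shape, scalar.ndim)  # () 0
print(vector.shape, vector.ndim)  # (3,) 1
print(matrix.shape, matrix.ndim)  # (5, 4) 2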

1.1. Einstein Notation#

Einstein notation is a way of writing out tensor operations. We’ll be using a simplified version, based on the einsum function available in many numerical libraries (np.einsum, tf.einsum, jnp.einsum). It’s relatively simple. Each tensor is written as a lower-case variable with explicit indices, like \(a_{ijk}\) for a rank 3 tensor. The variable name is lower case because if you fill in the indices, as in \(a_{023}\), you get a scalar. A variable without an index, \(b\), is a scalar. There is one rule for this notation: if an index doesn’t appear on both sides of the equation, it is summed over on the side in which it appears. Einstein notation requires both sides of the equation to be written out, so that it’s clear what the input/output shapes of the operation are.

Warning

The concept of tensors from physics involves a more complex picture, in which tensors connect algebraic sets of objects, typically vectors in a space. Here we just treat a tensor as a synonym for a multidimensional array. Be aware that looking up tensor on Wikipedia will bring you to the physics picture.

Here are some examples of writing tensor operations in Einstein notation.

Total Sum

Sum all elements of a rank 4 tensor. In Einstein notation this is:

(1.1)#\[\begin{equation} a_{ijkl} = b \end{equation}\]

In normal mathematical notation, this would be

(1.2)#\[\begin{equation} \sum_i \sum_j \sum_k \sum_l a_{ijkl} = b \end{equation}\]

Sum Specific Axis

Sum over last axis

(1.3)#\[\begin{equation} a_{ijkl} = b_{ijk} \end{equation}\]

In normal notation:

(1.4)#\[\begin{equation} \sum_l a_{ijkl} = b_{ijk} \end{equation}\]

Dot Product

In Einstein notation:

(1.5)#\[\begin{equation} a_{i} b_{i} = c \end{equation}\]

In normal notation:

(1.6)#\[\begin{equation} \sum_i a_{i} b_{i} = c \end{equation}\]

Notice that \(a_i\) and \(b_i\) must have the same dimension in their 0th axis in order for the sum in the dot product to be valid. This makes sense, since to compute a dot product the vectors must be the same dimension. In general, if two tensors share the same index (\(b_{ij}\), \(a_{ik}\)), then that axis must be the same dimension.
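Before trying the exercises below, it may help to see how these three examples look in code. Here is a minimal sketch using np.einsum, where the subscript string mirrors the Einstein notation (the tensors are filled with arbitrary random values):

import numpy as np

a = np.random.rand(4, 3, 2, 5)  # rank 4 tensor
x = np.random.rand(7)
y = np.random.rand(7)

total = np.einsum("ijkl->", a)  # total sum: no indices on the output side
b = np.einsum("ijkl->ijk", a)  # sum over the last axis
print(b.shape)  # (4, 3, 2)
dot = np.einsum("i,i->", x, y)  # dot product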

Can you write the following out in Einstein notation?

Matrix Multiplication

The matrix product of 2 tensors, where each tensor is rank 2.

Matrix Vector Product

Apply matrix \(a\) to column vector \(b\) by multiplication. \(\mathbf{A}\vec{b}\) in linear algebra notation.

Matrix Transpose

Swap the values in a matrix to make it a transpose.

1.2. Tensor Operations#

Although you can specify operations in Einstein notation, it is typically not expressive enough. How would you write this operation: sum the last axis of a tensor? Without knowing the rank, you do not know how many indices you should indicate in the expression. Maybe like this?

(1.10)#\[\begin{equation} a_{i_0, i_1, \ldots i_N} = b_{i_0, i_1, \ldots i_{N - 1}} \end{equation}\]

Well, that’s good, but what if your operation has two arguments, like which axis to sum and the tensor itself? That would also be clumsy to write. Einstein notation is useful and we’ll see it more, but we need to think about tensor operations as analogous to functions. Tensor operations take in one or more tensors and output one or more tensors, and the output shape depends on the input shape.

One of the difficult things about tensors is understanding how shape is treated in equations. For example, consider this equation:

(1.11)#\[\begin{equation} g = \exp\left(a - b\right)^2 \end{equation}\]

Seems like a reasonable enough equation. But what if \(a\) is rank 3 and \(b\) is rank 1? Is \(g\) rank 1 or 3 then? Actually, this is taken from a real example where \(g\) was rank 4. You subtract each element of \(b\) from each element of \(a\). You could write this in Einstein notation:

(1.12)#\[\begin{equation} g_{ijkl} = \exp\left(a_{ijk} - b_l\right)^2 \end{equation}\]

except this function should work on an arbitrary rank \(a\) and always output \(g\) with rank one greater than the rank of \(a\). Typically, the best way to express this is to explicitly state how rank and shape are treated.

1.2.1. Reduction Operations#

Reduction operations reduce the rank of an input tensor. np.sum(a, axis=0) is an example. The axis argument means that we’re summing over the 0th axis so that it will be removed. If a is a rank 1 vector, this would leave us with a scalar. If a is a matrix, this would remove the rows so that only columns are left over, meaning we would be left with column sums. You can also specify a tuple of axes to be removed, which will be done in that order: np.sum(a, axis=(0, 1)).

In addition to np.sum, there are np.min, np.max, np.mean, np.any (logical or), and more. Let’s see some examples.

import numpy as np

a_shape = (4, 3, 2)
a_len = 4 * 3 * 2

a = np.arange(a_len).reshape(a_shape)
print(a.shape)
print(a)
(4, 3, 2)
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]

Try to guess the shape of the output tensors in the code below, based on what you’ve learned.

b = np.sum(a, axis=0)
print(b.shape)
print(b)
(3, 2)
[[36 40]
 [44 48]
 [52 56]]
c = np.any(a > 4, axis=1)
print(c.shape)
print(c)
(4, 2)
[[False  True]
 [ True  True]
 [ True  True]
 [ True  True]]
d = np.prod(a, axis=(2, 1))
print(d.shape)
print(d)
(4,)
[       0   332640  8910720 72681840]
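Here are two more reductions with the same a, shown as a quick sketch:

e = np.mean(a)  # reduce over all axes, giving a rank 0 tensor (scalar)
print(e)  # 11.5
f = np.max(a, axis=1)  # remove the middle axis, keeping the maximum
print(f.shape)  # (4, 2)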

1.2.2. Element Operations#

Default arithmetic operations in Python, like + - * / **, are also tensor operations. They preserve shape, so the output shape is the same as the inputs’. The input tensors must have the same shape or be able to become the same shape through broadcasting, which is defined in the next section.

a.shape
(4, 3, 2)
b = np.ones((4, 3, 2))
b.shape
(4, 3, 2)
c = a * b
c.shape
(4, 3, 2)
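If the shapes are neither equal nor broadcastable, numpy raises an error. A quick sketch, still using the a above with shape (4, 3, 2):

try:
    a + np.ones((4, 3))  # (4, 3, 2) and (4, 3) cannot be made compatible
except ValueError as error:
    print(error)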

1.3. Broadcasting#

One of the difficulties with the elementary operations above is that they require the input tensors to have the same shape. For example, under that rule you could not multiply a scalar (rank 0) and a vector (rank 1). Of course, if you’re familiar with numpy you know this is common; this is where broadcasting comes in. Broadcasting increases the rank of one of the input tensors to make it compatible with the other. Broadcasting starts at the last axis and works its way forward toward the first axis. Let’s see an example.

(1.13)#\[\begin{equation} A + B \end{equation}\]

Input A

Rank 2, shape is (2, 3)

A:
 4  3  2 
-1  2  4

Input B

Rank 1, shape is (3), a vector:

B: 3  0  1

Now let’s see how broadcasting works. Broadcasting starts by lining up the shapes from the ends of the tensors.

Step 1: align on last axis

tensor        shape
A:             2  3
B:                3
broadcasted B: .  .

Step 2: process last axis

Now broadcasting looks at the last axis and, if one tensor has dimension 1 there, its values are copied to match the other. In our case, the last axes already agree.

tensor        shape
A:             2  3
B:                3
broadcasted B: .  3

Step 3: process next axis

Now we examine the next axis, axis 0. B has no axis there because its rank is too low. Broadcasting will (i) insert a new axis with dimension 1 and (ii) copy the values along this new axis until its dimension matches.

Step 3i:

Add new axis of dimension 1. This is like making \(B\) have 1 row and 3 columns:

B:
 3  0  1

Step 3ii:

Now we copy the values of this axis until its dimension matches \(A\)’s axis 0 dimension. We’re basically copying \(b_{0j}\) to \(b_{1j}\).

B:
 3  0  1
 3  0  1

Final

tensor        shape
A:             2  3
B:                3
broadcasted B: 2  3

Now we compute the result by elementwise addition.

A + B
  4 + 3  3 + 0  2 + 1  =   7  3  3
 -1 + 3  2 + 0  4 + 1      2  2  5
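You can verify this walkthrough directly in numpy, which applies the same broadcasting steps automatically:

A = np.array([[4, 3, 2], [-1, 2, 4]])  # shape (2, 3)
B = np.array([3, 0, 1])  # shape (3,)
print(A + B)
# [[7 3 3]
#  [2 2 5]]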

Let’s see some more examples, but only looking at the input/output shapes.

A Shape      B Shape    Output Shape
(4,2)        (4,1)      (4,2)
(4,2)        (2,)       (4,2)
(16,1,3)     (4,3)      (16,4,3)
(16,3,3)     (4,1)      Error

Try some for yourself!

A Shape       B Shape    Output Shape
(7,4,3)       (1,)       ?
(16, 16, 3)   (3,)       ?
(2,4,5)       (5,4,1)    ?
(1,4)         (16,)      ?
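To check your answers, recent versions of numpy provide np.broadcast_shapes, which reports the broadcast output shape or raises an error for incompatible shapes. Here it is applied to two rows from the earlier table:

print(np.broadcast_shapes((4, 2), (4, 1)))  # (4, 2)
print(np.broadcast_shapes((16, 1, 3), (4, 3)))  # (16, 4, 3)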

1.3.1. Suggested Reading for Broadcasting#

You can read more about broadcasting in the numpy tutorial or the Python Data Science Handbook.

1.4. Modifying Rank#

The last example we saw brings up an interesting question: what if we want to add a (1, 4) and a (16,) to end up with a (16, 4) tensor? We could insert a new axis at the end of \(B\) to make its shape (16, 1). This can be done using the np.newaxis syntax:

a = np.ones((1, 4))
b = np.ones(16)
result = a + b[:, np.newaxis]
result.shape
(16, 4)

Just as newaxis can increase rank, we can decrease rank. One way is to just slice, like a[0]. A more general way is np.squeeze, which removes any axes of dimension 1 without needing to know which specific axes those are.

a = np.ones((1, 32, 4, 1))
print("before squeeze:", a.shape)
a = np.squeeze(a)
print("after squeeze:", a.shape)
before squeeze: (1, 32, 4, 1)
after squeeze: (32, 4)

It turns out that np.newaxis and tf.newaxis are actually defined as None. Some programmers will exploit this to save some keystrokes and use None instead:

a = np.ones((1, 4))
b = np.ones(16)
result = a + b[:, None]
result.shape
(16, 4)

I recommend against this because it can be a bit confusing and it’s really not saving that many keystrokes.

1.4.1. Reshaping#

The most general way of changing rank and shape is through np.reshape. This allows you to reshape a tensor, as long as the number of elements remains the same. You could make a (4, 2) into an (8,). You could make a (4, 3) into a (1, 4, 3, 1). Thus it can accomplish the two tasks done by np.squeeze and np.newaxis.

There is one special syntax element for shaping: a -1 dimension. A -1 can appear once in a reshape command and means the computer should figure out what goes there. We know the number of elements doesn’t change in a reshape, so the computer can infer the dimension marked as -1. Let’s see some examples.

a = np.arange(32)
new_a = np.reshape(a, (4, 8))
new_a.shape
(4, 8)
new_a = np.reshape(a, (4, -1))
new_a.shape
(4, 8)
new_a = np.reshape(a, (1, 2, 2, -1))
new_a.shape
(1, 2, 2, 8)
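Two more quick sketches: a -1 by itself flattens a tensor to rank 1, and a reshape that would change the number of elements fails.

flat = np.reshape(a, (-1,))
print(flat.shape)  # (32,)

try:
    np.reshape(a, (5, -1))  # 32 elements cannot fill a (5, ?) shape
except ValueError as error:
    print(error)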

1.4.2. Rank Slicing#

Hopefully you’re familiar with slicing in numpy/Python. Review the Python Tutorial and the numpy tutorial for a refresher if you need one. Rank slicing is just my terminology for slicing without knowing the rank of a tensor. Use the ... (ellipsis) keyword, which allows you to account for unknown rank when slicing (see the sketch after this list). Examples:

  • Access last axis: a[...,:]

  • Access last 2 axes: a[...,:,:]

  • Add new axis to end a[...,np.newaxis]

  • Add new axis to beginning a[np.newaxis,...]
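Here is a short sketch showing that the same ellipsis expression works regardless of rank:

a = np.ones((4, 3, 2))
b = np.ones((6, 4, 3, 2))

print(a[..., 0].shape)  # (4, 3): index 0 on the last axis
print(b[..., 0].shape)  # (6, 4, 3)
print(a[..., np.newaxis].shape)  # (4, 3, 2, 1): new axis at the end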


Let’s see if we can put together our skills to implement the equation example from above,

(1.14)#\[\begin{equation} g = \exp\left(a - b\right)^2 \end{equation}\]

for arbitrary rank \(a\). Recall \(b\) is a rank 1 tensor and we want \(g\) to have rank one greater than the rank of \(a\).

def eq(a, b):
    return np.exp((a[..., np.newaxis] - b) ** 2)


b = np.ones(4)
a1 = np.ones((4, 3))
a2 = np.ones((4, 3, 2, 1))

g1 = eq(a1, b)
print("input a1:", a1.shape, "output:", g1.shape)

g2 = eq(a2, b)
print("input a2:", a2.shape, "output:", g2.shape)
input a1: (4, 3) output: (4, 3, 4)
input a2: (4, 3, 2, 1) output: (4, 3, 2, 1, 4)

1.5. View vs Copy#

Most slicing and reshaping operations produce a view of the original array. That means no copy operation is done. This is default behavior in all frameworks to reduce required memory – you can slice as much as you want without increasing memory use. This typically has no consequences for how we program; it is more of an optimization detail. However, if you modify elements in a view, you will also modify the original array from which the view was constructed. Sometimes this can be unexpected. You should not rely on this behavior though, because in numpy a copy may be returned for certain np.reshape and slicing commands. Thus, I recommend being aware that views may be returned as an optimization, but not assume that is always the case. If you actually want a copy you should explicitly create a copy, like with np.copy.
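A minimal sketch of this behavior:

a = np.arange(6).reshape(2, 3)
v = a[0]  # a view of the first row, no copy made
v[0] = -1  # modifying the view...
print(a[0, 0])  # ...also changes the original: prints -1

c = np.copy(a[0])  # an explicit copy
c[1] = -2
print(a[0, 1])  # the original is unchanged: prints 1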

1.6. Chapter Summary#

  • Tensors are the building blocks of machine learning. A tensor has a rank and a shape that specify how many elements it has and how they are arranged. The shape has one entry per axis, giving the dimension of that axis.

  • A Euclidean vector is a rank 1 tensor with shape (3). It has 1 axis of dimension 3. A matrix is a rank 2 tensor; it has two axes.

  • Equations that describe operating on 1 or more tensors can be written using Einstein notation. Einstein notation uses indices to indicate the shape of tensors, how things are summed, and which axes must match up.

  • There are operations that reduce ranks of tensors, like sum or mean.

  • Broadcasting is an automatic tool in programming languages that modifies shapes of tensors with different shapes to be compatible with operations like addition or division.

  • Tensors can be reshaped or have rank modified by newaxis, reshape, and squeeze. These are not standardized among the various numeric libraries in Python.

1.7. Exercises#

1.7.1. Einstein notation#

Write out the following in Einstein notation:

  1. Matrix product of two matrices

  2. Trace of a matrix

  3. Outer product of two Euclidean vectors

  4. \(\mathbf{A}\) is a rank 3 tensor whose last axis is dimension 3 and contains Euclidean vectors. \(\mathbf{B}\) is a Euclidean vector. Compute the dot product of each of the vectors in \(\mathbf{A}\) with \(\mathbf{B}\). So if \(\mathbf{A}\) is shape (11, 7, 3), it contains 11 \(\times\) 7 vectors and the output should be shape (11, 7). \(\mathbf{B}\) is shape (3).

1.7.2. Reductions#

Answer the following with Python code using reductions. Write your code to be as general as possible – able to take arbitrary rank tensors unless it is specified that something is a vector.

  1. Normalize a vector so that the sum of its elements is 1. Note the rank of the vector should be unchanged.

  2. Normalize the last axis of a tensor

  3. Compute the mean squared error between two tensors

  4. Compute the mean squared error between the last axis of tensor \(\mathbf{A}\) and vector \(\vec{b}\)

1.7.3. Broadcasting and Shapes#

  1. Consider two vectors \(\vec{a}\) and \(\vec{b}\). Using reshaping and broadcasting alone, write Python code to compute their outer product.

  2. Why is the code a.reshape((-1, 3, -1)) invalid?

  3. You have a tensor of unknown rank \(\mathbf{A}\) and would like to subtract both 3.5 and 2.5 from every element, giving two outputs for every input. Your output will be a new tensor \(\mathbf{B}\) with rank \(\textrm{rank}(\mathbf{A}) + 1\). The last axis of \(\mathbf{B}\) should be dimension 2. Here is the example:

a = np.array([10])
f(a)
# prints [[6.5, 7.5]]

b = np.array([[5, 3, 0], [0, 2, 6]])
f(b)
# [[[ 1.5  2.5]
#  [-0.5  0.5]
#  [-3.5 -2.5]]

# [[-3.5 -2.5]
#  [-1.5 -0.5]
#  [ 2.5  3.5]]]

1.8. Cited References#

Str20

J Straub. Mathematical methods for molecular science. 2020.