Matrix Multiplication

linear algebra
matrix multiplication
NoteQuestion

How do we multiply two matrices?

This question may seem elementary and obvious at first sight. Let us see if this impression remains the same after the end of this section. There are several ways to understand the product of two matrices \(\mathbf{A}\) and \(\mathbf{B}\) of compatible dimensions. We will look at four different ways to do the same. Before we go there, we will look at matrix-vector products. Let \(\mathbf{A}\) be an \(m\times n\) matrix and \(\mathbf{B}\) be a \(n\times p\) matrix. Let \(\mathbf{C} =\mathbf{AB}\). We will also define vectors \(\displaystyle \mathbf{a}\) and \(\displaystyle \mathbf{b}\) along the way.

Vector-vector

Dot-product

We are quite familiar with the dot between two vectors. Given two \(\displaystyle n\)-dimensional vectors \(\displaystyle \mathbf{a}\) and \(\displaystyle \mathbf{b}\), we have:

\[ \mathbf{a}^{T}\mathbf{b} =\sum\limits _{i=1}^{n} a_{i} b_{i} \]

We know that the dot product is symmetric for real vector spaces:

\[ \mathbf{a}^{T}\mathbf{b} =\mathbf{b}^{T}\mathbf{a} \]

Among all the products that we see in this section, the inner product is the only one that is a scalar.

Outer product

The outer product between two vectors \(\displaystyle \mathbf{a}\) and \(\displaystyle \mathbf{b}\) is a matrix and is defined as:

\[ \mathbf{ab}^{T} \]

Let \(\displaystyle \mathbf{a} \in \mathbb{R}^{m}\) and \(\displaystyle \mathbf{b} \in \mathbb{R}^{n}\), then \(\displaystyle \mathbf{ab}^{T}\) is a \(\displaystyle m\times n\) matrix. Let us take an example and understand what the outer product looks like. Let \(\displaystyle \mathbf{a} =\begin{bmatrix} a_{1} & a_{2} & a_{3} \end{bmatrix}^{T}\) and \(\displaystyle \mathbf{b} =\begin{bmatrix} b_{1} & b_{2} & b_{3} \end{bmatrix}^{T}\), then we have:

\[ \begin{bmatrix} a_{1}\\ a_{2}\\ a_{3} \end{bmatrix}\begin{bmatrix} b_{1} & b_{2} & b_{3} \end{bmatrix} =\begin{bmatrix} a_{1} b_{1} & a_{1} b_{2} & a_{1} b_{3}\\ a_{2} b_{1} & a_{2} b_{2} & a_{2} b_{3}\\ a_{3} b_{1} & a_{3} b_{2} & a_{3} b_{3} \end{bmatrix} \]

If we notice the matrix on the right, each row is a multiple of the row vector \(\begin{bmatrix} b_{1} & b_{2} & b_{3} \end{bmatrix}\) and each column is a multiple of the column vector \(\displaystyle \begin{bmatrix} a_{1}\\ a_{2}\\ a_{3} \end{bmatrix}\). From this we can deduce that this matrix has rank \(1\). Here is an interesting fact concerning outer products.

NoteRemark

Every matrix of unit rank can be expressed as the outer product of two vectors.

To see why this is true, start with any \(m\times n\) matrix \(\mathbf{A}\) of unit rank. If the rank is \(1\), then every vector in the column space of \(\mathbf{A}\) is of the form \(k\cdot \mathbf{a}\), where \(\mathbf{a} =\begin{bmatrix} a_{1} & \cdots & a_{m} \end{bmatrix}^{T}\) . Thus, each column is just some scalar multiple of \(\displaystyle \mathbf{a}\). Since there are \(\displaystyle n\) columns, we have \(\displaystyle n\) coefficients, say \(\displaystyle b_{1} ,\cdots ,b_{n}\), such that the \(\displaystyle i^{th}\) column is \(\displaystyle b_{i} \cdot \mathbf{a}\). Therefore, the matrix \(\displaystyle \mathbf{A}\) is:

\[ \mathbf{A} =\begin{bmatrix} | & & |\\ b_{1}\mathbf{a} & \cdots & b_{n}\mathbf{a}\\ | & & | \end{bmatrix} \]

We see that this is the outer product between the vectors \(\displaystyle \mathbf{a}\) and \(\displaystyle b=\begin{bmatrix} b_{1} & \cdots & b_{n} \end{bmatrix}\):

\[ \mathbf{A} =\mathbf{ab}^{T} \]

Matrix-Vector and Vector-Matrix

We will now turn to the situation where one of the two operands is a vector and the other a matrix:

Matrix-Vector

Let \(\displaystyle \mathbf{A}\) be a \(\displaystyle m\times n\) matrix and \(\displaystyle \mathbf{b}\) a column vector in \(\displaystyle \mathbb{R}^{n}\) defined as follows:

\[ \mathbf{A} =\begin{bmatrix} | & & |\\ \mathbf{a}_{1} & \cdots & \mathbf{a}_{n}\\ | & & | \end{bmatrix} ,\ \ \ \mathbf{b} =\begin{bmatrix} b_{1}\\ \vdots \\ b_{n} \end{bmatrix} \]

Here, \(\displaystyle \mathbf{a}_{i}\) is the \(\displaystyle i^{th}\) column of \(\displaystyle \mathbf{A}\). Now, the product \(\displaystyle \mathbf{Ab}\) is defined as follows:

\[ \begin{aligned} \mathbf{Ab} & =b_{1}\mathbf{a}_{1} +\cdots +b_{n}\mathbf{a}_{n}\\ & \\ & =\sum\limits _{i=1}^{n} b_{i}\mathbf{a}_{i} \end{aligned} \]

The product of a matrix and a vector is a linear combination of the columns of the matrix where the coefficients of the combination come from the vector. One way to understand it is to look at what happens when the vectors come from the standard basis \(\displaystyle \{\mathbf{e}_{1} ,\cdots ,\mathbf{e}_{n}\}\):

\[ \mathbf{A} =\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix} ,\ \ \ \mathbf{e}_{1} =\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} ,\ \mathbf{e}_{2} =\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} ,\ \mathbf{e}_{3} =\begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \]

We get:

\[ \begin{aligned} \mathbf{Ae}_{1} & =\begin{bmatrix} 1\\ 4\\ 7 \end{bmatrix}\\ & \\ \mathbf{Ae}_{2} & =\begin{bmatrix} 2\\ 5\\ 8 \end{bmatrix}\\ & \\ \mathbf{Ae}_{3} & =\begin{bmatrix} 3\\ 6\\ 9 \end{bmatrix} \end{aligned} \]

We see that \(\displaystyle \mathbf{Ae}_{1} ,\mathbf{Ae}_{2} ,\mathbf{Ae}_{3}\) represent the first three columns of \(\displaystyle \mathbf{A}\). Now, for any arbitrary vector \(\displaystyle \mathbf{b} =\begin{bmatrix} b_{1} & b_{2} & b_{3} \end{bmatrix}^{T}\), we have \(\displaystyle \mathbf{b} =b_{1}\mathbf{e}_{1} +b_{2}\mathbf{e}_{2} +b_{3}\mathbf{e}_{3}\). From this it follows that \(\displaystyle Ab\) is:

\[ \begin{aligned} \mathbf{Ab} & =\mathbf{A}( b_{1}\mathbf{e}_{1} +b_{2}\mathbf{e}_{2} +b_{3}\mathbf{e}_{3})\\ & \\ & =b_{1}\mathbf{Ae}_{1} +b_{2}\mathbf{Ae}_{2} +b_{3}\mathbf{Ae}_{3}\\ & \\ & =b_{1} \cdot \begin{bmatrix} 1\\ 4\\ 7 \end{bmatrix} +b_{2} \cdot \begin{bmatrix} 2\\ 5\\ 8 \end{bmatrix} +b_{3} \cdot \begin{bmatrix} 3\\ 6\\ 9 \end{bmatrix} \end{aligned} \]

Vector-Matrix

Let us now look at the a row vector \(\displaystyle \mathbf{a}^{T}\) and matrix \(\displaystyle \mathbf{B}\):

\[ \mathbf{a}^{T} =\begin{bmatrix} a_{1} & \cdots & a_{m} \end{bmatrix} ,\ \ \ \mathbf{B} =\begin{bmatrix} — & \mathbf{b}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{b}_{m} & — \end{bmatrix} \]

The product \(\displaystyle \mathbf{a}^{T}\mathbf{B}\) is:

\[ \begin{aligned} \mathbf{a}^{T}\mathbf{B} & =a_{1}\mathbf{b}_{1}^{T} +\cdots +a_{m}\mathbf{b}_{m}^{T}\\ & \\ & =\sum\limits _{i=1}^{m} a_{i}\mathbf{b}_{i}^{T} \end{aligned} \]

A vector matrix product is the linear combination of the rows of the matrix where the coefficients of the combination come from the row vector.

Matrix-Matrix products

Row-column picture

\(\mathbf{A}\) is expressed in terms of \(m\) row vectors. \(\mathbf{B}\) is expressed in terms of \(p\) column vectors.

\[ \begin{bmatrix} — & \mathbf{a}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{a}_{m}^{T} & — \end{bmatrix}\begin{bmatrix} | & & |\\ \mathbf{b}_{1} & \cdots & \mathbf{b}_{p}\\ | & & | \end{bmatrix} \]

The \(ij^{th}\) element of the matrix \(\mathbf{C}\) is:

\[ C_{ij} =\mathbf{a}_{i}^{T}\mathbf{b}_{j} \]

This is by far the most popular form of matrix multiplication and is the first thing that we learn in high school. Each element of the matrix is represented as the dot product of two vectors.

Matrix-column picture

\(\mathbf{B}\) is expressed as a sequence of \(p\) column vectors.

\[ \mathbf{A}\begin{bmatrix} | & & |\\ \mathbf{b}_{1} & \cdots & \mathbf{b}_{p}\\ | & & | \end{bmatrix} \]

Each column of \(\mathbf{C}\) can be represented as \(\mathbf{Ab}_{i}\), a linear combination of the columns of \(\mathbf{A}\), where the coefficients of the combination are given by the columns of \(\mathbf{B}\):

\[ \mathbf{C} =\begin{bmatrix} | & & |\\ \mathbf{Ab}_{1} & \cdots & \mathbf{Ab}_{p}\\ | & & | \end{bmatrix} \]

Row-matrix picture

\(\mathbf{A}\) is expressed as a sequence of \(m\) row vectors:

\[ \begin{bmatrix} — & \mathbf{a}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{a}_{m}^{T} & — \end{bmatrix}\mathbf{B} \]

Each row of the matrix \(\mathbf{C}\) is of the form \(\mathbf{a}_{i}^{T}\mathbf{B}\), a linear combination of the rows of \(\mathbf{B}\), where the coefficients of the combination are given by the rows of \(\mathbf{A}\):

\[ \mathbf{C} =\begin{bmatrix} — & \mathbf{a}_{1}^{T}\mathbf{B} & — \\ & \vdots & \\ — & \mathbf{a}_{m}^{T}\mathbf{B} & — \end{bmatrix} \]

Column-row picture

\(\mathbf{A}\) is expressed in terms of \(n\) column vectors. \(\mathbf{B}\) is expressed in terms of \(n\) row vectors.

\[ \begin{bmatrix} | & & |\\ \mathbf{a}_{1} & \cdots & \mathbf{a}_{n}\\ | & & | \end{bmatrix}\begin{bmatrix} — & \mathbf{b}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{b}_{n}^{T} & — \end{bmatrix} \]

Here is another way to multiply two matrices:

\[ \mathbf{C} =\sum\limits _{i=1}^{n}\mathbf{a}_{i}\mathbf{b}_{i}^{T} \]

Matrix \(\mathbf{C}\) is expressed as the sum of \(n\) matrices of unit rank. Alternatively, the matrix is represented as the sum of \(n\) outer products. To see why this is true, consider a unit vector \(\mathbf{e}_{k}\) in the standard basis for \(\mathbb{R}^{p}\). Now, let us look the product of the RHS with \(\mathbf{e}_{k}\). This is nothing but the \(k^{th}\) column of this matrix:

\[ \sum\limits _{i=1}^{n}\mathbf{a}_{i}\mathbf{b}_{i}^{T}\mathbf{e}_{k} =\sum\limits _{i=1}^{n} (\mathbf{b}_{i}^{T}\mathbf{e}_{k} )\mathbf{a}_{i} \]

The RHS of the above expression is a linear combinations of the columns of the matrix \(\mathbf{A}\). The coefficients of the combination are the columns of the matrix \(\mathbf{B}\). This is the same as the matrix-column picture.

Summary

Let us now summarize all the operations in the following table:

Left Operand Right Operand Operation Formula
Row vector Column vector Dot product \(\displaystyle \mathbf{a}^{T}\mathbf{b}\)
Column vector Row vector Outer product \(\displaystyle \mathbf{ab}^{T}\)
Matrix Column vector Matrix-vector product \(\displaystyle \mathbf{Ab}\)
Row vector Matrix Vector-matrix product \(\displaystyle \mathbf{a}^{T}\mathbf{B}\)
Matrix Matrix Row-column picture \(\begin{bmatrix} — & \mathbf{a}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{a}_{m}^{T} & — \end{bmatrix}\begin{bmatrix} | & & |\\ \mathbf{b}_{1} & \cdots & \mathbf{b}_{p}\\ | & & | \end{bmatrix}\)
Matrix Matrix Matrix-column picture \(\mathbf{A}\begin{bmatrix} | & & |\\ \mathbf{b}_{1} & \cdots & \mathbf{b}_{p}\\ | & & | \end{bmatrix}\)
Matrix Matrix Row-matrix picture \(\begin{bmatrix} — & \mathbf{a}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{a}_{m}^{T} & — \end{bmatrix}\mathbf{B}\)
Matrix Matrix Column-row picture \(\begin{bmatrix} | & & |\\ \mathbf{a}_{1} & \cdots & \mathbf{a}_{n}\\ | & & | \end{bmatrix}\begin{bmatrix} — & \mathbf{b}_{1}^{T} & — \\ & \vdots & \\ — & \mathbf{b}_{n}^{T} & — \end{bmatrix}\)