\(A^TA\) and \(AA^T\)

linear algebra
definite matrix
linear operator
eigenvalue
eigenvector
matrix factorization
svd
eigendecomposition
mlf
explainer
book
review

If \(\displaystyle \mathbf{A}\) is a \(\displaystyle m\times n\) matrix, then \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is a \(\displaystyle n\times n\) matrix and \(\displaystyle \mathbf{AA}^{T}\) is a \(\displaystyle m\times m\) matrix.

Property-1

\(\mathbf{A}^{T}\mathbf{A}\) and \(\displaystyle \mathbf{AA}^{T}\) are symmetric matrices.

Let us show this for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\):

\[ (\mathbf{A}^{T}\mathbf{A} )^{T} =\mathbf{A}^{T} (\mathbf{A}^{T} )^{T} =\mathbf{A}^{T}\mathbf{A} \]

The argument is similar for \(\displaystyle \mathbf{AA}^{T}\).

Property-2

The nullspace of \(\displaystyle \mathbf{A}\) is equal to the nullspace of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\). Likewise, the nullspace of \(\displaystyle \mathbf{A}^{T}\) is equal to the nullspace of \(\displaystyle \mathbf{AA}^{T}\).

Let us denote the nullspaces as \(\displaystyle \mathcal{N}(\mathbf{A})\) and \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\).

If \(\displaystyle \mathbf{x} \in \mathcal{N}(\mathbf{A})\), then we have the following sequence of observations:

\[ \begin{aligned} \mathbf{Ax} & =\mathbf{0}\\ \mathbf{A}^{T}\mathbf{Ax} & =\mathbf{0}\\ \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{x} & =\mathbf{0} \end{aligned} \]

Thus, \(\displaystyle \mathbf{x} \in \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\) and therefore \(\displaystyle \mathcal{N}(\mathbf{A}) \subseteq \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\).

If \(\displaystyle \mathbf{x} \in \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\), then we have the following sequence of observations:

\[ \begin{aligned} \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{x} & =\mathbf{0}\\ \mathbf{x}^{T}\mathbf{A}^{T}\mathbf{Ax} & =0\\ (\mathbf{Ax})^{T}(\mathbf{Ax}) & =0\\ \mathbf{Ax} & =\mathbf{0} \end{aligned} \]

Thus, \(\displaystyle \mathbf{x} \in \mathcal{N}(\mathbf{A})\) and therefore \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right) \subseteq \mathcal{N}(\mathbf{A})\).

From this, we conclude that \(\displaystyle \mathcal{N}(\mathbf{A}) =\mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\). A similar argument can be used for \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\right) =\mathcal{N}\left(\mathbf{AA}^{T}\right)\).

Property-3

\(\displaystyle \text{rank}\left(\mathbf{A}^{T}\mathbf{A}\right) =\text{rank}\left(\mathbf{AA}^{T}\right) =\text{rank}(\mathbf{A})\)

Using the rank nullity theorem, we see the following:

\[ \begin{aligned} \text{rank}(\mathbf{A}) +\text{nullity}(\mathbf{A}) & =n\\ \text{rank}\left(\mathbf{A}^{T}\mathbf{A}\right) +\text{nullity}\left(\mathbf{A}^{T}\mathbf{A}\right) & =n \end{aligned} \]

Using property-2 (refer to Section 2), we can conclude that the rank of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is the same as the rank of \(\displaystyle \mathbf{A}\). Next, we know that the row rank is the same as the column rank, or \(\displaystyle \text{rank}(\mathbf{A}) =\text{rank}\left(\mathbf{A}^{T}\right)\). Again applying the rank nullity theorem:

\[ \begin{aligned} \text{rank}\left(\mathbf{A}^{T}\right) +\text{nullity}\left(\mathbf{A}^{T}\right) & =m\\ \text{rank}\left(\mathbf{AA}^{T}\right) +\text{nullity}\left(\mathbf{AA}^{T}\right) & =m \end{aligned} \]

We can now conclude that \(\displaystyle \text{rank}\left(\mathbf{A}^{T}\mathbf{A}\right) =\text{rank}\left(\mathbf{AA}^{T}\right) =\text{rank}(\mathbf{A})\).

Property-4

If \(\text{rank} (\mathbf{A} )=n\), then \(\mathbf{A}^{T}\mathbf{A}\) is invertible.

From the previous property, we know that \(\text{rank}(\mathbf{A}^T\mathbf{A}) = \text{rank}(A)\). Since \(\mathbf{A}^T\mathbf{A}\) is full rank, it is invertible.

Property-5

The column space of \(\displaystyle \mathbf{A}^{T}\) is equal to the column space of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\). Likewise, the column space of \(\displaystyle \mathbf{A}\) is equal to the column space of \(\displaystyle \mathbf{AA}^{T}\).

First we note that the row space of \(\displaystyle \mathbf{A}\) is orthogonal to the nullspace of \(\displaystyle \mathbf{A}\). Alternatively put, the column space of \(\displaystyle \mathbf{A}^{T}\) is orthogonal to the nullspace of \(\displaystyle \mathbf{A}\). Denoting the column space as \(\displaystyle \mathcal{R}( \cdot )\), we can express this relationship as:

\[ \mathcal{R}\left(\mathbf{A}^{T}\right) \perp \mathcal{N}(\mathbf{A}) \]

There is actually something slightly stronger that we need. This is the notion of orthogonal complement.

NoteOrthogonal Complement

The orthogonal complement of a subspace \(\displaystyle U\) of a finite-dimensional vector space \(\displaystyle V\), denoted by \(\displaystyle U^{\perp }\), is the set of all vectors in \(\displaystyle V\) that are orthogonal to \(\displaystyle U\). It can be shown that:

  • \(\displaystyle U^{\perp }\) is a subspace of \(\displaystyle V\)

  • Every vector in \(\displaystyle V\) can be expressed uniquely as \(\displaystyle u+u^{\perp }\), where \(\displaystyle u\in U,u^{\perp } \in U^{\perp }\).

  • \(\displaystyle \left( U^{\perp }\right)^{\perp } =U\)

  • \(\displaystyle \text{dim}( U) +\text{dim}\left( U^{\perp }\right) =\text{dim}( V)\)

We can actually show that the row space is the orthogonal complement of the nullspace and vice-versa.

\[ \mathcal{N}(\mathbf{A})^{\perp } =\mathcal{R}\left(\mathbf{A}^{T}\right) \]

A similar result holds for \(\displaystyle A^{T} A\):

\[ \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)^{\perp } =R\left(\mathbf{A}^{T}\mathbf{A}\right) \]

From property-2, we know that \(\displaystyle \mathcal{N}(\mathbf{A}) =\mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\). This along with the properties of orthogonal complements allow us to conclude that:

\[ \mathcal{R}\left(\mathbf{A}^{T}\right) =\mathcal{R}\left(\mathbf{A}^{T}\mathbf{A}\right) \]

Property-6

All the eigenvalues of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and \(\displaystyle \mathbf{AA}^{T}\) are non-negative.

Let \(\displaystyle ( \lambda ,\mathbf{u})\) be an eigenpair of \(\displaystyle \mathbf{AA}^{T}\). Then:

\[ \begin{aligned} \left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{u}\\ \mathbf{u}^{T}\mathbf{AA}^{T}\mathbf{u} & =\lambda \mathbf{u}^{T}\mathbf{u}\\ \left(\mathbf{A}^{T}\mathbf{u}\right)^{T}\left(\mathbf{A}^{T}\mathbf{u}\right) & =\lambda ||\mathbf{u} ||^{2}\\ ||\mathbf{A}^{T}\mathbf{u} ||^{2} & =\lambda ||\mathbf{u} ||^{2} \end{aligned} \]

Since \(\displaystyle \mathbf{u}\) is an eigenvector of \(\displaystyle \mathbf{AA}^{T}\), it is not the zero vector, so \(\displaystyle ||\mathbf{u}||\neq 0\). We can now express \(\displaystyle \lambda\) as:

\[ \lambda =\cfrac{||\mathbf{A}^{T}\mathbf{u} ||^{2}}{||\mathbf{u} ||^{2}} \geqslant 0 \]

A similar argument holds for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\).

Property-7

\(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and \(\displaystyle \mathbf{AA}^{T}\) have the same non-zero eigenvalues.

Let \(\displaystyle ( \lambda ,\mathbf{v})\) be an eigenpair of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) with \(\displaystyle \lambda \neq 0\).

\[ \begin{aligned} \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v} & =\lambda \mathbf{v}\\ \mathbf{A}\left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v} & =\lambda (\mathbf{Av})\\ \left(\mathbf{AA}^{T}\right)(\mathbf{Av}) & =\lambda (\mathbf{Av}) \end{aligned} \]

\(\displaystyle ( \lambda ,\mathbf{Av})\) is an eigenpair of \(\displaystyle \mathbf{AA}^{T}\) provided \(\displaystyle \mathbf{Av} \neq \mathbf{0}\) as an eigenvector cannot be the zero-vector. We need to show that \(\displaystyle \mathbf{Av} \neq \mathbf{0}\). If \(\displaystyle \mathbf{Av}\) happens to be \(\displaystyle \mathbf{0}\), then \(\displaystyle \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v}\) will also become \(\displaystyle \mathbf{0}\). But this would mean that \(\displaystyle \mathbf{v}\) is an eigenvector of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) with eigenvalue \(\displaystyle 0\), implying that \(\displaystyle \lambda =0\). This contradicts the fact that \(\displaystyle \lambda \neq 0\). So, \(\displaystyle \mathbf{Av} \neq \mathbf{0}\) and hence is an eigenvector of \(\displaystyle \mathbf{AA}^{T}\).

We have shown that every non-zero eigenvalue of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is also an eigenvalue of \(\displaystyle \mathbf{AA}^{T}\). We can now go the other way and show that every non-zero eigenvalue of \(\displaystyle \mathbf{AA}^{T}\) is also an eigenvalue of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\).

Property-8

This is a result that lets us move between the eigenvectors of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and \(\displaystyle \mathbf{AA}^{T}\) for the positive eigenvalues. There are two similar looking results:

  1. If \(\displaystyle ( \lambda ,\mathbf{v})\) is an eigenpair for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) with \(\displaystyle \lambda >0\) and \(\displaystyle ||\mathbf{v} ||=1\), then \(\displaystyle ( \lambda ,\mathbf{u})\) is an eigenpair for \(\displaystyle \mathbf{AA}^{T}\) where \(\displaystyle \mathbf{u} =\cfrac{\mathbf{Av}}{\sigma }\) with \(\displaystyle \sigma =\sqrt{\lambda }\), so that \(\displaystyle ||\mathbf{u} ||=1\).

  2. If \(\displaystyle ( \lambda ,\mathbf{u})\) is an eigenpair for \(\displaystyle \mathbf{A A}^{T}\) with \(\displaystyle \lambda >0\) and \(\displaystyle ||\mathbf{u} ||=1\), then \(\displaystyle ( \lambda ,\mathbf{v})\) is an eigenpair for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) where \(\displaystyle \mathbf{v} =\cfrac{\mathbf{A^{T} u}}{\sigma }\) with \(\displaystyle \sigma =\sqrt{\lambda }\), so that \(\displaystyle ||\mathbf{v} ||=1\).

Let us prove this for (2):

\[ \begin{aligned} \left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{u}\\ \mathbf{A}^{T}\left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{A}^{T}\mathbf{u}\\ \left(\mathbf{A}^{T}\mathbf{A}\right)\left(\mathbf{A}^{T}\mathbf{u}\right) & =\lambda \left(\mathbf{A}^{T}\mathbf{u}\right) \end{aligned} \]

From property-6 (refer to Section 6) we already know that \(\displaystyle \left( \lambda ,\mathbf{A}^{T}\mathbf{u}\right)\) is an eigenpair of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\). We can normalize \(\displaystyle \mathbf{A}^{T}\mathbf{u}\) by dividing it by \(\displaystyle ||\mathbf{A}^{T}\mathbf{u} ||\), which we shall compute now:

\[ \begin{aligned} ||\mathbf{A}^{T}\mathbf{u} ||^{2} & =\left(\mathbf{A}^{T}\mathbf{u}\right)^{T}\left(\mathbf{A}^{T}\mathbf{u}\right)\\ & =\mathbf{u}^{T}\mathbf{AA}^{T}\mathbf{u}\\ & =\mathbf{u}^{T}\left[\left(\mathbf{AA}^{T}\right)\mathbf{u}\right]\\ & =\mathbf{u}^{T}( \lambda \mathbf{u})\\ & =\lambda \\ \Longrightarrow ||\mathbf{A}^{T}\mathbf{u} || & =\sqrt{\lambda }\\ & =\sigma \end{aligned} \]

Property-9

\(\displaystyle \mathbf{AA}^{T}\) and \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) have exactly \(\displaystyle r\) positive eigenvalues, where \(\displaystyle r\) is the rank of \(\displaystyle \mathbf{A}\).

Since \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is symmetric, the spectral theorem guarantees that it is orthogonally diagonalizable. That is, there exists a diagonal matrix \(\displaystyle \mathbf{D}\) and an orthogonal matrix \(\displaystyle \mathbf{Q}\) such that:

\[ \mathbf{A}^{T}\mathbf{A} =\mathbf{QDQ}^{T} \]

Here, the columns of \(\displaystyle \mathbf{Q}\) are the eigenvectors of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and the diagonal elements of \(\displaystyle \mathbf{D}\) are the corresponding eigenvalues. In other words, \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is similar to \(\displaystyle \mathbf{D}\). We know from an earlier result that the rank of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is equal to the rank of \(\displaystyle \mathbf{A}\), which we denote as \(\displaystyle r\). Since the rank of similar matrices are equal, we see that the rank of \(\displaystyle \mathbf{D}\) is also \(\displaystyle r\). The rank of a diagonal matrix is equal to the number of non-zero entries. Hence, \(\displaystyle r\) has exactly \(\displaystyle r\) non-zero entries. It follows that \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) has exactly \(\displaystyle r\) non-zero eigenvalues.

Property-10

A cluster of results related to definiteness of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and \(\displaystyle \mathbf{AA}^{T}\). We state only the results concerning \(\displaystyle \mathbf{A}^{T}\mathbf{A}\). All of this holds for \(\displaystyle \mathbf{AA}^{T}\) as well:

  • \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is positive semi-definite.

  • \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is positive definite if and only if it doesn’t have any zero eigenvalues.

  • \(\displaystyle \mathbf{A}^{T}\mathbf{A} +\lambda \mathbf{I}\) is invertible if \(\displaystyle \lambda >0\).

First, we see that \(\mathbf{A}^T \mathbf{A}\) is symmetric and this allows us to talk about its definiteness.

  • All eigenvalues of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) are non-negative. Hence, it is positive semi-definite.
  • The positive definiteness condition is straightforward.
  • The eigenvalues of \(\displaystyle \mathbf{A}^{T}\mathbf{A} +\lambda \mathbf{I}\) are positive if \(\displaystyle \lambda >0\), hence \(\displaystyle \mathbf{A}^{T}\mathbf{A} +\lambda \mathbf{I}\) is positive definite. Positive definite matrices are invertible. One way to see this is via the spectral theorem or to notice that the determinant is non-zero for a positive definite matrix.