\(A^TA\) and \(AA^T\)
If \(\displaystyle \mathbf{A}\) is a \(\displaystyle m\times n\) matrix, then \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is a \(\displaystyle n\times n\) matrix and \(\displaystyle \mathbf{AA}^{T}\) is a \(\displaystyle m\times m\) matrix.
Property-1
Let us show this for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\):
\[ (\mathbf{A}^{T}\mathbf{A} )^{T} =\mathbf{A}^{T} (\mathbf{A}^{T} )^{T} =\mathbf{A}^{T}\mathbf{A} \]
The argument is similar for \(\displaystyle \mathbf{AA}^{T}\).
Property-2
Let us denote the nullspaces as \(\displaystyle \mathcal{N}(\mathbf{A})\) and \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\).
If \(\displaystyle \mathbf{x} \in \mathcal{N}(\mathbf{A})\), then we have the following sequence of observations:
\[ \begin{aligned} \mathbf{Ax} & =\mathbf{0}\\ \mathbf{A}^{T}\mathbf{Ax} & =\mathbf{0}\\ \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{x} & =\mathbf{0} \end{aligned} \]
Thus, \(\displaystyle \mathbf{x} \in \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\) and therefore \(\displaystyle \mathcal{N}(\mathbf{A}) \subseteq \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\).
If \(\displaystyle \mathbf{x} \in \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\), then we have the following sequence of observations:
\[ \begin{aligned} \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{x} & =\mathbf{0}\\ \mathbf{x}^{T}\mathbf{A}^{T}\mathbf{Ax} & =0\\ (\mathbf{Ax})^{T}(\mathbf{Ax}) & =0\\ \mathbf{Ax} & =\mathbf{0} \end{aligned} \]
Thus, \(\displaystyle \mathbf{x} \in \mathcal{N}(\mathbf{A})\) and therefore \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right) \subseteq \mathcal{N}(\mathbf{A})\).
From this, we conclude that \(\displaystyle \mathcal{N}(\mathbf{A}) =\mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\). A similar argument can be used for \(\displaystyle \mathcal{N}\left(\mathbf{A}^{T}\right) =\mathcal{N}\left(\mathbf{AA}^{T}\right)\).
Property-3
Using the rank nullity theorem, we see the following:
\[ \begin{aligned} \text{rank}(\mathbf{A}) +\text{nullity}(\mathbf{A}) & =n\\ \text{rank}\left(\mathbf{A}^{T}\mathbf{A}\right) +\text{nullity}\left(\mathbf{A}^{T}\mathbf{A}\right) & =n \end{aligned} \]
Using property-2 (refer to Section 2), we can conclude that the rank of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is the same as the rank of \(\displaystyle \mathbf{A}\). Next, we know that the row rank is the same as the column rank, or \(\displaystyle \text{rank}(\mathbf{A}) =\text{rank}\left(\mathbf{A}^{T}\right)\). Again applying the rank nullity theorem:
\[ \begin{aligned} \text{rank}\left(\mathbf{A}^{T}\right) +\text{nullity}\left(\mathbf{A}^{T}\right) & =m\\ \text{rank}\left(\mathbf{AA}^{T}\right) +\text{nullity}\left(\mathbf{AA}^{T}\right) & =m \end{aligned} \]
We can now conclude that \(\displaystyle \text{rank}\left(\mathbf{A}^{T}\mathbf{A}\right) =\text{rank}\left(\mathbf{AA}^{T}\right) =\text{rank}(\mathbf{A})\).
Property-4
From the previous property, we know that \(\text{rank}(\mathbf{A}^T\mathbf{A}) = \text{rank}(A)\). Since \(\mathbf{A}^T\mathbf{A}\) is full rank, it is invertible.
Property-5
First we note that the row space of \(\displaystyle \mathbf{A}\) is orthogonal to the nullspace of \(\displaystyle \mathbf{A}\). Alternatively put, the column space of \(\displaystyle \mathbf{A}^{T}\) is orthogonal to the nullspace of \(\displaystyle \mathbf{A}\). Denoting the column space as \(\displaystyle \mathcal{R}( \cdot )\), we can express this relationship as:
\[ \mathcal{R}\left(\mathbf{A}^{T}\right) \perp \mathcal{N}(\mathbf{A}) \]
There is actually something slightly stronger that we need. This is the notion of orthogonal complement.
The orthogonal complement of a subspace \(\displaystyle U\) of a finite-dimensional vector space \(\displaystyle V\), denoted by \(\displaystyle U^{\perp }\), is the set of all vectors in \(\displaystyle V\) that are orthogonal to \(\displaystyle U\). It can be shown that:
\(\displaystyle U^{\perp }\) is a subspace of \(\displaystyle V\)
Every vector in \(\displaystyle V\) can be expressed uniquely as \(\displaystyle u+u^{\perp }\), where \(\displaystyle u\in U,u^{\perp } \in U^{\perp }\).
\(\displaystyle \left( U^{\perp }\right)^{\perp } =U\)
\(\displaystyle \text{dim}( U) +\text{dim}\left( U^{\perp }\right) =\text{dim}( V)\)
We can actually show that the row space is the orthogonal complement of the nullspace and vice-versa.
\[ \mathcal{N}(\mathbf{A})^{\perp } =\mathcal{R}\left(\mathbf{A}^{T}\right) \]
A similar result holds for \(\displaystyle A^{T} A\):
\[ \mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)^{\perp } =R\left(\mathbf{A}^{T}\mathbf{A}\right) \]
From property-2, we know that \(\displaystyle \mathcal{N}(\mathbf{A}) =\mathcal{N}\left(\mathbf{A}^{T}\mathbf{A}\right)\). This along with the properties of orthogonal complements allow us to conclude that:
\[ \mathcal{R}\left(\mathbf{A}^{T}\right) =\mathcal{R}\left(\mathbf{A}^{T}\mathbf{A}\right) \]
Property-6
Let \(\displaystyle ( \lambda ,\mathbf{u})\) be an eigenpair of \(\displaystyle \mathbf{AA}^{T}\). Then:
\[ \begin{aligned} \left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{u}\\ \mathbf{u}^{T}\mathbf{AA}^{T}\mathbf{u} & =\lambda \mathbf{u}^{T}\mathbf{u}\\ \left(\mathbf{A}^{T}\mathbf{u}\right)^{T}\left(\mathbf{A}^{T}\mathbf{u}\right) & =\lambda ||\mathbf{u} ||^{2}\\ ||\mathbf{A}^{T}\mathbf{u} ||^{2} & =\lambda ||\mathbf{u} ||^{2} \end{aligned} \]
Since \(\displaystyle \mathbf{u}\) is an eigenvector of \(\displaystyle \mathbf{AA}^{T}\), it is not the zero vector, so \(\displaystyle ||\mathbf{u}||\neq 0\). We can now express \(\displaystyle \lambda\) as:
\[ \lambda =\cfrac{||\mathbf{A}^{T}\mathbf{u} ||^{2}}{||\mathbf{u} ||^{2}} \geqslant 0 \]
A similar argument holds for \(\displaystyle \mathbf{A}^{T}\mathbf{A}\).
Property-7
Let \(\displaystyle ( \lambda ,\mathbf{v})\) be an eigenpair of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) with \(\displaystyle \lambda \neq 0\).
\[ \begin{aligned} \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v} & =\lambda \mathbf{v}\\ \mathbf{A}\left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v} & =\lambda (\mathbf{Av})\\ \left(\mathbf{AA}^{T}\right)(\mathbf{Av}) & =\lambda (\mathbf{Av}) \end{aligned} \]
\(\displaystyle ( \lambda ,\mathbf{Av})\) is an eigenpair of \(\displaystyle \mathbf{AA}^{T}\) provided \(\displaystyle \mathbf{Av} \neq \mathbf{0}\) as an eigenvector cannot be the zero-vector. We need to show that \(\displaystyle \mathbf{Av} \neq \mathbf{0}\). If \(\displaystyle \mathbf{Av}\) happens to be \(\displaystyle \mathbf{0}\), then \(\displaystyle \left(\mathbf{A}^{T}\mathbf{A}\right)\mathbf{v}\) will also become \(\displaystyle \mathbf{0}\). But this would mean that \(\displaystyle \mathbf{v}\) is an eigenvector of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) with eigenvalue \(\displaystyle 0\), implying that \(\displaystyle \lambda =0\). This contradicts the fact that \(\displaystyle \lambda \neq 0\). So, \(\displaystyle \mathbf{Av} \neq \mathbf{0}\) and hence is an eigenvector of \(\displaystyle \mathbf{AA}^{T}\).
We have shown that every non-zero eigenvalue of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is also an eigenvalue of \(\displaystyle \mathbf{AA}^{T}\). We can now go the other way and show that every non-zero eigenvalue of \(\displaystyle \mathbf{AA}^{T}\) is also an eigenvalue of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\).
Property-8
Let us prove this for (2):
\[ \begin{aligned} \left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{u}\\ \mathbf{A}^{T}\left(\mathbf{AA}^{T}\right)\mathbf{u} & =\lambda \mathbf{A}^{T}\mathbf{u}\\ \left(\mathbf{A}^{T}\mathbf{A}\right)\left(\mathbf{A}^{T}\mathbf{u}\right) & =\lambda \left(\mathbf{A}^{T}\mathbf{u}\right) \end{aligned} \]
From property-6 (refer to Section 6) we already know that \(\displaystyle \left( \lambda ,\mathbf{A}^{T}\mathbf{u}\right)\) is an eigenpair of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\). We can normalize \(\displaystyle \mathbf{A}^{T}\mathbf{u}\) by dividing it by \(\displaystyle ||\mathbf{A}^{T}\mathbf{u} ||\), which we shall compute now:
\[ \begin{aligned} ||\mathbf{A}^{T}\mathbf{u} ||^{2} & =\left(\mathbf{A}^{T}\mathbf{u}\right)^{T}\left(\mathbf{A}^{T}\mathbf{u}\right)\\ & =\mathbf{u}^{T}\mathbf{AA}^{T}\mathbf{u}\\ & =\mathbf{u}^{T}\left[\left(\mathbf{AA}^{T}\right)\mathbf{u}\right]\\ & =\mathbf{u}^{T}( \lambda \mathbf{u})\\ & =\lambda \\ \Longrightarrow ||\mathbf{A}^{T}\mathbf{u} || & =\sqrt{\lambda }\\ & =\sigma \end{aligned} \]
Property-9
Since \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is symmetric, the spectral theorem guarantees that it is orthogonally diagonalizable. That is, there exists a diagonal matrix \(\displaystyle \mathbf{D}\) and an orthogonal matrix \(\displaystyle \mathbf{Q}\) such that:
\[ \mathbf{A}^{T}\mathbf{A} =\mathbf{QDQ}^{T} \]
Here, the columns of \(\displaystyle \mathbf{Q}\) are the eigenvectors of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) and the diagonal elements of \(\displaystyle \mathbf{D}\) are the corresponding eigenvalues. In other words, \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is similar to \(\displaystyle \mathbf{D}\). We know from an earlier result that the rank of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) is equal to the rank of \(\displaystyle \mathbf{A}\), which we denote as \(\displaystyle r\). Since the rank of similar matrices are equal, we see that the rank of \(\displaystyle \mathbf{D}\) is also \(\displaystyle r\). The rank of a diagonal matrix is equal to the number of non-zero entries. Hence, \(\displaystyle r\) has exactly \(\displaystyle r\) non-zero entries. It follows that \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) has exactly \(\displaystyle r\) non-zero eigenvalues.
Property-10
First, we see that \(\mathbf{A}^T \mathbf{A}\) is symmetric and this allows us to talk about its definiteness.
- All eigenvalues of \(\displaystyle \mathbf{A}^{T}\mathbf{A}\) are non-negative. Hence, it is positive semi-definite.
- The positive definiteness condition is straightforward.
- The eigenvalues of \(\displaystyle \mathbf{A}^{T}\mathbf{A} +\lambda \mathbf{I}\) are positive if \(\displaystyle \lambda >0\), hence \(\displaystyle \mathbf{A}^{T}\mathbf{A} +\lambda \mathbf{I}\) is positive definite. Positive definite matrices are invertible. One way to see this is via the spectral theorem or to notice that the determinant is non-zero for a positive definite matrix.