Diagonalization

Question

What is diagonalization?

Diagonal matrices

Recall that diagonal matrices are quite interesting to study from the point of view of eigenvectors and eigenvalues. So let us first explore some properties of the same. Let \(\mathbf{D}\) be an arbitrary diagonal matrix:

\[ \mathbf{D} =\begin{bmatrix} a_{1} & & \\ & \ddots & \\ & & a_{n} \end{bmatrix} =\text{diag} (a_{1} ,\cdots ,a_{n} ) \]

Determinant

The determinant of a diagonal matrix is given by the product of the values along the diagonal:

\[ |\mathbf{D} |=a_{1} \cdots a_{n} \]

Invertibility and Rank

From the above equation, it follows that a diagonal matrix is invertible if and only if none of the diagonal elements are zero. If this is the case, then the inverse is given by:

\[ \mathbf{D}^{-1} = \text{diag}\left(\cfrac{1}{a_1}, \cdots, \cfrac{1}{a_n} \right) \]

Also, note that the rank of a diagonal matrix is given by the number of non-zero entries along the diagonal.

Matrix power

Consider the matrix \(\mathbf{D}^{k}\). This is extremely simple to compute in the case of a diagonal matrix:

\[ \mathbf{D}^{k} =\text{diag} (a_{1}^{k} ,\cdots ,a_{n}^{k} ) \]

To see why this is true, note that the product of two diagonal matrices is just the product of their diagonal entries.

Eigenvalues

This is something that we have already seen. The eigenvalues of a diagonal matrix are the diagonal elements. For the next two properties, let us express the diagonal matrix in terms of its eigenvalues:

\[ \mathbf{D} =\text{diag} (\lambda _{1} ,\cdots ,\lambda _{n} ) \]

A diagonal matrix has \(n\) eigenvalues, not necessarily distinct. The characteristic polynomial corresponding to it is:

\[ (\lambda _{1} -\lambda )\cdots (\lambda _{n} -\lambda ) \]

The characteristic polynomial can be decomposed into \(n\) factors, where the factors need not necessarily be distinct.

Eigenvectors

It is easy to see that the elements of the standard basis \(\beta =\{\mathbf{e}_{1} ,\cdots ,\mathbf{e}_{n} \}\) are the eigenvectors of \(\mathbf{D}\). Specifically, \(\mathbf{e}_{i}\) is an eigenvector of \(\mathbf{D}\) corresponding to the eigenvalue \(\lambda _{i}\) as \(\mathbf{De}_{i} =\lambda _{i}\mathbf{e}_{i}\). Can we characterize every possible eigenvector corresponding to eigenvalue \(\lambda _{i}\)? For this particular case, let us assume that the eigenvalues are all distinct. Now, let \(\mathbf{v}\) be some eigenvector of \(\mathbf{D}\) corresponding to \(\lambda _{i}\):

\[ \begin{bmatrix} \lambda _{1} & & \\ & \ddots & \\ & & \lambda _{n} \end{bmatrix}\begin{bmatrix} v_{1}\\ \vdots \\ v_{n} \end{bmatrix} =\begin{bmatrix} \lambda _{i} v_{1}\\ \vdots \\ \lambda _{i} v_{n} \end{bmatrix} \]

This can be further simplified as:

\[ \begin{bmatrix} (\lambda _{1} -\lambda _{i} )v_{1}\\ \vdots \\ (\lambda _{n} -\lambda _{i} )v_{n} \end{bmatrix} =\begin{bmatrix} 0\\ \vdots \\ 0 \end{bmatrix} \]

Since, \(\lambda _{i} \neq \lambda _{j}\) for \(i\neq j\), we have, \(v_{j} =0\) for \(j\neq i\). Thus, the eigenvector corresponding to \(\lambda _{i}\) is some non-zero vector in \(E( \lambda _{i}) =\text{span} (\{\mathbf{e}_{i} \})\). This argument can be extended to the case when the eigenvalues are not distinct. If \(\displaystyle \lambda _{i}\) occurs \(\displaystyle k\) times along the diagonal then the eigenspace is \(\displaystyle k\)-dimensional and is the span of the unit vectors corresponding to these \(\displaystyle k\) elements.

Similar Matrices

Most matrices are not necessarily diagonal. However, it would be very convenient to have matrices that share some of the properties of diagonal matrices that we have seen in the preceeding section. Even if a matrix is not diagonal, can we hope that it is related in some way to a diagonal matrix, matrices that share a close kinship with them? To establish this link, we turn to a concept we are already familiar with: similarity.

Similar Matrices

Matrices \(\displaystyle \mathbf{A}\) and \(\displaystyle \mathbf{B}\) of shape \(\displaystyle n\times n\) are similar if there exists an invertible matrix \(\displaystyle \mathbf{P}\) of the same shape such that \(\displaystyle \mathbf{A} =\mathbf{PBP}^{-1}\).

Similar matrices share a number of things in common. Let us explore them one by one. For the remainder of this section, \(\displaystyle \mathbf{A}\) and \(\displaystyle \mathbf{B}\) are similar matrices connected by the invertible matrix \(\displaystyle \mathbf{P}\).

Determinant

The determinant of similar matrices are equal:

\[ \begin{aligned} |\mathbf{A} | & =|\mathbf{PBP}^{-1} |\\ & \\ & =|\mathbf{P} |\cdot |\mathbf{B} |\cdot |\mathbf{P}^{-1} |\\ & \\ & =|\mathbf{B} | \end{aligned} \]

To get to the final step, we have used the fact that \(|\mathbf{P}^{-1}| = \cfrac{1}{|\mathbf{P}|}\).

Rank and Invertibility

If \(\displaystyle \mathbf{A}\) and \(\displaystyle \mathbf{B}\) are similar, then they have the same rank. Since \(\displaystyle \mathbf{P}\) is an invertible matrix, it won’t affect the rank of the matrix it multiplies. This is one way of seeing that \(\displaystyle \mathbf{A}\) and \(\displaystyle \mathbf{B}\) have the same rank. It follows that \(\displaystyle \mathbf{A}\) is invertible if and only if \(\displaystyle \mathbf{B}\) is. If this is not quite clear, one can also see this from the previous property concerning the equality of determinants.

Characteristic Polynomial and Eigenvalues

We can now look at the characteristic polynomial of \(\mathbf{A}\):

\[ \begin{aligned} |\mathbf{A} -\lambda \mathbf{I} | & =|\mathbf{PBP}^{-1} -\lambda \mathbf{I} |\\ & \\ & =|\mathbf{PBP}^{-1} -\lambda \mathbf{P I P}^{-1} |\\ & \\ & =|\mathbf{P} (\mathbf{B} -\lambda \mathbf{I} )\mathbf{P}^{-1} |\\ & \\ & =|\mathbf{P} |\cdot |\mathbf{B} -\lambda \mathbf{I} |\cdot |\mathbf{P}^{-1} |\\ & \\ & =|\mathbf{B} -\lambda \mathbf{I} | \end{aligned} \]

The two matrices have the same characteristic polynomial. We therefore see that similar matrices have the same eigenvalues.

Diagonalizable Matrices

Now that we have a clear notion of what it means for two matrices to be similar, we can see that it isn’t a bad idea to look at matrices that are similar to a diagonal matrix. We will call such matrices diagonalizable matrices and make the definition more formal at the end of this section.

Definition-1

Let us assume that \(\displaystyle \mathbf{A}\) is similar to a diagonal matrix \(\displaystyle \mathbf{D}\). Then, there is some invertible matrix \(\displaystyle \mathbf{P}\) such that:

\[ \mathbf{A} =\mathbf{PDP}^{-1} \]

We shall now explore the consequences of this link.

Characteristic Polynomial and Eigenvalues

As we saw before, \(\displaystyle \mathbf{A}\) and \(\displaystyle \mathbf{D}\) share the same characteristic polynomial and hence the same eigenvalues. But then the similarity is with a diagonal matrix, hence the eigenvalues of \(\displaystyle \mathbf{A}\) can be read off directly if we know what the corresponding diagonal matrix is! Moreover, we see that \(\displaystyle \mathbf{A}\) has a full set of \(\displaystyle n\) eigenvalues, not necessarily distinct.

Matrix Power

This link with a diagonal matrix also confers a few other powers. Let us compute \(\displaystyle \mathbf{A}^{k}\):

\[ \begin{aligned} \mathbf{A}^{k} & =(\mathbf{PDP}^{-1} )^{k}\\ & \\ & =\underbrace{(\mathbf{PDP}^{-1} )\cdots (\mathbf{PDP}^{-1} )}_{k\text{ blocks}}\\ & \\ & =\mathbf{PD}^{k}\mathbf{P}^{-1} \end{aligned} \]

Raising \(\displaystyle \mathbf{A}\) to the power \(\displaystyle k\) becomes so simple! Also note that the eigenvalues of \(\displaystyle \mathbf{A}^{k}\) are just the eigenvalues of \(\displaystyle \mathbf{A}\) raised to the power \(\displaystyle k\).

Invertibility

Continuing in this manner, we can see that \(\displaystyle \mathbf{A}\) is invertible if and only if \(\displaystyle \mathbf{D}\) is invertible. This happens only if \(\displaystyle \mathbf{D}\) has no zeros on its diagonal. If \(\displaystyle \mathbf{D}\) is invertible, then computing the inverse of \(\displaystyle \mathbf{A}\) is again straightforward.

\[ \begin{aligned} \mathbf{A}^{-1} & =\mathbf{PD}^{-1}\mathbf{P}^{-1} \end{aligned} \]

Eigenvectors

Finally, we also note that the eigenvectors are hiding in plain sight in the equation \(\displaystyle \mathbf{A} =\mathbf{PDP}^{-1}\). Let us work on this next. Rearranging this equation by right-multiplying by \(\displaystyle \mathbf{P}\) on both sides, we have:

\[ \begin{aligned} \mathbf{A} & =\mathbf{PDP}^{-1}\\ & \\ \mathbf{AP} & =\mathbf{PD} \end{aligned} \]

Let us now express \(\displaystyle \mathbf{P}\) as a collection of columns:

\[ \mathbf{P} =\begin{bmatrix} | & & |\\ \mathbf{v}_{1} & \cdots & \mathbf{v}_{n}\\ | & & | \end{bmatrix} \]

Then, \(\displaystyle \mathbf{AP}\) and \(\displaystyle \mathbf{PD}\) become:

\[ \begin{aligned} \mathbf{AP} & =\mathbf{A}\begin{bmatrix} | & & |\\ \mathbf{v}_{1} & \cdots & \mathbf{v}_{n}\\ | & & | \end{bmatrix}\\ & \\ & =\begin{bmatrix} | & & |\\ \mathbf{Av}_{1} & \cdots & \mathbf{Av}_{n}\\ | & & | \end{bmatrix} \end{aligned} \ \ \ \ \ \ \begin{aligned} \mathbf{PD} & =\begin{bmatrix} | & & |\\ \mathbf{v}_{1} & \cdots & \mathbf{v}_{n}\\ | & & | \end{bmatrix}\begin{bmatrix} \lambda _{1} & & \\ & \ddots & \\ & & \lambda _{n} \end{bmatrix}\\ & \\ & =\begin{bmatrix} | & & |\\ \lambda _{1}\mathbf{v}_{1} & \cdots & \lambda _{n}\mathbf{v}_{n}\\ | & & | \end{bmatrix} \end{aligned} \]

Comparing these two matrices column by column, we are led to conclude that:

\[ \mathbf{Av}_{i} =\lambda _{i}\mathbf{v}_{i} \]

\(\displaystyle \mathbf{v}_{i}\) is an eigenvector of \(\displaystyle \mathbf{A}\) with eigenvalue \(\displaystyle \lambda _{i}\)! Doesn’t that seem like a great coincidence? Now, we have to be careful. Eigenvectors are non-zero. Thankfully, \(\displaystyle \mathbf{v}_{i}\) is non-zero since it is a column of an invertible matrix. Therefore, we see that the columns of the matrix \(\displaystyle \mathbf{P}\) are the eigenvectors of \(\displaystyle \mathbf{A}\) and the corresponding eigenvalues are present on the diagonal \(\displaystyle \mathbf{D}\).

Let us now formally define a diagonalizable matrix.

Definition-1

A matrix \(\displaystyle \mathbf{A}\) of shape \(\displaystyle n\times n\) is diagonalizable if it is similar to a diagonal matrix \(\displaystyle \mathbf{D}\) of the same shape.

From this we make two immediate observations that we have already disucssed in detail. Nevertheless, this is important enough to deserve a second mention. If \(\mathbf{A}\) is diagonalizable, then:

There exists an invertible matrix \(\displaystyle \mathbf{P}\) and a diagonal matrix \(\displaystyle \mathbf{D}\) such that \(\mathbf{A} =\mathbf{PDP}^{-1}\)
The \(\displaystyle i^{\text{th}}\) diagonal entry of \(\displaystyle \mathbf{D}\) is an eigenvalue of \(\displaystyle \mathbf{A}\), say \(\displaystyle \lambda _{i}\), and the \(\displaystyle i^{\text{th}}\) column of \(\displaystyle \mathbf{P}\) is an eigenvector of \(\displaystyle \mathbf{A}\) corresponding to \(\displaystyle \lambda _{i}\).

Definition-2

Now, since \(\displaystyle \mathbf{P}\) is an invertible matrix, its columns form a basis for \(\mathbb{R}^{n}\). Moreover, since the columns of \(\displaystyle \mathbf{P}\) are the eigenvectors of \(\mathbf{A}\), we can conclude that we have a basis for \(\displaystyle \mathbb{R}^{n}\) that is made up of eigenvectors of \(\displaystyle \mathbf{A}\). What we have shown is an important result:

Basis of Eigenvectors

If \(\displaystyle \mathbf{A}\) is diagonalizable, then there is a basis for \(\displaystyle \mathbb{R}^{n}\) made up of eigenvectors of \(\displaystyle \mathbf{A}\).

In fact, the converse of this statement is also true and we will show this now.

Converse

If \(\displaystyle \mathbf{A}\) be a matrix such that \(\displaystyle \mathbb{R}^{n}\) has a basis made up of eigenvectors of \(\displaystyle \mathbf{A}\), then \(\mathbf{A}\) is diagonalizable.

To prove this, consider \(\displaystyle \beta =\{\mathbf{v}_{1} ,\dotsc ,\mathbf{v}_{n}\}\) to be such a basis. Since each \(\displaystyle \mathbf{v}_{i}\) is an eigenvector, we have \(\displaystyle \mathbf{A} v_{i} =\lambda _{i}\mathbf{v}_{i}\). Collecting the eigenvectors in the columns of a matrix \(\displaystyle \mathbf{P}\) and collecting the corresponding eigenvalues in a diagonal matrix \(\displaystyle \mathbf{D}\), we see that:

\[ \begin{aligned} \mathbf{AP} & =\mathbf{PD}\\ \Longrightarrow \mathbf{A} & =\mathbf{PDP}^{-1} \end{aligned} \]

This shows that \(\displaystyle \mathbf{A}\) is diagonalizable! That proves the converse. We can use this to come up with a second, equivalent definition of a diagonalizable matrix.

Definition-2

A matrix \(\displaystyle \mathbf{A}\) of shape \(\displaystyle n\times n\) is diagonalizable if there exists a basis of \(\displaystyle \mathbb{R}^{n}\) made up of the eigenvectors of \(\displaystyle \mathbf{A}\).

Here is another perspective in which we can look at diagonalizable matrices. Recall that a pair of similar matrices represent the same underlying linear transformation in different bases. The invertible matrix \(\displaystyle \mathbf{P}\) that is responsible for the similarity transformation can be seen as a change of basis matrix, where the columns of \(\mathbf{P}\) are the elements of the new basis, while the old basis is the standard basis.

Let us apply this understanding to diagonalizable matrices. If a matrix \(\displaystyle \mathbf{A}\) is diagonalizable, then it is similar to some diagonal matrix \(\displaystyle \mathbf{D}\). What this means is that the underlying linear transformation has a matrix representation that is diagonal in some basis. What is even more interesting is that this basis is exactly the one made up of the eigenvectors of the matrix \(\displaystyle \mathbf{A}\)!

Example

All diagonal matrices are diagonalizable. These are trivial examples of diagonalizable matrices. Let us take up a slightly non-trivial example of a diaongalizable matrix:

\[ \mathbf{A} =\begin{bmatrix} 4 & 3\\ 1 & 2 \end{bmatrix} \]

The eigenvalues are \(5\) and \(1\). A pair of eigenvectors corresponding to these two eigenvalues are \((3,1)^{T}\) and \((-1,1)^{T}\) respectively. If we let these two vectors be the columns of a matrix \(\mathbf{P}\), then:

\[ \mathbf{P} =\begin{bmatrix} 3 & -1\\ 1 & 1 \end{bmatrix} ,\ \mathbf{P}^{-1} =\frac{1}{4}\begin{bmatrix} 1 & 1\\ -1 & 3 \end{bmatrix} ,\ \mathbf{D} =\begin{bmatrix} 5 & 0\\ 0 & 1 \end{bmatrix} \]

We can verify that:

\[ \mathbf{A} =\mathbf{PDP}^{-1} \]

What we have done here is to factorize or decompose a matrix into other matrices. This is a form of matrix factorization. Particularly, this goes by the name of eigen-decomposition since the component matrices contain the eigenvalues and the eigenvectors of \(\displaystyle \mathbf{A}\). Only diagonalizable matrices admit an eigen-decomposition, which is obvious from the definition of diagonalizability. In the subsequent documents, we will understand something more about the collection of diagonalizable matrices.

Summary

Diagonalizable matrices are those matrices that are not necessarily diagonal, yet share a close kinship with them. The following are equivalent definitions. An \(n\times n\) matrix \(\mathbf{A}\) is diagonalizable if:

\(\mathbf{A}\) is similar to a diagonal matrix.
\(\mathbb{R}^{n}\) has a basis of eigenvectors of \(\mathbf{A}\).