\(\mathbf{A} \mathbf{x}=\mathbf{0}\)
How do we solve for \(\mathbf{x}\) in the equation \(\mathbf{Ax}=\mathbf{0}\)?
Setting
Before we tackle the general problem of \(\mathbf{Ax}=\mathbf{b}\), let us first see if we can solve the system when \(\mathbf{b}\) is the zero vector. In all the discussions that follow, this will be our setting:
\(\mathbf{A}\) is a matrix of dimensions \(m\times n\)
\(\mathbf{x}\in\mathbb{R}^{n}\)
\(\mathbf{Ax}\in\mathbb{R}^{m}\)
The problem that we have taken up is the homogeneous system of equations:
\[ \mathbf{Ax}=\mathbf{0} \]
Nullspace
We can immediately see that \(\mathbf{x}=\mathbf{0}\) is a solution. However, this is a trivial solution and is not particularly interesting. We would like to search for non-trivial solutions. Let us begin optimistically by assuming that at least one such solution exists, say \(\mathbf{x}_{1}\). Then, we can see that \(k\mathbf{x}_{1}\) is also a solution. This is because:
\[ \mathbf{A}(k\mathbf{x}_{1})=k\cdot\mathbf{Ax}_{1}=\mathbf{0} \]
Also, if \(\mathbf{x}_{1}\) and \(\mathbf{x}_{2}\) are two solutions to the equation, then \(\mathbf{x}_{1}+\mathbf{x}_{2}\) is also a solution, as:
\[ \mathbf{A}(\mathbf{x}_{1}+\mathbf{x}_{2})=\mathbf{Ax}_{1}+\mathbf{Ax}_{2}=0 \]
From these two observations, we see that the set of all solutions to the equation \(\mathbf{Ax}=\mathbf{0}\) is a subspace of \(\mathbb{R}^{m}\). We denote this by \(\mathcal{N}(\mathbf{A})\) and we call it the nullspace of \(\mathbf{A}\). The dimension of the nullspace is called nullity.
All this is fine, but how does it help us find all the solutions? If we can find a basis for the nullspace, that will help us characterize all the solutions. If \(B=\{\mathbf{x}_{1},\cdots,\mathbf{x}_{k}\}\) is a basis for the nullspace, then the set of all solutions to the equation is given by \(\mathcal{N}(\mathbf{A})=\text{span}(B)\). To get to the basis, we first need to revisit Gaussian elimination.
Row-Echelon form
The central idea in Gaussian elimination is to transform a matrix into its row-echelon form. Let us take up an example and work with that:
\[ \mathbf{A}=\begin{bmatrix}1 & 0 & -1 & 0\\ 2 & 1 & 0 & 1\\ 3 & 1 & -1 & 1 \end{bmatrix} \]
Recall that we can apply a sequence of any of these three row operations on a matrix:
swap two rows
scale a row by a non-zero constant
add a scalar multiple of a row to another row
Step-1
\[ \begin{bmatrix}1 & 0 & -1 & 0\\ 2 & 1 & 0 & 1\\ 3 & 1 & -1 & 1 \end{bmatrix}\ \xrightarrow{R_{2}\rightarrow R_{2}-2R_{1}}\begin{bmatrix}1 & 0 & -1 & 0\\ 0 & 1 & 2 & 1\\ 3 & 1 & -1 & 1 \end{bmatrix} \]
Step-2 \[ \begin{bmatrix}1 & 0 & -1 & 0\\ 0 & 1 & 2 & 1\\ 3 & 1 & -1 & 1 \end{bmatrix}\xrightarrow{R_{3}\rightarrow R_{3}-3R_{1}}\begin{bmatrix}1 & 0 & -1 & 0\\ 0 & 1 & 2 & 1\\ 0 & 1 & 2 & 1 \end{bmatrix} \]
Step-3 \[ \begin{bmatrix}1 & 0 & -1 & 0\\ 0 & 1 & 2 & 1\\ 0 & 1 & 2 & 1 \end{bmatrix}\xrightarrow{R_{3}\rightarrow R_{3}-R_{2}}\begin{bmatrix}1 & 0 & -1 & 0\\ 0 & 1 & 2 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix} \]
The final matrix that we have is in row-echelon form. Here is a quick reminder of when a matrix is in row-echelon matrix:
All rows that have only zeros are at the bottom.
The first nonzero entry in a row is always to the right of the first nonzero entry in the row above it.
The first nonzero entry in a row is called a pivot and columns that contain pivots are called pivot columns. The row echelon form a matrix is not unique. However, we can reduce a matrix to its reduced row echelon form (RREF), which is unique. A matrix is in reduced row echelon form if:
- It is in row echelon form.
- The only nonzero entry in a pivot column is the pivot.
- All pivots are equal to \(1\).
The matrix that we have obtained in this example is actually in RREF. Let us call the RREF of \(\mathbf{A}\) as \(\mathbf{R}\). We state the following result without proof:
If \(\mathbf{R}\) is the reduced row echelon form of \(\mathbf{A}\), then \(\mathbf{Ax}=\mathbf{0}\) if and only if \(\mathbf{Rx}=\mathbf{0}\). In particular, \(\mathcal{N}(\mathbf{A}) = \mathcal{N}(\mathbf{R})\).
Thus, the nullspace of a matrix and the nullspace of its RREF are the same. This lets us forget \(\mathbf{A}\) and instead deal with its row-echelon form directly. Why does this work? The row operations are “reversible”, meaning, if \(\mathbf{R}\), the RREF of \(\mathbf{A}\), is obtained by a sequence of \(m\) row operations, then we can retrieve \(\mathbf{A}\) from \(\mathbf{R}\) by “reversing” each of the \(m\) row operations as well the sequence of the operations. We will leave out the exact mechanics for now. But it is useful to note that it is this reversibility that allows us to link the systems \(\mathbf{Ax} = \mathbf{0}\) and \(\mathbf{Rx} = \mathbf{0}\).
It is also instructive to note that each row operation transforms an equation into some other form in such a way that the set of \(m\) equations still retain their essence. Though we haven’t mentioned this explicitly, the operations that we do on the matrix \(\mathbf{A}\) should be done for the RHS of the equations. Since the vector \(\mathbf{b} = \mathbf{0}\), the operations on the RHS won’t change the RHS, so we don’t bother to mention them. In the next document that discusses the general system, \(\mathbf{Ax} = \mathbf{b}\), we will see the explicit involvement of the RHS.
Recipe for a Basis
Now we have to solve the following equation:
\[ \begin{bmatrix} \textbf{1} & 0 & -1 & 0\\ 0 & \textbf{1} & 2 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4} \end{bmatrix}=\begin{bmatrix}0\\ 0\\ 0\\ 0 \end{bmatrix} \]
Columns \(1\) and \(2\) are called the pivot columns as they contain the pivots. The variables corresponding to the pivots are called pivot variables, while the others are called free variables. This naming convention isn’t arbitrary. The free variables are free to assume whatever value they please. We can then solve for the pivot variables. An alternative naming convention is to call these two sets of variables dependent and independent variables:
- pivot variables \(\equiv\) dependent variables
- free variables \(\equiv\) independent variables
We can now state the algorithm for finding a basis for the nullspace of \(\mathbf{\boldsymbol{R}}\):
Algorithm
Let us now go ahead and specify the algorithm:
\(B\) is the required basis. Let us try to work it out for the example problem:
\[ \begin{bmatrix} \textbf{1} & 0 & -1 & 0\\ 0 & \textbf{1} & 2 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4} \end{bmatrix}=\begin{bmatrix}0\\ 0\\ 0\\ 0 \end{bmatrix} \]
\(x_{1}\) and \(x_{2}\) are the pivot variables. \(x_{3}\) and \(x_{4}\) are the free variables. First, let us set \(x_{3}=1,x_{4}=0\). This gives us \(x_{1}=1,x_{2}=-2\). The first element of the basis is therefore \(\begin{bmatrix}1 & -2 & 1 & 0\end{bmatrix}^{T}\). Next, we set \(x_{3}=0,x_{4}=1\). This gives us \(x_{1}=0,x_{2}=-1\). The second element of the basis is \(\begin{bmatrix}0 & -1 & 0 & 1\end{bmatrix}^{T}\). Thus, a basis for \(\mathcal{N}(\mathbf{X})\) is:
\[ B=\left\{ \begin{bmatrix}1\\ -2\\ 1\\ 0 \end{bmatrix},\begin{bmatrix}0\\ -1\\ 0\\ 1 \end{bmatrix}\right\} \]
The set of all solutions for the system \(\mathbf{Ax}=\mathbf{0}\) is \(\text{span}(B)\).
Proof of Correctness
If you are wondering why this algorithm works, note that the rank of the matrix, \(r\), is equal to the number of non-zero rows in the row-echelon form of \({\displaystyle \mathbf{A}}\). If we look at the columns, then we have \({\displaystyle r}\) pivot columns. These \({\displaystyle r}\) columns are linearly independent by construction. Therefore, each of the remaining \({\displaystyle n-r}\) columns can be expressed as a unique linear combination of these \({\displaystyle r}\) pivot columns. The coefficients of this linear combination are going to come from our \({\displaystyle n}\) variables, \({\displaystyle x_{1},\cdots,x_{n}}\).
Recall that the matrix \({\displaystyle \mathbf{A}}\) is \({\displaystyle m\times n}\). From the rank-nullity theorem, we know that the nullity is going to be \(n-r\). Thus a basis of \(\mathcal{N}(\mathbf{A})\) will have \(n-r\) linearly independent vectors. We have to hunt for these \(n-r\) vectors. To get there, we divide the \(n\) variables into two parts:
\(r\) pivot variables: variables corresponding to the pivot columns
\(n-r\) free variables: all other variables
To get a basis vector, we set one of the \(n-r\) free variables to \(1\) and the rest to \(0\). Basically, here we are trying to express a non-pivot column as a linear combination of the \({\displaystyle r}\) pivot columns. Then we determine the \(r\) pivot variables (coefficients of the combination) by solving the \(r\) equations corresponding to them. The resulting \({\displaystyle \mathbf{x}}\) is one element in the basis. By repeating this process with each of the \(n-r\) free variables, we are guaranteed to have \(n-r\) linearly independent vectors.
Summary
The set of all solutions to \(\mathbf{Ax} = \mathbf{0}\) is the nullspace of \(\mathbf{A}\). The nullspace, \(\mathcal{N}(\mathbf{A})\), can be succinctly expressed as \(\mathcal{N}(\mathbf{A}) = \text{span}(B)\), where \(B\) is a basis for it. We use a sequence of elementary row operations to reduce the matrix \(\mathbf{A}\) to its row echelon form \(\mathbf{R}\). We then use an algorithm to extract a basis from the system \(\mathbf{Rx} = \mathbf{0}\).