Matrices
Norms - Frobenius Norm
||\boldsymbol{X}||_F = \sqrt{\displaystyle\sum_{i,j} X_{i,j}^2}
For i rows and j columns, sum the square of each element, then take the square root. This is analogous to the L^2 norm of a vector, and measures the Euclidean size of the matrix: equivalently, it is the L^2 norm of the vector obtained by flattening all the elements of \boldsymbol{X} into a single list.
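A minimal NumPy sketch (NumPy assumed as the numerical library here, matrix values made up) confirming that the definition matches the built-in Frobenius norm:

```python
import numpy as np

X = np.array([[1., 2.],
              [3., 4.]])

# Frobenius norm by the definition: square every element, sum, square root
manual = np.sqrt((X ** 2).sum())

# NumPy's built-in Frobenius norm (the default matrix norm)
builtin = np.linalg.norm(X, ord='fro')

print(manual, builtin)  # both ~5.4772
```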
Multiplication
The number of columns in the first matrix must match the number of rows in the second. In the example below, \boldsymbol{A} has m rows and n columns, \boldsymbol{B} has n rows and p columns, and the product \boldsymbol{C} has m rows and p columns.
\underset{m \times n}{\boldsymbol{A}} \; \underset{n \times p}{\boldsymbol{B}} = \underset{m \times p}{\boldsymbol{C}}
C_{i,k} = \displaystyle\sum_{j}A_{i,j}B_{j,k}
Multiplying a matrix with a vector:
\begin{bmatrix} 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \cdot 1 + 4 \cdot 2 \\ 5 \cdot 1 + 6 \cdot 2 \\ 7 \cdot 1 + 8 \cdot 2 \end{bmatrix} = \begin{bmatrix} 11 \\ 17 \\ 23 \end{bmatrix}
Multiplying a matrix with a matrix:
\begin{bmatrix} 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 9 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 3 \cdot 1 + 4 \cdot 2 & 3 \cdot 9 + 4 \cdot 0 \\ 5 \cdot 1 + 6 \cdot 2 & 5 \cdot 9 + 6 \cdot 0 \\ 7 \cdot 1 + 8 \cdot 2 & 7 \cdot 9 + 8 \cdot 0 \end{bmatrix} = \begin{bmatrix} 11 & 27 \\ 17 & 45 \\ 23 & 63 \end{bmatrix}
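The two worked examples can be checked with NumPy's `@` operator (a sketch using the same numbers):

```python
import numpy as np

A = np.array([[3, 4],
              [5, 6],
              [7, 8]])
v = np.array([1, 2])
B = np.array([[1, 9],
              [2, 0]])

print(A @ v)  # [11 17 23] - matches the matrix-vector example
print(A @ B)  # [[11 27] [17 45] [23 63]] - matches the matrix-matrix example
```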
Matrix multiplication is typically not commutative: \boldsymbol{X}\boldsymbol{Y} \not= \boldsymbol{Y}\boldsymbol{X}
If both matrices are square, the product is defined in either order, but the two results will usually be different.
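A quick sketch with two arbitrary square matrices showing that reversing the order changes the result:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[0, 1],
              [1, 0]])

print(X @ Y)                          # [[2 1] [4 3]]
print(Y @ X)                          # [[3 4] [1 2]]
print(np.array_equal(X @ Y, Y @ X))   # False - order matters
```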
Can represent the following regression as a matrix multiplication:
\begin{aligned} y_1 &= \alpha + \beta x_{1,1} + \gamma x_{1,2} + \dots + w_{m-1} x_{1,m-1} \\ y_2 &= \alpha + \beta x_{2,1} + \gamma x_{2,2} + \dots + w_{m-1} x_{2,m-1} \\ &\vdots \\ y_n &= \alpha + \beta x_{n,1} + \gamma x_{n,2} + \dots + w_{m-1} x_{n,m-1} \end{aligned}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{1,1} & X_{1,2} & \dots & X_{1,m-1} \\ 1 & X_{2,1} & X_{2,2} & \dots & X_{2,m-1} \\ \vdots & & & & \vdots \\ 1 & X_{n,1} & X_{n,2} & \dots & X_{n,m-1} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \\ \vdots \\ w_{m-1} \end{bmatrix}
The matrix is n cases tall and m columns wide: a column of 1s for the intercept followed by the m-1 features.
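As a NumPy sketch with made-up numbers, the design matrix can be built by prepending a column of 1s to the features, and the whole set of regression equations collapses into a single matrix-vector product:

```python
import numpy as np

# Hypothetical data: n = 4 cases and 2 features, plus a leading column
# of 1s for the intercept, so the design matrix is n tall and m = 3 wide
features = np.array([[2., 5.],
                     [1., 3.],
                     [4., 1.],
                     [0., 2.]])
X = np.column_stack([np.ones(len(features)), features])

w = np.array([0.5, 2.0, -1.0])  # alpha, beta, gamma (made-up weights)

y = X @ w                       # one predicted outcome per case
print(y)
```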
Symmetric & Identity Matrices
Symmetric
A symmetric matrix is square (number of rows = number of columns) and is equal to its own transpose, \boldsymbol{X}^T = \boldsymbol{X}: the values on the main diagonal can be anything, but each pair of elements mirrored across the diagonal must be equal, i.e. X_{i,j} = X_{j,i}.
Identity
An identity matrix is a symmetric matrix where every element along the main diagonal is 1 and every other element is 0. This satisfies the defining property of an identity: if an identity matrix is multiplied by a length-n vector, the vector remains unchanged, i.e. we get the same vector back.
Notation \boldsymbol{I}_n where n = height or width.
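A minimal NumPy check (arbitrary vector) that multiplying by \boldsymbol{I}_n leaves a length-n vector unchanged:

```python
import numpy as np

I3 = np.eye(3)                 # the 3x3 identity matrix, I_3
v = np.array([5., -2., 7.])

print(I3 @ v)                  # [ 5. -2.  7.] - the vector comes back unchanged
```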
0 is the identity element for addition and 1 is the identity element for multiplication: combining any number with the identity element returns that number unchanged. Whenever an operation on two items returns the identity element, those two items are inverses of each other. Inverses let us solve equations, e.g. for 5x = 15, both sides can be multiplied by the multiplicative inverse of 5, written \frac{1}{5} or 5^{-1}, to get x = 3. A number multiplied by its reciprocal (its inverse) always returns 1, e.g. \frac{1}{5} \cdot 5 = 1. Similarly, a matrix multiplied by its inverse returns the identity matrix.
Matrix Inversion
Used to computationally solve linear equations. Matrix inverse is denoted \boldsymbol{X}^{-1}. It satisfies the following property:
\boldsymbol{X}^{-1} \boldsymbol{X} = \boldsymbol{X}\boldsymbol{X}^{-1} = \boldsymbol{I}_n
2x2 Matrix inversion: \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
where {ad-bc} is the determinant (which cannot be 0).
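A sketch applying the 2x2 formula to an arbitrary invertible matrix and comparing it against NumPy's general-purpose inverse:

```python
import numpy as np

A = np.array([[4., 7.],
              [2., 6.]])

# 2x2 inverse by the formula: swap a and d, negate b and c,
# then divide by the determinant ad - bc
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv_manual = np.array([[ A[1, 1], -A[0, 1]],
                         [-A[1, 0],  A[0, 0]]]) / det

print(A_inv_manual)
print(np.linalg.inv(A))        # should match the manual result
print(A @ A_inv_manual)        # ~ the 2x2 identity matrix
```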
We can’t divide by a matrix, but we can multiply by an inverse to the same effect. For example, suppose we know matrices \boldsymbol{A} and \boldsymbol{B} and want to find the values of matrix \boldsymbol{X}:
\boldsymbol{X} \boldsymbol{A} = \boldsymbol{B}
We can’t divide by \boldsymbol{A}, but can multiply both sides by the inverse of \boldsymbol{A}:
\boldsymbol{X}\boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{B}\boldsymbol{A}^{-1} and we know that \boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I}
which gives us
\boldsymbol{X}\boldsymbol{I} = \boldsymbol{B}\boldsymbol{A}^{-1}
and \boldsymbol{I} can be ignored, just like 1 would be during multiplication:
\boldsymbol{X} = \boldsymbol{B}\boldsymbol{A}^{-1}
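A small NumPy sketch with arbitrary matrices, recovering \boldsymbol{X} from \boldsymbol{X}\boldsymbol{A} = \boldsymbol{B} by multiplying \boldsymbol{B} on the right by \boldsymbol{A}^{-1}:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
B = np.array([[5., 3.],
              [8., 5.]])

X = B @ np.linalg.inv(A)        # X = B A^{-1}, multiplying on the right
print(X)
print(np.allclose(X @ A, B))    # True - the recovered X satisfies XA = B
```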
If a matrix has no inverse i.e. the determinant is zero, then it is called singular.
Matrix inversion can also only be calculated if the matrix is square, i.e. the number of rows equals the number of columns. This rules out overdetermined systems (rows > columns, i.e. more equations than unknowns, which generally have no exact solution) and underdetermined systems (rows < columns, i.e. fewer equations than unknowns, which have infinitely many solutions). Such systems may still be solvable by other means, but not by inverting the matrix.
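A quick NumPy check of singularity, using a matrix whose second column is a multiple of the first, so its columns are linearly dependent:

```python
import numpy as np

S = np.array([[1., 2.],
              [2., 4.]])         # second column = 2 x first column

print(np.linalg.det(S))          # 0 (up to floating point) - singular
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as e:
    print(e)                     # "Singular matrix" - no inverse exists
```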
Example Usage
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{1,1} & X_{1,2} & \dots & X_{1,m-1} \\ 1 & X_{2,1} & X_{2,2} & \dots & X_{2,m-1} \\ \vdots & & & & \vdots \\ 1 & X_{n,1} & X_{n,2} & \dots & X_{n,m-1} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \\ \vdots \\ w_{m-1} \end{bmatrix}
We would like to solve for the vector of weights on the right-hand side of this regression model. The system can be written compactly as \boldsymbol{y} = \boldsymbol{X}\boldsymbol{w}, where \boldsymbol{w} is the vector of weights \alpha, \beta, \dots, w_{m-1}, one per column of the feature matrix.
In this equation, \boldsymbol{y} = \boldsymbol{X}\boldsymbol{w}, we know the outcomes \boldsymbol{y} and the features \boldsymbol{X}, and we wish to work out the weights \boldsymbol{w}. We can do this using matrix inversion, provided the inverse exists, i.e. \boldsymbol{X} is not singular: all of its columns must be linearly independent. For example, if one column is [1, 2], no other column can be [2, 4] (a multiple, giving parallel lines) or [1, 2] again (a duplicate, giving overlapping lines).
We can multiply both sides by the inverse of \boldsymbol{X}, \boldsymbol{X}^{-1} to obtain:
\boldsymbol{X}^{-1}\boldsymbol{y} = \boldsymbol{X}^{-1}\boldsymbol{X}\boldsymbol{w} and given \boldsymbol{X}^{-1}\boldsymbol{X} = \boldsymbol{I}, we can simplify this to:
\boldsymbol{X}^{-1}\boldsymbol{y} = \boldsymbol{I}\boldsymbol{w} and given the property of the identity matrix, we can remove it to leave:
\boldsymbol{X}^{-1}\boldsymbol{y} = \boldsymbol{w}
The order of operations is key to preserve: both sides are multiplied on the left by \boldsymbol{X}^{-1}, because matrix multiplication is not commutative in general, i.e. \boldsymbol{X}^{-1}\boldsymbol{y} \not= \boldsymbol{y}\boldsymbol{X}^{-1}.
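A NumPy sketch of this exact-solve case with made-up numbers, where \boldsymbol{X} is square and non-singular so the inverse exists:

```python
import numpy as np

# Hypothetical square system: n = 3 cases, a design matrix with an
# intercept column plus 2 features, so X is 3x3 and invertible
X = np.array([[1., 2., 5.],
              [1., 1., 3.],
              [1., 4., 1.]])
y = np.array([1., 2., 3.])

w = np.linalg.inv(X) @ y        # w = X^{-1} y
print(w)
print(np.allclose(X @ w, y))    # True - the weights reproduce y exactly

# np.linalg.solve(X, y) gives the same answer more stably,
# without forming the inverse explicitly
print(np.linalg.solve(X, y))
```

In practice the design matrix usually has more rows than columns (more cases than features), which is the overdetermined situation described above, so an exact inverse would not exist.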
Diagonal Matrices
Diagonal matrices have non-zero elements along the main diagonal and zeros everywhere else; the identity matrix is one example. If square, a diagonal matrix can be denoted diag(\boldsymbol{x}), where \boldsymbol{x} is the vector of main-diagonal elements, e.g. [1, 1, 1] represents \boldsymbol{I}_3. Diagonal matrices are much more computationally efficient to work with:
- Multiplication: diag(\boldsymbol{x})\boldsymbol{y} = \boldsymbol{x}\odot{}\boldsymbol{y}, an element-wise product.
- Inversion: diag(\boldsymbol{x})^{-1} = diag([1/x_1, \dots, 1/x_n]^T). Since we can’t divide by zero, if any element along the diagonal is zero, the matrix can’t be inverted.
- A diagonal matrix can be non-square and computation can still be efficient, e.g. by appending zeros to the product if height > width, or by removing elements from the product if width > height.
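A NumPy sketch (arbitrary values) showing why diagonal matrices are cheap: multiplication reduces to an element-wise product, and inversion to taking reciprocals of the diagonal:

```python
import numpy as np

x = np.array([2., 5., 10.])      # the main-diagonal elements
y = np.array([3., 4., 1.])

# diag(x) @ y is just the element-wise product x * y
print(np.diag(x) @ y)            # [ 6. 20. 10.]
print(x * y)                     # same result, no matrix ever formed

# Inverting diag(x) only needs the reciprocals of the diagonal
print(np.linalg.inv(np.diag(x)))
print(np.diag(1. / x))           # same matrix, far cheaper to compute
```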
Orthogonal Matrices
In orthogonal matrices, all rows and all columns are orthonormal vectors. This means that \boldsymbol{A}^{T} \boldsymbol{A} = \boldsymbol{A}\boldsymbol{A}^{T} = \boldsymbol{I}, and multiplying both sides of \boldsymbol{A}^{T}\boldsymbol{A} = \boldsymbol{I} on the right by \boldsymbol{A}^{-1} gives \boldsymbol{A}^{T} = \boldsymbol{A}^{-1}. Calculating \boldsymbol{A}^{T} is cheap, and therefore so is \boldsymbol{A}^{-1} for orthogonal matrices.
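A sketch using a 2x2 rotation matrix, a standard example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 6
# A rotation matrix - its rows and columns are orthonormal vectors
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))       # True
print(np.allclose(Q @ Q.T, np.eye(2)))       # True
print(np.allclose(Q.T, np.linalg.inv(Q)))    # True - the transpose is the inverse
```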