Matrix-Vector Multiplication
Matrix-vector multiplication is one of the most commonly used operations in real life. We unfortunately won't be able to talk about this in CSE 331 lectures, so this page is meant as a substitute. We will also use this as an excuse to point out how a very simple property of numbers can be useful in speeding up algorithms.
Footnotes (referenced later in this note by the corresponding bracketed numbers):
[1] The Khan Academy video linked later in this note calls the inner product just the dot product and uses the notation $\mathbf{x}\cdot\mathbf{y}$ instead of $\left\langle\mathbf{x},\mathbf{y}\right\rangle$, which is what we will use in this note.
[2] We use $\iota$ for the imaginary unit since $i$ is used liberally as an index in this note.
[3] The video is actually a 2D-DFT and not exactly the DFT as defined in this note.
[4] I am being purposefully vague about what exactly I mean by numbers, but for this section pretty much any reasonable set of numbers (e.g. integers, real numbers, complex numbers) works. If you must ask, a set of numbers for which this law holds (plus some other requirements) is called a semi-ring.
[5] If the problem seems too esoteric, just hold on to your (judgmental) horses: we will soon see that this sum of products corresponds to a very natural problem.
[6] For this note we will assume that the numbers are small enough that all basic operations (addition, multiplication, subtraction and division) take constant time.
Background
In this note we will be working with matrices and vectors. Simply put, matrices are two-dimensional arrays and vectors are one-dimensional arrays (or the "usual" notion of arrays). We will be using notation that is consistent with array notation. So e.g. a matrix $\mathbf{A}$ with $m$ rows and $n$ columns (also denoted as an $m\times n$ matrix) will in code be declared as int[][] A = new int[m][n] (assuming the matrix stores integers). Also, a vector $\mathbf{x}$ of size $n$ will in code be declared as int[] x = new int[n] (again assuming the vector contains integers). To be consistent with the array notation, we will denote the entry of $\mathbf{A}$ in the $i$th row and $j$th column as $A[i][j]$ (or A[i][j]). Similarly, the $i$th entry of the vector $\mathbf{x}$ will be denoted as $x[i]$ (or x[i]). We will follow the array convention and assume that the indices $i$ and $j$ start at $0$.
If you want a refresher on matrices, you might want to start with this Khan Academy video (though if you are comfortable with the array analogy above, you should not really need much more for this note):
The Problem
Matrix-Vector Multiplication
Given an $n\times n$ matrix $\mathbf{A}$ and a vector $\mathbf{x}$ of length $n$, their product is denoted by \[\mathbf{y} =\mathbf{A}\cdot \mathbf{x},\] where $\mathbf{y}$ is also a vector of length $n$ and its $i$th entry for $0\le i\lt n$ is defined as follows: \[y_i =\sum_{j=0}^{n-1} A[i][j]\cdot x[j].\]
Simplification
Note that in the problem definition above, we have the same number of rows and columns, i.e. $m=n$. I'm using this simplification to reduce the number of variables one needs to keep in their head. Most of the arguments in this note can be generalized to the $m\neq n$ case pretty easily.
If you have not seen this operation before, this Khan Academy video has some examples (or read the next section):
The Inner Product
It turns out that to visualize the matrix-vector multiplication, it is helpful to define a simpler operation between two vectors:
Inner Product
Given two vectors $\mathbf{x}$ and $\mathbf{y}$, both of length $n$, their inner product is defined as[1] \[\left\langle \mathbf{x},\mathbf{y}\right\rangle = \sum_{i=0}^{n-1} x[i]\cdot y[i].\]
Below is an animation that shows the inner product of two random vectors of size $n=6$ (click on "Next" to continue the computation of the inner product: the animation starts with two new random vectors if "Next" is pressed once the inner product has already been computed):
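For concreteness, here is a minimal sketch (in the same pseudocode style as the rest of this note) of computing the inner product directly from the definition; it also makes clear that computing $\left\langle \mathbf{x},\mathbf{y}\right\rangle$ takes $\Theta(n)$ time:
//Input: x[i], y[i] for 0 <= i < n
ip = 0;
for (i = 0; i < n; i++)
    ip += x[i] * y[i];  //accumulate the products x[i]*y[i]
return ip;              //ip now equals <x,y>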
Armed with the definition of the inner product, here is an equivalent definition of the matrix-vector multiplication problem:
Matrix-vector multiplication (redefined)
Given an $n\times n$ matrix $\mathbf{A}$ and a vector $\mathbf{x}$ of length $n$, their product $\mathbf{y}=\mathbf{A}\cdot\mathbf{x}$ can also be defined as follows: its $i$th entry for $0\le i\lt n$ is \[y_i =\left\langle \mathbf{A}[i],\mathbf{x}\right\rangle,\] where $\mathbf{A}[i]$ denotes the $i$th row of $\mathbf{A}$.
Below is one example of a matrix-vector multiplication: \[ \begin{pmatrix} 1 & 2 & -3\\ 2 & 9 &0\\ 6 & -1 & -2 \end{pmatrix} \times \begin{pmatrix} 2\\ 3\\ -1 \end{pmatrix} = \begin{pmatrix} 1\times 2 + 2\times 3 + (-3)\times(-1)\\ 2\times 2+ 9\times 3+0\times (-1)\\ 6\times 2+(-1)\times 3+ (-2)\times(-1) \end{pmatrix} = \begin{pmatrix} 11\\ 31\\ 11 \end{pmatrix}. \]
From the above the following should be easy to show:
Exercise 1
Present an algorithm that can multiply an $n\times n$ matrix with a vector of length $n$ in time $O(n^2)$.
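One possible solution (a sketch of one natural approach, not necessarily the only one): compute each entry $y[i]$ as the inner product $\left\langle \mathbf{A}[i],\mathbf{x}\right\rangle$, which takes $O(n)$ time per entry and hence $O(n^2)$ time overall.
//Input: A[i][j], x[j] for 0 <= i,j < n
y = new int[n];
for (i = 0; i < n; i++) {
    y[i] = 0;
    for (j = 0; j < n; j++)
        y[i] += A[i][j] * x[j];  //accumulate <A[i], x>
}
return y;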
In general the above cannot be improved:
Exercise 2
Argue that any algorithm that can multiply an arbitrary $n\times n$ matrix with an arbitrary vector of length $n$ takes $\Omega(n^2)$ time.
Structured Matrices
It turns out that in many practical applications of matrix-vector multiplication, the matrix $\mathbf{A}$ has some structure that potentially allows one to perform matrix-vector multiplication (asymptotically) faster than the $\Omega(n^2)$ bound from Exercise 2. We illustrate this point with a specific family of structured matrices:
Outer Product
Given two vectors $\mathbf{s}$ and $\mathbf{t}$, both of length $n$, their outer product is defined as the $n\times n$ matrix $\mathbf{A}$ such that for any entry $(i,j)$, we have \[A[i][j]=s[i]\cdot t[j].\]
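As a quick illustration (a made-up example, not part of the definition above): with $n=2$, $\mathbf{s}=(1,2)$ and $\mathbf{t}=(3,4)$, the outer product is \[\begin{pmatrix} 1\cdot 3 & 1\cdot 4\\ 2\cdot 3 & 2\cdot 4\end{pmatrix} = \begin{pmatrix} 3 & 4\\ 6 & 8\end{pmatrix}.\]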
Next, we present an algorithm to multiply the outer product of two arbitrary vectors $\mathbf{s}$ and $\mathbf{t}$ with a vector $\mathbf{x}$ in time $O(n)$. (Note that this is a significant improvement over the runtime of the trivial algorithm from Exercise 1.) In other words, we want to solve the following problem:
Multiplying outer product with a vector
Given three vectors $\mathbf{s},\mathbf{t}$ and $\mathbf{x}$, compute the product of the outer product of $\mathbf{s}$ and $\mathbf{t}$ with the vector $\mathbf{x}$. If $\mathbf{y}$ is the product, then it is defined as (for every $0\le i\lt n$): \[y_i=\sum_{j=0}^{n-1} s[i]\cdot t[j]\cdot x[j].\]
Algorithm Idea
Observe that \[y_i=\sum_{j=0}^{n-1} s[i]\cdot t[j]\cdot x[j]=s[i]\cdot \sum_{j=0}^{n-1} t[j]\cdot x[j] = s[i]\left\langle \mathbf{t},\mathbf{x}\right\rangle.\] In other words, once we have computed $\left\langle \mathbf{t},\mathbf{x}\right\rangle$, one can compute $y_i$ by just multiplying the inner product with $s[i]$, which takes $O(1)$ time for each $0\le i\lt n$.
Algorithm Details
//Input: s[i], t[i], x[i] for 0 <= i < n
//Compute <t,x> first
ip = 0;
for (j = 0; j < n; j++)
    ip += t[j] * x[j];
//Now compute the output vector
y = new int[n];
for (i = 0; i < n; i++)
    y[i] = s[i] * ip;
return y;
The runtime analysis is left as an exercise:
Exercise 3
Argue that the algorithm above runs in time $\Theta(n)$.
Discrete Fourier Transform
In this section, we consider perhaps the world's most widely used matrix-vector multiplication: the Discrete Fourier Transform (or DFT). For applications and more math (than below), look at its Wikipedia page. Here is the definition of the discrete Fourier transform:
Discrete Fourier Transform
We first define the discrete Fourier matrix. The $n\times n$ (discrete) Fourier matrix $\mathbf{F}$ is defined as follows. Its $(i,j)$th entry for $0\le i,j\lt n$ is \[F[i][j] = e^{-2\pi \iota \cdot \frac{i\cdot j}{n}},\] where $\iota$ denotes the imaginary unit.[2]
Finally, given a vector $\mathbf{x}$ of length $n$, its Fourier transform is given by \[\mathbf{F}\cdot \mathbf{x}.\]
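As a concrete (if naive) starting point, here is a minimal sketch of computing $\mathbf{F}\cdot\mathbf{x}$ directly from the definition. It assumes, for simplicity, that the input vector $\mathbf{x}$ is real-valued, and it represents each complex output entry by its real and imaginary parts (using $e^{\iota\theta}=\cos\theta+\iota\sin\theta$). This is just the trivial $O(n^2)$ algorithm and not the faster method hinted at later in this note:
//Input: x[j] for 0 <= j < n (assumed real-valued for simplicity)
yRe = new double[n];  //real parts of the output vector
yIm = new double[n];  //imaginary parts of the output vector
for (i = 0; i < n; i++) {
    yRe[i] = 0; yIm[i] = 0;
    for (j = 0; j < n; j++) {
        theta = -2 * Math.PI * i * j / n;  //exponent in F[i][j] = e^{-2*pi*iota*i*j/n}
        yRe[i] += x[j] * Math.cos(theta);
        yIm[i] += x[j] * Math.sin(theta);
    }
}
//yRe and yIm together encode the complex output vector F.x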
Before going a bit more into the math of the DFT, here is a video showing the DFT[3] of an image of a tablecloth:
And playing with DFTs can be addictive:
I'm not sure why, but I like taking Fourier transforms of everyday things. Here's a drain cover. pic.twitter.com/31RqNK52bo
— Aatish Bhatia (@aatishb) August 25, 2015
The complex number thingy
As you must have noticed, the DFT uses complex numbers. Here's a video on why complex numbers are really useful (and why "imaginary number" is one of Math's PR mistakes):
And, sorry, I could not resist this video on the "imaginary Erdos number" (skip this if you do not want to get side-tracked):
So it turns out that the complex numbers used in the discrete Fourier matrix are roots of unity. However, for whatever is coming up next, you only need to know the following two simple facts:
Fact 1
For any integer $j$ we have \[e^{2\pi\iota\cdot j}=1.\]
Fact 2
For any two integers $j,k$ we have \[e^{2\pi\iota\cdot\frac{j+k}{n}}=e^{2\pi\iota\cdot\frac{j}{n}}\cdot e^{2\pi\iota\cdot\frac{k}{n}}.\]
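As a quick sanity check (a worked instance of the two facts, nothing new): take $n=4$, $j=1$ and $k=3$. Fact 2 gives \[e^{2\pi\iota\cdot\frac{1+3}{4}} = e^{2\pi\iota\cdot\frac{1}{4}}\cdot e^{2\pi\iota\cdot\frac{3}{4}} = \iota\cdot(-\iota)=1,\] which agrees with Fact 1, since $e^{2\pi\iota\cdot\frac{4}{4}}=e^{2\pi\iota}=1$.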
Some Comments on the Structure of the DFT
We now present some observations on the discrete Fourier matrix that can be used to speed up the computation of the DFT from the trivial $O(n^2)$ bound. For simplicity we will make the following assumption.
Assumption
In this section, unless stated otherwise, we will assume that \[n =2^m,\] for some integer $m$. In other words, $m=\log_2{n}$ is an integer.
Let $0\le i,j\lt n$ be row and column indices and consider the entry $F[i][j]= e^{-2\pi\iota\cdot i\cdot j/n}$. We will express this entry as a product of new numbers. In particular, let $(i_0,\dots, i_{m-1})$ and $(j_0,\dots,j_{m-1})$ be the binary representations of $i$ and $j$ (note that $m$ bits are sufficient). In other words, \[i= i_0 + i_1\cdot 2+\cdots +i_{m-1}\cdot 2^{m-1},\] and \[j= j_0 + j_1\cdot 2+\cdots +j_{m-1}\cdot 2^{m-1}.\]
Using the above we get that for $0\le i,j\lt n$ \[\frac{i\cdot j}{n} =2^{-m}\cdot \left(\sum_{k=0}^{m-1} i_k\cdot 2^k\right)\cdot \left(\sum_{\ell=0}^{m-1} j_\ell\cdot 2^\ell\right) = \sum_{k,\ell: 0\le k+\ell\le 2m-2} i_k\cdot j_{\ell}\cdot 2^{k+\ell-m}.\] Using the above identity along with Fact 2, we get that \[ F[i][j] = e^{-2\pi\iota\cdot i\cdot j/n} =\prod_{k,\ell: 0\le k+\ell\le 2m-2} e^{-2\pi\iota\cdot i_k\cdot j_{\ell}\cdot 2^{k+\ell-m}}.\] Using Fact 1, we note that for every $k+\ell\ge m$, the quantity $i_k\cdot j_{\ell}\cdot 2^{k+\ell-m}$ is a non-negative integer (recall that $i_k,j_\ell\in\{0,1\}$), and hence $e^{-2\pi\iota\cdot i_k\cdot j_{\ell}\cdot 2^{k+\ell-m}}=1$. This implies that \[ F[i][j] = \prod_{k,\ell: 0\le k+\ell\le m-1} e^{-2\pi\iota\cdot i_k\cdot j_{\ell}\cdot 2^{k+\ell-m}}.\] In other words, we have reduced each entry from being a product of $2m-1$ numbers to being a product of only $m$ numbers (counting one factor for each possible value of $k+\ell$). We record this fact:
Proposition 1
For any $k,\ell$ such that $0\le k+\ell\le m-1$, define $2\times 2$ matrices $f_{k,\ell}$ as follows. For every $a,b\in \{0,1\}$: \[f_{k,\ell}[a][b] = e^{-2\pi\iota\cdot a\cdot b\cdot 2^{k+\ell-m}}.\] Then the above discussion implies that \[ F[i][j] = \prod_{k,\ell: 0\le k+\ell\le m-1} f_{k,\ell}[i_k][j_{\ell}].\]
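As a small worked example of the proposition (an illustration, not part of it): take $n=4$, so $m=2$, and write $i=i_0+2i_1$ and $j=j_0+2j_1$. Then \[\frac{i\cdot j}{4} = i_0\cdot j_0\cdot 2^{-2} + i_0\cdot j_1\cdot 2^{-1} + i_1\cdot j_0\cdot 2^{-1} + i_1\cdot j_1\cdot 2^{0},\] and the last term contributes $e^{-2\pi\iota\cdot i_1\cdot j_1}=1$ by Fact 1 (since $i_1\cdot j_1$ is an integer). Hence \[F[i][j] = f_{0,0}[i_0][j_0]\cdot f_{0,1}[i_0][j_1]\cdot f_{1,0}[i_1][j_0].\]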
It turns out that the above savings of "only" $m-1$ products (i.e. we went down from a product of $2m-1$ terms to a product of $m$ terms) is enough to drive down the time to compute the DFT from $O(n^2)$ to $O(n\log{n})$:
Exercise 4
Design an algorithm that computes the DFT for $n=2^m$ in time $O(nm)$.
Hint: Use Proposition 1.
The Distributive Law
In this section, we will present a simple observation that turns out to be very useful in designing efficient algorithms (especially those involving addition and multiplication of numbers). In particular, this idea (suitably implemented) can also be used to solve Exercise 4 above.
We begin with the statement of the distributive law:
Distributive Law
Given three numbers $a,b,c$ we have[4] \[a\cdot b+a\cdot c = a\cdot(b+c).\]
Note that the above implies that \[\sum_{i=0}^{n-1} a\cdot b_i = a\cdot\left(\sum_{i=0}^{n-1} b_i\right).\] Thus, instead of the $n$ multiplications on the LHS, we can get away with just one multiplication on the RHS. Of course, in both cases we still need $\Theta(n)$ arithmetic operations, so we do not get any asymptotic improvement.
However, if we combine the distributive law with some judicious saving of intermediate results, then we can obtain asymptotic improvements in runtime. We have actually already seen such an example. Recall the example of multiplying the outer product of two vectors $\mathbf{s}$ and $\mathbf{t}$ with an arbitrary vector $\mathbf{x}$: we saw that $y[i]=s[i]\cdot \left\langle \mathbf{t},\mathbf{x}\right\rangle$. This expression was obtained from the original expression $\sum_{j=0}^{n-1} s[i]\cdot t[j]\cdot x[j]$ by using the distributive law. Then we noticed that if we stored the "common" term $\left\langle \mathbf{t},\mathbf{x}\right\rangle$, then we could avoid recomputing this term for different $i$. We summarize this strategy as follows:
Distributive Law Strategy
Compute sums of products of the form \[\sum_{i=0}^{n-1} a\cdot b_i,\] as \[a\cdot \sum_{i=0}^{n-1} b_i.\] Further, store all intermediate results for potential future use.
Computing Sums of Products
It turns out that the distributive law strategy is especially good at efficiently computing sums of products. Instead of defining more formally what I mean by sums of products, let me illustrate my point with a specific problem:[5]
An example of sums of products
Given $n\times n$ matrices $\mathbf{A},\mathbf{B},\mathbf{C}$ and a vector $\mathbf{x}$, consider the following sum of products. In particular, we want to compute a vector $\mathbf{y}$ such that for every $0\le i\lt n$: \[y[i]=\sum_{j=0}^{n-1}\sum_{k=0}^{n-1}\sum_{\ell=0}^{n-1} A[i][j]\cdot B[j][k]\cdot C[k][\ell]\cdot x[\ell].\]
Consider the following trivial algorithm to solve this problem. Compute a $4$-D array defined as (where $0\le i,j,k,\ell\lt n$): \[T[i][j][k][\ell] = A[i][j]\cdot B[j][k]\cdot C[k][\ell]\cdot x[\ell].\] Note that $T$ can be computed in $O(n^4)$ time.[6] Then one can compute each $y[i]$, $0\le i\lt n$, in a further $O(n^3)$ time using the definition of $\mathbf{y}$ above. Thus, we can compute $\mathbf{y}$ in $O(n^4)$ time overall. Next, we show that one can compute $\mathbf{y}$ much faster, in $O(n^2)$ time, using the distributive law strategy.
Algorithm Idea
Using the distributive law (and the fact that $A[i][j]$ and $B[j][k]$ are independent of $\ell$) we note that for every fixed $0\le i\lt n$, we have that \[y[i]=\sum_{j=0}^{n-1}\sum_{k=0}^{n-1}A[i][j]\cdot B[j][k]\cdot \sum_{\ell=0}^{n-1}C[k][\ell]\cdot x[\ell].\] Now define a vector $\mathbf{z}$ such that for every $0\le k\lt n$, we have \[z[k] = \sum_{\ell=0}^{n-1}C[k][\ell]\cdot x[\ell].\] Then note that \[y[i]=\sum_{j=0}^{n-1}\sum_{k=0}^{n-1}A[i][j]\cdot B[j][k]\cdot z[k].\] Again, using the distributive law, we get \[y[i]=\sum_{j=0}^{n-1}A[i][j]\cdot \sum_{k=0}^{n-1} B[j][k]\cdot z[k].\] Then if we define a vector $\mathbf{u}$ such that for every $0\le j\lt n$ \[u[j] = \sum_{k=0}^{n-1} B[j][k]\cdot z[k],\] we need to compute \[y[i]=\sum_{j=0}^{n-1}A[i][j]\cdot u[j].\] Note that the above is just the same as computing $\mathbf{y}=\mathbf{A}\cdot\mathbf{u}$, which we know how to do in $O(n^2)$ time (by Exercise 1). Further, it is easy to check that $\mathbf{z}=\mathbf{C}\cdot\mathbf{x}$ and $\mathbf{u}=\mathbf{B}\cdot \mathbf{z}$, each of which can be computed in $O(n^2)$ time (and thus, $O(n^2)$ time overall).
Algorithm Details
//Input: A[i][j], B[j][k], C[k][l], x[l] for 0 <= i,j,k,l < n
z = new int[n];
u = new int[n];
y = new int[n];
//Compute the vector z first
for (k = 0; k < n; k++) {
    z[k] = 0;
    for (l = 0; l < n; l++)
        z[k] += C[k][l] * x[l];
}
//Compute the u vector next
for (j = 0; j < n; j++) {
    u[j] = 0;
    for (k = 0; k < n; k++)
        u[j] += B[j][k] * z[k];
}
//Finally compute the output y
for (i = 0; i < n; i++) {
    y[i] = 0;
    for (j = 0; j < n; j++)
        y[i] += A[i][j] * u[j];
}
return y;
We leave the runtime analysis of the algorithm above as an exercise:
Exercise 5
Argue that the algorithm above runs in time $\Theta(n^2)$.
What just happened here?
We now return to the unresolved question of why the sum of products considered above is a natural problem in itself. It is not too hard to see that the sum of products can also be stated as \[\mathbf{y}= \mathbf{A}\cdot \mathbf{B}\cdot \mathbf{C}\cdot \mathbf{x}.\] Further, the algorithm discussed above basically follows the obvious way to compute $\mathbf{y}$. That is, it computes the RHS as \[\mathbf{A}\cdot \underbrace{\left(\mathbf{B}\cdot \underbrace{\left(\mathbf{C}\cdot \mathbf{x}\right)}_{\mathbf{z}}\right)}_{\mathbf{u}}.\] In other words, each parenthesized product above is itself a matrix-vector multiplication, so three $O(n^2)$-time matrix-vector multiplications suffice to compute $\mathbf{y}$.