PLA (Perceptron Learning Algorithm) is a very basic machine learning algorithm. It serves as an approach to separating two classes of data. In this post, I will first introduce the basic logic, then talk about the difficulties I encountered when implementing this algorithm.
Algorithm Summary
Theory
I learnt this algorithm from the course “Machine learning algorithm” on Coursera.
For the input $(x_i, y_i)$ (here $x_i$ is a vector of dimension $d$), we assign a weight vector $w$ and a threshold, and compute:
$$ t_i = \sum^d_{j=1}w_j x_{ij} $$
Then we compare $t_i$ to the threshold. If $t_i > threshold$, we claim the result is positive, and negative otherwise.
Of course we can incorporate the threshold into the sum formula:
$$
\begin{align}
h(x_i) &= sign\Big(\big(\sum^d_{j=1}w_j x_{ij}\big) - threshold\Big) \\
&= sign\Big(\big(\sum^d_{j=1}w_j x_{ij}\big) + (-threshold) \cdot (+1)\Big) \\
&= sign\Big(\sum^d_{j=0}w_j x_{ij}\Big) \\
&= sign(w^T x_i)
\end{align}
$$
where we set $w_0 = -threshold$ and $x_{i0} = +1$.
It is known as the perceptron classifier.
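As a quick check of the augmented form, here is a tiny sketch in R (the numbers are made up for illustration):

```r
# Perceptron prediction h(x) = sign(w^T x), with the threshold folded
# into the weight vector as w[1] and a constant +1 prepended to x.
w <- c(-0.5, 2, 1)     # w[1] plays the role of -threshold
x <- c(1, 0.3, 0.1)    # x[1] = 1 is the artificial coordinate x_0
h <- sign(sum(w * x))  # inner product, then take the sign: here +1
```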
Pictures
At first, we give PLA an initial weight vector $w_0$, and most of the time it will not separate all the data correctly. Then we correct the mistakes one by one:
(This picture is copied from the lecture slides of the course “Machine learning algorithm” on Coursera.)
When the classifier is wrong on an observation, we correct it as follows: if the true label is positive but our prediction is negative, we add the misclassified $x$ vector to the weight vector; in the opposite case, we subtract the misclassified $x$ vector from the weight vector.
The essence of this approach is that we always move the weight vector closer to the misclassified observation (or its negation). The best case would be for them to overlap, since the weight vector serves as the normal vector of the boundary that separates the two regions.
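In symbols, the update on a misclassified observation $(x_i, y_i)$, i.e. one with $sign(w_t^T x_i) \neq y_i$, is:
$$ w_{t+1} = w_t + y_i x_i $$
Multiplying by $y_i$ covers both cases at once: $y_i = +1$ adds the vector, $y_i = -1$ subtracts it.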
But one problem comes up: the order in which we correct the mistakes. By default, after we correct one mistake, we go on to the next observation cyclically, rather than starting over from the first one.
R implementation
Here is the R code I wrote:


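The listing below is a minimal sketch of how such a function might look (the variable names are my own; `X` is assumed to already carry the constant first column, and `y` holds labels in $\{+1, -1\}$):

```r
# PLA sketch: X is an n x (d+1) matrix whose first column is all 1s,
# y is a vector of +1/-1 labels, w0 is the initial weight vector.
pla <- function(X, y, w0 = rep(0, ncol(X)), max_pass = 1000) {
  w <- w0
  n <- nrow(X)
  i <- 1
  clean <- 0                       # consecutive correctly classified points
  for (step in 1:(max_pass * n)) {
    if (sign(sum(w * X[i, ])) != y[i]) {
      w <- w + y[i] * X[i, ]       # correct the mistake, then move on
      clean <- 0
    } else {
      clean <- clean + 1
    }
    if (clean == n) return(w)      # a full pass with no mistake: done
    i <- if (i == n) 1 else i + 1  # cyclic order, not restarting from 1
  }
  warning("did not converge; data may not be separable")
  w
}
```

Note that the cap `max_pass` is there only as a safety net: on separable data the perceptron convergence theorem guarantees the loop terminates.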
My thoughts
But now there is one point that puzzled me the most. We have already given the weight an initial value, but it only serves as a normal vector. To determine the plane, whether a line, a plane in 3D, or a hyperplane in higher dimensions, we need a point together with the normal vector to fix its position. The formula for determining a plane from a normal vector and a point is (illustrated in 3D):
$$ (x - x_0) \cdot a + (y - y_0) \cdot b + (z - z_0) \cdot c = 0 $$
where $(x_0, y_0, z_0)$ is the point and $(a, b, c)$ is the normal vector. If we leave the point unspecified, the point we actually use is the origin $(0, 0, 0)$. But there are circumstances in which starting from the origin will give us no answer. A picture illustrating this will be shown in the example part.
So I think the safest way is to find two points whose labels differ, and take their midpoint as the initial point.
Example
Here I use rnorm() to generate five observations, and then generate corresponding labels $z$ so that they are separable.


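A sketch of the data generation (the seed and the separating rule below are my own choices, not necessarily those behind the original figures):

```r
set.seed(42)                      # hypothetical seed, for reproducibility
x1 <- rnorm(5)
x2 <- rnorm(5)
z  <- ifelse(x1 + x2 > 0, 1, -1)  # label by a known line, so separable
X  <- cbind(1, x1, x2)            # prepend the constant coordinate x_0 = 1
```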
And the point p_norm I chose is the midpoint of the first two observations.


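Computing that initial point might look like this (the two observations here are hypothetical stand-ins for the first two generated points):

```r
p1 <- c(1.2, 0.8)        # hypothetical first observation (label +1)
p2 <- c(-0.5, -1.1)      # hypothetical second observation (label -1)
p_norm <- (p1 + p2) / 2  # midpoint, used to anchor the separating line
```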
Then the PLA updates three times, if the initial weight vector is the zero vector:


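One way to wire the anchor point in is to shift every observation by p_norm before running the updates, so the hypothesis becomes $sign(w \cdot (x - p\_norm))$. A self-contained sketch with hypothetical points:

```r
# PLA anchored at a point: classify by sign(w . (x - p_norm)),
# i.e. shift the data by p_norm and run the usual updates.
X <- rbind(c(1.2, 0.8), c(-0.5, -1.1), c(0.9, 1.4))  # hypothetical points
y <- c(1, -1, 1)
p_norm <- (X[1, ] + X[2, ]) / 2  # midpoint of the first two observations
Xs <- sweep(X, 2, p_norm)        # subtract p_norm from every row
w  <- c(0, 0)                    # zero initial weight vector
repeat {
  wrong <- which(sign(Xs %*% w) != y)
  if (length(wrong) == 0) break  # no mistakes left: done
  i <- wrong[1]                  # fix the first remaining mistake
  w <- w + y[i] * Xs[i, ]
}
```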
What if we start with the origin as the initial point? You can draw it yourself, but the result is that the PLA will run indefinitely: it never halts.
Even if we run the right PLA, we may still be unable to find a perfect classifier. That is because the dataset is not separable. In this circumstance, we might want to find a weight vector that minimizes the error proportion, but this has been proved to be an NP-hard problem. So the practical way is to try for a number of updates, say 100, calculate the error rate, keep the best weight vector so far, then try another 100 updates, calculate again, and so on…
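That try-and-keep-the-best idea can be sketched as follows (this is essentially the pocket algorithm; the function and variable names are my own):

```r
# Pocket-style PLA for non-separable data: keep the weight vector with
# the lowest error rate seen so far over a fixed number of updates.
pocket <- function(X, y, iters = 100) {
  w <- rep(0, ncol(X))
  err <- function(w) mean(sign(X %*% w) != y)  # error proportion
  best_w <- w
  best_e <- err(w)
  for (t in 1:iters) {
    wrong <- which(sign(X %*% w) != y)
    if (length(wrong) == 0) break              # separable after all
    i <- wrong[sample.int(length(wrong), 1)]   # pick a random mistake
    w <- w + y[i] * X[i, ]
    if (err(w) < best_e) {                     # pocket the improvement
      best_w <- w
      best_e <- err(w)
    }
  }
  best_w
}
```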