Simba 2.0: A Step into Machine Learning
Simba 2.0 has traditionally relied on classical methods for object detection. However, with the addition of a powerful KDTree data structure and its KNearestClassify method, Simba 2.0 now enters the realm of Machine Learning. This empowers users to leverage k-Nearest Neighbors (kNN) for a range of image analysis tasks.
The Versatility of kNN
The k-Nearest Neighbors (kNN) algorithm, a core concept in Machine Learning, has a long history dating back to the 1950s. It classifies new data points based on the labels of their closest neighbors in a dataset. This versatile technique finds applications across many domains: it has powered recommendation systems on platforms like Netflix and Amazon, assisted in medical image analysis and character recognition, and even driven financial forecasting, demonstrating the power of Machine Learning to recognize hidden patterns and make intelligent predictions.
Fun fact
- When HP created Tesseract OCR, it used kNN as its primary method for character recognition. This technique remained part of the engine until Google introduced a more sophisticated neural network-based approach (the LSTM engine in version 4.0).
- Google Image Search is claimed to have originally used kNN to find and rank visually similar images [unsure if this is true].
- A number of image-related services have used, or still use, kNN in a similar way to what Google did, but day by day more sophisticated methods take over.
Exploring kNN
In this article, we delve into the practical application of kNN within the Simba 2.0 environment by exploring its implementation in classifying handwritten digits from the renowned MNIST dataset. I'll guide you through the algorithm's inner workings, analyze its performance, and discuss the challenges and considerations. This exploration will provide valuable insights into the power and versatility of this machine learning algorithm.
kNN classification
kNN classification finds the 'k' nearest neighbors to a new data point within the dataset. Imagine a set of points representing different fruits, like apples and cucumbers. Each point has two properties:
- Height: The length of the fruit (up and down on the graph).
- Width: The thickness of the fruit (left to right on the graph).
Each point is labeled with its fruit type (e.g., "apple" or "cucumber"). To classify a new fruit, kNN finds the 'k' closest fruits in terms of height and width. The new fruit is then assigned the label that appears most often among its 'k' nearest neighbors.
For example, if the three closest fruits to a new fruit are two apples and one cucumber, the new fruit would be classified as an apple.
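Before looking at Simba's built-in support, it helps to see what a kNN classifier actually does under the hood. The following is a minimal brute-force sketch in Simba script of the fruit example; the TFruit type, the measurements and the NaiveKNearest name are all made up for illustration, and this is not how Simba's KDTree works internally (the tree avoids scanning every point).

type
  TFruit = record
    Height, Width: Single;
    Name: String;
  end;
  TFruitArray = array of TFruit;

// Brute-force kNN: measure the distance from the query point to every
// labelled point, pick the k closest, and return the most common label.
function NaiveKNearest(Data: TFruitArray; QueryHeight, QueryWidth: Single; k: Int32): String;
var
  i, j, best, apples, cucumbers: Int32;
  bestDist: Single;
  dist: TSingleArray;
  used: TBooleanArray;
begin
  SetLength(dist, Length(Data));
  SetLength(used, Length(Data));

  // Euclidean distance from the query to every data point
  for i := 0 to High(Data) do
    dist[i] := Sqrt(Sqr(Data[i].Height - QueryHeight) + Sqr(Data[i].Width - QueryWidth));

  apples := 0;
  cucumbers := 0;
  for j := 1 to k do
  begin
    // pick the closest point that has not been used yet
    best := 0;
    bestDist := 1000000; // larger than any distance we will see here
    for i := 0 to High(Data) do
      if (not used[i]) and (dist[i] < bestDist) then
      begin
        best := i;
        bestDist := dist[i];
      end;
    used[best] := True;

    if Data[best].Name = 'apple' then
      Inc(apples)
    else
      Inc(cucumbers);
  end;

  // majority vote among the k nearest neighbors
  if apples > cucumbers then
    Result := 'apple'
  else
    Result := 'cucumber';
end;

With a handful of fruit points like the ones in the graph below, calling NaiveKNearest with k = 3 and a query of height 15 and width 6 comes back as 'cucumber', matching the reasoning in the querying section further down.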
Visualizing with a Graph:
Red points represent apples, and green points represent cucumbers. In this graph, apples tend to be clustered around the same height and width (around 9cm for both). Cucumbers, however, are longer and thinner, spread out in a line between 12cm and 22cm in height but only 3cm to 7cm wide. The image also includes both standing and lying cucumbers to account for orientation variations.
Querying the Classifier:
- If we query a point like [15, 6] (high height, low width), the kNN classifier will find mostly cucumbers as its nearest neighbors due to their similar heights.
- Even if we reverse the order to [6, 15], the closest neighbors might still be cucumbers because kNN considers both height and width, accounting for the rotation thanks to our training data including both orientations.
- On the other hand, a query point like [7, 7] (similar height and width) would likely have apples as its nearest neighbors, classifying the new fruit as an apple.
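To see these queries in practice, here is a minimal sketch using Simba 2.0's TKDTree, the same API used for MNIST later in this article. The measurements are made up, Ref 0 stands for apple and Ref 1 for cucumber, and the exact voting details of KNearestClassify are Simba's; with this toy data the expected labels are noted in the comments:

var
  Items: TKDItems;
  Tree: TKDTree;
  Query: TSingleArray;
begin
  SetLength(Items, 9);
  // apples (Ref = 0): similar height and width, around 9cm
  Items[0].Vector := [9.0, 9.0];    Items[0].Ref := 0;
  Items[1].Vector := [8.5, 9.5];    Items[1].Ref := 0;
  Items[2].Vector := [9.5, 8.5];    Items[2].Ref := 0;
  // standing cucumbers (Ref = 1): tall and thin
  Items[3].Vector := [18.0, 4.0];   Items[3].Ref := 1;
  Items[4].Vector := [14.0, 6.0];   Items[4].Ref := 1;
  Items[5].Vector := [20.0, 5.0];   Items[5].Ref := 1;
  // lying cucumbers (Ref = 1): the same shape rotated 90 degrees
  Items[6].Vector := [5.0, 16.0];   Items[6].Ref := 1;
  Items[7].Vector := [6.0, 14.0];   Items[7].Ref := 1;
  Items[8].Vector := [4.0, 20.0];   Items[8].Ref := 1;

  Tree.Init(Items);

  Query := [15.0, 6.0];
  WriteLn('[15, 6] -> ', Tree.KNearestClassify(Query, 3)); // 1: surrounded by standing cucumbers
  Query := [6.0, 15.0];
  WriteLn('[6, 15] -> ', Tree.KNearestClassify(Query, 3)); // 1: the lying cucumbers catch this one
  Query := [7.0, 7.0];
  WriteLn('[7, 7]  -> ', Tree.KNearestClassify(Query, 3)); // 0: apples are the closest neighbors
end;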
Limitations:
- kNN doesn't predict "unknown". If a query point like [0, 0] is far from all data points, kNN will still classify it based on the nearest neighbors, even if they aren't very close.
- No model training takes place; all computation occurs during prediction.
- Curse of dimensionality: kNN's performance degrades as the number of dimensions increases.
- Requires manual preprocessing such as careful feature scaling and selection, whereas neural networks can learn features during training.
- kNN lacks the ability to generalize well from the training data
- kNN is relatively sensitive to noisy or mislabeled data
Note
- This example uses only two dimensions (height and width) for simplicity. However, kNN can be applied to datasets with many more dimensions. For example, to classify images, you could use features like color (represented by red, green, and blue values). This would result in data points with higher dimensionality, such as [width, height, R, G, B]. By considering multiple features, kNN can make more accurate and nuanced classifications.
kNN with Simba 2.0
Now that you hopefully understand the basic principles of kNN, we can look at how to solve something like the MNIST dataset.
To start, we need to be able to parse the MNIST dataset. For this I have written a couple of functions that can be viewed here: https://pastebin.com/H3eEMa0G
The dataset itself can be found in many places; here is one copy (top Google result): https://www.kaggle.com/datasets/hojjatk/mnist-dataset
var
  Images, TrainImages: array of TSingleMatrix;
  Labels, TrainLabels: array of Byte;
begin
  WriteLn('>>> Parsing MNIST dataset');
  TrainImages := ParseMNISTFile('Scripts\MNIST\train-images.idx3-ubyte');
  TrainLabels := ParseMNISTLabels('Scripts\MNIST\train-labels.idx1-ubyte');
  Images := ParseMNISTFile('Scripts\MNIST\t10k-images.idx3-ubyte');
  Labels := ParseMNISTLabels('Scripts\MNIST\t10k-labels.idx1-ubyte');
end;
All the digits in the images are grayscale, 0..255, and 24x24 in size; we store each one in a TSingleMatrix with 24 rows and 24 columns rather than in a TImage.
We get two sets: a training set of 60K images with labels (0..9), and a test set of 10K images and labels that the classifier has never been trained on.
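If you want to sanity-check the parsing before building anything on top of it, a quick throwaway sketch like this (my own addition, not part of the linked helpers) prints the first training digit as ASCII art together with its label, using the same brightness threshold of 64 that the feature extraction below uses:

var
  x, y: Int32;
  Row: String;
begin
  WriteLn('Label: ', TrainLabels[0]);
  for y := 0 to High(TrainImages[0]) do
  begin
    Row := '';
    for x := 0 to High(TrainImages[0][y]) do
      if TrainImages[0][y][x] > 64 then
        Row := Row + '#'
      else
        Row := Row + '.';
    WriteLn(Row);
  end;
end;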
My approach to extracting features is very simplistic: I divide the digit into a 4x4 grid of quadrants (each roughly 6x6 pixels) and count the number of bright pixels in each quadrant. This gives us 16 features to use, or 16 dimensions.
(*
  Parse a single image into 16 features, which is simply the count of
  bright pixels in each quadrant of the shape.
  As the image is antialiased (smooth) we threshold by only extracting
  pixels with a brightness above 64.
  Each feature is normalized to the range 0..1.
*)
function VectorDataBlock(Img: TSingleMatrix): TSingleArray;
var
  TPA: TPointArray;
  Bounds: TBoxArray;
  i: Int32;
begin
  SetLength(Result, 4*4);
  TPA := Img.Indices(64, __GT__);
  Bounds := TPA.Bounds.Partition(4,4);
  for i := 0 to High(Bounds) do
    Result[i] := TPA.ExtractBox(Bounds[i]).Length / ((24*24) / (4*4));
end;
(*
Do the above for every image in our set
*)
function ComputeVectors(Images: array of TSingleMatrix; Labels: TByteArray): TKDItems;
var
  i: Int32;
begin
  SetLength(Result, Length(Images));
  for i := 0 to High(Images) do
  begin
    Result[i].Ref := Labels[i];
    Result[i].Vector := VectorDataBlock(Images[i]);
  end;
end;
Now we can compute the feature vectors for every digit in our training and test data:
var
  TrainingData, TestData: TKDItems;
begin
  WriteLn('>>> Generating features');
  TrainingData := ComputeVectors(TrainImages, TrainLabels);
  TestData := ComputeVectors(Images, Labels);
end;
Now we can generate a KDTree for our classification. KDTrees are well suited for kNN as they provide a very fast way to get the k nearest neighbors of some input, though the tree does take some time to build.
var
  Tree: TKDTree;
begin
  WriteLn('>>> Training the lookup tree with features');
  Tree.Init(TrainingData);
end;
Finally we can use it by calling Tree.KNearestClassify:
var
  i, category, hits: Int32;
begin
  WriteLn('>>> Testing against test-set');

  hits := 0;
  for i := 0 to High(TestData) do
  begin
    category := Tree.KNearestClassify(TestData[i].Vector, 7);
    if category = TestData[i].Ref then
      Inc(hits);

    // print a running accuracy every 1000 digits
    if i mod 1000 = 999 then
      WriteLn(i+1, ' -> Accuracy: ', Format('%.3f', [hits / (i+1) * 100]), '%');
  end;

  WriteLn('MNIST Accuracy: ', Format('%.3f', [hits / Length(TestData) * 100]), '%');
end;
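Once the tree is built you are not limited to the test set: classifying any new digit is just a matter of running it through the same feature extraction and asking the tree. Here MyDigit is a hypothetical TSingleMatrix holding a digit in the same grayscale format as the parsed MNIST images:

var
  MyDigit: TSingleMatrix;
  guess: Int32;
begin
  // MyDigit is assumed to already hold a digit (0..255 grayscale values)
  guess := Tree.KNearestClassify(VectorDataBlock(MyDigit), 7);
  WriteLn('The classifier thinks this digit is a ', guess);
end;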
Impressive accuracy
The kNN model we just made achieved an impressive 94% accuracy on the MNIST dataset, demonstrating its effectiveness for handwritten digit recognition. This result is comparable to that of many simpler neural networks, while being much easier to implement.
/slacky out