Siamese neural network


I think it’s a great project! But it could do with a few improvements:

Neuron Type (1)

Suppose we have a network of perceptrons that we’d like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned image of a signature. And we’d like the network to learn weights and biases so that the output from the network correctly classifies the image. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we’d like is for this small change in weight to cause only a small corresponding change in the output from the network.


If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as a “c” when it should be an “o”. We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as an “o”. And then we’d repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.

The problem is that this isn’t what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. That flip may then cause the behavior of the rest of the network to completely change in some very complicated way. So while your “o” might now be classified correctly, the behavior of the network on all the other images is likely to have completely changed in some hard-to-control way. That makes it difficult to see how to gradually modify the weights and biases so that the network gets closer to the desired behavior. Perhaps there’s some clever way of getting around this problem. But it’s not immediately obvious how we can get a network of perceptrons to learn.

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output. That’s the crucial fact which will allow a network of sigmoid neurons to learn.

Just like a perceptron, the sigmoid neuron has inputs, $ x_1 $, $ x_2 $, … But instead of being just 0 or 1, these inputs can also take on any values between 0 and 1. So, for instance, 0.638 is a valid input for a sigmoid neuron.


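A sigmoid neuron outputs $ \sigma(z) $, where $ z $ is the weighted sum of its inputs plus the bias, and the sigmoid function is defined as:

$$ \sigma(z) = \dfrac{1}{1 + e^{-z}} $$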
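To make the contrast concrete, here is a minimal sketch in plain Python (not the Lua/Torch code under review). The inputs, weights, and bias are made-up values, chosen so the weighted input sits close to zero; the function names are mine, not Torch's.

```python
import math

def weighted_input(weights, bias, inputs):
    """z = sum_j w_j * x_j + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def perceptron(weights, bias, inputs):
    """Step activation: 1 if the weighted input is positive, else 0."""
    return 1 if weighted_input(weights, bias, inputs) > 0 else 0

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(weights, bias, inputs):
    """Same weighted input as the perceptron, but a smooth activation."""
    return sigmoid(weighted_input(weights, bias, inputs))

# Hypothetical values: z is about -0.001 before the nudge.
inputs = [0.638, 0.2]
weights = [0.5, -1.0]
bias = -0.12

# Nudge the first weight by 0.01.
nudged = [weights[0] + 0.01, weights[1]]

perceptron_before = perceptron(weights, bias, inputs)
perceptron_after = perceptron(nudged, bias, inputs)
sigmoid_before = sigmoid_neuron(weights, bias, inputs)
sigmoid_after = sigmoid_neuron(nudged, bias, inputs)

print(perceptron_before, perceptron_after)   # the perceptron output flips
print(sigmoid_after - sigmoid_before)        # the sigmoid output barely moves
```

With these values the perceptron jumps from 0 to 1, while the sigmoid neuron's output changes by less than 0.002. That smooth response is what makes gradual weight updates possible.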

Torch implements this neuron type here.

(1) Excerpt with minor edits from Neural Networks and Deep Learning

Cost Function

I don’t see any use of a cost function in your code. I recommend reading this section in Neural Networks and Deep Learning to understand why you should be using one.

In short, the cost function returns a number representing how well the neural network maps training examples to the correct output. The basic idea is that the more “wrong” our network is at achieving the desired results, the higher the cost and the more we’ll want to adjust the weights and biases to achieve a lower cost. We try to minimize this cost using methods such as gradient descent.

There are certain properties that you look for in a cost function, such as convexity (so gradient descent finds a global optimum instead of getting stuck in a local optimum). As the book suggests, I would lean towards using the cross-entropy cost function.
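As a rough illustration of the idea, the sketch below trains a single sigmoid neuron with gradient descent on the cross-entropy cost. It is plain Python rather than Torch, the toy data and learning rate are invented for the example, and it relies on the standard result that for a sigmoid output with cross-entropy the derivative of the cost with respect to the weighted input is simply (output - target).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(outputs, targets):
    """Mean binary cross-entropy; outputs must lie strictly between 0 and 1."""
    total = 0.0
    for a, y in zip(outputs, targets):
        total += -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return total / len(outputs)

# Made-up toy data: one feature, target is 1 when x is large.
xs = [0.0, 0.2, 0.8, 1.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0          # start with an uninformative neuron
learning_rate = 1.0      # assumed value, fine for this tiny problem
costs = []

for step in range(200):
    outputs = [sigmoid(w * x + b) for x in xs]
    costs.append(cross_entropy(outputs, ys))
    # Gradient of the mean cross-entropy with respect to w and b.
    grad_w = sum((a - y) * x for a, y, x in zip(outputs, ys, xs)) / len(xs)
    grad_b = sum(a - y for a, y in zip(outputs, ys)) / len(xs)
    # Step downhill.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(costs[0], costs[-1])   # the cost falls as training proceeds
```

The cost starts at ln 2 (every output is 0.5) and decreases from there. A confidently wrong output is penalized much more heavily than a mildly wrong one, which is the behaviour described above.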

The way we implement this in Torch is with Criterions. Torch seems to have implemented a bunch of these cost functions, and I encourage you to try different ones and see how they affect your neural net accuracy.


Over-fitting

It is possible to fit your training data too well, to the point where the network no longer generalizes to new data. An example of this is given in the picture:

[Figure: noisy, roughly linear data fitted with both a linear and a polynomial function]

Noisy, linear-ish data is fitted to both linear and polynomial functions. Although the polynomial function is a perfect fit, the linear version generalizes the data better.

I don’t know Lua very well, but looking at your code I don’t see any attempt to reduce over-fitting. A common approach is regularization. Since it’s too large a topic to cover in depth here, I’ll leave it to you to read up on if you would like. It is quite simple to use once the concepts are understood, as you can see from this Torch implementation here.
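For a feel of what regularization does, here is a minimal sketch of L2 regularization (weight decay), one common variant, in plain Python. The function names, the weights, and the strength `lam` are hypothetical and are not taken from the linked Torch implementation.

```python
def l2_penalty(weights, lam):
    """L2 penalty: (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)

def regularized_cost(data_cost, weights, lam):
    """Cost on the training data plus a penalty for large weights."""
    return data_cost + l2_penalty(weights, lam)

def regularized_gradient(data_grads, weights, lam):
    """Each weight's gradient gains an extra lam * w term,
    which pulls the weight towards zero on every update."""
    return [g + lam * w for g, w in zip(data_grads, weights)]

# Hypothetical numbers for illustration.
weights = [3.0, -4.0]
lam = 0.1
data_cost = 0.25

total = regularized_cost(data_cost, weights, lam)
grads = regularized_gradient([0.0, 0.0], weights, lam)
print(total, grads)
```

Biases are usually left out of the penalty. Larger values of `lam` favour smaller weights and therefore smoother functions, at the price of fitting the training data less closely.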

Another way to reduce over-fitting is by introducing dropout. At each training stage, individual nodes are “dropped out” of the net so that a reduced network is left. Only the reduced network is trained on the data in that stage. The removed nodes are then reinserted into the network with their original weights. The nodes become less sensitive to the weights of the other nodes, and they learn how to decide more on their own.
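The masking step can be sketched as follows in plain Python. Note one assumption: this is the "inverted dropout" variant, which rescales the surviving activations during training so nothing needs to change at test time. The description above does not say which variant is used, and the activations here are made up.

```python
import random

def dropout(activations, drop_prob, rng):
    """Inverted dropout for one training step.

    Each activation is zeroed with probability drop_prob; survivors are
    scaled by 1 / (1 - drop_prob) so the expected activation is unchanged.
    Returns the masked activations and the 0/1 mask that was applied.
    """
    keep_prob = 1.0 - drop_prob
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    dropped = [a * m / keep_prob for a, m in zip(activations, mask)]
    return dropped, mask

rng = random.Random(0)                      # seeded for repeatability
hidden = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4]     # hypothetical hidden-layer outputs
dropped, mask = dropout(hidden, drop_prob=0.5, rng=rng)
print(mask)
print(dropped)
```

A fresh mask is drawn at every training step, and the same mask has to be applied to the gradients on the backward pass. At test time no nodes are dropped.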

Dropout also significantly improves the speed of training while improving performance (important for deep learning)!

Gradient Checking

For more complex models, gradient computation can be notoriously difficult to debug and get right. Sometimes a buggy implementation will manage to learn something that can look surprisingly reasonable (while performing less well than a correct implementation). Thus, even with a buggy implementation, it may not at all be apparent that anything is amiss. Therefore, you should numerically check the derivatives computed by your code to make sure that your implementation is correct.

I found an implementation of gradient checking with Torch here. Be warned that this check is computationally expensive, so once you’ve verified that your implementation of backpropagation is correct you should turn off gradient checking.
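The check itself is short. Below is a minimal sketch in plain Python using central differences; the toy cost function, the deliberately buggy gradient, and the step size `eps` are all invented for illustration and are unrelated to the linked Torch implementation.

```python
def numerical_gradient(f, params, eps=1e-5):
    """Approximate df/dp_i with (f(p + eps) - f(p - eps)) / (2 * eps)."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads

def relative_error(a, b):
    """||a - b|| / (||a|| + ||b||), a scale-free measure of disagreement."""
    diff = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    denom = norm_a + norm_b
    return diff / denom if denom > 0 else 0.0

# Toy cost: squared error of a linear unit on one example (x = 2, target = 1).
def cost(params):
    w, b = params
    return (w * 2.0 + b - 1.0) ** 2

def analytic_gradient(params):
    w, b = params
    residual = w * 2.0 + b - 1.0
    return [2.0 * residual * 2.0, 2.0 * residual]

def buggy_gradient(params):
    w, b = params
    residual = w * 2.0 + b - 1.0
    return [2.0 * residual, 2.0 * residual]   # forgot the factor x = 2

params = [0.3, -0.2]
numeric = numerical_gradient(cost, params)
good_error = relative_error(numeric, analytic_gradient(params))
bad_error = relative_error(numeric, buggy_gradient(params))
print(good_error, bad_error)
```

A correct gradient should agree with the numerical estimate to many decimal places, while the buggy one is off by a large relative error. The cost is evaluated twice per parameter, which is why the check is too slow to leave on during training.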

Principal Components Analysis

PCA can be used for data compression to speed up learning algorithms, and can also be used to visualize feature relations. Basically, in a situation where you have a WHOLE BUNCH of independent variables, PCA helps you figure out which ones matter the most and gets rid of the others (think of having centimeters and inches both as input features; we only need one to get the same information).
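The centimeters-and-inches case can be checked directly. The sketch below, in plain Python, computes the two eigenvalues of the 2x2 covariance matrix in closed form and reports how much of the total variance each principal component explains. The measurements are made up, and the function name is mine; a real pipeline would normally standardize the features first and use a linear-algebra library.

```python
import math

def explained_variance_2d(xs, ys):
    """Fractions of total variance explained by the two principal
    components of the features (xs, ys), largest first."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    largest = (trace + disc) / 2.0
    smallest = (trace - disc) / 2.0
    return largest / trace, smallest / trace

# Hypothetical measurements: the same lengths in two units.
lengths_cm = [150.0, 160.0, 165.0, 172.0, 181.0, 190.0]
lengths_in = [c / 2.54 for c in lengths_cm]

first, second = explained_variance_2d(lengths_cm, lengths_in)
print(first, second)
```

Because one feature is an exact multiple of the other, the first component carries essentially all of the variance and the second carries none, so a single number per example is enough.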

Looking at the research paper you linked, it looks like there are only 10 features input into the neural net. But to me, it looks like we could get rid of 2, possibly 3 features! That’s quite a bit for the few features we have. The functions $ \sin $ and $ \cos $ are related to each other, so why do we need both to measure direction and curvature of the trajectory when we could use just one and get the same information into the neural network?

One could also make the argument that the centripetal and tangential accelerations are related to each other, or that the velocity and curvature together rule out the need for the centripetal acceleration since $ a_c = \frac{v^2}{r} $. More analysis would be needed by the software to determine that thoroughly.

Be warned: if not applied correctly, PCA can reduce neural network accuracy. PCA should also not be used as a fix for over-fitting, tempting as that is given that over-fitting usually occurs when many features are present; use regularization for that instead. There is a nice GitHub repo here covering PCA using Torch.
