What Ifs? Sigmoid Function vs. Error Function in Machine Learning through Logistic Regression

While brushing up on some study materials in mathematics, a familiar function piqued my interest. It was the error function, erf(x), defined as 2/sqrt(pi) times the integral of exp(-t^2) from 0 to x (an integral with no elementary antiderivative), whose complement, erfc(x) = 1 - erf(x), is used in determining the conditional probability of bit error due to noise. Or, quite simply, the probability of error due to noise.

But the real point of interest was the nature of the curve of the function shown below.

Now, why be so interested in such a function? When I compared erf(x) with the sigmoid function commonly used to define the decision boundary in machine learning algorithms, I found that it has a steeper slope. Then the thought came to me: what difference would it make to use the error function instead of the sigmoid function? Would the cost improve? Would the training accuracy improve?

And so my curiosity got the better of me and I played around with both of the functions to see what would happen.

Sigmoid vs. Error

First of all, replacing the sigmoid function with the error function outright won’t work. The levels are all wrong: erf(x) ranges from -1 to 1, while the sigmoid ranges from 0 to 1. To bring both functions to the same levels (logistic, right?), I offset the error function by 1 unit and scale it down by a factor of 2.
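The rescaling can be checked numerically. What follows is a minimal sketch in Python (standard library only) rather than MATLAB; the function names `sigmoid` and `adjusted_erf` are illustrative and do not come from the original session.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def adjusted_erf(x):
    """Error function shifted up by 1 and halved, so it spans (0, 1)."""
    return 0.5 * (math.erf(x) + 1.0)

# Print both functions side by side at a few sample points.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  adjusted_erf={adjusted_erf(x):.4f}")
```

Both functions pass through 0.5 at x = 0 and stay between 0 and 1. As a side note, (1/2)*(erf(x)+1) is the standard normal cumulative distribution function evaluated at sqrt(2)*x.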

To mathematically check its similarity to the sigmoid function, I take the correlation of the 2 functions. I am expecting the correlation to be close to 1.


>> % x is a row vector of sample points defined beforehand (its range is not given here)

>> y = 1./(1+exp(-x)); % the sigmoid function

>> a = (1/2)*(erf(x)+1); % the adjusted error function

>> corr(a', y') % = 0.9901, Pearson's linear correlation coefficient; close to 1, which supports the similarity between the 2 functions

>> corr(a', y', 'type', 'Kendall') % = 0.9565, Kendall's tau

>> corr(a', y', 'type', 'Spearman') % = 0.9912, Spearman's rho
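For readers without MATLAB, the Pearson check can be sketched in Python with the standard library alone. The grid of x values below is an assumption (the post does not state the range used), so the printed value will not necessarily match 0.9901; `pearson` is a hypothetical helper written for this sketch.

```python
import math

def pearson(u, v):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Assumed grid: 101 points from -5 to 5 (the original range of x is not stated).
xs = [-5.0 + 0.1 * i for i in range(101)]
sig = [1.0 / (1.0 + math.exp(-x)) for x in xs]   # sigmoid
adj = [0.5 * (math.erf(x) + 1.0) for x in xs]    # adjusted error function

r = pearson(adj, sig)
print(f"Pearson r = {r:.4f}")
```

Both functions are strictly increasing in x, so rank correlations such as Kendall's tau and Spearman's rho would equal 1 whenever all sampled values are distinct. One plausible reason for values below 1 is ties: in double precision erf(x) rounds to exactly 1 or -1 once |x| is around 6, while the sigmoid keeps changing.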

To compare both functions visually, I overlay the plots of both functions on the figure below.

The eye can easily judge that the rising slope of the error function is steeper than that of the sigmoid function.
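The visual impression can be backed by the derivatives. The sigmoid s(x) has slope s(x)*(1 - s(x)), and the adjusted error function (1/2)*(erf(x)+1) has slope exp(-x^2)/sqrt(pi). A small Python sketch (function names are illustrative) evaluates both:

```python
import math

def sigmoid_slope(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def adjusted_erf_slope(x):
    """Derivative of 0.5 * (erf(x) + 1): exp(-x^2) / sqrt(pi)."""
    return math.exp(-x * x) / math.sqrt(math.pi)

print(sigmoid_slope(0.0))       # 0.25
print(adjusted_erf_slope(0.0))  # about 0.5642, i.e. 1/sqrt(pi)
```

At the centre the adjusted error function is a little over twice as steep. Further out the ordering reverses (for example at x = 2 the sigmoid's slope is the larger one), because exp(-x^2) decays much faster than the sigmoid's tails.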

Testing the performance of the sigmoid and error functions in logistic regression

In order to see the effect of using the error function (a function with a steeper slope) instead of the sigmoid function as the hypothesis in logistic regression, I will use a 100-sample training set, with the final theta determined by the fminunc function.
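The experiment can be sketched in Python under some loudly stated assumptions: the data below is synthetic (not the training set used in this post), plain batch gradient descent stands in for fminunc, and every name (`cost_and_gradient`, `fit`, the learning rate, the step count) is hypothetical. The point is only to show how the hypothesis can be swapped while the cost stays the same.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def adjusted_erf(z):
    return 0.5 * (math.erf(z) + 1.0)

def adjusted_erf_grad(z):
    return math.exp(-z * z) / math.sqrt(math.pi)

EPS = 1e-12  # keeps log() and the division below away from 0

def cost_and_gradient(theta, X, y, h, h_grad):
    """Mean cross-entropy cost and its gradient for hypothesis h(theta . x)."""
    m = len(y)
    cost = 0.0
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        z = sum(t * v for t, v in zip(theta, xi))
        p = min(max(h(z), EPS), 1.0 - EPS)
        cost -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
        # Chain rule: dJ/dz = (p - y) / (p * (1 - p)) * h'(z)
        dz = (p - yi) / (p * (1.0 - p)) * h_grad(z)
        for j, v in enumerate(xi):
            grad[j] += dz * v
    return cost / m, [g / m for g in grad]

def fit(X, y, h, h_grad, lr=0.1, steps=2000):
    """Batch gradient descent from theta = 0 (a stand-in for fminunc)."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        _, grad = cost_and_gradient(theta, X, y, h, h_grad)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical data: 100 samples, an intercept column plus one noisy feature.
random.seed(0)
X, y = [], []
for _ in range(100):
    x1 = random.uniform(-2.0, 2.0)
    label = 1 if x1 + random.gauss(0.0, 0.7) > 0 else 0
    X.append([1.0, x1])
    y.append(label)

results = {}
for name, h, h_grad in [("sigmoid", sigmoid, sigmoid_grad),
                        ("adjusted erf", adjusted_erf, adjusted_erf_grad)]:
    theta = fit(X, y, h, h_grad)
    final_cost, _ = cost_and_gradient(theta, X, y, h, h_grad)
    results[name] = final_cost
    print(f"{name}: theta={theta}, cost={final_cost:.6f}")
```

For the sigmoid the chain-rule factor collapses to the familiar (p - y); for the adjusted error function it does not, which is why the derivative is passed in explicitly. Numbers from this sketch are not comparable to the costs reported in the post, since the data and the optimizer differ.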

The cost at the initial value of theta (i.e. 0) is the same for both the sigmoid and error functions, namely 0.693147 (both hypotheses output 0.5 there, so the cost is ln 2). However, there is a slight difference between the costs of the 2 functions at the final value of theta: fminunc arrived at a cost of 0.203506 for the sigmoid function and 0.201282 for the error function. I am not sure if this is due to the iteration being terminated earlier for the sigmoid function, but the difference is too small to significantly impact our 100-sample training set.
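The shared starting cost can be checked numerically. A minimal Python sketch with hypothetical labels (any mix of 0s and 1s gives the same result, because every prediction is 0.5 when theta is 0):

```python
import math

def cross_entropy(predictions, labels):
    """Mean logistic-regression cost for predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

labels = [0, 1, 1, 0, 1]  # hypothetical labels

# With theta = 0 the linear score z is 0 for every sample.
sigmoid_at_zero = 1.0 / (1.0 + math.exp(-0.0))
adjusted_erf_at_zero = 0.5 * (math.erf(0.0) + 1.0)

cost_sigmoid = cross_entropy([sigmoid_at_zero] * len(labels), labels)
cost_erf = cross_entropy([adjusted_erf_at_zero] * len(labels), labels)
print(f"{cost_sigmoid:.6f} {cost_erf:.6f}")  # 0.693147 0.693147
```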

Finally, for the 100-sample training set, both functions arrived at the same training accuracy of 89% after comparing the predictions. I am a bit skeptical though; perhaps the training accuracy of the error function would be higher if samples chanced on the area to the right of the sigmoid boundary but to the left of the error function boundary. A recommended follow-up study would be how the performance changes with varying sizes of the training set.
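One point worth keeping in mind when comparing predictions: assuming the usual 0.5 threshold (the post does not state which threshold was used), both hypotheses cross 0.5 exactly at z = 0 and increase with z, so for the same theta they classify every sample identically. Any gap between the two boundaries would therefore come from the two fits ending at different theta values, not from the shape of the function. A small Python sketch with hypothetical scores:

```python
import math

def predict_sigmoid(z, threshold=0.5):
    """Class 1 if the sigmoid of the linear score reaches the threshold."""
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0

def predict_adjusted_erf(z, threshold=0.5):
    """Class 1 if the adjusted error function of the score reaches the threshold."""
    return 1 if 0.5 * (math.erf(z) + 1.0) >= threshold else 0

# Hypothetical linear scores z = theta . x for a handful of samples.
scores = [-4.0, -0.3, -1e-9, 0.0, 1e-9, 0.3, 4.0]
same = all(predict_sigmoid(z) == predict_adjusted_erf(z) for z in scores)
print(same)  # True
```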


  1. I've been thinking about the exact same thing. This comparison is great. I had some data that was normally distributed. I decided to map the data to its difference from the mean scaled by its standard deviation. But when plotting the data, some points were 200 standard deviations out from the mean and skewed the plot. Since in reality anything beyond about 4 or 5 standard deviations is already anomalous enough, I applied the logistic function to scale all the data into probabilities. I then thought, "...hang on. Isn't this the same as using the error function!"


