
What Ifs? Sigmoid Function vs. Error Function in Machine Learning through Logistic Regression



While I was brushing up on some study materials in mathematics, a familiar function piqued my interest: the error function. It is defined through a non-elementary integral, erf(x) = (2/sqrt(pi)) * integral from 0 to x of exp(-t^2) dt, and its complement, erfc(x) = 1 - erf(x), is used in determining the conditional probability of bit error due to noise or, quite simply, the probability of error due to noise.




But the real point of interest was the nature of the curve of the function shown below.



Now, why be so interested in such a function? When I compared erf(x) with the sigmoid function, which is commonly used as the hypothesis that defines the decision boundary in machine learning algorithms such as logistic regression, erf(x) showed a steeper slope. Then the thought came to me: what would change if the error function were used instead of the sigmoid function? Would the cost improve? Would the training accuracy improve?

And so my curiosity got the better of me and I played around with both of the functions to see what would happen.


Sigmoid vs. Error


First of all, replacing the sigmoid function with the error function outright won’t work because the output levels are all wrong: erf(x) ranges from -1 to 1, whereas the sigmoid (logistic) function ranges from 0 to 1. To bring both functions to the same levels, I offset the error function by 1 unit and scale it by a factor of 1/2, giving (1/2)*(erf(x)+1).



To mathematically check its similarity to the sigmoid function, I take the correlation of the 2 functions. I am expecting the correlation to be close to 1.

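>> x = -10:0.01:10;

>> y = 1./(1+exp(-x)); % the sigmoid function

>> a = (1/2)*(erf(x)+1); % the adjusted error function

>> corr(a', y') % Pearson's linear correlation coefficient = 0.9901, indeed close to 1, which supports the similarity between the 2 functions

>> corr(a', y', 'type', 'Kendall') % Kendall's tau = 0.9565

>> corr(a', y', 'type', 'Spearman') % Spearman's rho = 0.9912

Both functions increase monotonically, so the two rank correlations would equal 1 were it not for ties. The ties most likely come from erf(x) rounding to exactly -1 or 1 in double precision once |x| exceeds roughly 6, which leaves a large share of the values in a identical.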
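For readers without MATLAB or Octave at hand, the same check can be sketched in plain Python. This is only an illustrative re-implementation, not the session above: the helper names `sigmoid`, `adjusted_erf` and `pearson` are made up for this sketch, and the Pearson coefficient is computed by hand instead of with `corr`. The result should land close to the 0.9901 reported above.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def adjusted_erf(x):
    """Error function shifted up by 1 and halved so it ranges over (0, 1)."""
    return 0.5 * (math.erf(x) + 1.0)

def pearson(u, v):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_v = sum((q - mean_v) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

# Same grid as the session above: -10 to 10 in steps of 0.01 (2001 points).
xs = [-10.0 + 0.01 * i for i in range(2001)]
y = [sigmoid(x) for x in xs]
a = [adjusted_erf(x) for x in xs]

r = pearson(a, y)
print(f"Pearson correlation: {r:.4f}")
```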

To compare both functions visually, I overlay the plots of both functions on the figure below.



The eye can easily judge that the rising slope of the error function is steeper than that of the sigmoid function.
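The visual impression can be backed by a quick calculation. The derivative of the sigmoid at 0 is 1/4, while the derivative of the adjusted error function (1/2)*(erf(x)+1) at 0 is 1/sqrt(pi), roughly 0.564, so the adjusted error function rises a little more than twice as fast at the midpoint. The Python sketch below (helper names are illustrative, not from the original session) checks both values with a central finite difference.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adjusted_erf(x):
    return 0.5 * (math.erf(x) + 1.0)

def slope_at(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

sigmoid_slope = slope_at(sigmoid, 0.0)   # analytic value: 1/4
erf_slope = slope_at(adjusted_erf, 0.0)  # analytic value: 1/sqrt(pi)

print(f"sigmoid slope at 0:      {sigmoid_slope:.4f}")
print(f"adjusted erf slope at 0: {erf_slope:.4f}")
print(f"ratio:                   {erf_slope / sigmoid_slope:.2f}")
```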


Testing the performance of the sigmoid and error functions in logistic regression

In order to see the effect of using the adjusted error function (a function with a steeper slope) instead of the sigmoid function as the hypothesis in logistic regression, I use a 100-sample training set and let the fminunc function determine the final theta.



The cost at the initial value of theta (i.e. all zeros) is the same for both the sigmoid and error functions, namely 0.693147, because both hypotheses output 0.5 at theta = 0 and the cost therefore equals ln 2. However, there is a slight difference between the costs of the two functions at the final value of theta: fminunc arrived at a cost of 0.203506 for the sigmoid function and 0.201282 for the error function. I am not sure whether this is due to the iteration terminating earlier for the sigmoid function, but the difference is too small to significantly affect our 100-sample training set.
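The matching initial cost follows from the cost function itself. Assuming the usual logistic regression cost, J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), both hypotheses return h = 0.5 when theta is all zeros, so J = ln 2, about 0.693147, whatever the data. The Python sketch below illustrates this on a tiny made-up data set (not the 100-sample set used above); the function names and the clipping constant `eps` are assumptions of the sketch. The clipping is there because the adjusted error function rounds to exactly 0 or 1 in double precision once |z| exceeds roughly 6, which would otherwise send a zero into the logarithm.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adjusted_erf(z):
    return 0.5 * (math.erf(z) + 1.0)

def cost(theta, X, y, hypothesis, eps=1e-12):
    """Mean cross-entropy cost for a given hypothesis function."""
    total = 0.0
    for features, label in zip(X, y):
        z = sum(t * f for t, f in zip(theta, features))
        # Clip so log() never receives exactly 0 (erf saturates quickly).
        h = min(max(hypothesis(z), eps), 1.0 - eps)
        total += label * math.log(h) + (1 - label) * math.log(1.0 - h)
    return -total / len(y)

# Hypothetical toy data: a leading 1 for the intercept, then two features.
X = [[1.0, 0.5, 1.2], [1.0, 2.3, 0.7], [1.0, 3.1, 2.9], [1.0, 0.2, 0.1]]
y = [0, 1, 1, 0]
theta0 = [0.0, 0.0, 0.0]

cost_sigmoid = cost(theta0, X, y, sigmoid)
cost_erf = cost(theta0, X, y, adjusted_erf)
print(f"{cost_sigmoid:.6f} {cost_erf:.6f}")  # prints 0.693147 0.693147
```

With a nonzero theta the two costs generally differ, which is where the optimizer's result for each hypothesis starts to diverge.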

Finally, for the 100-sample training set, both functions arrived at the same training accuracy of 89% when their predictions were compared against the labels. I am a bit skeptical, though: perhaps the training accuracy of the error function would be higher if some samples happened to fall in the area to the right of the sigmoid boundary but to the left of the error function boundary. A worthwhile follow-up would be to study how the performance changes with training sets of different sizes.
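One observation bears on the skepticism above. With the usual 0.5 threshold, both hypotheses predict class 1 exactly when z = theta' * x is non-negative, since the sigmoid and the adjusted error function both equal 0.5 at z = 0 and both are increasing. For a fixed theta the two therefore yield identical predictions, and any gap in training accuracy has to come from fminunc settling on a different theta for each hypothesis. A minimal Python sketch of that point, using made-up z values and labels and illustrative helper names:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adjusted_erf(z):
    return 0.5 * (math.erf(z) + 1.0)

def predict(hypothesis, z_values, threshold=0.5):
    """Label a sample 1 when the hypothesis reaches the threshold."""
    return [1 if hypothesis(z) >= threshold else 0 for z in z_values]

def accuracy(predictions, labels):
    """Percentage of predictions that match the labels."""
    hits = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * hits / len(labels)

# Hypothetical values of z = theta' * x for six samples, with made-up labels.
z_values = [-3.0, -0.4, -0.01, 0.0, 0.2, 5.0]
labels = [0, 0, 1, 1, 1, 1]

pred_sigmoid = predict(sigmoid, z_values)
pred_erf = predict(adjusted_erf, z_values)

print(pred_sigmoid == pred_erf)
print(accuracy(pred_sigmoid, labels), accuracy(pred_erf, labels))
```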





Comments

  1. I've been thinking about the exact same thing. This comparison is great. I had some data that was normally distributed. I decided to map the data to its difference from the mean scaled by its standard deviation. But when plotting the data some points were 200 standard deviations out from the mean and skewed the plot. Since in reality anything about 4 or 5 SDev is already anomalous enough, I applied the logit function to scale all the data into probabilities. I then thought, "...hang on. Isn't this the same as using the error function!"

