Measures of Central Tendency

You are gathering information on the number of customers you entertain every day at a fast food chain. You decide to sample for a period of one week, and you assume that the number of customers you get daily is completely uncorrelated with time. Your census yields the results below:

Sunday - 1430

Monday - 950

Tuesday - 870

Wednesday - 1100

Thursday - 950

Friday - 950

Saturday - 1600

Now, you want to process the data so that you arrive at a single value that best represents the entirety of the data. The simplest processes to use are called measures of central tendency, and the three most commonly used are the mean, median and mode.

The mean is simply the average of the data. The median is the middle element of the sorted data set, regardless of the values of the other elements. The mode is the value that occurs the highest number of times.

To get the mean, simply sum the entire data set and divide by the number of data elements. (1430+950+870+1100+950+950+1600)/7=1121.4

To get the median, simply arrange the data in order and get the middle element. If the data count is odd, there is no problem. But if the data count is even, then we add the two data elements in the middle and take their average.

870 950 950 950 1100 1430 1600

950 is the element in the middle, therefore the median is 950.

To get the mode, we simply observe which element occurs the most. 950 has shown up 3 times, compared to the other elements that have only shown up once. Thus, 950 is the mode.
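The three calculations above can be reproduced with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median, mode

# Daily customer counts, Sunday through Saturday, from the census above.
customers = [1430, 950, 870, 1100, 950, 950, 1600]

avg = mean(customers)     # sum of the values divided by how many there are
mid = median(customers)   # middle element of the sorted data
most = mode(customers)    # value that occurs the most times

print(round(avg, 1), mid, most)
```

The rounded mean, the median and the mode should match the hand calculations: 1121.4, 950 and 950.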

The mean is the most popular measure because it takes into account the value of every data element. The median and mode depend only on position and frequency of occurrence, respectively.

If we choose the mean, we can say that the number of customers you are entertaining every day is 1122 (rounded up, since there is no such thing as a 0.4 person).

But we might also want to consider whether the data we are getting makes any sense at all. What if the number of customers who enter that week is too erratic/random, even if a prior stochastic model has been assumed? What if we get the results below instead of the census above?

Sunday - 15200

Monday - 90

Tuesday - 8700

Wednesday - 5

Thursday - 25000

Friday - 500

Saturday - 10000

If we attempt to get the mean, we'll arrive at (15200+90+8700+5+25000+500+10000)/7 = 8499.3

But is it really reasonable to claim that you are entertaining an average of 8500 customers a day after reviewing your census? Perhaps we would like to quantify how much our census varies and define a certain level which, when exceeded, invalidates the census. Luckily, the statistical quantity for this is properly named for the function it serves (some terms are coined with words that don't correctly describe the function they serve): it is termed variance. Variance is the sum of the squared deviations of each element from the mean, multiplied by 1/(n-1), where n is the total number of data points. I left 1/(n-1) as a formula instead of describing it in words because I wanted to point out that it is in a way similar to a form of central tendency (an average), only this time you are taking away the central tendency (by subtracting the mean from each element inside the summation), thereby leaving the "value that varies" as a result. (The reason why it is squared can be found in one of my previous Fast Fact posts, RMS Unraveled.)

With this new handy tool, let us compare the variance of the two censuses:
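In symbols, the verbal definition of variance can be sketched as below. The symbol names are assumptions chosen here for illustration: x_i is the i-th data point, x-bar is the mean, n is the number of data points and s^2 is the variance.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```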

((1430-1121.4)^2+3*(950-1121.4)^2+(870-1121.4)^2+(1100-1121.4)^2+(1600-1121.4)^2)/6

= 79348

((15200-8499.3)^2+(90-8499.3)^2+(8700-8499.3)^2+(5-8499.3)^2+(25000-8499.3)^2+(500-8499.3)^2+(10000-8499.3)^2)/6

= 87721000
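The same two variances can be checked with a small Python sketch. The function name `sample_variance` and the list names are illustrative assumptions. The import of `statistics.variance` is there only as a cross-check against the standard library.

```python
from statistics import variance

# The two censuses from the article, Sunday through Saturday.
steady = [1430, 950, 870, 1100, 950, 950, 1600]
erratic = [15200, 90, 8700, 5, 25000, 500, 10000]

def sample_variance(data):
    """Sum of squared deviations from the mean, multiplied by 1/(n-1)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

var_steady = sample_variance(steady)
var_erratic = sample_variance(erratic)

print(round(var_steady), round(var_erratic))
# Cross-check against the standard library implementation.
print(variance(steady), variance(erratic))
```

The first value rounds to 79348. The second is roughly 87.72 million, which matches the figure above once it is rounded to five significant digits.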

As we can see, there is a huge difference between the variances of the two censuses. Perhaps we can define a threshold, say a variance of 100000, that a census must not exceed to be considered valid for processing with a measure of central tendency.

If you have read the Fast Fact post RMS Unraveled (where you'll learn why we take the square when measuring variations), maybe you are also wondering: why not reverse the squaring process by taking the square root of the variance? In fact, statisticians have a term for the value arrived at when we take the square root of the variance, which is the standard deviation (another properly coined term).

When two data sets are involved, we can take their covariance. The formula is similar to variance, except that instead of squaring each deviation, we multiply it by the deviation of the corresponding element of the other data set from that set's mean. Everything else in the formula remains unchanged.
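As a minimal sketch, the standard deviation of each census is just the square root of its variance. The helper name `sample_std` is an assumption for illustration, and `statistics.stdev` is used only to cross-check.

```python
import math
from statistics import stdev

steady = [1430, 950, 870, 1100, 950, 950, 1600]
erratic = [15200, 90, 8700, 5, 25000, 500, 10000]

def sample_std(data):
    """Square root of the sample variance (the 1/(n-1) form)."""
    m = sum(data) / len(data)
    var = sum((x - m) ** 2 for x in data) / (len(data) - 1)
    return math.sqrt(var)

std_steady = sample_std(steady)
std_erratic = sample_std(erratic)

print(round(std_steady, 1), round(std_erratic, 1))
print(stdev(steady), stdev(erratic))  # standard library cross-check
```

Unlike the variance, the standard deviation is in the same units as the data (customers per day), which makes it easier to compare against the mean.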

Correlation is when we divide the covariance by the product of the standard deviations of the two data sets. By doing this, we are measuring how similarly the two data sets vary, on a scale from -1 to 1. The resulting value is termed the correlation coefficient of the two data sets. In digital signal processing applications, the related operation of cross-correlation can be computed by reverse convolution, or more accurately, by reversing one signal prior to convolving.
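Here is a hedged sketch of covariance and correlation in plain Python. Pairing the two censuses day by day is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math

def mean(data):
    return sum(data) / len(data)

def covariance(xs, ys):
    """Paired deviations multiplied together, summed, times 1/(n-1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # covariance of a set with itself is its variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

week_1 = [1430, 950, 870, 1100, 950, 950, 1600]
week_2 = [15200, 90, 8700, 5, 25000, 500, 10000]

cov = covariance(week_1, week_2)
corr = correlation(week_1, week_2)
print(cov, corr)
```

Dividing by the standard deviations removes the units, so the result always falls between -1 and 1, and a data set correlated with itself gives 1.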

Probability

(Before reading on, I believe it is important to remember the significance of the sample space. The sample space is the set of all possible outcomes, and the probabilities of all the outcomes in it sum to 1. Strictly speaking, this summation law is stated for a sequence of mutually exclusive sets of outcomes (events) rather than for individual points.)

Let:

A = event of you getting a date

B = event of an apocalypse

The chances of you getting a date or the apocalypse happening (or both) are equal to the sum of the probabilities of each happening, minus the chances of both happening together. If the apocalypse and you getting a date are mutually exclusive, meaning they can never happen together, then the chances of both happening together are zero, and the total is simply the sum of the individual chances.
(More formally, the two events are disjoint sets of outcomes: they do not intersect.)

However, if A and B are not mutually exclusive (the events intersect), the subtracted term is non-zero.

This is logical, because the outcomes in which you get a date and the apocalypse also happens are counted twice when the two probabilities are added, so they must be subtracted once.

P(A Union B) = P(A) + P(B) - P(A intersection with B)

Now, to calculate the probability of you getting a date and the apocalypse both happening (the intersection), you simply multiply the probability of the apocalypse given that you got a date by the probability of you getting a date. (More formally, P(B|A) is the conditional probability of B given A.)

P(A intersection with B) = P(B|A)*P(A)

Let us take a heroic example:

Given:

P(A)=0.00001
P(B)=0.00001
P(B|A)=0.9999

Find:

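The probability of the apocalypse happening while you are getting a date, P(A intersection with B), and the probability of at least one of the two happening, P(A Union B).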

Solution:

P(A Union B)= P(A)+P(B)-P(A intersection with B)

P(A intersection with B)=P(B|A)*P(A)

=0.9999*0.00001

=0.000009999

P(A Union B)=0.00001+0.00001-0.000009999

=0.000010001
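The arithmetic in this solution can be sketched in a few lines of Python. The function names `intersection` and `union` are hypothetical helpers, not part of any library.

```python
def intersection(p_b_given_a, p_a):
    """Multiplication rule: P(A and B) = P(B|A) * P(A)."""
    return p_b_given_a * p_a

def union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Values given in the example above.
p_a = 0.00001          # P(A): getting a date
p_b = 0.00001          # P(B): the apocalypse
p_b_given_a = 0.9999   # P(B|A)

p_a_and_b = intersection(p_b_given_a, p_a)
p_a_or_b = union(p_a, p_b, p_a_and_b)

print(p_a_and_b, p_a_or_b)
```

Up to floating-point rounding, the two results agree with the hand calculation: 0.000009999 and 0.000010001.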

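Now imagine if P(A) were 0.8, with P(B|A) still at 0.9999:

P(A intersection with B)=0.9999*0.8

=0.79992

Since the apocalypse happens whenever both events happen, P(B) can no longer stay at 0.00001. It must be at least P(A intersection with B)=0.79992.

0.79992/0.00001=79992, so the chances of the apocalypse happening are now roughly 80,000 times higher!

Thus, by minimizing your chances of getting a date from 80% to 0.001%, you're actually cutting the chances of an apocalypse from about 80% down to 0.001%, a reduction of more than 99.99%, a feat that can only be achieved by a true hero.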

Let:

A=event of getting a college diploma

B1=event of clearance with tuition fees

B2=event of passing all the subjects

B3=event of getting kicked out of the university

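According to the partition law (also called the law of total probability), if the events B1, B2, up to Bn are mutually exclusive and together cover every possible outcome, then the probability of getting a college diploma is equal to the probability of getting a college diploma given B1 multiplied by the probability of B1, plus the probability of getting a college diploma given B2 multiplied by the probability of B2, etc.

P(A)=P(A|B1)*P(B1)+P(A|B2)*P(B2)+...+P(A|Bn)*P(Bn)

Take note that B1, B2 and B3 as defined above overlap (you can clear your tuition fees and pass all the subjects at the same time), so they do not form a partition on their own. A valid partition built from B3 is "kicked out" versus "not kicked out":

P(A)=P(A|B3)*P(B3)+P(A|not B3)*P(not B3)

Since getting kicked out of the university doesn't help give you a college diploma, P(A|B3)=0 and only the second term remains.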
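A minimal sketch of the partition law, assuming a two-way split into "kicked out" and "not kicked out". All the probabilities below are made-up numbers for illustration, and `total_probability` is a hypothetical helper.

```python
def total_probability(partition):
    """partition: list of (P(Bi), P(A|Bi)) pairs.

    The Bi must be mutually exclusive and cover every outcome,
    so their probabilities have to sum to 1.
    """
    priors = [p_b for p_b, _ in partition]
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the P(Bi) must sum to 1 to form a partition")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in partition)

# Hypothetical numbers, not taken from the article.
p_kicked_out = 0.05
partition = [
    (p_kicked_out, 0.0),      # kicked out: no diploma
    (1 - p_kicked_out, 0.9),  # not kicked out: assumed 90% chance of a diploma
]

p_diploma = total_probability(partition)
print(p_diploma)
```

The check on the sum of the P(Bi) is a design choice: it catches the common mistake of feeding in events that overlap or that leave some outcomes uncovered.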

Bayes' Formula

Let:

A=event of an IC failing

B=event of a problem in the production line

The multiplication property of the intersection of 2 events is the foundation of Bayes' formula: P(A|B)*P(B)=P(B|A)*P(A). The probability of an IC failing given a problem in the production line, scaled by the probability of a problem in the production line, is equal to the probability of a problem in the production line given that an IC failed, scaled by the probability that an IC failed. Does it make sense to you? It doesn't make sense to me either. What if I rearrange the equation?

P(A|B)/P(B|A)=P(A)/P(B)

Perhaps now it is easier to comprehend and conceptualize. The ratio between the probabilities of 2 non-mutually exclusive events occurring (A and B) is equal to the ratio between the probabilities of the same 2 events occurring given that the other has happened. Makes complete sense now.

The probability of an IC failing over the probability of a problem in the production line is equal to the probability of an IC failing given a problem in the production line, over the probability of a problem in the production line given that an IC failed. (No pun intended.)

And now, to introduce Bayes' formula. I know the chances of an IC failing given that there was a problem in the production line by simply observing what happens when a problem in the production line occurs, but how would I know the chances that there is a problem in the production line given that I saw an IC fail? You can't sample it, you can't go through the entire production line and look for a problem, and you certainly can't rely on a fortune teller. I know P(A|B) but I do not know P(B|A). So:

P(B|A)=P(A|B)*P(B)/P(A)

Now I'll know the chances that my IC failed because of a flaw in the production line. But what if I want to know which specific part of the production line seeds the problem? Then we can split the possibilities into mutually exclusive cases B1, B2, up to Bn that together cover everything, and simply use the partition law for P(A):

P(Bi|A)=P(A|Bi)*P(Bi)/(P(A|B1)*P(B1)+P(A|B2)*P(B2)+...+P(A|Bn)*P(Bn))

B1 can be the UV printing part of the production line, B2 can be the division of Si wafers part of the production line, and so on up to Bn.
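A hedged sketch of Bayes' formula with the partition law in the denominator. The stage names and every number below are hypothetical. A "no_fault" case is added as an assumption so that the cases cover everything and form a proper partition.

```python
def posterior_per_stage(priors, likelihoods):
    """Return P(Bi|A) for every case Bi.

    priors:      dict mapping case -> P(Bi)
    likelihoods: dict mapping case -> P(A|Bi), chance of an IC failing given Bi
    """
    # Partition law: P(A) = sum over i of P(A|Bi) * P(Bi)
    p_a = sum(likelihoods[s] * priors[s] for s in priors)
    # Bayes' formula for each case
    return {s: likelihoods[s] * priors[s] / p_a for s in priors}

# Hypothetical numbers, not taken from the article.
priors = {"uv_printing": 0.02, "wafer_division": 0.03, "no_fault": 0.95}
likelihoods = {"uv_printing": 0.30, "wafer_division": 0.20, "no_fault": 0.001}

posteriors = posterior_per_stage(priors, likelihoods)
for stage, p in posteriors.items():
    print(stage, round(p, 4))
```

Because every posterior shares the same denominator P(A), the posteriors sum to 1, and the case with the largest product P(A|Bi)*P(Bi) is the most likely culprit after seeing a failed IC.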