Monday, March 5, 2012

Lies, damn lies, and statistics!

Agent Log Entry 2012.03.05 : Special Agent Ethan Hunt: Security Clearance Level Theta 

I am a big fan of statistics.  They have helped me escape from many dangerous missions.  My last post referenced some statistical elements that many agents working their dangerous supply chains may not be familiar with.  I'd like to first review some of those, and then finish explaining how I built a metric for demand variation that is compressed down to one number.

I won't be at this location long enough to explain the details of everything I referenced in my last communique.  I will, however, cover the following elements:
  • Slope
  • Intercept
  • Regression
Slope is normally described by the ratio of the "rise" divided by the "run" between two points on a line.  This is easy to calculate if we have a straight line, but we know our demand signals are not linear (See Figure 1 in my last entry).  This means we need to generate the linear line that passes "through" our volatile demand signal.

Excel can accomplish this for you automatically, but what if we'd like to view the data that is generated, and actually perform other formulas against it?
Last time I mentioned calculating "cross products":

(X-Avg X)(Y-Avg Y)
(X-Avg X)^2

X represents our time periods, and Y represents our demand in dollars.  Figure 1 displays our sample data and the cross products are on the right in Figure 2.

Figure 1
Figure 2
With the data in Figure 2, we can easily calculate the Slope by dividing the SUM of the dollars in Figure 2 by the sum of the squared values on the right of Figure 2.

Once we know the Slope, calculating the Intercept is also easy.  In this case we need the Y-Intercept, which is the point where our linear line would cross the Y Axis.

The Intercept can be calculated by first calculating the average of column X (Figure 1).  Next calculate the average of column Y (Figure 1) and multiply it by the Slope value we calculated above.  So if b is our Slope, then the equation looks like this:

AVG(X)-b*AVG(Y)

With the Slope and Intercept calculated, we can actually formulate the data necessary to chart our Regression line right through our volatile demand signal.

Hang on for a second...
Their belts are blacker than the
inside of a coffin at midnight
Figure 3
OK, I am back.  I thought I heard the footsteps of ninjas and would have to change location.  Must have been my imagination.

Where was I?  Oh yes, the Regression line.  If the Intercept is a and the Slope is b, then we can use the following formula to calculate our data points for plotting:

Y = a + b*X

Using this formula and the data generated in Figures 1 & 2, we can generate the data in Figure 3.

You can test the quality of your formula by simply picking two consecutive values, and if the difference between them is equal to the Slope, you should see a straight line when you chart the values.

Hopefully, if you overlay the Regression line with the existing demand data, your chart will look something like this:

Figure 4


You'll instantly notice I only generated the Regression, or as we're calling it, Trend line for the most recent 6 months.  This was another decision I made with the help of Agent GK.  There's no special secret to making this decision; it just seemed reasonable with respect to the data.

My next entry will explain the final steps of our journey:
  • Using the data generated above to calculate the difference between each data point and the Regression line
  • Calculating the Standard Deviation of those error values
  • Calculating the Coefficient of Variation with the Standard Deviation value.


Time to move.  February metrics are ready to be reported.

No comments:

Post a Comment