Statistical Analysis Part I: Standard Deviation and Correlation

Submitted By Aaron NematNejad

In this article we examine some of the main statistical tools used in trading securities. We use examples from trading long short equity pairs and analyze statistical methods to determine the quality of the trade. The statistical concepts covered include, averages, correlation, standard deviation, mean reversion.

Although statistical analysis could be used to analyze single stocks we will mainly apply the principles in long short equity pair trades. Reasons being that correlation can only be applied when describing the behavior of more than one asset. All the examples given are by being long Superior Energy (SPN) and short Cameron International (CAM). The ratio of two stocks is simply the price of the long stock divided by the price of the short position.  A fundamental analysis is made to buy SPN and sell CAM using the methods described in the section ¡°Advanced Equity Trading¡±. However before a trade is carried out we will need to check to see if the two stocks correlate with each other and if there is a significant mean reversion return. We will start of by explaining the main statistical tools.

Click Here to Download the Spreadsheet

Mean and Standard Deviation
The mean of a variable is the simple average of the variable. It is usually denoted with a bar on top of the variable being calculated. Its is calculated by summing the values of the variables, in our case the price ratio between the two stocks and then dividing the summation by the number of data points. This is expressed as

Mean

 

[1]

Mean = ratio1 + ratio 2 + ratio 3 + ………ratio n

                       ------------------------------------------------------           

                                    Number of observations

On 7/31/07 the ratio was 0.5169. If we download 1 year worth of data we will get 252 business days worth of prices. The average will be 0.5763. Many traders will look at a continues moving average figure instead a single static number. This is simply a rolling daily average going back the number of days you want to calculate the average for. For example a 30 day moving average is simply a chart or data figure representing what the 30 day average is at each point in time.

Standard deviation is measure that identifies the spread in a set of values. It is usually denoted by a lower case sigma (¥ò). To calculate the standard deviation, the mean needs to be calculated first using [1]. Followed by this a table needs to be created to calculate the resulting value when subtracting the mean from each price ratio for the sample. The square of the difference is taken to remove negative figures and the resulting value is divided by the number of observations. This number is the variance. The square root of the variance is the standard deviation. We will replace the stock price ratio by x, and the number of observations by N, to be consistent with mainstream mathematical notations.

[2]

If there is a large spread in values, the standard deviation of the sample will be large and vice versa. If there is no change in the variables, the sample will have 0 standard deviations.

Covariance and Correlation
In our case we need to apply the principles of covariance and correlation by looking at the two stocks separately and not as a price ratio. Here we are trying to determine the relationship between two random prices.
 
Covariance is a measure of how much two variables move together. This is different to variance which measures how much a single variable moves. If two variables move together the covariance is positive. If they move in opposite directions the covariance is negative. The formula is expressed as

[3]

In our case the function that we want to calculate is Cov (SPN,CAM).
Like standard deviation, when calculating covariance the range of values is infinite. Correlation however ranges between -100% to +100% which makes it a better measure. The relationship between correlation and covariance is
simply represented by the following formula:

[4]


Substituting equation [3] into [4] we arrive at

[5]

Correlation is an extremely important tool in trading for two reasons. If a trader wants to hedge a position he has to have confidence that the two stocks generally trade in the same direction so as to protect the value of the portfolio if the markets take the wrong turn.

Examples in Excel

The attached Excel spreadsheet will allow one to see how the principles and math discussed above can be applied in practice. The aim of the model in the spreadsheet is to evaluate if the long short pair idea is profitable. Note that the columns highlighted in yellow mean are columns where the numbers are calculated by using excel functions rather than manual calculations. For example columns H and I both calculate the standard deviations, however column H is calculated by using simple Excel additions and subtractions where as column I uses =STDEVA().

We start of by downloading the price history of the Superior Energy (SPN) and Cameron International Corp (CAM) in columns C and D. Column E is the price ratio which is simply column C divided by D. Column F is the mean using formula [1]. I included 2 columns that calculate the standard deviations. The first one in column H is calculated manually using equation [2] and column I calculates it using the Excel function =STDEV(¡¦). As we can see the two numbers back each other up. Columns J to N are similar except they analyze CAM instead of SPN.

Columns M to U calculate the statistical data using a combination of SPN and SAM, whether it is the correlation or price ratio. Column N contains the manual calculation for covariance using equation 2. Column O contains the same calculation but using the excel function =COVAR(). The two resulting numbers are the same.  Column P and Q are the correlations of the two stock, P is the manual calculation and Q uses the Excel function =CORREL(..).

Bollinger Bands
Bollinger Bands are plotted on a chart to indicate a certain number of standard deviations above and below the moving average. To plot Bollinger bands one needs to calculate the standard deviation of the stock price ratio. To plot the upper Bollinger chart one has to multiply the standard deviation by 2 and add that to the average. Similar method, for the lower Bollinger band where one subtracts two standard deviations from the average. These are represented in columns T and U.

August,6, 2007



Did you like this article?
 

One more

The last thing is that your columns H and N don't track very well against the Excel function. Sure, the latest cel backs up the function in column I and O, but if I fill the handwritten formula down, the handmade formula departs controlled flight very quickly. This formula should track the Excel function relatively closely as you're calculating a rolling value as opposed to a spot value.

Another question

In your worksheet, column M, the calculation of x minus x-bar times y minus y-bar has another anomaly. You've locked the formula so that all price values are subtracting the most recent mean value. If you're doing a rolling average, why would you subtract the latest mean? If anything, you'd be subtracting the mean of the whole range or at least the period n. I can't see why the price of SPN and CAM on 5/8/06 would be subtracting the mean of the data FORWARD over a year in July of 07. Can you explain your reasoning behind this?

Anomaly in your worksheet

In your Excel sheet, you calculate the SPN mean based on the n-day mean in cell B2 as your divisor. However, your CAM mean is based on the number of data points with cell B1 as your divisor. One is a 60 day mean, the other a 1 year mean. Shouldn't they both be the same, otherwise you're comparing apples to oranges? And what's the purpose of column H if you're using the STDEVA Excel function to calculate a rolling standard deviation?

Free Course

Ask An Expert