
Expected return and covariance matrix calculation.

The return of an asset over a holding period can be calculated as

$R = \frac{V_f - V_i}{V_i}$

where

$V_f$ = final value, including dividends and interest

$V_{i}$ = initial value

and the excess return formula is $R_p = R - 1$
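As a quick numeric check of the holding period formula for $R$, here is a minimal Python sketch. The function name `holding_period_return` and the price and dividend figures are hypothetical, chosen only for illustration.

```python
def holding_period_return(initial_value: float, final_value: float) -> float:
    """Return R = (V_f - V_i) / V_i for a single holding period."""
    if initial_value == 0:
        raise ValueError("initial value must be non-zero")
    return (final_value - initial_value) / initial_value


# Hypothetical position: bought at 100.0, worth 108.0 at the end,
# with 2.0 received in dividends along the way.
initial_value = 100.0
final_value = 108.0 + 2.0  # V_f includes dividends and interest

r = holding_period_return(initial_value, final_value)
print(f"holding period return: {r:.2%}")
```

Note that the dividends are added to the final value before the formula is applied, as the definition of $V_f$ requires.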

The expected return of an asset over a long time period is in most cases calculated as the geometric mean of the asset's returns over shorter sub-periods.

$\overline{R} = \left(\prod _{i=1}^{N}a_{i}\right)^{\frac {1}{N}}={\sqrt[{N}]{a_{1}a_{2}\cdots a_{N}}}$

or, equivalently, as the arithmetic mean on a log scale:

$\overline{R} = \exp {\left({{\frac {1}{N}}\sum \limits _{i=1}^{N}\ln a_{i}}\right)}$

where $a_i$ is the asset return for period $i$ and $N$ is the number of periods.
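The two forms of the geometric mean can be compared numerically. The sketch below assumes that each $a_i$ is expressed as a positive growth factor (for example 1.10 for a +10% period), since both the $N$-th root of a product and the logarithm need positive inputs. The function names and the sample factors are hypothetical.

```python
import math


def geometric_mean_product(factors):
    """N-th root of the product of the per-period factors."""
    return math.prod(factors) ** (1.0 / len(factors))


def geometric_mean_log(factors):
    """exp of the arithmetic mean of the logs of the factors."""
    return math.exp(math.fsum(math.log(a) for a in factors) / len(factors))


# Hypothetical growth factors for four periods: +10%, -5%, +20%, +2%.
growth_factors = [1.10, 0.95, 1.20, 1.02]

g_prod = geometric_mean_product(growth_factors)
g_log = geometric_mean_log(growth_factors)
print(g_prod, g_log)  # the two values agree up to rounding error
```

For long series the log form is usually the safer choice, because the running product of many factors can overflow or underflow while the sum of logs stays well scaled.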

The expected return of a linear combination of $N$ random variables $x_i$ with corresponding coefficients $w_i$ can be calculated as

$\overline{R} = \sum _{i=1}^{N}w_{i} \mu_i$

where $\mu_i$ is the expected return of the $i$-th random variable,

or, in matrix notation,

$\overline{R} = \mathbf{w}^T \mathbf{\mu}$

where $\mathbf{w}$ is the vector of coefficients and $\mathbf{\mu}$ is the vector of expected returns of the random variables.
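The weighted sum above is a plain dot product of the weight vector and the vector of expected returns. A minimal sketch, with a hypothetical function name and hypothetical weights and expected returns for three assets:

```python
def portfolio_expected_return(weights, expected_returns):
    """Dot product of the weight vector and the expected-return vector."""
    if len(weights) != len(expected_returns):
        raise ValueError("weights and expected_returns must have the same length")
    return sum(w * mu for w, mu in zip(weights, expected_returns))


# Hypothetical three-asset portfolio; the weights sum to 1.
weights = [0.5, 0.3, 0.2]
expected_returns = [0.08, 0.12, 0.05]

r_bar = portfolio_expected_return(weights, expected_returns)
print(f"expected portfolio return: {r_bar:.4f}")
```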

The variance is the expectation of the squared deviation of a random variable from its population mean or sample mean.

The formula for biased sample variance is:

$\begin{aligned}\sigma ^{2}&={\frac {1}{N}}\sum _{i=1}^{N}\left(a_{i}-\mu \right)^{2}={\frac {1}{N}}\sum _{i=1}^{N}\left(a_{i}^{2}-2\mu a_{i}+\mu ^{2}\right)\\[5pt]&=\left({\frac {1}{N}}\sum _{i=1}^{N}a_{i}^{2}\right)-2\mu \left({\frac {1}{N}}\sum _{i=1}^{N}a_{i}\right)+\mu ^{2}\\[5pt]&=\left({\frac {1}{N}}\sum _{i=1}^{N}a_{i}^{2}\right)-\mu ^{2}\end{aligned}$

where the sample mean is

${\displaystyle \mu ={\frac {1}{N}}\sum _{i=1}^{N}a_{i}.} $

The formula for unbiased sample variance is:

$ {\displaystyle \sigma ^{2}={\frac {1}{N - 1}}\sum _{i=1}^{N}\left(a_{i}-\mu \right)^{2}. }$

The standard deviation of the asset return is the square root of the variance.

$\sigma_p = \sqrt{{\sigma}^2}$
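The biased and unbiased variance formulas differ only in the divisor. The sketch below computes both for a hypothetical series of five period returns, checks the shortcut form (mean of squares minus squared mean) from the derivation, and takes the square root to get the standard deviation. The function name `sample_variance` is an illustrative choice.

```python
import math


def sample_variance(values, unbiased=True):
    """Sample variance: divisor N - 1 if unbiased, N otherwise."""
    n = len(values)
    if n < 2 and unbiased:
        raise ValueError("the unbiased estimate needs at least two values")
    mu = sum(values) / n
    squared_deviations = sum((a - mu) ** 2 for a in values)
    return squared_deviations / (n - 1 if unbiased else n)


# Hypothetical returns for five periods.
returns = [0.02, -0.01, 0.03, 0.00, 0.01]

var_biased = sample_variance(returns, unbiased=False)
var_unbiased = sample_variance(returns, unbiased=True)

# Shortcut form of the biased variance: mean of squares minus squared mean.
mu = sum(returns) / len(returns)
var_shortcut = sum(a * a for a in returns) / len(returns) - mu ** 2

std_dev = math.sqrt(var_unbiased)
print(var_biased, var_unbiased, var_shortcut, std_dev)
```

The unbiased estimate is always the larger of the two, by a factor of $N / (N - 1)$.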

The covariance is a measure of the joint variability of two random variables.
The biased estimate of the sample covariance of two assets $X, \;Y$ can be calculated as:

$cov(X, Y) ={\frac {1}{N}}\sum _{i=1}^{N}\left(X_{i}-\mu_{x} \right) \left(Y_{i}-\mu_{y} \right)$

The unbiased estimate of the sample covariance of two assets $X, \;Y$ can be calculated as:

$cov(X, Y) ={\frac {1}{N - 1}}\sum _{i=1}^{N}\left(X_{i}-\mu_{x} \right) \left(Y_{i}-\mu_{y} \right)$

where $X_i, \; Y_i$ are the returns of assets $X, \;Y$ for period $i$, $\mu_{x}, \; \mu_{y}$ are their corresponding sample means, and $N$ is the number of periods.

Note that the covariance of a random variable $X$ with itself is the variance of $X$, i.e.

$cov(X, X) = var(X) = {{\sigma}_x}^2$
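A short sketch of the covariance estimates, using two hypothetical return series of equal length. It also checks the identity $cov(X, X) = var(X)$. The function name `sample_covariance` is an illustrative choice.

```python
def sample_covariance(xs, ys, unbiased=True):
    """Sample covariance of two equal-length series (divisor N - 1 or N)."""
    if len(xs) != len(ys):
        raise ValueError("both series must cover the same periods")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if unbiased else n)


# Hypothetical returns of two assets over the same five periods.
x = [0.02, -0.01, 0.03, 0.00, 0.01]
y = [0.01, 0.00, 0.02, -0.01, 0.03]

cov_biased = sample_covariance(x, y, unbiased=False)
cov_unbiased = sample_covariance(x, y, unbiased=True)
var_x = sample_covariance(x, x)  # covariance of X with itself is its variance
print(cov_biased, cov_unbiased, var_x)
```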

The covariance matrix $A$ for $M$ random variables is a square $M \times M$ matrix with elements $a_{ij} = cov(X_i, X_j)$, where $X_i$ and $X_j$ are the $i$-th and $j$-th random variables respectively and $i, \; j \in \{1, \dots , M \}$. In the context of portfolio optimization theory, the covariance matrix is the covariance matrix of the returns of the stocks included in the portfolio.

The variance of a linear combination of random variables weighted with the weight vector $\mathbf{w}$ (portfolio variance) can be calculated as: $$ var(R) = \mathbf{w}^T \mathbf{\Sigma} \mathbf{w} $$

Where $\mathbf{\Sigma}$ is the $n \times n$ covariance matrix calculated for the random variables (returns of the stocks in our portfolio), $\mathbf{w} \in \mathbb{R}^n$ is our weight vector.
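The quadratic form $\mathbf{w}^T \mathbf{\Sigma} \mathbf{w}$ can be checked against the variance of the weighted return series itself. The sketch below builds the unbiased covariance matrix for two hypothetical assets and compares both routes. All names and figures are illustrative assumptions.

```python
def covariance_matrix(series):
    """Unbiased sample covariance matrix; `series` holds one return list per asset."""
    m, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    return [
        [
            sum((series[i][k] - means[i]) * (series[j][k] - means[j]) for k in range(n))
            / (n - 1)
            for j in range(m)
        ]
        for i in range(m)
    ]


def portfolio_variance(weights, cov):
    """Quadratic form w^T * Sigma * w."""
    m = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(m) for j in range(m))


# Hypothetical returns of two assets over five periods, and hypothetical weights.
asset_returns = [
    [0.02, -0.01, 0.03, 0.00, 0.01],
    [0.01, 0.00, 0.02, -0.01, 0.03],
]
weights = [0.6, 0.4]

sigma = covariance_matrix(asset_returns)
var_quadratic = portfolio_variance(weights, sigma)

# Cross-check: variance of the weighted return series computed directly.
n_periods = len(asset_returns[0])
portfolio_returns = [
    sum(w * r[k] for w, r in zip(weights, asset_returns)) for k in range(n_periods)
]
mean_p = sum(portfolio_returns) / n_periods
var_direct = sum((p - mean_p) ** 2 for p in portfolio_returns) / (n_periods - 1)
print(var_quadratic, var_direct)
```

Both routes give the same number, which is why the covariance matrix is all that an optimizer needs in order to evaluate the risk of any candidate weight vector.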


Some tips:

Try to use at least 10 years of historical data when optimizing a portfolio, because stock prices tend to fall sharply roughly once every 10 years, as described in the theory of economic cycles (the Juglar cycle). With the free account type you can optimize a portfolio of up to 150 stock symbols using daily historical data. Also run the calculation on a shorter, more recent period of quote history (a few years, for example) to check that the expected return and volatility of the chosen stocks have not deteriorated recently.

About the Covariance matrix shrinkage:

It is essential that the number of stock price observations used to calculate the covariance matrix be greater than the number of symbols in your portfolio; otherwise the sample covariance matrix will be singular (i.e. non-invertible) [10]. Even when the number of observations exceeds the symbol count, but not by a wide margin, the sample covariance matrix can be a poor estimate unless shrinkage is applied [10], [11]. In that case it is recommended to use a shrinkage estimator for the covariance matrix.
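The singularity problem can be illustrated with a small sketch: three hypothetical assets observed over only three periods give a sample covariance matrix with a zero determinant, and blending it with a scaled identity matrix makes it invertible again. The shrinkage intensity `delta = 0.2` is hand-picked here purely for illustration; it is an assumption of this sketch, not the data-driven intensity that the cited shrinkage papers derive.

```python
def covariance_matrix(series):
    """Unbiased sample covariance matrix; `series` holds one return list per asset."""
    m, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    return [
        [
            sum((series[i][k] - means[i]) * (series[j][k] - means[j]) for k in range(n))
            / (n - 1)
            for j in range(m)
        ]
        for i in range(m)
    ]


def shrink_to_identity(cov, delta):
    """Linear shrinkage: (1 - delta) * cov + delta * (average variance) * I."""
    m = len(cov)
    avg_var = sum(cov[i][i] for i in range(m)) / m
    return [
        [(1 - delta) * cov[i][j] + (delta * avg_var if i == j else 0.0) for j in range(m)]
        for i in range(m)
    ]


def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
        - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
        + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    )


# Hypothetical data: 3 assets but only 3 observations each (too few).
asset_returns = [
    [0.01, 0.03, -0.01],
    [0.02, -0.01, 0.02],
    [0.00, 0.02, 0.01],
]

sample_cov = covariance_matrix(asset_returns)
shrunk_cov = shrink_to_identity(sample_cov, delta=0.2)  # delta is hand-picked

det_sample = det3(sample_cov)
det_shrunk = det3(shrunk_cov)
print(det_sample, det_shrunk)  # first is zero up to rounding, second is positive
```

Subtracting the sample mean removes one degree of freedom, so $N$ observations can support a covariance matrix of rank at most $N - 1$. That is why three observations are not enough for three assets.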


About the stock data timeframe:

It follows from the above that the more historical price samples you have, the more stocks you can include while still getting a good estimate of the covariance matrix. For example, with 10 years of monthly data you can get a decent estimate of the covariance matrix for about 10 stocks in your portfolio. With 10 years of daily data you can get a good estimate for about 100 stocks.
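The arithmetic behind these two examples can be made explicit by counting observations per stock. The sketch below assumes 12 monthly and roughly 252 daily (trading day) observations per year; the 252 figure is a common approximation and an assumption of this sketch, as are the helper names.

```python
# Assumed number of observations per year for each timeframe.
PERIODS_PER_YEAR = {"monthly": 12, "daily": 252}


def observation_count(years, timeframe):
    """Approximate number of return observations in the history."""
    return years * PERIODS_PER_YEAR[timeframe]


def observations_per_stock(years, timeframe, stock_count):
    """How many observations are available per stock in the portfolio."""
    return observation_count(years, timeframe) / stock_count


monthly_ratio = observations_per_stock(10, "monthly", 10)   # 120 samples, 10 stocks
daily_ratio = observations_per_stock(10, "daily", 100)      # about 2520 samples, 100 stocks
print(monthly_ratio, daily_ratio)
```

In both examples the history contains at least an order of magnitude more observations than stocks, which is the margin the rule of thumb above relies on.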

References:

[1] Expected value

[2] Variance

[3] Covariance

[4] Covariance Matrix

[5] Mean-variance optimization

[6] Portfolio optimization

[7] Portfolio optimization 2

[8] Sharpe ratio

[9] Economical cycles

[10] Improved Estimation of the Covariance Matrix of Stock Returns With an Application to Portfolio Selection

[11] Honey, I Shrunk the Sample Covariance Matrix