A selection of topics on IT and its application to finance.
Send me your comments, questions or suggestions by clicking here
Carrying forward data (or how to fill in the missing blanks)
I was working on a problem at work where I need to calculate the correlation between a pair of stocks (apparently this can help identify arbitrage opportunities for my esteemed colleagues in London). To calculate the correlation between (the prices) of two stocks over say, N days, for each of the N days you take the price of Stock1 minus its average over all N days. Do the same for stock2 then multiply these two figures together and divide by N * the standard deviation of stock1 * standard deviation of stock2. This is straightforward, assuming you can take into account or neglect FX issues and capitilisation changes. However it does depend on you having a price for each of the N days in question. Thats where I hit my first stumbling block. We operate a changes-only pricing mechanism whereby we only get a closing price from our data vendor if the closing price of a stock is different from its previous close. Normally this is the case but occasionally its not. So what you end up with is a table with data such as that shown below:-
P Date Stock Price
01-Apr-1997 Stagecoach 6.61
02-Apr-1997 Stagecoach 6.575
03-Apr-1997 Stagecoach 6.55
04-Apr-1997 Stagecoach 6.485
07-Apr-1997 Stagecoach 6.46
11-Apr-1997 Stagecoach 6.385
So, on the 8th,9th and 10th April 1997 Stagecoach's price closing price was unchanged from that of the 7th.
Filling in the missing data here is trivial to sort out using a procedural language such as PL/SQL for Oracle but I wanted to try and do it using pure SQL. I finally figured out how to do with a little help from Tom Kyte. You might not be surprised to know that it entails the use of analytic functions. Here's the solution:-
max(price) over (partition by max_rn order by p_date) price2
max(rn) over (order by p_date) max_rn
case when price is not null or price <> next_price then rn end rn
LEAD(price) over (order by p_date) next_price,
ROW_NUMBER() over (order by p_date) RN
from price_hist where stock = 'STAGECOACH'
and p_date between '01-apr-97' and '11-apr-97'
order by 1
Running this against the original table gives:-
P Date XXXXXXStock XXPrice Price2
01-Apr-1997 Stagecoach 6.61 X6.61
02-Apr-1997 Stagecoach 6.575 6.575
03-Apr-1997 Stagecoach 6.55 X6.55
04-Apr-1997 Stagecoach 6.485 6.485
07-Apr-1997 Stagecoach 6.46 X6.46
08-Apr-1997 Stagecoach XXXX6.46
09-Apr-1997 Stagecoach XXXX6.46
10-Apr-1997 Stagecoach XXXX6.46
11-Apr-1997 Stagecoach 6.385 6.385