Wednesday, March 20, 2013

Indices, there are so many of them ...

The good thing about indices (and hopefully pretty soon ETFs tracking them) is that there are so many of them.

The bad thing about indices ... you know it was coming ... is that there are so many of them.

At last count, the NSE site listed as many as 33 separate indices (or is it indexes?).

There are 8 broad-market ones such as CNX Nifty, Nifty Junior, CNX 500 etc. Then there are 11 covering sectors such as CNX Pharma, CNX Realty etc. Then it gets pretty esoteric: we have indices covering themes such as consumption and strategies such as low volatility.

Which ones should you track? Ideally, you want to track only the unique ones which are not so correlated with the others.

I calculated the correlation matrix of the indices with each other based on daily returns in 2012.



As you can see from the image (click for a larger view), only the FMCG, Pharma, IT and Media indices have a personality of their own. All the others are highly correlated with each other.

Maybe IT and Pharma have something in common in that both are dominated by export-oriented companies which benefit from a depreciating Rupee, as was the case in 2012.

Though this analysis was based on 2012 data, the above correlations continue to hold so far in 2013. Pharma, FMCG and IT were the only indices trading above their 50-day moving average as of today.

Code:
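The original analysis was done in R; as a rough sketch of the same idea in Python, with tiny made-up price series standing in for real NSE index data:

```python
# Sketch of the correlation-matrix calculation (the original analysis
# used R). The two price series below are tiny illustrative examples,
# NOT real NSE index data.

def daily_returns(prices):
    """One-day simple returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def correlation(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

closes = {
    "CNX Nifty":  [100.0, 101.0, 99.5, 102.0, 103.5],
    "CNX Pharma": [200.0, 201.0, 203.0, 202.0, 206.0],
}
returns = {name: daily_returns(p) for name, p in closes.items()}
corr = {(a, b): correlation(returns[a], returns[b])
        for a in returns for b in returns}
```

With all 33 indices this yields a 33x33 matrix; the off-diagonal entries are what separate the "own personality" indices (low correlation with the rest) from the highly correlated pack.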

Wednesday, October 10, 2012

Nifty flash crash analysis - part 2


When we look at the order book in real time, we get to see only top 5 buy and sell orders in the book. That is just the tip of the iceberg and one is left wondering how much of the iceberg is really below the surface. As it turns out, not much!

Thanks to the flash crash, we just got a sneak peek below the surface.

Take a look at the plot below, which depicts the transacted value (in Cr) during the 43rd minute plotted against the price drop during that minute. Click the image for higher resolution.



Two things stand out:
1) As shown by the blue line, for smaller transacted values, the drop is roughly proportional to the transacted value. Even in this range, the impact cost is already above 5%. The market simply cannot absorb more than 5 Cr worth of transactions in a single Nifty component.

2) There is a cluster of names around a 20% drop. These are likely passive limit orders in the system, where the buyer didn't really expect to be hit. Since Emkay did not ask for the trades to be reversed, that's one lucky and/or smart buyer.
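Point 1 is really a statement about order-book depth. A minimal Python sketch of what a market sell does to a thin buy side (the five bid levels below are invented for illustration, not actual NSE data):

```python
# Walk the visible bid side of a (toy) order book with a market sell
# and measure the price damage. Prices/quantities are illustrative only.

def execute_sell(bids, qty):
    """Fill `qty` against `bids`, a list of (price, size), best first.
    Returns (average fill price, last traded price)."""
    filled, notional, last_price = 0, 0.0, bids[0][0]
    for price, size in bids:
        take = min(qty - filled, size)
        filled += take
        notional += take * price
        last_price = price
        if filled == qty:
            break
    return notional / filled, last_price

bids = [(100.0, 500), (99.0, 400), (95.0, 300), (90.0, 200), (80.0, 100)]
avg_px, last_px = execute_sell(bids, 1200)
drop_pct = (bids[0][0] - last_px) / bids[0][0] * 100  # 5.0% in this toy book
```

In this toy book, a sell of 1200 shares already chews through three price levels; a somewhat larger order exhausts all five visible levels and starts hitting whatever passive bids sit far below, which is one way to read the 20% cluster in point 2.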

I wonder what would have happened if there were limit buy orders at totally ridiculous prices. Maybe we would have seen trades like Accenture trading at 1 cent, which happened in the US flash crash of May 6, 2010.

Note to self: Figure out how to automatically put in ridiculous buy orders every day at open. Chances of getting lucky are non-zero.

Update (11/Oct/2012):
In the post I speculated about a smart/lucky guy having limit orders 20% below CMP. Today's Economic Times story confirms that hypothesis.

How Inventure's client made a killing from Nifty's Friday crash
"He randomly puts many buy orders in stocks that are part of the Nifty at 20% below the market value. On Friday, this bet clicked."

Nifty flash crash analysis - part 1


Investing is an observational science, i.e. we cannot run controlled experiments and see what happens. Sometimes a natural experiment takes place and we get to see some of the market's mechanisms.

The Nifty flash crash of October 5 offers just such an opportunity for a data-driven investigation. For a detailed analysis, we will have to wait till NSE releases the trade-by-trade data. In the meanwhile, we can look at the minute-by-minute data (available via Google Finance or vendors like Global datafeed) and make some conjectures.

From the stories in the media, we know that a trader at Emkay Securities punched in wrong orders, sending about Rs 650 Cr worth of sell orders into the market and thereby triggering the flash crash.

From the minute-by-minute charts, it is clear that all hell broke loose in the 43rd minute. Let's piece together the minute-by-minute traded value of the Nifty components for the minutes leading up to the 43rd minute.

Minute, Approximate Traded Value (Cr)
40, 15.9
41, 19.5
42, 35.7
43, 610.8
44, 39.6
... Market halted
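Even a crude filter picks the rogue minute out of this table. A quick Python sketch using the values above (the 5x-median threshold is my own arbitrary choice, not anything from the original analysis):

```python
# Flag minutes whose traded value dwarfs the typical (median) minute.
from statistics import median

traded_value_cr = {40: 15.9, 41: 19.5, 42: 35.7, 43: 610.8, 44: 39.6}

typical = median(traded_value_cr.values())  # 35.7 Cr
spikes = [m for m, v in traded_value_cr.items() if v > 5 * typical]
print(spikes)  # [43]
```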

The minute data that we have roughly matches the news-reported value of 650 Cr of transactions happening in a small time window. Let's drill down into minute number 43.

We know that what Emkay tried to execute was a Nifty basket sell order, i.e. all the Nifty components should be sold in proportion to their weight in the Nifty. For example, Hindustan Lever, which has a 3.2% weight in the Nifty, should have been sold for approx 19.5 Cr (610 Cr basket order * 3.2%). And this is exactly what we find in the minute-by-minute data.
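The proportionality check above is just one multiplication per stock. A small Python sketch (the 3.2% Hindustan Lever weight is from the post; the ITC weight here is a placeholder, not the actual Nifty weight):

```python
# Expected per-stock traded value if a Nifty basket is sold in
# proportion to index weights.
basket_cr = 610.0  # approximate size of the erroneous basket order, in Cr

weights = {
    "HINDUNILVR": 0.032,  # 3.2%, as quoted in the post
    "ITC": 0.080,         # placeholder weight, for illustration only
}
expected_cr = {sym: basket_cr * w for sym, w in weights.items()}
print(round(expected_cr["HINDUNILVR"], 1))  # 19.5, matching the post
```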

Here is a chart of expected traded value vs actual traded value for the 43rd minute. (Click the image for higher resolution.)


The blue line indicates where the expected traded value equals the actual traded value. This is the line on which the order book presumably had sufficient depth (obviously at an atrocious impact cost – but that is the topic of the next post).

The interesting part is to look at the outliers and speculate what could have caused them. For scrips below the blue line - such as ITC, HDFCBANK, RELIANCE, INFY, ICICIBANK - it is very likely that there was not enough depth in the limit buy order book at any price. In other words, order books were burned in the 43rd minute. 

What about the scrips above the blue line, such as HDFC and LT, which have transacted value above what our model predicts? There must have been some other big traders in the market in these scrips in the 43rd minute, apart from Emkay.

Bulk trade data corroborates this theory for HDFC. There was a bulk sale by Carlyle on October 5th.


Bulk Deals Historical Data


Date         Symbol   Security Name   Client Name        Buy/Sell   Quantity Traded   Wght. Avg. Price   Remarks
05-Oct-2012  HDFC     HDFC Ltd.       CMP ASIA LIMITED   SELL       430,00,000        761.08             -


Takeaways:

1) Indian markets hardly have any depth. It is surprising and worrying to see that a mere 15 Cr worth of sell orders was sufficient to knock 44,000 Cr of value off ITC's market cap for a few seconds. What if this happens in the last couple of minutes on an F&O expiry date?

2) It is also interesting (well, not really!) to note that some players in the market were already aware that the HDFC-Carlyle trade was happening and were positioned with buy orders.

Monday, August 6, 2012

Sell in May and go away?

I recently came across a paper which claims that the 'Sell in May and go away' strategy continues to work in many markets.

Here is the paper's abstract:
We perform the first out-of-sample test of the Sell in May effect studied by Bouman and Jacobsen (American Economic Review, 2002). Surprisingly to us, the old adage "Sell in May and Go Away" remains good advice. Reducing equity exposure starting in May and levering it up starting in November persists a profitable market timing strategy. The economic magnitude of the effect is the same in- and out-of-sample: on average, stock returns are about 10 percentage points higher in November to April semesters than in May to October semesters.  


10 percentage points of out-performance with just 2 trades per year? I had to try this at home.

After about 30 minutes on a rainy afternoon with R and some friendly packages (hat tip to ggplot2, plyr, lubridate, xts) my hopes were dashed.

Nope, doesn't work in India.




If anything, it is: Buy in May and stay put till Santa Claus!
(sigh, doesn't rhyme!)

R code for those interested follows:
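For readers without R handy, the core of the check can also be sketched in Python: split monthly returns into the November-April and May-October halves and compare averages. The monthly numbers below are made-up placeholders, not actual Nifty returns.

```python
# Sketch of the 'Sell in May' semester comparison: average monthly
# returns in Nov-Apr vs May-Oct. The return numbers are made-up
# placeholders, NOT actual Nifty history.

def semester_averages(monthly):
    """monthly: {(year, month): return in %} -> (Nov-Apr avg, May-Oct avg)."""
    winter = [r for (_, m), r in monthly.items() if m >= 11 or m <= 4]
    summer = [r for (_, m), r in monthly.items() if 5 <= m <= 10]
    return sum(winter) / len(winter), sum(summer) / len(summer)

monthly = {(2011, m): r for m, r in zip(
    range(1, 13),
    [1.0, -2.0, 3.0, 0.5, 2.0, 1.5, 0.5, -1.0, 2.5, 1.0, -0.5, 0.8])}
winter_avg, summer_avg = semester_averages(monthly)
```

On real Nifty data the post finds the opposite of the adage: the May-October half held up fine, hence "Buy in May".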


Thursday, March 15, 2012

Book Review: Think Stats

I reviewed 'Think Stats' by Allen Downey on Amazon.com

One-line summary: the quickest introduction to Bayesian stats if you know some Python programming.

Friday, March 9, 2012

Pre budget rally - what pre budget rally?

The media has moved on from the UP elections, and the talking heads are now talking about the budget. I heard the term 'pre-budget rally' being tossed around.


Do they actually bother to go and look at the data? In this case, the data is easily available and the calculation is trivial. You don't even need Excel; a calculator is sufficient.

Year                 Budget Date   Return 1 week before (%)   Return 1 week after (%)
2000-2001            29-Feb-2000   -4.84                       2.90
2001-2002            28-Feb-2001   -1.36                      -4.51
2002-2003            28-Feb-2002   -0.68                       4.47
2003-2004            28-Feb-2003   -0.26                      -4.35
2004-2005 (interim)  3-Feb-2004    -7.12                       6.32
2004-2005            8-Jul-2004    -1.24                       1.40
2005-2006            28-Feb-2005    2.94                       2.70
2006-2007            28-Feb-2006    1.29                       3.52
2007-2008            28-Feb-2007   -8.57                      -3.16
2008-2009            29-Feb-2008    2.21                      -8.65
2009-2010 (interim)  16-Feb-2009   -2.45                      -4.02
2009-2010            6-Jul-2009    -5.13                      -4.60
2010-2011            26-Feb-2010    1.60                       3.38
2011-2012            28-Feb-2011   -3.36                       2.44
2012-2013            16-Mar-2012

Average (%)                        -1.93                      -0.16

What pre-budget rally? There hasn't been one, at least in the last decade. If anything, returns in the week after the budget have been better than in the week before.

However, since the number of data points in this case is too small (a grand total of 14 points, including the interim budgets), it is hard to say whether the difference between the pre-budget week and the post-budget week is statistically significant.

I prefer not to dwell in such a 'tiny data' domain. Visualizations can actually mislead you here, e.g. you might conclude that the week before the budget looks bad. This is where classical statistics can help: the t-test was designed for precisely this purpose.

> t.test(df$ret.before, df$ret.after)

 Welch Two Sample t-test

data:  df$ret.before and df$ret.after 
t = -1.1539, df = 24.465, p-value = 0.2597
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -4.938753  1.394467 
sample estimates:
 mean of x  mean of y 
-1.9264286 -0.1542857 

And the t-test tells us that the difference is not statistically significant. So what to do this week? Whatever you do, don't cite the budget week as the reason.
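The t.test output can be re-checked by hand. Here is a Python sketch of Welch's t statistic on the same table values (`statistics.variance` is the sample variance, matching what R uses):

```python
# Welch's two-sample t statistic, recomputed from the budget-week table.
from statistics import mean, variance

ret_before = [-4.84, -1.36, -0.68, -0.26, -7.12, -1.24, 2.94,
              1.29, -8.57, 2.21, -2.45, -5.13, 1.60, -3.36]
ret_after = [2.90, -4.51, 4.47, -4.35, 6.32, 1.40, 2.70,
             3.52, -3.16, -8.65, -4.02, -4.60, 3.38, 2.44]

def welch_t(xs, ys):
    """Welch's t and its Satterthwaite degrees of freedom."""
    vx, vy = variance(xs) / len(xs), variance(ys) / len(ys)
    t = (mean(xs) - mean(ys)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df

t_stat, dof = welch_t(ret_before, ret_after)
print(round(t_stat, 4), round(dof, 3))  # -1.1539 24.465, as in the R output
```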

Thursday, February 23, 2012

Book review: machine learning for hackers

There is a new book out titled 'Machine learning for hackers' from the publishing house O'Reilly.

I have written a review which is now live on Amazon.

One line summary: 5-stars if you already know some R.

I will be using some of the machine learning techniques in a forthcoming series of posts. Since this is a vast field, there is always that nagging feeling at the back of your mind that maybe what you are doing is naive and not at all what the guru data miners would do. Maybe there is some complex technique that can bypass all the grunt work of data scraping / cleaning / plotting and just transform the data into some n-dimensional space where the problem is trivially solved :)

The book served as a morale-boosting benchmark. The authors are well-known bloggers in the R community. It is reassuring to learn that every traveler to the machine-learning promised land has to muddle through the messy swamp of unclean data and foggy relationships.