or: scaling with elastic map-reduce
Between a rapidly evolving compute environment in which cores are multiplying like springtime rabbits, and a business domain in which the fecundity of market data is making those same rabbits look downright prudish, we are always looking for ways to scale our efforts. There are three levels at which this can typically be done:
- “below the processor” with things like CUDA or FPGAs,
- “amongst the cores” with things like TBB, Cilk, &tc, or
- “in the cloud” (or grid) with things like Amazon’s EC2 and its bewildering and fast-growing coterie of associated technologies and products.
Today we’ll look at a simple example which we implement on top of Hadoop and then deploy into Amazon’s cloud to get a back-of-the-envelope feel for what kind of scaling we might expect to gain – and at what cost – from using this smorgasbord of technologies.
People have asked me how I go about implementing a strategy in Stratbox. While I’ve illustrated a good number of strategies running in Stratbox in these pages, I’ve never walked through a non-trivial example from conception through design, implementation and iteration. Today we’ll go through a reasonably complex example in full detail (I’ll provide source).
The example I’ve chosen is, I think, very nice for three reasons: it’s a portfolio-oriented strategy, which is pretty much the only kind I care to explore; it’s based around the concept of pairs trading, which most people can easily relate to; and finally, it’s already in the public domain and yet almost certainly has some juice in it for those who care to understand it and extend it intelligently.
The example comes from the blog of a company, Palantir, which makes (something like?) analytical / decision-support software for both finance and intelligence-gathering services (quants and spooks – spooky quants?). The specific example is here and is described thusly: