
One Big Table (and chair)
Last time I described the trajectory of my research into using HDF5 for large amounts of tick data. This time I describe the basic design of the prototype I implemented and some of its performance characteristics.
EMS Internals, market data, open-source software, technology

One of the nicest things about the holiday season (Happy New Year, btw) is that it provides a lovely opportunity to spend some quality time with a project that’s a bit more exploratory than might be meaningfully undertaken while trading in lively markets.
A number of months ago, I mentioned using HDF5 to manage tick data, since RDBMSes just aren’t up to the task and specialized tick DBs are absurdly expensive. While I’d spent some time on this idea through the fall, I never had a discrete chunk of time to dig into the technology beyond determining that its Java interfaces weren’t production-worthy. That meant we’d have to drop into C to access the functionality we’re interested in, and we’d have to build our own bridge out into Java so that StratBox could use it, while StratCloud could access it directly.
Below, I describe what I’ve learned on my holiday geek-spelunking trek, including some timings for various configurable characteristics of HDF5 (e.g., compression and “chunking”).
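Compression and chunking interact in HDF5: a compressed dataset is stored as fixed-size chunks, each compressed independently, so a read only has to decompress the chunks that overlap the requested rows. The following is a minimal, stdlib-only Python sketch of that idea, not the HDF5 API; the 20-byte tick layout, the chunk size, and the helper names `write_chunks` and `read_rows` are hypothetical choices for illustration.

```python
import struct
import zlib

# Hypothetical fixed-width tick record: timestamp (int64), price (float64), size (int32).
TICK = struct.Struct("<qdi")  # little-endian, no padding: 20 bytes per row


def write_chunks(ticks, chunk_rows, level=6):
    """Pack rows into fixed-row chunks and deflate each chunk independently."""
    chunks = []
    for start in range(0, len(ticks), chunk_rows):
        raw = b"".join(TICK.pack(*t) for t in ticks[start:start + chunk_rows])
        chunks.append(zlib.compress(raw, level))
    return chunks


def read_rows(chunks, chunk_rows, first, last):
    """Return rows [first, last), inflating only the chunks that overlap the range."""
    last = min(last, len(chunks) * chunk_rows)  # never index past the final chunk
    if last <= first:
        return []
    out = []
    for idx in range(first // chunk_rows, (last - 1) // chunk_rows + 1):
        raw = zlib.decompress(chunks[idx])
        base = idx * chunk_rows
        for row, rec in enumerate(TICK.iter_unpack(raw), start=base):
            if first <= row < last:
                out.append(rec)
    return out


# Synthetic, made-up ticks purely to exercise the sketch.
ticks = [(1_000_000_000 + i, 100.0 + i * 0.01, 100 + i) for i in range(10_000)]
chunks = write_chunks(ticks, chunk_rows=1024)
print(len(chunks), "chunks;", sum(len(c) for c in chunks), "compressed bytes vs",
      len(ticks) * TICK.size, "raw bytes")
print(read_rows(chunks, 1024, 1020, 1023))  # spans only the first chunk
```

Larger chunks generally compress better but force more decompression for a small read, while smaller chunks do the opposite; that is the kind of trade-off worth timing on real tick data.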
EMS Internals, market data, open-source software, post-trade analysis, technology
Inevitably, one of the first ideas people have when they start thinking about how to write a trading algorithm turns out to be among the hardest: trading the news. The problems are many and in some cases not so obvious…but the appeal of the idea seems universal.
Just after the dot-com craze, a brilliant friend of mine (who had just sold his web consulting startup) decided to write a book. The premise was glorious. A bunch of clever college-age kids formed a startup to predict the stock market. Their method was to constantly comb the web with ultra-sophisticated algorithms that would run across giant server farms overnight and ultimately generate tomorrow’s headlines. Based on the headlines their system generated, they would place trades to take advantage of these predicted events.
Sadly, my friend never went on to complete his book, so I don’t know how it all turned out. (Instead, he went on to start another successful company, this time in the field of robotics.) While he was writing it, I loved getting new drafts because they were filled with clever ideas. But the core idea of predicting headlines and then using those headlines to trade always struck me as especially cute.
For those of us without access to news-predicting algos, writing strategies based on the news is rather less straightforward, though there is a growing variety of products and services aiming to fill the gaps. Today must have been trading-the-news day, as I found a few articles on the topic in my mailbox and even received a cold call from a vendor, Need to Know News, with just such an offering. Below I’ll look at some of these offerings and consider some of the issues involved in writing trading strategies based on the news.
back-testing, market data, startup, strategy development, technology

While Carl Sagan’s famous formulation, “billions and billions,” introduced a generation to the vastness of the cosmos, more recent history suggests that the phrase might now be more aptly applied to financial magnitudes: to the economically or politically minded, perhaps our deficits and debts. But for those of us with the markets on our minds, it has to evoke the enormity of the data we create and must manage every day. We’ve recently been working with the NYSE’s TAQ data in an effort to integrate it into StratBox’s back-testing and optimization capabilities, and the sheer volume of the data is staggering.
Each day, the NYSE publishes all of the day’s quotes and trades as well as some reference data. Compressed, the data will just about fit onto a DVD. For one day. A DVD. Compressed. It’s really mind-boggling. A year of the stuff, uncompressed, will require over a petabyte of storage. Over 1,125,899,906,842,624 bytes. And that’s just the US equities markets. You want options data, too? I hope your uncle is named EMC, because just managing the data is going to be a challenge…
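The byte count above is a binary petabyte, 2^50 bytes. For a back-of-the-envelope look at the compressed side, here is a small Python sketch assuming one single-layer DVD (4.7 GB) per day and roughly 252 US trading days a year; both figures are assumptions for illustration, not NYSE numbers.

```python
PEBIBYTE = 2 ** 50              # "a petabyte" in binary units
DVD_BYTES = 4_700_000_000       # assumption: single-layer DVD capacity (4.7 GB)
TRADING_DAYS = 252              # assumption: typical number of US trading days per year


def yearly_bytes(per_day_bytes, days=TRADING_DAYS):
    """Total storage for a year of daily files of the given size."""
    return per_day_bytes * days


compressed_year = yearly_bytes(DVD_BYTES)
print(f"2**50 bytes = {PEBIBYTE:,}")  # 2**50 bytes = 1,125,899,906,842,624
print(f"one DVD per day for a year = {compressed_year / 1e12:.2f} TB")  # 1.18 TB
```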
back-testing, market data, open-source software, post-trade analysis, technology