[Chart: messages per second across "all" feeds]
I came across this compelling site, which uses a hardware-based ticker plant (Exegy) in a colo environment to measure peak bandwidth across scads of North American feeds and, every minute, update a chart like the one above capturing the average messages/sec across all of them. Pretty swank.
While the uninformed may rail against colocation rather than focus on less intriguing issues like garden-variety corruption, they miss the basic point: colo can be done by anyone with the checkbook and the wish to do so.
It’s sort of like that boat in Forrest Gump. Forrest wanted to be a shrimper, so he invested in a boat. With his initial capital, hard work, perseverance, and a bit of luck, Forrest made a go of it. He might just as easily have failed. Colo is like that. You can shrimp without a boat if you have a mask and fins, but it’s likely not a sustainable model. Either way, it’s hard to see the harm in Gump’s boat. Or colocation.
Hat-tip to Rodrick’s Web Log for spotting the market data peaks site.
I’ve never been a hardware guy. Hardware has gotten so fast throughout my professional life that it has just never been a big issue. Also, on Wall St. we had a robust annual budget for hardware, so I’d routinely sign off on hundreds of thousands of dollars for all sorts of machines I’d never lay eyes on, and somehow they always did the trick.
Before 9/11, they’d be in server racks in the building or down the street, but since then they might also be in increasingly far-flung places like Weehawken or Long Island, Tampa, even Texas or beyond. The machines always seemed unbelievably overpriced – over the years I remember pretty consistently paying something like $40K for a low-end DB server. But that’s what it cost, and you could only purchase approved products from approved channels, so nobody gave it much thought. Now that I don’t have the same kinds of constraints – or budgets! – I increasingly have to think about hardware.
As a software engineer, the hardware itself is also insisting that I pay it some uncharacteristic attention. The evolution of processors has reached a point where the programming paradigms many of us have fruitfully employed for years are no longer suited to getting full performance out of today’s machines. The recent introduction of remarkably powerful and inexpensive parallel-computing platforms based on GPUs, like NVIDIA’s CUDA, also points to a future that even current university training doesn’t address in a fashion practically adapted to institutional application. Cores are multiplying like Tribbles.
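To make the shift concrete, here’s a minimal sketch – not anything from our stack, and OpenMP is just one of many possible tools – of the sort of explicitly parallel loop today’s machines reward, where work is spread across however many cores happen to be present:

    #include <omp.h>
    #include <stdio.h>

    /* A toy reduction: the serial version gets no faster as cores
       multiply, but annotating the loop spreads it across threads.
       Build with: gcc -fopenmp toy.c */
    int main(void)
    {
        const long n = 100000000L;   /* 100M iterations of toy work */
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < n; i++)
            sum += (double)i * 0.5;

        printf("sum = %.0f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }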
The lines between persistent storage and main memory are also blurring as consumer SSDs push up from the ‘low’ end while exotic ioDrives and the like offer a glimpse of a world where the performance gap between the two approaches nil and, after their long reign, myriad metallic platters will spin no more.
One Big Table (and chair)
Last time I described the trajectory of my research into using HDF5 for large amounts of tick data. This time I describe the basic design of the prototype I implemented and some of its performance characteristics.
One of the nicest things about the holiday season (Happy New Year, btw) is that it provides a lovely opportunity to spend some quality time with a project that’s a bit more exploratory than might be meaningfully undertaken while trading in lively markets.
A number of months ago, I mentioned using HDF5 to manage tick data, as RDBMSes just aren’t up to the task and specialized Tick DBs are absurdly expensive. While I’d spent some time exploring this idea through the fall, I never had a discrete chunk of time to really explore the technology beyond determining that its Java interfaces weren’t production-worthy. This meant that we’d have to drop into C to access the functionality we’re interested in, and that we’d have to come up with our own bridge out into Java for access by StratBox, while StratCloud could access it directly.
Below, I describe what I’ve learned through my holiday geek-spelunking-trek, including some timings on various configurable characteristics of HDF5 (e.g., compression and “chunking”).
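For a flavor of what dropping into C looks like, here’s a stripped-down sketch of creating a chunked, deflate-compressed HDF5 dataset. To be clear, this is illustrative rather than the prototype itself: the record layout, dataset name, chunk size, and compression level are all placeholder guesses.

    #include <hdf5.h>

    /* Hypothetical trade record -- the real prototype's layout differs. */
    typedef struct {
        long long ts;     /* timestamp, e.g. microseconds since midnight */
        double    price;
        int       size;
    } tick_t;

    int main(void)
    {
        hsize_t dims[1]    = { 0 };              /* start empty...           */
        hsize_t maxdims[1] = { H5S_UNLIMITED };  /* ...but allow growth      */
        hsize_t chunk[1]   = { 8192 };           /* ticks per chunk: a guess */

        /* Compound datatype mirroring the struct. */
        hid_t dtype = H5Tcreate(H5T_COMPOUND, sizeof(tick_t));
        H5Tinsert(dtype, "ts",    HOFFSET(tick_t, ts),    H5T_NATIVE_LLONG);
        H5Tinsert(dtype, "price", HOFFSET(tick_t, price), H5T_NATIVE_DOUBLE);
        H5Tinsert(dtype, "size",  HOFFSET(tick_t, size),  H5T_NATIVE_INT);

        hid_t file  = H5Fcreate("ticks.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, maxdims);

        /* Chunked layout is what makes compression and unlimited growth
           possible; both knobs live on the dataset-creation property list. */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk);
        H5Pset_deflate(dcpl, 6);                 /* zlib, level 6 */

        hid_t dset = H5Dcreate2(file, "/trades", dtype, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Tclose(dtype);
        H5Fclose(file);
        return 0;
    }

Chunk size is worth dwelling on: HDF5 compresses (and reads) a chunk at a time, so it trades compression ratio and sequential throughput against the granularity of random access – exactly the sort of knob the timings below explore.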
Inevitably, one of the first ideas people have when they start thinking about how to write a trading algorithm turns out to be among the hardest: trading the news. The problems are many and in some cases not so obvious… but the natural appeal of the idea seems universally compelling.
Just after the dot-com craze, a brilliant friend of mine (who had just sold his web consulting startup) decided to write a book. The premise was glorious. A bunch of clever college-age kids formed a startup to predict the stock market. Their method was to constantly comb the web with ultra-sophisticated algorithms that would run across giant server farms overnight and ultimately generate tomorrow’s headlines. Based on the headlines their system generated, they would place trades to take advantage of these predicted events.
Sadly, my friend never completed his book, so I don’t know how it all turned out. (Instead, he went on to start another successful company, this time in the field of robotics.) While he was writing it, I loved getting new drafts, as they were filled with clever ideas. But the core idea of predicting headlines and then using those headlines to trade always struck me as especially cute.
For those of us without access to news-predicting algos, writing strategies based on the news is rather less straightforward, though there is a growing variety of products and services aiming to fill the gap. Today must have been trading-the-news day, as I found a few articles on the topic in my mailbox and even received a cold call from a vendor, Need to Know News, with just such an offering. Below I’ll look at some of these offerings and consider some of the issues involved in writing trading strategies based on the news.
While Carl Sagan’s famous formulation introduced a generation to the vastness of the cosmos, more recent history suggests that his memorable term might now be more aptly applied to financial magnitudes: our deficits and debts, perhaps, to the economically or politically minded. But for those of us with the markets on our minds, the term has to evoke the enormity of the data we create and must manage every day. We’ve recently been working with the NYSE’s TAQ data in an effort to integrate it into StratBox’s back-testing and optimization capabilities. And the sheer scale of the data is really just staggering.
Each day, the NYSE publishes all of the day’s quotes and trades as well as some reference data. Compressed, the data will just about fit onto a DVD. For one day. A DVD. Compressed. It’s really mind-boggling. A year of the stuff, uncompressed, will require over a petabyte of storage. Over 1,125,899,906,842,624 bytes. And that’s just the US Equities markets. You want options data, too? I hope your uncle is named EMC, because just managing the data is going to be a challenge…
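Just to unpack that number – a back-of-envelope only, assuming a petabyte means 2^50 bytes and roughly 252 trading days in a year:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long pb = 1ULL << 50;  /* 1,125,899,906,842,624  */
        const int trading_days = 252;              /* assumed; varies by year */

        printf("one petabyte    = %llu bytes\n", pb);
        printf("per trading day ~ %.1f TB uncompressed\n",
               (double)pb / trading_days / 1e12);
        return 0;
    }

That works out to several terabytes a day, uncompressed, for US equities alone.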