<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hack the market &#187; open-source software</title>
	<atom:link href="http://www.puppetmastertrading.com/blog/index.php/category/open-source-software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.puppetmastertrading.com/blog</link>
	<description>algorithmic trading experiences</description>
	<lastBuildDate>Sat, 20 Nov 2010 14:46:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Kooderive</title>
		<link>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/</link>
		<comments>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 15:31:20 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[monte-carlo methods]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[options pricing]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=1000</guid>
		<description><![CDATA[Some time back, I&#8217;d written about NVidia&#8217;s CUDA noting that it looked ideal for many asset-pricing and monte-carlo type problems in finance.  At the time, I was hopeful that it would be quickly integrated into existing open source efforts like QuantLib, but adoption has proved slower than I&#8217;d hoped, most likely because implementing non-trivial problems [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 240px"><img class="   " src="/images/cuda_simonRogerson.jpg" alt="photo by Simon Rogerson" width="230" height="173" /><p class="wp-caption-text">photo by Simon Rogerson</p></div>
<p>Some time back, I&#8217;d <a title="TESLA &amp; CUDA" href="http://www.puppetmastertrading.com/blog/2008/11/29/nvidias-tesla-and-the-compute-unified-device-architecture/" target="_blank">written</a> about NVidia&#8217;s CUDA, noting that it looked ideal for many asset-pricing and Monte Carlo problems in finance.  At the time, I was hopeful that it would be quickly integrated into existing open source efforts like <a title="QuantLib: a free/open-source library for quantitative finance" href="http://quantlib.org/" target="_blank">QuantLib</a>, but adoption has proved slower than I&#8217;d hoped, most likely because implementing non-trivial problems on CUDA is, well, even less trivial than implementing them without it.</p>
<p><strong>LMM on CUDA</strong></p>
<p>Happily, I&#8217;ve just seen a promising first step in this direction: Über-quant and C++ artisan <a title="Mark Joshi" href="http://www.markjoshi.com/" target="_blank">Mark Joshi</a> recently announced an open-source project, <a title="Sourceforge: Kooderive" href="http://sourceforge.net/projects/kooderive/" target="_blank">Kooderive</a>, which aims to implement the <a title="Wiki: LMM" href="http://en.wikipedia.org/wiki/LIBOR_market_model" target="_blank">LIBOR Market Model</a> (LMM) on top of CUDA.  His announcement on the QuantLib mailing list reads:</p>
<blockquote><p>Dear All,</p>
<p>various people have shown interest in the use of CUDA with QuantLib. I have now made some progress on a CUDA implementation of the LIBOR market model.</p>
<p>In particular, I now have a path generator for the LMM working which does 16384 paths for 40 rates, 40 steps, 5 factor model, displaced diffusion predictor-corrector that takes 0.1 seconds on my Quadro 4600.</p>
<p>The state of the project is code fragments that can be called from other code. Those who are interested can get the code via the subversion repository on <a href="http://kooderive.sourceforge.net/" target="_blank">kooderive.sourceforge.net</a>.  The only project file is currently for VC9 x64. It also uses thrust and the CUDA SDK.</p>
<p>The next stage will be writing routines, that use QuantLib for the CPU stuff and kooderive for the GPU stuff, to actually price things.</p>
<p>A gentle reminder that I will be giving a course on the LMM and QuantLib in June in London, and I will include a session on kooderive if there is sufficient interest.</p>
<p>I am happy to take code contributions for kooderive. However, I am not looking for a redesign of the library or contributions which introduce dependence on other libraries. I am interested in contributions of separate routines and of optimizations of existing routines that do not change interfaces.</p>
<p>regards</p>
<p>Mark<br />
&#8211;<br />
Pricing exotic interest rate derivatives &#8211; The LIBOR Market Model in QuantLib June 2010, London,<br />
<a href="http://www.moneyscience.com/training/index.html" target="_blank">http://www.moneyscience.com/training/index.html</a></p>
<p>Assoc Prof Mark Joshi<br />
Centre for Actuarial Studies<br />
University of Melbourne<br />
My website is <a href="http://www.markjoshi.com/" target="_blank">www.markjoshi.com</a></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>go figure</title>
		<link>http://www.puppetmastertrading.com/blog/2009/01/14/go-figure/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/01/14/go-figure/#comments</comments>
		<pubDate>Thu, 15 Jan 2009 01:05:16 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[open-source software]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=333</guid>
		<description><![CDATA[As I&#8217;ve written before, I&#8217;m not a particularly big fan of technical analysis or any of the many and varied charting techniques people espouse.  That said, we are working with a proprietary futures trading company and some of the successful (non-algo) trading that they do involves point-and-figure charts.  Although a trading algorithm doesn&#8217;t care about [...]]]></description>
			<content:encoded><![CDATA[<p><applet width="450" height="550"  codebase="pf" code="examples.PandF" archive="/randomWalk/pf.jar">  </applet><br />
As I&#8217;ve <a title="Every sunken ship..." href="http://www.puppetmastertrading.com/blog/2008/11/12/every-sunken-ships-got-a-room-full-of-charts/" target="_blank">written before</a>, I&#8217;m not a particularly big fan of technical analysis or any of the many and varied charting techniques people espouse.  That said, we are working with a proprietary futures trading company and some of the successful (non-algo) trading that they do involves <a title="Point &amp; Figure Charts" href="http://www.investopedia.com/terms/p/pointandfigurechart.asp" target="_blank">point-and-figure charts</a>.  Although a trading algorithm doesn&#8217;t care about graphical representations, I wasn&#8217;t familiar with the technique and decided that the best way to understand it was to try to implement it, which is how I spent my Saturday evening &#8230;</p>
<p>The above applet re-uses the <a title="Engineering Randomness" href="http://www.puppetmastertrading.com/blog/2008/01/06/engineering-randomness/" target="_blank">one</a> I&#8217;d written previously when discussing simple stochastic processes.  This time, it draws a point &amp; figure chart below the regular line chart.  Point &amp; figure charts are governed by two parameters: a &#8220;box size&#8221; (in ticks) and a &#8220;reversal&#8221; (in boxes).  The applet lets you vary both and then generate a day&#8217;s worth of random/synthetic data to chart.  One of the nice features of <a title="JFreeChart" href="http://www.jfree.org/jfreechart/" target="_blank">JFreeChart</a> is that you can easily &#8220;zoom&#8221; into a chart by dragging within it.  I&#8217;ve disabled this in the line chart, but you can try it in the p&amp;f chart.  (Note: you should right-click and choose &#8220;Auto Range &gt; Both Axes&#8221; before you generate new data, or you&#8217;ll stay in the zoomed segment of the chart.)</p>
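<p>For readers who, like me, learn a technique best by implementing it, here is a minimal C sketch of the column-building logic. It is not the applet&#8217;s code (which is Java/JFreeChart). It assumes the close-only method, prices given as integer ticks, each price rounded down to a box index, and a reversal that starts the new column one box away from the previous extreme. Charting packages differ on these rounding and high/low conventions, so treat <em>pf_build</em>, <em>box_of</em> and <em>pf_column</em> as hypothetical names for illustration.</p>

```c
#include <stddef.h>

/* One point & figure column (hypothetical layout). */
typedef struct {
    int  rising;    /* 1 = column of Xs, 0 = column of Os */
    long low_box;   /* lowest box index filled in this column  */
    long high_box;  /* highest box index filled in this column */
} pf_column;

/* Floor division so negative tick values still land in the correct box. */
static long box_of(long ticks, long box_size)
{
    long q = ticks / box_size;
    if (ticks % box_size != 0 && (ticks < 0) != (box_size < 0))
        --q;
    return q;
}

/* Builds close-only point & figure columns from prices expressed in ticks.
   box_size is in ticks, reversal is in boxes.
   Returns the number of columns written to cols (at most max_cols). */
static size_t pf_build(const long *ticks, size_t n, long box_size, long reversal,
                       pf_column *cols, size_t max_cols)
{
    size_t nc = 0;
    if (n == 0 || max_cols == 0 || box_size <= 0 || reversal <= 0)
        return 0;
    long start = box_of(ticks[0], box_size);
    for (size_t i = 1; i < n; ++i) {
        long b = box_of(ticks[i], box_size);
        if (nc == 0) {                       /* no direction yet: wait for a one-box move */
            if (b != start) {
                pf_column first = { b > start, b > start ? start : b, b > start ? b : start };
                cols[nc++] = first;
            }
            continue;
        }
        pf_column *cur = &cols[nc - 1];
        pf_column next;
        if (cur->rising) {
            if (b > cur->high_box) { cur->high_box = b; continue; }  /* extend Xs */
            if (b > cur->high_box - reversal) continue;  /* pullback too small */
            next = (pf_column){ 0, b, cur->high_box - 1 };
        } else {
            if (b < cur->low_box) { cur->low_box = b; continue; }    /* extend Os */
            if (b < cur->low_box + reversal) continue;   /* bounce too small */
            next = (pf_column){ 1, cur->low_box + 1, b };
        }
        if (nc == max_cols)                  /* caller's buffer is full */
            break;
        cols[nc++] = next;
    }
    return nc;
}
```

<p>With a 1-tick box and a 3-box reversal, the tick series 100, 101, 102, 103, 102, 100, 99, 100, 103 produces three columns: Xs from box 100 up to 103, Os from 102 down to 99, then Xs from 100 back up to 103. The small dip to 102 is ignored because it is less than three boxes below the top.</p>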
<p>Now that I think I understand the basics of point &amp; figure charting, it will be interesting to see what an algo might do with it&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/01/14/go-figure/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>tick data &amp; hdf5 (part 2)</title>
		<link>http://www.puppetmastertrading.com/blog/2009/01/06/tick-data-hdf5-part-2/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/01/06/tick-data-hdf5-part-2/#comments</comments>
		<pubDate>Tue, 06 Jan 2009 18:46:06 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[market data]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=280</guid>
		<description><![CDATA[Last time I described the trajectory of my research into using hdf5 for large amounts of tick data.  This time I describe the basic design of the prototype I implemented and some of its performance characteristics. Prototype Design With One Big Table (OBT) holding all of your data, you need some help finding what you [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption aligncenter" style="width: 510px"><img src="/images/obt.jpg" alt="" width="500" height="375" /><p class="wp-caption-text">One Big Table (and chair)</p></div>
<p style="text-align: left;"><a title="part 1" href="http://www.puppetmastertrading.com/blog/2009/01/04/managing-tick-data-with-hdf5/" target="_blank">Last time</a> I described the trajectory of my research into using hdf5 for large amounts of tick data.  This time I describe the basic design of the prototype I implemented and some of its performance characteristics.</p>
<p style="text-align: left;"><span id="more-280"></span></p>
<p style="text-align: left;"><strong>Prototype Design</strong></p>
<p style="text-align: left;">With One Big Table (OBT) holding all of your data, you need some help locating the rows you want.  To this end, I wrote an index on the big table which stores, per instrument, the beginning and ending row indices into the OBT as well as the timestamp at each extreme.</p>
<p style="text-align: left;">So, the main table was: { conid, timestamp, open, high, low, close, adjustedClose, volume } and the index table was: { conid, minIndex, maxIndex, minTimestamp, maxTimestamp }.  The index is read into memory and kept with a few other handy bits of data in a structure which represents a &#8220;connection&#8221; into the OBT.</p>
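<p style="text-align: left;">A minimal C sketch of the two record layouts and an in-memory lookup into the index follows. The field names come from the tables above; the C types (other than the <em>long</em> contract id and the 64-bit millisecond timestamps the post describes), the inclusive <em>maxIndex</em>, and the helper <em>obt_find</em> are assumptions for illustration. Because the OBT is ordered by instrument and then time, the sketch assumes the index rows are sorted by <em>conid</em>, so a binary search finds an instrument in O(log n).</p>

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the One Big Table (types other than conid/timestamp are assumptions). */
typedef struct {
    long    conid;          /* contract identifier */
    int64_t timestamp;      /* milliseconds since the Unix epoch */
    double  open, high, low, close, adjustedClose;
    int64_t volume;
} ohlcv_row;

/* One row of the index table: where an instrument's rows live in the OBT. */
typedef struct {
    long    conid;
    int64_t minIndex, maxIndex;          /* first and last OBT row (assumed inclusive) */
    int64_t minTimestamp, maxTimestamp;  /* timestamps at those two rows */
} obt_index_row;

/* Binary search over index rows sorted by conid; returns NULL if the contract is absent. */
static const obt_index_row *obt_find(const obt_index_row *idx, size_t n, long conid)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].conid < conid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && idx[lo].conid == conid) ? &idx[lo] : NULL;
}
```

<p style="text-align: left;">Since the whole index is small (one row per instrument), keeping it in memory alongside the &#8220;connection&#8221; makes each lookup a handful of comparisons rather than an HDF5 read.</p>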
<p style="text-align: left;">To identify contracts, I use a <em>long int</em> contract identifier which is already employed within our environments.  For timestamps I used the Java convention of a <em>64-bit long</em> denoting milliseconds since the “epoch” and, initially, used <em>mktime</em> to support this approach in C.  After my first iteration, I found that a surprisingly large share of the time spent writing an HDF5 table from a CSV file went into <em>mktime</em>, whereupon my <a title="Professor Giorgio Ingargiola" href="http://www.cis.temple.edu/~ingargio/" target="_blank">Dad</a> suggested using <em>gmtime</em> instead.  Remarkably, this yielded a full 40% improvement in the process!  It&#8217;s nice having a guru in the family.</p>
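<p style="text-align: left;">Switching to <em>gmtime</em> is what the post describes. A different option, sketched here in C and not something the prototype used, is to skip the C library time functions entirely and compute UTC milliseconds arithmetically with Howard Hinnant&#8217;s public-domain <em>days_from_civil</em> algorithm for the proleptic Gregorian calendar. It assumes the CSV timestamps have already been split into UTC date and time fields; <em>epoch_ms_utc</em> is a hypothetical helper.</p>

```c
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil). */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;                                            /* years start in March */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);         /* year of era [0, 399] */
    const unsigned mp = m > 2 ? m - 3 : m + 9;              /* March = 0 */
    const unsigned doy = (153 * mp + 2) / 5 + d - 1;        /* day of year [0, 365] */
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* day of era */
    return era * 146097 + (int64_t)doe - 719468;
}

/* Java-style milliseconds since the epoch for a UTC date/time, with no library calls. */
static int64_t epoch_ms_utc(int y, unsigned mon, unsigned day,
                            unsigned hh, unsigned mm, unsigned ss, unsigned ms)
{
    int64_t secs = days_from_civil(y, mon, day) * 86400
                 + (int64_t)hh * 3600 + (int64_t)mm * 60 + ss;
    return secs * 1000 + ms;
}
```

<p style="text-align: left;">For example, <em>epoch_ms_utc(1990, 1, 1, 0, 0, 0, 0)</em> returns 631152000000, the start of the 1990 range in Java-style milliseconds. Because it is pure integer arithmetic, it never consults the time zone database the way <em>mktime</em> does.</p>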
<p style="text-align: left;">To retrieve data for one contract, I implemented an iterator which finds the section of the OBT belonging to that contract.  Another handy piece of data the connection struct maintains is the entire timestamp column of the main table.  Clearly, this isn&#8217;t scalable for really big datasets, and for the real implementation I&#8217;ll have to read it in per instrument at query time.  But for this dataset, it seemed the sensible implementation.  Two binary searches (one for the start time, one for the end time) are performed on the instrument&#8217;s indexed slice of this big array to determine exactly which records must be read to satisfy the iterator&#8217;s query.  Thus, the iterator is primed on initialization with the exact location of the data it will require.  Then, buffered reads are performed by the iterator as data is requested from it.</p>
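<p style="text-align: left;">The two binary searches can be sketched in C as a lower bound on the query start and an upper bound on the query end, both restricted to the instrument&#8217;s slice of the cached timestamp column. This is a hedged sketch: it assumes an inclusive <em>maxIndex</em>, a closed query interval [t1, t2], and a half-open result range, and the function names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* First position in [lo, hi) whose timestamp is >= t. */
static size_t ts_lower_bound(const int64_t *ts, size_t lo, size_t hi, int64_t t)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts[mid] < t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* First position in [lo, hi) whose timestamp is > t. */
static size_t ts_upper_bound(const int64_t *ts, size_t lo, size_t hi, int64_t t)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts[mid] <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Rows of one instrument (OBT rows minIndex..maxIndex, inclusive) whose
   timestamps fall in [t1, t2], returned as the half-open range [*begin, *end). */
static void prime_range(const int64_t *ts, size_t minIndex, size_t maxIndex,
                        int64_t t1, int64_t t2, size_t *begin, size_t *end)
{
    size_t hi = maxIndex + 1;
    *begin = ts_lower_bound(ts, minIndex, hi, t1);
    *end   = ts_upper_bound(ts, *begin, hi, t2);   /* only search past *begin */
}
```

<p style="text-align: left;">The resulting [begin, end) row range is what the iterator would then read in buffered chunks; an empty range (begin == end) means the contract has no data in the requested window.</p>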
<p style="text-align: left;">My only critical use-case is to retrieve a time-ordered stream of OHLCVs (in this case) across potentially many contracts.  This one query meets my needs for both back-testing and statistical calculations.  But it requires an efficient merge operation across a potentially large set of these iterators.  To accomplish this, I&#8217;ve got a composite iterator which uses a red-black tree to keep all of the contained iterators sorted in the order of their <em>Next()</em> OHLCV.  Thus, the composite iterator will always return the oldest OHLCV amongst all of its contained iterators.</p>
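<p style="text-align: left;">C has no standard red-black tree, so this sketch of the same k-way merge swaps in a binary min-heap, a common alternative for the job: each pending stream sits in the heap keyed on its next record, and popping the root always yields the oldest record. Sorted in-memory arrays stand in for the HDF5-backed iterators, ties on timestamp are broken by <em>conid</em> (an assumption), and every name here is hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* A merged record: just the sort keys, for brevity. */
typedef struct {
    int64_t timestamp;   /* ms since the epoch */
    long    conid;
} tick;

/* Hypothetical stand-in for one per-contract iterator: a time-sorted array. */
typedef struct {
    const tick *rows;
    size_t      n, pos;
} stream;

/* Min-heap of streams that still have rows, keyed on each stream's next row. */
typedef struct {
    stream **heap;
    size_t   size;
} merger;

static int row_less(const stream *a, const stream *b)
{
    const tick *x = &a->rows[a->pos], *y = &b->rows[b->pos];
    if (x->timestamp != y->timestamp)
        return x->timestamp < y->timestamp;
    return x->conid < y->conid;              /* tie-break on contract id */
}

static void sift_down(merger *m, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, s = i;
        if (l < m->size && row_less(m->heap[l], m->heap[s])) s = l;
        if (r < m->size && row_less(m->heap[r], m->heap[s])) s = r;
        if (s == i)
            return;
        stream *tmp = m->heap[i];
        m->heap[i] = m->heap[s];
        m->heap[s] = tmp;
        i = s;
    }
}

/* storage must have room for k pointers; empty streams are skipped. */
static void merger_init(merger *m, stream *streams, size_t k, stream **storage)
{
    m->heap = storage;
    m->size = 0;
    for (size_t i = 0; i < k; ++i)
        if (streams[i].pos < streams[i].n)
            m->heap[m->size++] = &streams[i];
    for (size_t i = m->size / 2; i-- > 0;)   /* bottom-up heapify */
        sift_down(m, i);
}

/* Copies the oldest pending row into *out; returns 0 once every stream is exhausted. */
static int merger_next(merger *m, tick *out)
{
    if (m->size == 0)
        return 0;
    stream *s = m->heap[0];
    *out = s->rows[s->pos++];
    if (s->pos == s->n)                      /* stream finished: move last leaf to root */
        m->heap[0] = m->heap[--m->size];
    sift_down(m, 0);
    return 1;
}
```

<p style="text-align: left;">Each call to <em>merger_next</em> costs O(log k) for k open streams, the same bound a balanced tree gives, which is consistent with merging thousands of streams without the merge itself dominating.</p>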
<p style="text-align: left;">That&#8217;s pretty much it.  Apart from the cached timestamp column, this should all scale well to datasets the size of a day&#8217;s worth of tick data for the US equity markets, with a few foreign markets and maybe some futures thrown in as well.  Options data is a worse case than the one I&#8217;m envisioning, but I imagine a similar approach would still work, though you might have to distribute a day&#8217;s data across multiple files.</p>
<p style="text-align: left;"><strong>Benchmarking the Prototype</strong></p>
<p style="text-align: left;">The data I tested this implementation on is daily data going back to 1990 on some 7,100 US, LSE and HK equities.  In uncompressed CSV format, the data weighed in at 850MB and comprised ~16.5M records.  The tests I ran were:</p>
<ul>
<li>Read the CSV file and write an HDF5 file, varying the compression { TRUE | FALSE } and the HDF5 chunk size { 2^12, 2^13, 2^15, 2^17, 2^19 }.  (For the chunk sizes, I based my choices roughly on the <a title="PyTables chunksize Guidelines" href="http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune" target="_blank">excellent PyTables documentation</a> and then indulged my preference for powers of two&#8230;)</li>
<li>Respond to 100 queries across {1, 2^4, 2^8, 2^10, 2^12} randomized contracts over a randomized 2-year period (within the 19-year range).</li>
</ul>
<p>The first part is really about hdf5 and how its different options will affect my results.  The second is my actual use-case: &#8220;give me an ordered stream for some set of contracts over some (two-year) period&#8221;.</p>
<p>The results were interesting, as these parameters matter, especially compression.  The charts below show the results of the tests.</p>
<div class="wp-caption aligncenter" style="width: 406px"><img title="HDF5 Write Performance " src="/images/h5Write.jpg" alt="HDF5 Write Performance " width="396" height="244" /><p class="wp-caption-text">HDF5 Write Performance </p></div>
<p>This &#8220;write test&#8221; is just a C program which reads in an 848MB CSV file, then writes and indexes an HDF5 file using the OBT approach described above.  Apart from the curious bump in file size for the Compressed+8192 variant, the results aren&#8217;t too remarkable, except to note that compression wants smaller chunk sizes.  Badly.</p>
<div class="wp-caption aligncenter" style="width: 548px"><img title="HDF5 Read Performance " src="/images/h5Read.jpg" alt="" width="538" height="244" /><p class="wp-caption-text">HDF5 Read Performance </p></div>
<p>Likewise, the read test is the same C program, which then reads from each of the files written with the varying HDF5 write parameters.  Here we really see the effect of compression on performance.  It seems that if you want performance, then compression isn&#8217;t for you.  If you need compression, then you need to use small chunk sizes.</p>
<p>Apart from this, performance seems to degrade <em>reasonably</em> linearly as the number of randomly selected contracts grows from 1 to 4096.  The red-black tree is doing an effective job of merging even a reasonably large number of streams.</p>
<p>From an absolute perspective, the performance strikes me as pretty smoking in the good cases.  In a two-year period, you&#8217;ll have about 500 trading days.  So, for the {No-compression/32768 chunk size/256 contracts} case we have (500 days &#215; 100 queries &#215; 256 contracts) / 10 seconds = 12.8M records / 10 s &#8776; 1.28M records per second, including all of the look-ups.</p>
<p><strong>Java + SWIG + HDF5</strong></p>
<p>One of my needs is to be able to access this functionality in Java so that the StratBox GUI can use this data.  To that end, I made sure that as I was developing the prototype I maintained SWIG interfaces and a parallel test driver in Java.  Apart from the initial set-up, this proved pretty easy, though adding SWIG to a project does add some complexity.  In any case, I wanted to see how big a performance hit I&#8217;d take running the same tests in Java.  Again, looking at the case highlighted above {No-compression/32768 chunk size/256 contracts}, the Java/SWIG timings are about twice those of the native C driver.  So, we take a pretty significant hit, but it seems unavoidable, and ~600K records a second isn&#8217;t exactly slow.</p>
<div class="wp-caption aligncenter" style="width: 548px"><img title="HDF5 Read performance from Java / SWIG" src="/images/h5Java.jpg" alt="HDF5 Read performance from Java / SWIG" width="538" height="64" /><p class="wp-caption-text">HDF5 Read performance from Java / SWIG</p></div>
<p><strong>Conclusion</strong></p>
<p>The prototype I&#8217;ve implemented has the nice characteristic that its design is very similar to what I expect should work well with much larger quantities of tick data (as opposed to the OHLCV data I&#8217;ve used here).  The only significant difference is that seeking within an indexed range would be slower, as I&#8217;d first have to read in the range instead of keeping it handy in memory.  Apart from this, I&#8217;d also need a layer on top of what I have here to manage multiple HDF5 files.  Of course, until I actually implement this for tick data, it&#8217;s not certain that it will perform adequately in that much harder case, but I&#8217;m reasonably confident that something like what I&#8217;ve done here could be made to work well.</p>
<p>There&#8217;s a significant learning curve to using HDF5, and one clearly has to spend some time benchmarking against one&#8217;s specific requirements to make sure the settings used are appropriate.  If I were a little less obsessed with speed or a little more discerning about how I spend my holidays, I&#8217;d likely find PyTables or something similar to be a much better solution than rolling my own in this fashion.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/01/06/tick-data-hdf5-part-2/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>managing tick data with hdf5</title>
		<link>http://www.puppetmastertrading.com/blog/2009/01/04/managing-tick-data-with-hdf5/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/01/04/managing-tick-data-with-hdf5/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 18:45:36 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[market data]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[post-trade analysis]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=262</guid>
		<description><![CDATA[One of the nicest things about the holiday season (Happy New Year, btw) is that it provides a lovely opportunity to spend some quality time with a project that&#8217;s a bit more exploratory than might be meaningfully undertaken while trading in lively markets. A number of months ago, I mentioned using HDF5 to manage tick [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin-left: 5px; margin-right: 5px;" title="big data" src="/images/bigdata.jpg" alt="" width="328" height="248" /></p>
<p>One of the nicest things about the holiday season (Happy New Year, btw) is that it provides a lovely opportunity to spend some quality time with a project that&#8217;s a bit more exploratory than might be meaningfully undertaken while trading in lively markets.</p>
<p>A number of months ago, I <a title="billions and billions" href="http://www.puppetmastertrading.com/blog/2008/08/22/billions-and-billions/" target="_blank">mentioned</a> using <a title="HDF5" href="http://www.hdfgroup.org/HDF5/" target="_blank">HDF5</a> to manage tick data, as RDBMSes just aren&#8217;t up to the task and specialized tick DBs are absurdly expensive.  While I&#8217;d spent some time exploring this idea through the fall, I never had a discrete chunk of time to dig into the technology beyond determining that its Java interfaces weren&#8217;t production-worthy.  This meant that we&#8217;d have to drop into C to access the functionality we&#8217;re interested in, and that we&#8217;d have to come up with our own bridge out into Java for access by StratBox, while StratCloud could access it directly.</p>
<p>Below, I describe what I&#8217;ve learned through my holiday geek-spelunking-trek including some timings on various configurable characteristics of HDF5 (e.g., compression and &#8220;chunking&#8221;).</p>
<p><span id="more-262"></span></p>
<p>After spending some time looking at the Java interfaces to HDF5, I determined they weren&#8217;t up to snuff.  Why?  Primarily because no-one seems to use them, they lag the main API from a versioning standpoint, and they appear to be more-or-less impossible to build from source.  Looking a bit more carefully, they seem to have been written by one (undoubtedly talented and well-meaning) individual who isn&#8217;t familiar with Java.  (The most egregious example was the use of a javax.swing.tree.TreeNode as the base class for a key model object&#8230;)</p>
<p>I then spent some time looking at the native API and the underlying object model it exposes.  The model is both powerful and pretty low-level.  They&#8217;ve implemented many of the goodies of a file system, including groups (&#8220;directories&#8221;), datasets (&#8220;files&#8221; or, in RDBMS-land, &#8220;tables&#8221;) and a variety of nice linking mechanisms, as well as attributes which might index or otherwise annotate data.  There are also powerful, extensible I/O options, which I didn&#8217;t study much beyond compression and &#8220;chunking.&#8221;</p>
<p>The library is provided with two &#8220;first-class&#8221; APIs &#8211; in C and Fortran &#8211; a secondary API in C++ and then the Java interface I mentioned.  Others have written interfaces for other languages, most notably the much-lauded <a title="PyTables" href="http://www.pytables.org" target="_blank">PyTables</a> implementation for Python which is used by many in conjunction with the popular <a title="Numpy" href="http://numpy.scipy.org/" target="_blank">NumPy</a> package.</p>
<p>Given this spread of languages, I chose C and decided to steal a page from the talented crew behind <a title="QuantLib" href="http://quantlib.org" target="_blank">QuantLib</a> and use <a title="Simplified Wrapper and Interface Generator" href="http://www.swig.org/" target="_blank">SWIG</a> to expose the relevant functionality to Java.  This has proven to be a splendid choice for my needs.</p>
<p>Having gotten this far, I started examining how I&#8217;d represent market data with HDF5 and came up with two broadly opposed approaches.  In order to gain some insights from those more experienced, I sent the problem statement / inquiry below to the main HDF5 mailing list:</p>
<blockquote><p><span style="font-weight: bold;">A description of the data and its use</span></p>
<p>The data is all timestamped financial streams of &#8220;tick&#8221; data.  Each record is small (a few hundred bytes at most), but there are many of them &#8211; in a day you may see several hundred million to a few billion.  Each record is naturally partitioned by instrument (eg, &#8220;microsoft&#8221;, &#8220;ibm&#8221;, &#8220;dec crude&#8221;, etc).  There are fewer than 30K instruments in the universe I might care about.</p>
<p>I (more or less) don&#8217;t care how long it takes to construct the h5 files/structures as it will be performed offline and the only critical query I care about is something like:</p>
<div style="margin-left: 40px;"><span style="font-style: italic;">&#8220;Get ticks for instruments {i1&#8230;in} from time t1 to time t2 ordered by time, instr&#8221;. </span></div>
<p>That is, I need to be able to &#8220;replay&#8221; a subset of the instruments within the data store over some period of time.  But I really care that this be as fast as possible.</p>
<p><span style="font-weight: bold;">Questions </span></p>
<p>0.  Am I barking up the wrong tree?  Is HDF5 an appropriate technology for the use I&#8217;ve described?</p>
<p>1. Given the size/volume of the data, my thought is to partition h5 files by day.  Uncompressed, the files will be on the order of 25GB.  Does this sound reasonable?  What are the key factors impacting this decision from an HDF5 perspective?</p>
<p>Two alternative models come immediately to mind: one big table (OBT) per day ordered by instrument and then time, or one table per instrument (OTPI) ordered by time.  My current inclination is OTPI as it seems more manageable assuming the overhead of so many tables isn&#8217;t an issue.</p>
<p>2a.  Are there other, better models you suggest I investigate?</p>
<p>2b.  With the OBT, I&#8217;d need to be able to &#8220;index into&#8221; the table to identify (at least) the beginning of each instrument&#8217;s section.  How would you recommend doing this?  It seems possible to do this with references or perhaps a separate table with numerical indices into the main table.  Any pros/cons/alternatives to these approaches?</p>
<p>2c.  With the OTPI, I&#8217;d need to have many tables (at most ~30K) per file.  Is this an issue?</p>
<p>2d. For both models, I&#8217;d need to be able to merge sorted sets of h5 data into one sorted set as quickly as possible.  Is there any hdf5 support for doing such a thing or external libraries created for this purpose?</p>
<p>3. What impact on retrieval/querying should I expect to see with varying levels of compression?</p>
<p>4. Any suggestions on chunk sizes for this application?</p></blockquote>
<p>I was fortunate to receive some excellent responses to my query, including from Francesc Alted, the gracious author of the PyTables library, and from a gentleman who&#8217;d implemented similar functionality for his own trading environment.  Interestingly, both approaches &#8211; OBT and OTPI &#8211; were championed.  It seems that OTPI is probably to be preferred if the number of instruments/tables to be stored isn&#8217;t excessive (perhaps below 10K, though I can&#8217;t quantify this) and updates are frequent.  OTPI is easier to implement, as you can rely more upon the infrastructure provided by HDF5.  OBT, by contrast, seems more scalable, as a single table incurs less overhead (and offers fewer goodies), though you pay for this by having to implement your own indexing logic.</p>
<p>Given the divergent advice and my own lack of hands-on familiarity with the C library, I decided to try both approaches on a prototype.  Instead of working with vast amounts of tick data, I tried both approaches on a smaller store (~1GB with ~7K instruments) of OHLCV data.</p>
<p>By far the easier to implement is the OTPI approach.  However, even with this relatively small amount of data, OTPI fared substantially worse than OBT in both write performance and file size.  Clearly, expanding it to the scale of tick data wasn&#8217;t going to yield sufficiently performant results, so I focused on the OBT approach.</p>
<p>&#8212;</p>
<p>Given the length of this post, and keeping in mind that the holiday season isn&#8217;t over just yet (about ten hours remain as I write this!), I&#8217;m going to stop here and continue with the rest of my implementation and findings in a follow-up post later this week.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/01/04/managing-tick-data-with-hdf5/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>beyond the bull puppet</title>
		<link>http://www.puppetmastertrading.com/blog/2008/11/11/beyond-the-bull-puppet/</link>
		<comments>http://www.puppetmastertrading.com/blog/2008/11/11/beyond-the-bull-puppet/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 02:07:45 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[startup]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=109</guid>
		<description><![CDATA[Normally I spend my design-oriented thoughts on object models &#8211; when I&#8217;m working on StratBox  &#8211; or about volatility, latency, executions, &#38;tc &#8211; when I&#8217;m working on a trading strategy.  But a recent trip abroad has inspired me to consider more fanciful design horizons. After more than a year of blogging I&#8217;ve finally decided to [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><img class="aligncenter" title="StratCloud" src="/images/stratCloud.jpg" alt="" width="294" height="334" /></p>
<p>Normally I spend my design-oriented thoughts on object models &#8211; when I&#8217;m working on StratBox &#8211; or on volatility, latency, executions, &amp;tc &#8211; when I&#8217;m working on a trading strategy.  But a recent trip abroad has inspired me to consider more fanciful design horizons.</p>
<p>After more than a year of blogging, I&#8217;ve finally decided to refresh the look of the site, and you&#8217;re looking at the first iteration of this effort.  Blogging software is pretty remarkable, as it allowed me to change the &#8220;skin&#8221; of the blog without affecting its content.  This is like <a title="Substance - open source Java L&amp;F" href="http://www.pushing-pixels.org/" target="_blank">Kirill Grouchnikov</a>&#8217;s lovely open source &#8220;Substance&#8221; Look&amp;Feel for Java, which does the same trick for Swing-based applications: just include his magic code and your system automagically looks a lot better!</p>
<p>More substantively, my recent trip to Israel and the subsequent agreement to open a Tel Aviv office to take advantage of a felicitous new partnership and Israeli algorithmic talent, has led to a broadening of our mission.  This in turn led to the foray into graphic design I describe below.<span id="more-109"></span></p>
<p>While we still haven&#8217;t gotten around to updating our main site, I have had the opportunity to work again with <a title="Erica Green Design" href="http://egreendesign.com/" target="_blank">Erica Green</a>, the talented designer who brought the Puppetmaster Trading <em><strong>bull puppet</strong></em> into existence&#8230;</p>
<p><strong>Origins of the bull puppet</strong></p>
<p>The name &#8220;Puppetmaster Trading&#8221; came about following a discussion I&#8217;d had with a friend in spring 2005 shortly before I&#8217;d left my job to try my luck with this startup.  After stating that I wanted to become an algorithmic trader and receiving the now familiar blank stare in response, I had explained to him that I explicitly didn&#8217;t want to trade myself, but instead wanted to write little autonomous software agents which would themselves trade on my behalf.  He immediately got it and said, &#8220;Oh!  So you&#8217;d be like a <em>puppetmaster</em> managing your cast of trading marionettes&#8230;&#8221; and I knew the startup had found its name.</p>
<div class="wp-caption aligncenter" style="width: 417px"><img title="the bull puppet" src="/images/logo-txt.gif" alt="the bull puppet" width="407" height="263" /><p class="wp-caption-text">&quot;the bull puppet&quot;</p></div>
<p>Erica had developed the original logo based on my wife&#8217;s loose vision of a trading puppetmaster, but this was long before I knew we&#8217;d have to write the algorithmic trading platform itself.  I had hoped to use an off-the-shelf product so I could focus on the algorithms themselves. That didn&#8217;t turn out to be feasible&#8230;</p>
<p>Between developing the platform, devising algorithms, overseeing the trading &#8220;marionettes&#8221; that implemented those algorithms and making most of the many mistakes startups make (and possibly a few of our own invention), I never got a chance to go beyond the bull puppet&#8230; until now.</p>
<p style="text-align: center;"><img class="aligncenter" title="StratBox" src="/images/stratBox.jpg" alt="" width="357" height="380" /></p>
<p style="text-align: left;">Now that StratBox is a reality, and with a new vision for an exchange-colocated, cloud-computing version of the platform, it seemed a good opportunity to renew the collaboration with Erica.  I like her design for StratBox so much that I&#8217;ve lassoed it into service on this blog.</p>
<p>At the head of this post, you&#8217;ll see my favorite design for our nascent StratCloud offering, though the one below remains in contention.  (I&#8217;d love to hear your opinion!)</p>
<p style="text-align: center;"><img class="aligncenter" title="StratCloud - green" src="/images/stratCloud_g.jpg" alt="" width="346" height="365" /></p>
<p>As it was three years ago, the experience has been very positive.   If you have some challenging graphic design opportunity I encourage you to see if Erica can&#8217;t help you realize your vision.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2008/11/11/beyond-the-bull-puppet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>billions and billions</title>
		<link>http://www.puppetmastertrading.com/blog/2008/08/22/billions-and-billions/</link>
		<comments>http://www.puppetmastertrading.com/blog/2008/08/22/billions-and-billions/#comments</comments>
		<pubDate>Fri, 22 Aug 2008 16:21:22 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[back-testing]]></category>
		<category><![CDATA[market data]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[post-trade analysis]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog-test/?p=81</guid>
		<description><![CDATA[While Carl Sagan&#8217;s famous formulation introduced a generation to the vastness of the cosmos, more recent history suggests that his memorable term might now be more aptly applied to financial extents: our deficits and debts, perhaps, to the economically or politically minded. But for those of us with the markets on our mind, the term [...]]]></description>
			<content:encoded><![CDATA[<p><img hspace="7" align="left" alt="billions and billions" title="billions and billions" src="http://puppetmastertrading.com/images/stars.jpg" /></p>
<p>While Carl Sagan&#8217;s famous formulation introduced a generation to the vastness of the cosmos, more recent history suggests that his memorable term might now be more aptly applied to financial extents: our deficits and <a target="_blank" title="US Public Debt" href="http://en.wikipedia.org/wiki/United_States_public_debt">debts</a>, perhaps, to the economically or politically minded.  But for those of us with the markets on our mind, the term has to evoke the enormity of the data we create and must manage every day.  We&#8217;ve recently been working with the <a target="_blank" title="NYSE TAQ Data" href="http://www.nyxdata.com/nysedata/default.aspx?tabid=730">NYSE&#8217;s TAQ data</a> in an effort to integrate it into <a target="_blank" title="Puppetmaster Trading: StratBox" href="http://puppetmastertrading.com">StratBox</a>&#8217;s back-testing and optimization capabilities.  And the enormity of the data is really just staggering.</p>
<p>Each day, the NYSE publishes all of the day&#8217;s quotes and trades as well as some reference data.  Compressed, the data will just about fit onto a DVD.  For one day.  A DVD.  Compressed.  It&#8217;s really mind-boggling.  A year of the stuff, uncompressed, will require over a <em>petabyte </em>of storage.  Over 1,125,899,906,842,624 bytes.  And that&#8217;s just the US Equities markets.  You want options data, too?  I hope your uncle is named <a title="EMC - " target="_blank" href="http://www.emc.com/index.htm">EMC</a>, because just managing the data is going to be <em>a challenge</em>&#8230;</p>
<p><span id="more-81"></span></p>
<blockquote><p>&#8220;Information about money has become almost as important as money itself.&#8221; &#8212; Walter Wriston, former Chairman of Citicorp</p></blockquote>
<p>The enormity and profile of market data far exceed the capacity of traditional RDBMSes.  While RDBMSes continue to expand their usable capacity &#8211; we have used partitioned tables with nearly a billion rows of market data which have performed astonishingly well &#8211; they simply can&#8217;t deal with the kinds of quote volumes modern markets are generating daily.  This has spawned a host of specialized timeseries database products, from the granddaddy, <a target="_blank" title="Sungard's Fame" href="http://www.sungard.com/Fame/">SunGard&#8217;s FAME</a> (which I&#8217;d used back in the 90&#8217;s to write programs to calculate bond indices at JPM), to more recent offerings like <a target="_blank" title="Vhayu" href="http://www.vhayu.com/">Vhayu</a> and <a target="_blank" title="kdb+" href="http://kx.com/">Kdb+</a>.  These timeseries-oriented data products undoubtedly have many distinguishing characteristics and features, but they share one immutable characteristic: they are unbelievably expensive &#8211; in some cases a single developer seat costs in the high six figures for an annual license.</p>
<p>Thus, while no doubt missing out on some of their high-end features and niceties, we&#8217;ve decided to seek solutions from some of the original purveyors of petabyte-scaled data: NASA and the NCSA through their <a title="HDF5: what is it?" target="_blank" href="http://hdf.ncsa.uiuc.edu/HDF5/whatishdf5.html">HDF5</a> system.  Because HDF5 was designed to support vast scientific data stores and boasts sophisticated capabilities in support of parallel computing environments, it should be possible to get performance comparable to some of the high-end specialized finance products without the sticker shock.  Indeed, it&#8217;s potentially <a title="Quantlib" target="_blank" href="http://puppetmastertrading.com/blog/2008/06/14/using-quantlib-from-java/">another example of free software</a> providing a meaningful contribution to finance.</p>
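<p>To make the &#8220;profile&#8221; point concrete: tick data is appended in time order and is usually read back by time range, which is why column-oriented, time-sorted layouts are a common design choice for such stores. The Java sketch below is a hypothetical, in-memory illustration of that access pattern only; it is not HDF5&#8217;s API, nor a description of how FAME, Vhayu or kdb+ work internally. The class name <code>TickColumns</code> and its methods are invented for illustration.</p>

```java
import java.util.Arrays;

/**
 * Hypothetical, in-memory sketch of a column-oriented, append-only trade series
 * for one symbol. Illustrates the access pattern only; not HDF5's or kdb+'s API.
 */
public class TickColumns {
    private long[] times = new long[16];     // timestamps, non-decreasing
    private double[] prices = new double[16];
    private int[] sizes = new int[16];
    private int n = 0;

    /** Appends one trade; rejects timestamps that would break time order. */
    public void append(long time, double price, int size) {
        if (n > 0 && time < times[n - 1]) {
            throw new IllegalArgumentException("out-of-order tick: " + time);
        }
        if (n == times.length) {             // grow every column together
            int cap = 2 * n;
            times = Arrays.copyOf(times, cap);
            prices = Arrays.copyOf(prices, cap);
            sizes = Arrays.copyOf(sizes, cap);
        }
        times[n] = time;
        prices[n] = price;
        sizes[n] = size;
        n++;
    }

    public int size() { return n; }

    /** Index of the first tick with timestamp >= t (binary search over the sorted column). */
    private int lowerBound(long t) {
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (times[mid] < t) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Volume-weighted average price over [from, to); NaN if no trades fall in the window. */
    public double vwap(long from, long to) {
        int start = lowerBound(from), end = lowerBound(to);
        double notional = 0.0;
        long volume = 0;
        for (int i = start; i < end; i++) {  // contiguous slice of each column
            notional += prices[i] * sizes[i];
            volume += sizes[i];
        }
        return volume == 0 ? Double.NaN : notional / volume;
    }
}
```

<p>Because the timestamps are sorted, locating a window is a binary search, and the aggregation then scans only contiguous slices of the price and size columns rather than whole rows.</p>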
<p>In researching cost-efficient and highly parallel hardware solutions to pair with our emergent data solution, I&#8217;ve come to realize that open-source is expanding its reach into the hardware sphere.</p>
<p><img title="Linux cluster in an IKEA Filing cabinet" alt="Linux cluster in an IKEA Filing cabinet" src="http://puppetmastertrading.com/images/helmer.png" /></p>
<p><a target="_blank" title="Helmer" href="http://helmer.sfe.se/">This guy</a> shares his experience and &#8220;recipe&#8221; for building a powerful and unique rendering cluster inside an IKEA filing cabinet.  It&#8217;s admittedly on the funky side for even a SOHO operation, but it&#8217;s no joke &#8211; it&#8217;s more powerful than a lot of production blade servers used on Wall Street, and it cost him less than $4K.  He also includes a (very loosely described) spec for a more powerful next-generation version with some 50 teraflops of capacity!  So, while the data we&#8217;re having to deal with is growing at an incredible rate, the tools we have to manage it are growing proportionately for those who know how to leverage the work so many smart people are producing and freely sharing.</p>
<p>As my dad told me years ago:</p>
<blockquote><p>&#8220;Good programmers write good programs.  Great programmers <em>steal </em>good programs.&#8221;</p></blockquote>
<p>At this point, we&#8217;re still in the &#8220;discovery&#8221; stage of our development of TAQ+HDF5 for StratBox, but as we progress I&#8217;ll periodically post some of our experiences.</p>
<p>&#8212;</p>
<p>UPDATE</p>
<p>Speaking of my dad, he saw this posting and pointed me to some <a target="_blank" title="Massive Information processing and Fault-Tolerance: The Google Approach" href="http://www.cis.temple.edu/~ingargio/cis307/readings/MapReduce.html">class notes</a> he&#8217;s been working on which describe the Google approach to massive information processing and fault-tolerance.  They&#8217;re interesting and full of great links to both academic and industrial papers/sites on the topic.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2008/08/22/billions-and-billions/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>using Quantlib from Java</title>
		<link>http://www.puppetmastertrading.com/blog/2008/06/14/using-quantlib-from-java/</link>
		<comments>http://www.puppetmastertrading.com/blog/2008/06/14/using-quantlib-from-java/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 04:26:43 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[FIX Protocol]]></category>
		<category><![CDATA[monte-carlo methods]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[options pricing]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog-test/?p=3</guid>
		<description><![CDATA[One of these days I’m going to give an overview of all the excellent open-source software I use on a daily basis. Until that day comes, I’ll observe that finance remains one of the big areas where open-source software has made relatively limited inroads. Two production-quality packages fight that unhappy state: QuantLib &#8211; a comprehensive [...]]]></description>
			<content:encoded><![CDATA[<p><img title="A free/open-source library for quantitative finance" src="http://puppetmastertrading.com/images/QL-title.jpg" alt="A free/open-source library for quantitative finance" /></p>
<p>One of these days I’m going to give an overview of all the excellent open-source software I use on a daily basis.  Until that day comes, I’ll observe that finance remains one of the big areas where open-source software has made relatively limited inroads.<img title="Java" src="/images/javabrew.jpg" alt="Java" align="right" /></p>
<p>Two production-quality packages fight that unhappy state: <a title="QuantLib: A free/open-source library for quantitative finance" href="http://quantlib.org" target="_blank">QuantLib</a> &#8211; a comprehensive framework for quantitative finance &#8211; and <a title="QuickFix: Open-source FIX engine" href="http://www.quickfixengine.org/" target="_blank">QuickFix</a> &#8211; a full-featured <a title="Financial Information eXchange" href="http://fixprotocol.org" target="_blank">FIX</a> engine.   Both are C++ libraries and both provide very nice interfaces to facilitate integration with other languages, including Java.  QuantLib is a big and complicated library and integrating it with Java is not totally obvious. Below, I’ll describe how to build and use QuantLib from Java.</p>
<p><a id="more-29"></a> These instructions are based on a Unix installation.  I’m not really a Windows developer and don’t have all the shiny tools that Windows developers use, so it’s not an area of focus for me.  That said, I have managed to build QuantLib under Windows by using <a title="MinGW+MSYS: Linux in a box" href="http://www.mingw.org/" target="_blank">MinGW+MSYS</a>, but it wasn’t terribly easy and I don’t currently have a working installation, so I won’t cover that here.  If this is your aim, don’t be dismayed: it is possible, and the result had all the functionality I enjoy under Linux.</p>
<p><strong>Using QuantLib from Java (on Linux)</strong></p>
<ul>
<li>Build QuantLib
<ul>
<li>Requires a working version of <a title="Boost C++ Library" href="http://www.boost.org/" target="_blank">Boost</a>.  This may prove to be the hardest step of all, and you’ll need to use the ample documentation provided by the Boost team.</li>
<li>Once you have a working copy of Boost, building QuantLib should require little more than:
<p><code>sh autogen.sh</code><br />
<code>./configure</code><br />
<code>make</code><br />
<code>sudo make install</code></p>
</li>
</ul>
</li>
<li>Build QuantLib-SWIG
<ul>
<li>Requires a working copy of <a title="SWIG - Simplified Wrapper and Interface Generator" href="http://www.swig.org/" target="_blank">SWIG</a>.  Again, look to the SWIG instructions, but it should be easy.</li>
<li>Once SWIG is available, building the QuantLib/SWIG interfaces should only require:
<p><code>sh autogen.sh</code><br />
<code>./configure \</code><br />
<code>  --with-jdk-include=${JAVA_HOME}/include \</code><br />
<code>  --with-jdk-system-include=${JAVA_HOME}/include/linux</code><br />
<code>make -C Java</code><br />
<code>sudo make install</code></p>
</li>
</ul>
</li>
<li>Now you’ll have a Jar file containing all of the SWIG/JNI stubs at /usr/local/lib/QuantLib.jar.  Add this to your classpath.</li>
<li>Programs which call QuantLib functionality need to be able to find the native libraries at runtime.  Invoking the VM with something like the following tells <code>System.loadLibrary</code> where to look:
<p><code>-Djava.library.path=/usr/local/lib</code></p>
Depending on how your dynamic linker is configured, you may also need /usr/local/lib on your <code>LD_LIBRARY_PATH</code> (or to run <code>ldconfig</code>) so that the QuantLib shared library itself can be found.</li>
<li>Programs which call QuantLib functionality will also need to explicitly load the QuantLib JNI library.  This can be done with a static block like the following in the class containing your main method (note that <code>System.loadLibrary</code> signals a missing library with an <code>UnsatisfiedLinkError</code>, not a <code>RuntimeException</code>):
<p><code>static {  // Load the QuantLib JNI library<br />
try { System.loadLibrary("QuantLibJNI"); }<br />
catch (UnsatisfiedLinkError e) { e.printStackTrace(); }<br />
}</code></p>
</li>
<li>That’s it.  Now test your configuration by running the examples in QuantLib-SWIG/Java/examples.</li>
</ul>
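<p>If the examples fail with an <code>UnsatisfiedLinkError</code>, the JVM often simply cannot find the native library. The small helper below is a hypothetical sketch (it is not part of QuantLib or QuantLib-SWIG) that prints the directories on <code>java.library.path</code>, the platform-specific file name <code>System.loadLibrary</code> will look for (via <code>System.mapLibraryName</code>), and whether the load succeeds. It assumes the JNI library is named <code>QuantLibJNI</code>, as in the static block above.</p>

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical diagnostic helper (not part of QuantLib or QuantLib-SWIG):
 * shows where the JVM looks for native libraries and whether one can be loaded.
 */
public class NativeLibCheck {

    /** Directories listed in java.library.path, which System.loadLibrary searches. */
    static List<String> libraryPathEntries() {
        String path = System.getProperty("java.library.path", "");
        return Arrays.asList(path.split(File.pathSeparator));
    }

    /** Platform-specific file name, e.g. "libQuantLibJNI.so" on Linux. */
    static String fileNameFor(String libName) {
        return System.mapLibraryName(libName);
    }

    /** Tries to load the library; returns false instead of throwing if it cannot be loaded. */
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + fileNameFor(libName) + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "QuantLibJNI";
        System.out.println("java.library.path entries:");
        for (String dir : libraryPathEntries()) {
            System.out.println("  " + dir);
        }
        boolean ok = tryLoad(lib);
        System.out.println("Looking for " + fileNameFor(lib) + " ... " + (ok ? "loaded" : "NOT loaded"));
    }
}
```

<p>Run it with the same VM flags you use for your own program, e.g. <code>java -Djava.library.path=/usr/local/lib NativeLibCheck</code>.</p>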
<p>It’s worth understanding how QuantLib is being used from Java.  SWIG creates a JNI interface to those methods within QuantLib which have been exposed through their declaration in the SWIG *.i files.  These files are found in QuantLib-SWIG/SWIG, and they determine what QuantLib functionality will be available to you.  You’ll likely need to get familiar with the subset of those files relevant to your work.  If you find that some functionality you need isn’t exposed in those files, you may need to expose it yourself.</p>
<p>There’s a learning curve, but it’s worth traversing so you can get at all the rich functionality so many smart people have put together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2008/06/14/using-quantlib-from-java/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>the problem with easy</title>
		<link>http://www.puppetmastertrading.com/blog/2008/02/22/the-problem-with-easy/</link>
		<comments>http://www.puppetmastertrading.com/blog/2008/02/22/the-problem-with-easy/#comments</comments>
		<pubDate>Fri, 22 Feb 2008 13:54:24 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog-test/?p=42</guid>
		<description><![CDATA[Among the more challenging questions we face when describing the Puppetmaster environment are those like &#8220;how do you create new proprietary trading strategies within the environment?&#8221; It&#8217;s a difficult question because of expectations &#8211; people want to hear about some super simple scripting language that any non-technical person can immediately learn and be up and [...]]]></description>
			<content:encoded><![CDATA[<p><a target="_blank" href="http://xkcd.com/221/"><img align="middle" alt="the problem with easy" title="the problem with easy" src="http://imgs.xkcd.com/comics/random_number.png" /></a></p>
<p>Among the more challenging questions we face when describing the Puppetmaster environment are those like &#8220;how do you create new proprietary trading strategies within the environment?&#8221;  It&#8217;s a difficult question because of expectations &#8211; people want to hear about some super simple scripting language that any non-technical person can immediately learn and be up and algorithmically trading in no time.  A few platforms intended for retail users offer such things &#8211; one is even appropriately named <em>EasyLanguage</em>.  When researching approaches for our system, we spent some time learning EasyLanguage and found that it did in fact <strong>make easy things easy!</strong></p>
<p>The problem was that it also <strong>made sophisticated things impossible</strong>.</p>
<p>This led us to pursue another, more powerful, approach for which we are currently seeking a patent.</p>
<p><span id="more-42"></span></p>
<p>That said, once upon a time I&#8217;d worked on a system into which some enterprising engineer had embedded a <a target="_blank" href="http://www.tcl.tk/">TCL</a> interpreter with which one could monitor and manipulate objects within the runtime environment.  We didn&#8217;t use it for much, but it was certainly interesting and helped with debugging the live system.  Ever since then, I&#8217;ve had an interest in embedding a simple language+interpreter into my systems.</p>
<p>Thus, when a friend who builds systems for <a target="_blank" title="UBS" href="http://www.ubs.com/">UBS</a> told me about a talk on the <a title="Groovy" target="_blank" href="http://groovy.codehaus.org/">Groovy</a> language in the Google auditorium, I went along to see what it was about.   (Admittedly, I was in significant part motivated to see inside the mysterious Googleplex.  I found it disappointing, unless you fancy an environment that looks like an Apple Store with free Razor scooters for anyone who wants a spin&#8230;)<img align="right" alt="In my world, this is *not* a perk ..." title="In my world, this is *not* a perk ..." src="http://puppetmastertrading.com/images/scooter.jpg" /></p>
<p>The talk itself was also a bit disappointing, as the speaker was very permissive with questions and Google had graciously given the crowd access to free beer.  Many will argue that Java programmers aren&#8217;t the sharpest tools in the shed, and while I won&#8217;t weigh in on that issue, I&#8217;m pretty certain that adding beer to the equation isn&#8217;t likely to help.</p>
<p>In any case, I did come away from the talk understanding 1) that if our business takes off in a big way, we will not attempt to motivate people with silly toys and 2) that Groovy (or any other such scripting language) isn&#8217;t a reasonable mechanism for enabling non-technical people to write trading strategies.</p>
<p><img align="left" alt="Groovy" title="Groovy" src="http://media.xircles.codehaus.org/_projects/groovy/_logos/medium.png" />This is no fault of the Groovy language, which seems fine enough.  It&#8217;s a problem with the complexity of algorithmic trading.  Trading is an intrinsically uncertain, asynchronous and demanding activity.  I send orders to exchanges and they respond in their own good time.  It might be 20ms if I&#8217;m sitting on top of the exchange, or it might be seconds if I&#8217;m around the world, or it might be days or more if it&#8217;s some kind of conditional order; it&#8217;s certainly not a fixed time, and I can&#8217;t be sure which message I&#8217;ll receive in response.  Messages arrive when they arrive and you have to be ready for them.</p>
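<p>To make &#8220;messages arrive when they arrive&#8221; concrete, here is a hypothetical Java sketch, not the Puppetmaster approach, of the bookkeeping even a single order needs. A common race: you ask to cancel, but a fill arrives first and the cancel is then rejected as too late. The class, states and method names are invented for illustration and are not taken from any FIX engine&#8217;s API.</p>

```java
/**
 * Hypothetical sketch: one order's state, driven by whatever message the
 * exchange sends next. Names and states are illustrative, not a FIX engine API.
 */
public class OrderTracker {
    public enum State { PENDING_NEW, WORKING, PENDING_CANCEL, FILLED, CANCELED, REJECTED }

    private final int orderQty;
    private int filledQty = 0;
    private boolean acked = false;
    private State state = State.PENDING_NEW;

    public OrderTracker(int orderQty) { this.orderQty = orderQty; }

    public State state() { return state; }
    public int filledQty() { return filledQty; }

    private boolean isDone() {
        return state == State.FILLED || state == State.CANCELED || state == State.REJECTED;
    }

    /** We ask to cancel; the order stays live until the exchange answers. */
    public void requestCancel() {
        if (!isDone()) state = State.PENDING_CANCEL;
    }

    /** Exchange accepted the new order. */
    public void onAck() {
        acked = true;
        if (state == State.PENDING_NEW) state = State.WORKING;
    }

    /** Exchange refused the new order outright. */
    public void onReject() {
        if (!acked && !isDone()) state = State.REJECTED;
    }

    /** A fill may arrive at any time, including after we have asked to cancel. */
    public void onFill(int qty) {
        filledQty += qty;   // always record it: the fill changes our position
        acked = true;       // a fill implies the exchange accepted the order
        if (state == State.PENDING_NEW) state = State.WORKING;
        if (!isDone() && filledQty >= orderQty) state = State.FILLED;
    }

    /** Cancel confirmed: whatever filled before this point stays filled. */
    public void onCancelAck() {
        if (!isDone()) state = State.CANCELED;
    }

    /** Cancel refused, e.g. "too late to cancel": fall back to the live state. */
    public void onCancelReject() {
        if (state == State.PENDING_CANCEL) state = acked ? State.WORKING : State.PENDING_NEW;
    }
}
```

<p>Each handler reacts to whatever the exchange sends next, so nothing assumes a fixed sequence or latency. A production system would also need to handle replaces, duplicate and out-of-order messages, and would alert on a fill that arrives after a confirmed cancel.</p>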
<p>If you&#8217;re trading a portfolio re-balancing model that trades infrequently and in size, then it might suffice to send a few or perhaps a few hundred VWAP orders a month and things will mostly take care of themselves.  Here, maybe a scripting language is sufficiently expressive.  But if you want to implement that VWAP algorithm yourself, or implement a strategy that trades <em>against your broker&#8217;s VWAP</em> algorithm, or implement a high-frequency strategy that is very &#8220;close to the market&#8221; requiring extreme low latency and precise order management, or write a discovery algorithm that&#8217;s employing proprietary understanding of the ubiquitous and ephemeral &#8220;dark pools of liquidity,&#8221; then no scripting language will suffice.</p>
<p>From a technology perspective, my curiosity about Groovy was sated &#8211; it&#8217;s certainly possible (and even easy!) to embed a powerful scripting language into your application.  But from an algorithmic trading perspective, the problem is designing a trading-specific language that allows easy things to be done simply while allowing complex strategies to be expressed efficiently as well.  This remains an area of active interest and research, as it may someday complement our existing methodology, but for now we think we&#8217;ve got the best solution for the difficult &#8220;problem with easy.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2008/02/22/the-problem-with-easy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Engineering Randomness</title>
		<link>http://www.puppetmastertrading.com/blog/2008/01/06/engineering-randomness/</link>
		<comments>http://www.puppetmastertrading.com/blog/2008/01/06/engineering-randomness/#comments</comments>
		<pubDate>Sun, 06 Jan 2008 13:46:41 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[monte-carlo methods]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[performance analysis]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog-test/?p=36</guid>
		<description><![CDATA[It turns out that one can actually sculpt or engineer randomness. Further, this ability can shed remarkable light on otherwise mysterious phenomena &#8211; like the value of an option or the performance of a complex trading strategy. The applet above is governed by the equation (for Geometric Brownian Motion) below. It provides you with the [...]]]></description>
			<content:encoded><![CDATA[<p><applet width="500" height="350" codebase="/randomWalk" code="examples.RandomWalk" archive="rw.jar">  </applet><br />
It turns out that one can actually <em>sculpt </em>or engineer randomness.  Further, this ability can shed remarkable light on otherwise mysterious phenomena &#8211; like the value of an option or the performance of a complex trading strategy.  The applet above is governed by the equation (for Geometric Brownian Motion) below.  It provides you with the ability to produce a random stream with three modifiable characteristics: an initial value (in this case fixed at 100), volatility (or &#8216;diffusion&#8217;) and expected return (or &#8216;drift&#8217; or slope).  Every five seconds it will generate a new path based on the current settings.  <img title="path generator" src="/images/GeometricBrownianMotionSDE.png" alt="path generator" align="bottom" /> Hopefully, this applet provides a little insight into the machinery of a powerful algorithmic trading tool: monte-carlo methods.  <span id="more-36"></span>Just as we can permute a strategy&#8217;s parameters over some period of historical data, we can also permute synthetic data as input to our strategy.  This is one of the classic methods for pricing a wide variety of derivative instruments and it can be productively applied to the performance analysis of algorithmic trading strategies.  As a technique, it has several advantages over historical back-testing.  It doesn&#8217;t require costly market data, it allows a user to engineer key characteristics of the market(s) to be tested, and it eliminates many of the messy characteristics of the real world.  We&#8217;ll look at each of these advantages in turn.  Market data is one of the inevitable nightmares for an algorithmic trader.  It costs more than gold, comes in many forms from many sources and is all too frequently dirty.  Even when you have perfect data, the world intercedes.  
Equities have splits, reverse splits, de-listings, re-listings, mergers, dividends and probably a host of other corporate actions not to mention the gazillions of &#8216;dark pools&#8217; which might all act as sources of data or at least confusion.  Futures are constantly rolling and exchanges will not infrequently change their contract specs.  Options share all of the difficulties of their underlying instruments but add a bewildering array of contracts as well as the problem of very sparse data streams for far out of the money contracts.  In sum, managing real-time and historical market data is simply an expensive and messy endeavor.  While I wouldn&#8217;t recommend going data-free, it is helpful to be able to generate synthetic data to order when a particular period or variety of market data isn&#8217;t available.  The ability to actually engineer the data that acts as input to your monte-carlo simulations is particularly powerful.  In the simple example above, I&#8217;ve only allowed you to modify a constant slope and volatility of the sequences you generate, but a sophisticated user can do much more powerful tricks.  To list just a few, one can introduce term structures for volatility, dividends, coupons and interest rates; one can spice-up one&#8217;s models with the introduction of some flavor of jump-diffusion or one can scale the process by introducing correlated multi-path sequences.  This last capability, the ability to <a title="correlated random sequence" href="http://sitmo.com/doc/Generating_Correlated_Random_Numbers" target="_blank">generate sequences of correlated random data</a>, is especially useful as it allows you to generate data for correlated markets &#8211; say gold and silver &#8211; or even simulate market micro-structure including depth of market.  This ability to engineer markets with characteristics that you define also gives you the ability to create circumstances for which you may not have available historical precedent.  
This is not so uncommon as one might think.  This point is colorfully illustrated by <a title="Dr. Andrew Lo" href="http://web.mit.edu/alo/www/" target="_blank">Dr Andrew Lo&#8217;s</a> hypothetical <a title="Recipe for capital decimation" href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=671443" target="_blank">Capital Decimation Partners</a>&#8216; trading strategy which simply wrote puts on the s&amp;p500 throughout the 90&#8242;s to produce spectacular gains in the face of massive risk.  It&#8217;s a curious fact of our modern financial history that one needs to go pretty far back in time to see the US equity markets do anything extraordinarily nasty.  Building trading strategies or a business without at least considering such risks is ill-considered and the techniques we&#8217;re discussing are one way of analyzing those out-of-sample risks within one&#8217;s strategy development process.  The practice of engineering markets &#8211; writing markets &#8211; inevitably leads to an increased understanding of how to read them.  If a particular strategy works well under some kinds of market conditions, one can attempt to identify such market conditions as they unfold so the strategy can be deployed conditionally.  It&#8217;s interesting to note that while many algorithmic trading products feature some form  of back-testing and some feature some type of parameter optimization, few if any offer the kind of functionality which I describe here and which is a critical element of the Puppetmaster Trading platform.  A final comment/credit before completing this post &#8211; the applet above uses <a title="Dave Gilbert" href="http://www.jroller.com/dgilbert/" target="_blank">Dave Gilbert&#8217;s</a> excellent open source <a title="JFreeChart" href="http://www.jfree.org/jfreechart/" target="_blank">JFreeChart</a> package.  
Although the applet uses its own very simple brownian motion implementation, I wouldn&#8217;t generally recommend trying to write your own monte-carlo primitives but would instead recommend using those from a mature library like that supplied by <a title="QuantLib" href="http://quantlib.org/" target="_blank">QuantLib.</a></p>
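<p>For readers who want to experiment outside the applet, here is a minimal Java sketch of the same kind of path generator. It is not the applet&#8217;s code and does not use QuantLib. For geometric Brownian motion, dS = μS dt + σS dW, it uses the exact log-normal step S<sub>t+Δt</sub> = S<sub>t</sub> exp((μ &#8722; σ²/2)Δt + σ√Δt Z) with Z standard normal. It also builds a correlated pair with the standard two-variable Cholesky construction, Z<sub>2</sub> = ρZ<sub>1</sub> + √(1 &#8722; ρ²) ε. The method names and the shared drift for the pair are assumptions of this sketch.</p>

```java
import java.util.Random;

/**
 * Minimal sketch (not the applet's code, not QuantLib): geometric Brownian motion
 * paths via the exact log-normal step, plus a correlated pair via 2x2 Cholesky.
 */
public class GbmSketch {

    /** One path of n steps of length dt from s0, with drift mu and volatility sigma. */
    static double[] path(double s0, double mu, double sigma, double dt, int n, Random rng) {
        double[] s = new double[n + 1];
        s[0] = s0;
        double drift = (mu - 0.5 * sigma * sigma) * dt;   // Ito correction on the log scale
        double diffusion = sigma * Math.sqrt(dt);
        for (int i = 1; i <= n; i++) {
            s[i] = s[i - 1] * Math.exp(drift + diffusion * rng.nextGaussian());
        }
        return s;
    }

    /** Two paths sharing drift mu whose Brownian increments have correlation rho. */
    static double[][] correlatedPair(double s0a, double s0b, double mu, double sigmaA,
                                     double sigmaB, double rho, double dt, int n, Random rng) {
        double[] a = new double[n + 1];
        double[] b = new double[n + 1];
        a[0] = s0a;
        b[0] = s0b;
        double orth = Math.sqrt(1.0 - rho * rho);
        double sqrtDt = Math.sqrt(dt);
        for (int i = 1; i <= n; i++) {
            double z1 = rng.nextGaussian();
            double z2 = rho * z1 + orth * rng.nextGaussian(); // Cholesky of [[1, rho], [rho, 1]]
            a[i] = a[i - 1] * Math.exp((mu - 0.5 * sigmaA * sigmaA) * dt + sigmaA * sqrtDt * z1);
            b[i] = b[i - 1] * Math.exp((mu - 0.5 * sigmaB * sigmaB) * dt + sigmaB * sqrtDt * z2);
        }
        return new double[][] {a, b};
    }

    /** Log-returns ln(s[i] / s[i-1]) of a path. */
    static double[] logReturns(double[] s) {
        double[] r = new double[s.length - 1];
        for (int i = 1; i < s.length; i++) r[i - 1] = Math.log(s[i] / s[i - 1]);
        return r;
    }

    /** Sample (Pearson) correlation of two equal-length series. */
    static double correlation(double[] x, double[] y) {
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] p = path(100.0, 0.05, 0.20, 1.0 / 252, 252, rng);  // one year of daily steps
        System.out.printf("start %.2f, end %.2f%n", p[0], p[p.length - 1]);
    }
}
```

<p>With σ = 0 a path grows deterministically as S<sub>0</sub>e<sup>μt</sup>, which makes a handy sanity check before layering on term structures or jumps.</p>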
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2008/01/06/engineering-randomness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A trading strategy is an option</title>
		<link>http://www.puppetmastertrading.com/blog/2007/10/10/a-trading-strategy-is-an-option/</link>
		<comments>http://www.puppetmastertrading.com/blog/2007/10/10/a-trading-strategy-is-an-option/#comments</comments>
		<pubDate>Wed, 10 Oct 2007 13:34:36 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[back-testing]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[monte-carlo methods]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[options pricing]]></category>
		<category><![CDATA[performance analysis]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog-test/?p=27</guid>
		<description><![CDATA[The best way to reason about a trading strategy&#8217;s performance, that is valuing it, is as an option. Or perhaps as a collection or portfolio of them. I have to assume that people reading this have a working idea of what an option is, so I&#8217;m not going to provide definitions that can be readily [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="options, options, ..." title="options, options, ..." src="http://puppetmastertrading.com/blog/images/oneway.jpg" /></p>
<p>The best way to reason about a trading strategy&#8217;s <em>performance</em>, that is <strong><em>valuing</em></strong> it, is as an option.</p>
<p>Or perhaps as a collection or portfolio of them.</p>
<p>I have to assume that readers have a working idea of what an option is, so I won&#8217;t repeat definitions that are readily found <a target="_blank" title="option definition" href="http://en.wikipedia.org/wiki/Option_%28finance%29">elsewhere</a>. I will note that my favorite book on trading options is Allen Jan Baird&#8217;s <a target="_blank" title="Option Market Making" href="http://www.amazon.com/Option-Market-Making-Financial-Commodity/dp/0471578320/ref=pd_bbs_sr_4/102-4760953-2767316?ie=UTF8&#038;s=books&#038;qid=1192021949&#038;sr=8-4">Option Market Making</a>.</p>
<p>Let&#8217;s consider the three illustrative trading strategies we&#8217;ve looked at so far. The trend-following strategy suffered many little losses and then enjoyed a big win. Sounds like buying options. The mean-reverting strategy made lots of little profits and then risked getting clobbered with a big loss. Sounds like someone who&#8217;s writing options. And the first strategy we <a title="morning range breakout" href="http://puppetmastertrading.com/blog/2007/10/02/anatomy-of-a-knockout/">looked at</a>, the morning range breakout, had a payoff that looked like a long straddle or strangle whose break-evens sat near the session&#8217;s observed high and low (where we set our entry stops).</p>
<p><img title="straddle payoff" alt="straddle payoff" src="http://puppetmastertrading.com/blog/images/straddle.gif" /></p>
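<p>To make the resemblance concrete, here is a minimal Python sketch of the expiry payoffs. The function names and the session range are hypothetical, and the sketch ignores everything about a real option beyond its terminal payoff (time value, discounting, early exercise). With the strikes placed at the session low and high and the premium set to zero, the strangle&#8217;s payoff is flat at zero inside the range and starts rising exactly at those two levels, which is where the breakout&#8217;s entry stops sit.</p>

```python
from typing import Tuple


def long_straddle_pnl(s_t: float, strike: float, premium: float) -> float:
    """Expiry profit of one long call plus one long put at the same strike."""
    return max(s_t - strike, 0.0) + max(strike - s_t, 0.0) - premium


def long_strangle_pnl(s_t: float, put_strike: float, call_strike: float,
                      premium: float) -> float:
    """Expiry profit of a long put at put_strike plus a long call at call_strike."""
    return max(s_t - call_strike, 0.0) + max(put_strike - s_t, 0.0) - premium


def strangle_break_evens(put_strike: float, call_strike: float,
                         premium: float) -> Tuple[float, float]:
    """Outer final prices beyond which the long strangle's expiry profit is
    positive. With premium 0 the profit is also zero everywhere between them."""
    return put_strike - premium, call_strike + premium


# Hypothetical session range: low 95, high 105.
session_low, session_high = 95.0, 105.0
print(strangle_break_evens(session_low, session_high, premium=0.0))  # (95.0, 105.0)
print(strangle_break_evens(session_low, session_high, premium=2.0))  # (93.0, 107.0)
```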
<p>Now, there are obvious differences between the trading strategies&#8217; payoff structures and those of the similar options strategies. There&#8217;s no premium, for instance, and that&#8217;s clearly significant. The morning range breakout also exhibits a sort of knockout effect: once a position has been entered, the market can reverse and &#8220;knock you out&#8221; at your stop. You just take the loss, and you don&#8217;t collect even if the market later turns back in your direction. A straddle held to expiry doesn&#8217;t behave this way, since its payoff depends only on the final price. These differences are worth keeping in mind, but the reasons for viewing trading strategies as options portfolios are many and compelling.</p>
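<p>The knockout effect is easiest to see on a single price path. The following is a simplified, hypothetical breakout rule written in Python (one entry per day, a fixed stop distance, fills assumed exactly at the trigger and stop levels), not the exact rule from the earlier post. On a path that breaks out, reverses through the stop and then rallies, the breakout is stopped out for a loss, while a zero-premium strangle struck at the same levels finishes in the money.</p>

```python
from typing import Sequence


def breakout_with_stop_pnl(path: Sequence[float], session_low: float,
                           session_high: float, stop_distance: float) -> float:
    """Profit of a hypothetical breakout rule along one intraday price path.

    Go long at session_high on the first print at or above it, or short at
    session_low on the first print at or below it. Fills are assumed to happen
    exactly at the trigger and stop levels (no slippage, no gaps). Once the
    stop is hit the trade is over for the day: the 'knockout'.
    """
    position, entry = 0, 0.0  # position: +1 long, -1 short, 0 flat
    for price in path:
        if position == 0:
            if price >= session_high:
                position, entry = 1, session_high
            elif session_low >= price:
                position, entry = -1, session_low
        elif position * (entry - price) >= stop_distance:
            return -stop_distance  # knocked out; later prices are irrelevant
    return position * (path[-1] - entry) if position else 0.0


def zero_premium_strangle_pnl(path: Sequence[float], session_low: float,
                              session_high: float) -> float:
    """Expiry payoff of a strangle struck at the session low and high, premium
    ignored. Only the final price matters."""
    s_t = path[-1]
    return max(s_t - session_high, 0.0) + max(session_low - s_t, 0.0)


# Breaks out above 105, falls back through the stop at 101, then rallies to 108.
path = [100.0, 104.0, 106.0, 101.0, 99.0, 108.0]
print(breakout_with_stop_pnl(path, 95.0, 105.0, 4.0))  # -4.0
print(zero_premium_strangle_pnl(path, 95.0, 105.0))    # 3.0
```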
<p>The superficial reason, as I mentioned, is that the basic payoff structures can be similar. The deep reason is that ultimately the problems are the same: how to value complex instruments with engineered payoffs. And the pragmatic reason is that many very smart people have applied their considerable brains and diverse skill-sets to advancing options pricing techniques. There&#8217;s also a great deal of high-quality software available <a target="_blank" title="QuantLib" href="http://www.quantlib.org">out there</a> that can be used to adapt these time-proven techniques to valuing your own algorithmic trading strategies.</p>
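<p>The techniques we&#8217;ve seen so far, back-testing and parameter optimization, are weak cousins of a family of techniques long used for options pricing: Monte-Carlo (MC) methods. Just as an option can be priced by averaging its discounted payoff over many simulated price paths, a trading strategy&#8217;s performance can be assessed by running it over many simulated paths and studying the resulting distribution of profit and loss.</p>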
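<p>As a rough illustration, here is a minimal Monte-Carlo sketch in Python. It assumes prices follow driftless geometric Brownian motion and uses the standard library&#8217;s <code>random</code> module, whereas in practice a mature library&#8217;s path generators would be preferable. <code>morning_breakout</code> and its range and stop parameters are hypothetical stand-ins for a real strategy. The function returns the whole profit-and-loss sample, not just its mean, because the shape of that distribution is what distinguishes option-buying strategies from option-writing ones.</p>

```python
import math
import random
import statistics


def simulate_path(s0, sigma, n_steps, rng):
    """One driftless geometric Brownian motion path (an assumed price model).

    sigma is the volatility over the whole path. The -0.5*sigma**2*dt term
    keeps the expected final price equal to s0.
    """
    dt = 1.0 / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp(-0.5 * sigma**2 * dt
                                         + sigma * math.sqrt(dt) * z))
    return path


def morning_breakout(path, range_steps=20, stop_distance=1.0):
    """Hypothetical morning-range breakout: the first range_steps prices set
    the range; then go long above its high or short below its low, with a
    fixed stop and no re-entry. Fills are assumed exactly at trigger levels."""
    high, low = max(path[:range_steps]), min(path[:range_steps])
    position, entry = 0, 0.0
    for price in path[range_steps:]:
        if position == 0:
            if price >= high:
                position, entry = 1, high
            elif low >= price:
                position, entry = -1, low
        elif position * (entry - price) >= stop_distance:
            return -stop_distance  # knocked out by the stop
    return position * (path[-1] - entry) if position else 0.0


def monte_carlo_value(strategy, n_paths=10_000, n_steps=100, s0=100.0,
                      sigma=0.02, seed=42):
    """Mean profit of `strategy` over simulated paths, its standard error,
    and the full profit sample."""
    rng = random.Random(seed)
    pnls = [strategy(simulate_path(s0, sigma, n_steps, rng))
            for _ in range(n_paths)]
    mean = statistics.fmean(pnls)
    stderr = statistics.stdev(pnls) / math.sqrt(n_paths)
    return mean, stderr, pnls


mean, stderr, pnls = monte_carlo_value(morning_breakout, n_paths=2_000)
print(f"breakout: mean profit {mean:.4f} +/- {stderr:.4f} per simulated day")
```

<p>Because the seed is fixed, reruns reproduce the same numbers; varying the seed or the volatility is a quick way to see how sensitive the estimate is.</p>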
<p>In subsequent posts, we&#8217;ll talk about some of the details of each of these techniques and their respective trade-offs. That should keep my pump primed for a bit, but in the meantime I leave you with a parting question: what other options pricing techniques might we apply to our algorithmic trading practices?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2007/10/10/a-trading-strategy-is-an-option/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

