<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hack the market &#187; technology</title>
	<atom:link href="http://www.puppetmastertrading.com/blog/index.php/category/technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.puppetmastertrading.com/blog</link>
	<description>Algorithmic trading experiences</description>
	<lastBuildDate>Wed, 21 Apr 2010 23:11:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>head in the clouds</title>
		<link>http://www.puppetmastertrading.com/blog/2010/04/21/head-in-the-clouds/</link>
		<comments>http://www.puppetmastertrading.com/blog/2010/04/21/head-in-the-clouds/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 23:10:29 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[back-testing]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=1219</guid>
		<description><![CDATA[or: scaling with elastic map-reduce
Between a rapidly evolving compute environment in which cores are multiplying like springtime rabbits, and a business domain in which the fecundity of market data is making those same rabbits look downright prudish, we are always looking for ways to scale our efforts.  There are three levels at which this can [...]]]></description>
			<content:encoded><![CDATA[<p><strong><img class="alignright" src="/images/headintheclouds.jpg" alt="" width="200" height="259" />or: scaling with elastic map-reduce</strong></p>
<p>Between a rapidly evolving compute environment in which cores are multiplying like springtime rabbits, and a business domain in which the fecundity of market data is making those same rabbits look downright prudish, we are always looking for ways to scale our efforts.  There are three levels at which this can typically be done:</p>
<ul>
<li>&#8220;below the processor&#8221; with things like CUDA or FPGAs,</li>
<li>&#8220;amongst the cores&#8221; with things like TBB, Cilk, &amp;tc, or</li>
<li>&#8220;in the cloud&#8221; (or grid) with things like Amazon&#8217;s EC2 and its bewildering and fast-growing coterie of associated technologies and products</li>
</ul>
<p>Today we&#8217;ll look at a simple example which we implement on top of Hadoop and then deploy into Amazon&#8217;s cloud to get a back-of-the-envelope feel for what kind of scaling we might expect to gain &#8211; and at what costs &#8211; from using this smorgasbord of technologies.</p>
<p><span id="more-1219"></span></p>
<p><strong>&#8216;smorgasbord&#8217; is being generous</strong></p>
<p>Probably the hardest thing whenever you look at some new set of technologies is just getting your bearings.  Amazon&#8217;s stack is complex and very much a moving target, so I&#8217;ll start by providing a bit of a taxonomy of the beasties we&#8217;ll need to at least familiarize ourselves with in order to get anything done.</p>
<p>First, there&#8217;s the family of <a title="AWS" href="http://aws.amazon.com/" target="_blank">Amazon&#8217;s Web Services</a> (AWS).  This has been around for a good few years, so some parts of it are quite mature and well documented and understood where others are bleeding edge with an emphasis on bleeding.  All of them can be played with quite easily by simply signing up (this involves providing a credit card, so head&#8217;s up).  I can&#8217;t think of any more interesting family of technologies, so if you think there&#8217;s any possibility these technologies might help you scale, then I encourage you to sign up and give it a spin.  Very interesting stuff.</p>
<p>Anyway, within AWS we&#8217;ll play with Elastic Compute Cloud (EC2), Elastic MapReduce (EMR), Simple Storage Service (S3) and Elastic Block Store (EBS).</p>
<p>EC2 provides you with direct access to Amazon&#8217;s racks of humming servers in the cloud.  Thanks to the miracles of modern virtualization technologies, an entire machine can be bundled up as an image file &#8211; an Amazon Machine Image (AMI &#8211; think of an OS&#8217;s DVD but with software that you specify, configured as you please). Once bundled into an AMI, your computer can be &#8216;installed&#8217; onto any number of machines within the cloud.  Nifty.</p>
<p>Such machines can be bought for long term use (at quite affordable rates), or they can be used on an as-needed basis for an hourly charge.  Hence the &#8216;Elastic&#8217; in EC2.  They can also be managed programmatically, and until recently, this was pretty much the only way to develop jobs in the cloud.</p>
<p>Each machine that you use in EC2 has some local storage associated with it, but this is not persistent across sessions.  Once your instance goes down, the data is all gone.  To get around this and to provide an all-around storage solution, AWS provides S3 which is, to me, an overly-simplified distributed file system which is something of a pain to use.  It is also limited to file sizes of 5G which I find to be a significant annoyance.  The good news is that S3 seems to be both highly durable and reasonably performant.</p>
<p>A more recent entry in the AWS world of storage is the EBS facility.  Here, you can create arbitrary sized &#8220;disks&#8221; in the cloud which can be attached directly to your computer nodes just as though they were local disks.  Very handy.  They don&#8217;t have the same level of durability as S3 as they&#8217;re not distributed, but I haven&#8217;t had any difficulties with them and they&#8217;re very useful for putting together AMIs within the cloud.</p>
<p>Finally, EMR brings the Hadoop implementation of Google&#8217;s marvelous distributed map-reduce model into the AWS cloud.  If you haven&#8217;t read about map-reduce, well perhaps you should &#8211; it&#8217;s really powerful.  <a title="Hadoop" href="http://hadoop.apache.org/" target="_blank">Hadoop</a> is an open-source implementation of it done in Java and seems quite nice though it&#8217;s under very active development, so the APIs are still very much a work in progress.  Be prepared to change your code.  Even though it&#8217;s written in Java, any language can be used to implement your own applications, and it seems as though both ruby and python are used pretty extensively along with java and c/c++.</p>
<p><strong>what map-reduce buys you</strong></p>
<p>The first thing map-reduce buys you is something of a headache as fitting your application design into the map-reduce model isn&#8217;t so obvious.  That said, it may well be worth a couple aspirin as it brings a great deal to the game.<strong> </strong></p>
<p>Consider an application we might find interesting which we&#8217;d like to place in the cloud &#8211; strategy back-testing. It seems to be a very natural fit for parallel processing as each strategy is independent of the others, so theoretically we should be able to run each one on its own virtual server without interdependencies.  While this is true, it&#8217;s rather easier said than done.  In order to implement our distributed backtesting platform, we first need to implement one that works on one node.  Fair enough.  We must then introduce a &#8220;chunker&#8221; which will break the overall set of simulations into bite-sized chunks.  We must then define an intermediate format for sending the strategies (or instructions) to each node.  And we must define an intermediate format for the results.  And then we need to implement an &#8220;assembler&#8221; which coalesces all of the results back into one result set which can be reported on or displayed etc.  And if one of the nodes fails, we have to notice it.  And if we notice that a node has failed, we need to figure out which job(s) it was responsible for and reallocate them to another node which presumably hasn&#8217;t failed&#8230;</p>
<p>And on and on.  The point is that this isn&#8217;t something that you&#8217;re going to get done in an afternoon.  Indeed, it&#8217;s unlikely that you could write a correct and complete specification of the system in an afternoon.</p>
<p>What map-reduce buys you is all of the distribution bits that I mention above.  The costs are that you will have to figure out how to model things such that they&#8217;re compliant with the map-reduce model and that the end result might not be as efficient as a hand-coded solution.  But when you can scale easily and cheaply, then raw efficiency may not be such an over-arching concern.</p>
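<p>To make the shape of the model concrete, here is a minimal single-process sketch in C++.  It is not the Hadoop API: every name in it (<strong>map_record</strong>, <strong>reduce_values</strong>, <strong>run_job</strong>) is hypothetical, and the driver merely stands in for the chunking, shuffling and assembling that the framework would otherwise do across many nodes.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout: this mimics the shape of map-reduce in one
// process and is not the Hadoop API.
using KeyValue = std::pair<std::string, long>;

// map: turn one input record into zero or more (key, value) pairs.
// Here a "record" is a line of space-separated tokens and we emit (token, 1).
std::vector<KeyValue> map_record(const std::string& record) {
  std::vector<KeyValue> out;
  std::string token;
  for (char c : record) {
    if (c == ' ') {
      if (!token.empty()) out.emplace_back(token, 1);
      token.clear();
    } else {
      token.push_back(c);
    }
  }
  if (!token.empty()) out.emplace_back(token, 1);
  return out;
}

// reduce: combine all values that share a key into a single value.
long reduce_values(const std::vector<long>& values) {
  long sum = 0;
  for (long v : values) sum += v;
  return sum;
}

// The driver stands in for what the framework does for us: feeding records to
// map, grouping the emitted pairs by key, and assembling the final result.
std::map<std::string, long> run_job(const std::vector<std::string>& records) {
  std::map<std::string, std::vector<long>> shuffled;
  for (const std::string& r : records)
    for (const KeyValue& kv : map_record(r))
      shuffled[kv.first].push_back(kv.second);

  std::map<std::string, long> result;
  for (const auto& entry : shuffled)
    result[entry.first] = reduce_values(entry.second);
  return result;
}
```

<p>Only the two small functions at the top are application code; everything in the driver is the part you would otherwise have to write, distribute and make fault tolerant yourself.</p>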
<p><strong>scaling in the cloud with elastic map-reduce<br />
</strong></p>
<p>In order to get an idea how difficult it is to actually scale a solution, I implemented a simple program using real data and collected some timings across a variety of AWS/EMR configurations. To do this, I first had to code against Hadoop v0.18.3 which is what is supported in AWS.  (Initially I wrote it to v0.20.2 and was surprised at how much I had to change to back-port it to 0.18.3.)</p>
<p>The details of my test program aren&#8217;t important &#8211; basically counting various features from a set of heterogeneous files.  The files are approximately the same size ~1.8G and there were 10 of them, so the overall dataset is a bit over 18G.  I only do one map-reduce; I&#8217;m not chaining them together.  On a local machine &#8211; a pre-nehalem, dual-chip w/ quad-cores (8 total) xeon server with hardware raid &#8211; the process took 1068s or a touch under 18 minutes.</p>
<p>I was impressed to see that with absolutely no code changes, I was able to run the system on EMR without difficulties.  In fact, selecting the number of nodes to run against and the size of those nodes is just a matter of changing launch parameters.  Very nice.  AWS offers a variety of different specs on their servers.  I tested the classic &#8216;small&#8217; server which provides a 32-bit O/S, one core and 1.7G of memory as well as the &#8216;large&#8217; server which comes with a 64-bit O/S and 2X2 cores and 7.5G of memory.  There are other configurations available, but I didn&#8217;t test them.  In all cases I was running centos5.4.</p>
<p>The results are tabulated and graphed below.</p>
<div class="wp-caption aligncenter" style="width: 441px"><img src="/images/emr-timings.jpg" alt="" width="431" height="181" /><p class="wp-caption-text">performance locally and in various cloud configurations </p></div>
<div class="wp-caption aligncenter" style="width: 420px"><img src="/images/emr-timings-chart.jpg" alt="" width="410" height="228" /><p class="wp-caption-text">time to complete vs. # nodes for large &amp; small nodes</p></div>
<p>Scaling isn&#8217;t exactly linear, but it is good and it is simple and it is cheap &amp; on-demand. Clearly for any given use-case, it will make sense to do tests like this to figure out what makes the most sense.  Another dimension to consider is cost.  Even partial hours count as a full hour, so putting big or many boxes against jobs that don&#8217;t take long to run isn&#8217;t cost effective.  That said, even though I fecklessly ran jobs which took as little as 10 minutes while paying for 60, the entire cost associated with developing and running the example I present here was less than $4US.</p>
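<p>The round-up-to-the-hour rule is easy to put into numbers.  The following is a minimal sketch; the function names are hypothetical and any rate passed in is a made-up illustration, not Amazon&#8217;s actual price list.</p>

```cpp
#include <cassert>

// Billable hours for one node: any started hour is charged in full.
long billable_hours(long runtime_seconds) {
  assert(runtime_seconds >= 0);
  return (runtime_seconds + 3599) / 3600;
}

// Total cost of a job in cents. The per-node hourly rate is supplied by the
// caller and is purely illustrative here.
long job_cost_cents(long runtime_seconds, int nodes, long cents_per_node_hour) {
  return billable_hours(runtime_seconds) * nodes * cents_per_node_hour;
}
```

<p>With a hypothetical rate of 10 cents per node-hour, a 10-minute job on eight nodes is billed as eight full node-hours, exactly as much as a 60-minute job on the same eight nodes.</p>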
<p>One thing I noticed is that the time to initialize instances seems to grow as you increase the number of nodes, though I didn&#8217;t quantify this.  In some cases, it was really quite long before real work started getting done &#8211; several minutes at least.  Getting this to work for interactive processes will take some thought and, probably, some dedicated nodes.  Another thing to note is that although my local server did a fine job for itself, if we were to increase the dataset by a factor of, say 100, we&#8217;d reach a point where it simply couldn&#8217;t complete the operation whereas the cloud isn&#8217;t so strictly bounded.  It&#8217;s also pretty interesting that my local machine with 8-cores and 24G of RAM and pretty decent RAID subsystem got soundly spanked by eight of Amazon&#8217;s cheapie 1-core boxes.</p>
<p>Given that it is cheap, massively and simply scalable and not terribly difficult, one could say that if you don&#8217;t spend some time with your head in the clouds, then it&#8217;s possible you&#8217;ve got it stuck in the sand.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2010/04/21/head-in-the-clouds/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>lock free</title>
		<link>http://www.puppetmastertrading.com/blog/2010/02/16/lock-free/</link>
		<comments>http://www.puppetmastertrading.com/blog/2010/02/16/lock-free/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 00:55:38 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=1010</guid>
		<description><![CDATA[
One of the recurring technology themes in these pages has been the ongoing and dramatic move from single to multi-core systems and the need to seriously increase the parallelism in our software designs. For me, one of the seminal, large-grained design patterns was the SEDA Architecture. For years, this informed my systems&#8217; designs and formed [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright" style="margin-left: 5px; margin-right: 5px;" src="/images/lockfree.jpg" alt="" width="310" height="324" /></p>
<p>One of the recurring technology themes in these pages has been the ongoing and dramatic move from single to multi-core systems and the need to seriously increase the parallelism in our software designs. For me, one of the seminal, large-grained design patterns was the <a title="SEDA Architecture" href="http://www.eecs.harvard.edu/~mdw/proj/seda/" target="_blank">SEDA Architecture</a>. For years, this informed my systems&#8217; designs and formed a conceptual backbone for development. That said, I&#8217;ve been broadly aware for some time that SEDA&#8217;s golden age has (incredibly!) already passed us by, but haven&#8217;t identified what might replace it as a reference point for my design efforts.</p>
<p>Before considering tools, languages or patterns that might help, we need to reflect on the problem(s) we&#8217;re trying to solve. The problems inside an EMS look to me, after years of development, a lot like network routing problems. Indeed, my current view is that this (not just concurrency as I&#8217;d suggested at the time) is why the <a title="FT: Aleynikov indicted" href="http://www.ft.com/cms/s/0/235b1260-1776-11df-87f6-00144feab49a.html" target="_blank">unfortunate</a> Aleynikov &amp; co. at GS were <a title="the other interesting thing about the serge aleynikov story" href="http://www.puppetmastertrading.com/blog/2009/07/08/the-other-interesting-thing-about-the-serge-aleynikov-story/" target="_blank">using erlang</a>.</p>
<p>Why network routing? Think about the load on an EMS. The main issue is that you&#8217;re getting many thousands of teeny little messages per second and only a relatively small number of them matter to only a relatively small subset of &#8216;agents&#8217; within the system. Reducing latency is all about making sure the time you spend on each message is minimized, and that the agents who are interested in a particular message needn&#8217;t wait for each other to do whatever they care to do based on the message. So, really you&#8217;re trying to route each message through your system with as few &#8216;hops&#8217; as possible and as much parallelism as you can muster under the (radically!) new assumption that you may have hundreds or thousands of cores available to you during the lifetime of the design.</p>
<p>I spent some time thinking (hoping) that languages might help furnish an answer. Perhaps a move to a functional language like erlang, ocaml or scala might help furnish at least a partial answer. But erlang is slow and peculiar, ocaml doesn&#8217;t support intra-process concurrency and scala looks like a bloated language on a bloated platform (jvm+java class library). And none of them seem to have achieved anything near the critical mass which is so crucial for the development of usable libraries and the availability of skilled developers with long experience in the technology.  Naturally, reasonable people will disagree about such things, but this is my view (today). Java is ok (and certainly sells servers), but it&#8217;s not obvious how it&#8217;s going to help me offload my work onto a GPU anytime soon (and jni is both painful and slow) and I&#8217;ve never been able to get comfortable with just how damn big VMs get.  Image size isn&#8217;t free and if we&#8217;re looking to go deep into the sub-millisecond response time, <em>while running thousands of concurrent strategies</em>, it seems we need to disintermediate the VMs and interpreters of the world. If they&#8217;re really necessary, they can be happily used for the analysis process (as I currently use R), or they can be lit-up and bridged from some lower-level language for batch-like services.</p>
<p>The <a title="greatest technology advertisement ever?" href="http://www.youtube.com/watch?v=jqLPHrCQr2I" target="_blank">good people</a> at Intel have been thinking about this problem for a while as have many other seriously over-educated people. One of the (sensible sounding) conclusions reached as people look for ways to solve problems similar to my own, is that in such systems we should keep messages waiting as little as possible &#8211; ideally, not at all(!). This can be a problem in SEDA-like architectures which are basically made-up of (non-blocking, asynchronous) i/o processes linked to (blocking) queues linking pools of workers. Blocking queues can pile up and cause all sorts of problems like priority inversion and other such enigmatically named nasties. Lock-free queues and other data structures, algos and techniques promise some ways around this and I&#8217;ve been spending time looking into how they might be employed to address my issues.</p>
<p>Before I&#8217;m besieged by throngs of angry erlang/ocaml/scala/java developers, allow me one last observation on the topic.  (Peeved python and ruby users may rant away &#8211; vous m&#8217;amusez  ;^)</p>
<p>Why might a lock free algorithm be better than an equivalent, hardware-based locking implementation?  The answer isn&#8217;t obvious.  If locking is implemented in hardware as is typical (eg, with a compare-and-swap (CAS) instruction), then its explicit cost is measurable in (few) nanoseconds.  Hardware is fast.  The issue isn&#8217;t the speed of execution of the underlying primitives so much as it&#8217;s a consequence of the side effects of these operations at a very low level.  For real performance, cache coherence is King.  See <a title="McKenney: Memory Barriers: a Hardware View for Software Hackers" href="/images/hwViewForSwHackers.pdf" target="_blank">here</a> for an accessible discussion by IBM&#8217;s Paul McKenney and <a title="Ostrovsky: gallery of processor cache effects" href="http://igoro.com/archive/gallery-of-processor-cache-effects/" target="_blank">here</a> for some remarkable examples from Igor Ostrovsky.  This indicates that if you want the highest possible performance, you need to be aware of what is happening &#8216;in the metal.&#8217;  So we need to use a system-level language and erlang, java &amp; friends lose their candidacy in spite of any fantastic benefits they might offer.</p>
<p>Given that even the DoD has mostly given up on Ada, we&#8217;re left with C/C++.</p>
<p>Ok, so language doesn&#8217;t seem to resolve much for us. (Indeed, it was mostly hopeful thinking on my part – design is mostly language agnostic and hardware is hardware&#8230;)</p>
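<p>For reference, this is roughly what the compare-and-swap primitive mentioned above looks like from C++.  It is a minimal sketch assuming a compiler that provides <strong>std::atomic</strong>; the function name <strong>update_max</strong> is hypothetical, and the example (keeping a high-water mark, such as the largest sequence number seen) is only an illustration of the retry loop that most lock-free code is built from.</p>

```cpp
#include <atomic>
#include <cassert>

// Raise `target` to `candidate` if candidate is larger, without taking a lock.
// Returns true if this call changed the stored value.
bool update_max(std::atomic<long>& target, long candidate) {
  long seen = target.load(std::memory_order_relaxed);
  while (seen < candidate) {
    // compare_exchange_weak stores candidate only if target still equals seen.
    // On failure it reloads the current value into seen and we retry.
    if (target.compare_exchange_weak(seen, candidate,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

<p>No thread ever blocks here, but every successful exchange still forces the cache line holding <strong>target</strong> to move between cores, which is the side effect the discussion above is concerned with.</p>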
<p>Apart from Intel&#8217;s own <a title="Intel's Threading Building Blocks" href="http://www.threadingbuildingblocks.org/" target="_blank">Threading Building Blocks (TBB) framework</a>, there are a variety of toolkits available for exploiting lock free parallelism. Perhaps the newest and least known is called <a title="FastFlow parallel programming framework" href="http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about" target="_blank">FastFlow</a>, which is a C++ template library that provides a variety of facilities for writing efficient lock-free network models. It also claims to be faster than TBB, <a title="MIT's Cilk" href="http://supertech.csail.mit.edu/cilk/" target="_blank">Cilk</a> and <a title="OpenMP" href="http://openmp.org/wp/" target="_blank">OpenMP</a> while holding out the promise of one day becoming CUDA- (or more generally, GPU-) aware which would be an incredible win. Finally, it is very small &#8211; the current version (not including tests and examples), weighs in at ~5K lines of (mostly) C++ templates.  Thus, it seems to me particularly well-suited for some experimentation to assess the fit of these techniques in this space and the level of difficulty of doing so.</p>
<p>In the remainder of this post, I&#8217;ll briefly describe the FF design and then illustrate a sample C++ program which uses FastFlow to &#8216;architecturally prototype&#8217; a feed handler interacting with strategies inside an EMS / <a title="containing a strategy" href="http://www.puppetmastertrading.com/blog/2009/08/19/containing-a-strategy/" target="_blank">strategy container</a>.</p>
<p><span id="more-1010"></span></p>
<div class="wp-caption aligncenter" style="width: 509px"><img src="/images/fastflow_architecture.png" alt="FastFlow architecture" width="499" height="382" /><p class="wp-caption-text">FastFlow architecture</p></div>
<p>FastFlow&#8217;s (FF) goals seem quite ambitious, but the design is layered to allow users to work at a level that makes sense for the problems they are trying to solve, so they needn&#8217;t worry themselves (or even understand necessarily) layers below where they need to work.  At bottom is a thin layer of hardware-specific code supporting, one level up, single-producer-single-consumer (SPSC) lock-free queues and a threading model for their efficient interaction.  Above this are defined objects for generalizing from simple to arbitrary network types.  Here you see the definition of a composable Node which serves as the key abstraction for general-purpose streaming networks.  Above this are defined a variety of skeleton templates for building commonly useful graph types.  Because they are Nodes themselves, they can be nested and coupled to any depth.</p>
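<p>To give a feel for what an SPSC lock-free queue involves, here is a minimal bounded ring buffer.  This is a sketch under my own assumptions and not FastFlow&#8217;s implementation: the class name <strong>SpscQueue</strong> is hypothetical, it relies on <strong>std::atomic</strong>, and it ignores refinements such as padding the indices onto separate cache lines.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Bounded single-producer/single-consumer queue of pointers.
// Exactly one thread may call push() and exactly one thread may call pop().
class SpscQueue {
public:
  explicit SpscQueue(std::size_t capacity)
      : slots_(capacity + 1, nullptr), head_(0), tail_(0) {
    assert(capacity > 0);
  }

  // Producer side: returns false instead of blocking when the queue is full.
  bool push(void* item) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (tail + 1) % slots_.size();
    if (next == head_.load(std::memory_order_acquire)) return false;  // full
    slots_[tail] = item;
    tail_.store(next, std::memory_order_release);  // publish the filled slot
    return true;
  }

  // Consumer side: returns false instead of blocking when the queue is empty.
  bool pop(void*& item) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
    item = slots_[head];
    head_.store((head + 1) % slots_.size(), std::memory_order_release);
    return true;
  }

private:
  std::vector<void*> slots_;        // one slot stays empty to tell full from empty
  std::atomic<std::size_t> head_;   // next slot to read, written by the consumer only
  std::atomic<std::size_t> tail_;   // next slot to write, written by the producer only
};
```

<p>Because each index has exactly one writer, no compare-and-swap is needed at all; that restriction is what makes the single-producer-single-consumer case such an attractive building block for larger networks.</p>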
<p>Consider a simple pipeline:</p>
<div class="wp-caption aligncenter" style="width: 300px"><img title="pipeline" src="/images/ff_pipeline.png" alt="" width="290" height="60" /><p class="wp-caption-text">a simple FastFlow pipeline</p></div>
<p>A farm is slightly more complex and acts something like a load balancing threadpool.  The Emitter pushes &#8216;tasks&#8217; of arbitrary type to an arbitrary number of workers who then coalesce back into an (optional) Collector.  A simple Farm makes no promises about the ordering of tasks, but (I believe) FF provides mechanisms for ensuring their ordering at the collector.</p>
<div class="wp-caption aligncenter" style="width: 300px"><img title="FastFlow Farm" src="/images/ff_farm_with_coll.png" alt="" width="290" height="140" /><p class="wp-caption-text">a FastFlow Farm</p></div>
<p>Finally, because of the composable nature of Nodes, all of these graph types can be combined arbitrarily.  Here you see a Farm of pipelines with an &#8216;accelerator&#8217; (which, if I understand correctly is just an integrated thread for pushing tasks into the network) and a feedback channel.</p>
<div class="wp-caption aligncenter" style="width: 340px"><img title="Composition of subnets" src="/images/ff_composition2.png" alt="" width="330" height="140" /><p class="wp-caption-text">composition of FastFlow subnets</p></div>
<p>For a more complete (and probably accurate!) description of FastFlow, I encourage you to visit their site and <a title="FastFlow Tutorial" href="http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:usermanual" target="_blank">tutorial</a>.  The documentation remains limited, but there&#8217;s a good and growing selection of examples to examine and all of the source is available for inspection.</p>
<p><strong>a simple example</strong></p>
<p>Given my use-case, my first interest is in seeing how FastFlow might work in the context of asynchronous, non-blocking network handling.  To this end, I&#8217;ve built a simple Farm as depicted above in which the Emitter uses <strong>select()</strong> to read messages off the network which it parses and pushes into the farm for &#8216;processing&#8217; which is currently <a title="Wiki: NOP" href="http://en.wikipedia.org/wiki/NOP" target="_blank">NOP</a>-ed.  The messages/tasks are then coalesced into a collector which does nothing more than free memory allocated in the Emitter.  Within the Farm, each of the nodes inhabits its own thread and outside of the farm I&#8217;ve implemented a very simple &#8216;client&#8217; which connects to the Emitter&#8217;s socket and sends delimited Integral &#8216;messages.&#8217;  Ordinarily this would appear in its own process, but for the sake of only requiring 1 file, I&#8217;ve mashed them all together and the client lives in its own thread.  That&#8217;s it.</p>
<p>It turns out that the FastFlow part of this is, by far, the simplest.  Since the FF skeletons are implemented as templates, extending them couldn&#8217;t be much easier.  Here&#8217;s the definition of the worker and collector components.</p>
<pre class="brush: cpp;">
// typical worker: does little ;^p
class Worker: public ff_node {
public:
  void * svc(void * task) {
    return task;
  }
};

// collector that just frees the malloc'ed memory
class freeing_collector: public ff_node {
public:
  void * svc(void * task) {
    int * t = (int *)task;
    if (*t == -1) return NULL;
    else free(task);
    return GO_ON;
  }
};
</pre>
<p>They are both nodes and their only requirement is to implement the <strong>svc</strong> method which is called by the FF framework when they are meant to service a task.  They can optionally implement <strong>svc_init</strong> and <strong>svc_end </strong>methods which provide an opportunity for initialization and cleanup.  The socket-reading emitter makes use of the <strong>svc_init</strong> method to initially setup the socket.  Later, as it repeatedly reads from the socket into a buffer, it cracks the incoming stream and then calls the <strong>ff_send_out()</strong> method to push those messages out to the downstream workers.  The whole example is about 350 lines of code and nearly 200 lines are consumed by the Emitter.  Almost all of this bulk is network-related, so I won&#8217;t illustrate here.  But highlighted here you can see the main FF-specific part of the Emitter:</p>
<pre class="brush: cpp; highlight: [16,17,18,19];">
 // handle new data coming in
 void new_data(int nbytes, const char* buf) {
   static char remainder[kMaxMsgSz];
   int *t;
   int j = 0;
   int i = 0;
   char msgbuf[kMaxMsgSz];
   for (; i &lt; nbytes; i++) {
     if ( kDelimiter == buf[i] ) {
       int rlen = (j==0) ? strlen(remainder) : 0; // remainder?
       int len = i-j;
       if (rlen&gt;0) strncpy(msgbuf,remainder,rlen);
       int start = (rlen==0) ? 0 : rlen;
       strncpy(&amp;(msgbuf[start]),&amp;(buf[j]),len);
       msgbuf[len+rlen] = '&#92;&#48;';
       t = (int *)malloc(sizeof(int));
       *t = atoi(msgbuf);
       j = i;
       ff_send_out(t);  // push message into farm
     }
   } // for

   int k = j+1;
   if (k &lt; i) { // save 'remainder'
     strncpy(remainder,&amp;(buf[k]),(i-k));
     remainder[i-k] = '&#92;&#48;'; // strncpy does not terminate a truncated copy
   } else remainder[0] = '&#92;&#48;';
 }
</pre>
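<p>The remainder handling is the fiddly part of any stream parser, so here is the same idea in isolation.  This is a simplified sketch rather than the code from the example: the class name <strong>MessageFramer</strong> is hypothetical, and it uses a <strong>std::string</strong> in place of the fixed-size buffers above, trading a little allocation for not having to track offsets by hand.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Splits a byte stream into delimiter-terminated integer messages, carrying
// any trailing partial message over to the next call.
class MessageFramer {
public:
  explicit MessageFramer(char delimiter) : delimiter_(delimiter) {}

  // Feed the bytes from one read; returns the messages completed by it.
  std::vector<int> feed(const char* buf, std::size_t nbytes) {
    std::vector<int> complete;
    for (std::size_t i = 0; i < nbytes; ++i) {
      if (buf[i] == delimiter_) {
        if (!pending_.empty()) complete.push_back(std::atoi(pending_.c_str()));
        pending_.clear();
      } else {
        pending_.push_back(buf[i]);
      }
    }
    return complete;
  }

  // Bytes received so far that do not yet form a complete message.
  const std::string& pending() const { return pending_; }

private:
  char delimiter_;
  std::string pending_;
};
```

<p>In the Emitter, each integer returned by <strong>feed()</strong> would be copied into a malloc&#8217;ed int and handed to <strong>ff_send_out()</strong>, just as in the highlighted lines above.</p>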
<p>Finally, all of the pieces are assembled and launched in the<strong> main()</strong>.</p>
<pre class="brush: cpp;">
//  We construct a 'server' which is a fastflow emitter and which reads
//    integral 'msgs' from a client over a non-blocking socket.  Msgs
//    are parsed and fed into a fastflow farm for further handling.
//    The client is placed within its own thread - this is only so the
//    whole example can be placed in one file.
//
int main(int argc, char * argv[]) {
 // use: argv[0] &lt;#msgs=1024&gt; &lt;#port=9999&gt;

 int msgs = (argc &gt; 1) ? atoi(argv[1]) : 1024;
 int port = (argc &gt; 2) ? atoi(argv[2]) : 9999;

 int nworkers = 5;               // how many workers will recv msgs

 printf(&quot;main: sending #%d msgs to port &lt;%d&gt;\n&quot;,msgs,port);

 select_reader sr(port);         // we create 'server'
 freeing_collector fc;           // and a freeing collector
 ff_farm&lt;&gt; farm;                 // and a farm for it to live in
 farm.add_emitter(&amp;sr);          // add both to the farm
 farm.add_collector(&amp;fc);

 std::vector&lt;ff_node *&gt; workers; // build a collection of workers
 for(int i =0; i &lt; nworkers; i++) { workers.push_back(new Worker); }

 farm.add_workers(workers);      // add all workers to the farm

 farm.run();                     // launch the farm
 printf(&quot;main: started farm.\n&quot;);

 sr.wait_til_ready();            // don't create client til srvr ready

 // create client which will send msgs via socket
 //
 printf(&quot;main: creating client...\n&quot;);
 client_thread* client = new client_thread(msgs,port);
 if (client != NULL) client-&gt;join();

 farm.wait();                    // wait for farm to finish its workload

 // emit some stats
#if defined(TRACE_FASTFLOW)
 std::cout &lt;&lt; &quot;DONE, time= &quot; &lt;&lt; farm.ffTime() &lt;&lt; &quot; (ms)\n&quot;;
 farm.ffStats(std::cout);
#endif

 return 0;
}
</pre>
<p>The whole example can be found <a title="FF networking example: C++ source" href="/images/ttest.cpp" target="_blank">here</a>.</p>
<p>Writing this example was a bit of a &#8216;tale of two frameworks&#8217; for me as I found the FastFlow part to be really easy and powerful.  On the other hand, I wanted to try out the <a title="boost::asio" href="http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio.html" target="_blank">boost::asio</a> framework for non-blocking networking i/o (primarily in the hope it might cut down the code size a bit) and while implementing the trivial (blocking) client was, well, trivial, implementing a non-blocking server with boost::asio was a serious pain and I ultimately decided that <a title="Linus' screed on C++ and boost" href="http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918" target="_blank">Linus just might be on to something wrt boost</a> and scrapped that part of the effort (but kept the boost client).</p>
<p>Nothing about this example illuminates the potential performance advantages of a lock free approach, but this exercise has certainly convinced me that the FF authors have managed to combine very low-level performance considerations within a package which can be easily used from a high-level perspective.  Most problems will most likely be better served by higher-level programming languages, but some problems remain intractably bound to the metal and I suspect that writing ultra-low-latency trading systems will remain in the latter class for some time to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2010/02/16/lock-free/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Kooderive</title>
		<link>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/</link>
		<comments>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 15:31:20 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[monte-carlo methods]]></category>
		<category><![CDATA[open-source software]]></category>
		<category><![CDATA[options pricing]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=1000</guid>
		<description><![CDATA[Some time back, I&#8217;d written about NVidia&#8217;s CUDA noting that it looked ideal for many asset-pricing and monte-carlo type problems in finance.  At the time, I was hopeful that it would be quickly integrated into existing open source efforts like QuantLib, but adoption has proved slower than I&#8217;d hoped, most likely because implementing non-trivial problems [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 240px"><img class="   " src="/images/cuda_simonRogerson.jpg" alt="photo by Simon Rogerson" width="230" height="173" /><p class="wp-caption-text">photo by Simon Rogerson</p></div>
<p>Some time back, I&#8217;d <a title="TESLA &amp; CUDA" href="http://www.puppetmastertrading.com/blog/2008/11/29/nvidias-tesla-and-the-compute-unified-device-architecture/" target="_blank">written</a> about NVidia&#8217;s CUDA noting that it looked ideal for many asset-pricing and monte-carlo type problems in finance.  At the time, I was hopeful that it would be quickly integrated into existing open source efforts like <a title="QuantLib: a free/open-source library for quantitative finance" href="http://quantlib.org/" target="_blank">QuantLib</a>, but adoption has proved slower than I&#8217;d hoped, most likely because implementing non-trivial problems on CUDA is, well, even less trivial than doing them without.</p>
<p><strong>LMM on CUDA</strong></p>
<p>Happily, I&#8217;ve just seen a promising first step in this direction as Über-quant and C++ artisan <a title="Mark Joshi" href="http://www.markjoshi.com/" target="_blank">Mark Joshi</a> recently announced an open-source project, <a title="Sourceforge: Kooderive" href="http://sourceforge.net/projects/kooderive/" target="_blank">Kooderive</a>, which looks to implement the <a title="Wiki: LMM" href="http://en.wikipedia.org/wiki/LIBOR_market_model" target="_blank">LIBOR Market Model</a> (LMM) on top of CUDA.  His announcement on the QuantLib mailing lists reads:</p>
<blockquote><p>Dear All,</p>
<p>various people have shown interest in the use of CUDA with QuantLib. I<br />
have now made some progress on a CUDA implementation of the LIBOR<br />
market model.</p>
<p>In particular, I now have a path generator for the LMM working which<br />
does 16384 paths for 40 rates, 40 steps, 5 factor model, displaced<br />
diffusion predictor-corrector that takes 0.1 seconds on my Quadro 4600.</p>
<p>The state of the project is code fragments that can be called from<br />
other code. Those who are interested can get the code via<br />
the subversion repository on <a href="http://kooderive.sourceforge.net/" target="_blank">kooderive.sourceforge.net</a> .  The only<br />
project file is currently for VC9 x64. It also uses thrust and the<br />
CUDA SDK.</p>
<p>The next stage will be writing routines, that use QuantLib for the CPU<br />
stuff and kooderive for the GPU stuff,  to actually price things.</p>
<p>A gentle reminder that I will be giving a course on the LMM and<br />
QuantLib in June in London, and I will include a session on kooderive<br />
if there<br />
is sufficient interest.</p>
<p>I am happy to take code contributions for kooderive. However, I am not<br />
looking for a redesign of the library or contributions which introduce<br />
dependence on other libraries. I am interested in contributions of<br />
separate routines and of optimizations of existing routines that do<br />
not change interfaces.</p>
<p>regards</p>
<p>Mark<br />
&#8211;<br />
Pricing exotic interest rate derivatives &#8211; The LIBOR Market Model in<br />
QuantLib June 2010, London,<br />
<a href="http://www.moneyscience.com/training/index.html" target="_blank">http://www.moneyscience.com/training/index.html</a></p>
<p>Assoc Prof Mark Joshi<br />
Centre for Actuarial Studies<br />
University of Melbourne<br />
My website is <a href="http://www.markjoshi.com/" target="_blank">www.markjoshi.com</a></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2010/02/03/kooderive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>core arb</title>
		<link>http://www.puppetmastertrading.com/blog/2009/12/15/core-arb/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/12/15/core-arb/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 15:05:17 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[FIX Protocol]]></category>
		<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=918</guid>
		<description><![CDATA[Cloud computing looks to have turned yet another interesting corner.   This time the turn leads towards the development of a liquid, fully electronic new marketplace in &#8220;spot instances&#8221;.
&#8216;Spot&#8217; means what you would expect it to in the context of trading: the current pricing for immediate delivery of a commodity.  &#8216;Instance&#8217; is the atomic element [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 95px"><img src="/images/core.jpeg" alt="core arbitrage?" width="85" height="127" /><p class="wp-caption-text">FIX interface?</p></div>
<p>Cloud computing looks to have turned yet <a title="Amazon EC2 Spot Instances Blog post" href="http://aws.typepad.com/aws/2009/12/ec2-spot-instances-and-now-how-much-would-you-pay.html" target="_blank">another interesting corner</a>.   This time the turn leads towards the development of a liquid, fully electronic new marketplace in &#8220;spot instances&#8221;.</p>
<p>&#8216;<a title="wikipedia: spot pricing" href="http://en.wikipedia.org/wiki/Spot_price" target="_blank"><em>Spot</em></a>&#8217; means what you would expect it to in the context of trading: the current pricing for immediate delivery of a commodity.  &#8216;<em>Instance</em>&#8217; is the atomic element within Amazon&#8217;s cloud environment; an instance is the smallest chunk of computing capability which can be provisioned within the cloud.</p>
<p><strong>Amazon is making markets in <em>cores</em> and they&#8217;re exposing functionality just as a regular exchange would: both through user interface &#8216;screens&#8217; and through programmable APIs.</strong></p>
<p>From their <a title="spot instances announcement" href="http://aws.amazon.com/ec2/spot-instances/" target="_blank">announcement</a>:</p>
<blockquote><p>Spot Instances enable you to bid for unused Amazon EC2 capacity.  Instances are charged the Spot Price set by Amazon EC2, which fluctuates periodically depending on the supply of and demand for Spot Instance capacity. To use Spot Instances, you place a Spot Instance request, specifying the instance type, the region desired, the number of Spot Instances you want to run, and the maximum price you are willing to pay per instance hour. To determine how that maximum price compares to past Spot Prices, the Spot Price history is available via the Amazon EC2 API and the AWS Management Console. If your maximum price bid exceeds the current Spot Price, your request is fulfilled and your instances will run until either you choose to terminate them or the Spot Price increases above your maximum price (whichever is sooner).</p></blockquote>
<h5>embedded optionality</h5>
<p>While the inclusion of, effectively, a market data service is neat, probably the most interesting aspect of the initial protocol they&#8217;ve designed is that it contains embedded optionality and behaves a bit like <a title="wikipedia: barrier options" href="http://en.wikipedia.org/wiki/Barrier_option" target="_blank">barrier options</a>.  That is, when I set up an &#8216;order&#8217;, I need to specify a maximum price I&#8217;m willing to pay.  When the spot price drops below my max, I get &#8220;knocked-into&#8221; a contract and instances are allocated to me.  If the spot price rises above my max while I&#8217;m running, I get &#8220;knocked-out&#8221; of the contract and my jobs get terminated.</p>
<p>The intent is to allow for low-priority jobs to be dynamically run whenever pricing drops below a user&#8217;s threshold, but the (intended?) consequence is that it adds the <em>delicious and malleable tang of path dependency</em> to these instruments&#8230;</p>
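<p>The knock-in/knock-out behaviour described above can be sketched as a tiny state machine.  This is only an illustration and not Amazon&#8217;s API: <code>SpotState</code> and <code>replay_spot_request</code> are hypothetical names, and the sketch assumes that a request which has been knocked out stays open and can be knocked in again later.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a single spot request; this is not Amazon's API.
enum class SpotState { Waiting, Running };

// Replays a series of spot prices against a maximum bid and returns the
// state of the request after each price observation.
// Assumption: a knocked-out request stays open and may be filled again.
std::vector<SpotState> replay_spot_request(const std::vector<double>& spot_prices,
                                           double max_bid) {
    assert(max_bid > 0.0);
    std::vector<SpotState> states;
    states.reserve(spot_prices.size());
    SpotState state = SpotState::Waiting;
    for (double spot : spot_prices) {
        if (state == SpotState::Waiting && spot < max_bid) {
            state = SpotState::Running;   // "knocked in": spot dropped below the bid
        } else if (state == SpotState::Running && spot > max_bid) {
            state = SpotState::Waiting;   // "knocked out": spot rose above the bid
        }
        states.push_back(state);
    }
    return states;
}
```

<p>Note what happens when the spot price sits exactly at the bid: a request that is already running keeps running, while one that is still waiting does not get filled.  The same price leaves you in a different state depending on how you got there, which is the path dependency in miniature.</p>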
<h5>secondary markets, FIX, arbitrage..?</h5>
<p>Amazon currently controls the market entirely, but it&#8217;s not hard to imagine a secondary market evolving.  Given that others are beginning to copy Amazon&#8217;s APIs, one can also imagine markets which operate across providers &#8230;  perhaps accessed via FIX?&#8230;</p>
<p>Who knows?  In the not-too-distant future, we may well be able to implement &#8216;<strong><em>core arb</em></strong>&#8217; strategies&#8230;or make markets in cores&#8230; or find that we can effectively hedge with disciplined exposure to the &#8216;core market&#8217; or &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/12/15/core-arb/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ready to launch</title>
		<link>http://www.puppetmastertrading.com/blog/2009/11/08/ready-to-launch/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/11/08/ready-to-launch/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 13:08:50 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[back-testing]]></category>
		<category><![CDATA[regime-switching]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=784</guid>
		<description><![CDATA[In this post I&#8217;m going to revisit some of the topics discussed in the recent &#8216;containing a strategy&#8217; and &#8216;multi-strategy trading with regimes&#8217; posts, focusing on the process of assembling a strategy and its context in preparation for its launch into any of a variety of modes.
I recently realized that &#8211; from the perspective of [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 208px"><img src="/images/slightMiscalc.jpg" alt="he wasnt ready..." width="198" height="148" /><p class="wp-caption-text">poor Jorge wasn&#39;t ready...</p></div>
<p>In this post I&#8217;m going to revisit some of the topics discussed in the recent &#8216;<a title="containing a strategy" href="http://www.puppetmastertrading.com/blog/2009/08/19/containing-a-strategy/" target="_blank">containing a strategy</a>&#8217; and &#8216;<a title="multi strategy trading with regimes" href="http://www.puppetmastertrading.com/blog/2009/09/13/multi-strategy-trading-with-regimes/" target="_blank">multi-strategy trading with regimes</a>&#8217; posts, focusing on the process of assembling a strategy and its context in preparation for its launch into any of a variety of modes.</p>
<p>I recently realized that &#8211; from the perspective of a strategy container &#8211; the process of walk-forward testing is remarkably similar to the regime-switching model we&#8217;d discussed previously.  Up until now, I&#8217;ve employed walk-forward testing in an ad-hoc manner by taking an existing strategy and then writing a little driver, very much like a unit-test scaffolding, which would walk the strategy forward, permuting parameters based on previous performance.  Not a general solution, but straightforward as I employ the strategy parameter optimizer from stratbox in this kind of a <em>toolkit</em> use-case.</p>
<p>I sat down to write one of these walk-forward scaffolds yesterday and started to think about how I could generalize the solution and roll it into stratbox&#8217;s GUI and it occurred to me that I could likely kill two birds with one stone&#8230;</p>
<h4><span id="more-784"></span><span style="color: #000000;">walk-forward testing</span></h4>
<p>&#8211;</p>
<p>I imagine there are different ideas/implementations of it, but for me walk-forward testing is the case where you are using repeated historical parameter &#8216;optimizations&#8217; to dynamically modify the parameters on a back-tested strategy.  For example, say we have strategy X with parameters p1 and p2.  Let&#8217;s say we want to walk-forward test it from the beginning of the year until now.  We&#8217;ll need a look-back period for optimizations &#8211; say 20 trading days &#8211; and a frequency with which to apply the results of our optimizations &#8211; say every 5 trading days.  So, on a weekly basis we&#8217;re going to look back at the prior (rolling) month&#8217;s performance and adjust our parameters to match the <em><strong>best</strong></em> performer.  How you define &#8216;best&#8217; is your business but I&#8217;ll use something simple like Sharpe here.</p>
<p>Continuing with our example, we&#8217;ll say that p1 will be permuted 3 ways with values from { 1, 2, 3 } and p2 will be permuted across 4 values { &#8220;a&#8221;, &#8220;b&#8221;, &#8220;c&#8221;, &#8220;abcd&#8221; }.  So we have a basis of 12 strategies for optimization and we&#8217;ll call these O1&#8230;O12.</p>
<p>To perform a walk-forward test on X from 2009/01/01-now, we will begin a parallel back-test on O1..O12 starting 20 trading days <strong><em>before</em></strong> Jan 1.  On Jan 2 we will look at the prior ~month&#8217;s results and pick the strategy who had the highest Sharpe ratio and clone him and start testing him &#8220;walking forward.&#8221;  Each week, we&#8217;ll repeat the process except that instead of cloning the winner as we did the first time, we&#8217;ll just make sure his params are the same as the new winner&#8217;s params.</p>
<p>Repeat this process until &#8216;now&#8217; and you&#8217;re done &#8211; you&#8217;ve performed walk-forward testing on X.</p>
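<p>As a rough sketch of the selection loop just described (not the stratbox implementation): assume each permutation O1..On has already been back-tested into a series of daily returns, that &#8216;best&#8217; means the highest un-annualized Sharpe with a zero risk-free rate, and that all series have the same length.  The names <code>sharpe</code> and <code>walk_forward</code> are made up for the illustration.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-period Sharpe ratio (mean / sample standard deviation) over days
// [begin, end).  Not annualized; risk-free rate assumed to be zero.
// Returns 0 for flat or too-short windows.
double sharpe(const std::vector<double>& r, std::size_t begin, std::size_t end) {
    const std::size_t n = end - begin;
    if (n < 2) return 0.0;
    double mean = 0.0;
    for (std::size_t i = begin; i < end; ++i) mean += r[i];
    mean /= static_cast<double>(n);
    double var = 0.0;
    for (std::size_t i = begin; i < end; ++i) var += (r[i] - mean) * (r[i] - mean);
    var /= static_cast<double>(n - 1);
    return var > 0.0 ? mean / std::sqrt(var) : 0.0;
}

// candidates[k] holds the daily returns of permutation O(k+1); day 0 falls
// `lookback` trading days before the walk-forward start.  For each
// walked-forward day, returns the index of the permutation whose parameters
// X is carrying on that day.
std::vector<std::size_t> walk_forward(const std::vector<std::vector<double>>& candidates,
                                      std::size_t lookback, std::size_t step) {
    assert(!candidates.empty() && lookback >= 2 && step >= 1);
    const std::size_t days = candidates.front().size();
    std::vector<std::size_t> chosen;
    std::size_t best = 0;
    for (std::size_t t = lookback; t < days; ++t) {
        if ((t - lookback) % step == 0) {  // re-optimization day
            best = 0;
            for (std::size_t k = 1; k < candidates.size(); ++k) {
                if (sharpe(candidates[k], t - lookback, t) >
                    sharpe(candidates[best], t - lookback, t)) {
                    best = k;
                }
            }
        }
        chosen.push_back(best);
    }
    return chosen;
}
```

<p>With the numbers from the example, this would be called as <code>walk_forward(returns, 20, 5)</code> over the twelve return series.  Stitching pre-computed returns together like this ignores whatever position X is carrying when its parameters change, so it is cruder than actually re-parameterizing a running strategy.</p>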
<p>&#8211;</p>
<p>Let&#8217;s think about what&#8217;s happening here.  There are 12 strategies running from ~2008/12/1 and another strategy running from 2009/1/2; all strategies run until now.  There&#8217;s also the activity that &#8220;we&#8221; have done during this process.  What we are doing is playing the role of a meta-strategy just as we&#8217;d seen in the regime-switching case.  The walk-forward meta-strategy needs to watch a set of &#8216;optimization&#8217; strategies and permute another target (&#8220;X marks the spot&#8221;) strategy on a regular basis to match parameters with the prior period&#8217;s winner.</p>
<h4><span style="color: #000000;"><strong>differences vs. allocating to strategies with regimes</strong></span></h4>
<p>Let&#8217;s imagine that we&#8217;re going to take the 13 lovingly-named strategies O1&#8230;O12 and X, along with the newly revealed meta-strategy we&#8217;ll dub &#8220;M&#8221;, and put them onto a live environment instead of a historical back-test.  Suddenly we&#8217;re pretty close to what we&#8217;d described previously.  The biggest difference is just the <em>temporality</em> of the strategy context: historical or real-time.  Another difference is that O1&#8230;O<em><strong>n</strong></em> (let&#8217;s generalize just a bit) are no longer necessarily all the same strategy &#8211; they may be completely different. X is also no longer a strategy, but a portfolio of strategies:  a replicated subset of O1&#8230;On we&#8217;ll call X1..Xm.  &#8220;Best&#8221; is no longer a unary choice but must be rich enough to allow M to support some kind of an allocation algorithm across the X family.</p>
<p>OK, so there are some significant differences.  But from the perspective of the object model on which all of this can be implemented, the differences may be limited enough to warrant tackling together.</p>
<h4><span style="color: #000000;"><strong>ready to launch</strong></span></h4>
<p>Before looking at the workflow of a strategy launch, let&#8217;s consider all of the various modes we support with our design:</p>
<ul>
<li>Historical: back-test single strategy</li>
<li>Historical: parameter-based optimization of strategy (decomposes to many back-tests)</li>
<li>Historical: walk-forward testing of strategy</li>
<li>Real-time: SIM &#8211; run single strategy with real-time market data and an exchange simulator</li>
<li>Real-time: LIVE &#8211; run single strategy with real-time market data and a live connection to OMS/broker/exchange</li>
<li>Real-time: LIVE+SIM &#8211; run two cloned instances using the same market data, but with one routing orders to live venue(s) and the other to a simulator.  This is mostly useful for the development of a simulator or to evaluate the quality of simulation for a given strategy.</li>
<li>Real-time: optimization &#8211; run many strategies, permuted differently across live data but routing to a simulator.  I don&#8217;t know what this is useful for, but I find it entertaining&#8230; The screenshots from the <em>regimes</em> post were illustrating this mode.</li>
<li>Real-time: meta-strategy &#8211; our mythic regime-switching allocator will be a nice example of this mode.</li>
</ul>
<p>-</p>
<p>Consider the following sorta-flowchart illustrating the preparation for launch of a strategy.</p>
<div class="wp-caption aligncenter" style="width: 606px"><a href="/images/strategyLaunchFlow.jpg"><img src="/images/strategyLaunchFlow.jpg" alt="" width="596" height="466" /></a><p class="wp-caption-text">click for larger version...</p></div>
<p>The process has 4 or 5 stages depending on whether we&#8217;re being &#8220;meta&#8221; or not &#8211; that is, if we&#8217;re using optimization of any sort.  The highlighted parallelograms are the concrete end-result(s) for each of their respective steps.</p>
<ul>
<li><strong>Assemble Strategy Context</strong> &#8211; The basic channels of I/O for a trading strategy are its sources for market data and the exchanges where trades can be executed.  The MktDataSrc and ExecutionPlatform abstractions need to be specified.</li>
<li><strong>Assemble Strategy</strong> &#8211; Whatever mode we&#8217;re looking to employ will require a strategy either to itself go and trade for us or else to serve as the seed or prototype for our activities.  In this step we define that strategy.  For me, this can be a composite process as a strategy is made up of an arbitrary number of stratparts.  The assembly of a strategy can be done manually with a GUI or by loading a strategy from a file or other store.  Once the parts are identified, they can be bound together to form a strategy instance.</li>
<li><strong>Configure Strategy</strong> &#8211; Once assembled, the strategy needs to be configured.  This may just mean we need to parameterize it appropriately for our use.  If we&#8217;re being meta and require optimization then we&#8217;ll also need to describe the parameters to modify and the ranges for each.  This gets boiled-down into an OptimizationDescriptor which is composed of a prototype strategy and the parameter/range data required to perform the requisite back-tests.</li>
<li><strong>Configure Optimization or Meta-strategy</strong> &#8211; If we&#8217;re not running an optimization or utilizing a meta-strategy, this step is skipped.  Otherwise, this is where the optimization and/or meta-strategy should be built and configured.</li>
<li><strong>Review &amp; Launch</strong> &#8211; Anytime you might loose a piece of software on an account with real money in it, it behooves you to pause and reflect for a moment.  Now is that moment.  If, after that moment of reflection, both you and whatever risk measures you employ agree that it&#8217;s OK, then it&#8217;s time to launch.  Happily, this is easy because we now have everything we need.  Again, the meta cases are a bit more involved as the strategies to be optimized or managed must be first generated based on the optimization descriptor we&#8217;ve inherited from earlier steps.  Then it&#8217;s a matter of assembling the pieces previously selected and configured and away we go.</li>
</ul>
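<p>The generation step mentioned under Review &amp; Launch, where the strategies to be optimized are built from the optimization descriptor, can be sketched as a Cartesian product over the parameter ranges.  The types below are hypothetical stand-ins (stratbox&#8217;s real classes aren&#8217;t shown in this post), and parameters are assumed to be simple name/value strings.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; stratbox's real types are not shown in this post.
using Params = std::map<std::string, std::string>;

struct OptimizationDescriptor {
    Params prototype;  // parameters of the seed (prototype) strategy
    // Parameter name paired with the values to permute it across.
    std::vector<std::pair<std::string, std::vector<std::string>>> ranges;
};

// Expands the descriptor into one parameter set per permutation: the
// Cartesian product of all ranges, each starting from the prototype.
std::vector<Params> expand(const OptimizationDescriptor& d) {
    std::vector<Params> out{d.prototype};
    for (const auto& range : d.ranges) {
        assert(!range.second.empty());
        std::vector<Params> next;
        next.reserve(out.size() * range.second.size());
        for (const Params& base : out) {
            for (const std::string& value : range.second) {
                Params p = base;          // copy everything set so far
                p[range.first] = value;   // override just this parameter
                next.push_back(std::move(p));
            }
        }
        out = std::move(next);
    }
    return out;
}
```

<p>Fed the walk-forward example from earlier (p1 across three values, p2 across four), this yields the twelve parameter sets behind O1&#8230;O12, with any parameters not being permuted carried over unchanged from the prototype.</p>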
<p>-</p>
<blockquote><p>Anytime you might loose a piece of software on an account with real money in it, it behooves you to pause and reflect for a moment.  Now is that moment.</p></blockquote>
<p>I was taught that writing software is easy <em>if</em> you can express the problem you&#8217;re looking to solve <em>and its solution</em> in clear and precise English.  I&#8217;m not sure about this being grammatical, clear or precise, but hopefully it&#8217;s enough to illustrate a means of uniformly modeling and implementing these beasties across a variety of use-cases.</p>
<p>For me, at least, writing this description is helpful as it forces me to walk through the various twists and turns in some detail before digging into the code.  When I get the integrated walk-forward testing working I&#8217;ll try to provide some illustration of its use and will look ahead to developing a regime-switching example using the models/mechanisms we&#8217;ve been discussing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/11/08/ready-to-launch/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>easy money</title>
		<link>http://www.puppetmastertrading.com/blog/2009/10/27/easy-money/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/10/27/easy-money/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 13:35:05 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=698</guid>
		<description><![CDATA[There seems to be a developing meme out there suggesting that algorithmic-, and in particular high-frequency, trading is some kind of gold-rush route to easy money which brings to mind&#8230;
&#8230;this revision of a paper I&#8217;d read previously: &#8220;Statistical Arbitrage in the US Equities Market&#8221; by Avellaneda and Lee.   It&#8217;s a detailed and thoroughly worked [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 252px"><img style="margin: 3px 5px;" src="/images/easyMoney.jpg" alt="" width="242" height="300" /><p class="wp-caption-text">you, hf-trading</p></div>
<p>There seems to be a developing meme out there suggesting that algorithmic-, and in particular high-frequency, trading is some kind of gold-rush route to easy money, which brings to mind&#8230;</p>
<p>&#8230;this revision of a paper I&#8217;d read previously: <a title="Statistical Arbitrage in the US Equities Market by Avellaneda &amp; Lee" href="/images/AvellanedaLeeStatArb20090616.pdf" target="_blank">&#8220;Statistical Arbitrage in the US Equities Market&#8221;</a> by Avellaneda and Lee.   It&#8217;s a detailed and thoroughly worked (and now re-worked) paper illustrating the development and analysis of a US equity stat-arb strategy based on <a title="Principal Component Analysis" href="http://en.wikipedia.org/wiki/Principal_component_analysis" target="_blank">Principal Component Analysis</a> (PCA) and then revised to use ETFs.</p>
<p>I came across this paper as I have still never used PCA in any of my own strategy development work and read Carol Alexander&#8217;s excellent <a title="Market Models, Carol Alexander " href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471899755.html" target="_blank"><span style="text-decoration: underline;">Market Models</span></a> over my summer vacation with an eye towards giving a PCA hedging model a spin in the near-term. Thus, I wanted another look at this paper as a reference point.  Although it&#8217;s an excellent paper, I&#8217;m not going to urge you to go out and read it immediately unless you have a reasonably pressing practical interest.  Instead, I find it interesting largely because of one of its authors &#8211; Professor Avellaneda &#8211; and its conclusions in the form of its strategies&#8217; performance.</p>
<p>I&#8217;ve seen Prof Avellaneda speak a number of times at a variety of quant meetups organized by the relevant <a title="Columbia's Center for Financial Engineering" href="http://www.cfe.columbia.edu/">Columbia</a>/<a title="Courant" href="http://www.cims.nyu.edu/">NYU</a> financial engineering depts.  His paper reminds me that at least once during my noisome adolescent years, my father intoned darkly that:</p>
<blockquote><p><strong>the streets are littered with brilliant minds</strong></p></blockquote>
<p><span id="more-698"></span>Implying that any wits I may believe myself to possess wouldn&#8217;t by themselves be worth much in life and that I&#8217;d need to bring actual <em>tools</em> to the task of solving problems if I wanted to address interesting ones.  Having seen Prof Avellaneda speak, I&#8217;m confident that at my peak, my &#8220;processor&#8221; was never as fast as his.  Much worse, there is no comparison between the tools he can level at a problem and those I can &#8211; he&#8217;s on an entirely different playing field so far as concrete mathematical/analytical capabilities go.  That&#8217;s why I go see him speak and read his papers.</p>
<p>Thus, the <em>results</em> of Avellaneda and Lee&#8217;s work are particularly interesting to me as they&#8217;re really pretty dull: something like a Sharpe of 0.9 and degrading briskly.  Now, you don&#8217;t expect people to be providing detailed recipes to wildly profitable strategies, and this result isn&#8217;t <em>bad</em>, particularly given that they&#8217;re describing strategies which likely have significant capacity.  Still, it illustrates that very smart people working with sophisticated mathematical tools even over extended periods are still operating under noteworthy constraints.  Perhaps also: ideas are relatively easy &#8211; examining them in the requisite detail is difficult and time consuming, even for (particularly for?) people with the most finely honed toolsets&#8230;</p>
<p>I frequently have friends or colleagues who will observe that if you &#8220;just write a strategy that <em>foos</em> when <em>bar</em> but <em>yaddas</em> when <em>baz</em>&#8230; you should surely make money.&#8221;  Maybe.  But the reality is that just putting together the strategy and <em>working through it</em> takes significant time for anything but the simplest strategies.  Once you add genuine complexity to a strategy, you can spend enormous time tuning it.</p>
<p>This, in turn, poses a dilemma I encounter frequently and honestly don&#8217;t have a great answer for:</p>
<blockquote><p>how to find the balance between continuing development on a known good strategy and initiating the development and analysis of unrelated and novel strategies?</p></blockquote>
<h3>the back of the envelope as canvas</h3>
<p>This next (de-)reference isn&#8217;t directly pertinent to algo trading, but the lessons learned by building <strong>BIG</strong> distributed systems can surely be applied elsewhere.  And they&#8217;re just plain fascinating.</p>
<p>Google&#8217;s Jeff Dean recently gave a talk entitled &#8220;Designs, Lessons and Advice from Building Large Distributed Systems&#8221; at the <span style="font-family: Cambria;"><span style="text-decoration: underline;">La</span>rge Scale <span style="text-decoration: underline;">Di</span>stributed <span style="text-decoration: underline;">S</span>ystems and Middleware (somehow &#8220;LADIS&#8221;) <a title="LADIS" href="http://www.cs.cornell.edu/projects/ladis2009/" target="_blank">workshop</a> and the slides are <a title="Jeff Dean Keynote LADIS" href="http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf" target="_blank">here</a>.   Go read them. </span></p>
<p><span style="font-family: Cambria;">If bald exhortation doesn&#8217;t convince you maybe slide 24 will:</span></p>
<p><span style="font-family: Cambria;"><img class="aligncenter" src="/images/ladis-slide24.png" alt="" width="497" height="377" /><br />
</span></p>
<p>or perhaps what he does with these baseline numbers in slide 27 will pique your interest:</p>
<div class="wp-caption aligncenter" style="width: 512px"><img src="/images/ladis-slide27.png" alt="back of the envelope as art form" width="502" height="375" /><p class="wp-caption-text">back of the envelope as art form</p></div>
<p><span style="font-family: Cambria;">One that made me (a serial prototype-builder) cringe:<br />
</span></p>
<blockquote style="text-align: left;"><p>Important skill: ability to estimate performance of a system design<br />
<span style="color: #ff0000;">– without actually having to build it!</span></p></blockquote>
<p>Ouch.</p>
<h3><span style="color: #ff0000;"><span style="color: #000000;">maybe it is easy after all</span></span></h3>
<p><span style="color: #ff0000;"><span style="color: #000000;">Of course, if you&#8217;ve studied Avellaneda &amp; Lee&#8217;s paper and it held no challenges or surprises and you&#8217;ve reviewed Mr Dean&#8217;s presentation and it&#8217;s old hat to you, too&#8230;<br />
</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/10/27/easy-money/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>our solid-state future</title>
		<link>http://www.puppetmastertrading.com/blog/2009/09/04/our-solid-state-future/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/09/04/our-solid-state-future/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 12:58:23 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[market data]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=577</guid>
		<description><![CDATA[I&#8217;ve never been a hardware guy. Hardware has gotten so fast throughout my professional life that it has just never been a big issue. Also, on wall st we had a robust and annual budget for h/w so I&#8217;d routinely sign-off on hundreds of thousands of dollars on all sorts of machines I&#8217;d never lay [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 260px"><img src="/images/solidStateFuture.jpg" alt="Mmmm... hardware.." width="250" height="249" /><p class="wp-caption-text">Mmmm... hardware..</p></div>
<p>I&#8217;ve never been a hardware guy. Hardware has gotten so fast throughout my professional life that it has just never been a big issue. Also, on wall st we had a robust annual budget for h/w, so I&#8217;d routinely sign off on hundreds of thousands of dollars for all sorts of machines I&#8217;d never lay eyes on, and somehow they always did the trick.</p>
<p>Before 9/11, they&#8217;d be in server racks in the building or down the street, but since then they might also be in increasingly far-flung places like weehawken or long island, tampa, even texas or beyond. The machines always seemed unbelievably overpriced &#8211; I remember over the years pretty consistently paying something like $40K for a low-end db server.  But that&#8217;s what it cost and you could only purchase approved products from approved channels, so nobody spent much thought on it.  Now that I don&#8217;t have the same kinds of constraints &#8211; or budgets! &#8211; I increasingly have to think of hardware.</p>
<p>As a software engineer, the hardware itself is also insisting that I pay some uncharacteristic attention to it.  The evolution of processors has reached a point where the programming paradigms many of us have fruitfully employed over many years are no longer suited for getting full performance out of today&#8217;s machines.  The recent introduction of remarkably powerful and inexpensive parallel-computing platforms based on GPUs like <a title="CUDA" href="http://www.puppetmastertrading.com/blog/2008/11/29/nvidias-tesla-and-the-compute-unified-device-architecture/" target="_blank">nvidia&#8217;s cuda</a> also outlines a future that even current university training doesn&#8217;t address in a fashion practically adapted for institutional application.  Cores are multiplying like Tribbles.</p>
<p>The lines between persistent storage and main memory are also blurring as consumer SSDs push up from the &#8216;low&#8217;-end while exotic ioDrives and the like offer a glimpse of a world where the performance gap between the two approaches nil and after their long reign myriad metallic platters will spin no more.</p>
<p><span id="more-577"></span></p>
<p style="text-align: left;">
<div class="wp-caption aligncenter" style="width: 498px"><img class=" " src="/images/power7-die.jpg" alt="troubling like Tribbles" width="488" height="386" /><p class="wp-caption-text">troubling like Tribbles</p></div>
<p>There have been some steps taken towards taming the core dilemma.  Google&#8217;s introduction of the distributed map-reduce paradigm and all of the associated plumbing on top of computers in the &#8216;cloud&#8217; is probably the boldest and most effective reaction thus far, but it&#8217;s not always obvious that you want your stuff running in someone else&#8217;s cloud, and the approach has many other natural limitations.  This is also a solution &#8216;in the large&#8217; and sometimes you need performant solutions on a different, smaller scale.  Here, the development of functional languages and idioms may be of some help, but there certainly don&#8217;t seem to be clear winners yet.</p>
<p>Erlang, Ocaml, Haskell, Scala and others all seem to have very limited impact thus far and all face big challenges before receiving widespread welcome.  Worse, any language can be mangled into expressing things poorly so the languages can only be a meaningful aid to programmers who are able to adjust their mindset for a new world&#8217;s computing paradigm.  This likely won&#8217;t be easy for many until there is an established set of usable programming idioms and toolsets for dealing with concurrency on a whole new scale.  To me, it seems that functional programming might well be an important part of that, but it&#8217;s difficult to imagine it as the complete game changer any time soon.  As it is, many of us have already been using functional languages (in my case, sql and R) on a regular or even daily basis for a long time, so it&#8217;s difficult to cast this as the revolution, bottled.</p>
<p>As cores proliferate and the bandwidth amongst them increases, new challenges and opportunities are unveiled.  Feeding all of those hungry cores can be a chore.  If you already had problems with i/o bound processes, then adding cores is sort of like adding liquidity to a debt crisis &#8211; not obviously helpful. We faced an issue like this recently while trying to improve the throughput of our backtesting subsystem in particularly poor-performing cases.</p>
<p>In these cases, we use daily data to initialize an intraday strategy.  For example, we might look at all equities over a trailing 3- or 5-year period, and perform various analytics – like calculating correlation matrices, beta against various benchmarks, volatility, etc – to determine which names might be candidates for the strategy’s portfolio.  We found that an inordinate amount of our time was spent just performing these morning analytics and that a significant share of the cost of a day’s backtest went to this oft-repeated morning exercise.  While working through this issue, we noted that we’ve reached a point where a decent dual-Nehalem server (with 16 logical cores) could be built with 48G of RAM for something along the lines of $6K.  Seriously.  So we stuck all of our daily data into memory and the improvement has been essentially infinite.  Maybe not the best example, but hopefully illustrative of the fact that the h/w underneath us is changing qualitatively and that we need to be more active in involving it in our design decisions.</p>
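<p>As a rough illustration of the idea (not the actual backtester), the Python sketch below loads all daily closes into memory once and then serves each simulated morning from RAM.  The class name, the <code>loader</code> callable and the data layout are assumptions made up for this example; the only point is that the expensive load happens once rather than once per backtest day.</p>

```python
from bisect import bisect_left
from statistics import pstdev


class InMemoryDailyStore:
    """Keeps every symbol's daily closes in RAM, loaded once at start-up."""

    def __init__(self, loader):
        # `loader` is a hypothetical callable returning
        # {symbol: [(iso_date, close), ...]} with rows sorted by date.
        self._dates = {}
        self._closes = {}
        for symbol, rows in loader().items():
            self._dates[symbol] = [date for date, _ in rows]
            self._closes[symbol] = [close for _, close in rows]

    def trailing_closes(self, symbol, as_of, window):
        """Up to `window` closes strictly before `as_of` (no look-ahead)."""
        end = bisect_left(self._dates[symbol], as_of)
        return self._closes[symbol][max(0, end - window):end]

    def trailing_volatility(self, symbol, as_of, window):
        """Population std dev of the last `window` daily returns before `as_of`."""
        closes = self.trailing_closes(symbol, as_of, window + 1)
        returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
        return pstdev(returns) if len(returns) > 1 else 0.0
```

<p>Each backtest day then calls <code>trailing_volatility</code> (or a similar analytic) with that day as <code>as_of</code>, and no call touches the disk.</p>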
<p>Of course, even a yawning expanse like 48G or even a terabyte (in not too many years, after all, I should be able to buy 1T of RAM for my boxes at a similar price point to what I pay for 48G today) eventually gets consumed and so we can’t hope to employ this solution for all our problems.  We continue to develop our historical TAQ infrastructure (most recently discussed <a title="tick data and hdf5" href="../2009/01/06/tick-data-hdf5-part-2/" target="_blank">here</a>) and this is certainly an example where buying memory isn’t going to get you very far.  But SSDs are now getting reasonably priced, so our current approach uses memory as much as possible; for volumes of detailed historical data, we’ve placed our indices on SSDs while the actual stores themselves reside on much more affordable RAID arrays.  Reading indices is now much faster and adds minimal i/o overhead to a very i/o-bound problem.</p>
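<p>The split between a small, fast index and a large, cheap store can be sketched as follows.  This is a toy stand-in using plain files and a JSON index, not our HDF5-based infrastructure; the function names and key format are hypothetical.  In practice <code>index_path</code> would live on the SSD and <code>data_path</code> on the RAID array.</p>

```python
import json


def write_store(data_path, index_path, records):
    """Write payloads to the bulk file and their (offset, length) to the index."""
    index = {}
    with open(data_path, "wb") as data:
        for key, payload in records.items():
            index[key] = (data.tell(), len(payload))
            data.write(payload)
    with open(index_path, "w") as idx:
        json.dump(index, idx)


def read_record(data_path, index_path, key):
    """Look the key up in the small index, then do one seek into the bulk file."""
    with open(index_path) as idx:
        offset, length = json.load(idx)[key]
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.read(length)
```

<p>The lookup cost lands on the fast device, and the slow device only sees a single seek and a sequential read per request.</p>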
<p>By constantly looking at new languages and concurrency idioms, vigilantly assessing the current and projected costs of ‘solid-state’ solutions to our most vexing problems, and just staying active and creative, we hold out some hope that we can transition gracefully to what is looking increasingly like a solid-state future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/09/04/our-solid-state-future/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>containing a strategy</title>
		<link>http://www.puppetmastertrading.com/blog/2009/08/19/containing-a-strategy/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/08/19/containing-a-strategy/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 19:46:33 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[EMS Internals]]></category>
		<category><![CDATA[FIX Protocol]]></category>
		<category><![CDATA[portfolio management]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=543</guid>
		<description><![CDATA[My son recently had his first birthday and amazes me daily with his new feats as he runs around increasingly stably exploring the world around him.  It occurs to me that the system I use to trade every day, Stratbox, is approaching its fourth &#8220;birthday&#8221; in the next few months.  I hadn&#8217;t originally intended to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 3px 5px;" src="/images/theChase.jpg" alt="" width="240" height="194" />My son recently had his first birthday and amazes me daily with his new feats as he runs around increasingly stably exploring the world around him.  It occurs to me that the system I use to trade every day, Stratbox, is approaching its fourth &#8220;birthday&#8221; in the next few months.  I hadn&#8217;t originally intended to write a system &#8211; an algorithmic trading platform &#8211; but found that existing products were limited, expensive and didn&#8217;t fit my mental model of what they should do.</p>
<p>This isn&#8217;t surprising as I wanted the system to support all of the activities associated with our algorithmic trading.  It turns out that that&#8217;s a lot to ask of a system.  It also turns out that you learn as you go and so the system continues to evolve.  A few years ago I&#8217;d <a title="putting the pieces together" href="http://www.puppetmastertrading.com/blog/2007/10/12/putting-the-pieces-together/" target="_blank">posted</a> about the basics of a strategy container and in this post I&#8217;m going to come back to this topic and describe some of the layers of code and thought developed since then.</p>
<p>First, let&#8217;s consider the role of a strategy container.  Its job is to intermediate between trading strategies and the external environments with which they interact.  It must also provide services that strategies can use (e.g., position management) and that it wouldn&#8217;t make sense for each strategy to re-implement.  In the past I&#8217;ve focused on the former responsibility of adapting strategies to external environments.  Why is this necessary and interesting?  Because it allows us to take the same exact strategy and run it live, or in simulation or in backtest, etc.  Interesting and necessary, but not what I want to focus on this time.  Instead, I want to look at the services provided to strategies; the &#8216;ecosystem&#8217; a strategy container provides in the hope that strategies might flourish within it.</p>
<p><span id="more-543"></span></p>
<p>We&#8217;ll go from the bottom up.  At the bottom you have one or more pipes of market data coming in.  You might also have news feeds, weather feeds or other such things coming in and we&#8217;re just going to conflate them all and say &#8220;market data.&#8221;  Still at the bottom, you also have one or more two-way pipes with your OMS or broker or directly with exchanges or with an exchange simulator.  Within Stratbox we call this abstraction an &#8220;ExecutionPlatform&#8221; because it&#8217;s conceptually where trades get executed.  The lingua franca for execution platforms is FIX, so the baseline object model of a strategy container is likely going to look a lot like messages described by the FIX spec.  Here we have orders, executions and the like.  Within Stratbox we&#8217;ve implemented an exchange simulator, a <a title="QuickFIX: Open source FIX engine" href="http://www.quickfixengine.org/" target="_blank">QuickFIX</a>-based FIX interface and a couple of broker-specific APIs and they are all of type ExecutionPlatform.  To a strategy, they all look the same.  Likewise with market data.  Within the strategy container, we provide a publish-subscribe model wherein any kind of market data can be subscribed to regardless of the ultimate source of the data.  The intention is always to intermediate between the strategy and its external environment.</p>
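<p>A minimal sketch of those two abstractions, in Python purely for illustration.  Only the names ExecutionPlatform and MarketDataService come from the description above; the method names, the <code>SimulatedExchange</code> stand-in and the toy strategy are assumptions, not Stratbox&#8217;s actual API.</p>

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class ExecutionPlatform(ABC):
    """Anything a strategy can route orders to: simulator, FIX session, broker API."""

    @abstractmethod
    def send_order(self, symbol, side, quantity, limit_price):
        """Submit an order and return a platform-assigned order id."""


class SimulatedExchange(ExecutionPlatform):
    """Toy stand-in that only records what it was asked to do."""

    def __init__(self):
        self.orders = []

    def send_order(self, symbol, side, quantity, limit_price):
        self.orders.append((symbol, side, quantity, limit_price))
        return len(self.orders)  # 1-based order id


class MarketDataService:
    """Publish-subscribe hub: subscribers never learn where a quote came from."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, symbol, callback):
        self._subscribers[symbol].append(callback)

    def publish(self, symbol, quote):
        for callback in self._subscribers.get(symbol, ()):
            callback(symbol, quote)


def attach_bid_joiner(platform, market_data, symbol, quantity):
    """Hypothetical strategy: join the bid with a limit order on every quote."""
    def on_quote(sym, quote):
        platform.send_order(sym, "BUY", quantity, quote["bid"])
    market_data.subscribe(symbol, on_quote)
```

<p>Because the strategy only ever sees the two interfaces, replacing <code>SimulatedExchange</code> with a live implementation would require no change to the strategy code.</p>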
<p><strong>the baseline object model</strong></p>
<p>Great.  But now I want to write a trading strategy.  What&#8217;s that going to look like?  And this is where things get increasingly interesting as we get to decide what kind of facilities we&#8217;re going to provide.  We also need to manage concurrency in some fashion as trading is an intrinsically asynchronous activity.  For this lowest level of strategy, we&#8217;ll say that each market feed is handled by a thread which timestamps and enqueues a &#8220;Quote&#8221; for later consumption and redistribution to relevant subscribers by a MarketDataService within the container.  Likewise, each execution platform will be serviced by a thread that similarly enqueues &#8220;Execution&#8221; objects from exchanges real and simulated.  The threads handling executions should likely have a higher priority than the threads handling market data.  So, the strategy is naturally exposed to the markets&#8217; asynchronicity right off the bat.  What baseline facilities will our strategy have at its disposal?  Well, it needs to be able to manage orders and positions, so it needs some kind of a &#8220;blotter&#8221; facility.  With this in place, a strategy can safely assume that it doesn&#8217;t need to listen to each execution coming in just to have a correct picture of its book.  It needs access to a MarketDataService and it likely also needs access to historical data.  But, staying simple, that&#8217;s more or less all we have to provide and we&#8217;re going to provide all of these things to the strategy through a single handle: a strategy context.  By swapping out a strategy&#8217;s context, we can move the strategy among environments (eg, from simulation to live execution).</p>
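<p>The threading arrangement just described might look roughly like this in Python.  It is a sketch under stated assumptions: Python has no thread priorities, so the preference for executions is approximated here with a priority queue rather than with prioritized threads, and every class and method name other than Quote and Execution is invented for the example.</p>

```python
import itertools
import queue
import threading
import time
from dataclasses import dataclass

EXECUTION_PRIORITY = 0  # lower number is dispatched first
QUOTE_PRIORITY = 1


@dataclass
class Quote:
    symbol: str
    price: float
    timestamp: float = 0.0


@dataclass
class Execution:
    symbol: str
    quantity: int  # signed: positive buys, negative sells
    price: float
    timestamp: float = 0.0


class Blotter:
    """Tracks net position per symbol so strategies need not watch every fill."""

    def __init__(self):
        self.positions = {}

    def on_execution(self, execution):
        net = self.positions.get(execution.symbol, 0) + execution.quantity
        self.positions[execution.symbol] = net


class StrategyContext:
    """Single handle a strategy receives; swap it to change environments."""

    def __init__(self):
        self.blotter = Blotter()
        self._events = queue.PriorityQueue()
        self._sequence = itertools.count()  # tie-breaker keeps FIFO order per priority

    def enqueue(self, priority, event):
        event.timestamp = time.time()  # stamped by the receiving thread
        self._events.put((priority, next(self._sequence), event))

    def drain(self):
        """Dispatch everything queued so far; return events in dispatch order."""
        dispatched = []
        while not self._events.empty():
            _, _, event = self._events.get()
            if isinstance(event, Execution):
                self.blotter.on_execution(event)
            dispatched.append(event)
        return dispatched


def start_feed(context, priority, events):
    """Run one feed on its own thread, enqueueing each event it receives."""
    thread = threading.Thread(
        target=lambda: [context.enqueue(priority, e) for e in events]
    )
    thread.start()
    return thread
```

<p>With one feed thread per market data source and per execution platform, queued executions always reach the blotter before queued quotes are dispatched, which keeps the strategy&#8217;s picture of its book ahead of its picture of the market.</p>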
<p>With these facilities in place, basic algorithms can be implemented within a strategy and you should be able to test and trade them.</p>
<p>But you&#8217;re dealing with very low-level stuff.  You&#8217;ll also find that you&#8217;re writing the same kind of code over and over.  This inspired the introduction of a layer on top of the baseline object model which I&#8217;ve described before: <a title="StratParts" href="http://www.puppetmastertrading.com/blog/2008/04/12/stratparts-a-strategy-component-model/" target="_blank">StratParts</a> &#8211; a strategy component model.</p>
<p><strong>StratParts &#8211; a component model for strategies</strong></p>
<p>With StratParts, we introduced metadata and composition (in the <a title="Composites" href="http://en.wikipedia.org/wiki/Composite_pattern" target="_blank">&#8220;composite pattern&#8221;</a> sense) to our Strategy object model.  A strategy is itself a stratpart which contains other stratparts.  Thus, stratparts introduce a hierarchical structure to a strategy.  Each stratpart publishes a metadata descriptor which is aggregated at the strategy level.  The descriptor contains all of the modifiable characteristics of the strategy and can be changed while a strategy is executing thus giving a sort of grey box capability wherein a trader (or another strategy etc) can modify the behavior of a strategy as it&#8217;s running.  Stratparts also create an effective means of providing scoping within your strategy environment.  Stratparts can &#8220;see&#8221; the activity of other stratparts within the same strategy &#8211; they&#8217;re all peers in this sense.  This scoping can also be used for allocation of resources across different elements of a strategy.  For example, one stratpart might be allocated 80% of the cash available to the strategy while another stratpart manages the remaining 20%.</p>
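<p>To make the composite structure concrete, here is a small Python sketch.  It is not the Stratbox implementation: the dotted-path descriptor format, <code>set_param</code> and the <code>allocation</code> field are assumptions chosen to show how metadata can be aggregated up the tree, changed at runtime, and how cash can be scoped per part.</p>

```python
class StratPart:
    """Composite node: a strategy is a StratPart that contains other StratParts."""

    def __init__(self, name, params=None, allocation=1.0):
        self.name = name
        self.params = dict(params or {})  # modifiable characteristics
        self.allocation = allocation      # fraction of the parent's cash
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def descriptor(self, prefix=""):
        """Aggregate every part's parameters into {dotted.path: value}."""
        path = prefix + self.name
        flat = {f"{path}.{key}": value for key, value in self.params.items()}
        for child in self.children:
            flat.update(child.descriptor(path + "."))
        return flat

    def set_param(self, dotted_path, value):
        """Change one parameter while the strategy is running."""
        *part_names, key = dotted_path.split(".")
        if not part_names or part_names[0] != self.name:
            raise KeyError(dotted_path)
        node = self
        for name in part_names[1:]:
            matches = [c for c in node.children if c.name == name]
            if not matches:
                raise KeyError(dotted_path)
            node = matches[0]
        if key not in node.params:
            raise KeyError(dotted_path)
        node.params[key] = value

    def cash(self, parent_cash):
        """Cash this part may deploy out of what its parent holds."""
        return parent_cash * self.allocation
```

<p>A strategy with an 80/20 split would be built as a root part with two children whose <code>allocation</code> values are 0.8 and 0.2; a trader (or another strategy) adjusts behavior through <code>set_param</code> on the root.</p>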
<p>Stratparts have proved to be very powerful and we&#8217;ve written many of them though we&#8217;ll sometimes write strategies as one monolithic stratpart where it makes sense.</p>
<p>Stratparts have a lot of uses but one thing they don&#8217;t do is help manage the low-level complexity inherent in trading activities.  This was best described by a trader with whom I&#8217;d collaborated.  He&#8217;s a reservist in the IDF and uses military metaphors like I use sports metaphors.  He bemoaned the low-level handling of orders required for a very close-to-the-market strategy we were working on and said that:</p>
<blockquote><p>we need an order like a &#8216;smart&#8217; missile: fire it and forget it</p></blockquote>
<p>He was right and that inspired the next level of abstraction/support that we built into the system.</p>
<p><strong>&#8216;smart&#8217; orders and the tradeflow stratpart</strong></p>
<p>The strategy we were developing was meant to look at a universe of futures spreads and generate all of the &#8216;cycles&#8217; that might result in an arbitrage opportunity.  Since these are rare at best, we were really looking for circumstances where it looked like we had an advantage based on depth-of-market and various heuristics we&#8217;d apply.  Among the functionality that he wanted was an &#8216;order&#8217; which would act as a limit unless some conditions obtained in which case it might pay the spread or otherwise stop being a fixed limit order.  He didn&#8217;t want to handle this inside the strategy but instead wanted to fire these things off and forget about them unless they required attention (eg, if they&#8217;re rejected by the exchange or his firm&#8217;s risk management checks).  Although we ended up calling these smart orders, they&#8217;re really a sort of very localized execution strategy themselves.</p>
<p>To support them, we utilized the same workflow (= state machine) framework we&#8217;d implemented for our ExchangeSimulator as we wanted to ensure that these smart orders had a very well-defined life-cycle with a clear set of states and guarantees about transitions among them.  We embedded this workflow engine into a stratpart which supported smart orders operating within this &#8220;tradeflow&#8221;.  Thus, the extra weight and expense of this functionality is only incurred when it is actually required or desired.  This has proven to be a powerful addition to the system and raises the level of services enjoyed by prospective strategies substantially, particularly given the variety of orders we&#8217;ve implemented which stand ready for use by any new strategy that might want to employ them.</p>
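<p>A stripped-down version of such a life-cycle can be sketched in Python as below.  The states, the transition table and the <code>SmartBuyOrder</code> class are illustrative assumptions rather than the real tradeflow engine; the sketch only shows a limit order that pays the spread once a caller-supplied condition holds and that asks for attention only when it is rejected.</p>

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    WORKING = "working"      # resting as a plain limit order
    CROSSING = "crossing"    # repriced to pay the spread
    FILLED = "filled"
    CANCELLED = "cancelled"
    REJECTED = "rejected"


# Every legal transition is listed; anything else is treated as a bug.
TRANSITIONS = {
    State.NEW: {State.WORKING, State.REJECTED},
    State.WORKING: {State.CROSSING, State.FILLED, State.CANCELLED},
    State.CROSSING: {State.FILLED, State.CANCELLED},
    State.FILLED: set(),
    State.CANCELLED: set(),
    State.REJECTED: set(),
}


class SmartBuyOrder:
    """Limit buy that pays the spread once `should_cross(bid, ask)` is true."""

    def __init__(self, limit_price, should_cross, on_attention):
        self.price = limit_price
        self.state = State.NEW
        self._should_cross = should_cross
        self._on_attention = on_attention  # called only when someone must look

    def _move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def on_ack(self):
        self._move_to(State.WORKING)

    def on_reject(self, reason):
        self._move_to(State.REJECTED)
        self._on_attention(self, reason)

    def on_quote(self, bid, ask):
        if self.state is State.WORKING and self._should_cross(bid, ask):
            self.price = ask  # stop being a fixed limit: lift the offer
            self._move_to(State.CROSSING)

    def on_fill(self):
        self._move_to(State.FILLED)
```

<p>The strategy constructs the order, hands it to the tradeflow, and hears nothing further unless <code>on_attention</code> fires; illegal transitions raise immediately instead of leaving the order in an undefined state.</p>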
<p>At this point, it should be clear that we&#8217;re blurring the lines between the strategies themselves and the container.  Another way of thinking of it is that one container can have a set of pre-made building blocks that can be applied for families of strategies.  Each one provides a particular grammar with which a strategy&#8217;s aims can be expressed.  For very close-to-the-market strategies where very low-level handling of orders is required, the tradeflow stratpart provides a set of really useful facilities, but if I&#8217;m just trading with limits then it might not be so interesting.</p>
<p><strong>portfolio-oriented strategies and beyond</strong></p>
<p>I&#8217;ve mentioned <a title="portfolio strategy" href="http://www.puppetmastertrading.com/blog/2008/09/13/portfolio-atomic-element-of-a-trading-strategy/" target="_blank">before</a> that I only care to think of strategies that operate on a portfolio.  As such, we&#8217;ve written a good deal of facilities for creating, analyzing and manipulating portfolios within strategies.  By using these facilities within a tradeflow stratpart, I can express quite complex strategies relatively simply by using portfolio analytics to determine my current &#8216;model&#8217; portfolio and then using smart orders to most effectively transition me from my current state to my model state.  For the kind of trading I do, this provides a rich &#8216;vocabulary&#8217; for the development of strategies.  All the same, my brain is having a <em>tip-of-the-tongue moment</em> as I feel that there&#8217;s an entirely different model possible for portfolio-oriented strategies just at the periphery of my imagination&#8230;  oh well, there&#8217;s always something new to explore.</p>
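<p>The step from current state to model state reduces, at its simplest, to a per-symbol difference.  The Python function below is a hypothetical helper, not part of Stratbox; it assumes both portfolios are plain mappings from symbol to signed share count, and each resulting delta would then be handed to a smart order for execution.</p>

```python
def rebalancing_trades(current, model):
    """Signed share deltas needed to move `current` holdings to `model`.

    Positive values are buys and negative values are sells; symbols that
    already match the model are omitted.
    """
    symbols = sorted(set(current) | set(model))
    deltas = {s: model.get(s, 0) - current.get(s, 0) for s in symbols}
    return {s: d for s, d in deltas.items() if d != 0}
```

<p>For example, holding 100 AAA and short 50 BBB against a model of 40 AAA and 25 CCC yields a sale of 60 AAA, a purchase of 50 BBB and a purchase of 25 CCC.</p>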
<p>Different trading styles, perspectives or trading problems are undoubtedly better served by different sorts of facilities.  I&#8217;m finding that just as language constrains and shapes thought, the strategy container that you employ shapes and constrains the kinds of strategies that you can implement.</p>
<p>If you have any ideas about such facilities for any kind of trading you do, I&#8217;d love to hear them.</p>
<p>&#8211;</p>
<p><strong>a note about Stratbox</strong></p>
<p><img class="alignright" src="/images/sb.jpg" alt="" width="161" height="161" />Although I&#8217;m talking about our system in this blog, we&#8217;re not marketing the system and happily &#8220;just&#8221; use it for our own trading activities.  We had looked into marketing the system previously but ultimately feel the same about selling stratbox as <a title="pimp that strat" href="http://www.puppetmastertrading.com/blog/2009/03/18/pimp-that-strat/" target="_blank">selling strategies</a>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/08/19/containing-a-strategy/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>real battlebots</title>
		<link>http://www.puppetmastertrading.com/blog/2009/08/17/real-battlebots/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/08/17/real-battlebots/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 15:10:03 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[dereferenced]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=526</guid>
		<description><![CDATA[
There&#8217;s been a lot of attention focused on  trading battlebots recently.  It&#8217;s important to keep in mind that this is part of a long-standing, broad and arguably inexorable trend that is now spreading rapidly away from its successful base in industrial manufacturing to every other conceivable field from scheduling and logistics, to CAD and on [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 3px 5px;" src="/images/weddingParty.jpg" alt="wedding party popper" width="336" height="182" /></p>
<p>There&#8217;s been a lot of attention focused on  <a title="trading battlebots" href="http://www.google.com/search?q=trading+battlebots" target="_blank">trading battlebots</a> recently.  It&#8217;s important to keep in mind that this is part of a long-standing, broad and arguably inexorable trend that is now spreading rapidly away from its successful base in industrial manufacturing to every other conceivable field from scheduling and logistics, to CAD and on to more aggressive pursuits like trading and battlefield operations.  Perhaps looking at the state of the art in related fields can inform us about the direction of our algo bots.</p>
<p><a title="Drones take over" href="http://www.foreignpolicy.com/articles/2009/08/13/this_week_at_war_a_weak_state_solution_for_afghanistan" target="_blank">This article</a> in Foreign Policy illustrates an area where automation is making great strides into historically human undertakings.  The use of so-called drone aircraft for recon and tactical missile strikes has reached a remarkable milestone: <strong>this year, the US Air Force will train more &#8220;pilots&#8221; for unmanned aircraft than for real fighters or bombers</strong>.  Evidently there&#8217;s good reason for this change:</p>
<blockquote><p>By 2013, software and communications improvements will allow the Air Force&#8217;s unmanned-aircraft pilots to simultaneously fly three drones at one time, and four in an emergency. Another factor supporting the likely proliferation of drones such as the Predator, Reaper, and Global Hawk is their low cost compared with new manned aircraft such as the F-35 Joint Strike Fighter.</p>
<p>According to the Government Accountability Office, $24.5 million will purchase a set of four MQ-9 Reaper hunter-killer drones plus a ground station and satellite relay. (See page 117 of <a href="http://www.gao.gov/new.items/d09326sp.pdf" target="_blank"><span style="text-decoration: underline;">this report</span></a>.) The latest guess of the price for a single F-35 fighter-bomber is $100 million. (See page 93.) This gap in cost led Defense Secretary <strong>Robert Gates </strong>to demand the cancellation of the manned F-22 Raptor program <a href="http://www.af.mil/news/story.asp?id=123163023" target="_blank"><span style="text-decoration: underline;">in order to fund the purchase of more drones</span></a> for service in Afghanistan and Iraq.</p></blockquote>
<p><span id="more-526"></span></p>
<p>So, for the price of one F-35, I can get a 16-strong phalanx of drones and the required ground-based support to deploy them.  For the price of one F-22, I could get over 20.  Sounds like a deal.  But, they&#8217;re not really interchangeable parts as the F-35 or Raptor and other fighter-bombers can overcome &#8220;air defense threats&#8221; while the drones aren&#8217;t quite there yet.  Thus, they&#8217;re really only good against opponents that don&#8217;t have an air force or anti-aircraft capabilities.  In schoolyard parlance: they&#8217;re only good for defenseless opposition.  But we&#8217;re buying lots of them, so we must have found nails for our hammer.</p>
<p>While developmentally they are likely in different places, there&#8217;s probably an analogy to be made between the current state of military drones and that of their algo trading equivalents.</p>
<p>If today&#8217;s algos are like today&#8217;s drones in that they&#8217;re relatively dumb and can&#8217;t overcome defensive threats, what might tomorrow&#8217;s algos look like?  <strong>Who will they be able to target that they can&#8217;t currently?</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/08/17/real-battlebots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>the trading frequency spectrum</title>
		<link>http://www.puppetmastertrading.com/blog/2009/07/28/the-trading-frequency-spectrum/</link>
		<comments>http://www.puppetmastertrading.com/blog/2009/07/28/the-trading-frequency-spectrum/#comments</comments>
		<pubDate>Tue, 28 Jul 2009 13:42:45 +0000</pubDate>
		<dc:creator>tito</dc:creator>
				<category><![CDATA[hedge funds]]></category>
		<category><![CDATA[our managed markets]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[strategy development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.puppetmastertrading.com/blog/?p=145</guid>
		<description><![CDATA[
I&#8217;ve been saving the above image in a stubbed-out blog post I&#8217;ve wanted to write since a conversation I&#8217;d had in Jerusalem last fall.  The recent attention to high frequency trading and all of its attendant evils has reminded me that the topic is relevant and so I relate various thoughts at the risk of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Frequency Spectrum" src="/images/frequencySpectum.gif" alt="" width="755" height="300" /></p>
<p>I&#8217;ve been saving the above image in a stubbed-out blog post I&#8217;ve wanted to write since a conversation I&#8217;d had in Jerusalem last fall.  The recent attention to high frequency trading and all of its attendant evils has reminded me that the topic is relevant and so I relate various thoughts at the risk of jumping on a cacophonous bandwagon of rumbling misinformation.</p>
<p>First of all, the conversation.  It was with a talented guy who acted as the CFO for a variety of companies including a small startup hedge fund which traded US equities at a high frequency.  Although he was a part-time CFO, he seemed pretty plugged into their trading operations and noted that they used an agency-only brokerage service for automated traders that I&#8217;m familiar with and that they were &#8220;looking at full data for many&#8221; hundred stocks concurrently. He remarked that their trading was going well but that their hit rate was something like 4% and dropping.  By hit rate, he meant that they were placing limits frequently and generally pulling the orders if they didn&#8217;t get hit immediately.  He didn&#8217;t specify, but I imagine that &#8220;immediately&#8221; might range from milliseconds out to a second or twenty.  If the market is composed of makers and takers, then these guys were definitely makers of liquidity in the strict sense that they were placing limits and making markets.</p>
<p>At the time I thought it was interesting because it seemed that so many people were focused on the very, very short term trade that the frequency was becoming saturated.  It looked like a reminder that trading frequencies populate a spectrum; in this case, this part of the spectrum was becoming so saturated that returns were becoming increasingly difficult to obtain as more players crowded into it.  I&#8217;m not sure how this hedge fund has fared, but at the time I remember thinking that they were going to have a tough time competing if they were only geared for high-frequency trading as the space becomes increasingly expensive to play in as the inevitable talent and technology arms race marches on.</p>
<p><a title="Lo &amp; Khandani" href="http://web.mit.edu/alo/www/Papers/august07_2.pdf" target="_blank">Lo and Khandani</a> provide the below image illustrating this phenomenon happening to a class of contrarian strategies Lo &amp; MacKinlay had described in 1990.  The strategies stop working as people squeeze out the alpha.</p>
<p><span id="more-145"></span></p>
<p><img class="aligncenter" title="contrarians crushed" src="/images/crushedContrarians.jpg" alt="" width="594" height="436" /></p>
<p>My conversation in Jerusalem mostly made me think that we were seeing a similar phenomenon amongst HF strategies.</p>
<p>What does it mean for a strategy to be high-frequency?  First of all, it&#8217;s a large class of strategies which probably shouldn&#8217;t be treated uniformly.  What they have in common is an intention to trade in and out of positions on a frequent basis where frequent will range from sub-seconds out to perhaps several seconds or even minutes in particularly felicitous cases.</p>
<p>Aside from the fact that one can trade at various frequencies, one can mix them and one might even be only peripherally aware of doing so.  A long-only, fundamentally-driven mutual fund (ie, not an algo or high-frequency trader) might call/fax/email/ftp/fix etc their trades into their brokers who might then execute the trades with their in-house or outsourced/white-labeled execution-quality algos.  Those algos might use some very clever close-to-the-market analytics to provide great execution for the client.  Or they might be traded profitably against.  Or both.</p>
<p>In any case, to me it seems clear that there is nothing intrinsically wrong about high-frequency trading itself.  People will always try to react to information as quickly as possible.  Why wouldn&#8217;t or shouldn&#8217;t they?  They also like to be clever.  Again, why wouldn&#8217;t this be expected and good?  I remember Lefevre recounting the use of personal teletypes by big speculators at the turn of the (prior) century.  Not your everyday household item at the time.  I also remember him recounting strategies for moving large positions which involved both buying and selling to hide one&#8217;s hand.  Why wouldn&#8217;t algos do the same and more?</p>
<p>That the loudest critics have been old-style execution traders &#8220;talking their book&#8221; to me tells the story here.</p>
<p>One other thing that my conversation evinces is the kinds of biz models being employed by brokers.  Like I said, this little hf hedge fund used an agency-only brokerage that caters to algo/hf traders.  Why is it important to note that it&#8217;s &#8220;agency only&#8221;?  Because, as the Goldman/Aleynikov story illustrates, strategies are organizationally porous in the sense that their value drips away as the human capital behind them moves from organization to organization and the strat&#8217;s internals become understood more broadly outside the organization.  This is likely the dynamic that drove Lo&#8217;s example above &#8211; more and more traders were employing similar strategies as the knowledge of the strategy leaked further and further from its original source(s).  Likewise, if I see all of your trading activity in sufficient detail, I might be able to reverse-engineer your strategy and work to steal your alpha.</p>
<p>Thus, traders are happy not to advertise their strategies&#8217; behaviors.  So an agency-only broker &#8211; a broker who doesn&#8217;t engage in prop trading themselves &#8211; should inspire trust in a potential client.</p>
<p>Great!  What an honest business model!  But there&#8217;s an irony here and it points to the real kinds of problems we should be bugging our regulators to address.  The specific agency-only broker I mention wasn&#8217;t self-clearing.  That is, they were using another broker (Goldman in this case) to handle their backoffice duties.  And guess what?  Goldman is anything but agency only.  So, clients who feel warm and fuzzy that they are dealing with an agency-only shop are actually exposing all of their activities to a particularly sophisticated and arguably predatory prop trader.  It&#8217;s like the Guinness ads.</p>
<p>It&#8217;s funny to me that while there sits a multi-trillion dollar hole in our fed&#8217;s balance sheet which such hippies as fox news and bloomberg (suing to find out) and our US congress (asking politely) can&#8217;t seem to tease an explanation for out of the relevant authorities, the blogosphere and regulators seem to focus their invective on short sellers, hedge funds and high-frequency traders.</p>
<div class="wp-caption aligncenter" style="width: 338px"><img title="Brilliant?" src="/images/Guiness-Brilliant.jpg" alt="partners at GS?" width="328" height="240" /><p class="wp-caption-text">secret meeting at 85 broad?</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.puppetmastertrading.com/blog/2009/07/28/the-trading-frequency-spectrum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
