Jekyll2021-07-17T01:10:51+00:00https://baxtersa.github.io/feed.xmlSam BaxterSam Baxter - M.S. Computer Science - iZotope - Management, Systems, Serverless, Programming Languages - He/him/hisInteractive Plotting2017-11-23T16:57:13+00:002017-11-23T16:57:13+00:00https://baxtersa.github.io/2017/11/23/plotly<p>I’ve been getting familiar with <code class="language-plaintext highlighter-rouge">ggplot</code> in R for generating figures while
working on a PLDI submission. I spent a bit of time messing around with
styling and have grown increasingly disappointed that I can’t include
interactive plots in PDF submissions :/. Fortunately, Web browsers are much
more featureful, and I’d like to incorporate some interactivity into my
technical posts moving forwards.</p>
<p><br /></p>
<h1 id="it-works-sort-of">It works! Sort of…</h1>
<p>First, I have a couple requirements I would like to satisfy.</p>
<ul>
<li>Figures should be generated from R code. I don’t want to have to do data
processing in JavaScript.</li>
<li>Graphs and figures should be responsive to user input, and offer the
ability to “dig in” to the results for more details.</li>
<li>Interactive plots should work entirely client-side. I don’t want to host a
server, or deploy to a third party.</li>
<li>Developing interactive posts should integrate cleanly with my GitHub Pages
deployment process. Specifically, plots should be embeddable within Markdown
posts as inline HTML.</li>
</ul>
<p>Here’s what I’ve managed to get working so far. This page embeds a
<a href="https://plot.ly/">Plotly</a> widget as a JavaScript data URL within an
<code class="language-plaintext highlighter-rouge">iframe</code>.</p>
<iframe width="100%" height="600px" src="/plot"></iframe>
<p><br /></p>
<h1 id="some-future-ideas">Some Future Ideas</h1>
<p>This current approach is slooow. I’d like plots to load asynchronously
without blocking the page. Ideally, plots would load quickly enough that
blocking is a non-issue. I think I can get this working better with some
JavaScript hacking of my own. The current solution is pretty naive, and
simply exports Plotly figures to JavaScript programs from RStudio. If anyone
has experience with this, I’d love some help optimizing the process/end
result!</p>Stopify: In-Browser Debugging Abstractions2017-06-10T08:53:36+00:002017-06-10T08:53:36+00:00https://baxtersa.github.io/2017/06/10/stopify-proto<p>This is a writeup of a talk I co-gave at
<a href="http://www.nepls.org/Events/31/">NEPLS</a> on my current research project,
<a href="http://github.com/plasma-umass/Stopify">Stopify</a>. Stopify performs source-to-source
program transformations on the output of to-JavaScript compilers to allow
running code to be interrupted. These transformations provide the abstractions
necessary for us to build in-browser breakpointing and stepping debuggers for
arbitrary languages with compile-to-JS toolchains. Read along for the details!</p>
<p><br /></p>
<h1 id="motivation">Motivation</h1>
<p>It turns out that portable tablets and Chromebooks are increasingly common
devices. Custom-built Linux desktops will always be there for those of us who
have braved those waters, but for the rest, these smaller devices give users
exactly what they need (an internet connection) at a much lower price point and
with less mental overhead.</p>
<p>But Chromebooks and the rest offer a <em>non-native</em> environment for installing and
running software. This means you can’t run traditional desktop IDEs like
Eclipse, XCode, or MS Visual Studio - you can’t even install compiler binaries
you’d need to run these programming environments.</p>
<p>So bringing programming environments <em>into the browser</em> makes sense because it
addresses these issues.</p>
<p><br /></p>
<h1 id="clientserver-side-decision">Client/Server-side Decision</h1>
<p>Now that we’ve decided to move programming into the browser itself, we have to
briefly discuss whether to run things client-side or server-side.</p>
<p>Server-side programming environments have downsides we can’t overlook:</p>
<ul>
<li>The cost of maintaining server infrastructure for providers of the web-service</li>
<li>The cost of server runtime for users of the web-service</li>
<li>Being bound to the internet (i.e. no offline mode) - the assumption of a
reliable internet connection is flat-out unrealistic</li>
</ul>
<p>Web-based programming environments offer such a great story to educators that
academic contexts are one of our leading motivators. In these educational
contexts especially, reliable internet connections are often unavailable, so
server-side execution of our programming environment is a non-starter.</p>
<p>So we have settled on a client-side, in-browser experience for developing
programs.</p>
<p><br /></p>
<h1 id="current-offerings">Current Offerings</h1>
<p>A number of client-side programming environments exist
(<a href="https://codepen.io">codepen.io</a>, <a href="https://repl.it">repl.it</a>, to name a
couple), but they fall short of offering a real execution environment. Many
compile-to-JS languages also offer in-browser execution environments
(<a href="http://bloomberg.github.io/bucklescript/js-demo/">BuckleScript</a>,
<a href="https://scalafiddle.io/">Scala.js</a>, <a href="http://elm-lang.org/try">Elm</a>, etc.), but
we’re going to focus on the first subset for now. Let’s look at CodePen as an
example.</p>
<p>CodePen lets us develop JS, HTML, and CSS animations all within the browser.
This is nice for rapid development, but what happens when your JS program
doesn’t behave itself? Here’s a screenshot of running an infinite loop:</p>
<p><img src="/images/codepen-infinite-loop.png" alt="infinite-loop" /></p>
<p>To keep the page responsive, CodePen just terminates the infinite loop and
returns execution to the browser. Already, this rules out writing certain types
of interactive programs in this browser environment. But it gets worse! Let’s
try to run a <em>definitely-terminating</em> but <em>long running</em> program in CodePen:</p>
<p><img src="/images/codepen-long-running.png" alt="long-running" /></p>
<p>Again, it terminates what it detected to be an infinite loop (???), but it
resumes after the loop <strong>with the wrong results</strong>! This is crazy behavior.
<code class="language-plaintext highlighter-rouge">10000000</code> isn’t even <em>too</em> crazy of a loop bound.</p>
<p>Surprisingly, this is <em>better</em> than how many browser-based environments handle
these types of programs. The language playgrounds mentioned above will happily
crash your browser tab (+1 for multi-process Firefox), and you’ll lose all the
code you’ve written up until that point.</p>
<p>What’s going on here? Why is it so hard to offer a simple <strong>Stop</strong> button like
native IDEs provide?</p>
<p><br /></p>
<h1 id="the-browser-runtime">The Browser Runtime</h1>
<p>Browsers process a single event at a time off of the JS Event Queue. User
interaction (<code class="language-plaintext highlighter-rouge">onclick</code>, <code class="language-plaintext highlighter-rouge">onmousemove</code>, etc.) registers events on this queue, to
be processed by any registered event handlers. Let’s look at what happens when
an event is being processed. Below, we see some JS that increments a value <code class="language-plaintext highlighter-rouge">i</code>
in an infinite loop, and never reaches <code class="language-plaintext highlighter-rouge">foo</code>’s return statement.</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">foo</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="kc">true</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">i</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">foo</span><span class="p">();</span>
</code></pre></div></div>
<p>Let’s assume this code executes in the handler for some <strong>Run</strong> event, which
gets enqueued. When <strong>Run</strong> is handled, <code class="language-plaintext highlighter-rouge">foo()</code> begins executing, incrementing
<code class="language-plaintext highlighter-rouge">i</code> each iteration of the infinite loop. If the user pushes a <strong>Stop</strong> button,
enqueuing more events to be processed, they are queued after the current <strong>Run</strong>
event. The Event Queue is <em>blocked</em> on processing these <strong>Stop</strong> events until
<strong>Run</strong> completes, but this never happens. So while the webpage continues
queuing events on user interaction, they never get handled, so the page appears
unresponsive to user input.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Event Queue</th>
<th style="text-align: center">Runtime</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><strong>Stop</strong></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">i=147</code></td>
</tr>
</tbody>
<tbody>
<tr>
<td style="text-align: center"><strong>Stop</strong></td>
<td style="text-align: center">…</td>
</tr>
</tbody>
<tbody>
<tr>
<td style="text-align: center">______</td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">i=0</code></td>
</tr>
</tbody>
<tbody>
<tr>
<td style="text-align: center"><strong>Run</strong></td>
<td style="text-align: center"><code class="language-plaintext highlighter-rouge">foo()</code></td>
</tr>
</tbody>
</table>
<p><br />
Environments like CodePen try to detect this type of behavior, and terminate the
running event handler so that other events can be processed, maintaining the
page’s responsiveness.</p>
<p>This means simply providing a <strong>Stop</strong> button doesn’t address the issue of
debugging in the browser. The JavaScript being executed within event handlers
must be <em>instrumented</em> in some way to pause, allow other events to be processed,
and eventually resume.</p>
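<p>To make this concrete, here is a hand-instrumented sketch (not Stopify output) of slicing a long-running loop across event-loop turns with <code>setTimeout</code>, so that a queued <strong>Stop</strong> handler gets a chance to run between chunks. All names here are made up for illustration:</p>

```javascript
// Sketch: slice long-running work across event-loop turns so queued
// events (like a Stop click) can be handled between chunks.
let stopRequested = false;  // a Stop button's handler would flip this
let stoppedAt = null;       // where the loop was suspended
let sum = 0;

function runChunked(i, n) {
  const CHUNK = 1000;                       // iterations per turn
  const end = Math.min(i + CHUNK, n);
  for (; i < end; i++) sum += 1;            // a slice of the work
  if (stopRequested || i >= n) {
    stoppedAt = i;                          // suspended (or finished) here
    return;
  }
  setTimeout(() => runChunked(i, n), 0);    // yield to the event queue
}

runChunked(0, 5000000);
setTimeout(() => { stopRequested = true; }, 0);  // simulate pressing Stop
```

<p>Unlike terminating the handler outright, nothing is killed here: the loop’s state (<code>i</code>, <code>sum</code>) survives the suspension, so execution could be resumed rather than abandoned.</p>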
<p><br /></p>
<h1 id="do-or-do-not">Do or Do Not</h1>
<p>Handling this properly is a large engineering effort. There are existing
solutions that don’t punt on this work (<a href="http://code.pyret.org">Pyret</a>,
<a href="http://wescheme.org">WeScheme</a>, <a href="http://plasma-umass.github.io/doppio-demo/">Doppio</a>,
and others), engineering these stopping mechanisms into their compiler and
runtime implementations. These tools produce instrumented code and a
knowledgeable execution environment that properly handles long-running programs
and provides a stop-button-like abstraction for pausing them.</p>
<p><strong>Stopify</strong> is an alternative to these massive engineering efforts,
instrumenting code and offering a debuggable execution environment for
JavaScript emitted by unmodified compilers. Stopify composes with <em>unstoppable</em>
output from existing compile-to-JS toolchains to produce <em>stoppable</em> programs.
Furthermore, if these compile-to-JS toolchains provide source maps between the
source language and the JS they produce, Stopify can preserve source locations
and allow breakpointing and stepping <em>through the source language program</em>.</p>
<h1 id="stopify">Stopify</h1>
<p>How does Stopify achieve this? There are a number of program transformations
that produce instrumented code enabling these debugging features.</p>
<ul>
<li><strong>JavaScript generators</strong> (a relatively new ES6 language feature) allow
programs to <em>incrementally evaluate</em> until the next <code class="language-plaintext highlighter-rouge">yield</code> point - Stopify
implements a transformation to inject these <code class="language-plaintext highlighter-rouge">yield</code> points.</li>
<li><strong>Continuation-Passing Style</strong> is a program transformation that turns all
control-flow into function applications in tail-position. Functions are
applied to an additional <em>continuation argument</em>, which is a function
capturing the <em>rest of the program to be executed</em> upon the calling function’s
completion. Stopify implements this transformation as well, CPSing JavaScript
directly.</li>
<li><strong>Building a shadow-stack</strong> is similar to the type of engineering involved in
Pyret, WeScheme, and Doppio mentioned above. This involves maintaining a copy
of the stack at runtime, so that a running program can be suspended, and the
running state can be restored from our copy. This is a transformation we are
looking at implementing in the future.</li>
</ul>
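<p>As a taste of the generator-based approach, here is what a yield-injecting transform might produce for the earlier <code>foo</code>, together with a tiny hand-written driver that pumps the generator and can cut it off between steps. This is an illustrative sketch, not actual Stopify output:</p>

```javascript
// Hypothetical output of injecting a yield at the loop back-edge of `foo`.
function* foo() {
  let i = 0;
  while (true) {
    i++;
    yield i;          // injected suspension point (exposing i for the demo)
  }
  // return i;        // unreachable, as in the original program
}

// A minimal driver: resume the generator until a stop condition holds.
// A real runtime would check a Stop flag; here a step budget stands in.
function run(gen, budget) {
  let last;
  for (let steps = 0; steps < budget; steps++) {
    const r = gen.next();         // run until the next injected yield
    if (r.done) break;
    last = r.value;
  }
  return last;                    // the loop's state when suspended
}
```

<p>The same infinite loop that blocked the event queue now suspends cooperatively: <code>run(foo(), 1000)</code> executes exactly 1000 iterations and hands control back, with the generator’s state intact for resumption.</p>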
<p>Stopify builds the foundation of the in-browser <strong>Paws IDE</strong> we are developing.
Paws will be your typical split-pane editor, with Stopify as the special sauce
under the hood. Because of Stopify’s composable nature, we can easily support
many (dozens?) of compile-to-JS languages with minimal extra effort. Stay tuned
as we continue development, and watch how the work evolves! We’ll hopefully have
a public demo shortly; contact me if you have questions in the meantime!</p>
<p><br /></p>
<hr />
<p><a href="/Stopify-NEPLS">Here</a> are some slides from the talk I presented. Some of the
animations are wonky in the browser, and it turns out minimal text makes for a
good talk but not-so-informative slides on their own. Refer to the above
writeup for the details!</p>Fun Times with C++ Custom Allocators2017-03-08T14:50:47+00:002017-03-08T14:50:47+00:00https://baxtersa.github.io/2017/03/08/custom-allocators<p>I recently wrote a statistics-gathering, segmented free-list allocator for C++
for my <a href="https://emeryberger.com/teaching/grad-systems">Systems class</a>. It was
lots of fun, very insightful, and exposed a couple of fun issues I thought I’d
share. I’m not sure if I can publicize the github repo for the allocator, but
it’s what you’d expect a simple free-list allocator to be. We also read lots of
cool papers listed on the course site. Check them out!</p>
<p><br /></p>
<h1 id="the-point-of-the-allocator">The Point of the Allocator</h1>
<p>I described the allocator we implemented as a “statistics-gathering, segmented
free-list allocator”. So what does that mean? Let’s break it down.</p>
<ul>
<li><em>Statistics-gathering</em> - Our allocator collects information about the memory
requests made and the amount of memory in use at any given time step <code class="language-plaintext highlighter-rouge">t</code>.
Specifically, we care about the cumulative number of bytes requested and the
number of bytes allocated at any time step <code class="language-plaintext highlighter-rouge">t</code>, and the max of each of those
stats over all time steps.</li>
<li><em>Segmented free-list</em> - Free-list allocators place freed blocks of memory
back into a data structure to be reused by calls to <code class="language-plaintext highlighter-rouge">malloc</code> before
allocating a new chunk in the heap. <em>Segmented</em> refers to the categorization
of objects in this data structure into discrete subsets based on the size of the
object. All this is to say that each size class of allocated object is given
a unique free-list to allow for a sort of best-fit allocator that falls back
on allocating a new chunk from the heap if no free blocks of the given class
exist in its free-list.</li>
</ul>
<p>The segmentation of the free-lists is an implementation detail that allows for
decent efficiency. The reasoning for gathering statistics may be less obvious.
Wait for a follow-up post on conservative garbage collection for how we end up
using this information.</p>
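<p>To make the bookkeeping concrete, here is a toy sketch of the same ideas in JavaScript (the real allocator is C++ over an <code>mmap</code>’d heap; block ids stand in for pointers, and a map stands in for the in-heap headers):</p>

```javascript
// Toy model of a statistics-gathering, segmented free-list allocator.
// Block ids stand in for heap pointers; `live` stands in for headers.
class StatsAlloc {
  constructor() {
    this.freeLists = new Map();  // size class -> freed block ids
    this.live = new Map();       // block id -> size class
    this.nextId = 0;
    this.requested = 0;          // cumulative bytes requested
    this.allocated = 0;          // bytes currently allocated
    this.maxAllocated = 0;       // high-water mark over all time steps
  }
  sizeClass(n) {                 // round up to the next power of two
    let c = 8;
    while (c < n) c *= 2;
    return c;
  }
  malloc(n) {
    const c = this.sizeClass(n);
    this.requested += n;
    this.allocated += c;
    this.maxAllocated = Math.max(this.maxAllocated, this.allocated);
    const list = this.freeLists.get(c);
    const id = list && list.length
      ? list.pop()               // reuse a freed block of this class
      : this.nextId++;           // otherwise "grow the heap"
    this.live.set(id, c);
    return id;
  }
  free(id) {
    const c = this.live.get(id);
    if (c === undefined) throw new Error('invalid free'); // bad-pointer check
    this.live.delete(id);
    this.allocated -= c;
    if (!this.freeLists.has(c)) this.freeLists.set(c, []);
    this.freeLists.get(c).push(id);                       // back on its list
  }
}
```

<p>A 24-byte request lands in the 32-byte class; freeing it and then requesting 20 bytes reuses the same block, while <code>requested</code> and <code>maxAllocated</code> track the statistics we care about.</p>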
<p>An important point to highlight is that we store this per-object metadata in
header structs immediately preceding each object in the heap. Here’s a sketch of
what the heap may look like after an allocation:</p>
<p><img src="/images/custom_heap.png" alt="Custom Heap Layout" /></p>
<p>So every allocated object has a constant space-overhead for its associated
header. This is small for our custom allocator, but can still be larger than
small heap-allocated objects. Managed languages like <code class="language-plaintext highlighter-rouge">Java</code> have even larger
headers containing things such as synchronization primitives, so they use clever
tricks to avoid allocating a header larger than the object it tracks. We’re not
going to concern ourselves with this. Check out <a href="http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon98Thin.pdf">thin
locks</a>
if you’re interested in learning more.</p>
<p>So that’s the high-level of our allocator! We protect the <code class="language-plaintext highlighter-rouge">mmap</code>‘d heap with a
global lock to allow for safe, multithreaded allocations, and otherwise we are
off to the races!</p>
<p><br /></p>
<h1 id="re-linking-programs-to-use-the-allocator">Re-linking Programs to Use the Allocator</h1>
<p>Suppose you would like to test your custom allocator against some precompiled
binaries, using your allocator in a terminal session executing <code class="language-plaintext highlighter-rouge">bash</code> commands.
*NIX operating systems make this simple to do!</p>
<p>On Linux systems:</p>
<ul>
<li>The <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> environment variable loads a specified dynamic/shared
library before loading any other code upon launching a binary.</li>
</ul>
<p>On Mac:</p>
<ul>
<li>The <code class="language-plaintext highlighter-rouge">DYLD_INSERT_LIBRARIES</code> environment variable performs the same function.</li>
</ul>
<p>Note that you may need to set <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code> to the directory containing your
allocator’s shared library to get things to work.</p>
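<p>Concretely, a session might look like the following. The library paths here are hypothetical, and on macOS, flat-namespace mode is typically needed to interpose <code>malloc</code>/<code>free</code>:</p>

```shell
# Linux: preload the allocator's shared library into an unmodified binary
LD_LIBRARY_PATH=. LD_PRELOAD=./libstatsalloc.so less /etc/hostname

# macOS equivalent
DYLD_INSERT_LIBRARIES=./libstatsalloc.dylib DYLD_FORCE_FLAT_NAMESPACE=1 less /etc/hostname
```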
<p>Now let’s see this in action running <code class="language-plaintext highlighter-rouge">less</code> in a shell:</p>
<table>
<thead>
<tr>
<th style="text-align: right">PID</th>
<th style="text-align: right">USER</th>
<th style="text-align: right">PR</th>
<th style="text-align: right">NI</th>
<th style="text-align: right">VIRT</th>
<th style="text-align: right">RES</th>
<th style="text-align: right">%CPU</th>
<th style="text-align: right">%MEM</th>
<th style="text-align: right">TIME+</th>
<th style="text-align: right">S</th>
<th style="text-align: right">COMMAND</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">19321</td>
<td style="text-align: right">usernm</td>
<td style="text-align: right">20</td>
<td style="text-align: right">0</td>
<td style="text-align: right">1049.7m</td>
<td style="text-align: right">4.2m</td>
<td style="text-align: right">0.0</td>
<td style="text-align: right">0.1</td>
<td style="text-align: right">0:00.00</td>
<td style="text-align: right">S</td>
<td style="text-align: right">less</td>
</tr>
</tbody>
</table>
<p>It should be obvious that <code class="language-plaintext highlighter-rouge">less</code> doesn’t require <code class="language-plaintext highlighter-rouge">1GB</code> of memory. This shows up
because we statically allocate a <code class="language-plaintext highlighter-rouge">1GB</code> heap when we load our allocator,
regardless of the memory requirements of the program. It’s fine though because
virtual memory is magic and 64-bit address spaces mean we can allocate a <code class="language-plaintext highlighter-rouge">1GB</code>
heap per-process and not worry about it unless we actually need to use the full
heap.</p>
<p>Still, we’re able to run <code class="language-plaintext highlighter-rouge">bash</code> commands with evidence that our custom allocator
is being used! It will also be apparent that your allocator is being used if
you’ve made a mistake, in which case it’s likely you’ll coredump. Yay.</p>
<p>Depending on your implementation strategy, you may notice some less-desirable
properties of your allocator. Since you’ve interposed on <code class="language-plaintext highlighter-rouge">malloc</code>
calls, allocations should be going through your allocator. It should be safe to assume a
well-behaved program will pass pointers to <code class="language-plaintext highlighter-rouge">free</code> that live in your heap with a
valid header object immediately preceding it in memory. If you want to verify
this is the case, you can linearly search a linked-list of allocated objects to
make sure it’s not an invalid <code class="language-plaintext highlighter-rouge">free</code>, but this comes at a performance cost. The
same approach could be used to catch double <code class="language-plaintext highlighter-rouge">free</code>s.</p>
<p><br /></p>
<h1 id="when-things-should-work-but-everything-is-the-worst">When Things Should Work But Everything Is the Worst</h1>
<p>What happens when you decide to trust this assumption? Misbehaved <code class="language-plaintext highlighter-rouge">malloc</code> and
<code class="language-plaintext highlighter-rouge">free</code> calls can cause extra trouble for our system because of our in-heap
header objects. If we do nothing to protect against an invalid free, the state
information we maintain in headers can overwrite valid objects in our heap. So
maybe you want to add some protection. Maybe you like living on the edge though.</p>
<p>Let’s graduate to running <code class="language-plaintext highlighter-rouge">GTK</code> applications. I’ve tested <code class="language-plaintext highlighter-rouge">nautilus</code> and
<code class="language-plaintext highlighter-rouge">wireshark-gtk</code>, but I suspect anything that can fit in our fixed-size heap will
do. And by “will do” I mean will eventually coredump. The issue seems to be
triggered most heavily when creating and destroying short-lived threads.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID: 22577 (nautilus)
UID: 1000 (usernm)
GID: 1000 (usernm)
Signal: 11 (SEGV)
Timestamp: Sun 2017-02-26 17:41:40 EST (1 weeks 2 days ago)
Command Line: nautilus
Executable: /usr/bin/nautilus
Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
Unit: user@1000.service
User Unit: gnome-terminal-server.service
Slice: user-1000.slice
Owner UID: 1000 (usernm)
Boot ID: 774881d4e90d4110b2622c613ad8b731
Machine ID: bab7914944494565bde60a645e838c11
Hostname: hostnm
Storage: /var/lib/systemd/coredump/core.nautilus.1000.774881d4e90d4110b2622c613ad8b73
Message: Process 22577 (nautilus) of user 1000 dumped core.
Stack trace of thread 22577:
#0 0x00007f78cb2e4bd4 _ZN10StatsAllocI8MmapHeapILm1073741824EEE4freeEPv (libstatsalloc.so)
#1 0x00007f78cb2e469d xxfree (libstatsalloc.so)
#2 0x00007f78cb2e39e5 customfree (libstatsalloc.so)
#3 0x00007f78c7ca1736 n/a (libglib-2.0.so.0)
#4 0x00007f78c7ca28ee g_slice_free1 (libglib-2.0.so.0)
#5 0x00007f78c7c8160a n/a (libglib-2.0.so.0)
#6 0x00007f78c7c845d0 g_main_context_dispatch (libglib-2.0.so.0)
#7 0x00007f78c7c84810 n/a (libglib-2.0.so.0)
#8 0x00007f78c7c848bc g_main_context_iteration (libglib-2.0.so.0)
#9 0x00007f78c823e52d g_application_run (libgio-2.0.so.0)
#10 0x00000000004293ba n/a (nautilus)
#11 0x00007f78c739b291 __libc_start_main (libc.so.6)
#12 0x000000000042941a n/a (nautilus)
...
</code></pre></div></div>
<p>Here we see that <code class="language-plaintext highlighter-rouge">nautilus</code> segfaults in a call to <code class="language-plaintext highlighter-rouge">free</code> caught by our custom
<code class="language-plaintext highlighter-rouge">libstatsalloc.so</code> shared library. Some other coredumps showed a bunch of
<code class="language-plaintext highlighter-rouge">pthreads</code> calls in the stack trace, which in combination with short-lived
threads eventually led me to
<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=20116">this</a> really fun bug in
<code class="language-plaintext highlighter-rouge">glib</code> &lt; 2.25. Locally building and linking against the latest <code class="language-plaintext highlighter-rouge">glib</code> didn’t solve
the issue though. So, despite it being right there in the coredump stack trace,
it took me a <em>long</em> time (days) to look into <code class="language-plaintext highlighter-rouge">g_slice_free1</code> and the associated
<code class="language-plaintext highlighter-rouge">g_slice_allocator</code>.</p>
<p>Long story short, <code class="language-plaintext highlighter-rouge">glib</code> memory slices can delegate <code class="language-plaintext highlighter-rouge">free</code>ing pointers to the
system allocator, when those pointers were allocated by a custom allocator
internal to <code class="language-plaintext highlighter-rouge">glib</code> itself. So that’s pretty anticlimactic. Fortunately, you can
run your binary with <code class="language-plaintext highlighter-rouge">G_SLICE=always-malloc</code> to always defer to the system
allocator, guaranteeing you only <code class="language-plaintext highlighter-rouge">free</code> pointers that you <code class="language-plaintext highlighter-rouge">malloc</code> yourself.
After a long time spent debugging my custom allocator, it turned out the
allocator was fine all along, and my assumptions were incorrect.</p>
<h1 id="summary">Summary</h1>
<p>That was a long-winded way of highlighting a couple snags I ran into
implementing my own custom allocator. It’s fun to watch your system happily
trudge along, knowing that it relies on an allocator you wrote by hand and
whose limitations you fully understand. Hand-tuned allocators might still be
magic, but the core functionality of an allocator is pretty straightforward. I
mentioned there will be a follow-up related to conservative garbage collection.
Check that out once it’s posted to see something neat this allows us to do.</p>OpenFlow 1.0 Protocol in Rust2016-12-30T16:21:53+00:002016-12-30T16:21:53+00:00https://baxtersa.github.io/2016/12/30/rust-openflow-0x01<p>I’ve published a <a href="https://crates.io/crates/rust_ofp">crate</a> (my first!) implementing a large portion of the OpenFlow 1.0 protocol in Rust. I am surprised I was not able to find any packages either natively implementing software-defined networking (SDN) capabilities in Rust, or providing Rust bindings to existing protocol libraries written in other languages. <code class="language-plaintext highlighter-rouge">rust_ofp</code> takes the first step into this empty space of Rust packages, providing a Rust-native implementation of the <a href="http://archive.openflow.org/documents/openflow-spec-v1.0.0.pdf">OpenFlow 1.0 specification</a>, and offering traits that abstract the core functionality an OpenFlow controller should provide.</p>
<p><br /></p>
<h1 id="why-rust">Why Rust?</h1>
<p>I’ll preface this by stating that I am a programming language nerd, and I currently work on the <a href="http://www.frenetic-lang.org">frenetic-lang</a> project developing tools and abstractions for NetKAT (a compiled network programming language based on solid mathematical foundations). I’m <em>not</em> a network administrator, and am more interested in linguistic abstractions that guarantee network properties and grow expressiveness than I am in <em>being</em> a network administrator. This is mostly a comparison of what Rust offers that could benefit the frenetic project. So, take my perspective with that in mind.</p>
<p>At the lowest level, implementing an OpenFlow protocol naturally fits the low-level (zero-cost) abstractions of Rust. Having control over memory layout makes implementing a wire protocol easy. With</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[repr(packed)]</span>
<span class="k">struct</span> <span class="nf">OfpActionVlanPcp</span><span class="p">(</span><span class="nb">u8</span><span class="p">,</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">3</span><span class="p">]);</span>
</code></pre></div></div>
<p>you can achieve C-style struct layout without the compiler inserting padding for field alignment, and without jumping through the <code class="language-plaintext highlighter-rouge">CStruct</code> hoops of frenetic’s OCaml implementation to avoid the same problem:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="o">%%</span><span class="n">cstruct</span>
<span class="k">type</span> <span class="n">ofp_action_vlan_pcp</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">vlan_pcp</span><span class="o">:</span> <span class="n">uint8_t</span><span class="p">;</span>
<span class="n">pad</span><span class="o">:</span> <span class="n">uint8_t</span> <span class="p">[</span><span class="o">@</span><span class="n">len</span> <span class="mi">3</span><span class="p">];</span>
<span class="p">}</span> <span class="p">[</span><span class="o">@@</span><span class="n">big_endian</span><span class="p">]]</span>
</code></pre></div></div>
<p>OCaml CStruct’s automatic code generation for setters/getters of cstruct fields is nice, but choosing constrained integer representations for enums in Rust hasn’t been bad either. Manipulating a <code class="language-plaintext highlighter-rouge">&mut [u8]</code> feels more comfortable to me when serializing/deserializing bytes.</p>
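<p>For instance, marshaling that 4-byte action body by hand is just index arithmetic over a byte buffer. Here’s a hypothetical sketch of the same layout in JavaScript (not <code>rust_ofp</code> code), showing there’s no hidden padding to account for:</p>

```javascript
// Hypothetical sketch of OfpActionVlanPcp's 4-byte wire layout:
// one priority byte followed by three explicit padding bytes.
function marshalVlanPcp(pcp) {
  const buf = new Uint8Array(4);   // zero-filled: the padding comes for free
  buf[0] = pcp & 0x07;             // VLAN priority is only 3 bits wide
  return buf;                      // bytes 1..3 stay zero, as on the wire
}

function unmarshalVlanPcp(buf) {
  return buf[0] & 0x07;            // ignore the padding on the way back in
}
```

<p>Because the layout is <code>#[repr(packed)]</code>, every byte sits exactly where the spec says; serializing is just writing fields at fixed offsets, which is what manipulating a <code>&amp;mut [u8]</code> in Rust feels like.</p>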
<p>Furthermore, Rust’s performance can be much easier to reason about. This may derive from my days as a C++ developer, but Rust’s explicit use of pointers and references makes it clear that I am not copying around payloads and message abstractions, but reusing the same data where possible.</p>
<p>For me, the difference between functors in OCaml and traits in Rust doesn’t amount to much. I’ve found it somewhat easier to implement a common interface amongst different message types using traits, but I’m accustomed to functors and modules, so my design in Rust borrows heavily from the functional paradigm and existing frenetic implementation.</p>
<p>The main selling point for a Rust frenetic platform is true parallelism. OCaml notoriously doesn’t have good support for parallelism. Long-running computations (compiling a new NetKAT policy on a dynamic configuration update) either risk degrading controller responsiveness, or require non-idiomatic OCaml using unix threads in combination with pervasive use of JaneStreet’s Async library. Rust can isolate each controller-switch connection to its own thread, and Rust’s concurrency model gives strong guarantees about the correctness of parallel code. I’m looking forward to experimenting with Rust concurrency to see how an SDN controller can reap these benefits.</p>
<p><br /></p>
<h1 id="rust_ofp">rust_ofp</h1>
<p><code class="language-plaintext highlighter-rouge">rust_ofp</code> is a library implementing the OpenFlow 1.0 protocol. What does it look like? At the highest level of abstraction, it provides traits for implementing common OpenFlow structures for different protocol versions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">trait OfpHeader</code> for OpenFlow message headers.</li>
<li><code class="language-plaintext highlighter-rouge">trait OfpMessage</code> for byte-level operations on OpenFlow message types.</li>
<li><code class="language-plaintext highlighter-rouge">trait OfpController</code> for OpenFlow controller operations, like connection handshakes and communicating with switches over <code class="language-plaintext highlighter-rouge">TcpStream</code>s.</li>
</ul>
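<p>A message trait in this spirit might look like the following. The names and signatures here are illustrative only and may differ from the crate’s actual API:</p>

```rust
// Illustrative only: a trait in the spirit of `OfpMessage`, pairing a
// size computation with byte-level marshaling. Not rust_ofp's real API.
trait Message {
    fn size_of(&self) -> usize;
    fn marshal(&self, buf: &mut Vec<u8>);
}

struct Hello;

impl Message for Hello {
    fn size_of(&self) -> usize {
        8 // an OpenFlow header is 8 bytes
    }
    fn marshal(&self, buf: &mut Vec<u8>) {
        // version, type, 16-bit length, 32-bit xid
        buf.extend_from_slice(&[1, 0, 0, 8, 0, 0, 0, 0]);
    }
}

fn main() {
    let msg = Hello;
    let mut buf = Vec::new();
    msg.marshal(&mut buf);
    assert_eq!(buf.len(), msg.size_of());
}
```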
<p>I have so far implemented what I consider the majority of useful functionality for an OpenFlow 1.0 SDN in the <code class="language-plaintext highlighter-rouge">openflow0x01</code> module. This module implements the <code class="language-plaintext highlighter-rouge">OfpMessage</code> trait for OpenFlow 1.0 messages, and the accompanying binary crate <code class="language-plaintext highlighter-rouge">rust_ofp_controller</code> gives a small example using the library to install rules when a switch connects to the controller.</p>
<p>A controller implementor can add flows with the <code class="language-plaintext highlighter-rouge">FlowMod</code> message type:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">enum</span> <span class="n">Message</span> <span class="p">{</span>
<span class="o">...</span>
<span class="nf">FlowMod</span><span class="p">(</span><span class="n">FlowMod</span><span class="p">),</span>
<span class="nf">PacketIn</span><span class="p">(</span><span class="n">PacketIn</span><span class="p">),</span>
<span class="o">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">FlowMod</span> <span class="p">{</span>
<span class="k">pub</span> <span class="n">command</span><span class="p">:</span> <span class="n">FlowModCmd</span><span class="p">,</span>
<span class="k">pub</span> <span class="n">pattern</span><span class="p">:</span> <span class="n">Pattern</span><span class="p">,</span>
<span class="k">pub</span> <span class="n">actions</span><span class="p">:</span> <span class="nb">Vec</span><span class="o"><</span><span class="n">Action</span><span class="o">></span><span class="p">,</span>
<span class="o">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>A <code class="language-plaintext highlighter-rouge">FlowMod</code> specifies a <code class="language-plaintext highlighter-rouge">FlowModCmd</code> that determines the type of modification to make to a switch’s flowtable (add, modify, delete), along with a pattern and a list of actions to perform on pattern matches. An empty action list is interpreted as the <code class="language-plaintext highlighter-rouge">DROP</code> action.</p>
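<p>The empty-list-means-<code>DROP</code> convention can be sketched as follows (the types here are illustrative, not the crate’s):</p>

```rust
// Standalone sketch of the convention above: an empty action list on a
// flow entry is treated as DROP. Types are illustrative only.
#[derive(Debug, PartialEq)]
enum Action {
    Output(u16),
}

#[derive(Debug, PartialEq)]
enum Effect {
    Drop,
    Apply(Vec<Action>),
}

fn interpret(actions: Vec<Action>) -> Effect {
    if actions.is_empty() {
        Effect::Drop
    } else {
        Effect::Apply(actions)
    }
}

fn main() {
    assert_eq!(interpret(vec![]), Effect::Drop);
    assert_eq!(
        interpret(vec![Action::Output(1)]),
        Effect::Apply(vec![Action::Output(1)])
    );
}
```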
<p><code class="language-plaintext highlighter-rouge">PacketIn</code> message types (i.e. packets that arrive at the controller) can be handled to implement behaviors like MAC learning, dynamically installing rules on switches as the network is discovered.</p>
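<p>A minimal model of MAC learning (a standalone sketch, not <code>rust_ofp</code> code): remember which port each source MAC was seen on, and flood when the destination is still unknown.</p>

```rust
use std::collections::HashMap;

// Standalone sketch of MAC learning: learn the source MAC's port on
// every packet; forward to a known port or flood otherwise.
type Mac = [u8; 6];

enum Forward {
    Port(u16),
    Flood,
}

fn learn_and_forward(table: &mut HashMap<Mac, u16>, src: Mac, dst: Mac, in_port: u16) -> Forward {
    table.insert(src, in_port); // learn where `src` lives
    match table.get(&dst) {
        Some(&p) => Forward::Port(p),
        None => Forward::Flood,
    }
}

fn main() {
    let mut table = HashMap::new();
    let a = [0, 0, 0, 0, 0, 1];
    let b = [0, 0, 0, 0, 0, 2];
    // First packet from a to b: b is unknown, so flood.
    assert!(matches!(learn_and_forward(&mut table, a, b, 1), Forward::Flood));
    // Reply from b to a: a was learned on port 1.
    assert!(matches!(learn_and_forward(&mut table, b, a, 2), Forward::Port(1)));
}
```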
<p>The library design is intended to allow other OpenFlow specifications to implement the same traits, enabling a pluggable back-end for a single SDN compiler to target multiple protocol versions. I hope to implement OpenFlow 1.3 (specifically for multi-table support), but will otherwise be focusing on front-end abstractions to SDN specifications once the protocol is in working order.</p>
<p><br /></p>
<h1 id="documentation">Documentation</h1>
<p>Travis CI integration in the <a href="https://github.com/baxtersa/rust_ofp">github repo</a> automatically uploads source documentation generated by <code class="language-plaintext highlighter-rouge">cargo doc</code>.</p>
<ul>
<li><a href="https://baxtersa.github.io/rust_ofp/docs"><code class="language-plaintext highlighter-rouge">rust_ofp</code></a></li>
<li><a href="https://baxtersa.github.io/rust_ofp/docs/rust_ofp_controller"><code class="language-plaintext highlighter-rouge">rust_ofp_controller</code></a></li>
</ul>
<p>Please take a shot at implementing your own SDN controller using <code class="language-plaintext highlighter-rouge">rust_ofp</code>! If there’s demand for it, I’ll cover setting up and testing a controller in the mininet network simulation environment in a future post. Get in touch if you want to discuss anything further!</p>I’ve published a crate (my first!) implementing a large portion of the OpenFlow 1.0 protocol in Rust. I am surprised I was not able to find any packages either natively implementing software-defined networking (SDN) capabilities in Rust, or providing Rust bindings to existing protocol libraries written in other languages. rust_ofp takes the first step into this empty space of Rust packages, providing a Rust-native implementation of the OpenFlow 1.0 specification, and offering traits that abstract the core functionality an OpenFlow controller should provide.Adapton Tries2016-12-22T18:37:47+00:002016-12-22T18:37:47+00:00https://baxtersa.github.io/2016/12/22/adapton<p>As part II of the past semester’s <a href="https://people.cs.umass.edu/~arjun/courses/compsci691pl-fall2016">PL Seminar</a>, I have implemented <em>probabilistically balanced tries</em> in <a href="https://github.com/cuplv/adapton.rust">adapton.rust</a>, a general-purpose <strong>incremental computation</strong> (I.C.) library for Rust. I.C. presents a different programming paradigm to get used to, but can yield big performance improvements without needing to implement dynamic programming algorithms that become harder to reason about. You can see <a href="https://baxtersa.github.io/2016/11/01/mixy.html">MIXY</a> for my post/project on part I of the seminar.</p>
<p>Details on the theory behind Adapton.Rust’s current implementation can be found in the paper <a href="https://arxiv.org/pdf/1503.07792v5.pdf">Incremental Computation with Names</a>. Adapton.Rust implements what the authors call <em>nominal</em> matching, giving <strong>names</strong> to computations for reuse. I’ll go over some of the details before presenting <em>probabilistic tries</em>, and then explaining how to use and benchmark my implementation.</p>
<p><br /></p>
<h1 id="named-computation">Named Computation</h1>
<p>Adapton and previous works on incremental computation aim to reuse as much computation as possible after changes to input, improving upon naive memoization. In Adapton, we attempt to reuse sub-computations, such as results from similar tails of different input lists. So, mapping <code class="language-plaintext highlighter-rouge">f</code> over <code class="language-plaintext highlighter-rouge">[2;3;4;5]</code> first and then over <code class="language-plaintext highlighter-rouge">[1;2;3;4;5]</code>, Adapton will reuse the entire result on the common tail <code class="language-plaintext highlighter-rouge">[2;3;4;5]</code>, rather than failing a memo lookup on the original input and performing a full recomputation.</p>
<p>Nominal Adapton takes this a step further. Though original Adapton was able to reuse the common tail above, incremental computations on lists are limited to only reusing computations over similar tails. This is because structurally, all prefix cons cells “depend” on the entire tail (Figure 1.a in the <a href="https://arxiv.org/pdf/1503.07792v5.pdf">paper</a> has a great depiction of how an insertion into the middle of a list “dirties” the entire prefix, triggering the need for recomputation of those values shown in Figure 1.b). This behavior is a product of <em>structural matching</em> to determine whether values can be reused, or must be recomputed. We introduce <em>nominal matching</em> to avoid situations such as these, enabling reuse of more computations.</p>
<p>In nominal Adapton, we associate external “names” with computations. If a future call dirties a node in the <em>demanded computation graph</em>, we can match on the <em>name</em> rather than <em>structure</em> to determine reusability. This results in a constant-time equality check, and greater opportunity for reuse!</p>
<p>In code, this means we wrap input in <em>named articulation points</em> before extending data structures.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">fn</span> <span class="nf">push_input</span><span class="p">(</span><span class="n">i</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">t</span><span class="p">:</span> <span class="n">Trie</span><span class="o"><</span><span class="nb">usize</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">Trie</span><span class="o"><</span><span class="nb">usize</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">t</span> <span class="o">=</span> <span class="nn">Trie</span><span class="p">::</span><span class="nf">art</span><span class="p">(</span><span class="nf">cell</span><span class="p">(</span><span class="nf">name_of_usize</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="n">t</span><span class="p">));</span>
<span class="k">let</span> <span class="n">t</span> <span class="o">=</span> <span class="nn">Trie</span><span class="p">::</span><span class="nf">name</span><span class="p">(</span><span class="nf">name_of_usize</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="n">t</span><span class="p">);</span>
<span class="nn">Trie</span><span class="p">::</span><span class="nf">extend</span><span class="p">(</span><span class="nf">name_unit</span><span class="p">(),</span> <span class="n">t</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">fn</span> <span class="nf">push_list</span><span class="p">(</span><span class="n">i</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">l</span><span class="p">:</span> <span class="n">List</span><span class="o"><</span><span class="nb">usize</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">List</span><span class="o"><</span><span class="nb">usize</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">l</span> <span class="o">=</span> <span class="nn">List</span><span class="p">::</span><span class="nf">art</span><span class="p">(</span><span class="nf">cell</span><span class="p">(</span><span class="nf">name_of_usize</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="n">l</span><span class="p">));</span>
<span class="k">let</span> <span class="n">l</span> <span class="o">=</span> <span class="nn">List</span><span class="p">::</span><span class="nf">name</span><span class="p">(</span><span class="nf">name_of_usize</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="n">l</span><span class="p">);</span>
<span class="nn">List</span><span class="p">::</span><span class="nf">cons</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">l</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Adapton can still perform structural matching (try wrapping <code class="language-plaintext highlighter-rouge">dcg</code> interactions with the <code class="language-plaintext highlighter-rouge">structural(...)</code> function, or running the executable with <code class="language-plaintext highlighter-rouge">ADAPTON_STRUCTURAL=1</code>), but programming by naming sub-structures is what enables these speedups.</p>
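<p>To see why nominal memo lookups are cheap, consider a toy name-keyed cache. This ignores Adapton’s change propagation entirely (it is not the <code>dcg</code> engine); it only illustrates that matching on a small name id is a constant-time hash lookup rather than a structural walk of the input:</p>

```rust
use std::collections::HashMap;

// Toy name-keyed memo table: results are cached under a name, so a
// lookup hashes one small id instead of comparing whole structures.
// Illustrative only; Adapton additionally dirties and repairs entries
// when named inputs change.
fn memo_square(cache: &mut HashMap<u64, u64>, name: u64, input: u64) -> u64 {
    *cache.entry(name).or_insert_with(|| input * input)
}

fn main() {
    let mut cache = HashMap::new();
    assert_eq!(memo_square(&mut cache, 1, 7), 49);
    // Same name: the cached result is reused without recomputation
    // (in real Adapton, a changed input under this name would first
    // dirty the entry).
    assert_eq!(memo_square(&mut cache, 1, 8), 49);
}
```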
<p><br /></p>
<h1 id="probablistic-tries">Probabilistic Tries</h1>
<p>Probabilistic tries are incremental data structures inspired by probabilistically balanced trees. In Adapton, we can use tries to represent sets and finite maps.</p>
<p>Intuitively, tries are binary trees whose nodes are named and whose leaves hold data. Tries use bitstrings (see <a href="https://github.com/baxtersa/adapton.rust/blob/dev/src/bitstring.rs">bitstring.rs</a> for implementation) to represent both the data element (via its hash), and the path to traverse a trie in order to retrieve it (if the element is present). Because hashes describe the path to extract elements, similar inputs in tries yield similarly structured paths. Inserting an element causes little change to the structure; paths only change after insertions if the depth of a trie must grow, and only paths on the grown sub-trie are dirtied. If we memoize named computations in a <code class="language-plaintext highlighter-rouge">fold</code> over the structure of a trie, future <code class="language-plaintext highlighter-rouge">fold</code>s can reuse all unchanged subcomputations.</p>
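<p>The hash-directed path structure can be sketched in plain Rust. This is a simplified, non-incremental model (no names, no articulation points, and leaf collisions are simply overwritten), not the adapton.rust implementation; it only shows how each bit of an element’s hash picks a branch, so equal elements always follow the same path:</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified hash trie: each low-order bit of the hash chooses left or
// right; leaves sit at a fixed depth. Collisions overwrite (fine for a
// sketch at depth 32).
enum Trie {
    Empty,
    Leaf(u64),
    Bin(Box<Trie>, Box<Trie>),
}

fn hash_of(x: u64) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

fn insert(t: Trie, x: u64, bits: u64, depth: u32) -> Trie {
    if depth == 0 {
        return Trie::Leaf(x);
    }
    let (l, r) = match t {
        Trie::Bin(l, r) => (*l, *r),
        _ => (Trie::Empty, Trie::Empty),
    };
    if bits & 1 == 0 {
        Trie::Bin(Box::new(insert(l, x, bits >> 1, depth - 1)), Box::new(r))
    } else {
        Trie::Bin(Box::new(l), Box::new(insert(r, x, bits >> 1, depth - 1)))
    }
}

fn member(t: &Trie, x: u64, bits: u64) -> bool {
    match t {
        Trie::Empty => false,
        Trie::Leaf(y) => *y == x,
        Trie::Bin(l, r) => {
            if bits & 1 == 0 {
                member(l, x, bits >> 1)
            } else {
                member(r, x, bits >> 1)
            }
        }
    }
}

fn main() {
    let mut t = Trie::Empty;
    for i in 0..8u64 {
        t = insert(t, i, hash_of(i), 32);
    }
    assert!(member(&t, 3, hash_of(3)));
    assert!(!member(&t, 99, hash_of(99)));
}
```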
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">trie_fold</span>
<span class="o"><</span><span class="n">X</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span><span class="n">TrieElim</span><span class="o"><</span><span class="n">X</span><span class="o">></span><span class="p">,</span> <span class="n">Res</span><span class="p">:</span><span class="n">Hash</span><span class="o">+</span><span class="n">Debug</span><span class="o">+</span><span class="nb">Eq</span><span class="o">+</span><span class="n">Clone</span><span class="o">+</span><span class="nv">'static</span><span class="p">,</span> <span class="n">F</span><span class="p">:</span><span class="nv">'static</span><span class="o">></span>
<span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span> <span class="n">res</span><span class="p">:</span><span class="n">Res</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="nb">Rc</span><span class="o"><</span><span class="n">F</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">Res</span>
<span class="k">where</span> <span class="n">F</span><span class="p">:</span> <span class="nf">Fn</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">Res</span><span class="p">)</span> <span class="k">-></span> <span class="n">Res</span> <span class="p">{</span>
<span class="nn">T</span><span class="p">::</span><span class="nf">elim_arg</span><span class="p">(</span><span class="n">t</span><span class="p">,</span>
<span class="p">(</span><span class="n">res</span><span class="p">,</span> <span class="n">f</span><span class="p">),</span>
<span class="p">|</span><span class="mi">_</span><span class="p">,</span> <span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="mi">_</span><span class="p">)|</span> <span class="n">arg</span><span class="p">,</span>
<span class="p">|</span><span class="mi">_</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="p">)|</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">arg</span><span class="p">),</span>
<span class="p">|</span><span class="mi">_</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">,</span> <span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="p">)|</span> <span class="nf">trie_fold</span><span class="p">(</span><span class="n">right</span><span class="p">,</span> <span class="nf">trie_fold</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="nf">.clone</span><span class="p">()),</span> <span class="n">f</span><span class="p">),</span>
<span class="p">|</span><span class="mi">_</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="p">)|</span> <span class="nf">trie_fold</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="p">),</span>
<span class="p">|</span><span class="n">nm</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">f</span><span class="p">)|</span> <span class="nd">memo!</span><span class="p">(</span><span class="n">nm</span> <span class="k">=></span><span class="o">></span> <span class="n">trie_fold</span><span class="p">,</span> <span class="n">t</span><span class="p">:</span><span class="n">t</span><span class="p">,</span> <span class="n">res</span><span class="p">:</span><span class="n">arg</span> <span class="p">;;</span> <span class="n">f</span><span class="p">:</span><span class="n">f</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">memo!(nm =>> trie_fold, t:t, res:arg ;; f:f)</code> syntax memoizes recursive <code class="language-plaintext highlighter-rouge">trie_fold</code> computations on named sub-tries. If we use nominal matching, these points result in <code class="language-plaintext highlighter-rouge">O(1)</code>-time memo lookups, and under structural matching these points are where we compare for structural equality.</p>
<p><br /></p>
<h1 id="building-and-testing">Building and Testing</h1>
<p>My implementation of tries has been merged into upstream adapton.rust, and can be found on the <code class="language-plaintext highlighter-rouge">dev</code> branch. To check out the behavior on your own machine, install the nightly release of <code class="language-plaintext highlighter-rouge">rustc</code> and <code class="language-plaintext highlighter-rouge">cargo</code>, build and run the tests/benchmarks following the instructions below.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl https://sh.rustup.rs <span class="nt">-sSf</span> | sh <span class="nt">-s</span> <span class="nt">--</span> <span class="nt">--default-toolchain</span> nightly
<span class="c"># Make sure you are on the dev branch</span>
<span class="nv">$ </span>git clone https://github.com/baxtersa/adapton.rust
<span class="nv">$ </span><span class="nb">cd </span>adapton.rust/
<span class="c"># Build the source</span>
<span class="nv">$ </span>cargo build
<span class="c"># Build and run tests</span>
<span class="nv">$ </span>cargo <span class="nb">test</span>
<span class="c"># Remove a benchmark file that breaks the benchmark build (I should fix this)</span>
<span class="nv">$ </span><span class="nb">rm </span>benches/benches.rs
<span class="c"># Build and run benchmarks</span>
<span class="nv">$ </span>cargo bench tries_bench
</code></pre></div></div>
<p><br /></p>
<h1 id="benchmarks">Benchmarks</h1>
<p>Some common list-computations perform better over tree-like structures in Adapton. <code class="language-plaintext highlighter-rouge">fold</code> is an example of this, as the accumulator carries dependencies on all previous computations, i.e. the accumulator at step <code class="language-plaintext highlighter-rouge">i</code> depends on the entire list prefix <code class="language-plaintext highlighter-rouge">0..i-1</code>, demanding recomputation of each step on changes to the input. Tree-like structures improve upon this by expressing independence of sub-problems, so changing a leaf only dirties the path up to the root, rather than all leaves traversed prior.</p>
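<p>The two dependency shapes look like this in ordinary (non-incremental) Rust. A left fold threads one accumulator through every element, so step <code>i</code> depends on the whole prefix, while a balanced divide-and-conquer sum only combines independent halves, so a changed leaf touches <code>O(log n)</code> combine nodes:</p>

```rust
// Linear dependency chain: the accumulator at each step depends on the
// entire prefix before it.
fn fold_sum(xs: &[u64]) -> u64 {
    xs.iter().fold(0, |acc, x| acc + x)
}

// Tree-shaped dependencies: each half is an independent sub-problem.
fn tree_sum(xs: &[u64]) -> u64 {
    match xs.len() {
        0 => 0,
        1 => xs[0],
        n => tree_sum(&xs[..n / 2]) + tree_sum(&xs[n / 2..]),
    }
}

fn main() {
    let xs: Vec<u64> = (1..=100).collect();
    assert_eq!(fold_sum(&xs), 5050);
    assert_eq!(tree_sum(&xs), fold_sum(&xs));
}
```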
<p>My performance benchmark compares <code class="language-plaintext highlighter-rouge">fold</code> over trees and tries. In this benchmark, we build an input out of the sequence <code class="language-plaintext highlighter-rouge">1..100</code>. At each iteration of building the input, we fold <code class="language-plaintext highlighter-rouge">sum</code> over the current iteration of the structure. <code class="language-plaintext highlighter-rouge">benchmark_dcg_*</code> uses <em>nominal matching</em> and the <em>demanded computation graph</em> to yield large performance speedups over the <code class="language-plaintext highlighter-rouge">benchmark_naive_*</code> tests, which perform full recomputation.</p>
<p>Running <code class="language-plaintext highlighter-rouge">cargo bench</code> should show some comparisons of this behavior:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>running 4 tests
<span class="nb">test </span>tree_benchmarks::benchmark_dcg_tree ... bench: 2,160 ns/iter <span class="o">(</span>+/- 322<span class="o">)</span>
<span class="nb">test </span>tree_benchmarks::benchmark_naive_tree ... bench: 66,686 ns/iter <span class="o">(</span>+/- 201<span class="o">)</span>
<span class="nb">test </span>trie_input::benchmark_dcg_trie ... bench: 1,968 ns/iter <span class="o">(</span>+/- 97<span class="o">)</span>
<span class="nb">test </span>trie_input::benchmark_naive_trie ... bench: 16,378 ns/iter <span class="o">(</span>+/- 3,020<span class="o">)</span>
</code></pre></div></div>
<p>For comparison, run <code class="language-plaintext highlighter-rouge">ADAPTON_STRUCTURAL=1 cargo bench</code> to show the performance of <em>structural matching</em>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>running 4 tests
<span class="nb">test </span>tree_benchmarks::benchmark_dcg_tree ... bench: 1,604 ns/iter <span class="o">(</span>+/- 91<span class="o">)</span>
<span class="nb">test </span>tree_benchmarks::benchmark_naive_tree ... bench: 66,219 ns/iter <span class="o">(</span>+/- 1,465<span class="o">)</span>
<span class="nb">test </span>trie_input::benchmark_dcg_trie ... bench: 9,717 ns/iter <span class="o">(</span>+/- 95<span class="o">)</span>
<span class="nb">test </span>trie_input::benchmark_naive_trie ... bench: 16,479 ns/iter <span class="o">(</span>+/- 2,906<span class="o">)</span>
</code></pre></div></div>
<p>Using nominal matching, trees and tries perform roughly equivalently under the <code class="language-plaintext highlighter-rouge">dcg</code> engine. This makes sense, because we are folding over similar tree structures, and reusing named points at binary nodes. Interestingly, naive computation is much more performant for tries in this benchmark. My guess is that the tries are much smaller than the trees tested, but I haven’t looked into it much.</p>
<p>You can see that structural matching performs better than naive recomputation for both trees and tries, but tries perform significantly better under nominal matching. This shows the benefits of memoizing names rather than using structural comparison.</p>
<p>I should note that <code class="language-plaintext highlighter-rouge">fold</code>ing the sum of an input is <em>significantly</em> faster using native Rust structures like vectors than in the benchmark above. This may be a combination of compiler optimizations and the overhead of Adapton’s <code class="language-plaintext highlighter-rouge">dcg</code> engine outweighing the benefits of reuse for cheap integer arithmetic. I’d expect the benchmarks to look better compared to naive Rust vectors on more expensive computations, and future incremental set and graph operations should be coming soon. For now, this is still a hunch I’d like to verify before incorporating I.C. with adapton.rust into any personal projects (of which I have none).</p>
<p><br /></p>
<h1 id="development">Development</h1>
<p>See <a href="https://github.com/cuplv/adapton.rust">here</a> for the latest changes upstream.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">dev</code> contains latest development (including tries!)</li>
<li><code class="language-plaintext highlighter-rouge">master</code> contains “stable” features</li>
</ul>
<p><br />
See <a href="https://github.com/cuplv/adapton.rust/pull/4">here</a> for the discussion on my PR merging tries upstream.</p>P4 Developer Day2016-11-08T19:39:21+00:002016-11-08T19:39:21+00:00https://baxtersa.github.io/2016/11/08/p4-developer-day<p>I recently attended P4 Developer Day at Stanford University. The single-day event was sponsored by a handful of SDN-related companies, some industry/consumer driven, and others derived from academic work like P4 itself. Coming into the day I had only read the <a href="http://www.sigcomm.org/sites/default/files/ccr/papers/2014/July/0000000-0000004.pdf">original paper</a> <em>Programming Protocol-Independent Packet Processors</em>, but I encourage people to check out <a href="https://github.com/p4lang">p4lang</a> and follow the tutorials. I apologize for any alliteration that follows, but with a paper title as such, it is inevitable.</p>
<p><br /></p>
<h1 id="sdnopenflow">SDN/OpenFlow</h1>
<p>Let’s talk about <a href="https://www.opennetworking.org/sdn-resources/openflow">OpenFlow</a> quickly. OpenFlow is a standardized communication protocol for software-defined networks (SDNs). It allows network devices (controllers, switches, middleboxes, etc.) to communicate packet processing rules with each other. This gives network administrators control over dataplane behavior, whereas traditional networks conflate control- and data-planes into a single black-box of behavior.</p>
<p>OpenFlow presented the first successful specification of programmable control-plane features, and that is great. You can reprogram routing protocols without reprovisioning your physical infrastructure. But OpenFlow’s original specification has grown unwieldy with the continual introduction of support for new packet types and switch capabilities.</p>
<p><br /></p>
<h1 id="p4">P4</h1>
<p>This is where P4 enters the SDN scene. Rather than updating a spec each year to support increasingly complex packet behavior, necessitating switch updates to support new features, P4 proposes programmer-defined packet parsing functions and match-action table behavior. There is a lot of complexity in how to implement this efficiently, but the high level idea is a natural successor to what OpenFlow started.</p>
<h2 id="headers-and-parsers">Headers and Parsers</h2>
<p>P4 programs can easily capture what traditional ethernet, ipv4/6, vlan headers, and the rest look like. In the style of a c-struct, the programmer defines the organization of bits in a header, and how to extract information from headers or delve deep into nested headers. Here’s what an ethernet header looks like in P4:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">header_type</span> <span class="n">ethernet_t</span> <span class="p">{</span>
<span class="n">fields</span> <span class="p">{</span>
<span class="n">dstAddr</span> <span class="o">:</span> <span class="mi">48</span><span class="p">;</span>
<span class="n">srcAddr</span> <span class="o">:</span> <span class="mi">48</span><span class="p">;</span>
<span class="n">ethType</span> <span class="o">:</span> <span class="mi">16</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>so an ethernet header is a 48-bit destination mac followed by a 48-bit source and a 16-bit ethtype. We can define IP and vlan headers similarly, and tell our parser to extract the ethernet frame, and parse the following bits of the packet as an IP header if the appropriate ethtype matches. We can easily unfold nested headers to perform L2/L3 routing as we wish, or perform more complex actions such as tunnel introspection.</p>
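<p>To make the byte layout concrete, here is a plain-Rust (not P4) sketch of pulling those three fields out of a 14-byte ethernet header:</p>

```rust
// Plain-Rust sketch of parsing the ethernet header described above:
// 6 bytes destination MAC, 6 bytes source MAC, 2 bytes ethertype.
struct Ethernet {
    dst: [u8; 6],
    src: [u8; 6],
    eth_type: u16,
}

fn parse_ethernet(frame: &[u8]) -> Option<Ethernet> {
    if frame.len() < 14 {
        return None;
    }
    let mut dst = [0u8; 6];
    let mut src = [0u8; 6];
    dst.copy_from_slice(&frame[0..6]);
    src.copy_from_slice(&frame[6..12]);
    // Network byte order is big-endian.
    let eth_type = ((frame[12] as u16) << 8) | frame[13] as u16;
    Some(Ethernet { dst, src, eth_type })
}

fn main() {
    let mut frame = vec![0u8; 14];
    frame[12] = 0x08; // 0x0800 = IPv4
    frame[13] = 0x00;
    let eth = parse_ethernet(&frame).unwrap();
    assert_eq!(eth.eth_type, 0x0800);
    assert_eq!(eth.dst, [0u8; 6]);
    assert_eq!(eth.src, [0u8; 6]);
}
```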
<p><br /></p>
<h2 id="match-action-tables">Match-Action Tables</h2>
<p>For people familiar with OpenFlow, match-action tables are nothing new. P4 supports defining custom actions and matching against packet header or metadata fields. Given a couple of action definitions and assuming we keep around some metadata, we can define a table as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">action</span> <span class="nf">_drop</span><span class="p">()</span> <span class="p">{</span>
<span class="n">drop</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">action</span> <span class="nf">ip_firewall_to_ctrlr</span><span class="p">()</span> <span class="p">{</span>
<span class="n">send_to_ctrlr</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">action</span> <span class="nf">ip_pass_through</span><span class="p">()</span> <span class="p">{</span>
<span class="n">l3_route</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">table</span> <span class="n">firewall</span> <span class="p">{</span>
<span class="n">reads</span> <span class="p">{</span>
<span class="n">ipv4</span> <span class="o">:</span> <span class="n">valid</span><span class="p">;</span>
<span class="n">ipv4</span><span class="p">.</span><span class="n">srcAddr</span> <span class="o">:</span> <span class="n">exact</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">actions</span> <span class="p">{</span>
<span class="n">_drop</span><span class="p">;</span>
<span class="n">ip_firewall_to_ctrlr</span><span class="p">;</span>
<span class="n">ip_pass_through</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Of course, you also need to install concrete match values for things such as <code class="language-plaintext highlighter-rouge">ipv4.srcAddr</code>. But this is the sort of expressiveness and modularity you get in traditional programming environments - P4 is bringing that to SDNs.</p>
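<p>A toy (non-P4) model of the table above makes the match/action split concrete. The rule entries and addresses here are made up for illustration; exact-matching an IPv4 source address falls through to a default action when nothing matches:</p>

```rust
// Toy model of a match-action table: rules pair an exact ipv4.srcAddr
// match with an action, with a table-wide default.
#[derive(Clone, Debug, PartialEq)]
enum Action {
    Drop,
    ToController,
    PassThrough,
}

struct Table {
    rules: Vec<(u32, Action)>, // (exact ipv4.srcAddr, action)
    default: Action,
}

impl Table {
    fn apply(&self, src_addr: u32) -> Action {
        self.rules
            .iter()
            .find(|(m, _)| *m == src_addr)
            .map(|(_, a)| a.clone())
            .unwrap_or_else(|| self.default.clone())
    }
}

fn main() {
    let fw = Table {
        rules: vec![
            (0x0a00_0001, Action::Drop),         // 10.0.0.1
            (0x0a00_0002, Action::ToController), // 10.0.0.2
        ],
        default: Action::PassThrough,
    };
    assert_eq!(fw.apply(0x0a00_0001), Action::Drop);
    assert_eq!(fw.apply(0x0a00_00ff), Action::PassThrough);
}
```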
<p><br /></p>
<h1 id="hairyness">Hairiness</h1>
<p>P4 does come with a handful of rough patches too.</p>
<ul>
<li>Parsing support for things such as variable length header fields is difficult.</li>
<li>If we can program packet header descriptions and table structures, how do we accurately deploy those on highly specialized hardware?</li>
<li>Lacking an operational model, we are left investing immense effort in testing.</li>
</ul>
<p><br />
I want to expand a little on deploying to hardware. Prof. <a href="https://www.cs.cornell.edu/~jnfoster">Nate Foster</a>, from Cornell, spoke about compiler internals, and specifically touched upon the need for API generation for each compilation of a P4 program. This API gets dynamically loaded onto switches, along with the new match-action tables, so that switches know how to talk about the new packet structures implemented. Without this, switches would not be able to communicate with each other about what sorts of metadata they contain, or how to add or delete rules from their tables. So with P4, we need programmable packet parsing and table definitions, but we need programmable NICs as well.</p>
<p>Automatic API generation can be done a handful of ways. Foster compared an approach of essentially monomorphizing API functions for each match/action pair (giving you strong type guarantees at the cost of a huge number of functions), against a more lenient approach generating generic functions that can more or less operate on any table or header spec. With this approach, you gain a more concise API, but lose type-checking measures to protect it because you are essentially operating over a single giant enum. This is just programming languages at work, and there’s a rich foundation of work we could apply to make better sense of this.</p>I recently attended P4 Developer Day at Stanford University. The single-day event was sponsored by a handful of SDN-related companies, some industry/consumer driven, and others derived from academic work like P4 itself. Coming into the day I had only read the original paper Programming Protocol-Independent Packet Processors, but I encourage people to check out p4lang and follow the tutorials. I apologize for any alliteration that follows, but with a paper title as such, it is inevitable.MIXY and OCaml2016-11-01T00:57:13+00:002016-11-01T00:57:13+00:00https://baxtersa.github.io/2016/11/01/mixy<p>I recently prototyped a system that mixes type checking with symbolic execution for my <a href="https://people.cs.umass.edu/~arjun/courses/compsci691pl-fall2016/">PL Seminar</a>. The course site already contains out-of-date information regarding papers, but if you check back there I assume we’ll eventually post accurate paper topics and links to the rest of our seminar’s prototype implementations.</p>
<p>I’ll go into a little bit of detail on the MIXY system presented in the PLDI 2010 paper <a href="http://www.cs.colorado.edu/~bec/papers/pldi10-mix.pdf">Mixing Type Checking and Symbolic Execution</a> by K.Y. Phang, B.E. Chang, and J.S. Foster. You can take a look at my prototype implementation and some more notes in <a href="https://github.com/baxtersa/mix_proto">my github repo</a>.</p>
<p><br /></p>
<h1 id="mixy">MIXY</h1>
<p>MIXY presents a system that allows you to balance the tradeoff between precision and performance in a static analysis. Type systems typically provide efficient analyses, usually baked into the compilation process, at the cost of coarse results. On the other hand, symbolic execution can yield tremendous precision, but does so at the cost of performance of the analysis, often needing to consult model checkers and/or satisfiability solvers that can have long tails in runtime.</p>
<p>In MIXY, users annotate a program with typed and symbolic tags, specifying which portions of code should be analyzed by which procedure. For portions of code known to be computationally expensive for symbolic execution, typed tags can weaken precision on just that region, while symbolic execution can still perform precise analysis elsewhere in your program. Alternatively, symbolic execution could refine types based on flow-sensitive behavior, allowing a flow-insensitive refined type system to be more precise in its analysis.</p>
<p>The high-level idea is pretty straightforward, so here are a couple examples of what this could look like:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">MIX</span><span class="p">(</span><span class="n">symbolic</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">div</span> <span class="o">=</span>
<span class="k">fun</span> <span class="p">(</span><span class="n">x</span><span class="o">:</span><span class="kt">int</span><span class="p">)</span> <span class="o">-></span>
<span class="k">fun</span> <span class="p">(</span><span class="n">y</span><span class="o">:</span><span class="kt">int</span><span class="p">)</span> <span class="o">-></span>
<span class="k">if</span> <span class="n">y</span> <span class="o">==</span> <span class="mi">0</span>
<span class="k">then</span> <span class="bp">false</span>
<span class="k">else</span> <span class="n">x</span> <span class="o">/</span> <span class="n">y</span> <span class="k">in</span>
<span class="nc">MIX</span><span class="p">(</span><span class="n">typed</span><span class="p">)</span> <span class="p">{</span> <span class="mi">10</span> <span class="o">+</span> <span class="nc">MIX</span><span class="p">(</span><span class="n">symbolic</span><span class="p">)</span> <span class="p">{</span> <span class="n">div</span> <span class="mi">7</span> <span class="mi">0</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span>
</code></pre></div></div>
<p>In the above, we try to reject the classic divide-by-0 mistake everyone always talks about. Simple type systems have no knowledge of values, and cannot catch this error statically. Here, we wrap the application of <code class="language-plaintext highlighter-rouge">div</code> in a symbolic block, symbolically performing the application and recognizing that <code class="language-plaintext highlighter-rouge">y == 0</code>, yielding a <code class="language-plaintext highlighter-rouge">false</code> value. When we return to type check the addition <code class="language-plaintext highlighter-rouge">10 + ...</code>, we know we are trying to add an <code class="language-plaintext highlighter-rouge">int</code> to a <code class="language-plaintext highlighter-rouge">bool</code>, and we fail the type checker! Notice how the definition of <code class="language-plaintext highlighter-rouge">div</code> obviously doesn’t type-check under any sane type system because the <code class="language-plaintext highlighter-rouge">then</code> and <code class="language-plaintext highlighter-rouge">else</code> branches differ in type. MIXY’s presentation of symbolic execution forks on conditionals, accumulating path conditions on code reachability, so that later on we know what the return type of <code class="language-plaintext highlighter-rouge">div</code> should be and under what conditions.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">val</span> <span class="o">=</span>
<span class="nc">MIX</span><span class="p">(</span><span class="n">symbolic</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="bp">true</span>
<span class="k">then</span>
<span class="nc">MIX</span><span class="p">(</span><span class="n">typed</span><span class="p">)</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">}</span>
<span class="k">else</span>
<span class="nc">MIX</span><span class="p">(</span><span class="n">typed</span><span class="p">)</span> <span class="p">{</span> <span class="mi">10</span> <span class="o">+</span> <span class="bp">false</span> <span class="p">}</span>
<span class="p">}</span> <span class="k">in</span>
<span class="k">val</span>
</code></pre></div></div>
<p>In this example, we have a type error in code that is unreachable at runtime. Everything about this is obviously bad practice, but more subtle examples of this behavior silently live in untyped code all over the place. We know that <code class="language-plaintext highlighter-rouge">10 + false</code> should fail to type check, so let’s wrap it in a typed block. Since we symbolically evaluate the entire <code class="language-plaintext highlighter-rouge">if</code>-statement, we will only perform typechecking on the <code class="language-plaintext highlighter-rouge">else</code> branch if the path condition is feasible. In this way, we can eliminate analyzing unreachable code that would otherwise cause our analysis to fail. We can use variations of this to ensure that certain conditions in our program are never met, over all inputs.</p>
<p>That’s it for the basic examples I’ve conjured up (you can test them in my prototype!). Check out what I’ve implemented on github, along with my notes on some issues I had regarding implementing the formal system presented in the paper, and with the paper itself. Despite my issues with it, the paper authors implemented a <em>substantial</em> prototype of their system to check non-null pointer dereferences in C, using logically qualified type inference. The paper contains more concrete examples, and some implementation issues of their own that I sort of glossed over in my prototype.</p>
<p><br /></p>
<h1 id="functors-briefly">Functors, briefly</h1>
<p>I also want to talk about a couple OCaml things that OCaml people do and you can too.</p>
<p>A ‘functor’ is one of the many functional programming idioms I feel are too often described in dense, mathematical terms for no reason at all.</p>
<p>Functors are simply modules parameterized on the signature of another module. For anyone from an OOP background, you can just think of it as a class parameterized by the interfaces necessary to define it, rather than inheriting from those interfaces. Pretending like OCaml doesn’t have an object system, this is how you implement generic modules that expect to be able to call certain functions on some parameterized type.</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">(* Set.Make : functor (Comparable:COMPARABLE) -> sig
* type t
* type elt = Comparable.t
* ...
* val add : t -> elt -> t
* ...
* end *)</span>
<span class="k">module</span> <span class="nc">C</span> <span class="o">:</span> <span class="nc">COMPARABLE</span> <span class="o">=</span> <span class="k">struct</span>
<span class="k">type</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">int</span>
<span class="k">let</span> <span class="n">compare</span> <span class="p">(</span><span class="n">x</span><span class="o">:</span><span class="n">t</span><span class="p">)</span> <span class="p">(</span><span class="n">y</span><span class="o">:</span><span class="n">t</span><span class="p">)</span> <span class="o">=</span> <span class="nn">Pervasives</span><span class="p">.</span><span class="n">compare</span> <span class="n">x</span> <span class="n">y</span>
<span class="k">end</span>
<span class="k">module</span> <span class="nc">IntSet</span> <span class="o">=</span> <span class="nn">Set</span><span class="p">.</span><span class="nc">Make</span><span class="p">(</span><span class="nc">C</span><span class="p">)</span>
</code></pre></div></div>
<p>In the above example, the <code class="language-plaintext highlighter-rouge">Set.Make</code> functor is parameterized by a module whose signature defines a type <code class="language-plaintext highlighter-rouge">t</code> and a binary function <code class="language-plaintext highlighter-rouge">compare</code> that operates on two <code class="language-plaintext highlighter-rouge">t</code>s. Throughout the implementation of the <code class="language-plaintext highlighter-rouge">Set.Make</code> module, we can make use of values of type <code class="language-plaintext highlighter-rouge">Comparable.t</code>, aliased to the type <code class="language-plaintext highlighter-rouge">elt</code>, and can apply <code class="language-plaintext highlighter-rouge">Comparable.compare</code> to perform traditional set operations, such as determining if a set contains an element before adding the element to it.</p>
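<p>To make that concrete, here is a minimal, self-contained sketch instantiating the standard library’s <code class="language-plaintext highlighter-rouge">Set.Make</code>. The names <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">IntSet</code> are mine; note that the stdlib’s <code class="language-plaintext highlighter-rouge">add</code> takes the element first, unlike the Core-style <code class="language-plaintext highlighter-rouge">add : t -> elt -> t</code> signature sketched in the comment above.</p>

```ocaml
(* A minimal sketch of applying the Set.Make functor from the stdlib.
   The module names C and IntSet are illustrative. *)
module C = struct
  type t = int
  let compare (x : t) (y : t) = Pervasives.compare x y
end

module IntSet = Set.Make (C)

let () =
  (* stdlib signature: add : elt -> t -> t *)
  let s = IntSet.add 3 (IntSet.add 1 IntSet.empty) in
  assert (IntSet.mem 3 s);
  assert (IntSet.cardinal s = 2)
```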
<p><br /></p>
<h1 id="mutually-recursive-functors">Mutually Recursive Functors</h1>
<p>One of the key insights to the MIXY system is its agnosticism towards the type and symbolic analyses it integrates. Type checking and symbolic execution only interact when the analysis crosses the boundary between the two. The ‘mixing rule’ semantics given in the paper translate between type environments and symbolic state, retaining sufficient information to progress in the analysis soundly.</p>
<p>In my implementation, we mix a symbolic execution based off the operational semantics given in the paper, with a standard type system for a language akin to STLC. I finally found a use for mutually recursive functors in OCaml! The signatures for my typechecking and symbolic execution modules are mutually recursive, in the same way you can define mutually recursive even/odd functions. Here’s the concrete application of these functors in my prototype:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="k">rec</span> <span class="nc">T</span> <span class="o">:</span> <span class="nn">Analyses</span><span class="p">.</span><span class="nc">TYP</span> <span class="o">=</span> <span class="nn">Typecheck</span><span class="p">.</span><span class="nc">Make</span><span class="p">(</span><span class="nc">SE</span><span class="p">)</span>
<span class="ow">and</span> <span class="nc">SE</span> <span class="o">:</span> <span class="nn">Analyses</span><span class="p">.</span><span class="nc">SYM</span> <span class="o">=</span> <span class="nn">Symbolic_interp</span><span class="p">.</span><span class="nc">Make</span><span class="p">(</span><span class="nc">T</span><span class="p">)</span>
</code></pre></div></div>
<p>Notice the use of <code class="language-plaintext highlighter-rouge">rec</code> and <code class="language-plaintext highlighter-rouge">and</code> keywords just like mutually recursive function definitions. It’s the same idea. It’s kind of cool that it’s more or less an intuitive implementation of the system’s semantics. I don’t think I ever came across an intuitive need for mutually recursive classes during my C++ days…</p>
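<p>If you haven’t run into <code class="language-plaintext highlighter-rouge">module rec</code> before, here is the module-level analogue of the even/odd functions, as a minimal sketch (the names are mine, not from the prototype). One wrinkle: each binding in a recursive module definition must carry an explicit signature.</p>

```ocaml
(* Mutually recursive modules: the even/odd idiom lifted to modules.
   Each binding in a `module rec` group needs an explicit signature. *)
module rec Even : sig
  val check : int -> bool
end = struct
  let check n = n = 0 || Odd.check (n - 1)
end
and Odd : sig
  val check : int -> bool
end = struct
  let check n = n <> 0 && Even.check (n - 1)
end

let () =
  assert (Even.check 10);
  assert (Odd.check 7)
```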
<p>Cyclic build dependencies become something you have to think about with mutually recursive functors though. You’ll notice the signatures TYP and SYM are nested inside the Analyses compilation unit. Since the signatures of typechecking and symbolic execution need to know about each other, they must be colocated in the same file, or they would otherwise cause cyclic build dependencies and be rejected by the compiler. This breaks down the ability to compartmentalize your code according to files somewhat, but I think that’s better than the alternative of losing code reuse by squashing both implementations into a single module.</p>I recently prototyped a system that mixes type checking with symbolic execution for my PL Seminar. The course site already contains out-of-date information regarding papers, but if you check back there I assume we’ll eventually post accurate paper topics and links to the rest of our seminar’s prototype implementations.OCaml Module Namespaces2016-08-31T22:35:46+00:002016-08-31T22:35:46+00:00https://baxtersa.github.io/2016/08/31/ocaml-module-namespaces<p>Here’s a fun thing I learned about OCaml today: OCaml’s linker has issues with module namespaces. Below is my understanding of the issue, let me know if I’m mistaken! This is a hard-to-diagnose issue if you don’t know what to look for, so hopefully this saves someone some time debugging.</p>
<p>In the following snippet where <code class="language-plaintext highlighter-rouge">A</code> is an external module dependency,</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
<span class="k">open</span> <span class="nc">A</span>
<span class="k">open</span> <span class="nn">M</span>
<span class="p">...</span>
</code></pre></div></div>
<p>it’s not clear whether <code class="language-plaintext highlighter-rouge">M</code> is external or internal to the module <code class="language-plaintext highlighter-rouge">A</code> (i.e. <code class="language-plaintext highlighter-rouge">A.M</code>) if <code class="language-plaintext highlighter-rouge">A</code>’s signature has not yet been compiled. So, <code class="language-plaintext highlighter-rouge">ocamldep</code> must treat each module name as a potential external reference. This can yield an overapproximation of dependencies in benign cases, but can even cause compilation to fail in the case of falsely-diagnosed cyclic dependencies.</p>
<p>At the root of the problem, compilation units in OCaml are represented by a module whose name is derived from the basename of the compunit’s files. This means if two independently developed libraries each have compilation units sharing the same representative module name, the two libraries cannot be reliably used in the same program.</p>
<p>What’s the error look like when you unknowingly come across this issue? Well, that depends on how the failure manifests itself in your program. Here’s the scenario that led me down this path:</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
<span class="k">open</span> <span class="nn">Async</span><span class="p">.</span><span class="nn">Std</span>
<span class="p">...</span>
<span class="p">(</span><span class="o">*</span> <span class="nc">Do</span> <span class="n">some</span> <span class="nc">OCaml</span> <span class="n">things</span><span class="o">...</span> <span class="o">*</span><span class="p">)</span>
<span class="o">...</span>
<span class="nn">Monitor</span><span class="p">.</span><span class="n">try_with</span> <span class="p">(</span><span class="k">fun</span> <span class="bp">()</span> <span class="o">-></span> <span class="o"><</span><span class="n">code</span> <span class="n">that</span> <span class="n">forks</span> <span class="n">a</span> <span class="n">child</span> <span class="n">process</span><span class="o">></span><span class="p">)</span>
<span class="o">>>|</span> <span class="k">function</span>
<span class="o">|</span> <span class="nc">Error</span> <span class="n">exn</span> <span class="o">-></span>
<span class="o"><</span><span class="nc">Do</span> <span class="n">some</span> <span class="n">things</span> <span class="ow">and</span> <span class="n">eventually</span> <span class="n">core</span> <span class="n">dump</span><span class="o">></span>
<span class="o">|</span> <span class="nc">Ok</span> <span class="bp">()</span> <span class="o">-></span> <span class="bp">()</span>
<span class="o">...</span>
</code></pre></div></div>
<p>My build system links in the async package from JaneStreet, and my src directory includes a file named <code class="language-plaintext highlighter-rouge">monitor.ml</code>. My src-tree-local compunit for <code class="language-plaintext highlighter-rouge">monitor.ml</code> produces the <code class="language-plaintext highlighter-rouge">Monitor</code> module, and this wreaks subtle havoc on OCaml’s linker. Compilation succeeds, but at runtime my executable spawns child processes, continually core dumping, until my machine runs out of memory and comes to a grinding halt. Renaming <code class="language-plaintext highlighter-rouge">monitor.ml</code> is all it takes to solve the issue if you are less fortunate than I was in figuring out the cause of the problem.</p>
<p><br /></p>
<h1 id="what-can-you-do">What can you do?</h1>
<p>It turns out you see a lot of OCaml projects with long, hopefully-unique prefixes on all of their filenames. This ends up looking like</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
<span class="nn">MyPkg_list</span><span class="p">.</span><span class="n">ml</span>
<span class="nn">MyPkg_set</span><span class="p">.</span><span class="n">ml</span>
<span class="nn">MyPkg_map</span><span class="p">.</span><span class="n">ml</span>
<span class="o">...</span>
</code></pre></div></div>
<p>for every variation of <code class="language-plaintext highlighter-rouge">MyPkg</code> that reimplements a similarly named module. This doesn’t solve the problem; it just (hopefully) avoids it with awkward, encumbered filenames, with no guarantee that you’re the only one authoring a package titled <code class="language-plaintext highlighter-rouge">TheBestPkgEver</code>. You also can’t link in two different versions of the same library with this approach if that’s something you want to do.</p>
<p>Alternatively, OCaml offers ‘packed’ modules. Conceptually, this is like giving all your modules unique prefixes and then packaging them into one monolithic module that you link against. Different packs used in the same program can contain modules of the same name, at the cost of compilation time, binary bloat, and slow incremental builds because you are linking in/recompiling everything, regardless of whether or not it is being used or was changed.</p>
<p>As of OCaml 4.02, there’s at least some relief to this problem. Whereas previously</p>
<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nc">List</span> <span class="o">=</span> <span class="nc">Core_kernel_list</span>
</code></pre></div></div>
<p>copied the entire module into your compilation unit, it now aliases the module in reference instead. This is what happens when you <code class="language-plaintext highlighter-rouge">open</code> an external module, and now it comes with a lot less burden.</p>
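<p>For the unfamiliar, a module alias is just a binding like the one below; the point of the 4.02 change is that this binding is now a cheap reference rather than a copy of the module’s contents (a trivial sketch, aliasing the stdlib’s <code class="language-plaintext highlighter-rouge">List</code>):</p>

```ocaml
(* Aliasing a module binds a reference to it; since OCaml 4.02 this
   no longer copies the aliased module into the compilation unit. *)
module L = List

let () =
  assert (L.length [ 1; 2; 3 ] = 3);
  assert (L.mem 2 [ 1; 2; 3 ])
```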
<p>Most of my understanding of this issue is due to Yaron Minsky’s <a href="https://blogs.janestreet.com/better-namespaces-through-module-aliases/">blogpost</a> on the topic, <a href="http://gallium.inria.fr/~scherer/namespaces/spec.pdf">a proposal</a> for an improved handling of namespaces in OCaml, and <a href="http://lists.ocaml.org/pipermail/platform/2013-March/000213.html">this</a> thread on the OCaml mailing list. A huge thanks goes out to all of the people involved in those discussions.</p>
<ul>
<li>[1] <a href="http://lists.ocaml.org/pipermail/platform/2013-March/000213.html">http://lists.ocaml.org/pipermail/platform/2013-March/000213.html</a></li>
<li>[2] <a href="http://gallium.inria.fr/~scherer/namespaces/spec.pdf">http://gallium.inria.fr/~scherer/namespaces/spec.pdf</a></li>
<li>[3] <a href="https://blogs.janestreet.com/better-namespaces-through-module-aliases/">https://blogs.janestreet.com/better-namespaces-through-module-aliases/</a></li>
</ul>Here’s a fun thing I learned about OCaml today: OCaml’s linker has issues with module namespaces. Below is my understanding of the issue, let me know if I’m mistaken! This is a hard-to-diagnose issue if you don’t know what to look for, so hopefully this saves someone some time debugging.