Bookend consistency


<p>The writer&#8217;s side of the technique is this: Initialize both bookends to zero. On each write, increment the bookends.  Then copy the wrapper structure to the shared-memory segment byte-by-byte, <em>starting with the last byte and backing down to the first</em>.</p>
<p>The reader&#8217;s side is simpler: Copy the structure out of the shared-memory segment in normal (first-byte to last) order.  Look at the bookends.  If both have the same value, you have a coherent update. If they differ, you must back off and retry the read.</p>
<p>The reason this technique works is that if the shared-memory segment is in a mixed state (update in progress) the reversed-order write guarantees that bookend2 will be greater than bookend1 until the read is complete.  For the reader, on the other hand, forward copy guarantees that it can&#8217;t be fooled about the value of either bookend.  </p>
<p>(Thomas Zerucha &lt;tz@mich.com&gt; corrected my original proposal by exhibiting a case where you get inconsistency if the write copy is not in reverse order.)</p>
<p>The technique is robust in the face of interrupts to the copy operations, but depends on nothing in the software or hardware stack beneath C reordering the reads and writes in the copies.  A good place to begin ensuring this property is by declaring the pointer to the shared-memory segment &#8216;volatile'; this will tell your compiler that locations accessed through it may change asynchronously and it should not cache them or mess with operation order.</p>
<p>There are issues with what the underlying hardware might be doing, however.  There are two levels to worry about: instruction reordering in the processor and strange memory-controller optimizations.  The latter we can probably forestall by putting <a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a> before and after the copy loops.</p>
<p>One thing <em>not</em> to worry about is multiprocessor cache coherence.  MP systems pretty much have to guarantee interprocessor cache coherency, otherwise common instruction sequences like writes to memory-mapped devices would have undefined behavior. (For the same reason, MP systems have to limit instruction reordering.)</p>
<p>If you use memcpy(3) for forward copy, any modern compiler is likely to optimize it to a single-instruction copy or a phrase like X86 REP MOVS that won&#8217;t be reordered.  The point of maximum vulnerability will be in the reverse copy.  The iffiest case would be a single-processor system doing really aggressive instruction reordering that an MP can&#8217;t risk.  I doubt this is a practical problem, however, as byte-copy loops are so dead-simple that they&#8217;re unlikely targets for reordering.  </p>