Bookend consistency

The writer’s side of the technique is this: Initialize both bookends to zero. On each write, increment the bookends. Then copy the wrapper structure to the shared-memory segment byte-by-byte, starting with the last byte and backing down to the first.

The reader’s side is simpler: Copy the structure out of the shared-memory segment in normal (first-byte to last) order. Look at the bookends. If both have the same value, you have a coherent update. If they differ, you must back off and retry the read.

The reason this technique works is that if the shared-memory segment is in a mixed state (update in progress) the reversed-order write guarantees that bookend2 will be greater than bookend1 until the read is complete. For the reader, on the other hand, forward copy guarantees that it can’t be fooled about the value of either bookend.

(Thomas Zerucha <tz@mich.com> corrected my original proposal by exhibiting a case where you get inconsistency if the write copy is not in reverse order.)

The technique is robust in the face of interrupts to the copy operations, but depends on nothing in the software or hardware stack beneath C reordering the reads and writes in the copies. A good place to begin ensuring this property is by declaring the pointer to the shared-memory segment ‘volatile'; this will tell your compiler that locations accessed through it may change asynchronously and it should not cache them or mess with operation order.

There are issues with what the underlying hardware might be doing, however. There are two levels to worry about: instruction reordering in the processor and strange memory-controller optimizations. The latter we can probably forestall by putting memory barriers before and after the copy loops.

One thing not to worry about is multiprocessor cache coherence. MP systems pretty much have to guarantee interprocessor cache coherency, otherwise common instruction sequences like writes to memory-mapped devices would have undefined behavior. (For the same reason, MP systems have to limit instruction reordering.)

If you use memcpy(3) for forward copy, any modern compiler is likely to optimize it to a single-instruction copy or a phrase like X86 REP MOVS that won’t be reordered. The point of maximum vulnerability will be in the reverse copy. The iffiest case would be a single-processor system doing really aggressive instruction reordering that an MP can’t risk. I doubt this is a practical problem, however, as byte-copy loops are so dead-simple that they’re unlikely targets for reordering.