RFC for a better C calendaring library
<p>In the process of working on my <a href="http://www.catb.org/esr/time-programming/">Time, Clock, and Calendar Programming In C</a> document, I have learned something sad but important: the standard Unix calendar API is irremediably broken.</p>
<p>The document lists a lot of consequences of the breakage, but here I want to zero in on what I think are the primary causes. That is: the standard struct tm (a) fails to be an unambiguous representation of time, and (b) violates the SPOT (Single Point of Truth) design rule. It has some other more historically contingent problems as well, but these two (and especially (a)) are the core of its numerous failure modes.</p>
<p>These problems cannot be solved in a backwards-compatible way. I think it&#8217;s time for a clean-sheet redesign. In the remainder of this post I&#8217;ll develop what I think the premises of the design ought to be, and some consequences.</p>
<p><span id="more-6324"></span></p>
<p>The functions we are talking about here are tzset(), localtime(3), gmtime(3), mktime(3), strftime(3), and strptime() &#8211; everything (ignoring some obsolete entry points) that takes a struct tm argument and/or has timezone issues.</p>
<p>The central problem with this group of functions is the fact that the standard struct tm (what manual pages hilariously call &#8220;broken-down-time&#8221;) was designed to hold a local time/date <em>without an offset from UTC time</em>. The consequences of this omission cascade through the entire API in unfortunate ways. </p>
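<p>To make the ambiguity concrete, here is a minimal sketch, assuming a POSIX system where setenv() and tzset() are available. The helper names interpret_in_zone() and ambiguity_seconds() are mine, invented for illustration. The same broken-down fields are handed to mktime(3) under two different TZ settings and come back as two different instants, because nothing in the structure says which offset was meant.</p>

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() and tzset() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: interpret broken-down fields under a POSIX TZ
 * string. Note that it changes the process-wide TZ setting. */
static time_t interpret_in_zone(struct tm fields, const char *tz)
{
    setenv("TZ", tz, 1);
    tzset();
    fields.tm_isdst = 0;   /* neither zone used below observes DST */
    return mktime(&fields);
}

/* Seconds separating the two readings of "2014-10-03 12:00:00". */
static long ambiguity_seconds(void)
{
    struct tm t = {0};
    t.tm_year = 2014 - 1900;   /* years since 1900 */
    t.tm_mon  = 10 - 1;        /* months are 0-origin */
    t.tm_mday = 3;
    t.tm_hour = 12;

    time_t as_utc = interpret_in_zone(t, "UTC0");
    time_t as_est = interpret_in_zone(t, "EST5");  /* five hours west */
    return (long)difftime(as_est, as_utc);
}
```

<p>On such a system ambiguity_seconds() should come out as 18000, that is, five hours: identical member values, two different moments.</p>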
<p>Here are the standard members:</p>
<pre language="C">
struct tm
{
    int tm_sec;   /* seconds [0,60] (60 for + leap second) */
    int tm_min;   /* minutes [0,59] */
    int tm_hour;  /* hour [0,23] */
    int tm_mday;  /* day of month [1,31] */
    int tm_mon;   /* month of year [0,11] */
    int tm_year;  /* years since 1900 */
    int tm_wday;  /* day of week [0,6] (Sunday = 0) */
    int tm_yday;  /* day of year [0,365] */
    int tm_isdst; /* daylight saving flag */
};
</pre>
<p>The presence of the day of year and day of week members violates SPOT. This leads to some strange behaviors &#8211; mktime(3) &#8220;normalizes&#8221; its input structure by fixing up these members. This can produce subtle gotchas.</p>
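<p>A small illustration of that normalization, using only standard calls. The function name is made up for this example. It feeds mktime(3) an out-of-range date (January 32nd) with garbage in the derived members and returns the structure after mktime(3) has rewritten it.</p>

```c
#include <time.h>

/* Build "January 32nd, 2014, noon" with junk in tm_wday and tm_yday,
 * then let mktime() normalize the structure in place. */
static struct tm normalized_jan_32_2014(void)
{
    struct tm t = {0};
    t.tm_year  = 2014 - 1900;
    t.tm_mon   = 0;    /* January */
    t.tm_mday  = 32;   /* deliberately out of range */
    t.tm_hour  = 12;   /* noon, to stay clear of DST transitions */
    t.tm_isdst = -1;   /* let the library work out DST */
    t.tm_wday  = -1;   /* derived members: mktime() ignores these on input */
    t.tm_yday  = -1;
    mktime(&t);        /* overwrites mon, mday, wday, yday */
    return t;
}
```

<p>The caller gets back February 1st (tm_mon 1, tm_mday 1), with tm_wday set to 6 (Saturday) and tm_yday set to 31. Whatever was in the two derived members going in is silently discarded.</p>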
<p>Also, note that there is no way to represent dates with subsecond precision in this structure. Therefore strftime(3) cannot format them and strptime(3) cannot parse them.</p>
<p>The GNU C library takes a swing at the most serious problem by adding a GMT offset member (tm_gmtoff), but only half-heartedly. Because it is concerned with maintaining backward compatibility, that member is underused.</p>
<p>Here&#8217;s what I think it ought to look like instead:</p>
<pre language="C">
struct gregorian
{
    float sec;    /* seconds [0,60] (60 for + leap second) */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};
</pre>
<p>Some of you, I know, are looking at the float seconds member and bridling. What about roundoff errors? What about comparisons? Here&#8217;s where I introduce another basic premise of the redesign: <em>integral floats are safe to play with.</em></p>
<p>That wasn&#8217;t true when the Unix calendar API was designed, but IEEE 754 solved the problem. Most modern FPUs are well-behaved on integral quantities. There is not in fact a fuzziness risk if you stick to integral seconds values.</p>
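<p>A quick sanity check of that premise, as a sketch rather than a proof. It assumes IEEE 754 doubles, where every integer of magnitude below 2^53 is exactly representable. The function name is invented for the example.</p>

```c
#include <stdbool.h>

/* Add whole days one at a time and compare with the one-step result.
 * Every intermediate value is an integer far below 2^53, so each one
 * is exactly representable and the comparison is exact. */
static bool integral_seconds_are_exact(void)
{
    double t = 1412338851.0;   /* an arbitrary timestamp from 2014 */
    double stepped = t;
    for (int day = 0; day < 365; day++)
        stepped += 86400.0;    /* one day of whole seconds */
    return stepped == t + 365.0 * 86400.0;
}
```

<p>The two routes agree bit for bit, so equality tests on whole-second values behave the way they would on integers.</p>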
<p>The other way to handle this &#8211; the classic Unix way &#8211; would have been to add a decimal subseconds member in some unit, probably nanoseconds in 2014. The problem with this is that it&#8217;s not future-proof. Who&#8217;s to say we won&#8217;t want finer resolution in a century?</p>
<p>Yes, this does mean decimal subsecond times will have round-off issues when you do certain kinds of arithmetic on them. I think this is tolerable in calendar dates, where subsecond arithmetic is an unusual thing to do.</p>
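<p>For contrast, here is the kind of round-off in question, again assuming IEEE 754 doubles and with function names made up for the example. Decimal fractions such as a tenth of a second have no exact binary representation, while binary fractions such as a quarter do.</p>

```c
#include <stdbool.h>

/* Ten tenths of a second: 0.1 is not exactly representable in binary,
 * so the accumulated sum misses 1.0 by a tiny amount. */
static bool tenths_accumulate_exactly(void)
{
    double sec = 0.0;
    for (int i = 0; i < 10; i++)
        sec += 0.1;
    return sec == 1.0;
}

/* Four quarters of a second: 0.25 is a power of two, so every
 * partial sum is exact. */
static bool quarters_accumulate_exactly(void)
{
    double sec = 0.0;
    for (int i = 0; i < 4; i++)
        sec += 0.25;
    return sec == 1.0;
}
```

<p>The first function reports false and the second true. The error is real but tiny, which is why I consider it tolerable for calendar work.</p>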
<p>The above structure fixes some quirks and inconsistencies. The silly 1900 offset for years is gone. Time divisions of a day or larger are consistently 1-origin as humans expect; this will reduce problems when writing and reading debug messages. SPOT is restored for the calendar portion of dates.</p>
<p>The zoffset/zone/dst group does not have the SPOT property &#8211; zone can be inconsistent with the other two members. This is, alas, unavoidable if we&#8217;re going to have a zone member at all, which is pretty much a requirement in order for the analogs of strftime(3) and strptime() to have good behavior.</p>
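<p>To show what those fixes amount to in practice, here is a hypothetical bridge function, tm_to_gregorian(), which is not part of the proposal. It assumes the struct tm came from gmtime(3), so the offset is zero and there is no zone name.</p>

```c
#include <stddef.h>
#include <time.h>

struct gregorian
{
    float sec;    /* seconds [0,60] */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};

/* Hypothetical helper: copy a UTC struct tm into a struct gregorian. */
static struct gregorian *tm_to_gregorian(const struct tm *tm,
                                         struct gregorian *date)
{
    date->sec  = (float)tm->tm_sec;
    date->min  = tm->tm_min;
    date->hour = tm->tm_hour;
    date->mday = tm->tm_mday;
    date->mon  = tm->tm_mon + 1;      /* 0-origin becomes 1-origin */
    date->year = tm->tm_year + 1900;  /* undo the 1900 offset */
    date->zoffset = 0;                /* assumption: input was UTC */
    date->zone = NULL;
    date->dst  = 0;
    /* tm_wday and tm_yday are dropped; they can be recomputed. */
    return date;
}
```

<p>The only arithmetic needed is the two adjustments that exist purely to undo quirks of the old structure.</p>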
<p>Now I need to revisit another basic assumption of the Unix time API: that the basic time type is integral seconds since the epoch. In the HOWTO I pointed out that this assumption made sense in a world of 32-bit registers and expensive floating point, but no longer in a world of 64-bit machines and cheap floating point.</p>
<p>So here&#8217;s the other basic decision: the time scalar for this library is quad-precision seconds since the epoch in IEEE 754 (that is, 112 bits of mantissa).</p>
<p>Now we can begin to sketch some function calls. Here are the basic three:</p>
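<p>For a sense of scale, this sketch measures how finely an ordinary IEEE 754 double (53 bits of precision) can resolve time near a present-day timestamp. The function name is invented, and the method assumes round-to-nearest arithmetic and a value whose neighbours are no more than one second apart.</p>

```c
/* Find the gap between t and its neighbouring doubles by halving a
 * step until adding half of it to t is no longer exact. */
static double double_resolution_at(double t)
{
    double step = 1.0;
    while (step > 0.0) {
        /* volatile forces the sum to be rounded to double precision */
        volatile double bumped = t + step / 2;
        if (bumped - t != step / 2)
            break;           /* half a step no longer survives */
        step /= 2;
    }
    return step;
}
```

<p>For a 2014 timestamp such as 1412338851.0 this yields 2^-22 seconds, roughly 238 nanoseconds. A double therefore cannot carry nanosecond timestamps at the current epoch, which is the headroom the wider type buys.</p>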
<p>struct gregorian *unix_to_gregorian(double time, struct gregorian *date, char *zone)</p>
<p>Float seconds since epoch to broken-down time. A NULL zone argument means UTC, not local time. This is important because we want to be able to build a version of this code that doesn&#8217;t do lookups through the IANA zone database for embedded applications.</p>
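<p>Here is a minimal sketch of the UTC-only case, the one an embedded build would keep. It is an illustration under stated assumptions, not the implementation: non-NULL zones are simply refused, leap seconds are ignored as in POSIX time, and the date arithmetic is a widely used days-to-civil-date algorithm that I have swapped in.</p>

```c
#include <stddef.h>

struct gregorian
{
    float sec;    /* seconds [0,60] */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};

struct gregorian *unix_to_gregorian(double time, struct gregorian *date,
                                    char *zone)
{
    if (zone != NULL)
        return NULL;   /* zone database lookup is not sketched here */

    long long whole = (long long)time;   /* truncates toward zero */
    if ((double)whole > time)
        whole--;                         /* make it a floor */
    double frac = time - (double)whole;  /* subsecond part, [0,1) */

    long long days = whole / 86400;
    long long rem  = whole % 86400;
    if (rem < 0) { rem += 86400; days--; }   /* before the epoch */

    /* Days since 1970-01-01 to proleptic Gregorian year/month/day,
     * counting in 400-year eras with the year starting on March 1. */
    long long z   = days + 719468;
    long long era = (z >= 0 ? z : z - 146096) / 146097;
    long long doe = z - era * 146097;                        /* [0,146096] */
    long long yoe = (doe - doe / 1460 + doe / 36524
                     - doe / 146096) / 365;                  /* [0,399] */
    long long doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* [0,365] */
    long long mp  = (5 * doy + 2) / 153;                     /* [0,11] */

    date->mday = (int)(doy - (153 * mp + 2) / 5 + 1);
    date->mon  = (int)(mp < 10 ? mp + 3 : mp - 9);
    date->year = (int)(yoe + era * 400) + (date->mon <= 2);
    date->hour = (int)(rem / 3600);
    date->min  = (int)(rem % 3600 / 60);
    date->sec  = (float)((double)(rem % 60) + frac);
    date->zoffset = 0;
    date->zone = NULL;
    date->dst  = 0;
    return date;
}
```

<p>Called with 1412338851.0 and a NULL zone, this fills in 2014-10-03 12:20:51 with a zero offset.</p>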
<p>double gregorian_to_unix(struct gregorian *date)</p>
<p>Broken-down time to float seconds. No zone argument because it&#8217;s contained in the structure. Actually this function wouldn&#8217;t use the zone member but just the zoffset member; this is significant because we want to limit lookups to the timezone database for performance reasons.</p>
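<p>A sketch of the inverse, showing that only zoffset is consulted. As before, this is an assumption-laden illustration: the civil-date-to-days arithmetic is a widely used algorithm I have swapped in, the fields are assumed to be in range already, leap seconds are ignored, and I have added const to the parameter.</p>

```c
struct gregorian
{
    float sec;    /* seconds [0,60] */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};

double gregorian_to_unix(const struct gregorian *date)
{
    /* Shift the year so it starts on March 1; the leap day then
     * falls at the very end of the shifted year. */
    long long y   = date->year - (date->mon <= 2);
    long long era = (y >= 0 ? y : y - 399) / 400;
    long long yoe = y - era * 400;                            /* [0,399] */
    long long mp  = date->mon > 2 ? date->mon - 3 : date->mon + 9;
    long long doy = (153 * mp + 2) / 5 + date->mday - 1;      /* [0,365] */
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    long long days = era * 146097 + doe - 719468;  /* since 1970-01-01 */

    /* Local wall-clock seconds, then subtract the eastward offset to
     * reach UTC. The zone name is never looked at. */
    return (double)days * 86400.0
         + date->hour * 3600.0
         + date->min * 60.0
         + (double)date->sec
         - (double)date->zoffset;
}
```

<p>With these assumptions, 2014-10-03 12:20:51 at offset 0 and 2014-10-03 14:20:51 at offset 7200 both map to 1412338851.0.</p>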
<p>struct gregorian *gregorian_to_local(struct gregorian *date, char *zone)</p>
<p>Broken-down time to broken-down time normalized for the specified zone. In this case a null zone just means normalize so there are no out-of-range structure elements (e.g. day 32 wraps to the 1st, 2nd, 3rd, or 4th of the next month, depending on the month&#8217;s length) without applying any zone change. (Again, this is so the IANA timezone database is not a hard dependency.)</p>
<p>Notice that all three functions are re-entrant and can take constant arguments.</p>
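<p>The null-zone case can be sketched for the day-of-month member alone. The helper names below are hypothetical, and the sketch assumes mon is already within [1,12]. A full version would also carry overflow up from seconds, minutes, and hours.</p>

```c
struct gregorian
{
    float sec;    /* seconds [0,60] */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};

/* Length of a month in the Gregorian calendar; mon is 1-origin. */
static int days_in_month(int year, int mon)
{
    static const int len[] = {31, 28, 31, 30, 31, 30,
                              31, 31, 30, 31, 30, 31};
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (mon == 2 && leap) ? 29 : len[mon - 1];
}

/* Hypothetical helper: bring an out-of-range mday back into range,
 * carrying into mon and year. No zone change is applied. */
static void normalize_mday(struct gregorian *date)
{
    while (date->mday > days_in_month(date->year, date->mon)) {
        date->mday -= days_in_month(date->year, date->mon);
        if (++date->mon > 12) { date->mon = 1; date->year++; }
    }
    while (date->mday < 1) {
        if (--date->mon < 1) { date->mon = 12; date->year--; }
        date->mday += days_in_month(date->year, date->mon);
    }
}
```

<p>January 32nd becomes February 1st, while February 32nd becomes March 4th in an ordinary year and March 3rd in a leap year.</p>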
<p>An auxiliary function we&#8217;ll need is:</p>
<p>char *local_timezone(void)</p>
<p>so we can say this:</p>
<p>unix_to_gregorian(time, datebuffer, local_timezone())</p>
<p>We only need two other functions: gregorian_strf() and gregorian_strp(), patterned after strftime() and strptime(). These present no great difficulties. Various strange bugs and glitches in the existing functions would disappear because zone offset and name are part of the structures they operate on.</p>
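<p>As a taste of why formatting gets easier, here is a fixed-format sketch. It is not the proposed gregorian_strf(), which would take a format string; gregorian_iso8601() is a hypothetical name, and the millisecond precision is an arbitrary choice. Both the subsecond digits and the numeric offset come straight out of the structure, with no global timezone state consulted.</p>

```c
#include <stdio.h>
#include <stdlib.h>

struct gregorian
{
    float sec;    /* seconds [0,60] */
    int min;      /* minutes [0,59] */
    int hour;     /* hour [0,23] */
    int mday;     /* day of month [1,31] */
    int mon;      /* month of year [1,12] */
    int year;     /* years Gregorian */
    int zoffset;  /* zone offset, seconds east of Greenwich */
    char *zone;   /* zone name or NULL */
    int dst;      /* daylight saving offset, seconds */
};

/* Hypothetical helper: write an ISO 8601 style timestamp with
 * milliseconds and a numeric offset. Returns what snprintf() returns. */
int gregorian_iso8601(const struct gregorian *date, char *buf, size_t size)
{
    int off = abs(date->zoffset);   /* magnitude; sign printed separately */
    return snprintf(buf, size,
                    "%04d-%02d-%02dT%02d:%02d:%06.3f%c%02d:%02d",
                    date->year, date->mon, date->mday,
                    date->hour, date->min, (double)date->sec,
                    date->zoffset < 0 ? '-' : '+',
                    off / 3600, off % 3600 / 60);
}
```

<p>For 2014-10-03 12:20:51.5 at two hours east this produces 2014-10-03T12:20:51.500+02:00.</p>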
<p>Am I missing anything here? This seems like it would be a large improvement and not very difficult to write.</p>