Why GPSes suck, and what to do about it
<p>I&#8217;m the lead of the <a href="http://gpsd.berlios.de/">GPSD project</a>, a service daemon that monitors GPS receivers on serial or USB ports and provides TPV (time-position-velocity) reports in a simple format on a well-known Internet port. GPSD makes this job look easy. But it&#8217;s not &#8212; oh, it&#8217;s decidedly not &#8212; and thereby hangs an entertaining tale of hacker ingenuity versus multiple layers of suck. </p>
<p><span id="more-801"></span></p>
<p>Away back in the dark and backward abysm of time when GPS technology was first being made generally available (1993), only military-grade receivers were sensitive enough to use it where there were things like buildings and trees partly blocking the sky view. The first civilian customers to actually find a use for it were people messing about in boats. Thus it came to pass that the manufacturers of marine navigation systems were the first civilians to grapple with the question of how a GPS receiver should report TPV information over a wire to a navigational computer.</p>
<p>Our first layer of suck begins with the National Marine Electronics Association, or NMEA. They wrote a standard describing a protocol for GPSes reporting over serial ports called NMEA 0183 which, despite being a technical expert in the field, I&#8217;ve never dared to look at. The reason is that they made it proprietary and expensive, and their lawyers have been known to threaten legal action against people who quote it on the net.</p>
<p>To add injury to insult, NMEA 0183 was (and still is) a crappy standard. How crappy? Well, before I get into that, let&#8217;s note that there is one thing NMEA did right that later attempts to replace it got wrong. Each NMEA report is a text packet, or sentence, that begins with a dollar sign and ends with a carriage-return and line feed. The data elements in NMEA sentences are just text fields separated by commas, like this:</p>
<pre> $GPRMC,225446.33,A,4916.45,N,12311.12,W,000.5,054.7,191194,020.3,E,A*68
</pre>
<p>This means that log files of collected NMEA sentences are easy to read and edit. And that number on the right-hand end, after the &#8220;*&#8221; but before the CRLF? A data checksum, so you can tell whether you have a valid sentence or just line noise (and this is important: we&#8217;ll come back to it later). A GPS speaking NMEA emits sentences like this onto the wire, usually in once-per-second bursts.</p>
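<p>The checksum test is simple enough to sketch in a few lines. The two hex digits after the &#8220;*&#8221; are the XOR of every character between the &#8220;$&#8221; and the &#8220;*&#8221;. The function names below are made up for this illustration and are not taken from any particular library:</p>

```python
import string


def nmea_checksum(body: str) -> int:
    """XOR together every character between the leading '$' and the '*'."""
    total = 0
    for ch in body:
        total ^= ord(ch)
    return total


def frame(body: str) -> str:
    """Wrap a sentence body in '$', '*hh' and the trailing CR/LF."""
    return "${}*{:02X}\r\n".format(body, nmea_checksum(body))


def is_valid_sentence(line: str) -> bool:
    """True if the line is framed like NMEA and its checksum matches."""
    line = line.strip()  # drop the CR/LF
    if not line.startswith("$") or "*" not in line:
        return False
    body, _, given = line[1:].rpartition("*")
    if len(given) != 2 or any(c not in string.hexdigits for c in given):
        return False
    return nmea_checksum(body) == int(given, 16)


if __name__ == "__main__":
    good = frame("GPGGA,123519,4807.038,N,01131.324,E,1,08,0.9,545.4,M,46.9,M,,")
    noisy = good.replace("545.4", "545.5")  # one digit of simulated line noise
    print(is_valid_sentence(good), is_valid_sentence(noisy))
```

<p>A single flipped digit changes the XOR, so the damaged copy is rejected while the clean one passes.</p>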
<p>The first layer of suck actually begins with what NMEA 0183 has you put in those packets. If you are a mathematician, you have a pretty good notion of what a TPV report is. It&#8217;s a 7-tuple &lt;T, X, Y, Z, DX, DY, DZ&gt; describing your position in four dimensions and your velocity in three. If you are an engineer or the more practical sort of physicist, you want to add expected-error estimates at some fixed confidence level, usually 50% or 95%, and return 14 numbers &lt;T, X, Y, Z, DX, DY, DZ, ET, EX, EY, EZ, EDX, EDY, EDZ&gt;.</p>
<p>Internally, this is what a GPS sensor computes from the signal times to GPS satellites. Actually, to be pedantic, it doesn&#8217;t compute the error bars in exactly this form; rather, you get scale factors for the errors derived from the geometry of the satellites when the fix was taken, and have to multiply that by an experimentally-derived bugger factor dependent on things like how turbulent the radio-reflecting layer in the ionosphere is.</p>
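<p>The multiplication described above can be sketched as follows. The range-error figure is a placeholder chosen for illustration, not a measured constant and not a value taken from any receiver or from GPSD; all names here are hypothetical:</p>

```python
from dataclasses import dataclass

# ASSUMPTION: an illustrative range-error figure in meters. A real value has
# to be derived experimentally and depends on receiver and conditions.
ASSUMED_RANGE_ERROR_M = 8.0


@dataclass
class Dops:
    """Geometry-derived scale factors (dilution of precision) for one fix."""
    hdop: float  # horizontal
    vdop: float  # vertical


def position_error_m(dops: Dops, range_error_m: float = ASSUMED_RANGE_ERROR_M):
    """Return (horizontal, vertical) error estimates in meters."""
    return dops.hdop * range_error_m, dops.vdop * range_error_m


if __name__ == "__main__":
    print(position_error_m(Dops(hdop=0.9, vdop=1.5)))
```

<p>Good satellite geometry gives small scale factors and therefore tight error bars; the same range error under poor geometry produces a much larger estimate.</p>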
<p>Now let&#8217;s look at what NMEA 0183 tells GPS devices to actually report. Here is a breakdown of the data in our sample sentence, which is in fact the most commonly used GPS reporting format for TPV:</p>
<pre>
1     225446.33    Time of fix 22:54:46.33 UTC
2     A            Status of Fix: A = Autonomous, valid;
                   D = Differential, valid; V = invalid
3,4   4916.45,N    Latitude 49 deg. 16.45 min North
5,6   12311.12,W   Longitude 123 deg. 11.12 min West
7     000.5        Speed over ground, Knots
8     054.7        Course Made Good, True north
9     191194       Date of fix 19 November 1994
10,11 020.3,E      Magnetic variation 20.3 deg East
12    A            FAA mode indicator (NMEA 2.3 and later)
                   A=autonomous, D=differential, E=Estimated,
                   N=not valid, S=Simulator, M=Manual input mode
13    *68          Mandatory NMEA checksum
</pre>
<p>Alert readers will notice what&#8217;s missing here. Altitude, for starters &#8212; we&#8217;ve got no Z! People in boats, remember? They think they don&#8217;t need no steenking altitude. And no error estimates at all. And the T report is incomplete, giving only a two-digit year. Yup, that one got annoying <em>real</em> fast when the millennium turned. And it&#8217;s not like the designers couldn&#8217;t see that coming in 1993.</p>
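<p>To make the gaps concrete, here is a minimal sketch of a GPRMC parser following the field breakdown above. It is an illustration rather than GPSD&#8217;s code, and the function names are invented. Note that the caller has to supply the century, and that the altitude slot can only ever be empty:</p>

```python
from datetime import datetime, timezone

KNOTS_TO_MPS = 1852.0 / 3600.0  # one nautical mile per hour, in m/s


def dm_to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA ddmm.mm (or dddmm.mm) plus N/S/E/W to signed degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    result = degrees + minutes / 60.0
    return -result if hemisphere in ("S", "W") else result


def parse_rmc(sentence: str, century: int) -> dict:
    """Parse a GPRMC sentence; the checksum is not verified here."""
    body = sentence.strip().lstrip("$").split("*")[0]
    f = body.split(",")
    if f[0] != "GPRMC":
        raise ValueError("not a GPRMC sentence")
    hh, mm, ss = int(f[1][0:2]), int(f[1][2:4]), float(f[1][4:])
    day, month, yy = int(f[9][0:2]), int(f[9][2:4]), int(f[9][4:6])
    micro = min(int(round((ss % 1) * 1_000_000)), 999_999)
    return {
        # The sentence carries only two year digits, so the century is a guess
        # the caller must make.
        "time": datetime(century + yy, month, day, hh, mm, int(ss), micro,
                         tzinfo=timezone.utc),
        "valid": f[2] in ("A", "D"),
        "lat": dm_to_degrees(f[3], f[4]),
        "lon": dm_to_degrees(f[5], f[6]),
        "speed_mps": float(f[7]) * KNOTS_TO_MPS,
        "course_deg": float(f[8]),
        "alt": None,  # GPRMC has no altitude field at all
    }


if __name__ == "__main__":
    sample = ("$GPRMC,225446.33,A,4916.45,N,12311.12,W,"
              "000.5,054.7,191194,020.3,E,A*68")
    print(parse_rmc(sample, century=1900))
```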
<p>Eventually, NMEA wised up about the altitude thing. The sane way to proceed would have been to define a new sentence containing all the GPRMC information, plus altitude, plus a real four-digit year, even if error bars had to remain suppressed for some inexplicable reason. Here&#8217;s what we got instead:</p>
<pre>
$GPGGA,123519,4807.038,N,01131.324,E,1,08,0.9,545.4,M,46.9,M,,*42
1     123519       Fix taken at 12:35:19 UTC
2,3   4807.038,N   Latitude 48 deg 07.038' N
4,5   01131.324,E  Longitude 11 deg 31.324' E
6     1            Fix quality: 0 = invalid, 1 = GPS, 2 = DGPS,
                   3 = PPS (Precise Position Service),
                   4 = RTK (Real Time Kinematic) with fixed integers,
                   5 = Float RTK, 6 = Estimated, 7 = Manual, 8 = Simulator
7     08           Number of satellites being tracked
8     0.9          HDOP = Horizontal dilution of precision
9,10  545.4,M      Altitude, Meters above mean sea level
11,12 46.9,M       Height of geoid (mean sea level) above WGS84
                   ellipsoid, in Meters
</pre>
<p>Now we&#8217;ve got X, Y, and Z&#8230;but T is even more damaged! You get a time of day, no month, no year, no century. No velocity report at all. We&#8217;ve got one number, HDOP, that tangles EX and EY together to give a circular horizontal error. And despite the fact that this sentence reports an altitude (Z), there&#8217;s an EX/EY and <em>no report of EZ</em>!</p>
<p>For some inexplicable reason, NMEA also describes a GPGLL sentence that has all the brain-damage of GPGGA, but without the altitude. And a GPVTG that gives <em>only</em> a velocity report &#8211; no position, and naturally no error bars. Do I need to add that both have missing or incomplete timestamps? And oh, yes, there are actually two different incompatible variants of GPVTG.</p>
<p>Remember I said GPS receivers emit bursts of NMEA packets once a second? Well, the bursts typically consist of a GPRMC, followed by a GPGGA, possibly followed by a GPGLL and/or GPVTG. Er, no, I&#8217;m lying, they could be in a different order. The sentences in the burst have overlapping, incomplete information. The NMEA standard doesn&#8217;t even specify which ones must be sent, let alone the order they&#8217;re sent in.</p>
<p>Some NMEA GPSes part-repair the timestamp damage by shipping a sentence called GPZDA that gives you a full UTC timestamp with century. But the standard doesn&#8217;t require it, and most don&#8217;t, so you can&#8217;t count on it.</p>
<p>The first layer of suck was about what NMEA 0183 specifies. We are now passing into the second layer of suck, which is what it <em>doesn&#8217;t</em> specify. Like, the minimum set of sentences that have to be sent per reporting cycle. Oh, and nothing in the standard stops a GPS from simply omitting fields it doesn&#8217;t feel like reporting. It&#8217;s fairly common, for example, for receivers to not report magnetic variation or geoid separation (the geoid is an imaginary surface approximating global mean sea level; geoid separation is its height above the WGS84 ellipsoid, which varies from place to place because the earth&#8217;s mass is not uniformly distributed). GPS designers can save some absurdly tiny fraction of a penny per unit by not having these data tables in ROM, and they&#8217;re generally more than willing to shaft their customers to do it.</p>
<p>A mob of crack-smoking rhesus monkeys could have designed a better standard than NMEA 0183. It means that if you want to assemble a proper TPV report from NMEA sentences, you actually need to wait until you&#8217;ve seen an entire reporting cycle. Only&#8230;you can&#8217;t tell without knowing the type and firmware version of the GPS which sentences start and end the cycle! And even if you did know, buffering the partial data introduces latency that may be unacceptable for some applications.</p>
<p>A very practical way this manifests is that if you have a GPS client faithfully reporting the NMEA sentences coming over the wire, your altitude will typically flicker from known to unknown and back twice a second as it gets hit by alternating GPRMC and GPGGA sentences. That is, unless you buffer, in which case the altitude you see could be up to one second stale and associated with a previous fix.</p>
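<p>One way to stop the flicker, sketched here under the assumption that every sentence in a burst carries the same time-of-day stamp, is to merge fields until the stamp changes and only then release the assembled fix. The class below is a hypothetical illustration, not GPSD&#8217;s actual logic:</p>

```python
class CycleMerger:
    """Merge partial reports that share a time-of-day stamp into one fix."""

    def __init__(self):
        self.stamp = None  # hhmmss string of the fix being assembled
        self.fix = {}

    def feed(self, stamp: str, fields: dict):
        """Add one sentence's fields.

        Returns the completed previous fix when `stamp` shows that a new
        reporting cycle has begun, otherwise None.
        """
        finished = None
        if stamp != self.stamp:
            if self.fix:
                finished = self.fix
            # Start from scratch so fields never leak across fixes.
            self.stamp, self.fix = stamp, {"stamp": stamp}
        self.fix.update(fields)
        return finished


if __name__ == "__main__":
    merger = CycleMerger()
    merger.feed("123519", {"lat": 48.1173, "lon": 11.5221, "speed_knots": 0.5})
    merger.feed("123519", {"alt_m": 545.4})
    print(merger.feed("123520", {"lat": 48.1174, "lon": 11.5221}))
```

<p>The price is exactly the latency complained about above: the fix for one second is not released until the first sentence of the next second arrives.</p>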
<p>The incomplete timestamps mean various sorts of lossage can bite you if you have a GPS client active at midnight. Unless your software is actually watching for the moment when the GGA timestamp goes to 00:00:00 and can compensate, it&#8217;s going to look like you&#8217;ve dropped back in time 24 hours until the GPRMC next comes in. Human eyes can just reject this, but what if you&#8217;re logging telemetry and try to graph against time? Similar anomalies lurk at the edges of years and centuries.</p>
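<p>A minimal sketch of the compensation: remember the last full date seen, and bump it by a day whenever a time-only sentence jumps backward by more than half a day. The class name and the half-day threshold are assumptions made for this illustration:</p>

```python
from datetime import date, datetime, time, timedelta, timezone

HALF_DAY_S = 12 * 3600  # ASSUMPTION: a bigger backward jump means midnight


def _seconds(t: time) -> float:
    """Seconds since midnight for a time of day."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


class DateTracker:
    """Attach a calendar date to time-only sentences such as GPGGA."""

    def __init__(self):
        self.day = None   # last date seen in a dated sentence (e.g. GPRMC)
        self.last = None  # last time of day seen in any sentence

    def saw_dated(self, day: date, tod: time) -> datetime:
        """Record a sentence that carries both date and time of day."""
        self.day, self.last = day, tod
        return datetime.combine(day, tod, tzinfo=timezone.utc)

    def saw_time_only(self, tod: time):
        """Return a full timestamp, or None if no date has been seen yet."""
        if self.day is None:
            return None
        if self.last is not None and _seconds(self.last) - _seconds(tod) > HALF_DAY_S:
            self.day += timedelta(days=1)  # the clock wrapped past midnight
        self.last = tod
        return datetime.combine(self.day, tod, tzinfo=timezone.utc)


if __name__ == "__main__":
    tracker = DateTracker()
    tracker.saw_dated(date(1994, 11, 19), time(23, 59, 59))
    print(tracker.saw_time_only(time(0, 0, 0)))
```

<p>This still cannot recover a missing century, and it has nothing to say until the first dated sentence has arrived.</p>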
<p>Yes, and if you want to report altitude above mean sea level correctly and consistently across devices, you&#8217;d better have your own geoidal separation table in software somewhere.</p>
<p>And I have nowhere <em>near</em> plumbed the stygian depths of the NMEA standard&#8217;s top two layers of suck. To spare the reader&#8217;s sanity, we shall lightly draw a veil over the spiky, vague, ill-documented horror that is NMEA error and status reporting and pass directly to the <em>third</em> layer of NMEA suck, the complete absence of any standardization of GPS control codes.</p>
<p>Here are some of the more important things there is no NMEA-standard way to tell a GPS to do:</p>
<ol>
<li>Report its vendor, model, and firmware version.</li>
<li>Change the set of sentences it ships per cycle.</li>
<li>Change the baud rate at which it reports.</li>
<li>Change the number of samples it reports per second.</li>
</ol>
<p>Of these, (1) is the most harmless-looking, but actually the deadliest. Many GPSes have vendor-defined commands to do (2) and (3), but it is far from trivial to figure out which set of vendor-defined commands might apply. If you are a GPS-using application, and you are handed the name of a port with a GPS on it, you have to either settle for the minimum common subset of GPS behaviors, or throw all the vendor-specific ID probes you know of at the device hoping it will respond to one of them. Hint: too often, it won&#8217;t.</p>
<p>But wait. Things get worse!</p>
<p>There are, broadly speaking, three major different ways that GPS vendors could have responded to the admitted fact that NMEA 0183, as given, is a festering pile of rancid camel vomit.</p>
<ol>
<li> They could have pressured NMEA into cleaning up the damn standard.</li>
<li>They could have de-facto standardized on a decent set of extensions using the NMEA sentence packet format &#8211; a sentence that reports all 14 location parameters, a probe-for-ID query, a standard baud-rate change, etc.</li>
<li>Or&#8230;they could invent a dozen mutually incompatible and poorly documented proprietary binary protocols, all of which throw away the transparency advantages of the NMEA textual packet format and each one of which introduces unique and special brain-damage of its own!</li>
</ol>
<p>Guess which alternative most of them chose. Just guess&#8230;</p>
<p>You are now in a twisty maze of GPS reporting protocols, all different. Many devices have two different operating modes, one in which they emit NMEA packets and one in which they emit a vendor binary protocol that looks like nothing so much as line noise. At least one major vendor has dropped NMEA support entirely. If your location-sensitive application is naively expecting NMEA, you lose.</p>
<p>To be fair, one thing the vendor binary protocols generally get right that NMEA doesn&#8217;t is shipping something close to a full TPV in one sentence per cycle. This at least avoids the nasty problems associated with integrating partial NMEA reports and worries about where the start of cycle is. However, I had to say &#8220;something close&#8221;; <em>not one</em> protocol ships the full and correct TPV 14-tuple. Usually one or more velocity components and error estimates are missing and have to be computed.</p>
<p>Let&#8217;s back off at this point and consider how people who use GPS sensors would, ideally, like their GPS sensors to behave. You plug it in, your software figures out what protocol it&#8217;s using, autoconfigures to match it, and starts collecting TPV reports and using them.</p>
<p>If &#8220;your software&#8221; is a GPSD-enabled application on a system with gpsd installed, it actually works this way. Those of you who have been following our descent into this fourth major layer of suck can be excused for wondering how in the flipping hell GPSD ever managed this trick in the twisty maze of vendor protocols, all different.</p>
<p>Certainly the vendors aren&#8217;t being much help here. Many of them (I&#8217;m looking at <em>you</em>, Garmin!) are cheerfully willing to assume that you will never use anything but their one idiosyncratic piece of GPS hardware, and that it will only talk to a limited, vendor-controlled selection of closed-source binary blobs provided by them or their business partners. Hello, vendor lock-in; goodbye, customer choice.</p>
<p>There is an Ariadne&#8217;s thread through this maze. It&#8217;s this: All the vendor protocols, like NMEA 0183, use packets with checksums and fixed header/trailer bytes. The intention is that they&#8217;re an integrity check so you don&#8217;t get fooled by line-noise-induced glitches. The side effect is that, if you&#8217;re sufficiently clever, you can do GPS protocol autodetection on the fly. It takes a fairly complex state machine that tangles together structural knowledge about every packet protocol in your supported set, but it can be done. In GPSD-land we call this piece of code the packet sniffer.</p>
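<p>The idea can be sketched with just two protocols. The real sniffer is a state machine covering many more; this toy version simply scans a buffer for a packet whose framing and checksum both check out. The binary protocol used here is u-blox UBX (sync bytes 0xB5 0x62, a little-endian length, and a two-byte Fletcher checksum), picked as an assumption for the example since this post does not name it:</p>

```python
def _nmea_len(buf: bytes, start: int) -> int:
    """Length of a checksum-valid NMEA sentence at `start`, or 0."""
    end = buf.find(b"\r\n", start)
    if end < 0:
        return 0
    star = buf.rfind(b"*", start, end)
    if star < 0 or end - star != 3:  # need '*' plus exactly two hex digits
        return 0
    csum = 0
    for b in buf[start + 1:star]:
        csum ^= b
    try:
        given = int(buf[star + 1:end].decode("ascii"), 16)
    except ValueError:  # also covers undecodable bytes
        return 0
    return end + 2 - start if csum == given else 0


def _ubx_len(buf: bytes, start: int) -> int:
    """Length of a checksum-valid UBX packet at `start`, or 0."""
    if len(buf) - start < 8:
        return 0
    length = buf[start + 4] | (buf[start + 5] << 8)
    total = 6 + length + 2
    if len(buf) - start < total:
        return 0
    ck_a = ck_b = 0
    for b in buf[start + 2:start + 6 + length]:  # class, id, length, payload
        ck_a = (ck_a + b) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    ok = (ck_a, ck_b) == (buf[start + 6 + length], buf[start + 7 + length])
    return total if ok else 0


def sniff(buf: bytes):
    """Name of the first protocol with a valid packet in `buf`, else None."""
    for i in range(len(buf)):
        if buf[i:i + 1] == b"$" and _nmea_len(buf, i):
            return "NMEA"
        if buf[i:i + 2] == b"\xb5\x62" and _ubx_len(buf, i):
            return "UBX"
    return None


if __name__ == "__main__":
    print(sniff(b"\x00\xff" + bytes.fromhex("b5620a0400000e34")))
```

<p>Header bytes alone are not enough, since they turn up in noise all the time; it is the checksum match that lets the sniffer commit to a protocol.</p>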
<p>There&#8217;s something else the packet-sniffer does: it autobauds. Again, this is only possible because packet checksumming gives you a way to know for sure when you&#8217;re looking at valid data. When a serial GPS device is presented to gpsd, the packet sniffer doesn&#8217;t have to be told the baud rate the device is shipping at &#8211; it cycles through all possible combinations of speed, parity and stop bits looking for a combination under which it sees valid packets of some type. Normally this takes less than a second.</p>
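<p>The hunting loop itself is short once a validity test exists. In this sketch, read_at is a hypothetical callable that reconfigures the port and returns whatever bytes arrive; a fake stands in for real serial I/O so the example runs anywhere. The speed and framing lists are illustrative, not GPSD&#8217;s:</p>

```python
from typing import Callable, Optional, Tuple

SPEEDS = (4800, 9600, 19200, 38400, 57600, 115200)  # illustrative shortlist
FRAMINGS = ("8N1", "7E2", "7O2")                    # data bits, parity, stop bits


def looks_valid(data: bytes) -> bool:
    """Checksum test for one NMEA sentence (the only protocol tried here)."""
    text = data.decode("ascii", errors="replace").strip()
    if not text.startswith("$") or text[-3:-2] != "*":
        return False
    csum = 0
    for ch in text[1:-3]:
        csum ^= ord(ch)
    try:
        return csum == int(text[-2:], 16)
    except ValueError:
        return False


def autobaud(read_at: Callable[[int, str], bytes],
             valid: Callable[[bytes], bool] = looks_valid,
             speeds=SPEEDS, framings=FRAMINGS) -> Optional[Tuple[int, str]]:
    """Try each speed/framing pair until the data passes the validity test."""
    for speed in speeds:
        for framing in framings:
            if valid(read_at(speed, framing)):
                return speed, framing
    return None


def fake_read_at(speed: int, framing: str) -> bytes:
    """Stand-in for a real device: clean data only at 9600 8N1."""
    if (speed, framing) == (9600, "8N1"):
        return (b"$GPGGA,123519,4807.038,N,01131.324,E,1,08,0.9,"
                b"545.4,M,46.9,M,,*42\r\n")
    return b"\xf8\x00\x9c\xff"  # what a wrong line setting tends to look like


if __name__ == "__main__":
    print(autobaud(fake_read_at))
```

<p>A real implementation would put the most common settings first so the typical device syncs on the first or second try.</p>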
<p>The packet sniffer is the real reason for the existence of gpsd &#8212; and I&#8217;d add &#8220;other programs like it&#8221;, except that there aren&#8217;t any others that I know of. Long ago, all the gpsd daemon did was serve as a multiplexer that read and buffered TPV reports from a single serial device so that several GPSD-aware applications could get simultaneous access to them. That&#8217;s all GPSD&#8217;s closest competitors today, like <a href="http://gypsy.freedesktop.org/">Gypsy</a>, can do; they&#8217;re NMEA multiplexers. They typically can&#8217;t cope with non-NMEA devices at all; no packet sniffer.</p>
<p>The gpsd daemon also copes with the data management problems surrounding NMEA partial TPV reports, doing everything from supplying missing geoidal separation for altitude to computing and reporting error estimates from the geometry of the satellite skyview if the GPS doesn&#8217;t supply them.</p>
<p>Most of the suck surrounding GPSes can be summed up by &#8220;all this cleverness is actually <em>necessary</em> if you want to get clean TPV data out of more than one different kind of device&#8221;, or even out of just one kind of device that fails to supply a complete TPV. And, as we&#8217;ve noted, all of them fail in a dizzying variety of ways.</p>
<p>It&#8217;s true that in theory, every single GPS-aware application could include its own packet sniffer, the matrix algebra needed to compute missing error estimates, its own geoidal separation table, and all the other random logic needed to cope even with GPSes that are working nominally correctly. But have we mentioned yet that some&#8230;don&#8217;t? We know of at least three circumstances under which popular GPS chipsets return un-obviously corrupted NMEA &#8211; detectable, but you actually have to know how. Then there&#8217;s one chipset we know of that returns incorrect packet checksums when it doesn&#8217;t have a fix.</p>
<p>And we&#8217;re not done yet, because there are at least two other sets of issues about extracting sense from these devices. One set is an artifact of the way USB GPSes are put together. Now, USB is generally a good thing in this context; unlike old-school serial ports, USB devices raise notification events on connect and disconnect, which clever GPS software can listen for and use to automatically hook up and sync to GPS sensors when they&#8217;re available.</p>
<p>However&#8230;naked GPS chips report serial data at TTL levels. The standard way to build a USB GPS is to hook up your GPS chip with a serial-to-USB bridge; there&#8217;s one in particular called a Prolific PL2303 that tends to show up on about 70% of the USB GPSes out there. There are two problems with this kind of design, one obvious and one subtle.</p>
<p>The subtle one is that both the bridge and the UART on the GPS chip have their own data buffers. Under most circumstances this doesn&#8217;t matter because the introduced latency from both together is very small &#8211; but some control operations (notably the serial-speed changes you&#8217;re going to be doing while you try to sync up with the device) need an amount of delay sufficient to flush both, otherwise you get odd race conditions that can result in garbage data coming back up the wire or your control operation silently failing. </p>
<p>The right combinations of OS-level buffer flushes and delays will avoid this problem, but clumsy ways to do it cause fix latency and application slowdown. The comment explaining these issues in the GPSD code leads off with &#8220;Serious black magic begins here.&#8221; and continues for 48 lines &#8212; because it needs to.</p>
<p>(This isn&#8217;t the worst thing you have to be careful of while hunting, though. Some Bluetooth GPSes with defective firmware will actually go so badly catatonic if you try to change their baud rate that you actually have to crack the case and unsolder the battery to unbrick them!).</p>
<p>Here&#8217;s the more obvious USB problem: there is no USB device class for GPSes. A USB GPS will present the vendor/product ID of the serial-to-USB converter. This means that, even if you&#8217;re fortunate enough to have an operating system that can do something reasonable with hotplug events, you can&#8217;t just tell it to watch for GPS devices going live and connect them to your software; you have to know which bridge-chipset IDs are likely to have GPSes behind them, sniff the data, and <em>let go of the device if it&#8217;s not shipping GPS packets!</em> Otherwise you might eat events from non-GPS serial devices that some other application badly needs to see.</p>
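<p>The claim-sniff-release policy can be sketched like this. The ID table lists a few well-known bridge chips for illustration only and is not GPSD&#8217;s actual list; open_port and sniff are hypothetical callables supplied by the caller:</p>

```python
# Illustrative USB (vendor, product) IDs of common serial bridge chips.
BRIDGE_IDS = {
    (0x067B, 0x2303): "Prolific PL2303",
    (0x0403, 0x6001): "FTDI FT232",
    (0x10C4, 0xEA60): "Silicon Labs CP210x",
}


def handle_hotplug(vendor: int, product: int, open_port, sniff):
    """Claim a newly attached device only if it turns out to speak GPS.

    `open_port` returns an object with a close() method; `sniff` takes that
    object and returns True when it sees valid GPS packets.
    """
    if (vendor, product) not in BRIDGE_IDS:
        return None  # not a bridge we recognize: never touch it
    port = open_port()
    if sniff(port):
        return port  # a GPS is behind the bridge: keep it
    port.close()     # something else is: let go so its real owner can use it
    return None


class FakePort:
    """Stand-in for a serial port, used only to exercise the logic."""

    def __init__(self, has_gps: bool):
        self.has_gps = has_gps
        self.closed = False

    def close(self):
        self.closed = True


if __name__ == "__main__":
    modem = FakePort(has_gps=False)
    print(handle_hotplug(0x067B, 0x2303, lambda: modem, lambda p: p.has_gps))
    print(modem.closed)
```

<p>The important branch is the one that closes the port: a bridge ID only says that a serial device of some kind is attached, never that it is a GPS.</p>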
<p>And so on, and so on. Dealing with all this crap is further complicated by vendor documentation that is scanty if you can get it at all, and often written in rather broken English when you can. Part of the problem is the structure of the GPS sensor market, which largely consists of dozens of tiny Pacific-Rim companies &#8211; each popping up out of nowhere, shipping the cheapest possible spin on one of about a half-dozen reference designs, and disappearing six to eighteen months later.</p>
<p>A friend who works in embedded systems tells me these little outfits aren&#8217;t even intended to last long; they&#8217;re actually run by giant electronics combines through several layers of shell companies as a way of providing deniability in case of lawsuits by patent trolls. They spin up, they ship, they funnel money back to daddy&#8230;and before there&#8217;s time for a process-server to show up, they disappear. All the engineers get rehired by a different sock puppet a week later. Lather, rinse, repeat. And&#8230;er&#8230;product support? What&#8217;s that?</p>
<p>It&#8217;s messy. Really messy. Those who love the law, sausage, or GPS devices really shouldn&#8217;t watch any of them being made.</p>
<p>Expecting GPS-aware applications to keep track of all this stuff would be just nuts. The best way to cope is to have a dedicated service layer that specializes in knowing about GPS idiosyncrasies, hides all that ugliness, and presents a simple TPV-reporting interface to the application layer above. Ideally, the service layer should have a sharp crew of developers who are specialist GPS experts so that nobody else has to be.</p>
<p>And that&#8217;s exactly what the GPSD project is. It looks like a simple job&#8230;but it&#8217;s not.</p>