Masking out bad GPS points with R

I've been playing around with R to correlate GPS and heart rate monitor logs from when we were cycling around in Italy.

One of the problems that I've run into is that the GPS position accuracy degrades significantly when less than 4 satellites are used, altitude data likewise degrades significantly when less than 6 satellites are used.

These high error data points add noise to the data set and complicate analysis. One option would be to find a way to calculate the error in position for each point, and move that data around with the position data.

That's hard. An easier way is to simply convert the inaccurate points to NULL (NA in R).

To import the data from my heart rate monitor, I used s710. To import the data from the GPS NMEA logs, I created a gpsbabel filter:


DESCRIPTION      GRASS GIS v.in.ascii
SHORTLEN  8
EXTENSION grass
# FILE LAYOUT DEFINITIIONS:
FIELD_DELIMITER         PIPE
RECORD_DELIMITER        NEWLINE
BADCHARS                ,"

IFIELD LON_DECIMAL,"","%f"
IFIELD LAT_DECIMAL,"","%f"
IFIELD ALT_METERS,"", "%f"
IFIELD GMT_TIME,"","%m/%d/%Y %H:%M:%S"
IFIELD PATH_SPEED,"","%f"
IFIELD PATH_COURSE,"","%f"
IFIELD GPS_HDOP,"","%f"
IFIELD GPS_VDOP,"","%f"
IFIELD GPS_PDOP,"","%f"
IFIELD GPS_SAT,"","%d"
IFIELD GPS_FIX,"","%s"

I had originally created this format for easy import into GRASS GIS, but it works nicely for importing into R :


Lat|Long|Altitude|timestamp|speed|bearing|hdop|vdop|pdop|satellites|fix
11.128023|43.430142|143.300000|09/26/2006 07:42:34|0.092600|11.690000|1.600000|0.000000|0.000000|6|3d
11.128025|43.430142|140.400000|09/26/2006 07:42:35|0.174911|177.690002|1.600000|0.000000|0.000000|6|3d
...

Read the GPS and heart rate data into an R frame with read.table.


raw_gps<-read.table("060926_bike_colle_valdelsa_siena.grass", sep="|", header=TRUE)
raw_hr1<-read.table("060926.0947.txt", header=TRUE)
raw_hr2<-read.table("060926.1447.txt", header=TRUE)

Create vectors to mask off the bad data.


mask_4sat <- !((!(raw_gps$satellites > 4)) & NA)
mask_6sat <- !((!(raw_gps$satellites > 6)) & NA)

This uses the identity that FALSE & NA FALSE, but TRUE & NA NA.

More data massaging is in order

The way I choose to deal with that is to create time series using ts(), and extend and resample the time series with window().


speed <- ts( raw_gps$speed    * mask_4sat, start=c(7, 2554), frequency=3600)
alt   <- ts( raw_gps$Altitude * mask_6sat, start=c(7, 2554), frequency=3600)
hr1   <- ts(data=raw_hr1$HR, start=c(7,564), frequency=720)
hr2   <- ts(data=raw_hr2$HR, start=c(12,567), frequency=720)

I've chose my unit of time to be the hour. The start argument to ts is particularly important, to get all of the data to line up. The easiest way to get all of this is to ensure that the clock on the heart rate monitor is set to UTC, and the GPS logs are in UTC. The frequency argument reflects the sampling rate of the GPS and heart rate monitor.

Now, mash all of this data together into a time series matrix, padding and resampling the constituent vectors to be uniform.


bike_cole_valdelsa_to_sienna <- ts(
    matrix( 
	c(
	    window(alt,   start=7.5, end=13.5, frequency=720, extend=TRUE), 
	    window(speed, start=7.5, end=13.5, frequency=720, extend=TRUE), 
	    window(hr1,   start=7.5, end=13.5, frequency=720, extend=TRUE), 
	    window(hr2,   start=7.5, end=13.5, frequency=720, extend=TRUE)
	),
	ncol=4),
    start=7.5, end=13.5, frequency=720)
colnames(bike_cole_valdelsa_to_sienna) <- c("altitude", "speed", "HR1", "HR2")

Now the data is all referenced together, and ready for some analysis.


jack | posted Mon Oct 30 13:15:44 2006 | #
category: projects/R

A new day, a new icecast

After several frustrations with the baroque arrangement of icecast 1.x, icecast 2.x, wget, madplay, obsequium, ices, and daemontools, I decided to replumb things to make them a wee bit simpler. The radio pipeline did a reasonably good job of managing playlists, and reencoding the stream to ogg to preserve bandwidth, but it wasn't playing well with proxies.

In the time since I had last done all of this, the good folk at xiph:http://www.icecast.org have released icecast 2.0, 2.0.1, 2.1.0, 2.2.0, and ices 2.0. This was an opportune time to reduce the complexity of the radio pipeline. With a little bit of hacking, I managed to get obs to talking directly to the icecast 2.2.0 server, which eliminated the need for the icecast 1.x server. A bit more script hacking and everything was up and running again.

Icecast 2.2.x introduces bursting, which makes me happy. No more waiting for the 64kiB buffer on your player to fill up. Icecast keeps enough data around to fill the players buffer in a burst on connection. This means the music starts instantly instead of after 8s econds.

The only outstanding issue is that the pipeline is now almost 5 minutes long. Somewhere, something has got about 5 minutes of music in a buffer. When obs starts feeding another track to the icecast server, it takes about 5 minutes before that track comes out the business end of the icecast server. I am begining to suspect the problem lies in my quick, cheesy, clueless hacks to the obs worker thread. The obs scheduling thread probably has a slightly different idea of how long it takes to stream out the tracks than shoutcast takes to do it.

I suspect this will lead to me eventually retiring obs, after 4 years of service, and replacing it with a markov chainer that uses data from audioscrobbler:http://www.audioscrobbler.com, and musicbrains:http://musicbrainz.org . More on that later...


jack | posted Mon Jan 10 22:31:32 2005 | updated Mon Oct 23 23:02:21 2006 | #
category: projects/radio

A weblog by Jack Cummings