R/Interfacing with GIS

From QERM Wiki
Jump to: navigation, search
SampleNikiHabitat.png

Loading shapefiles in R and doing some "basic" data manipulations.

Contents

Some background

I did an analysis in Finland of wolf movements that involved a lot of GIS-based analysis. Specifically, I needed to look at locations of remotely tracked wolves with respect to all sorts of landscape covariates: habitat types, distance from linear elements (paved roads, forest roads, rivers, railways, power lines, habitat edges), and so on. The land-class data, for example, came from an openly available Europe-wide CORINE land cover database, which divides up all of Europe into land cover classes ranging from "1.1.1 Continuous Urban fFabric" to "5.2.3 Sea and Ocean" passing through "2.2.3 Olive groves" and "2.2.1 Vineyards" to "3.3.5 Glacial and perpetual snow" in between (you know - typical Brussels - simultaneously irritating everyone from Corsican olive growers to Sami reindeer herders in one fell, bureaucratic swoop).

I did most of this in R, using functions in the "maptools" package, doing things that looked like this:

  NH <- read.shape("habitat_data_niki")	
  NH.data <- NH$att.data
  NH.poly <- Map2poly(NH)

Where "read.shape" would load the whole shapefile for the territory of a wolf (named Niki) and turning it into some kind of R shapefile list class, in which NH$att.data would be a table listing the ID, centerpoint, perimeter, area and habitat type of each polygon in the shapefile, and "Map2poly" would convert the R shapefile into a list of polygons which could be independently manipulated.

It was a little fussy, but eventually worked great, and we came up with some nice results and wrote them up and everyone was happy.

Tragedy

BUT THEN ... less than one year later, I tried to revisit the code, and discovered that NONE OF THE FUNCTIONS I HAD BEEN USING WORK! Not a one! They had all been "deprecated", which I guess is some kind of code euphemism similar to "rendered" for undesirable people, but apparently more absolute.

Resolution

Anyways, I pulled my hair for a while, and tried to figure out what to do, and what the new functions were, and how to do something really, really, basic, like identify the habitat type of a known location. It took hours and hours and a lot of futile attempts, but I finally got things to work. It is MUCH uglier than it used to be, and very confusing, and opened up a whole terrifying world of "Slots" (does anyone have experience with these?) and lots of other horrors. But it sort of worked

So let me lay a little code on you now.

Loading and plotting shapefiles

I uploaded the shapefile for the territory snip here. Unfortunately (and inexplicably ... can anyone help?) you cannot load the shape file directly from the website. So, if you are foolish enough to actually try this, you will have to download the files (all bundled in [1]) and place them in some appropriate location. Then:

 require(maptools)
 NH <- readShapeSpatial("./data/Habitat/habitat_data_Niki")

loads the data. So far so good. But what IS NH? It is a "SpatialPolygonsDataFrame". What is that? A nightmare. For one thing, it is not a list or a dataframe, it is an "S4 formal class", which does not contains "elements", but things called "slots", which are accessed not via "$" but via "@" (Has anybody heard of this? Does anybody have any insight into this?). Also, where "names" tells you what the column names are in a "data.frame", for an "S4 formal class" the relevant command is "slotNames". Thus:

slotNames(NH)

gives: "> "data" "polygons" "plotOrder" "bbox" "proj4string" Incidentally, in order to learn what an exotic object is, and how it is structured, I discovered the incredibly useful function "str", as in:

  str(NH)

Anyways, marching forward, we extract some useful pieces of "NH" using slot commands. Here is the table of attributes:

  NH.data <- NH@data                  

Just for illustration, we can look at the first few rows of this table:

 
        AREA PERIMETER CLC_EU25HA CLC_EU25_1 CODE_00 LEVEL3 LEVEL2 LEVEL1
0  2384796.0 12986.283      61188      63758     313    313     31      3
1  1366056.7  9959.354      61230      63800     512    512     51      5
2   395430.1  3364.520      61264      63834     312    312     31      3
3 10050618.4 50082.922      61277      63847     324    324     32      3
4  5931044.7 34088.445      61316      63886     313    313     31      3
5   404233.0  4933.306      61363      63933     313    313     31      3

Most of these are self-explanatory. The column "LEVEL3" is the finest scale habitat classification according to CORINE. The bounding box of the data is:

  NH.bbox <- NH@bbox	  

Finally, we need to dig inside of this S4 formal class to extract a "list" of polygons:

  NH.polygons <- NH@polygons

For some reason, for plotting, we need to convert "NH.polygons" to a SpatialPolygons class object. Why? I don't know. Steps like this are why it takes a long time to figure things out.

  NH.polylist <- as.SpatialPolygons.PolygonsList(NH.polygons)

This is how I made the plot at the top of the page:

   Habitat <- factor(NH.data$LEVEL3)
# recall that "LEVEL3" is the habitat
   HabitatColors <- c("wheat","forestgreen","darkgreen","lightgreen","brown","blue")
   HabitatNames <- c("Agriculture", "Coniferous forest", "Mixed forest", "Transitional woodland shrub", "Peatbogs", "Water")
   HabitatNames <- paste(levels(Habitat), HabitatNames)
   plot(NH.polylist, bor=NA, col=HabitatColors[as.integer(Habitat)])
# so somehow "plot" knows that when you plot a polylist, 
# you can call out the colors from this vector associated with the "NH.data"
   legend("topleft", legend=HabitatNames, fill=HabitatColors, bty="n")

Identifying the land class of points in the landscape

Create some random points within the bounding box:

xy <- data.frame(x=runif(20,NH.bbox[1,1],NH.bbox[1,2]),
		 y=runif(20,NH.bbox[2,1],NH.bbox[2,2]))

Plot those points (note the pale colors)!

SampleNikiHabitatWithPoints.png
# this is a sneaky sequence that allows you to make pale colors (by adding "alpha" to named colors)
   HabitatCols.matrix <- col2rgb(HabitatColors)
   HabitatCols.pale <- rgb(t(HabitatCols.matrix), max=255, alpha=100)

   plot(NH.polylist, bor=NA, col=HabitatCols.pale[as.integer(Habitat)])
   points(xy, pch=21, bg="darkred", fg=1)

Some of these points are outside the edges of the habitat snip. That's OK.

The key for identifying points in polygons is the simple function "inout" which tells you whether a data frame of points "xy" is inside a polygon. It is part of the splancs package. That part is Ok.

A more brutal piece (that took a long time to figure out, and that I could not find elsewhere on the web, and that is probably easier to do than I did) is actually extracting the polygon coordinates from a polygon list. Basically, the way to get the "i"th polygon is:

 
  IthPolygonCoordinates <- NH.polygons[[i]]@Polygons[[1]]@coords

Which, to me, looks crazy. If you were going to extract it from the shape file itself it would look like:

 
  IthPolygonCoordinates <- NH@polygonslist[[i]]@Polygons[[1]]@coords

What the logic is behind this deeply layered over-structured object is beyond me. It does not seem as simple as the older functions. But who am I to question the wisdom on Roger Bivand?

Anyways, now we're ready for the final coup.

# number of polygons
n <- nrow(NH.data)

# null vector of habitats

require(splancs)
xy.habitat <- factor(vector(len=nrow(xy)), levels=levels(Habitat))

for(i in 1:n)
{
  mycoords <- NH.polygons[[i]]@Polygons[[1]]@coords
  inout <- inout(xy, mycoords)
  if(sum(inout)>0)
     xy.habitat[inout] <- Habitat[i]
}

Basically I take each polygon and I see if any points land in it. Most of the time, I'm pretty dissappointed. But occasionally, something comes up - and the habitat level of that "i"th polygon gets stored in the blank xy.habitats vector.

An important caveat and solution

The extremely astute observer will wonder if there might not maybe possibly perhaps be an issue with toroidal polygons, or a point within a habitat sections within a larger habitat section. How do we know that the habitat identified is the habitat of the smaller bit? Well, we don't. HOWEVER, if we loop through the polygons in descending order of area size, then we are guaranteed that the eventual point selection will be in the smallest polygon. To fix that small but important point, we use the AREA column from "NH.data"

Thus:

polygon.order <- order(NH.data$AREA, decreasing=TRUE)
for(i in polygon.order)
{
  mycoords <- NH.polygons[[i]]@Polygons[[1]]@coords
  inout <- inout(xy, mycoords)
  if(sum(inout)>0)
     xy.habitat[inout] <- Habitat[i]
}

Does everything right.

Plot it to satisfy that the points have been Accurately Identified!

   plot(NH.polylist, bor=NA, col=HabitatCols.pale[as.integer(Habitat)])
   points(xy, pch=21, bg=HabitatColors[as.integer(xy.habitat)], cex=2)

SampleNikiHabitatWithIdentifiedPoints.png

Mission accomplished!

Complete Code

require(maptools)
require(splancs)
 
NH <- readShapeSpatial("./data/Habitat/habitat_data_Niki")
NH.data <- NH@data  
NH.bbox <- NH@bbox	
NH.polygons <- NH@polygons
NH.polylist <- as.SpatialPolygons.PolygonsList(NH.polygons)
Habitat <- factor(NH.data$LEVEL3)

xy <- data.frame(x=runif(20,NH.bbox[1,1],NH.bbox[1,2]),
		 y=runif(20,NH.bbox[2,1],NH.bbox[2,2]))

xy.habitat <- factor(vector(len=nrow(xy)), levels=levels(Habitat))
polygon.order <- order(NH.data$AREA, decreasing=TRUE)
for(i in polygon.order)
{
  mycoords <- NH.polygons[[i]]@Polygons[[1]]@coords
  inout <- inout(xy, mycoords)
  if(sum(inout)>0)
     xy.habitat[inout] <- Habitat[i]
}

HabitatColors <- c("wheat","forestgreen","darkgreen","lightgreen","brown","blue")
HabitatNames <- c("Agriculture", "Coniferous forest", "Mixed forest", "Transitional woodland shrub", "Peatbogs", "Water")
HabitatNames <- paste(levels(Habitat), HabitatNames)
HabitatCols.matrix <- col2rgb(HabitatColors)
HabitatCols.pale <- rgb(t(HabitatCols.matrix), max=255, alpha=100)

plot(NH.polylist, bor=NA, col=HabitatCols.pale[as.integer(Habitat)])
points(xy, pch=21, bg=HabitatColors[as.integer(xy.habitat)], cex=2)
Personal tools