2015-06-08

BP Oil Spill

Data Sets

  1. NOAA Data
    • National Oceanic and Atmospheric Administration
    • Temperature and Salinity Data in Gulf of Mexico
    • Measured using Floats, Gliders and Boats
  2. US Fish and Wildlife Service Data
    • Animal Sightings on the Gulf Coast
    • Birds, Turtles and Mammals
    • Status: Oil Covered or Not

Both data sets have geographic coordinates for every observation

Loading NOAA Data

x <- ls()
noaa <- "http://www.public.iastate.edu/~hofmann/looking-at-data/data/noaa.rdata"
if (!file.exists("noaa.rdata")) download.file(noaa, "noaa.rdata")
load("noaa.rdata")
setdiff(ls(), x)
## [1] "boats"   "floats"  "gliders" "noaa"    "rig"     "x"
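
"x" shows up in the output above because x did not exist yet when ls() was evaluated. load() also returns the names of the objects it restores (invisibly), so the before/after comparison can be skipped. A minimal sketch of that idea, using a throwaway file with made-up objects (demo_a and demo_b are hypothetical stand-ins) so it runs without the download:

```r
# Write a small stand-in .rdata file (hypothetical objects, not the NOAA data).
tmp <- tempfile(fileext = ".rdata")
demo_a <- 1:3
demo_b <- "text"
save(demo_a, demo_b, file = tmp)

# Load into a separate environment so nothing in the workspace is overwritten,
# and keep the character vector of restored object names that load() returns.
env <- new.env()
loaded <- load(tmp, envir = env)
loaded
```

With the real file this would be loaded <- load("noaa.rdata"), assuming the download step above has already run.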

Floats Data

str(floats)
## 'data.frame':    10332 obs. of  14 variables:
##  $ callSign      : Factor w/ 11 levels "Q4901043","Q4901044",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date_Time     : Factor w/ 73 levels "5/24/2010","5/26/2010",..: 30 30 30 30 30 30 30 30 30 30 ...
##  $ JulianDay     : num  2455390 2455390 2455390 2455390 2455390 ...
##  $ Time_QC       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Latitude      : num  24.8 24.8 24.8 24.8 24.8 ...
##  $ Longitude     : num  -88 -88 -88 -88 -88 ...
##  $ Position_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Depth         : int  2 4 6 8 10 12 14 16 18 20 ...
##  $ Depth_QC      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Temperature   : num  29.8 29.6 29.5 29.5 29.5 ...
##  $ Temperature_QC: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Salinity      : num  36.6 36.6 36.6 36.6 36.6 ...
##  $ Salinity_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Type          : Factor w/ 1 level "Float": 1 1 1 1 1 1 1 1 1 1 ...
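
Date_Time is stored as a factor, so it orders by level rather than by calendar date. A minimal sketch of converting it to a Date, assuming every level follows the month/day/year pattern of the levels shown above (e.g. "5/24/2010"); the helper name parse_obs_date is hypothetical:

```r
parse_obs_date <- function(x) {
  # Work on the factor labels (not the integer codes), then parse as month/day/year.
  # Values that do not match the format come back as NA.
  as.Date(as.character(x), format = "%m/%d/%Y")
}

parse_obs_date(factor(c("5/24/2010", "5/26/2010")))
```

On the loaded data this could be used as floats$Date <- parse_obs_date(floats$Date_Time), which gives a column that sorts and plots chronologically.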

Floats

library(ggplot2)
qplot(Longitude, Latitude, colour = callSign, data = floats) + 
  coord_map()

A note on qplot() versus ggplot()

  • qplot() makes many assumptions to save you some typing.
  • Although it's often useful for quick, interactive, exploratory data analysis, eventually you'll want the full flexibility of ggplot().
  • qplot() treats its arguments as aesthetics (mappings from variables in the data to visual properties); with ggplot(), you specify them explicitly inside aes().
ggplot(data = floats, aes(x = Longitude, y = Latitude, colour = callSign)) +
  geom_point() + coord_map()

Gliders Data

str(gliders)
## 'data.frame':    369384 obs. of  14 variables:
##  $ callSign      : Factor w/ 10 levels "48900","48901",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Date_Time     : Factor w/ 74 levels "5/28/2010","5/29/2010",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ JulianDay     : num  2455351 2455351 2455351 2455351 2455351 ...
##  $ Time_QC       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Latitude      : num  27.9 27.9 27.9 27.9 27.9 ...
##  $ Longitude     : num  -84 -84 -84 -84 -84 ...
##  $ Position_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Depth         : int  3 4 5 6 7 8 9 10 11 12 ...
##  $ Depth_QC      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Temperature   : num  28.1 27.8 27.8 27.7 27.7 ...
##  $ Temperature_QC: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Salinity      : num  35.4 35.4 35.4 35.3 35.3 ...
##  $ Salinity_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Type          : Factor w/ 1 level "Glider": 1 1 1 1 1 1 1 1 1 1 ...

Gliders

ggplot(data = gliders, aes(x = Longitude, y = Latitude, colour = callSign)) + 
  geom_point() + coord_map()

Boats Data

str(boats)
## 'data.frame':    106735 obs. of  14 variables:
##  $ callSign      : Factor w/ 2 levels "WTEO","WTER": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date_Time     : Factor w/ 43 levels "4/22/2010","4/23/2010",..: 12 12 12 12 12 12 12 12 12 12 ...
##  $ JulianDay     : num  2455330 2455330 2455330 2455330 2455330 ...
##  $ Time_QC       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Latitude      : num  26.3 26.3 26.3 26.3 26.3 ...
##  $ Longitude     : num  -87 -87 -87 -87 -87 ...
##  $ Position_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Depth         : int  2 3 4 5 6 7 8 9 10 11 ...
##  $ Depth_QC      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Temperature   : num  27.8 27.8 27.7 27.7 27.7 ...
##  $ Temperature_QC: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Salinity      : num  36.4 36.4 36.4 36.4 36.4 ...
##  $ Salinity_QC   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Type          : Factor w/ 1 level "Boat": 1 1 1 1 1 1 1 1 1 1 ...

Boats

ggplot(data = boats, aes(x = Longitude, y = Latitude, colour = callSign)) + 
  geom_point() + coord_map()
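
The str() output shows that floats, gliders and boats share the same 14 columns, so they can be stacked and compared in one plot. A minimal sketch, assuming the column names really are identical across the three data frames; combine_platforms is a hypothetical helper name:

```r
combine_platforms <- function(...) {
  parts <- list(...)
  stopifnot(length(parts) > 0)
  # Refuse to stack data frames whose column names differ.
  cols <- names(parts[[1]])
  same_cols <- vapply(parts, function(d) identical(names(d), cols), logical(1))
  stopifnot(all(same_cols))
  # rbind() on data frames merges factor levels (e.g. Type, callSign).
  do.call(rbind, parts)
}
```

For example, all_platforms <- combine_platforms(floats, gliders, boats), followed by ggplot(all_platforms, aes(x = Longitude, y = Latitude, colour = Type)) + geom_point() + coord_map(). The name all_platforms is chosen so it does not overwrite the noaa object already in the workspace.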

Provide some context

  • All of this data comes from the same geographic region.
  • It'd be nice if we could overlay this information onto a map!
  • This is where ggplot2's layering idea becomes useful.

# map_data() needs the maps package installed; coord_map() needs mapproj
states <- map_data("state")
map_outline <- ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  xlim(c(-91, -80)) + ylim(c(22,32)) + coord_map()
map_outline

map_floats <- map_outline +
  geom_point(data = floats, aes(x = Longitude, y = Latitude, colour = callSign))
map_floats

rig # location of BP Oil rig
##         x        y
## 1 -88.366 28.73663
map_floats +
  geom_point(data = rig, aes(x, y), shape = "x", size = 5) + 
  geom_text(data = rig, aes(x, y), label = "BP Oil rig", 
            size = 5, hjust = -0.1)

Your Turn

  • Use your ggplot2 skills to explore any (or all!) of the floats/gliders/boats data. Be creative!!

How exactly does layering work?

A simpler example

p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point() +           # layer 1
  geom_smooth(method = lm) # layer 2
p

Plot-level things

# scales
p <- p + scale_x_log10() + scale_y_log10()  
p

# coordinate system
p + coord_polar()

# facets
p + facet_grid(. ~ cut)

  • Each geom_* function is really a layer with certain defaults for statistic, position, and, well, geometry.
args(geom_point)
## function (mapping = NULL, data = NULL, stat = "identity", position = "identity", 
##     na.rm = FALSE, ...) 
## NULL
args(geom_smooth)
## function (mapping = NULL, data = NULL, stat = "smooth", position = "identity", 
##     ...) 
## NULL
args(geom_bar)
## function (mapping = NULL, data = NULL, stat = "bin", position = "stack", 
##     ...) 
## NULL
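
To make the "a geom is a layer with defaults" idea concrete, the same layer can be written out with layer(). A minimal sketch, assuming a current ggplot2 release in which layer() accepts geom, stat and position as strings (the args() output above comes from an older version; newer releases also list "count" rather than "bin" as the default stat for geom_bar):

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes(x = carat, y = price))

# Shortcut: geom_point() fills in the stat and position defaults.
p_geom <- base_plot + geom_point()

# Spelled out: geometry, statistic and position named explicitly.
p_layer <- base_plot +
  layer(geom = "point", stat = "identity", position = "identity")
```

Printing p_geom and p_layer should draw the same scatterplot; changing stat or position in the layer() call is what the other geom_* shortcuts do for you.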

Defaults aren't always right!!

base <- ggplot(diamonds, aes(x = cut, fill = clarity)) + theme(legend.position = "none")
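base + geom_bar()                                   # default "stack": clarity shares hard to compare across cuts
base + geom_bar(position = "fill")                  # better: every bar scaled to 1, shows shares
base + geom_bar(position = "dodge") + 
  theme(legend.position = "bottom")                 # better: side-by-side counts, legend restored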
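
What position = "fill" draws are within-category proportions, which can be checked numerically. A minimal sketch in base R using toy vectors as stand-ins for diamonds$cut and diamonds$clarity (the values and the helper name fill_shares are made up for illustration):

```r
# Share of each fill level within each x category; every row sums to 1.
fill_shares <- function(x, fill) {
  prop.table(table(x, fill), margin = 1)
}

# Toy vectors standing in for diamonds$cut and diamonds$clarity.
cut_toy     <- c("Fair", "Fair", "Ideal", "Ideal", "Ideal", "Ideal")
clarity_toy <- c("SI1",  "VS1",  "SI1",   "SI1",   "SI1",   "VS1")
round(fill_shares(cut_toy, clarity_toy), 2)
```

On the real data, fill_shares(diamonds$cut, diamonds$clarity) gives the heights of the coloured segments in the position = "fill" plot.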

Your Turn

  • Read in the animal.csv data
animals <- read.csv("http://heike.github.io/rwrks/02-r-graphics/data/animal.csv")
  • Plot the location of animal sightings on a map of the region
  • On this plot, try to color points by class of animal and/or status of animal
  • Advanced: Could we indicate time somehow?