# Behavioral Change Point Analysis


'''''PLEASE NOTE: THIS PAGE IS UNDER CONSTRUCTION!'''''

[[image: FurSealBCPA.png | right | 300 px]]

The '''Behavioral Change Point Analysis''' ('''BCPA''') is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data which is irregularly sampled.

The original paper on which it is based is: E. Gurarie, R. Andrews and K. Laidre. A novel method for identifying behavioural changes in animal movement data (2009). ''Ecology Letters'' 12:5, 395-408. Most of the material is also presented in Chapter 5 of my PhD dissertation.

I have received numerous requests for the code behind the BCPA, so (after sending out more or less the same email with the same attachments several dozen times) have decided that it might be more efficient to post a sample work flow of using the BCPA on this wiki.

Please send all comments, questions, critiques, possible improvements, or (heaven forbid!) identification of errors via e-mail to eli.gurarie (at) noaa.gov.

== Brief Introduction ==

[[image:BCPA_RhoLikelihood.png | left | 200 px | thumb | Likelihood of autocorrelation parameter for randomly subsampled process ]]

[[image:BCPA_MLCPLikelihood.png | right | 200 px | thumb | Identifying MLCP from likelihood profile.]]

Briefly, there are several hierarchically layered parts to the method. For location and time data <math>\{Z, T\}</math>, the analysis is performed on the velocity components <math> V \cos(\theta) </math> and <math> V \sin(\theta) </math>, where velocity is estimated as <math> \Delta Z / \Delta T</math>. These components are (importantly) assumed to be observations from a continuous-time Gaussian process (i.e. an [[Wikipedia: Ornstein-Uhlenbeck process|Ornstein-Uhlenbeck process]]) with mean <math>\mu(t)</math>, variance <math> \sigma^2(t) </math> and autocorrelation <math>\rho(t)</math>. The values of these parameters are assumed to change either gradually or abruptly. The purpose of the BCPA is to identify the locations where changes are abrupt (assumed to correspond to discrete changes in an animal's behavior).
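As a concrete illustration of this preprocessing step, the velocity components might be computed as follows. This is a minimal sketch, not part of the original BCPA code: the function name and the use of planar X/Y coordinates are assumptions, and <math>\theta</math> is interpreted here as the turning angle between successive steps.

```r
# Hypothetical preprocessing sketch (not part of the original BCPA code):
# compute speeds, turning angles, and the velocity components V*cos(theta)
# and V*sin(theta) from planar coordinates (X, Y) sampled at times T.
GetVComponents <- function(X, Y, T) {
  dX <- diff(X); dY <- diff(Y); dT <- diff(T)
  V <- sqrt(dX^2 + dY^2) / dT        # speed over each step
  phi <- atan2(dY, dX)               # absolute heading of each step
  theta <- diff(phi)                 # turning angle between successive steps
  # components are defined per turn, i.e. from the second step onwards
  cbind(T = T[-(1:2)], VC = V[-1] * cos(theta), VS = V[-1] * sin(theta))
}
```

For steady straight-line movement, VC equals the speed and VS is zero.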

The distribution function of this process is given by:

:: <math> f(X_i|X_{i-1}) = {1\over \sigma\sqrt{2 \pi (1-\rho^{2\tau_i})}} \exp \left( -\frac{\left(X_i - \mu - \rho^{\tau_i} (X_{i-1}-\mu)\right)^2}{2\sigma^2 (1 - \rho^{2\tau_i})} \right) </math>
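As a quick sanity check (an illustration added here, not part of the original page), this transition density is simply a Gaussian in <math>X_i</math> with mean <math>\mu + \rho^{\tau_i}(X_{i-1}-\mu)</math> and standard deviation <math>\sigma\sqrt{1-\rho^{2\tau_i}}</math>, and so it integrates to one:

```r
# Numerical check (illustrative): the transition density f(X_i | X_{i-1})
# is Gaussian and should integrate to 1 for any valid parameter values.
mu <- 0; sigma <- 1; rho <- 0.5; tau <- 1.3; x.prev <- 0.3
f <- function(x) dnorm(x, mean = mu + rho^tau * (x.prev - mu),
                       sd = sigma * sqrt(1 - rho^(2 * tau)))
integrate(f, -Inf, Inf)$value  # approximately 1
```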

* First, we identify a likelihood function for <math> \rho </math>, given estimates <math> \mu = \overline{V_i} </math> and <math>\sigma^2 = S_V^2 </math>, in a behaviorally homogeneous region, using the distribution function above:

:: <math> L(\rho|{X},{T}) = \prod_{i=1}^n f(X_i|X_{i-1},\tau_i,\rho) </math>

* Second, we identify a "most likely change point" (MLCP) in a stretch of the time series where the behavior may have changed, by taking the product of the likelihoods of the estimates to the left and to the right of all possible change points. We identify which of the parameters (if any) have changed by comparing the [[Wikipedia: Bayesian Information Criterion | BIC]] of eight possible models: M0 - no changes in any parameter, M1 - change in <math>\mu</math> only, M2 - change in <math>\sigma</math> only, M3 - change in <math>\rho</math> only ... etc ... M7 - change in all three parameters.

* Third, we sweep the MLCP analysis across the complete data set, recording at every point the parameter values to the left and right of each MLCP under the model with the best BIC support.

* Fourth, we somehow present this mass of analysis.
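The model-selection step (second bullet above) might be sketched as follows. This is a hypothetical helper, not the original implementation: the free-parameter counts and the convention <math>BIC = -2\log L + k \log n</math> (under which lower values indicate better support) are assumptions.

```r
# Hypothetical model-selection sketch (not the original BCPA code):
# logLs is a vector of maximized log-likelihoods for models M0..M7,
# n is the number of observations in the analyzed stretch.
ChooseModel <- function(logLs, n) {
  # assumed free-parameter counts: 3 for M0 (mu, sigma, rho), plus one
  # extra parameter for each quantity allowed to change across the break
  k <- c(3, 4, 4, 4, 5, 5, 5, 6)
  bic <- -2 * logLs + k * log(n)
  which.min(bic) - 1   # model number, 0..7
}

ChooseModel(rep(-100, 8), n = 150)  # identical fits: choose simplest, M0 (0)
```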

I present here the code for all these steps, as they were applied in the original paper, and apply them to a simulated dataset. Naturally, implementation of this type of analysis should be specific to the relevant application.

== Code pieces ==

=== Likelihood of <math> \rho </math> parameter ===

''Description'': This function first estimates the mean and standard deviation directly from x and then, holding those estimates fixed, optimizes the likelihood over the value of rho. The equation for the likelihood is given above (and in equations 10 and 11 in the BCPA paper).

''Usage'': GetRho(x, t)

''Value'': Returns a vector with two values (again, not a list or dataframe, for greater speed). The first value is the "rho" estimate; the second is the negative log-likelihood at the estimate.

''Arguments'':
* x Values of time series.
* t Times of measurements associated with x.

<pre>
GetRho <- function (x, t)
{
  getL <- function(rho) {
    dt <- diff(t)
    s <- sd(x)
    mu <- mean(x)
    x.plus <- x[-1]
    x.minus <- x[-length(x)]
    # conditional density of each observation given the previous one
    Likelihood <- dnorm(x.plus, mean = mu + (rho^dt) * (x.minus - mu),
                        sd = s * sqrt(1 - rho^(2 * dt)))
    logL <- sum(log(Likelihood))
    # guard against non-finite values (e.g. log(0)) at the interval edges
    if (!is.finite(logL))
      logL <- -10^10
    return(-logL)
  }
  o <- optimize(getL, lower = 0, upper = 1, tol = 1e-04)
  # rho estimate and the minimized negative log-likelihood
  return(c(o$minimum, o$objective))
}
</pre>

=== Total likelihood within a behaviorally homogeneous section ===

''Description'': Returns the negative log-likelihood of a given parameter set for a gappy time series.

''Usage'': GetLL(x, t, mu, s, rho)

''Value'': Returns the value of the negative log-likelihood.

''Arguments'':
* x Time series data.
* t Times at which data is obtained.
* mu Mean estimate.
* s Sigma (standard deviation) estimate (>0).
* rho Rho estimate (between 0 and 1).

<pre>
GetLL <- function (x, t, mu, s, rho)
{
  dt <- diff(t)
  x.plus <- x[-1]
  x.minus <- x[-length(x)]
  # conditional density of each observation given the previous one
  Likelihood <- dnorm(x.plus, mean = mu + (rho^dt) * (x.minus - mu),
                      sd = s * sqrt(1 - rho^(2 * dt)))
  # negative log-likelihood
  LL <- -sum(log(Likelihood))
  return(LL)
}
</pre>

=== Likelihood of single change point ===

''Description'': Takes a time series with values "x" obtained at times "t" and a break "tbreak", and returns the estimates of "mu", "sigma" and "rho", as well as the log-likelihood of those estimates (given the data), both before and after the break.

''Usage'': GetDoubleL(x, t, tbreak)

''Value'': Returns a labeled matrix (more efficient than a data frame) with columns "mu", "sigma", "rho" and "LL" corresponding to the estimates, and one row for each side of the break point.

''Arguments'':
* x Values of time series.
* t Times of measurements associated with x.
* tbreak Breakpoint (in terms of the INDEX within "t" and "x", not the actual time value).

<pre>
GetDoubleL <- function(x, t, tbreak)
{
  # split the series at the break (the break point belongs to both halves)
  x1 <- x[1:tbreak]
  x2 <- x[tbreak:length(x)]

  t1 <- t[1:tbreak]
  t2 <- t[tbreak:length(t)]

  # estimate rho (and the negative log-likelihood) on each side
  o1 <- GetRho(x1, t1)
  o2 <- GetRho(x2, t2)

  mu1 <- mean(x1)
  sigma1 <- sd(x1)
  rho1 <- o1[1]

  mu2 <- mean(x2)
  sigma2 <- sd(x2)
  rho2 <- o2[1]

  # convert the minimized negative log-likelihoods to log-likelihoods
  LL1 <- -o1[2]
  LL2 <- -o2[2]

  m <- matrix(c(mu1, mu2, sigma1, sigma2, rho1, rho2, LL1, LL2), 2, 4)
  colnames(m) <- c("mu", "sigma", "rho", "LL")

  return(m)
}
</pre>

=== Sweeping breaks ===

''Description'': Finds a single change point within a time series.

''Usage'': SweepBreaks(x, t, range=0.6)

''Arguments'':
* x Values of time series.
* t Times of measurements associated with x.
* range Range of possible breaks. The default (0.6) runs approximately from 1/5th to 4/5ths of the total length of the time series.

''Value'': Returns a matrix (not a data.frame, for greater speed) with column headings "breaks", "tbreaks", "mu1", "sigma1", "rho1", "LL1", "mu2", "sigma2", "rho2", "LL2", "LL". This is calculated for every possible break, which (with the default range) extends from 0.2l to 0.8l (where l is the length of the time series). The output of this function feeds WindowSweep.

<pre>
SweepBreaks <- function(x, t, range = 0.6)
{
  n <- length(t)
  start <- (1 - range)/2
  # candidate break indices, from start*n to (1 - start)*n
  breaks <- round((start*n):((1 - start)*n))

  l <- length(breaks)

  BreakParams <- matrix(NA, l, 8)

  for (i in 1:l)
  {
    # parameter estimates and log-likelihoods on either side of each break
    myDoubleL <- GetDoubleL(x, t, breaks[i])
    BreakParams[i, ] <- c(myDoubleL[1, ], myDoubleL[2, ])
  }

  # remember: LL1 and LL2 are columns 4 and 8; their sum is the total LL
  BreakMatrix <- cbind(breaks, t[breaks], BreakParams,
                       BreakParams[, 4] + BreakParams[, 8])

  colnames(BreakMatrix) <- c("breaks", "tbreaks", "mu1", "sigma1", "rho1",
                             "LL1", "mu2", "sigma2", "rho2", "LL2", "LL")

  # the first row is dropped
  return(BreakMatrix[2:nrow(BreakMatrix), ])
}
</pre>

=== Choosing a model ===

=== Window sweeping ===

=== Plotting output ===

== Sample work flow ==
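This section is still to be written. In the meantime, here is a minimal simulation of the kind of dataset the BCPA targets. It is illustrative only: the parameter values and the single abrupt change at index 100 are invented. The resulting x and t can then be handed to the functions above, e.g. SweepBreaks(x, t).

```r
# Illustrative simulation (not from the original page): a gappy Gaussian
# time series with one abrupt change in mean and autocorrelation at index 100.
set.seed(2)
n <- 200
t <- cumsum(rexp(n, rate = 1))            # irregularly spaced sampling times
mu  <- c(rep(0, 100), rep(2, 100))        # abrupt shift in the mean
rho <- c(rep(0.8, 100), rep(0.3, 100))    # abrupt shift in autocorrelation

x <- numeric(n)
x[1] <- mu[1]
for (i in 2:n) {
  r <- rho[i]^(t[i] - t[i - 1])           # correlation decays with the gap
  x[i] <- mu[i] + r * (x[i - 1] - mu[i]) + rnorm(1, 0, sqrt(1 - r^2))
}
# x and t can now be analyzed, e.g. with SweepBreaks(x, t)
```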

''Revision as of 23:25, 18 July 2011''
