welcome to part 2 of our short introduction to extreme value analysis using the
extRemes package in R.
As some of you might know, there are two common approaches for practical extreme value analysis. Today, we will focus on the first of these two approaches, called the block maxima method.
This approach for modelling extremes of a (time) series of observations is based on the utilization of maximum or minimum values of these observations within a certain sequence of constant length. For a sufficiently large number n of established blocks, the resulting peak values of these n blocks of equal length can be used for fitting a suitable distribution to these data. While the block size is basically freely selectable, a trade-off has to be made between bias (small blocks) and variance (large blocks). Usually, the length of the sequences is often chosen to correspond to a certain familiar time period, in most cases a year. The resulting vector of annual maxima (or minima, respectively) is calles “Annual Maxima (Minima) Series” or simply AMS.
According to the Fisher–Tippett–Gnedenko theorem, the distribution of block maxima can be approximated by a generalized extreme value distribution.
The following code shows a short practical example of fitting a generalized extreme value distribution to a time series of precipitation data using the
extRemes package in R. The sample data set features precipitation data of Oberwang (Salzburg, Austria) from 1981 through 2013 and is is provided by the hydrographic services of Austria via eHYD.
# load packages library(extRemes) library(RCurl) # get data from eHYD link <- getURL("http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2") df <- read.csv(text = link, sep=";", dec=",", skip=21, header=FALSE) df[df == "Lücke"] = NA colnames(df) <- c("Date","Precip") df$Date <- as.Date(df$Date, format = "%d.%m.%Y") df$Precip <- as.numeric(gsub(",", ".", as.character(df$Precip))) head(df) # derive AMS for maximum precipitation ams <- data.frame() ams <- do.call("rbind", by(df, format(df$Date, "%Y"), function(x) x[which.max(x$Precip), ])) amsval <- ams$Precip # generalized maximum-likelihood fitting of the GEV distribution mlefit <- fevd(amsval, method = "GMLE", type="GEV") plot(mlefit) # diagnostic plots # return levels: rlmle <- return.level(mlefit, conf = 0.05, return.period= c(2,5,10,20,50,100)) # fitting of GEV distribution based on L-moments estimation lmomfit <- fevd(amsval, method = "Lmoments", type="GEV") plot(lmomfit) # diagnostic plots # return levels: rlmom <- return.level(lmomfit, conf = 0.05, return.period= c(2,5,10,20,50,100)) # return level plots par(mfcol=c(1,2)) # return level plot w/ MLE plot(mlefit, type="rl", main="Return Level Plot for Oberwang w/ MLE", ylim=c(0,200), pch=16) loc <- as.numeric(return.level(mlefit, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # return level plot w/ LMOM plot(lmomfit, type="rl", main="Return Level Plot for Oberwang w/ L-Moments", ylim=c(0,200)) loc <- as.numeric(return.level(lmomfit, conf = 0.05,return.period=100)) segments(100, 0, 100, loc, col= 'midnightblue',lty=6) segments(0.01,loc,100, loc, col='midnightblue', lty=6) # comparison of return levels out <- t(data.frame(mle=as.numeric(rlmle), lmom=as.numeric(rlmom))) colnames(out) <- c(2,5,10,20,50,100) round(out,1)
In this case, L-Moments estimation reacts more sensitive to the influence of the outlier, which is quite unusual. In most cases, L-moments estimation is more robust than maximum likelihood estimation, but our example apparently is the exception that proves the rule. In addition, I have made the observation that maximum likelihood estimation works more reliable in other R packages in some cases (e.g.
Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.