Introduction to Extreme Value Analysis in R – Part 2: Block Maxima Approach

Hey there,

welcome to part 2 of our short introduction to extreme value analysis using the extRemes package in R.

As some of you might know, there are two common approaches for practical extreme value analysis. Today, we will focus on the first of these two approaches, called the block maxima method.

This approach for modelling extremes of a (time) series of observations is based on the utilization of maximum or minimum values of these observations within a certain sequence of constant length. For a  sufficiently  large  number  n  of  established  blocks, the  resulting  peak  values of  these  n  blocks of equal  length  can  be  used  for  fitting  a  suitable  distribution  to  these  data.  While  the  block  size  is basically freely selectable, a trade-off has to be made between bias (small blocks) and variance (large blocks). Usually, the length of the sequences is often chosen to correspond to a certain familiar time period, in most cases a year. The resulting vector of annual maxima (or minima, respectively) is calles “Annual Maxima (Minima) Series” or simply AMS.

According to the Fisher–Tippett–Gnedenko theorem, the distribution of block maxima can be approximated by a generalized extreme value distribution.

The following code shows a short practical example of fitting a generalized extreme value distribution to a time series of precipitation data using the extRemes package in R. The sample data set features precipitation data of Oberwang (Salzburg, Austria) from 1981 through 2013 and is is provided by the hydrographic services of Austria via eHYD.

# load packages
library(extRemes)
library(RCurl)

# get data from eHYD
link <- getURL("http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2")
df <- read.csv(text = link, sep=";", dec=",", skip=21, header=FALSE)
df[df == "Lücke"] = NA
colnames(df) <- c("Date","Precip")
df$Date <- as.Date(df$Date, format = "%d.%m.%Y")
df$Precip <- as.numeric(gsub(",", ".", as.character(df$Precip)))
head(df)

# derive AMS for maximum precipitation
ams <- data.frame()
ams <- do.call("rbind", by(df, format(df$Date, "%Y"),
               function(x) x[which.max(x$Precip), ]))
amsval <- ams$Precip

# generalized maximum-likelihood fitting of the GEV distribution
mlefit <- fevd(amsval, method = "GMLE", type="GEV")
plot(mlefit) # diagnostic plots
# return levels:
rlmle <- return.level(mlefit, conf = 0.05, return.period= c(2,5,10,20,50,100))

# fitting of GEV distribution based on L-moments estimation
lmomfit <- fevd(amsval, method = "Lmoments", type="GEV")
plot(lmomfit) # diagnostic plots
# return levels:
rlmom <- return.level(lmomfit, conf = 0.05, return.period= c(2,5,10,20,50,100))

# return level plots
par(mfcol=c(1,2))
# return level plot w/ MLE
plot(mlefit, type="rl",
     main="Return Level Plot for Oberwang w/ MLE",
     ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(mlefit, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# return level plot w/ LMOM
plot(lmomfit, type="rl",
     main="Return Level Plot for Oberwang w/ L-Moments",
     ylim=c(0,200))
loc <- as.numeric(return.level(lmomfit, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# comparison of return levels
out <- t(data.frame(mle=as.numeric(rlmle),
                    lmom=as.numeric(rlmom)))
colnames(out) <- c(2,5,10,20,50,100)
round(out,1)

return_levels_ams
In this case, L-Moments estimation reacts more sensitive to the influence of the outlier, which is quite unusual. In most cases, L-moments estimation is more robust than maximum likelihood estimation, but our example apparently is the exception that proves the rule. In addition, I have made the observation that maximum likelihood estimation works more reliable in other R packages in some cases (e.g. fExtremes, ismev, evd, evir).

Reference: Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. London: Springer.

About This Author

Matthias studied Environmental and Bio-Resources Management with a specialization in Environmental Information Management at the University of Natural Resources and Life Sciences (Vienna). He is currently a PhD student working at the Austrian Institute of Technology. Having written his master's thesis about extreme weather risk identification for the Austrian road network, he currently focuses on modeling of adverse weather events as a basis for risk assessment of road infrastructure networks.

21 Comments

You can post comments in this post.


  • i want a help of you

    Muhammad Arif 1 year ago Reply


    • Sure. Can you specify your request?

      Matthias 1 year ago Reply


  • plz send me the r-coding for fitting generalized pareto distribution,estimating the parameters estimation by mle method

    Muhammad Arif 1 year ago Reply


  • my research topic is “non stationary frequency analysis of extreme precipitation”.but i did not know my track. how i will proceed my job?
    please help me.
    thanks for your kindness!

    Muhammad Arif 1 year ago Reply


  • Hi I am currently doing a thesis on Garch-EVT approach to estimate Value at Risk and Expected Shortfall. Can you send me the R procedure for this approach? Thanks

    Gaby 8 months ago Reply


    • Hi Gaby,

      I have not yet worked with GARCH-EVT-copulas, if that’s what you intend to do. I am pretty sure that there is no specific R package targeted at this topic. So you probably have to split your procedure into several steps and deal with each step individually. Stack overflow is most likely a very good place to get some information and see the workflow of other people working with this topic. For a start, also take a look at fGarch (which is part of the Rmetrics-suite, just as the excellent package fExtremes) or rugarch for GARCH modelling.
      In addition, you might want to have a look at the package GEVStableGarch which seemingly employs maximum likelihood extimation of GARCH models with a GEV distribution, maybe this is sufficient for your needs.

      Regards,
      Matthias

      Matthias 8 months ago Reply


      • Thanks for your reply.

        I’ll just be working with garch and evt. The procedure consists of applying a garch model to the return series and then extract the residuals from the model. Then we apply evt on the residuals extracted from the garch model. However I am having difficulties selecting the threshold from the residuals so as to apply EVT-POT or block maxima method to it. Can you help me?

        Regards
        Gaby

        Gaby 8 months ago Reply


    • I’m shocked that I found this info so easily.

      Denver 6 months ago Reply


  • Hi

    Your website is very useful and make this topic easy to follow

    How did you change your points on your return level plots to green?

    misha 5 months ago Reply


    • Hi,

      unfortunately, the plotting methods for objects of type fevd are – albeit comprehensive – hardly customizable. Since I didn’t like the default plotting, I modified the plotting functions plot.fevd.mle and plot.fevd.lmoments to support better visualization options within the extRemes framework.

      Regards,
      Matthias

      Matthias 5 months ago Reply


      • Matthias,

        i am still learning phase of R.

        Would you be able to provide me with some guidance on how to get this done. Thanks

        misha 5 months ago Reply


        • Hi Misha,

          I have written the following function for creating a simple return level plot (base graphics) from fevd objects that have been fitted of type GEV.
          You can control both the color of points and the lines of the fitted function as well as the number of ticks on your y-axis.

          rlplot_gev < - function(fevd_obj,
                                 pointcolor = "firebrick",
                                 linecolor = "darkgreen",
                                 yticks = seq(0,225,25)){
            rperiods = c(1+1e-15, 1.25, 1.5, 2, 3,4,5, 7.5, 10, 15, 20,
                         30, 40, 50, 60, 75, 80, 100, 120, 200, 250)
            model <- fevd_obj$type
            if(model != "GEV"){
              stop("this plotting function only works for GEV")
            }
            pars <- fevd_obj$results$par
            loc <- pars["location"]
            scale <- pars["scale"]
            shape <- pars["shape"]
            xrl <- -1/(log(1 - 1/rperiods))
            yrl <- rlevd(rperiods, loc = loc, scale = scale, 
                         shape = shape)
            
            # empty plot
            plot(xrl, yrl, type = "n", log = "x", xlab = "Return Period",
                 ylab = "Return Level", lwd = 2, yaxt = "n", xlim = c(1,200),
                 main = "Return Level Plot")
            par(las = 1)
            axis(side = 2, at = yticks)
            
            # background grid
            abline(h = yticks,col = "lightgrey", lty = 2)
            abline(v = c(1,2,5,10,20,50,100,200), col = "lightgrey", lty = 2)
            
            # plot line
            lines(xrl, yrl, col = linecolor, lwd = 2)
            
            # ci lines for MLE base plot
            bds <- ci(fevd_obj, return.period = rperiods)
            lines(xrl, bds[, 1], col = linecolor, lty = 4)
            lines(xrl, bds[, 3], col = linecolor, lty = 4)
            
            # Weibull plotting position of points
            
            n <- fevd_obj$n
            xp <- ppoints(n = n, a = 0)
            ytmp <- datagrabber(fevd_obj)
            y <- c(ytmp[, 1])
            points(-1/log(xp), sort(y), pch = 16, col = pointcolor)
          }
          

          Please note that this is a very simple approach, I'm mainly re-using code from plot.fevd.mle().
          In order to create a similar plot for the threshold excess approach, you would have to make slight adjustments to parameters of the GP and to the plotting positions of the extreme values:

          sdat < - sort(y)
          points(-1/log(xp2)[sdat > u]/x$npy, sdat[sdat > u], pch="+", col = pointcolor)
          

          Using rlplot_gev(mlefit, pointcolor = "darkblue") on the code of my post should yield return level plots similar to the default plot(mlefit, type = "rl", xlim = c(1, 200), ylim = c(25, 225))

          Regards,
          Matthias

          Matthias 5 months ago Reply


  • Hey!

    I am currently writing my Bachelor Thesis on evds.

    Would it be okay for you if I used your code as an example?

    Lukas 4 months ago Reply


    • Hi Lukas,

      yes, sure. However, if you have enough time it might be worth to have a deeper look at the packages presented above.
      This post is really meant as an introduction to the topic.

      Regards,
      Matthias

      Matthias 4 months ago Reply


  • When modelling the effects of extreme events (say of variable X) on volatility of prices of a commodity (say Y), is possible to fit GEV to X and then insert it as an exogenous variable in mean or volatility equation? If not, how can I approach this issue?

    Waiguru 2 months ago Reply


    • Hi,

      I am not really sure if I understand what you are intending to do.
      What is the question you are trying to answer?
      Modelling the influence of X on Y sounds more like a standard regression model in the first place.

      Matthias 2 months ago Reply


  • Hello!
    I have a question:
    How can I use R and the Gumbel distribution to predict discharge on specific return periods?
    And how do I know the return period?
    Thank you!

    Abeer Haddad 2 months ago Reply


    • Hi,

      basically, you can stick to my example code above.
      If you specifically want to fit a Gumbel distribution, simply set the type argument in the fevd() function to type="Gumbel".
      Return levels can be obtained by using the function return.level().

      Regards,
      Matthias

      Matthias 2 months ago Reply


  • Hi Matthias, I have a question too:
    I want to use Block maxima Method to estimate Value at risk. How can I do this using EVT BMM? From Generalized Extreme Value distribution I need to estimate the shape, scale and loval parameters, but which function in R delivers this result? And I should use different blocks in the length of “months”, “quartal” and “6 months”. And after that to estimate with the help of the parameters the Value at Risk.

    Regards,
    Martina

    Martina 1 month ago Reply


Post A Reply

*