A couple of exercises for chapter 5 refer to a dataset of Arctic sea ice extent, and we also plan to add an online case study on regression to the mean in which this will be one of the examples. The data comes from the Arctic Sea Ice News and Analysis site of the National Snow and Ice Data Center. The graphic below summarizes the dataset more completely.
This page, from the World Bank, gives comprehensive data on worldwide carbon dioxide emissions, broken down in a variety of ways (e.g., country by country).
This dataset is referenced in problem 4 of chapter 5 of the book.
This post is a supplement to the discussion of extreme value statistics at the end of Section 5.3 of the book. You can find an online extreme value distribution calculator, provided by South Dakota State University, at
This calculator fits a Gumbel distribution (a special case of the generalized extreme value distribution) to a dataset. It uses the language of river floods because that is what the authors are interested in, but the underlying mathematics applies to many different situations.
To use the calculator, one provides a data series consisting of extreme values. For instance, one might provide the data series
12.1; 11; 2; 1.8; 16.4; 6.7; 8; 3; 4; 9;
which represents the biggest value of some variable (the depth of the deepest flood, the wind speed of the strongest storm, or whatever) in each of 10 successive years. The output of the calculator is a table giving a probability distribution. It has five columns; the key ones are labeled T (return period), P (probability) and Q (“flood discharge” for this calculator, but it refers to whatever variable we are modeling). Here is part of the output for the data series above:
| Return period T, year | Probability P, percent | Value Q |
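This tells us (based on the data provided) that, for instance, the value \(Q=28\) will be exceeded on average only once in a hundred years, that is, with a probability of about 1 percent in any given year; the value \(Q=25\) will be exceeded on average only once in fifty years (about 2 percent in any given year); and so on.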
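For readers who would like to experiment without the online calculator, here is a minimal Python sketch of one common way to fit a Gumbel distribution to a short series of annual maxima, the "frequency factor" method. It is an assumption that the calculator works this way; its internals are not described here. The function names `gumbel_reduced_variate` and `gumbel_quantile` are made up for this illustration.

```python
import math
import statistics


def gumbel_reduced_variate(p_non_exceed):
    """Standard Gumbel value y whose non-exceedance probability is p."""
    return -math.log(-math.log(p_non_exceed))


def gumbel_quantile(annual_maxima, return_period):
    """Estimate the value Q exceeded on average once every `return_period` years.

    Assumed method (not confirmed to be the calculator's): Gumbel's
    frequency-factor formula, Q = mean + K * sd, with a small-sample
    correction based on the plotting positions i / (n + 1).
    """
    n = len(annual_maxima)
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)  # sample standard deviation (divides by n - 1)

    # Reduced variates at the plotting positions, and their mean and spread.
    reduced = [gumbel_reduced_variate(i / (n + 1)) for i in range(1, n + 1)]
    y_n = statistics.mean(reduced)
    s_n = statistics.pstdev(reduced)  # population standard deviation (divides by n)

    # Reduced variate for the requested return period: P(exceed) = 1 / T.
    y_t = gumbel_reduced_variate(1 - 1 / return_period)
    k = (y_t - y_n) / s_n  # frequency factor
    return mean + k * sd


# The ten-year example series from the text.
series = [12.1, 11, 2, 1.8, 16.4, 6.7, 8, 3, 4, 9]

for T in (50, 100):
    P = 100 / T  # exceedance probability in any one year, in percent
    print(T, P, round(gumbel_quantile(series, T), 1))
```

With the example series, this sketch gives values close to 25 for the fifty-year return period and close to 28 for the hundred-year return period, in line with the figures quoted above. Other fitting methods (maximum likelihood, for example) would give somewhat different numbers from only ten data points.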
This spreadsheet contains the data from the imaginary class used as our first example in discussing descriptive statistics: