Abstract


Recent interest in meteorological events and their economic and health consequences motivates this quantitative analysis, which ranks such events by importance and quantifies their economic and health effects on the general population. We find that Hurricanes are the most expensive events with respect to property damage, costing US$733 million each on average; Typhoons are associated with the largest number of injuries, 13.38 per event on average; and Tsunamis take the most lives of all meteorological events, 1.65 per event on average.

Introduction

In order to alleviate or even prevent future economic damage and health problems in the general population caused by catastrophic events (e.g. Hurricanes and Typhoons), it is of the utmost importance to quantify the effects of each type of event, so as to assess not only its absolute impact but also its impact relative to the others. Are Hurricanes more fatal than Tsunamis? Are Typhoons more expensive than Floods? These types of questions motivate the present analysis. Concretely, we are interested in ranking and quantifying the effects of every type of catastrophic event in the National Weather Service database with respect to property damage (in US dollars), fatalities and injuries (both as counts). The document is divided into an introduction motivating the questions; a data processing section that describes the data set and the code that makes it usable for our purposes, and discusses some of the limitations of the data set; an analysis section describing the methodology applied; and a results section that presents the conclusions of the analysis and mentions possible limitations and extensions.

Data Processing

In this section we present the code that prepares the data for our analysis, along with some descriptive statistics that reveal the main problems and limitations of the data.

The processing code below shows the instructions needed to load and summarize the data.

##     STATE__                  BGN_DATE             BGN_TIME
##  Min.   : 1.0   5/25/2011 0:00:00:  1202   12:00:00 AM: 10163
##  1st Qu.:19.0   4/27/2011 0:00:00:  1193   06:00:00 PM:  7350
##  Median :30.0   6/9/2011 0:00:00 :  1030   04:00:00 PM:  7261
##  Mean   :31.2   5/30/2004 0:00:00:  1016   05:00:00 PM:  6891
##  3rd Qu.:45.0   4/4/2011 0:00:00 :  1009   12:00:00 PM:  6703
##  Max.   :95.0   4/2/2006 0:00:00 :   981   03:00:00 PM:  6700
##                 (Other)          :895866   (Other)    :857229
##    TIME_ZONE          COUNTY           COUNTYNAME         STATE
##  CST    :547493   Min.   :  0.0   JEFFERSON :  7840   TX     : 83728
##  EST    :245558   1st Qu.: 31.0   WASHINGTON:  7603   KS     : 53440
##  MST    : 68390   Median : 75.0   JACKSON   :  6660   OK     : 46802
##  PST    : 28302   Mean   :100.6   FRANKLIN  :  6256   MO     : 35648
##  AST    :  6360   3rd Qu.:131.0   LINCOLN   :  5937   IA     : 31069
##  HST    :  2563   Max.   :873.0   MADISON   :  5632   NE     : 30271
##  (Other):  3631                   (Other)   :862369   (Other):621339
##                EVTYPE         BGN_RANGE           BGN_AZI
##  HAIL             :288661   Min.   :   0.000          :547332
##  TSTM WIND        :219940   1st Qu.:   0.000   N      : 86752
##  THUNDERSTORM WIND: 82563   Median :   0.000   W      : 38446
##  TORNADO          : 60652   Mean   :   1.484   S      : 37558
##  FLASH FLOOD      : 54277   3rd Qu.:   1.000   E      : 33178
##  FLOOD            : 25326   Max.   :3749.000   NW     : 24041
##  (Other)          :170878                      (Other):134990
##          BGN_LOCATI                  END_DATE             END_TIME
##               :287743                    :243411              :238978
##  COUNTYWIDE   : 19680   4/27/2011 0:00:00:  1214   06:00:00 PM:  9802
##  Countywide   :   993   5/25/2011 0:00:00:  1196   05:00:00 PM:  8314
##  SPRINGFIELD  :   843   6/9/2011 0:00:00 :  1021   04:00:00 PM:  8104
##  SOUTH PORTION:   810   4/4/2011 0:00:00 :  1007   12:00:00 PM:  7483
##  NORTH PORTION:   784   5/30/2004 0:00:00:   998   11:59:00 PM:  7184
##  (Other)      :591444   (Other)          :653450   (Other)    :622432
##    COUNTY_END COUNTYENDN       END_RANGE           END_AZI
##  Min.   :0    Mode:logical   Min.   :  0.0000          :724837
##  1st Qu.:0    NA's:902297    1st Qu.:  0.0000   N      : 28082
##  Median :0                   Median :  0.0000   S      : 22510
##  Mean   :0                   Mean   :  0.9862   W      : 20119
##  3rd Qu.:0                   3rd Qu.:  0.0000   E      : 20047
##  Max.   :0                   Max.   :925.0000   NE     : 14606
##                                                 (Other): 72096
##            END_LOCATI         LENGTH              WIDTH
##                 :499225   Min.   :   0.0000   Min.   :   0.000
##  COUNTYWIDE     : 19731   1st Qu.:   0.0000   1st Qu.:   0.000
##  SOUTH PORTION  :   833   Median :   0.0000   Median :   0.000
##  NORTH PORTION  :   780   Mean   :   0.2301   Mean   :   7.503
##  CENTRAL PORTION:   617   3rd Qu.:   0.0000   3rd Qu.:   0.000
##  SPRINGFIELD    :   575   Max.   :2315.0000   Max.   :4400.000
##  (Other)        :380536
##        F               MAG            FATALITIES          INJURIES
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000
##  NA's   :843563

We are interested in the variables EVTYPE, INJURIES, FATALITIES and PROPDMG. The last one needs some processing to convert it to US dollars according to the variable PROPDMGEXP, which is supposed to have three levels of magnitude: K for thousands, M for millions and B for billions. We observe a small number of observations with other levels, which we assume for simplicity are already in US dollars. The next code describes the transformation of PROPDMG to a new variable in US dollars.
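The original transformation code is not reproduced in this text. As a minimal sketch of the rule just described, written in Python for illustration, the conversion might look as follows. The function name `property_damage_usd` and the sample rows are hypothetical, and exact matching of the codes K, M and B is an assumption.

```python
# Multipliers for the three documented PROPDMGEXP levels.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def property_damage_usd(propdmg, propdmgexp):
    """Convert a PROPDMG value to US dollars using its PROPDMGEXP code."""
    # Assumption: only the exact codes K, M and B are scaled; any other
    # code (blank, digits, symbols) is treated as plain dollars, following
    # the simplification stated in the text.
    return propdmg * MULTIPLIERS.get(propdmgexp.strip(), 1)


# Hypothetical (PROPDMG, PROPDMGEXP) rows, not taken from the data set.
rows = [(25.0, "K"), (2.5, "M"), (1.5, "B"), (500.0, ""), (3.0, "5")]
damages = [property_damage_usd(value, code) for value, code in rows]
print(damages)
```

Treating unknown codes as dollars keeps every observation in the sample, at the cost of possibly understating the damage for those rows.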

Let's take a closer look at the EVTYPE variable.

##
##             HIGH SURF ADVISORY                  COASTAL FLOOD
##                              1                              1
##                    FLASH FLOOD                      LIGHTNING
##                              1                              1
##                      TSTM WIND                TSTM WIND (G45)
##                              4                              1
##                     WATERSPOUT                           WIND
##                              1                              1
##                              ?                ABNORMAL WARMTH
##                              1                              4
##                 ABNORMALLY DRY                 ABNORMALLY WET
##                              2                              1
##           ACCUMULATED SNOWFALL            AGRICULTURAL FREEZE
##                              4                              6
##                  APACHE COUNTY         ASTRONOMICAL HIGH TIDE
##                              1                            103
##          ASTRONOMICAL LOW TIDE                       AVALANCE
##                            174                              1
##                      AVALANCHE                   BEACH EROSIN
##                            386                              1
##                  Beach Erosion                  BEACH EROSION
##                              1                              3
##    BEACH EROSION/COASTAL FLOOD                    BEACH FLOOD
##                              1                              2
##     BELOW NORMAL PRECIPITATION              BITTER WIND CHILL
##                              2                              1
## BITTER WIND CHILL TEMPERATURES                      Black Ice
##                              3                              3
##                      BLACK ICE                       BLIZZARD
##                             14                           2719
## BLIZZARD AND EXTREME WIND CHIL        BLIZZARD AND HEAVY SNOW
##                              2                              1
##               Blizzard Summary               BLIZZARD WEATHER
##                              1                              1
##         BLIZZARD/FREEZING RAIN            BLIZZARD/HEAVY SNOW
##                              1                              2
##             BLIZZARD/HIGH WIND          BLIZZARD/WINTER STORM
##                              1                              1
##                  BLOW-OUT TIDE                 BLOW-OUT TIDES
##                              1                              1
##                   BLOWING DUST                   blowing snow
##                              4                              2
##                   Blowing Snow                   BLOWING SNOW
##                              3                             12
## BLOWING SNOW- EXTREME WIND CHI BLOWING SNOW & EXTREME WIND CH
##                              1                              2
## BLOWING SNOW/EXTREME WIND CHIL               BREAKUP FLOODING
##                              1                              1
##                     BRUSH FIRE                    BRUSH FIRES
##                              3                              1

As we can see from the first rows of the event type tabulation, there is no homogeneity in the way each event is described (possibly because it was typed in by each weather observatory without following a strict definition of events, or because of human error). This is one of the most important descriptors in the database and is crucial for our intended analysis. Ideally, the event type column would be a factor variable with each event type as a distinct level. To accomplish this, we first import the description key for the events, which was constructed from Table 2.1.1, “Storm Data Event Table”, on page 6 of the Storm Data Preparation document available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

Our strategy for building the desired vector of events with standardized descriptions is the following:

  1. Split dual events in the key table; we will merge them back together in the text matching algorithm (see below). Note that some observations will be true for more than one event variable, because an event type sometimes matches more than one event description from the table.
  2. Import the key as a character vector.
  3. Construct a loop that matches every element of the key vector against the EVTYPE column and creates a new dummy variable from each matching result.
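The matching step in the list above can be sketched as follows, in Python for illustration. The original loop is not shown, so the matching rule used here (case-insensitive substring search) is an assumption, and `match_events` is a hypothetical name. The key entries and EVTYPE strings are taken from the listings in this section.

```python
def match_events(evtypes, key):
    """Build one logical column per key entry, one value per EVTYPE string."""
    columns = {}
    for description in key:
        needle = description.lower()
        # Assumption: an event matches when the key text appears anywhere
        # in the EVTYPE string, ignoring case.
        columns[description] = [needle in evtype.lower() for evtype in evtypes]
    return columns


key = ["Astronomical Low Tide", "Avalanche", "Blizzard",
       "Coastal Flood", "Cold", "Debris Flow"]
evtypes = ["AVALANCHE", "AVALANCE", "Blizzard Summary", "BLIZZARD/HEAVY SNOW",
           "BEACH EROSION/COASTAL FLOOD", "ASTRONOMICAL LOW TIDE"]

events = match_events(evtypes, key)
for description, flags in events.items():
    print(description, flags)
```

Under this rule the misspelled entry AVALANCE matches no key description, which illustrates one limitation of plain text matching on this column.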

Let's import the key and take a look at its first elements.

##       Event.Description
## 1 Astronomical Low Tide
## 2             Avalanche
## 3              Blizzard
## 4         Coastal Flood
## 5                  Cold
## 6           Debris Flow

Let’s create the events dataframe with logical elements.

Now let’s create the composite variables of the original key table (i.e. Cold/Wind Chill, Extreme Cold/Wind Chill, Frost/Freeze, Hurricane (Typhoon) and Storm Surge/Tide).

Note that the only event that overlaps with others is “Wind Chill”, so it is the only one we need to delete. Finally, we convert the logical data frame to binary in order to treat each column as a dummy variable. (It turns out that this transformation is necessary in order to run the dummy regression with no intercept.)
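A minimal sketch of these two steps, in Python for illustration, is given below. It assumes that a composite event is true whenever any of its parts matched; the original code is not shown, so this rule, the helper name `combine` and the sample values are all hypothetical.

```python
def combine(columns, parts):
    """Element-wise logical OR of the named component columns."""
    return [any(flags) for flags in zip(*(columns[name] for name in parts))]


# Hypothetical logical columns for four observations (not real data).
columns = {
    "Cold": [True, False, False, False],
    "Extreme Cold": [False, True, False, False],
    "Wind Chill": [False, False, True, False],
}

# Assumption: a composite event is true when any of its parts matched.
columns["Cold/Wind Chill"] = combine(columns, ["Cold", "Wind Chill"])
columns["Extreme Cold/Wind Chill"] = combine(columns, ["Extreme Cold", "Wind Chill"])

# "Wind Chill" feeds both composites, so its own column is dropped.
# How the other component columns are handled is not shown in the source;
# they are left untouched here.
del columns["Wind Chill"]

# Convert logical values to 0/1 dummies for the regression.
dummies = {name: [int(flag) for flag in flags] for name, flags in columns.items()}
print(dummies)
```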

This last data frame is what we need in order to run our methodology.

Analysis

Our strategy to quantify the effect of each event on mortality, injuries and property damage is straightforward. We fit a classical linear model to each of our three outcomes (injuries, fatalities and property damage) with all our dummy variables as covariates. We assume that the classical assumptions of normality and finite variance hold for the error term. Each regression is fitted without an intercept so that each coefficient represents the estimated population mean for its event type, and so that each p-value can be read directly as the probability of observing a coefficient of that magnitude or larger under the null hypothesis that the mean equals 0 (\(H_0:\beta_i=0\) versus \(H_1:\beta_i\neq 0\), where \(i\) indexes the dummy variables we created to represent the events).

Once we have fitted the linear model, we take the coefficients and plot them in descending order to see which are the largest in each regression, and we comment on their statistical significance and the magnitude of the damage.
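To illustrate why dropping the intercept makes each coefficient a mean, the following Python sketch fits least squares through the normal equations on a small hypothetical data set with two mutually exclusive dummies. It is not the original model code; the helper names and the data are assumptions, and the solver assumes the cross-product matrix is invertible.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_no_intercept(x, y):
    """Least-squares coefficients for y = x * beta with no intercept column."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    # Assumption: xtx is invertible (no all-zero or duplicated dummy columns).
    return solve(xtx, xty)


# Hypothetical data: two mutually exclusive event dummies and a fatality count.
x = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y = [2, 4, 0, 1, 2]
beta = ols_no_intercept(x, y)
print(beta)
```

With mutually exclusive dummies, each coefficient equals the sample mean of the outcome within its group: here (2 + 4) / 2 for the first event and (0 + 1 + 2) / 3 for the second. When an observation matches several dummies, this simple reading holds only approximately.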

Results

In this section we present the main results and the plots generated. The first regression, corresponding to the fatalities outcome, and its main plot are presented in the following figure.

The variable Debris.Flow presents a singularity because it does not have any variability.

It is important to note that some variables are not statistically significantly different from zero under a two-sided t-test at any standard significance level. We are, however, most interested in the largest ones in order of magnitude. The three main regressors, in order of magnitude, for explaining the number of fatalities are Tsunami, Heat and Rip Current, all three strongly statistically different from zero. Tsunamis, for example, have a point estimate of 1.65, meaning that each event causes approximately 1.65 deaths on average.

Next we review the injuries estimation.

The results show that Typhoons, Hurricanes and Tsunamis, in that order, are the main events related to injuries. All of them have estimates statistically different from zero; for example, each Typhoon event corresponds to 13.38 injuries on average.

Finally, the results from the estimation of the property damage outcome are shown in the following lines.

The estimation shows that the three most devastating meteorological events in terms of property damage are Hurricanes, Typhoons and Storm Surge Tides, in that order, with estimates strongly different from zero after accounting for sources of variation. Hurricanes cost US$733 million each on average.

Final Remarks

To close the analysis and open the discussion, we think that more detail, for example separate estimations by state, could lead to useful insights, since the marginal effects of each type of catastrophic event may differ between states due to their intrinsic climatological conditions. Another aspect of the data worth further investigation is the count nature of injuries and fatalities, which may be far from the assumption of normality because counts are bounded below at zero and are not continuous, perhaps not even asymptotically. A Poisson distribution for \(Y \mid X\) might be more appropriate.
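One way such a count model could be written, as a sketch only, is a Poisson regression with a log link on the same dummy variables. The symbols \(D_{ij}\) (dummy \(i\) for observation \(j\)) and \(\lambda_j\) are introduced here for illustration and do not appear in the analysis above.

```latex
\[
Y_j \mid D_j \sim \mathrm{Poisson}(\lambda_j),
\qquad
\log \lambda_j = \sum_i \beta_i D_{ij},
\qquad
E(Y_j \mid D_j) = \exp\Big(\sum_i \beta_i D_{ij}\Big).
\]
```

Under this specification, an observation that matches only event \(i\) has expected count \(\exp(\beta_i)\), so the coefficients would be compared on the exponentiated scale rather than directly as means.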