Abstract
Recent interest in meteorological events and their economic and health consequences motivates this quantitative analysis, which ranks such events by importance and quantifies their economic and health effects on the general population. We find that Hurricanes are the most expensive events with respect to property damage, costing $733 million US dollars each on average; Typhoons are associated with the largest number of injuries, 13.38 per event on average; and Tsunamis take the most lives of all meteorological events, 1.65 persons per event on average.
Introduction
To alleviate or even prevent future economic damage and health problems caused in the general population by catastrophic events (e.g. Hurricanes and Typhoons), it is of the utmost importance to quantify the effects of every type of event, so as to assess not only each event's absolute impact but also its impact relative to the others. Are Hurricanes more fatal than Tsunamis? Are Typhoons more expensive than Floods? These types of questions motivate the present analysis. Concretely, we are interested in ranking and quantifying the effects of every type of catastrophic event in the National Weather Service database with respect to property damage (in US dollars), fatalities and injuries (both as counts). The document is divided into this introduction, which motivates the questions; a data processing section, which describes the data set and the code needed to make it usable for our purposes, and discusses some of the limitations of the data set; an analysis section, which describes the methodology applied; and a results section, which presents the conclusions of the analysis and mentions possible limitations and extensions.
Data Processing
In this section we present the code that prepares the data for analysis, together with some descriptive statistics that reveal the main problems and limitations of the data.
The processing code below shows the instructions needed to load and summarize the data.
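A minimal sketch of the loading step, under stated assumptions: the helper name `load_storm_data` and the toy file are hypothetical, and the real file name and location are not given in the report. `read.csv()` reads compressed files (for example `.bz2`) transparently, so the same call should work on a compressed download. The summary printed in the report comes from the full data set, not from this toy file.

```r
# Hypothetical helper: read the storm data from a CSV file (plain or compressed).
load_storm_data <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Toy stand-in for the real file so the sketch runs on its own.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP",
  "TORNADO,0,15,25,K",
  "HAIL,0,0,0,",
  "FLASH FLOOD,1,2,2.5,M"
), toy_path)

storm <- load_storm_data(toy_path)

# Column-by-column descriptives, as in the report's output.
summary(storm)
```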
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0.0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31.0 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75.0 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :100.6 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131.0 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873.0 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0.000 :547332
## TSTM WIND :219940 1st Qu.: 0.000 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0.000 W : 38446
## TORNADO : 60652 Mean : 1.484 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1.000 E : 33178
## FLOOD : 25326 Max. :3749.000 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0.0000 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0.0000 N : 28082
## Median :0 Median : 0.0000 S : 22510
## Mean :0 Mean : 0.9862 W : 20119
## 3rd Qu.:0 3rd Qu.: 0.0000 E : 20047
## Max. :0 Max. :925.0000 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH
## :499225 Min. : 0.0000 Min. : 0.000
## COUNTYWIDE : 19731 1st Qu.: 0.0000 1st Qu.: 0.000
## SOUTH PORTION : 833 Median : 0.0000 Median : 0.000
## NORTH PORTION : 780 Mean : 0.2301 Mean : 7.503
## CENTRAL PORTION: 617 3rd Qu.: 0.0000 3rd Qu.: 0.000
## SPRINGFIELD : 575 Max. :2315.0000 Max. :4400.000
## (Other) :380536
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
We are interested in the variables EVTYPE, INJURIES, FATALITIES and PROPDMG. The last one needs some processing to convert it to US dollars according to the variable PROPDMGEXP, which is supposed to have three levels of magnitude: K for thousands, M for millions and B for billions. We observe a small number of observations with other levels, which for simplicity we assume are already in US dollars. The next code describes the transformation of PROPDMG into a new variable in US dollars.
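The conversion described above could be sketched as follows. The function name `to_dollars`, the new column name `PROPDMG_USD` and the toy rows are assumptions made for illustration; the match on K, M and B is case-sensitive here, which is also an assumption about the original code.

```r
# Map the three documented PROPDMGEXP codes to multipliers. Any other code
# (blank, digits, lowercase letters, ...) is treated as plain US dollars,
# following the simplification stated in the text.
to_dollars <- function(dmg, exp_code) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multiplier[as.character(exp_code)])  # NA for unknown codes
  mult[is.na(mult)] <- 1
  dmg * mult
}

# Toy rows covering the three documented codes and two "other" codes.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "5")
)
toy$PROPDMG_USD <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toy
```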
Let's take a closer look at the EVTYPE variable.
##
## HIGH SURF ADVISORY COASTAL FLOOD
## 1 1
## FLASH FLOOD LIGHTNING
## 1 1
## TSTM WIND TSTM WIND (G45)
## 4 1
## WATERSPOUT WIND
## 1 1
## ? ABNORMAL WARMTH
## 1 4
## ABNORMALLY DRY ABNORMALLY WET
## 2 1
## ACCUMULATED SNOWFALL AGRICULTURAL FREEZE
## 4 6
## APACHE COUNTY ASTRONOMICAL HIGH TIDE
## 1 103
## ASTRONOMICAL LOW TIDE AVALANCE
## 174 1
## AVALANCHE BEACH EROSIN
## 386 1
## Beach Erosion BEACH EROSION
## 1 3
## BEACH EROSION/COASTAL FLOOD BEACH FLOOD
## 1 2
## BELOW NORMAL PRECIPITATION BITTER WIND CHILL
## 2 1
## BITTER WIND CHILL TEMPERATURES Black Ice
## 3 3
## BLACK ICE BLIZZARD
## 14 2719
## BLIZZARD AND EXTREME WIND CHIL BLIZZARD AND HEAVY SNOW
## 2 1
## Blizzard Summary BLIZZARD WEATHER
## 1 1
## BLIZZARD/FREEZING RAIN BLIZZARD/HEAVY SNOW
## 1 2
## BLIZZARD/HIGH WIND BLIZZARD/WINTER STORM
## 1 1
## BLOW-OUT TIDE BLOW-OUT TIDES
## 1 1
## BLOWING DUST blowing snow
## 4 2
## Blowing Snow BLOWING SNOW
## 3 12
## BLOWING SNOW- EXTREME WIND CHI BLOWING SNOW & EXTREME WIND CH
## 1 2
## BLOWING SNOW/EXTREME WIND CHIL BREAKUP FLOODING
## 1 1
## BRUSH FIRE BRUSH FIRES
## 3 1
As we can see from the first rows of the Event Type column, there is no homogeneity in the way each event is described (possibly because each weather observatory typed it in without following a strict definition of events, or because of human error). This column is one of the most important descriptors in the database and is crucial for our intended analysis. Ideally, the Event Type column would be a factor variable with each event type as a distinct level. To accomplish this, we first import the description key for the events, which was constructed from Table 2.1.1, “Storm Data Event Table”, on page 6 of the Storm Data Preparation document available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
Our strategy for building the desired vector of events with standardized descriptions is the following:
- Split the dual events in the key table; we merge them back together in the text-matching step (see below). Note that some observations will be true in more than one event variable, because an event type sometimes matches more than one event description from the table.
- Import the key as a character vector.
- Construct a loop that matches every element of the key vector against the EVTYPE column and builds a new dummy variable from each matching result.
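The matching loop in the steps above can be sketched like this. The key subset and the EVTYPE values are toy examples, and case-insensitive substring matching is an assumption about how the original matching was done.

```r
# Small, hypothetical subset of the standardized event descriptions.
key <- c("Avalanche", "Blizzard", "Coastal Flood", "Wind Chill")

# Toy EVTYPE values imitating the inconsistent spellings in the raw data.
evtype <- c("AVALANCHE", "Blizzard Summary", "BLIZZARD/HIGH WIND",
            "BEACH EROSION/COASTAL FLOOD", "BITTER WIND CHILL", "BRUSH FIRE")

# Start with one row per observation and no columns.
events <- data.frame(row.names = seq_along(evtype))

# One logical column per key entry: TRUE when the description occurs
# anywhere in EVTYPE (literal match after upper-casing both sides).
for (k in key) {
  events[[make.names(k)]] <- grepl(toupper(k), toupper(evtype), fixed = TRUE)
}
events
```

Literal matching with `fixed = TRUE` avoids surprises from characters such as parentheses in the key, which would otherwise be read as regular-expression syntax.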
Let's import the key and take a look at its first elements.
## Event.Description
## 1 Astronomical Low Tide
## 2 Avalanche
## 3 Blizzard
## 4 Coastal Flood
## 5 Cold
## 6 Debris Flow
Let’s create the events dataframe with logical elements.
Now let’s create the composite variables of the original key table (i.e. Cold/Wind Chill, Extreme Cold/Wind Chill, Frost/Freeze, Hurricane (Typhoon) and Storm Surge/Tide).
Note that the only event that intersects with others is “Wind Chill”; therefore it is the only one we need to delete. Finally, we convert the logical data frame to binary in order to treat each column as a dummy variable. (In this case, it turns out that this transformation is necessary in order to run the dummy regression with no intercept.)
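A rough sketch of these two steps on toy data. It assumes that a composite variable is the logical OR of its split halves; that rule, the column names and which columns are kept afterwards are not shown in the report, so here only `Wind.Chill` is removed, following the note above.

```r
# Toy logical data frame, as produced by the matching step.
events <- data.frame(
  Cold       = c(TRUE, FALSE, FALSE, TRUE),
  Wind.Chill = c(FALSE, TRUE, FALSE, TRUE),
  Blizzard   = c(FALSE, FALSE, TRUE, FALSE)
)

# Composite "Cold/Wind Chill": TRUE when either half matched (assumed rule).
events$Cold.Wind.Chill <- events$Cold | events$Wind.Chill

# Remove the half that overlaps with other composites.
events$Wind.Chill <- NULL

# Convert every logical column to a 0/1 dummy, keeping the data frame shape.
events[] <- lapply(events, as.integer)
events
```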
This last data frame is what we need in order to run our methodology.
Analysis
Our strategy for quantifying the effect of each event on mortality, injuries and property damage is straightforward. We fit a classical linear model to each of our three outcomes (injuries, fatalities and property damage) with all our dummy variables as covariates. We assume that the classical assumptions of normality and finite variance hold for the error term. Each regression is fitted without an intercept so that each coefficient represents the estimated population mean for its event, and so that each p-value can be read directly as the probability of observing a coefficient of that magnitude or larger under the null hypothesis that the mean equals 0 (\(H_0:\beta_i=0\) versus \(H_1:\beta_i\neq0\), where \(i\) indexes the dummy variables we created to represent the events).
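To illustrate why dropping the intercept makes each coefficient a mean, consider the model \(y_j=\sum_i \beta_i D_{ij}+\varepsilon_j\), where \(D_{ij}\) is 1 when observation \(j\) belongs to event \(i\) (the symbols \(y_j\), \(D_{ij}\) and \(\varepsilon_j\) are notation introduced here). The sketch below uses invented toy data with mutually exclusive dummies, in which case the least-squares coefficients coincide with the group means.

```r
# Toy data: three mutually exclusive event dummies and a fatalities outcome.
type <- rep(c("Flood", "Heat", "Tsunami"), each = 5)
toy <- data.frame(
  Flood      = as.integer(type == "Flood"),
  Heat       = as.integer(type == "Heat"),
  Tsunami    = as.integer(type == "Tsunami"),
  FATALITIES = c(0, 0, 1, 0, 0,  1, 0, 2, 1, 0,  3, 0, 2, 1, 4)
)

# "0 +" removes the intercept, so each coefficient is that event's mean outcome.
fit <- lm(FATALITIES ~ 0 + Flood + Heat + Tsunami, data = toy)
coef(fit)

# Group means computed directly, for comparison with the coefficients.
tapply(toy$FATALITIES, type, mean)

# Estimates, standard errors, t statistics and two-sided p-values.
summary(fit)$coefficients
```

When an observation can be flagged in more than one dummy, as the matching step allows, the coefficients are no longer exactly the group means.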
Once the linear model is fitted, we take the coefficients and plot them in descending order to see which are the largest in each regression, and we comment on their statistical significance and the magnitude of the damage.
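A possible way to rank and plot the coefficients, shown on an invented coefficient vector standing in for `coef(fit)`. The values and the axis label are illustrative only; the `NA` entry mimics a coefficient that could not be estimated.

```r
# Hypothetical coefficients, as would be returned by coef() on a fitted model.
coefs <- c(Flood = 0.2, Tsunami = 2.0, Heat = 0.8, Debris.Flow = NA)

# Drop coefficients that could not be estimated, then sort in descending order.
ranked <- sort(coefs[!is.na(coefs)], decreasing = TRUE)
ranked

# Horizontal bar chart; reversing puts the largest effect at the top.
barplot(rev(ranked), horiz = TRUE, las = 1,
        xlab = "Estimated mean fatalities per event")
```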
Results
In this section we present the main results and the plots generated. The first regression, corresponding to the fatalities outcome, and its main plot are presented in the following figure.
The coefficient for the variable Debris.Flow cannot be estimated (singularity) because the variable has no variability. It is important to note that some variables are not statistically significantly different from zero under a two-sided t-test at any standard significance level. We are, however, most interested in the largest ones in order of magnitude. The three main regressors, in order of magnitude, for explaining the number of fatalities are Tsunami, Heat and Rip Current, all three strongly statistically different from zero. Tsunamis, for example, have a point estimate of 1.65, meaning that each event causes approximately 1.65 deaths.
Next we review the injuries estimation.
The results show that Typhoons, Hurricanes and Tsunamis, in that order, are the main events related to injuries. All of them have estimates statistically different from zero; for example, each Typhoon event corresponds to 13.38 injuries on average.
Finally, the results from the estimation of the property damage outcome are shown in the following lines.
The estimation shows that the three most devastating meteorological events for property damage are Hurricanes, Typhoons and Storm Surge Tides, in that order, with estimates strongly different from zero after accounting for sources of variation. Hurricanes cost $733 million US dollars each on average.
Final Remarks
To end the analysis and open the discussion, we think that more detail in the analysis, for example separate estimations by state, could lead to useful insights, since the marginal effects of each type of catastrophic event may differ between states because of their intrinsic climatological conditions. Another aspect of the data worth investigating further is the count nature of injuries and fatalities, which may be far from the assumption of normality because counts are bounded at zero and not continuous, perhaps not even asymptotically. A Poisson model for the conditional distribution \(Y\mid X\) might be more appropriate.
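As a pointer for that extension, a Poisson regression without intercept could look like the sketch below. The data are invented and the model was not fitted in this analysis; the log link is the default for the Poisson family in R, so coefficients are exponentiated to return to the scale of expected counts.

```r
# Toy data: three mutually exclusive event dummies and a count outcome.
type <- rep(c("Flood", "Heat", "Tsunami"), each = 5)
toy <- data.frame(
  Flood      = as.integer(type == "Flood"),
  Heat       = as.integer(type == "Heat"),
  Tsunami    = as.integer(type == "Tsunami"),
  FATALITIES = c(0, 0, 1, 0, 0,  1, 0, 2, 1, 0,  3, 0, 2, 1, 4)
)

# Poisson GLM with log link and no intercept: one log-mean per event type.
pfit <- glm(FATALITIES ~ 0 + Flood + Heat + Tsunami,
            family = poisson(link = "log"), data = toy)

# Back-transform to expected fatalities per event.
exp(coef(pfit))
```

With mutually exclusive dummies the back-transformed coefficients reproduce the group means, so the point estimates would match the linear model; the difference lies in the standard errors and tests, which respect the count nature of the outcome.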