Oisín Bates

Analyzing and visualizing weather data with R


Having moved from the West of Ireland to the Pacific Northwest, I was interested to observe that, while people in both regions regularly complain about the rain, the rainfall patterns appeared quite different. That observation led me to wonder how similar, or different, the patterns in the two regions really are. This post describes the process of accessing historical weather data, processing the data frames with R, and visualizing the findings with R’s ggplot2 package. In doing so I hope to reach a conclusion as to which ‘West Coast’ has more cause to complain about rainfall. The weather stations referenced are located at Shannon and Vancouver international airports.

Sourcing and Accessing Data

Initial Google searches will likely lead to the R package ‘weatherData’. Despite its prominence in Google search results, this package no longer works because its API source, Weather Underground, moved to a paid model. A follow-up search may turn up Weather Underground’s own R package, but their free tier requires users to maintain a personal weather station in order to receive an API key.

I found the most appropriate solution was to pull data directly from the relevant national meteorological services. Many meteorological services have begun publishing their own R packages in recent years, and Canadian data is accessible through the ‘weathercan’ package. The Irish meteorological service, Met Éireann, does not maintain an R package but provides CSV data for download from met.ie.

Determining Metrics

While Irish records track all forms of precipitation under a single count, Canadian records differentiate between ‘total rain’ and ‘total precipitation’. The latter includes all forms of precipitation, while the former includes ‘all liquid precipitation’ but excludes snowfall. As snow is far more common in Vancouver than in Shannon and the subject of this study is rainfall, it’s tempting to compare Shannon’s total precipitation to Vancouver’s total liquid precipitation. Ultimately, though, it makes the most sense to stay consistent and compare identical metrics, so total precipitation is used for both stations.

Importing Data

For the CSV file from met.ie, it’s necessary to read from line 25 onwards, as the lines before this contain the dataset’s glossary. For the Canadian data, it’s necessary to concatenate two datasets, as Vancouver International Airport’s weather station changed in June 2013.

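# packages used throughout this post
library(weathercan)  # weather_dl()
library(dplyr)       # %>%, group_by(), mutate(), summarise(), top_n()
library(data.table)  # month(), year(), setnames(), rleid()
library(tidyr)       # gather()
library(ggplot2)     # plotting

# get Shannon data
# CSV file sourced for 'SHANNON AIRPORT' weather station at https://www.met.ie/ga/climate/available-data/historical-data
# skip = 24 drops the glossary so the header row is read from line 25
shannon_df <-
  read.csv("dly518.csv", skip = 24)

# get Vancouver data
# the airport's station changed in June 2013, so two downloads are combined
pre_2013_van_df <-
  weather_dl(
    station_ids = 889,
    interval = "day",
    string_as = NULL,
    start = "2010-01-01",
    end = "2013-06-12"
  )
post_2013_van_df <-
  weather_dl(
    station_ids = 51442,
    interval = "day",
    string_as = NULL,
    start = "2013-06-13",
    end = "2019-12-31"
  )
vancouver_df <- rbind(pre_2013_van_df, post_2013_van_df)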
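To make the effect of the skip argument concrete, here is a minimal sketch that builds a stand-in file and reads it back. The file contents, the `glossary line` text, and the `demo_df` name are made up for illustration and are not the real Met Éireann layout beyond the 24-line preamble described above.

```r
# Hypothetical stand-in for the downloaded file: 24 preamble lines,
# then a header row on line 25, then data rows (values are invented).
csv_path <- tempfile(fileext = ".csv")
preamble <- paste("glossary line", 1:24)
body <- c("date,rain", "01-Jan-2010,0.0", "02-Jan-2010,3.2")
writeLines(c(preamble, body), csv_path)

# skip = 24 discards the preamble, so the header is taken from line 25
demo_df <- read.csv(csv_path, skip = 24)
str(demo_df)
```

If the skip count is off by one in either direction, the column names come out wrong, so checking names() on the result is a quick way to confirm the offset.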

Standardizing the Data Frames

In order to efficiently work with these data frames, it’s necessary to format their date columns, restrict their date ranges, and standardize their column names.

# format dates
shannon_df$date <- as.Date(shannon_df$date, format = "%d-%b-%Y")
vancouver_df$date <- as.Date(vancouver_df$date)

# filter Shannon dataframe to 2010-2019, matching the Vancouver download
shannon_df <-
  shannon_df[shannon_df$date >= as.Date("2010-01-01") &
               shannon_df$date <= as.Date("2019-12-31"), ]

# add consistent month and year columns
shannon_df <- shannon_df %>%
  mutate(month = month(date), year = year(date))
vancouver_df <- vancouver_df %>%
  mutate(month = month(date), year = year(date))

# standardize column names for convenience:
# Shannon's "rain" column becomes "total_precip" to match Vancouver
setnames(shannon_df, "rain", "total_precip")
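The date handling can be tried on its own with a few sample strings. This is a sketch under assumptions: the sample dates are invented, and the locale call is my addition, there because `%b` matches month abbreviations in the current locale and English names may fail to parse elsewhere.

```r
# Assumption: force English month abbreviations for parsing
invisible(Sys.setlocale("LC_TIME", "C"))

# Invented sample strings in day-month-year form
sample_dates <- c("05-Mar-2009", "01-Jan-2010", "15-Feb-2015", "31-Dec-2019")
parsed <- as.Date(sample_dates, format = "%d-%b-%Y")

# The same inclusive range filter as applied to the Shannon data
in_range <- parsed >= as.Date("2010-01-01") & parsed <= as.Date("2019-12-31")
parsed[in_range]
```

A date that fails to parse becomes NA rather than raising an error, so a check with anyNA() after conversion is worthwhile.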

Calculating the Highest Count of Consecutive Dry Days for each Year

My approach was to group each period of consecutive dry days with a unique id for each, before using these ids to get the highest count for each year.

get_consec_dry_day_count <- function(df, region_name) {
  processed_df <- df %>%
    # give each run of consecutive dry (or wet) days its own id
    group_by(consec_dry_id = rleid(total_precip == 0)) %>%
    # within a dry run, number the days 1, 2, 3, ...; wet days get 0
    mutate(consec_dry_days = if_else(total_precip == 0, row_number(), 0L)) %>%
    group_by(consec_dry_id) %>%
    # keep the last day of each run, which holds the run's length
    top_n(1, consec_dry_days) %>%
    group_by(year) %>%
    # keep the longest run for each year
    top_n(1, consec_dry_days) %>%
    mutate(location = region_name) %>%
    select(date, year, consec_dry_days, location)

  return(processed_df)
}

shannon_consec_dry_day_df <-
  get_consec_dry_day_count(shannon_df, "Shannon")
vancouver_consec_dry_day_df <-
  get_consec_dry_day_count(vancouver_df, "Vancouver")
consec_dry_day_df <-
  rbind(shannon_consec_dry_day_df, vancouver_consec_dry_day_df)
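The run-grouping idea is easier to see on a toy vector. The sketch below uses base R's rle() rather than the rleid() call in my function, and the precipitation values are invented, so treat it as an illustration of the technique rather than the code behind the results.

```r
# Invented daily precipitation values in mm
precip <- c(0, 0, 1.2, 0, 0, 0, 3.4, 0)

# Run-length encode the dry/wet flag: each run gets a length and a value
runs <- rle(precip == 0)

# One id per run, the same grouping that an rleid-style helper produces
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# The longest dry spell is the longest run whose value is TRUE
longest_dry_run <- max(runs$lengths[runs$values])
```

Here the dry runs have lengths 2, 3 and 1, so the longest dry spell is 3 days. Missing values are not handled in this sketch and would need filtering or imputing first.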

The data showed that Vancouver consistently had the longer run of consecutive dry days.

[Figure: Most consecutive dry days by year]

The data showed that Vancouver’s driest consecutive periods were typically in the months of July-September while Shannon was far less predictable. These dates could be included as tooltip values if creating interactive plots.

[Figure: Table of consecutive dry days]

Get Monthly Insights

For monthly insights, I was interested in total precipitation volume, the number of days falling into each precipitation range, and the number of dry days (days with 0 mm precipitation). I also checked for NA values, of which there were 28 for Vancouver and 0 for Shannon.

get_monthly_insights <- function(df, region_name) {
  processed_df <- df %>%
    group_by(month) %>%
    summarise(
      na_count = sum(is.na(total_precip)),
      sum_precip = sum(total_precip, na.rm = TRUE),
      dry_days = sum(total_precip == 0, na.rm = TRUE),
      # day counts per range; lower bounds are inclusive, upper bounds exclusive
      under_five_mm = sum(total_precip > 0 &
                            total_precip < 5, na.rm = TRUE),
      five_to_ten_mm = sum(total_precip >= 5 &
                             total_precip < 10, na.rm = TRUE),
      ten_to_fifteen_mm = sum(total_precip >= 10 &
                                total_precip < 15, na.rm = TRUE),
      fifteen_to_twenty_mm = sum(total_precip >= 15 &
                                   total_precip < 20, na.rm = TRUE),
      twenty_plus_mm = sum(total_precip >= 20, na.rm = TRUE)
    ) %>%
    mutate(location = region_name)

  return(processed_df)
}

shannon_monthly_insights_df <-
  get_monthly_insights(shannon_df, "Shannon")
vancouver_monthly_insights_df <-
  get_monthly_insights(vancouver_df, "Vancouver")
monthly_insights_df <-
  rbind(shannon_monthly_insights_df, vancouver_monthly_insights_df)
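Range boundaries are easy to get wrong by one comparison operator, so it can help to check where edge values land. The sketch below uses base R's cut() as an alternative way to bin wet days. It is not the code behind the plots, and the sample values are invented to sit on the boundaries.

```r
# Invented values chosen to hit each boundary (mm)
precip <- c(0, 0.2, 4.9, 5, 9.9, 10, 14.9, 15, 20, 31.5)

# Dry days are counted separately, so drop them before binning
wet <- precip[precip > 0]

# right = FALSE makes each bin include its lower bound and exclude its upper
bins <- cut(
  wet,
  breaks = c(0, 5, 10, 15, 20, Inf),
  right = FALSE,
  labels = c("under_five_mm", "five_to_ten_mm", "ten_to_fifteen_mm",
             "fifteen_to_twenty_mm", "twenty_plus_mm")
)
counts <- table(bins)
```

Every wet day should fall into exactly one bin, so comparing sum(counts) with length(wet) is a quick check that no boundary value has been dropped.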

[Figure: Tibble of mm ranges]

The resulting plots show greater precipitation volumes in Vancouver for the months of October to April, and higher counts of dry days in Vancouver for all but the month of March.

[Figure: Total monthly precipitation]

[Figure: Total dry days by month]

Calculate for mm Ranges

For plotting the precipitation range columns, it was necessary to convert the dataframe to long format in order to plot the data by individual facets.

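# gather the five range columns into key/value pairs;
# factor_key = TRUE keeps the ranges in their original column order
long_format_monthly_insights_df <-
  gather(
    monthly_insights_df,
    precip_mm_range,
    precip_day_count,
    under_five_mm:twenty_plus_mm,
    factor_key = TRUE
  )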
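For anyone unfamiliar with long format, the reshaping can be sketched with base R alone. This is a hand-rolled illustration on a two-row, two-range toy data frame with invented counts, not a replacement for the gather() call.

```r
# Toy wide data frame: one row per month, one column per range (invented counts)
wide <- data.frame(
  month = c(1, 2),
  location = "Shannon",
  under_five_mm = c(12, 10),
  five_to_ten_mm = c(4, 3)
)
range_cols <- c("under_five_mm", "five_to_ten_mm")

# Long format: one row per month/range pair, with the range name as a factor
# whose levels keep the original column order
long <- data.frame(
  month = rep(wide$month, times = length(range_cols)),
  location = rep(wide$location, times = length(range_cols)),
  precip_mm_range = factor(rep(range_cols, each = nrow(wide)),
                           levels = range_cols),
  precip_day_count = unlist(wide[range_cols], use.names = FALSE)
)
```

Two months by two ranges gives four rows, each carrying a single count, which is the shape ggplot2 needs in order to map the range to a colour.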

[Figure: Long format tibble]

This plot required more customization than the others and included a custom legend.

ggplot(data = long_format_monthly_insights_df) +
  geom_point(mapping = aes(x = month, y = precip_day_count, color = precip_mm_range)) +
  # one panel per location
  facet_wrap(~ location) +
  labs(
    title = "Precipitation by mm Range, 2010-2019",
    subtitle = "Days with 0mm are excluded",
    y = "Day Count",
    x = "Month"
  ) +
  theme_bw() +
  # one tick per month
  scale_x_continuous(breaks = 1:12) +
  # custom legend: readable labels in place of the column names
  scale_colour_discrete(
    name = "MM Ranges",
    breaks = c(
      "under_five_mm",
      "five_to_ten_mm",
      "ten_to_fifteen_mm",
      "fifteen_to_twenty_mm",
      "twenty_plus_mm"
    ),
    labels = c("< 5", ">= 5 & < 10", ">= 10 & < 15", ">= 15 & < 20", ">= 20")
  )

[Figure: Precipitation by mm range]

Conclusion

The findings show that Vancouver gets higher volumes of precipitation in shorter, isolated periods, while maintaining clear seasons with far less rain in summer. In contrast, Shannon typically has far fewer consecutive dry days and far more days with between 0 and 5 mm of rainfall. Ultimately, with something as subjective as complaining about the weather, people are always going to find an angle to lament their own experiences. In that spirit, I’m going to conclude that Ireland has more cause to complain, on the grounds that its weather requires its people to carry umbrellas on a higher number of days per year.

This project’s source code is viewable on GitHub.