During lockdown, one way to keep occupied is with some data analysis. With so many places shut down, an interesting question is how much less electricity is being used.
The first step is to clear the workspace and load the required packages: rvest for web scraping, lubridate to manipulate dates and times, and of course tidyverse.
rm(list=ls())
library(tidyverse)
library(lubridate)
library(rvest)
Scraping Electricity Demand
Scraping the data involves downloading many zip files so we will create some folders (if they do not already exist) to store these. These folders and files will all be deleted at the end.
if(!dir.exists('data')){dir.create('data')}
if(!dir.exists('monthlyzips')){dir.create('monthlyzips')}
if(!dir.exists('dailyzips')){dir.create('dailyzips')}
The following code finds the links for all the archived zip files of half-hourly operational demand. There is one zip file for each month, and each of these contains a zip file for each day of data. The loop downloads the monthly zip files and unzips them.
#URL for archived operational demand
arch_url<-'http://nemweb.com.au/Reports/ARCHIVE/Operational_Demand/ACTUAL_DAILY/'
arch_url%>%
read_html%>%
html_nodes("a")%>% #Get hyperlinks
html_attr("href")%>% #Get link url
tail(-1) %>% #Remove first link
paste0('http://nemweb.com.au',.)->monthlyzips
#Extract monthly zip files
for (i in monthlyzips){
download.file(i,destfile = paste0('monthlyzips/',basename(i)))
unzip(paste0('monthlyzips/',basename(i)),exdir = 'dailyzips')
}
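The link extraction can be tried offline on a small HTML fragment before running it against the real site. This is only a sketch: it assumes a directory listing whose first link points to the parent directory, and the hrefs and file names (DEMO_201901.zip and so on) are invented for illustration rather than taken from NEMWeb.

```r
library(rvest)

# Hypothetical directory listing; the hrefs and file names are invented
toy_page <- read_html('
  <html><body>
    <a href="/Reports/ARCHIVE/">[To Parent Directory]</a>
    <a href="/Reports/ARCHIVE/Demo/DEMO_201901.zip">DEMO_201901.zip</a>
    <a href="/Reports/ARCHIVE/Demo/DEMO_201902.zip">DEMO_201902.zip</a>
  </body></html>')

toy_links <- toy_page %>%
  html_nodes("a") %>%      # all hyperlinks
  html_attr("href") %>%    # their site-relative URLs
  tail(-1)                 # drop the first (parent directory) link

# Prefix the host to get full URLs, as in the loop above
toy_urls <- paste0('http://nemweb.com.au', toy_links)
basename(toy_urls)         # file names used as download destinations
```

Because the hrefs are site-relative, prefixing the host is enough to get a downloadable URL, and basename gives the file name to save under.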
More recent data can be found at a different URL. The following code scrapes those links and downloads the files behind them.
#Current data
current_url<-'http://nemweb.com.au/Reports/CURRENT/Operational_Demand/ACTUAL_DAILY/'
current_url%>%
read_html%>%
html_nodes("a")%>% #Get hyperlinks
html_attr("href")%>% #Get link url
tail(-1) %>% #Remove first link
paste0('http://nemweb.com.au',.)->dailyzips
basenames<-basename(dailyzips)
for (i in dailyzips){
download.file(i,destfile = paste0('dailyzips/',basename(i),'.zip'))
}
With all the daily zip files downloaded, these can be unzipped into csv files.
#Unzip all files
dayzips<-dir('dailyzips')
for (i in dayzips){
unzip(paste0('dailyzips/',i),exdir = 'data')
}
The next block of code defines a function that processes each csv file. Only the data for Victoria are kept (although all states could be investigated), along with the variables we are interested in (Demand, Time and Date). This function is applied to every file with map_dfr from the purrr package, which builds a single data frame by binding together the rows of the individual results. Once that is done, the average daily electricity demand is obtained using group_by and summarise.
#Combine data
datafiles<-dir('data')
clean_energy_i<-function(i){
read_csv(file = paste0('data/',i),skip = 1,n_max = 240)%>%
filter(REGIONID=="VIC1")%>%
select(Time=INTERVAL_DATETIME,
Demand=OPERATIONAL_DEMAND_1)%>%
mutate(Time=ymd_hms(Time),
Date=as.Date(Time))->a
}
hhdata<-map_dfr(datafiles,clean_energy_i)
hhdata%>%group_by(Date)%>%
summarise(AvDemand=mean(Demand))->energydata
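To make the map_dfr step concrete, here is a minimal sketch on two small in-memory data frames standing in for two daily files. The numbers are made up, and clean_toy is a hypothetical stand-in for clean_energy_i that skips the file reading and the region filter.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Two made-up "daily files", each with two half-hourly readings
toy_files <- list(
  tibble(Time = c("2020/04/01 00:30:00", "2020/04/01 01:00:00"),
         Demand = c(4000, 4200)),
  tibble(Time = c("2020/04/02 00:30:00", "2020/04/02 01:00:00"),
         Demand = c(3900, 4100))
)

# Parse the timestamp and derive the calendar date
clean_toy <- function(df) {
  df %>% mutate(Time = ymd_hms(Time), Date = as.Date(Time))
}

# Apply the cleaner to each element and bind the results by row
toy_hh <- map_dfr(toy_files, clean_toy)

# One row per day, holding the mean of that day's readings
toy_daily <- toy_hh %>%
  group_by(Date) %>%
  summarise(AvDemand = mean(Demand))
toy_daily
```

The four half-hourly rows collapse to two daily rows, one average per date.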
We now have a data frame with average daily electricity demand for Victoria.
Scraping Temperature
Whenever looking at electricity demand, a very important thing to control for is temperature. Recent data can be easily scraped from the Bureau of Meteorology. The URLs for the files that need to be downloaded follow a fairly predictable structure. The term “IDCJDW3033” refers to the Melbourne Airport weather station. Most electricity demand in Victoria comes from Melbourne, and airports tend to have the most reliable weather measurements.
if(!dir.exists('temps')){dir.create('temps')}
months<-c(paste0('20190',4:9), #change these according to the date you download
paste0('2019',10:12),
paste0('20200',1:5))
for(i in months){
download.file(paste0('http://www.bom.gov.au/climate/dwo/',
i,
'/text/IDCJDW3033.',
i,'.csv'),
destfile = paste0('temps/',i,'.csv'))
}
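Rather than typing the month codes by hand, they can be generated from a start and end date. The sketch below reproduces the same fourteen codes and shows the URL each one maps to. The names bom_months and bom_urls are introduced here for illustration only, and the date range would need changing to suit when you run it.

```r
# Sequence of first-of-month dates, formatted as YYYYMM
bom_months <- format(
  seq(as.Date("2019-04-01"), as.Date("2020-05-01"), by = "month"),
  "%Y%m"
)

# Same URL pattern as in the download loop above
bom_urls <- paste0('http://www.bom.gov.au/climate/dwo/',
                   bom_months,
                   '/text/IDCJDW3033.',
                   bom_months, '.csv')

head(bom_urls, 2)
```

The month code appears twice in each URL, once as the folder and once in the file name.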
The data can be combined by writing a function that cleans one csv file and then using the map_dfr function to apply it to every file and create a single data frame.
clean_temp_i<-function(i){
read_csv(paste0('temps/',i),skip = 5)%>%
select(2,4)->a
colnames(a)<-c('Date','MaxTemp')
a<-mutate(a,Date=as.Date(Date))
return(a)
}
tempdata<-map_dfr(dir('temps/'),clean_temp_i)
Joining Data and Clean Up
Finally, the electricity demand data can be joined with the temperature data.
data<-full_join(energydata,tempdata)%>%
na.omit()%>%
mutate(wday=wday(Date,label = T)%>%as.character(),
lockdown=(Date>'2020-03-15'))
unlink('data',recursive = T)
unlink('monthlyzips',recursive = T)
unlink('dailyzips',recursive = T)
unlink('temps',recursive = T)
saveRDS(data,file = 'energy_temp_data.rds')
The code above also creates a day-of-week variable and a dummy for the lockdown, which in Australia began on 15 March 2020. The unlink function deletes all the folders containing zip and csv files. The data is then saved as an rds file.
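The join and the lockdown dummy can be checked on a few made-up rows. This is a sketch with invented values; toy_energy and toy_temp are hypothetical stand-ins for energydata and tempdata, and the join key is given explicitly as Date.

```r
library(dplyr)
library(lubridate)

# Invented values; the two tables only partly overlap in dates
toy_energy <- tibble(
  Date = as.Date(c("2020-03-14", "2020-03-15", "2020-03-16")),
  AvDemand = c(5000, 4800, 5100)
)
toy_temp <- tibble(
  Date = as.Date(c("2020-03-15", "2020-03-16", "2020-03-17")),
  MaxTemp = c(22, 25, 19)
)

toy_joined <- full_join(toy_energy, toy_temp, by = "Date") %>%
  na.omit() %>%                                  # keep dates present in both tables
  mutate(wday = as.character(wday(Date, label = TRUE)),
         lockdown = Date > as.Date("2020-03-15"))  # TRUE strictly after 15 March
toy_joined
```

Only the two dates present in both tables survive na.omit, and the dummy switches on from 16 March because the comparison is strict.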
Analysis with a Generalised Additive Model
To investigate the impact of the lockdown we can fit a Generalised Additive Model (GAM). This model allows some regression effects to be non-linear, which here will be the effect of temperature. Temperature exhibits a clear non-linear effect on electricity demand: on hot days more electricity is used for cooling, and on cold days more electricity is used for heating. The model also includes dummy variables for each day of the week and for the lockdown.
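To illustrate how a smooth term and a dummy variable work together, here is a sketch on simulated data using the mgcv package. Everything in it is an assumption made for illustration: the U-shaped temperature effect, the noise level and the lockdown effect of -150 are chosen by hand and are not estimates from the real data.

```r
library(mgcv)

set.seed(1)
n <- 400
sim <- data.frame(
  MaxTemp  = runif(n, 10, 40),
  lockdown = rep(c(FALSE, TRUE), each = n / 2)
)

# Simulated demand: U-shaped in temperature, shifted down by 150 in lockdown
sim$AvDemand <- 5000 + 2 * (sim$MaxTemp - 22)^2 -
  150 * sim$lockdown + rnorm(n, sd = 100)

# s() lets the temperature effect be a smooth curve; lockdown enters linearly
sim_fit <- gam(AvDemand ~ s(MaxTemp) + lockdown, data = sim)

# Estimated shift in demand when lockdown is TRUE
coef(sim_fit)["lockdownTRUE"]
```

The estimated coefficient should land close to the simulated -150, and plot(sim_fit) draws the fitted temperature curve.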
The GAM can be fit using the gam function from the mgcv package.
library(mgcv)
gamout<-gam(AvDemand~1+s(MaxTemp)+wday+lockdown,data = data)
The output of this model can be viewed with the summary function.
summary(gamout)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## AvDemand ~ 1 + s(MaxTemp) + wday + lockdown
##
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5218.005     40.453 128.990  < 2e-16 ***
## wdayMon      -125.280     56.379  -2.222  0.02685 *
## wdaySat      -484.283     56.825  -8.522 3.45e-16 ***
## wdaySun      -616.518     56.695 -10.874  < 2e-16 ***
## wdayThu        41.474     56.159   0.739  0.46065
## wdayTue         7.109     56.410   0.126  0.89977
## wdayWed        19.436     56.316   0.345  0.73018
## lockdownTRUE -147.519     44.895  -3.286  0.00111 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##              edf Ref.df     F p-value
## s(MaxTemp) 4.463  5.511 104.4  <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.697 Deviance explained = 70.6%
## GCV = 93019 Scale est. = 90143 n = 403
This indicates that, after controlling for temperature and day of the week, average demand is about 147.52 MW lower during the lockdown. To put that into perspective, the maximum capacity of the Dartmouth Dam (a large hydroelectric plant in the Snowy) is 150 MW. Naturally this analysis is crude, but it gives some indication of the reduction in electricity demand during the lockdown.
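A rough sense of the uncertainty around the lockdown estimate can be had from the estimate and standard error reported in the summary output. The sketch below assumes the usual normal approximation for a 95% interval. It is a back-of-the-envelope calculation, not something the original analysis reports.

```r
# Values copied from the lockdownTRUE row of the summary output
est <- -147.519
se  <- 44.895

# Approximate 95% interval: estimate plus or minus 1.96 standard errors
ci <- est + c(-1, 1) * qnorm(0.975) * se
round(ci, 1)
```

Under that approximation the interval runs from roughly -235.5 to -59.5 MW, so the data are consistent with anything from a modest to a fairly substantial drop in average demand.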