Free electricity? Did renewables really push prices to zero?

Last updated on Aug 12, 2019 20 min read

This post is the culmination of three different things.

I have finally got myself organised enough to have a personal website. The framework that I’m using is blogdown which despite the name is not just for blogs. However, since the facility is available I might start blogging on matters that are of professional interest to me. While I cannot promise to keep any unprofessional rants to myself, this website will not be the forum.
In this instance the matter of interest is the analysis of energy data. The particular instigator for this post is the following article on the occurence of zero wholesale prices in the Australian National Electricity Market.
Earlier today I was discussing dimension reduction for data visualisation which gave me a way of checking whether this incident was truly atypical.

So here is the analysis. Everything is done using R and the purpose here is to be completely transparent about how the analysis was done.

Most of this code relies on different parts of the tidyverse package, so the first step is to load it.

require(tidyverse)

The next step need to download the data from the Australian Energy Market Operator (AEMO). Price and demand data are available in csv files that have a very nicely structured URL. The beginning and end of the URL are always the same, the middle part changes depending on the month and state. The following chunk of code defines these things.

#The following is the first part of the url for each csv file.
head_url<-'https://aemo.com.au/aemo/data/nem/priceanddemand/PRICE_AND_DEMAND'

#The following is the last part of the url for each csv file.
tail_url<-'1.csv'

#This is the month we are interested in
month<-'201907'

The next step will be to create a function that reads data for each state and tidies it up a bit. The map family of functions can construct then be used to construct a big data frame containing data for all states.

download_aemo_data<-function(state){
  #This function will take a state name as input, download the data and tidy up the file
  file_url<-paste0(head_url,'_',month,'_',state,tail_url)
  file_url%>%
    read_csv%>% #read data
    mutate(TIME=as.POSIXct(SETTLEMENTDATE))%>% #Convert to date
    select(REGION, #Only keep variables we are interested in
           TIME,
           RRP)->df 
  return(df)  
}

#The states of the NEM
all_states<-c('NSW','VIC','QLD','SA','TAS')

#Now use map to apply this function to all states
all_state_data<-map_dfr(all_states,download_aemo_data)

all_state_data

## # A tibble: 7,440 x 3
##    REGION TIME                  RRP
##    <chr>  <dttm>              <dbl>
##  1 NSW1   2019-07-01 00:30:00  59.6
##  2 NSW1   2019-07-01 01:00:00  53.6
##  3 NSW1   2019-07-01 01:30:00  52.3
##  4 NSW1   2019-07-01 02:00:00  51.6
##  5 NSW1   2019-07-01 02:30:00  45.8
##  6 NSW1   2019-07-01 03:00:00  50.4
##  7 NSW1   2019-07-01 03:30:00  52.0
##  8 NSW1   2019-07-01 04:00:00  58.0
##  9 NSW1   2019-07-01 04:30:00  50.3
## 10 NSW1   2019-07-01 05:00:00  51.2
## # … with 7,430 more rows

This could also be done with a loop, but I find the map family of function in the purrr package to be a more elegant solution. Now its time to plot a time series of price for each state.

all_state_data%>%
  ggplot(aes(x=TIME,y=RRP))+
  geom_line(aes(group=REGION,color = RRP > 0))+
  facet_wrap(~REGION,nrow = 5)

What we can see here is that rather than zero prices, some periods are characterised by negative prices. To highlight this color has been added; negative (or zero) prices are red and positive prices are blue. Negative prices often occur when supply from renewables is high and demand is low. They imply that generators are willing to pay wholesalers to use the electricity they generate.

The following code can be used to filter out observations where prices were negative in all states. First we spread the dataset from long to wide

all_state_data_spread<-spread(all_state_data,REGION,RRP)

Then check if there are any observations for which all numeric variables are less than 0.

all_state_data_spread%>%
  filter_if(vars(is.numeric(.)),all_vars(. < 0))

## # A tibble: 0 x 6
## # … with 6 variables: TIME <dttm>, NSW1 <dbl>, QLD1 <dbl>, SA1 <dbl>,
## #   TAS1 <dbl>, VIC1 <dbl>

As you can see it returns nothing which indicates that there was no period where half-hourly spot prices all dropped to zero or lower simultaneously across all states. Have I uncovered some sort of greenie conspiracy involving the Guardian?…

… Well, no. The spot market data used here is at a half hour resolution and is in fact an average of the 5-minute spot prices. Perhaps prices fell to zero for one 5-minute interval within the half hour and this information was lost by taking the average. Perhaps restricting attention to periods where spot prices dropped below $10 will reveal something.

all_state_data_spread%>%
  filter_if(vars(is.numeric(.)),all_vars(. < 10))

## # A tibble: 3 x 6
##   TIME                 NSW1  QLD1    SA1   TAS1  VIC1
##   <dttm>              <dbl> <dbl>  <dbl>  <dbl> <dbl>
## 1 2019-07-20 13:00:00  7.46  6.67 -22.6  -46.7  -9.94
## 2 2019-07-21 12:00:00  2.98  2.8    2.32   2.31  2.52
## 3 2019-07-21 13:30:00  7.05  6.44   5.39   5.58  6.08

That first row looks promising, although prices for QLD and NSW are positive they are quite low. Indeed at 13:25 on July 20, 2019 spot prices in all five markets dropped to or below zero. This can be seen by downloading this zip file and navigating yourself to the right place. Perhaps using the 5-minute data is something I will do at length in a future post.

Finally, as promised, an alternative way to visualise these data. With five states, the data are five-dimensional. Unfortunately we cannot do a five-dimensional scatterplot. We can however plot the first two principal components of the data. When it comes to dimension reduction principal components really are an oldie but a goodie. The following code carries out PCA

all_state_data_spread%>%
  select_if(is.numeric)%>%
  prcomp(scale. = T)->pcaout
summary(pcaout)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5
## Standard deviation     1.7481 1.1182 0.58318 0.52551 0.27801
## Proportion of Variance 0.6112 0.2501 0.06802 0.05523 0.01546
## Cumulative Proportion  0.6112 0.8613 0.92931 0.98454 1.00000

About 80% of the data is explained by just the first two principal components. These can be plotted so that each dot refers to a point in time and in this way anomalous time periods can be seen. Two more packages will be used here. The first is broom which is used so that the output from the prcomp function can be coereced into a data frame. The second is plotly which is used to create an interactive graphic.

require(broom)
require(plotly)
all_state_data_spread%>%
  mutate(AveragePrice=(NSW1+VIC1+QLD1+SA1+TAS1)/5)%>%
  augment(pcaout,.)%>% #Add PCs back to data frame 
  ggplot(aes(x=.fittedPC1,
             y=.fittedPC2,
             col=AveragePrice,
             label=TIME))+
    geom_point(size=0.8)+
    scale_color_gradient2()+
    theme_dark()->g
ggplotly(g,tooltip = 'label')

To add a bit of colour, the average price is also portrayed with high average prices shown in darker shades of blue and low average prices shown in red. Prices close to zero are in lighter shades.

The plot above shows that higher prices are generally associated with the right hand side of the plot and lower prices with left hand side. Looking at the weights used to form the principal components shows that the bottom of the plot is associated with low (negative) prices in South Australia, Tasmania and Victoria while the top left hand side of the plot is associated with low (negative) prices In Queensland and NSW. Hovering over the points shows that the half hour period where all prices momentarily hit zero does lie towards the outside of the distribution. However it is by no means atypical, some other observations are in a similar region.

Anastasios N. Panagiotelis

Associate Professor of Business Analytics

My research interests include applied statistics and data science.