Where’s the breaking point in solar ROI?
A simple method to find the elbow of PV economics (with Python)
Finding the optimal size of a rooftop PV system can be tricky. There’s no single way to determine the right size, and multiple factors come into play when making this decision. In this Python tutorial, we’ll calculate some of the most common metrics used for this purpose, then explore one methodology to choose a size.
This tutorial is part of a series on simulating and analyzing rooftop solar energy production using Python. In previous tutorials, we simulated energy production for a roof using pvlib, analyzed electricity flows for a real building from the Building Data Genome Project, and compared physics-based methods for solar estimation to Google’s Solar API.
For brevity, we won’t recalculate the solar production of a single module in this tutorial. You can find those calculations in earlier tutorials, so for the code that follows, we’ll assume the variable module_energy (the hourly energy production of a single PV module) is already available in the workspace.
Let’s start by importing the libraries we need for the analysis.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
Then let’s import the consumption data from the building we’re analyzing. This building is part of the Building Data Genome Project 2. For more details about the dataset and the data extraction process, you can refer to this tutorial.
# read electricity consumption file
meters_df = pd.read_csv('data/electricity_cleaned.csv')
# set the timestamp column as index of the dataframe
meters_df.set_index('timestamp', inplace=True)
meters_df.index = pd.to_datetime(meters_df.index)
# keep only the data of the selected building (this returns a pandas Series)
building_df = meters_df['Rat_education_Alfonso']
building_df = building_df.rename('consumption')
# select one year of consumption data
annual_site_consumption = building_df['2016-01-01':'2016-12-31']
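The loops later in this tutorial subtract the PV production from `annual_site_consumption` hour by hour, and pandas aligns the two Series on their timestamps. If `module_energy` doesn’t share the same hourly index (for example, it covers a different year or its timestamps are shifted), the subtraction silently produces NaN values, which `.sum()` then skips. A quick sanity check could look like the sketch below; `check_hourly_alignment` is a hypothetical helper, and it assumes both Series use an hourly DatetimeIndex.

```python
import pandas as pd


def check_hourly_alignment(consumption: pd.Series, production: pd.Series) -> list[str]:
    """Return a list of problems found when aligning two hourly series (hypothetical helper)."""
    problems = []
    # hours missing from the consumption series between its first and last timestamp
    full_range = pd.date_range(consumption.index.min(), consumption.index.max(), freq="h")
    missing = full_range.difference(consumption.index)
    if len(missing) > 0:
        problems.append(f"{len(missing)} hourly timestamps missing from consumption")
    # timestamps present in only one series become NaN after subtraction
    only_in_consumption = consumption.index.difference(production.index)
    only_in_production = production.index.difference(consumption.index)
    if len(only_in_consumption) > 0:
        problems.append(f"{len(only_in_consumption)} consumption timestamps have no production value")
    if len(only_in_production) > 0:
        problems.append(f"{len(only_in_production)} production timestamps have no consumption value")
    # NaN values are skipped by .sum() and would bias the ratios
    if consumption.isna().any() or production.isna().any():
        problems.append("NaN values present")
    return problems


# usage in this tutorial (an empty list means the series line up):
# print(check_hourly_alignment(annual_site_consumption, module_energy))
```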
Now that the consumption data is available, we’ll calculate the hourly solar production of a rooftop PV system for multiple system peak capacities. Then, for each system size, we’ll calculate the self-consumption and self-sufficiency ratios.
The maximum system capacity is limited by the rooftop area, which we previously found to be about 22,500 m². Based on the panel’s characteristics (also described previously), this area corresponds to 12,680 panels, or a peak capacity of 5,072 kWp.
# define parameters of the system
module_rated_power = 0.4 # kWp
max_capacity = 5072 # kWp
min_capacity = 0.1 * max_capacity
electricity_price = 0.10 # $/kWh
injection_price = 0.02 # $/kWh
# we want to test 200 different configurations in between the min and max size
num_systems = 200
system_sizes = np.linspace(min_capacity, max_capacity, num_systems)
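The parameters above are tied together by simple arithmetic. The plain-Python sketch below recomputes the panel count from the peak capacity and the 0.4 kWp module rating, the implied roof area per panel (using the approximate 22,500 m² figure), and the spacing between the 200 sizes generated by `np.linspace`. The variable names `roof_area`, `max_panel_count`, `area_per_panel`, and `step` are illustrative.

```python
# values taken from the parameters above
module_rated_power = 0.4  # kWp per panel
max_capacity = 5072  # kWp
roof_area = 22500  # m², approximate usable rooftop area
min_capacity = 0.1 * max_capacity  # kWp
num_systems = 200

# number of panels installed at maximum capacity
max_panel_count = round(max_capacity / module_rated_power)

# implied roof area per panel
area_per_panel = roof_area / max_panel_count

# np.linspace includes both endpoints, so 200 sizes define 199 equal steps
step = (max_capacity - min_capacity) / (num_systems - 1)

print(f"Panels at max capacity: {max_panel_count}")
print(f"Roof area per panel: {area_per_panel:.2f} m²")
print(f"Smallest system: {min_capacity:.1f} kWp, step between sizes: {step:.2f} kWp")
```

Consecutive configurations therefore differ by roughly 23 kWp (about 57 panels). The loops use the fractional panel count `system_size / module_rated_power` directly, which keeps the sizing curve smooth; a real layout would round to whole panels.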
Let’s now iterate over 200 different configurations and calculate self-sufficiency and self-consumption ratios for each. We define the self-consumption ratio (SCR) as the ratio between the total annual self-consumed energy (Esc) and the total annual energy produced (Ep) by the PV system. This indicates how much of the PV production is used onsite versus how much is injected into the grid.
We define the self-sufficiency ratio (SSR) as the ratio between total annual self-consumed energy (Esc) and total annual building consumption (Ec). It represents how much of the building’s load is covered by solar.
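In each hour t, the self-consumed energy is the smaller of the building’s consumption and the PV production, so Esc = Σ min(Ec,t, Ep,t) over all hours, SCR = Esc / Ep, and SSR = Esc / Ec. The loops in this tutorial compute the same quantity as consumption minus the clipped grid imports, which is equivalent. The toy four-hour example below uses made-up numbers (not the building’s data); `self_consumption_metrics` is a hypothetical helper name.

```python
def self_consumption_metrics(consumption, production):
    """Return (SCR, SSR) for two equal-length hourly energy profiles in kWh."""
    # energy used on site in each hour: the PV output, capped by the load
    self_consumed = [min(c, p) for c, p in zip(consumption, production)]
    e_sc = sum(self_consumed)
    scr = e_sc / sum(production)  # share of PV production used on site
    ssr = e_sc / sum(consumption)  # share of the load covered by PV
    return scr, ssr


# made-up hourly profiles (kWh): night, morning, noon, evening
consumption = [10, 20, 30, 20]
production = [0, 15, 45, 10]

scr, ssr = self_consumption_metrics(consumption, production)
print(f"SCR = {scr:.2f}, SSR = {ssr:.2f}")  # SCR = 0.79, SSR = 0.69
```

At noon, 45 kWh are produced but only 30 kWh are consumed, so 15 kWh are injected into the grid; that surplus is what pulls the SCR below 100%.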
# calculate scr and ssr for each of the 200 system sizes
results = []
for system_size in system_sizes:
    panel_count = system_size / module_rated_power
    # scale the single-module production to the whole system (the /1000 is a unit conversion, e.g. Wh to kWh)
    pv_production = module_energy * panel_count / 1000
    # calculate the grid consumption (electricity imported from the grid)
    grid_consumption = (annual_site_consumption - pv_production).clip(lower=0)
    # calculate self consumption (electricity that is consumed on site)
    self_consumption = annual_site_consumption - grid_consumption
    # calculate grid injection (electricity injected back into the grid)
    grid_injection = pv_production - self_consumption
    # calculate the self-consumption ratio
    self_consumption_ratio = self_consumption.sum() / pv_production.sum()
    # calculate the self-sufficiency ratio
    self_sufficiency_ratio = self_consumption.sum() / annual_site_consumption.sum()
    # store the results of this configuration
    results.append({'system_size': system_size,
                    'self_consumption_ratio': self_consumption_ratio,
                    'self_sufficiency_ratio': self_sufficiency_ratio})
# build the results dataframe once, after the loop
results_df = pd.DataFrame(results)
# plot the results
fig = go.Figure()
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_consumption_ratio'], mode='lines', name='Self Consumption Ratio', line=dict(color='#ffc107')))
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_sufficiency_ratio'], mode='lines', name='Self Sufficiency Ratio', line=dict(color='#ff5733')))
fig.show()
Output: self-consumption and self-sufficiency ratios versus system size (kWp)
Let’s analyze these results.
For small PV sizes (left side of the x-axis), nearly all the generated solar energy is used onsite, so self-consumption is close to 100%.
As PV capacity increases, the building can’t consume all the extra electricity the system generates, so the self-consumption ratio drops. More energy is exported to the grid.
The self-sufficiency ratio (the fraction of the building’s load covered by PV) is relatively low for smaller systems, because there’s not enough solar capacity to meet the demand.
As system size increases, PV production covers a larger portion of the total energy needs. However, the curve eventually plateaus: beyond about 2,000 kWp, each additional kWp adds only a small increase in coverage.
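One way to quantify the plateau is the marginal gain: how much SSR each additional kWp buys. With the sweep results, this could be computed as `np.gradient(results_df['self_sufficiency_ratio'], results_df['system_size'])`. The self-contained sketch below applies the same idea to a made-up saturating curve that merely stands in for the real SSR curve, to show how the derivative shrinks as the system grows.

```python
import numpy as np

# system sizes in kWp, similar in range to the sweep above
sizes = np.linspace(500, 5000, 10)

# made-up saturating curve standing in for the SSR (NOT the building's data)
ssr = 0.6 * (1 - np.exp(-sizes / 1500))

# marginal gain: change in SSR per additional kWp (np.gradient accepts the x coordinates)
marginal_gain = np.gradient(ssr, sizes)

for size, gain in zip(sizes, marginal_gain):
    # convert to percentage points of SSR per 100 extra kWp for readability
    print(f"{size:6.0f} kWp: +{gain * 100 * 100:.2f} pp of SSR per extra 100 kWp")
```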
These metrics can help us understand the system’s dynamics: as we increase capacity, we cover more of the building’s load but self-consume less of the produced electricity. Note that this graph will be different for each individual building analyzed, as it represents how well production hours align with consumption hours for that building.
Although it’s informative, this plot doesn’t give a clear idea of how to determine the optimal system size. Let’s include an economic factor. Payback time is often the most important criterion when installing a PV system, and it depends on the annual return the system generates. We’ll calculate this annual return by assuming an electricity cost of $0.10/kWh and a compensation of $0.02/kWh for electricity injected into the grid.
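With these two prices, the annual return is the self-consumed energy valued at the retail price (the purchases the building avoids) plus the injected energy valued at the injection tariff. A minimal sketch with round, made-up yearly totals; `annual_return` is a hypothetical helper name:

```python
def annual_return(self_consumed_kwh, injected_kwh,
                  electricity_price=0.10, injection_price=0.02):
    """Annual economic return in $: avoided purchases plus injection revenue."""
    # each self-consumed kWh saves the retail electricity price
    savings = self_consumed_kwh * electricity_price
    # each injected kWh is compensated at the (lower) injection price
    revenue = injected_kwh * injection_price
    return savings + revenue


# made-up example: 1,000,000 kWh/year produced, 700,000 kWh of it self-consumed
self_consumed = 700_000
injected = 300_000
print(f"Annual return: ${annual_return(self_consumed, injected):,.0f}")
```

Under these prices a self-consumed kWh is worth five times as much as an injected one, which is why the self-consumption ratio drives the economics.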
# define prices
electricity_price = 0.10 # $/kWh
injection_price = 0.02 # $/kWh
results = []
for system_size in system_sizes:
    panel_count = system_size / module_rated_power
    # scale the single-module production to the whole system (the /1000 is a unit conversion, e.g. Wh to kWh)
    pv_production = module_energy * panel_count / 1000
    # calculate the grid consumption (electricity imported from the grid)
    grid_consumption = (annual_site_consumption - pv_production).clip(lower=0)
    # calculate self consumption (electricity that is consumed on site)
    self_consumption = annual_site_consumption - grid_consumption
    # calculate grid injection (electricity injected back into the grid)
    grid_injection = pv_production - self_consumption
    # calculate the self-consumption ratio
    self_consumption_ratio = self_consumption.sum() / pv_production.sum()
    # calculate the self-sufficiency ratio
    self_sufficiency_ratio = self_consumption.sum() / annual_site_consumption.sum()
    # calculate the return from self consumption (avoided grid purchases)
    self_consumption_return = self_consumption.sum() * electricity_price
    # calculate the return from injection
    injection_return = grid_injection.sum() * injection_price
    # calculate the total annual return
    total_return = self_consumption_return + injection_return
    # store the results of this configuration
    results.append({'system_size': system_size,
                    'self_consumption_ratio': self_consumption_ratio,
                    'self_sufficiency_ratio': self_sufficiency_ratio,
                    'total_return': total_return})
# build the results dataframe once, after the loop
results_df = pd.DataFrame(results)
# plot the results
fig = go.Figure()
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_consumption_ratio'], mode='lines', name='Self Consumption Ratio', line=dict(color='#ffc107')))
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_sufficiency_ratio'], mode='lines', name='Self Sufficiency Ratio', line=dict(color='#ff5733')))
# plot the total return on a different y axis
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['total_return'], mode='lines', name='Total Return', yaxis='y2', line=dict(color='#007BFF')))
fig.update_layout(yaxis2=dict(overlaying='y', side='right'))
fig.show()
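Output: self-consumption ratio, self-sufficiency ratio, and total annual return (right axis) versus system size (kWp)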
The growth of total annual economic return also slows down as the system capacity increases. This happens because as more capacity is added, the SCR decreases, and a larger share of electricity is sold back to the grid at $0.02/kWh instead of being self-consumed at $0.10/kWh.
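This effect can be written down explicitly. Because the self-consumed energy is SCR × Ep and the injected energy is (1 - SCR) × Ep, the average value of each produced kWh is SCR × 0.10 + (1 - SCR) × 0.02 $/kWh, i.e. 0.02 + 0.08 × SCR. The sketch below (with a hypothetical helper name) evaluates this for a few SCR values.

```python
electricity_price = 0.10  # $/kWh
injection_price = 0.02  # $/kWh


def value_per_produced_kwh(scr):
    """Average $ earned per kWh of PV production for a given self-consumption ratio."""
    # the self-consumed share earns the retail price, the rest earns the injection price
    return scr * electricity_price + (1 - scr) * injection_price


for scr in (1.0, 0.8, 0.6, 0.4):
    print(f"SCR = {scr:.0%}: {value_per_produced_kwh(scr):.3f} $/kWh")
```

As SCR falls from 100% to 40%, the average value of a produced kWh drops from 0.100 to 0.052 $/kWh; the extra production added by an ever larger system is mostly injected, so its marginal value approaches the 0.02 $/kWh injection price.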
In practical use cases, a CapEx analysis is typically performed to find the optimal system size, considering the capital investment, the annual returns, and how electricity prices and inflation might evolve. We won’t go too deep into the financial details here, but the metrics we calculated can help us build an intuition about the capacity “sweet spot,” which can then support the CapEx analysis. By examining the chart, we see that between 1,000 kWp and 2,000 kWp, the SSR starts to plateau and the growth of the annual return slows down. Can we find a way to pinpoint where this flattening begins?
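As a first step toward such a CapEx analysis, a simple (undiscounted) payback time divides the upfront investment by the annual return. The sketch below assumes a flat installed cost per kWp (`cost_per_kwp`, a hypothetical parameter with an arbitrary placeholder value, not a market figure) and ignores degradation, maintenance, inflation, and price changes. The returns in the example are also arbitrary placeholders, not results for this building.

```python
def simple_payback_years(system_size_kwp, annual_return, cost_per_kwp):
    """Undiscounted payback time in years (hypothetical helper, flat cost per kWp)."""
    if annual_return <= 0:
        return float("inf")  # the system never pays back
    capex = system_size_kwp * cost_per_kwp
    return capex / annual_return


# placeholder inputs for illustration only
cost_per_kwp = 800  # $/kWp, hypothetical
for size, ret in [(1000, 95_000), (2000, 150_000), (4000, 210_000)]:
    years = simple_payback_years(size, ret, cost_per_kwp)
    print(f"{size} kWp: payback {years:.1f} years")
```

With a flat cost per kWp, the investment grows linearly with size while the return grows more slowly, so the payback time lengthens as the system gets larger. With the sweep results, the same formula applies column-wise, e.g. `results_df['system_size'] * cost_per_kwp / results_df['total_return']`.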
Mathematically, this point is known as the “knee” or “elbow” of the curve: roughly, the point of maximum curvature, where the rate of increase starts to flatten. To find it, we’ll use the Kneedle algorithm, a systematic method for detecting knees in curves such as our monotonically increasing, concave return curve. The algorithm was presented in this paper, and we’ll use the kneed package for its Python implementation.
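To build an intuition for what kneed does, here is a simplified sketch of the core Kneedle idea for a concave, increasing curve: normalize x and y to [0, 1], compute the difference curve y_n - x_n (how far the curve rises above the straight line joining its endpoints), and take the x where that difference is largest. This is an approximation for illustration only: the full algorithm, as implemented in kneed, examines local maxima of the difference curve and uses the sensitivity parameter S as a threshold, so its results can differ. `simple_knee` is a hypothetical helper name.

```python
import numpy as np


def simple_knee(x, y):
    """Approximate knee of a concave, increasing curve via the Kneedle difference curve.

    Simplified: returns the x at the global maximum of the difference curve,
    without smoothing or the sensitivity threshold S used by the full algorithm.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # normalize both axes to [0, 1]
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # for a concave increasing curve, the difference curve peaks at the knee
    difference = y_n - x_n
    return x[np.argmax(difference)]


# toy curve: grows linearly up to x = 5, then stays flat
x = np.arange(11)
y = np.minimum(x, 5)
print(simple_knee(x, y))  # 5.0
```

For a smooth curve whose difference curve has a single peak, this sketch should point to the same region as kneed, but the package’s result is the one used in this tutorial.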
# find the knee algorithmically
from kneed import KneeLocator
# apply the Kneedle algorithm to the annual return function
kneedle = KneeLocator(results_df['system_size'], results_df['total_return'], S=1.0, curve="concave", direction="increasing")
print('Optimal system size:', round(kneedle.knee, 0), 'kWp')
Let’s also plot the results:
# plot scr and ssr
fig = go.Figure()
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_consumption_ratio'], mode='lines', name='Self Consumption Ratio', line=dict(color='#ffc107')))
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['self_sufficiency_ratio'], mode='lines', name='Self Sufficiency Ratio', line=dict(color='#ff5733')))
# add a vertical line at the elbow
fig.add_vline(x=kneedle.knee, line_dash="dash", line_color="#7ac53c", annotation_text="Optimal Size")
# plot the total return on a different y axis
fig.add_trace(go.Scatter(x=results_df['system_size'], y=results_df['total_return'], mode='lines', name='Total Return', yaxis='y2', line=dict(color='#007BFF')))
fig.update_layout(yaxis2=dict(overlaying='y', side='right'))
fig.show()
Output: the same chart, with a dashed vertical line marking the detected optimal size
The detected optimal capacity is 1,700 kWp. Of course, this is only a rule-of-thumb indication of where the curve flattens. The result of the Kneedle algorithm also depends on the sensitivity parameter (S), which affects how early the knee is detected, so the detected point shouldn’t be treated as a strict reference.
Conclusion
In this tutorial, we explored a method to estimate the optimal PV system size by considering metrics like the self-consumption ratio, self-sufficiency ratio, and the annual economic return. Although this is a good first analysis, real-world cost-effectiveness studies are more involved, especially when we consider hourly spot market prices. In the last year, we’ve seen high electricity price volatility and declining solar capture rates. This adds complexity because electricity market prices often depend on solar generation as well.
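A solar capture rate is commonly defined as the production-weighted average market price that solar earns (its capture price) divided by the plain time-averaged (baseload) price. When prices dip during sunny hours, the rate falls below 100%. A minimal sketch with made-up hourly values; `solar_capture_rate` is a hypothetical helper name:

```python
import numpy as np


def solar_capture_rate(production, prices):
    """Production-weighted average price divided by the plain average price."""
    production = np.asarray(production, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # capture price: what each produced kWh earns on average
    capture_price = np.sum(production * prices) / np.sum(production)
    # baseload price: simple average over all hours
    baseload_price = prices.mean()
    return capture_price / baseload_price


# made-up 4-hour example: prices dip at noon, when solar output is highest
production = [0, 50, 100, 20]  # kWh
prices = [0.12, 0.08, 0.04, 0.10]  # $/kWh
print(f"Capture rate: {solar_capture_rate(production, prices):.0%}")
```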
In a future tutorial, I’ll analyze how variable electricity prices and falling solar capture rates affect the cost-benefit analysis of PV systems.