Identifying and Addressing Underperforming Solar Assets
PV is a constantly evolving global industry. Equipment, tools and vendors are continuously entering or exiting the market. Codes change. Weather fluctuates. Tariffs are imposed. In spite of these challenges, the costs to build and operate PV assets have fallen steadily. To keep pace with this market and mitigate risks without cutting corners, stakeholders need to have regular and robust O&M practices in place at the portfolio level.
In this article, we provide insight into maintaining a healthy portfolio of assets by identifying and remediating performance problems on the dc side of the PV system. We base the lessons learned, case studies and recommendations on observed data from a fleet of several hundred PV assets, ranging in capacity from 200 kW to 300 MW, deployed in different climates and across the continental US.
The adage that “you can’t manage what you can’t measure” holds true for solar assets. To understand asset health, you need an analytics platform that leverages multiple data streams and sources—such as performance models, remote site data, satellite irradiance, preventative and corrective maintenance logs, and financial metrics—so that you can analyze individual assets, specific asset groups or the collective fleet.
This analytics platform is the foundation of any effort to understand asset health and identify underperforming assets. An increasing number of vendors offer case-by-case or fleet-level analytic services. While it is beyond the scope of this article to elaborate on the process of creating or evaluating an analytics platform, the International Electrotechnical Commission (IEC) has published a suite of technical standards that relate to PV system performance monitoring (IEC 61724-1), capacity evaluation (IEC 61724-2), and energy evaluation (IEC 61724-3).
Consider key performance indicators across the entire fleet. The benefit of a robust analytics platform is that it allows stakeholders to track and compare key performance indicators (KPIs) over time, looking for trends and outliers. While many metrics and methods are useful for monitoring asset health, stakeholders can learn a lot simply by monitoring a handful of KPIs, such as baseline performance index (BPI), performance index (PI), weather-corrected performance ratio, yield and availability. When used together, these five KPIs are a powerful way to identify underperforming assets. (See “PV System Energy Test Evaluations,” SolarPro, October/November 2014.)
Baseline performance index. BPI is a basic metric that evaluates the measured plant output in relation to its predicted output (BPI = measured output ÷ predicted output). While this analysis is useful for understanding asset performance relative to a financial model, it is less useful for O&M purposes since it does not consider actual weather conditions. A site with 10% of its capacity offline could still have a BPI close to 100% if the weather is roughly 10% sunnier than average.
Performance index. PI evaluates asset health by comparing the measured output to the expected output (PI = measured output ÷ expected output). Using the expected output rather than the predicted output corrects for weather. In the aforementioned scenario, a site with 10% of its capacity offline would show a PI of 90%, which would flag the site for investigation. The accuracy of the PI value is closely tied to the accuracy of the underlying performance model and the weather data used by the model. Even high-quality weather data can hold several percentage points of uncertainty. The more distributed a fleet and the smaller the individual projects, the more challenging it becomes to obtain clean irradiance data with minimal uncertainty. Regardless of this uncertainty, PI provides valuable information about asset health.
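The difference between BPI and PI is easy to see with a little arithmetic. The following is a minimal sketch, not a production model: the function name and all energy values are hypothetical, and it assumes for illustration that output scales linearly with both irradiance and online capacity.

```python
def performance_indices(measured_kwh, predicted_kwh, expected_kwh):
    """Return (BPI, PI) as fractions.

    predicted_kwh: output from the typical-year (financial) model.
    expected_kwh:  output from the same model driven by actual weather.
    """
    if predicted_kwh <= 0 or expected_kwh <= 0:
        raise ValueError("model outputs must be positive")
    bpi = measured_kwh / predicted_kwh
    pi = measured_kwh / expected_kwh
    return bpi, pi


# Hypothetical month, assuming output scales linearly with irradiance.
predicted = 1_000_000.0          # kWh from the typical-year model
expected = predicted * 1.10      # weather was 10% sunnier than typical
measured = expected * 0.90       # 10% of capacity was offline

bpi, pi = performance_indices(measured, predicted, expected)
print(f"BPI = {bpi:.1%}, PI = {pi:.1%}")
```

Under these assumptions, BPI stays close to 100% and hides the outage, while PI drops to about 90% and exposes it.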
Weather-corrected performance ratio. This performance indicator compares a plant’s actual energy production to its theoretical energy-generating potential and describes how efficient a PV power plant is in converting sunlight incident on the PV array into ac energy delivered to the utility grid. While you can use performance ratio (PR) values to compare PV power plants in different locations, it is important to correct these results for weather bias. The authors of the NREL technical report “Weather-Corrected Performance Ratio” define a way to modify PR calculations to help reduce weather bias.
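To illustrate why a temperature correction matters, the sketch below computes an uncorrected PR alongside one commonly used temperature-corrected form. It is an illustration only; consult the NREL report for the authoritative definition. The function name, parameter names, reference temperature and sample data are all assumptions, and the temperature coefficient is assumed to be a negative percentage per degree Celsius.

```python
def performance_ratios(energy_ac_kwh, poa_w_m2, cell_temp_c, p_stc_kw,
                       t_ref_c, temp_coeff_pct_per_c=-0.4,
                       interval_h=1.0, g_stc=1000.0):
    """Return (uncorrected PR, temperature-corrected PR).

    t_ref_c is an assumed reference cell temperature, for example a
    long-term irradiance-weighted average for the site.
    """
    if not (len(energy_ac_kwh) == len(poa_w_m2) == len(cell_temp_c)):
        raise ValueError("input series must have equal length")

    ideal_kwh = 0.0
    ideal_corrected_kwh = 0.0
    for g, t in zip(poa_w_m2, cell_temp_c):
        # Nameplate energy scaled by irradiance for this interval.
        base = p_stc_kw * (g / g_stc) * interval_h
        ideal_kwh += base
        # Scale the denominator for how far the cell temperature sits
        # from the reference temperature.
        ideal_corrected_kwh += base * (
            1 - temp_coeff_pct_per_c / 100 * (t_ref_c - t)
        )

    measured_kwh = sum(energy_ac_kwh)
    return measured_kwh / ideal_kwh, measured_kwh / ideal_corrected_kwh


# Hypothetical two-hour sample for a 1,000 kWdc array.
pr, pr_corrected = performance_ratios(
    energy_ac_kwh=[400.0, 800.0],
    poa_w_m2=[500.0, 1000.0],
    cell_temp_c=[45.0, 55.0],
    p_stc_kw=1000.0,
    t_ref_c=45.0,
)
print(f"PR = {pr:.3f}, corrected PR = {pr_corrected:.3f}")
```

In this sample the second hour is hotter than the reference temperature, so the corrected PR comes out higher than the uncorrected PR. The correction keeps a hot period from looking like underperformance.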
Yield. Specific yield evaluates PV plant performance by comparing its total annual energy output to its nameplate capacity rating (yield = kWhac ÷ kWpdc). This metric is useful for making a levelized comparison or peer-to-peer evaluation of PV assets, as it allows stakeholders to flag underperforming assets without needing to account for weather. It is especially powerful in sites with many generation blocks because you can compare the performance of each block side by side and look for outliers. At sites with multiple array orientations, you can normalize yield to account for different azimuth or tilt angles.
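A peer-to-peer yield comparison can be sketched in a few lines. The example below is a hypothetical illustration: the block names, energy totals and 5% tolerance are assumptions, and it compares each block to the median yield rather than to any modeled value.

```python
from statistics import median


def flag_low_yield(blocks, tolerance=0.05):
    """Compute specific yield per block and flag low outliers.

    blocks: dict mapping block name -> (annual kWh ac, kWp dc).
    tolerance: assumed fraction below the median that triggers a flag.
    """
    yields = {name: kwh / kwp for name, (kwh, kwp) in blocks.items()}
    benchmark = median(yields.values())
    flagged = sorted(
        name for name, y in yields.items()
        if y < benchmark * (1 - tolerance)
    )
    return yields, flagged


# Hypothetical generation blocks at a single site.
blocks = {
    "Block A": (1_620_000, 1000),
    "Block B": (1_600_000, 1000),
    "Block C": (1_410_000, 1000),
    "Block D": (812_000, 500),
}

yields, flagged = flag_low_yield(blocks)
for name, y in yields.items():
    print(f"{name}: {y:.0f} kWh/kWp")
print("Flagged:", flagged)
```

Because all blocks at a site see essentially the same weather, the median serves as a weather-independent benchmark. Block D is smaller than the others but is not flagged, since yield normalizes for capacity.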
Availability. This metric is important because it characterizes the percentage of time that a PV power system is generating energy. As detailed in the Sandia report “A Best Practice for Developing Availability Guarantee Language on Photovoltaic (PV) O&M Agreements,” there are many ways to calculate availability. To identify underperforming assets, we recommend calculating and comparing the raw component availability, which quantifies the percentage of daylight hours during which an inverter generates energy, without any exclusions. Contractually focused availability metrics often exclude periods of downtime and therefore provide less useful detail for understanding plant performance.
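Raw component availability can be approximated from interval data. The sketch below is one possible implementation, not a contractual definition: the function name, the record layout and the 50 W/m² irradiance threshold used to define daylight are all assumptions.

```python
def raw_availability(records, daylight_poa_w_m2=50.0):
    """Fraction of daylight intervals in which an inverter generates.

    records: iterable of (plane-of-array irradiance in W/m^2,
             inverter ac power in kW), one pair per interval.
    Returns None if the data contain no daylight intervals.
    No downtime is excluded, whatever its cause.
    """
    daylight = [(g, p) for g, p in records if g >= daylight_poa_w_m2]
    if not daylight:
        return None
    generating = sum(1 for _, p in daylight if p > 0)
    return generating / len(daylight)


# Hypothetical interval data for one inverter.
records = [
    (0.0, 0.0),      # night, ignored
    (120.0, 15.0),
    (450.0, 0.0),    # daylight outage, counts against availability
    (800.0, 310.0),
    (600.0, 240.0),
    (30.0, 0.0),     # below the daylight threshold, ignored
]

print(f"Raw availability: {raw_availability(records):.0%}")
```

Comparing this value across all inverters in a fleet highlights the units with the most daylight downtime.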
While desktop analytics are a crucial tool for identifying underperforming assets, KPIs are ultimately limited in resolution. The fog of uncertainty is always present. This is why periodic preventative maintenance (PM) is a core component of an O&M program.
One approach to PM is to use highly trained PV professionals who can cut through the fog to unveil a wealth of information. The benefit of this approach is that it can identify subtle module-level defects, reveal systemic issues that may be undetectable via IR imaging, confirm warranty failures with IV measurements, and identify any safety issues that may exist with wiring and other aspects of the array. In many cases, it may be difficult to find qualified professionals to inspect all the fielded PV assets in a large fleet. While less highly trained professionals are more readily available and can conduct inspections at lower costs, the quality of these inspections can be lower. If you take this approach, you run the risk of overlooking subtle yet critical performance issues.
While there is no substitute for in-field investigations, relying exclusively on boots-on-the-ground inspections is an expensive and labor-intensive PM solution. If you have hundreds of assets distributed over thousands of square miles, your PM program would need to employ dozens of vendors and manage myriad contractual obligations to inspect every asset. It is also difficult to standardize inspection techniques and report data using this approach.
Standardize PM activities across your fleet. Aerial thermography in tandem with aerial visual inspections addresses many of the issues associated with traditional approaches to PM. With a few exceptions due to restricted airspace, asset managers can scan an entire fleet with one unified process. The resulting data integrate smoothly with analytics platforms and are easily mined for insights into asset performance.
The histogram in Figure 1, for example, shows the percentage of modules presenting faults for more than 200 PV assets based on aerial inspection results. Although there is some spread to the distribution of results, the bin with the lowest fault rate contains the highest number of sites. There are also two notable outliers in these results, which we flagged for further analysis. While it is theoretically possible to obtain these same data using traditional boots-on-the-ground inspections, doing so would be time intensive and costly.
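One simple way to flag outliers in a distribution like this is an interquartile-range fence. The sketch below uses that rule as a stand-in; the source does not state how the two outliers were identified. The site names and fault percentages are invented for illustration.

```python
from statistics import quantiles


def flag_fault_rate_outliers(fault_pct_by_site, k=1.5):
    """Return site names whose fault rate exceeds Q3 + k * IQR.

    fault_pct_by_site: dict mapping site name -> percentage of
    modules presenting faults. Results are sorted worst first.
    """
    rates = list(fault_pct_by_site.values())
    q1, _, q3 = quantiles(rates, n=4)
    upper_fence = q3 + k * (q3 - q1)
    outliers = [s for s, r in fault_pct_by_site.items() if r > upper_fence]
    return sorted(outliers, key=fault_pct_by_site.get, reverse=True)


# Hypothetical aerial inspection results (percent of modules with faults).
fault_pct = {
    "Site A": 0.2, "Site B": 0.3, "Site C": 0.4, "Site D": 0.5,
    "Site E": 0.6, "Site F": 0.8, "Site G": 1.0, "Site H": 1.2,
    "Site I": 6.5, "Site J": 9.0,
}

print("Flag for further analysis:", flag_fault_rate_outliers(fault_pct))
```

Fault-rate distributions tend to be skewed toward zero, which makes a quartile-based rule a reasonable first pass. A real fleet would typically call for thresholds tuned to its own history.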
Drilling down into thermal scan details, we can not only identify submodule issues but also distinguish a faulty diode from a failure in a cell string. These results are useful for identifying and remediating performance issues and also reduce safety risks by identifying dangerously hot modules that would otherwise go unnoticed. Zooming out from individual components, we can review data across the fleet and look for trends. As an example, fleet-level thermal scan results show that performance impacts due to hard shading (often associated with foreign objects on top of PV modules) are higher at elementary schools than at middle or high schools, suggesting that younger children are more likely to throw objects onto solar canopies, causing hot spots. The ability to quantify subtle impacts like these improves performance models and the accuracy of risk assessments early in the project development cycle.
A distribution such as the one in Figure 1 clearly identifies underperforming sites relative to the fleet population. You can use these results to both flag sites with performance issues and prioritize remediation efforts. By ranking KPIs across the portfolio, for example, you can create an initial remediation priority list, which you can refine to optimize your corrective maintenance (CM) efforts.
Focus remediation resources where they are needed most. All else being equal, the first priority is generally to fix those assets that are losing the most revenue. However, you may also want to address sites together within a specific geographic region to optimize logistics and management costs. It is also important to consider equipment warranties, as you may need to prioritize corrective maintenance at sites with equipment nearing the end of the warranty period to leave time to engage with the manufacturer.
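These considerations can be combined into a simple ordering rule. The sketch below shows one possible policy, not a recommended one: it moves sites with warranties expiring inside an assumed six-month window to the top, then ranks by estimated lost revenue, and keeps the region on each record so that nearby sites can be batched. The class, field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    name: str
    region: str
    lost_revenue_usd_per_month: float
    warranty_months_left: int


def remediation_order(sites, warranty_window_months=6):
    """Sort sites: expiring warranties first, then largest revenue loss."""
    def sort_key(site):
        urgent = site.warranty_months_left <= warranty_window_months
        # False sorts before True, so urgent sites come first.
        return (not urgent, -site.lost_revenue_usd_per_month)
    return sorted(sites, key=sort_key)


# Hypothetical portfolio snapshot.
sites = [
    SiteStatus("Site A", "Northeast", 4200.0, 60),
    SiteStatus("Site B", "Southwest", 1500.0, 4),
    SiteStatus("Site C", "Northeast", 2500.0, 36),
]

for rank, site in enumerate(remediation_order(sites), start=1):
    print(rank, site.name, site.region,
          f"${site.lost_revenue_usd_per_month:,.0f}/month")
```

In this example, Site B jumps ahead of sites with larger losses because its warranty is about to expire. Sites A and C share a region, so a single dispatch could cover both.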
In many cases, it is efficient to leverage a site’s existing O&M provider to conduct CM activities. This is especially true when PM inspection results provide a clear understanding of the underlying performance issue and the location of the problem. Since aerial inspection reports indicate the precise location of each fault down to the module level, the repair team can use this report as a map, which expedites the remediation process.
Use expert investigators where appropriate. As a PM inspection reveals greater scale, complexity, or potential implications of performance issues, the site may require more-detailed investigations. In the case of the two outlier sites in Figure 1, the owner engaged an independent engineering (IE) firm specializing in PV performance audits to conduct detailed ground-based investigations. The case studies that follow illustrate that this level of investigation is warranted where project stakeholders suspect unknown systemic or potentially systemic issues that require an exact diagnosis to determine next steps.
CASE STUDY 1: SYSTEMIC ISSUES
For the first 6 years of its operational life, this site was subject to condition-based maintenance, and project stakeholders used a remote analytics platform to appraise its health. In 2017, an aerial inspection provider conducted an aerial thermographic site inspection for the first time. This inspection revealed a number of faults with a similar infrared (IR) signature: a linear submodule anomaly corresponding with one or more cell strings. Based on the portfolio-level data in Figure 2a, the site's overall percentage of faults fell within the normal range of the fleet distribution; however, its percentage of linear submodule faults in Figure 2b raised red flags.
To gather more information, the owner dispatched an IE team to the site with the full-field IR scans in hand to conduct detailed tests. This team performed a variety of in-field tests on a random sampling basis, including visual inspection, IV-curve traces, IR thermography and electroluminescence (EL) imaging. The results of these tests largely confirmed the aerial IR results. The team determined that the most prominent defect was open-circuit submodule cell strings. Closer analysis revealed that a defective off-cell solder joint was the root cause of this defect.
In the most extreme cases of overheating, this defective solder joint was identifiable via visual inspection. However, visual inspection alone would not have captured the true scope of the problem, as the defective solder joints did not always show evidence of visual discoloration. Remote analytics based on KPIs also did not detect this issue. System availability was solid. The performance index showed no cause for alarm. The peer-to-peer yield appeared normal because these submodule faults were evenly distributed across the site.
Data gathered during the aerial inspection and the follow-up in-field tests support an ongoing warranty claim to address the underperformance issues at this site. Those modules with open-circuit submodule faults—whether due to a solder-joint failure or a shorted bypass diode—clearly meet the criteria for a valid performance warranty claim as they reduce output power by at least 33%. Since this is a systemic defect, it is also important to consider its impacts on energy generation. Given that we know the number and location of defects and have module- and string-level IV-curve test results, we can use component-level modeling software to accurately assess these losses based on the number of modules per string and the number of paralleled strings per maximum power tracker.
CASE STUDY 2: SUBTLE ISSUES
Remote analytics had flagged this site for active investigation based on availability and performance index. In spite of the ongoing CM investigation, the nature of the underperformance issues remained unknown until the owner conducted an aerial IR inspection. As shown in Figure 3, this inspection revealed a unique spatial distribution of modules presenting with some type of hot spot. This issue not only was localized to one subsection of the site, but also specifically impacted the last module in almost every eight-module source circuit. The visual imagery showed no corresponding anomalies.
While hot spots are usually low on a remediation priority list, the aerial inspection provider recommended further ground-based investigations to determine the root cause of this persistent defect pattern. These revealed that the inverter associated with this section had a nonfunctional ground-fault circuit, which meant that the array was operating in a condition susceptible to potential-induced degradation (PID). Cell degradation that starts at one end of a string of modules is consistent with PID, and its effects grow more severe over time.
After the inverter was repaired, the IE firm conducted ground-based IV-curve and EL measurements to confirm that the remediation efforts were successful. Because project stakeholders caught the PID problem at an early stage of development, its effects were still reversible. If the problem had persisted, hot spots would have spread to subsequent modules in every string. If this development had gone undetected long enough, it could have caused permanent damage to the impacted modules.
While we have focused on the operational aspects of defect identification and mitigation, it is also important to avoid repeating costly issues in the future. The best time to maximize long-term system performance is before building a project, as system design decisions and equipment selection correlate strongly with performance and reliability. One way to reduce risk is to have a robust due diligence program in place throughout the design, procurement, installation and commissioning processes. Equally important is continuous feedback from operations back to the design, construction and commissioning teams, as this is what allows stakeholders to improve future projects based on lessons learned in the real world from fielded projects.
Facilitate continuous improvement. Performance issues are inevitable. The key to long-term success, therefore, is continuous improvement in how you identify, address and learn to avoid the issues that impact performance. Accelerating this learning process is vital to the success of the solar industry. Even as we are racing to reduce the levelized cost of energy, we must maintain the internal rate of return for solar projects. This is a challenging dynamic as it requires a circular flow of information between project stakeholders while looking for opportunities to streamline costs throughout the value chain. Only by sharing lessons learned can we continue to evolve and mature.
The authors would like to acknowledge Rob Andrews from Heliolytics, Mason Reed from Core Energy Works, Rob Chatelain and Steve Wheeler from Constellation, and Magin Reyes and Stan Tehee from Exelon Power Renewables for their contributions to this article.
—Kristine Sinclair / Heliolytics / Toronto, ON / heliolytics.com
—James Rand / Core Energy Works / Newark, DE / coreenergyworks.com
—Robert Flottemesch / Constellation / Baltimore, MD / constellation.com