Average rating: 4.3/10 (withdrawn; 4 reviewers)

Lowest: 3, highest: 6, standard deviation: 1.3

Confidence: 4.5

Correctness: 2.3

Contribution: 2.0

Presentation: 2.0

ICLR 2025

Benchmarks and Custom Package for Energy Forecasting

Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Leandro Von Krannichfeldt, Shirui Pan, Yi Wang

Submitted: 2024-09-27 | Updated: 2024-11-19

Keywords

Energy Forecasting, Benchmark, Dataset

Reviews and Discussion

Review

Rating: 3 | Confidence: 5 | 2024-10-26

The paper presents a modular package for energy forecasting that incorporates datasets for load, wind, and photovoltaic (PV) energy. It emphasizes feature engineering techniques and custom loss functions aimed at aligning forecasting error with real-world dispatch costs. The package supports both probabilistic and point forecasting methods, evaluated across a set of 21 machine learning and deep learning models. The authors highlight their contribution in terms of feature engineering for load data, customization of loss functions for energy forecasting, and extensive benchmarking results.

Strengths

The paper's package allows for customized forecasting pipelines.
The paper collects many datasets and benchmarks 21 models on them.

Weaknesses

My major concerns with this paper are as follows.

Scale of Datasets: The datasets included are relatively small and may not fully capture the scale or complexity of modern power systems. For instance, the load and renewable datasets lack the granularity typically needed at operational levels, such as bus-level load forecasting or wind farm-level generation forecasts. A large-scale dataset, such as the ARPA-E PERFORM dataset provided by NREL, which includes hundreds of time series, would serve as a more relevant benchmark for modern power systems.
External Factors: The authors claim that external factors for renewables have not been extensively studied, which is inaccurate. In the energy sector, weather-informed and physics-informed models for load and renewable forecasting are well-established. Commercial software solutions, like Solcast and NREL's System Advisor Model (SAM), already rely heavily on meteorological factors and hybrid physical and machine learning models. These approaches are widely adopted and should be considered as relevant benchmarks. The software and their predictions are publicly available.
Limitations in Probabilistic Forecasting Approach: While the paper extends to probabilistic forecasting through quantile regression, this approach does not account for the correlation between errors across time series, which is essential in multivariate contexts like interconnected power systems. Effective probabilistic forecasting for power systems often requires capturing correlation structures to support stochastic optimization of power dispatch problems (since the paper emphasizes power grid dispatch). The metrics used to evaluate probabilistic forecasts (e.g., Pinball Loss and CRPS) are limited. Important metrics like the energy score (I agree this is similar to CRPS, but I believe it is more general than CRPS), which handles multivariate distributions, and the variogram score, which captures spatial or temporal correlation, are missing. These metrics are commonly used in power systems to assess probabilistic forecasts and would improve the package's relevance for real-world applications.
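
For reference, the two univariate metrics named above (pinball loss and CRPS) can be written in a few lines. This is a minimal pure-Python sketch using the standard definitions; the function names are illustrative and are not taken from the paper's package. Both scores treat each series independently, which is the limitation the reviewer points out.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2
```

Lower values are better for both scores.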

Questions

Including datasets that mirror the complexity of operational power systems (such as the ARPA-E PERFORM dataset) would improve the package's utility and scalability.
It would be nice to compare with physical/hybrid models that have been well-established in the industry. The PERFORM dataset has predictions from the SAM model with outstanding accuracy for PV power prediction.
To align probabilistic forecasts with real-world power system applications, quantile regression should be supplemented with models that capture inter-series correlation, for example [1] and [2].
Additionally, introducing metrics such as the energy score and variogram score would enable a more robust evaluation of probabilistic forecasts, especially in multivariate contexts.
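
The two multivariate scores requested above can be estimated from forecast samples. The sketch below assumes the usual sample-based forms of the energy score and the variogram score with unit weights; the function names and the default order `p=0.5` are illustrative choices, not part of the package under review.

```python
import math


def _euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)))


def energy_score(samples, y):
    """Energy score estimate: E||X - y|| - 0.5 * E||X - X'||.

    samples: list of forecast vectors (one vector per sampled scenario).
    y: observed vector of the same dimension.
    """
    m = len(samples)
    term1 = sum(_euclidean(s, y) for s in samples) / m
    term2 = sum(_euclidean(a, b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2


def variogram_score(samples, y, p=0.5):
    """Variogram score of order p with unit weights on all pairs (i, j)."""
    d = len(y)
    m = len(samples)
    score = 0.0
    for i in range(d):
        for j in range(d):
            observed = abs(y[i] - y[j]) ** p
            expected = sum(abs(s[i] - s[j]) ** p for s in samples) / m
            score += (observed - expected) ** 2
    return score
```

With one-dimensional vectors the energy score reduces to the sample-based CRPS, which is the sense in which it generalizes CRPS.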

[1] Hauzenberger, Niko, et al. "Gaussian process vector autoregressions and macroeconomic uncertainty." Journal of Business & Economic Statistics (2024): 1-17.
[2] Ashok, Arjun, et al. "TACTiS-2: Better, faster, simpler attentional copulas for multivariate time series." arXiv preprint arXiv:2310.01327 (2023).

Review

Rating: 3 | Confidence: 4 | 2024-11-02

In this paper, the authors propose a novel renewable electricity generation dataset that can be utilized in various time series forecasting tasks. This dataset is embedded in a Git Repository, along with other datasets and accompanied by several pre- and post-processing techniques, metrics, point and probabilistic forecasting models, and feature engineering methods.

The authors used this dataset to benchmark different forecasting models and test their proposed loss function, aiming to train models not only for optimal accuracy but also to comply with specific tasks.

Strengths

Provision of a Renewable Energy dataset.
Important collection of Energy datasets.
Combination of various techniques, models, and metrics.

Weaknesses

Clarity could be improved to better understand the different parts/experiments of the proposal.
Paper flow could be enhanced.
Lack of details for some parts, even with the appendix.
Needs thorough proofreading.
Missing important state-of-the-art (SOTA) Time Series Forecasting baselines.

Questions

Goal

What is the objective of this package?

To gather datasets and models related to “Energy (load, wind, photovoltaic)”: However, why not include datasets like the London smart meter datasets or NREL Solar energy, or other existing datasets? Additionally, why not include the latest SOTA models? Such omissions might discourage future users from choosing your repository.
To provide datasets that have both energy and weather data: Yet why include datasets with only energy data (ELD and ELF)?
To introduce a specific loss that accounts for subsequent tasks: But without real “tasks” in the provided dataset, how can this loss be effectively tested? Simulation alone is not sufficient. Furthermore, what about the negative log-likelihood (NLL) loss, which adds uncertainty (or probabilistic prediction) to point forecast models? Diffusion-based architectures, or traditional models that minimize the NLL, can provide the uncertainty of a given prediction (a range of values for each time step). Isn’t this the solution to account for any potential task? Why should the authors' loss be preferred over these options?
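
To make the NLL alternative concrete, here is a minimal sketch that assumes a Gaussian predictive distribution per time step. The reviewer does not name a distribution, so the Gaussian choice and the function names are assumptions made for illustration.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2) for one time step."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def central_interval(mu, sigma, z=1.96):
    """Symmetric prediction interval mu +/- z * sigma (z=1.96 is roughly 95%)."""
    return (mu - z * sigma, mu + z * sigma)
```

A model trained this way outputs `mu` and `sigma` for each step, so the point forecast comes with a range of plausible values.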

Questions

Q1

I agree that energy forecasting requires external factors' information to achieve better performance. However, having the information in the input is one thing; the model can learn the relation between the external factor and the target. But then comes the availability for the prediction window. Does the model have to predict the external factors along with the energy forecasting, or does the model receive the forecast of these external factors? For the latter, are the forecasts reliable? If there is a mistake in the forecast, there will be a mistake in the prediction. How is this taken into consideration in the metrics when it comes to benchmarking models?

We will input the renewable energy sequence and meteorological factors of the past 24 hours, as well as the meteorological factors of the next 24 hours.

This sentence implies that your scenario has a perfect forecast of future weather, which is unrealistic in practice. Noise should be added to account for errors in weather forecasts if the authors want to provide a package that is close to real-world scenarios.
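
One simple way to emulate imperfect weather forecasts is to perturb the future covariates before feeding them to the model. The sketch below assumes additive zero-mean Gaussian noise whose standard deviation may grow with lead time; the noise model, function name, and parameters are hypothetical and are not described in the paper.

```python
import random


def perturb_forecast(future_weather, base_std, growth_per_step=0.0, seed=None):
    """Return a noisy copy of the future weather values.

    future_weather: true values for the next steps (e.g. the next 24 hours).
    base_std: noise standard deviation at the first lead time.
    growth_per_step: extra standard deviation added per step of lead time.
    """
    rng = random.Random(seed)
    noisy = []
    for lead, value in enumerate(future_weather):
        std = base_std + growth_per_step * lead
        noisy.append(value + rng.gauss(0.0, std))
    return noisy
```

Benchmark results could then be reported at several noise levels to show how sensitive each model is to weather forecast error.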

Q2

[…] ensuring minimum data information distortion.

Do we have a guarantee on that?

Experiment Setting

E1

It is beneficial for the community to study the relationship between different renewable energy types under similar climate conditions in a certain area.

Why did the authors not demonstrate such an advantage in the experiment section, with a multivariate-to-multivariate scenario of different types of renewables?

E2

There is a lack of information on the experiment setting. Is the setting univariate-to-univariate (U2U), multivariate-to-univariate, or multivariate-to-multivariate forecasting? This uncertainty poses reproducibility problems. Figure 3 and Table 3 results would seem to be U2U, but it needs to be confirmed for each experiment. Authors should clearly state the forecasting task of each experiment.
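
The three settings differ only in which columns enter the input window and which columns are predicted. A minimal sketch with a hypothetical helper (not the package's API) makes the distinction explicit:

```python
def make_windows(series, input_cols, target_cols, lookback, horizon):
    """Build (input, target) pairs from a multivariate series.

    series: list of rows, one row per time step, one column per variable.
    input_cols / target_cols: column indices used as inputs / targets.
    """
    samples = []
    for t in range(lookback, len(series) - horizon + 1):
        x = [[row[c] for c in input_cols] for row in series[t - lookback:t]]
        y = [[row[c] for c in target_cols] for row in series[t:t + horizon]]
        samples.append((x, y))
    return samples
```

With column 0 as the energy series and column 1 as a weather variable, `input_cols=[0], target_cols=[0]` gives univariate-to-univariate, `input_cols=[0, 1], target_cols=[0]` gives multivariate-to-univariate, and `input_cols=[0, 1], target_cols=[0, 1]` gives multivariate-to-multivariate.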

E3

What does MQ[model_name] stand for in Figure 3?

E4

What does $123 456 789$ in Table 4's first columns stand for?

Feature Engineering

F1

How will feature engineering based on calendar information, such as workdays, benefit REF? Are there options for tailoring the engineered features to the user's needs and the dataset's capabilities? Or does the package only provide the combination described in Section 3.2.1? That would definitely limit the usefulness of such features.
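
The kind of tailoring asked about here could look like the following sketch, in which the user chooses which calendar features to enable. The feature names and both functions are hypothetical and do not reflect the combination in Section 3.2.1 of the paper.

```python
from datetime import date, datetime


def calendar_features(ts, holidays=frozenset()):
    """Derive simple calendar features from a timestamp."""
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
    is_holiday = ts.date() in holidays
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "month": ts.month,
        "is_workday": int(not (is_weekend or is_holiday)),
    }


def select_features(features, enabled):
    """Keep only the features the user enabled, in the requested order."""
    return [features[name] for name in enabled]
```

For a solar or wind series, a user might enable only `hour` and `month`, while a load series would likely also use `is_workday`.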

F2

How are engineered features integrated into transformer-based models (informer, etc.)? Are they considered as channels? The usage of such features limits the forecasting task to univariate-to-univariate forecasting, which impedes the usage of this framework. Such features should have been better integrated into the models for easier usage.

Dataset

D1

The UCI dataset is a power load dataset from the UCI database, […]

Not enough details. I assume it is the Electricity Load Diagrams, and it should be clearly mentioned.

D2

Why is UCI ELD not in the results of Figure 3 and Table 3? It is the most well-known and used dataset in the TSF domain. Is it because there is no weather data to do your feature engineering? If yes, your platform enables feature engineering only with calendar data, right? If so, it should be possible to give results with and without feature engineering considering only calendar data.

D3

Why, for the spatial aggregation of REF, are there no external variables? It would be better to provide min/max/std/mean of the weather indicators of the considered area to improve forecasts, as this is one of the main goals/advantages of this dataset/paper.
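
The suggested summary statistics are cheap to compute. The sketch below assumes each site in the area provides one reading per time step for a given weather variable; the function names are illustrative.

```python
import statistics


def aggregate_weather(site_values):
    """Summarize one weather variable across sites for a single time step."""
    return {
        "min": min(site_values),
        "max": max(site_values),
        "mean": statistics.fmean(site_values),
        "std": statistics.pstdev(site_values),  # population standard deviation
    }


def aggregate_series(site_series):
    """Apply aggregate_weather at every time step.

    site_series: list of per-site series, all of the same length.
    """
    return [aggregate_weather(list(step)) for step in zip(*site_series)]
```

The four statistics per variable could then be attached to the spatially aggregated energy series as external inputs.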

D4

Where are these regions/cities located? This information might be necessary to perform some feature engineering for given tasks.

D5

The Git repository does not contain all the datasets mentioned in the paper. Does load_with_weather.pkl correspond to REF?

Limitations

The SOTA of transformer models is outdated. It would be more pertinent to have models like iTransformer or Pathformer.

Revision

$L(\epsilon)$ is not introduced before being used.

Citation Formatting Issue

“As stated in (Menezes et al., 2020), […]” should be “As stated by Menezes et al. (2020), […]”
“(Wang & Wu, 2017) discovered the asymmetry […]”
“Based on (Zhang et al., 2022), our framework […]”

Authors should use \citet{} as mentioned in ICLR guidelines:

When the authors or the publication are included in the sentence, the citation should not be in parenthesis using \citet{} (as in “See Hinton et al. (2006) for more information.”). Otherwise, the citation should be in parenthesis using \citep{} (as in “Deep learning shows promise to make progress towards AI (Bengio & LeCun, 2007).”)

Proofreading

“[…] such as PV,onshore wind, and offshore wind, […]” missing space.
“[…] on traditional machine learning models and deep learning models.” -> “[…] on traditional machine learning and deep learning models.”
“[…] in data at this level(Jeong et al., 2021a).” missing space before citation.
“[…] the real requirement(here we mainly refer […]”
“To our knowledge, […]” -> “To the best of our knowledge, […]”

Review

Rating: 6 | Confidence: 5 | 2024-11-03

This paper presents an open-source package for benchmarking time-series models for energy forecasting. The authors have built a framework that handles 11 widely varying public energy datasets and report performance on them, integrating both SOTA deep learning models (mostly for time-series forecasting) and other machine learning models. The framework can handle missing data and feature engineering, and supports various forecasting metrics along with point and probabilistic losses. The idea appears similar to the Open Graph Benchmark, but applied to energy forecasting time series.

Strengths

An important and impactful application

The framework can handle feature engineering, missing data, and a common pre-processing pipeline.
It can be customized to integrate more models and loss functions, and is designed to handle custom loss functions.
Multiple evaluation metrics, losses, and public datasets have been integrated.

Weaknesses

The work is a very timely application for energy forecasting. The reviewer has some concerns regarding the results, reproducibility on new datasets, and integration of new models.

Comments:

Are the model features handled as extrinsic and intrinsic features? E.g., weekday vs. weekend, number of occupants, static features.
It would be more convenient if the framework code followed an OOP design to aggregate all models. I looked for a base model in the repository but could not find one. I am not sure whether OOP is used.

E.g., MyQuantileModel can be used as the base model, and all the different deep learning models can inherit from it; the forward and training logic can be the same as in the base model or can be overridden.
A similar thing can be done for loss functions.
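
The inheritance pattern suggested above could be sketched as follows. The class names are hypothetical stand-ins chosen for illustration; they are not classes from the repository, and the two toy forecasters replace the deep learning models for brevity.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Shared fit/predict workflow; subclasses supply the model-specific parts."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.fitted = False

    def fit(self, history):
        self._fit(history)
        self.fitted = True
        return self

    def predict(self):
        if not self.fitted:
            raise RuntimeError("call fit() before predict()")
        return self._predict()

    @abstractmethod
    def _fit(self, history):
        """Learn from the historical series."""

    @abstractmethod
    def _predict(self):
        """Return a list of `horizon` forecasts."""


class PersistenceForecaster(BaseForecaster):
    """Repeats the last observed value."""

    def _fit(self, history):
        self.last = history[-1]

    def _predict(self):
        return [self.last] * self.horizon


class MeanForecaster(BaseForecaster):
    """Repeats the historical mean."""

    def _fit(self, history):
        self.mean = sum(history) / len(history)

    def _predict(self):
        return [self.mean] * self.horizon
```

A benchmark loop can then treat every model through the same `fit` and `predict` calls, and the same pattern applies to a base loss class.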

Is there a way to customize hyperparameters such as the time lag and forecast horizon?
A JSON config file for hyperparameters, and for model and dataset selection, would be a useful addition.
Can this repo be used to reproducibly evaluate other time-series datasets? What would need to be changed in such cases?
The pinball loss values are extremely high, which is unusual.
- How are the train and test datasets selected? Is the test on an unseen domain or on same-domain data?
- How is the training data handled at the building level and the aggregated level? The building-level data does not seem to be multivariate forecasting. Is each building's data trained with a different model? It would help if the authors clarified the train and test dataset selection.
- Can you show forecast vs. ground-truth comparison plots for different models?
The repository does not seem to include result plots visualizing training and test forecasts against ground truth. These plots are also missing from the paper.
As the framework is designed to handle both building-level and aggregated performance, analysis plots comparing building-level and aggregated performance would help convey the models' capabilities.
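
The JSON configuration suggested in the comments above might be loaded along these lines. The keys and default values are assumptions made for illustration and do not correspond to any existing file in the repository.

```python
import json

# Hypothetical defaults; a real package would define its own keys.
DEFAULTS = {
    "model": "persistence",
    "dataset": "example",
    "lookback": 24,
    "horizon": 24,
    "quantiles": [0.1, 0.5, 0.9],
}


def load_config(text):
    """Merge user settings from a JSON string into the defaults."""
    user = json.loads(text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = dict(DEFAULTS)
    cfg.update(user)
    if cfg["lookback"] <= 0 or cfg["horizon"] <= 0:
        raise ValueError("lookback and horizon must be positive")
    return cfg
```

Rejecting unknown keys catches typos in the config early, which helps with the reproducibility concern raised above.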

Questions

Please see the comments in the weakness section.

Details of Ethics Concerns

The paper appears to use only public datasets.

Review

Rating: 5 | Confidence: 4 | 2024-11-03

The paper introduces an open-source package and benchmark specifically designed for energy forecasting, addressing the challenges in the field such as the significant impact of external factors like temperature and calendar variables. The authors highlight the differences between energy forecasting and general time series forecasting, emphasizing the need for targeted feature engineering and custom loss functions that consider the costs associated with power grid dispatch. They provide a modular framework that includes data pre-processing, feature engineering, forecasting methods, post-processing, and evaluation metrics. Additionally, they release a high-quality renewable energy dataset with corresponding meteorological data to facilitate research in this area.

Strengths

In this work, the authors develop an open-source forecasting package for energy forecasting. They also provide a dataset that includes different types of renewable energy generation and key meteorological factors.

Weaknesses

While the authors claim that energy forecasting is different from general time series forecasting, they do not provide a detailed explanation of these differences.

The feature engineering techniques are heavily centered on temperature and calendar variables. Other external factors that might affect energy forecasting, such as economic indicators or unexpected events (e.g., pandemics), are not extensively explored.

The effectiveness of the proposed temperature-calendar feature engineering and custom loss function may be dataset-specific.

The custom loss function is derived from simulations using an IEEE 30-bus test system. Due to its simplified structure and limited scale, this system may not fully capture the complexity and variability of real-world power networks, limiting the generalizability of the results to more complex systems.

While the feature engineering strategy is well-motivated, the paper could provide more justification for its generalization across different datasets and energy types.

The overall clarity of writing needs improvement.
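
For readers unfamiliar with cost-aware losses such as the custom loss discussed in the weaknesses above, the general idea is that over- and under-forecasting are penalized differently. The sketch below is a generic asymmetric piecewise-linear loss with made-up cost coefficients; it is not the loss the authors derived from the IEEE 30-bus simulation.

```python
def asymmetric_loss(y_true, y_pred, under_cost=2.0, over_cost=1.0):
    """Average loss with different per-unit costs for under- and over-forecasts.

    under_cost: cost per unit when the forecast is below the actual value.
    over_cost: cost per unit when the forecast is above the actual value.
    Both coefficients are hypothetical placeholders.
    """
    total = 0.0
    for y, f in zip(y_true, y_pred):
        err = f - y  # positive means over-forecast
        total += over_cost * err if err >= 0 else under_cost * (-err)
    return total / len(y_true)
```

Because the coefficients depend on the system they are estimated from, a loss fitted on one test system may not transfer to a larger network, which is the generalizability concern raised above.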

Questions

Please see the weakness section.

Withdrawal Notice

2024-11-19

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.