PaperHub

Overall score: 7.1/10 · Decision: Poster · 5 reviewers
Ratings: 5, 4, 5, 4, 4 (min 4, max 5, std. dev. 0.5)
Confidence: 2.8 · Novelty: 2.4 · Quality: 2.8 · Clarity: 3.2 · Significance: 3.0

NeurIPS 2025

FuncGenFoil: Airfoil Generation and Editing Model in Function Space

OpenReview · PDF
Submitted: 2025-05-08 · Updated: 2025-10-29
TL;DR

FuncGenFoil is a new generative model that designs and edits high-fidelity airfoils as continuous functions, outperforming prior methods in accuracy and diversity while enabling flexible, resolution-free shape generation.

Abstract

Keywords

Generative Model · AI for Engineering · Generative Model in Function Space · Flow Model

Reviews and Discussion

Official Review (Rating: 5)

The paper proposes a generative model for 2D airfoil generation and editing by combining Gaussian Processes and Fourier Neural Operators through a flow-matching training strategy. It combines advantages from both parametric representations, as well as point-based representations of airfoils. Experiments show that the model surpasses baselines in terms of diversity, smoothness and label error in the conditional airfoil generation task. Further, the model allows for constrained editing of the airfoil.
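Since the summary refers to Gaussian Processes as the source of function-space noise, a minimal illustrative sketch of sampling smooth noise functions from a GP prior may help readers unfamiliar with the idea; the RBF kernel, length scale, and grid below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

# Illustrative sketch: smooth "noise functions" drawn from a Gaussian process
# prior with an RBF kernel. Kernel hyperparameters and grid are assumptions.
def gp_samples(n_points, n_samples, length_scale=0.2, seed=0):
    t = np.linspace(0.0, 1.0, n_points)
    cov = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length_scale**2)
    cov += 1e-6 * np.eye(n_points)          # jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    return t, chol @ rng.normal(size=(n_points, n_samples))

t, samples = gp_samples(129, 4)
print(samples.shape)  # (129, 4)
```

Unlike i.i.d. Gaussian noise per point, each sampled column is a smooth curve, which is what makes such a prior natural for continuous shapes.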

Strengths and Weaknesses

Strengths

The paper is clearly written and well structured. Experiments seem to answer the asked research question and show the advantages of the model. Especially the aerodynamic simulation of generated airfoils highlights that the model is able to generate useful geometries. Ablation studies further underline certain design choices and make them transparent.

Weaknesses

For the conditional airfoil generation task, only one baseline is provided for the smaller UIUC and Super datasets. I see that the best performing baseline on AF-200K was used for the other datasets as well, but given that the other datasets are way smaller in terms of the number of samples available, it would be interesting to see how at least one more baseline model copes with the limited data regime (that is usually the big constraint in this domain).

The model seems to have issues learning angles, as the label errors on the PARSEC parameters $p_{10}$ and $p_{11}$ are quite large relative to baselines. Could you elaborate on that?

The experiment on airfoil editing shows that the model is able to incorporate user constraints. It seems like the editing scales and perturbations during editing are rather small. It would be interesting to see how robust the model is when asked for larger edits.

Questions

  • Is there a reason that only PK-DiT is provided as baseline for the UIUC and Super datasets?
  • Could you provide some visualizations of generated airfoils by the model, maybe also for the baselines?
  • As written above, could you elaborate on the model performance on the error of the PARSEC angle parameters, in particular give an explanation why the model struggles?
  • How does the model handle large edits? Is it robust?

Limitations

Limitations are provided, but only in the appendix. Please move them to the main text.

Justification of Final Rating

The paper can be recommended for acceptance. All points raised in my review were addressed in the rebuttal, which led to me raising the score.

Formatting Concerns

no concerns.

Author Response

Thanks for your feedback and suggestions! We're glad you appreciated our work.

Here are some responses to your questions and concerns:


For the conditional airfoil generation task, only one baseline is provided for the smaller UIUC and Super datasets. I see that the best performing baseline on AF-200K was used for the other datasets as well, but given that the other datasets are way smaller in terms of the number of samples available, it would be interesting to see how at least one more baseline model copes with the limited data regime (that is usually the big constraint in this domain). Is there a reason that only PK-DiT is provided as baseline for the UIUC and Super datasets?

Thank you for your kind suggestion. The original AFBench paper did not report results for models other than PK-DiT, so only PK-DiT was included in Table 1 for the UIUC and Super datasets. However, we trained the PK-VAE model using the original AFBench code and the same settings. We have now added new comparison experiments, with the following results:

| Dataset | Algo. | $\sigma_1$ | $\sigma_2$ | $\sigma_3$ | $\sigma_4$ | $\sigma_5$ | $\sigma_6$ | $\sigma_7$ | $\sigma_8$ | $\sigma_9$ | $\sigma_{10}$ | $\sigma_{11}$ | $\bar{\sigma}_a$ | $\bar{\sigma}_g$ | $\mathcal{D} \uparrow$ | $\mathcal{M} \downarrow$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UIUC | PK-VAE | 80.7 | 20.9 | 12.2 | 1284 | 336.9 | 14.0 | 3726 | 31.7 | 1.99 | 4.8 | 109.9 | 4589 | 69.1 | -93.5 | 7.29 |
| Super | PK-VAE | 10.8 | 17.5 | 2.3 | 1735.6 | 12.1 | 3.28 | 131.5 | 3.5 | 1.49 | 8.38 | 0.69 | 17.9 | 28.4 | -122.8 | 1.38 |

FuncGenFoil outperforms both PK-VAE and PK-DiT on the UIUC and Supercritical datasets, even in this relatively limited data regime.


The model seems to have issues learning angles, as the label errors on the PARSEC parameters $p_{10}$ and $p_{11}$ are quite large relative to baselines. Could you elaborate on that? Could you elaborate on the model performance on the error of the PARSEC angle parameters, in particular give an explanation why the model struggles?

This is because we changed the definition of the trailing-edge angle, as explained in Appendix A.2. In this work, we perform super-resolution inference, increasing the resolution from 257 to 1025 points, as shown in Table 2. We found that the traditional method for calculating the trailing-edge angle is not suitable or numerically stable at higher resolutions, so we updated the definition in our dataset.

This new definition is not sensitive for the supercritical airfoil dataset (which is the main cluster for modern commercial aircraft design), but it can introduce additional noise and linear-regression error for other datasets. As a result, FuncGenFoil shows a higher label error on these particular parameters compared to baselines. However, this deterioration is generally not significant on average and has little practical impact on the use of the generated airfoils. The trailing-edge angle can typically be changed very rapidly by adjusting the positions of only a small number of design points by a small amount, leaving the main shape of the airfoil largely unchanged. This may suggest that the FuncGenFoil model focuses more on the overall shape of the airfoil than on the trailing-edge angle.
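For illustration only, the kind of resolution sensitivity described here can be reproduced with a toy linear-fit estimator of the trailing-edge angle; the synthetic surface and the five-point fit window below are assumptions, not the paper's actual definition.

```python
import numpy as np

def trailing_edge_angle(x, y, n_fit=5):
    """Estimate the trailing-edge angle (radians) of one airfoil surface
    by a least-squares line fit through the last n_fit points."""
    slope = np.polyfit(x[-n_fit:], y[-n_fit:], 1)[0]
    return np.arctan(slope)

# Synthetic surface with a nonzero trailing-edge slope (an illustrative
# assumption, not an airfoil from the paper's datasets).
def surface(x):
    return 0.1 * (1.0 - x) * np.sqrt(x)

x257 = np.linspace(0.0, 1.0, 257)
x1025 = np.linspace(0.0, 1.0, 1025)

# With a fixed number of fit points, the fit window shrinks as the resolution
# grows, so the estimated angle drifts slightly: the kind of numerical
# sensitivity at high resolution mentioned above.
a_lo = trailing_edge_angle(x257, surface(x257))
a_hi = trailing_edge_angle(x1025, surface(x1025))
print(a_lo, a_hi)
```

The two estimates differ even for a perfectly smooth curve, which is why a definition that is stable across resolutions matters.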


The experiment on airfoil editing shows that the model is able to incorporate user constraints. It seems like the editing scales and perturbations during editing are rather small. It would be interesting to see how robust the model is when asked for larger edits. How does the model handle large edits? Is it robust?

Here we provide results for three additional editing scales under the same experimental setting, with the largest being 8 times greater than the original. As shown, the edit error increases at an accelerating rate with larger edits but remains relatively low overall. Given that the average thickness of an airfoil is about 0.1, our editing method remains effective for edit ranges from -5% to 5% under a conservative estimate. In practice, by increasing the finetuning steps and choosing suitable time steps, even larger edit scales or denser constraints are possible, as demonstrated in Figure 4.

| Dataset | Edit Scale | MSE ↓ ($\times 10^{-7}$) | $\mathcal{M}$ ↓ ($\times 10^{-2}$) |
|---|---|---|---|
| Super | 0.0001 | 2.41 | 1.16 |
| | 0.0002 | 2.45 | 1.15 |
| | 0.0004 | 2.75 | 1.15 |
| | 0.0008 | 4.32 | 1.26 |
| | 0.0016 | 15.5 | 1.35 |
| | 0.0032 | 61.7 | 1.49 |

Could you provide some visualizations of generated airfoils by the model, maybe also for the baselines?

Thank you for your suggestion. Due to the new NeurIPS 2025 policy, we are unable to upload PDFs or share external links at this stage. If the paper is accepted, we will include visualizations of the generated airfoils by our model and the baselines both on the public project page and in the appendix of the camera-ready version.

Comment

Many thanks for the rebuttal. My points were answered and I think the paper can be recommended for acceptance. Therefore, I am raising my score to 5.

Official Review (Rating: 4)

Review of Submission #9911

Summary

The paper introduces FuncGenFoil, a generative model to generate airfoil designs/geometries. It is a Flow Matching model with a Fourier Neural Operator (FNO) backbone and is trained on large, open-source datasets (AF-200K, UIUC, Super). FuncGenFoil allows both conditional generation and "freestyle editing" of geometries. The paper shows that FuncGenFoil outperforms previous deep-learning-based approaches (GANs, Diffusion models) in the conditional generation task. Ablation studies justifying architecture and training/generation choices are provided.
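As context for readers unfamiliar with flow matching, a minimal one-dimensional sketch of the training and sampling loop is given below; the toy data, the linear interpolation path, and the linear-in-features velocity model are illustrative assumptions and merely stand in for the paper's FNO backbone.

```python
import numpy as np

# Toy 1-D flow matching (illustrative assumptions throughout): draw x0 from a
# noise distribution and x1 from "data", build the linear path
# x_t = (1 - t) * x0 + t * x1, and regress a model onto the target velocity
# x1 - x0. A linear-in-features model stands in for the neural velocity field.
rng = np.random.default_rng(0)
n = 4096
x0 = rng.normal(size=n)                       # noise samples
x1 = rng.normal(loc=3.0, scale=0.5, size=n)   # toy "data" samples
t = rng.uniform(size=n)
xt = (1.0 - t) * x0 + t * x1
v_target = x1 - x0

features = np.stack([np.ones(n), xt, t, xt * t], axis=1)
w, *_ = np.linalg.lstsq(features, v_target, rcond=None)

# Sampling: integrate the learned ODE dx/dt = v(x, t) from noise toward data.
x = rng.normal(size=2000)
for step in range(100):
    tau = step / 100.0
    x = x + 0.01 * (w[0] + w[1] * x + w[2] * tau + w[3] * x * tau)
print(x.mean())  # the mean drifts from 0 toward the toy data mean
```

The paper's model follows the same recipe but with functions instead of scalars: Gaussian-process noise as the source and an FNO as the velocity model.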

Strengths and Weaknesses

Strengths

  • Good introduction and related work section, introducing the problem setting (airfoil design for the aviation industry) and previous works on this topic, both "classical" computational design (e.g., NURBS, Bézier curves) and deep learning (VAE, GANs, Diffusion) approaches. Good mathematical notation is kept throughout the paper.

  • Clear training description: Flow matching based on [1], Gaussian Processes as noise distribution, conditioning on geometric parameters (PARSEC) with an explanatory Figure.

  • Clear generation/editing description: Inference with ODE solver and finetuning the model for editing is well explained and shown in the Figure.

  • Well-written experimental section, detailing the used metrics, training, and tasks/comparisons.

  • Extensive comparison with other DL approaches (VAE, GANs, Diffusion) for conditional generation.

Weaknesses

  • The limitation of classical methods is not really explained; it is just said that picking certain function families limits the design space. But how strong a limitation is this in practice? Furthermore, neural networks are not random functions [2]. So there is also a clear bias in the shapes/geometries neural networks will learn, depending on the architecture, activation functions, etc., which is well explored in the neural implicit shape and geometry processing literature. We are simply using a different class of function family, but to me it is not clear what underlying problem the authors are trying to solve with this. It looks like just applying a different function class to the problem. I would ask for clarification from the authors on this point.

  • This point follows from W1: while the authors provide extensive comparisons to previous DL methods, they provide none to other "classical" computational design methods. These should serve as a baseline for the data-driven approaches.

  • No comparison methods for the airfoil editing task. Also, the edited airfoils from "Freestyle Airfoil editing" are not being tested for their aerodynamic performance. Instead, a new model is trained on a different dataset to test this. It is not clear to me why, please clarify.

  • No clear overview of the computational cost of dataset generation, training, finetuning (for editing), and inference is provided. Recent review articles of DL methods for numerical problems (PDEs, simulation, shape/topology optimization) have raised the issue that the full computational costs of DL methods/surrogates are often not provided and compared to SoTA numerical methods (see, e.g., [3]). The paper gives the training time, but this should be more thorough.

  • I find Figure 4 a bit hard to read/process; the color choices (light green, light blue, light gray) seem suboptimal. Just a minor point.

[1] Lipman, Yaron, et al. "Flow Matching Guide and Code." arXiv preprint arXiv:2412.06264, 2024.
[2] Teney, Damien, et al. "Neural Redshift: Random Networks are not Random Functions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4786-4796.
[3] Woldseth, Rebekka V., et al. "On the Use of Artificial Neural Networks in Topology Optimisation." Structural and Multidisciplinary Optimization, vol. 65, no. 10, 2022, p. 294.

Questions

  • How does the full cost of FuncGenFoil compare to a classical method using e.g. NURBS? What is the advantage of FuncGenFoil? Are there failure cases of classical methods that it can fix? Are there failure cases of FuncGenFoil?

  • The paper makes the point that FuncGenFoil enables superresolution with smooth, arbitrary-resolution shapes. Is this smoothness not a direct consequence of the FNO backbone, which approximates the data with a finite Fourier representation of the signal (geometric curve)? Is the size of the FNO sum a limit/upper bound of FuncGenFoil similar to parametric-model-based methods?

  • Why use a FNO backbone instead of typical INRs like SIREN? The paper mentions it is possible in principle, but does not elaborate on this choice further. FNOs are not commonly used in neural geometry processing literature.

Limitations

See Weaknesses/Questions

Justification of Final Rating

The main reasons for me raising the score are the data-range table, which shows when the model breaks down, and the added baseline experiment.

Formatting Concerns

No

Author Response

Thanks for your valuable feedback and thoughtful suggestions!

We address your questions and concerns in detail, and hope these responses help clarify our work.


(a) The limitation of classical methods is not really explained; it is just said that picking certain function families limits the design space. But how strong of a limitation is this in practice? (b) Furthermore, neural networks are not random functions [2]. So there is also a clear bias in the shapes/geometries neural networks will learn, depending on the architecture, activation functions, etc, which is well explored in the neural implicit shape and geometry processing literature. We are simply using a different class of function family, but to me it is not clear what underlying problem the authors are trying to solve with this. It looks like just applying a different function class to the problem. I would ask for clarification from the authors on this point.

For question (a):

Selecting specific function families, such as NURBS, inherently limits expressiveness: they cannot exactly represent all smooth functions (e.g., a simple sine curve), and increasing the number of control points adds significant complexity. In practice, such representations restrict the ability to fully explore the design potential of airfoils. Moreover, aerodynamic performance is highly sensitive to curvature and higher-order smoothness, especially in high-Reynolds-number flows. From a design perspective, it is therefore advantageous to unlock the full functional design space. The function-space generative model we propose can represent airfoils both as point sets and as continuous functions at arbitrary resolution.
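To make the expressiveness point concrete, here is a small numpy sketch in which a fixed low-dimensional family (a cubic polynomial, standing in for a curve with few control points) cannot exactly represent a simple sine arc, while enlarging the family shrinks the gap at the cost of more parameters. The target curve and polynomial degrees are assumptions for illustration.

```python
import numpy as np

# A fixed low-dimensional family (a cubic polynomial here, standing in for a
# curve with few control points) cannot exactly represent a simple sine arc.
x = np.linspace(0.0, 2.0 * np.pi, 200)
target = np.sin(x)

coeffs3 = np.polyfit(x, target, deg=3)        # best cubic in least squares
residual3 = np.max(np.abs(np.polyval(coeffs3, x) - target))

# Enlarging the family (more degrees of freedom) shrinks the gap, at the cost
# of more parameters: the complexity/expressiveness trade-off described above.
coeffs9 = np.polyfit(x, target, deg=9)
residual9 = np.max(np.abs(np.polyval(coeffs9, x) - target))
print(residual3, residual9)
```

The residual of the small family stays visibly nonzero no matter how the fit is tuned, which is the sense in which a fixed parametric family restricts the design space.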

For question (b):

Regarding the use of the Fourier neural operator instead of NURBS: as you suggested, this is indeed a choice of a different function class. Our motivation is that Fourier transforms offer theoretical elegance, as they allow us to operate in the spectral domain and recover functions via inverse transforms with minimal loss, leveraging efficient modern computational algorithms. The Fourier neural operator is also supported by theoretical guarantees as a general function approximator [1], making it a suitable and principled choice for our purposes. We agree that the exploration of alternative neural operators, potentially based on NURBS or Bézier curves, remains an open and interesting research direction.


While the authors provide extensive comparisons to previous DL methods, they provide none to other "classical" computational design methods. These should serve as a baseline for the data-driven approaches.

This paper presents a data-driven, model-based method that enables engineers to leverage existing high-quality data more effectively for airfoil generation; by construction, it relies on data.

Classical, model-free methods represent a complementary paradigm rather than a substitute. These methods have the advantage that they do not rely on existing data, but the disadvantage that they cannot extract information from data.

These two kinds of methods are complementary but not directly comparable. Non-data-driven, model-free methods should only be compared with algorithms that are also model-free. Therefore, they should not be included as baselines in our experiments focused on data-driven methods in Table 1.


(a) No comparison methods for the airfoil editing task. (b) Also, the edited airfoils from "Freestyle Airfoil editing" are not being tested for their aerodynamic performance. Instead, a new model is trained on a different dataset to test this. It is not clear to me why, please clarify.

For question (a):

To the best of our knowledge, as of the submission date, there is no existing research utilizing generative models for the Freestyle Airfoil editing task. We are the first to integrate conditional airfoil generation and freestyle airfoil editing within a unified, consistent framework, achieving highly accurate editing results.

For question (b):

For the airfoil editing task, our primary focus is on whether the model accurately follows the designer’s instructions (i.e., achieves low editing error) while preserving airfoil smoothness. Consequently, we do not evaluate aerodynamic performance for edited airfoils in this paper, as there is no well-established benchmark for comparison. For aerodynamic performance evaluation, we use the NASA Common Research Model (CRM) dataset, a widely recognized benchmark with validated working conditions and extensive use in CFD and aerodynamic studies.


No clear overview of the computational cost of dataset generation, training, finetuning, and inference is provided. Recent review articles of DL methods for numerical problems have raised the issue that the full computational costs of DL methods/surrogates are often not provided and compared to SoTA numerical methods (see e.g.[3]). The paper gives the training time, but this should be more thorough.

We will include a table summarizing the computational costs, memory usage, and runtimes in the paper. Regarding dataset generation time, we are unable to provide this information as we only utilize publicly available datasets created by other researchers.

| | Wall-clock for 1,000 epochs (RTX 4090) | # NFEs at test time | Mean inference time (RTX 4090 + i9-13900K) | GPU memory at test |
|---|---|---|---|---|
| PK-DiT (score matching) | ≈ 10 h | 50 (DDIM) | 220 ms | ≈ 200 MB |
| FuncGenFoil (flow matching) | < 6 h | 10 | 50 ms | ≈ 200 MB |

I find Figure 4 a bit hard to read/process; the color choices (light green, light blue, light gray) seem suboptimal. Just a minor point.

Thank you for pointing this out. We will improve the color choices in Figure 4 and select a palette with higher contrast to enhance readability.


How does the full cost of FuncGenFoil compare to a classical method using e.g. NURBS? What is the advantage of FuncGenFoil? Are there failure cases of classical methods that it can fix? Are there failure cases of FuncGenFoil?

For an aircraft engineer, designing an airfoil using traditional NURBS methods requires several minutes of manual work and CAD expertise. Alternatively, using a model-free optimization method takes about 5-10 minutes and thousands of iterations to fit a 24-control-point NURBS airfoil, depending on the initial shape condition. In contrast, FuncGenFoil can generate a new airfoil in about one second.

Classical methods may fail when control points are too close together or too few in number, leading to numerical instability and shape errors, particularly near the leading edge where control points are densely clustered. FuncGenFoil avoids these issues by not relying on control points.

For FuncGenFoil, failures typically occur when generating airfoils far outside the training data range (e.g., when an unrealistically small or large leading-edge radius is specified).


The paper makes the point that FuncGenFoil enables superresolution with smooth, arbitrary resolution shapes. Is this smoothness not a direct consequence of the FNO backbone, which approximates the data with a finite Fourier representation of the signal (geometric curve)? Is the size of the FNO sum a limit/upper bound of FuncGenFoil similar to parametric-model-based methods?

The smoothness of the output is indeed influenced by both the type of operator used and the choice of Gaussian process kernel during operator flow matching. Since FuncGenFoil is both a parametric-model-based and a point-based method, smoothness and resolution are primarily determined by the number of modes specified for the neural operator. Increasing the number of modes allows the model to capture more information from the data, reducing truncation error and enabling finer control over shape details. While, in theory, a Fourier transform (and its inverse, without spectrum truncation) can represent smooth curves with zero error, in practice capturing high-frequency information may require increasing the model size and mode width. In this paper, the performance differences between model sizes are minor, as demonstrated in Table 8.
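A toy analogue of the resolution-free behavior described above is spectral upsampling by zero-padding Fourier coefficients; the band-limited test curve and the 257-to-1025 resampling below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def fourier_upsample(y, new_len):
    """Resample a periodic signal to new_len points via FFT zero-padding."""
    spectrum = np.fft.rfft(y)
    padded = np.zeros(new_len // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # Rescale to account for numpy's unnormalized forward FFT.
    return np.fft.irfft(padded, n=new_len) * (new_len / len(y))

# Band-limited toy curve sampled at 257 points, upsampled to 1025 (the
# resolutions quoted above); the curve itself is an illustrative assumption.
t_lo = np.linspace(0.0, 2.0 * np.pi, 257, endpoint=False)
y_lo = 0.1 * np.sin(t_lo) + 0.02 * np.sin(3.0 * t_lo)

y_hi = fourier_upsample(y_lo, 1025)
t_hi = np.linspace(0.0, 2.0 * np.pi, 1025, endpoint=False)
err = np.max(np.abs(y_hi - (0.1 * np.sin(t_hi) + 0.02 * np.sin(3.0 * t_hi))))
print(err)  # near machine precision for a band-limited signal
```

For a signal whose energy sits entirely below the retained modes, the upsampled curve is exact; truncated high-frequency content is what introduces the error discussed above.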


Why use a FNO backbone instead of typical INRs like SIREN? The paper mentions it is possible in principle, but does not elaborate on this choice further. FNOs are not commonly used in neural geometry processing literature.

To model vector operators in infinite-dimensional Hilbert spaces, several options are available, including INRs and neural operators [1]. INRs are a good choice for neural geometry processing due to their ability to perform arbitrary-point inference. However, when using INRs in generative models in function space, one must specifically design a functional encoder for function-type inputs to ensure the model is invariant to different resolutions. For example, as a network for the velocity operator $dx = v_{\theta}(x)$, both the input $x$ and the output $dx$ can be of any resolution; the original INR can only guarantee that the output is of arbitrary resolution, not the input. This encoder in the INR should act as an operator in the mathematical sense, since an operator is a transformation between functions and is naturally invariant to the input resolution.
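The resolution-invariance property described here can be sketched with a minimal spectral layer in the spirit of an FNO block: it acts on a fixed number of low Fourier modes and therefore accepts inputs at any resolution. The random weights and the test signal are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

# Minimal resolution-invariant spectral layer in the spirit of an FNO block:
# transform to the Fourier domain, act on a fixed number of low modes, and
# transform back at whatever resolution the input has.
class SpectralLayer:
    def __init__(self, n_modes, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)

    def __call__(self, y):
        n = len(y)
        spectrum = np.fft.rfft(y) / n          # resolution-independent scaling
        out = np.zeros_like(spectrum)
        k = min(len(self.weights), len(spectrum))
        out[:k] = spectrum[:k] * self.weights[:k]
        return np.fft.irfft(out, n=n) * n

layer = SpectralLayer(n_modes=8)
f = lambda t: np.sin(t) + 0.3 * np.cos(2.0 * t)

t_lo = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
t_hi = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)

# The same operator applied at two resolutions agrees on the shared grid
# (every 4th point of the fine grid coincides with the coarse grid).
out_lo = layer(f(t_lo))
out_hi = layer(f(t_hi))
gap = np.max(np.abs(out_lo - out_hi[::4]))
print(gap)  # tiny: the layer behaves identically at both resolutions
```

Because the layer's parameters live in the (truncated) spectral domain rather than on a grid, nothing about it depends on the number of input samples, which is the operator property contrasted with plain INRs above.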

Neural operators are widely adopted in AI for science and engineering [1], particularly for solving PDEs and in computational fluid dynamics applications such as weather forecasting [2]. Given their strong theoretical foundation and proven effectiveness in modeling complex physical phenomena, FNOs are especially suitable for aerodynamic modeling and research, so we chose the neural operator as the backbone of our approach.

[1] Neural Operators for Accelerating Scientific Simulations and Design. Nature Reviews Physics, 2024.

[2] FourCastNet: Accelerating Global High-Resolution Weather Forecasting Using Adaptive Fourier Neural Operators. Proceedings of the Platform for Advanced Scientific Computing Conference, 2023.

Comment

Qs to NURBS: Thanks for the explanation. I would encourage you to add a clearer explanation of the drawbacks of classical representations and what real limitations for users/practitioners they impose in the final version.

Qs to FNO: I'm still kinda interested to see what happens for other backbones, but it is out-of-scope here. Thanks for the detailed response/references here.

Table with computational costs: Thanks for providing this.

Freestyle editing: If there is no established framework to test these editing tasks, it's hard to provide numbers; makes sense.

Failure cases: I believe it would strengthen the submission significantly if there was at least an appendix section quantifying the range in which the model works and when it breaks down, e.g. what the limits are on the leading-edge radius.

Classical baseline: This remains a major point of disagreement for me. Yes, many data-driven / deep learning based methods do not include classical baselines and of course there are other tradeoffs to consider like high computational costs. But I believe they are needed to put data-driven methods and their advancements into perspective. Failing to do this has caused large issues in research before [4]. I can't recommend acceptance if this isn't provided as I believe it is absolutely vital.

[4] McGreivy, Nick, and Ammar Hakim. "Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations." Nature machine intelligence 6.10 (2024): 1256-1269.

Comment

Thank you for your valuable feedback. We are glad to see that you found our explanations helpful, and we sincerely appreciate your further suggestions.

We apologize for not fully elaborating on some topics in our previous response due to NeurIPS’s new 10,000-word response limit, which may have led to some misunderstandings.


Qs to NURBS.

Thank you for the suggestion. We will add a detailed discussion of the limitations of classical representations in a separate appendix section in the final version of the paper.


Failure cases.

Quantifying the range in which the model works and identifying when it breaks down is helpful for users. We provide a statistical analysis of the dataset and specify the effective range where the model is fully supported by the data. We will include this table in the appendix, and we will provide figures showing failure cases in the final version of the paper.

| Index | Parameter | Range (minimum) | Range (maximum) |
|---|---|---|---|
| 1 | leading edge radius | 0.0073 | 0.0140 |
| 2 | upper crest position x | 0.396 | 0.520 |
| 3 | upper crest position y | 0.0592 | 0.0784 |
| 4 | upper crest curvature | -0.458 | -0.210 |
| 5 | lower crest position x | 0.318 | 0.410 |
| 6 | lower crest position y | -0.0589 | -0.0414 |
| 7 | lower crest curvature | 0.373 | 0.805 |
| 8 | trailing edge position | -0.0001 | 0.0001 |
| 9 | trailing thickness | 0.0020 | 0.0075 |
| 10 | trailing edge angle up | -0.5514 | -0.2254 |
| 11 | trailing edge angle down | -0.4477 | -0.1397 |

Classical baseline: ... Failing to do this has caused large issues in research before [4].

We share your concern that the lack of classical baselines has led to significant shortcomings in previous research, particularly in applying data-driven methods to PDEs, where classical methods are the gold-standard baselines. However, this concern does not apply to our work, because our problem setting differs fundamentally from those discussed in [4].

Our work does not use data-driven methods to solve PDEs. Instead, we employ a generative model to assist experts in designing airfoil geometries. This is a generation problem and does not have a unique solution; e.g., there are many valid design solutions for a 100-meter ship. It is analogous to training a Stable Diffusion model on a dataset of artistic images: after training, the model can create new, high-quality images that are distinct from those in the training set. Direct comparison to a "classical baseline" like a human artist is inappropriate, as the model serves as a creative tool for human designers.

In practice, implementing airfoil generation is complex. It is challenging to identify a classical method for this task, since no classical approach can generate a valid airfoil solely from these 11 parameters fairly. We clarify this with the following possible approaches:

Case A: A parametric curve (e.g., NURBS) with 24 or more control points. Control points are randomly initialized, and optimization algorithms adjust them to fit the 11 parameters. The resulting shape is not guaranteed to be a valid airfoil, as the 11 parameters do not fully define the geometry. The mapping from airfoil to parameters is not reversible—the curve contains more information than the parameters.

Case B: Similar to Case A, but control points are initialized from an existing airfoil designed by an expert. Optimization proceeds as before, and the resulting shape is more likely to be valid. However, using a similar airfoil as the starting point is not a fair baseline, and it is impractical to find an appropriate initialization for every airfoil in a dataset.

Case C: This is the standard engineering approach. Start with an airfoil and use CAD to iteratively adjust features such as the leading-edge radius or crest positions. Each step is evaluated with simulation software (e.g., Ansys CFX), and the process repeats until the requirements are met. This is how the dataset was actually created, much like an artist revising a painting until satisfied. The task is generative and iterative, not a static solution problem.

It is not easy to give a practical classical baseline for this task. For comparison, image generation models like Stable Diffusion also lack clear classical baselines on datasets such as ImageNet.

In contrast, when solving PDEs, the solution is unique given a set of boundary and initial conditions and the same numerical method, such as RANS or Large Eddy Simulation. In these cases, a real classical baseline does exist, so it is vital to compare data-driven methods with classical methods in a fair setting, where the classical method uses the actual equations and the data-driven method uses data from the real world.

This is why we cannot provide a classical method as a baseline.

We apologize again for any misunderstanding caused by our previous response due to the word limit. If you have any further questions or concerns, please feel free to let us know; we are willing to discuss further.

Comment

To better demonstrate the performance of classical (model-free and non-data-driven) methods, we conducted an experimental test implementing the Case B plan mentioned in our last response. In this approach, a NURBS curve with 24 control points is used to represent the airfoil $x$. We re-implemented the evaluator $c = F(x)$ differentiably to enable gradient-based optimization, aiming to minimize the gap between the evaluated 11 parameters $c$ and the required 11 parameters $c_{\text{req}}$, as measured by $\text{MSE}(c, c_{\text{req}})$.

For the experimental settings, we initialized the 24 control points to fit the average airfoil in the test dataset. The optimization was performed sequentially for all 3,825 supercritical airfoils in the test dataset. For each airfoil, we ran 100 steps of the Adam optimizer with a learning rate of $3\times10^{-4}$, running until the MSE no longer decreased and the optimization had converged. The total computation cost was approximately 20 GPU-hours on an Nvidia A800 GPU.
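A simplified sketch of this Case B setup is shown below: a curve that is linear in its control points is fitted to a target by gradient descent. We substitute a Bézier curve for the NURBS representation, plain gradient descent for Adam, and a coordinate-fitting loss for the differentiable PARSEC evaluator, so this is an illustrative analogue, not the experiment's code.

```python
import numpy as np
from math import comb

# Bezier curves are linear in their control points, so the fit below uses
# plain gradient descent on the mean squared error. Curve, target, and
# optimizer settings are illustrative assumptions.
def bernstein_basis(n_ctrl, t):
    """Design matrix B with B[i, j] = Bernstein_j(t[i]); curve = B @ ctrl."""
    n = n_ctrl - 1
    return np.stack(
        [comb(n, j) * t**j * (1 - t) ** (n - j) for j in range(n_ctrl)], axis=1
    )

t = np.linspace(0.0, 1.0, 129)
target = 0.1 * np.sin(np.pi * t) * (1.0 - t)    # airfoil-like thickness curve
B = bernstein_basis(8, t)

ctrl = np.zeros(8)                              # start from a flat curve
lr = 0.5
for _ in range(2000):
    residual = B @ ctrl - target
    ctrl -= lr * 2.0 * B.T @ residual / len(t)  # gradient of the MSE

mse = np.mean((B @ ctrl - target) ** 2)
print(mse)
```

Because the curve is linear in its control points, the loss is convex and the descent converges; the full experiment's loss through the PARSEC evaluator is not convex, which is consistent with the instability noted below.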

The resulting errors are summarized in the following tables:

| Dataset | Algo. | $\sigma_1$ | $\sigma_2$ | $\sigma_3$ | $\sigma_4$ | $\sigma_5$ | $\sigma_6$ | $\sigma_7$ | $\sigma_8$ | $\sigma_9$ | $\sigma_{10}$ | $\sigma_{11}$ | $\bar{\sigma}_a$ | $\bar{\sigma}_g$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Super | NURBS fitting | 10.8 | 17.4 | 4.5 | 64.5 | 9.79 | 3.56 | 113.2 | 1.53 | 1.72 | 0.065 | 0.106 | 20.6 | 3.97 |
| Super | FuncGenFoil | 0.71 | 8.23 | 0.13 | 201.3 | 4.72 | 0.12 | 174.2 | 0.09 | 0.14 | 34.2 | 36.7 | 41.9 | 3.08 |

Since we initialize the optimization from a position very close to the target in the test dataset, this baseline can be considered the optimal performance that a naive 24-control-point NURBS curve can achieve for fitting the design parameters. Compared to Table 1, the FuncGenFoil method closely approaches this upper bound in terms of accuracy.

The NURBS fitting baseline performs exceptionally well in controlling the trailing edge, as the control points are densely distributed in this region. However, FuncGenFoil demonstrates superior performance at the leading edge, since the NURBS fitting baseline tends to turn sharply at the leading edge and cannot fit it accurately with only 24 control points.

In general, this is not a fair comparison, because we initialize the control points very close to the target for training stability, making convergence relatively easy. In contrast, FuncGenFoil or PK-DiT generate new airfoils starting from pure Gaussian noise. (We have observed that the optimization process is delicate and can easily become unstable if the initial control points are perturbed more randomly, causing the control points to diverge after just a few training steps.)

Nevertheless, this experiment provides a useful reference for the potential best performance of a model-free parametric method, provided that each optimization proceeds successfully. We will open-source the code for this test in our repository. We hope this information helps address your concerns. If you have any further questions or would like more details, we welcome further discussion.

Comment

  • Effective model range / failure cases:

This is exactly what I was looking for, I believe this is a great addition and makes the capabilities of your method much more clear.

  • Baseline: I understand that shape optimization in practice is often quite tricky and can be a bit of an art (Case C). But as I understand it, there is previous work on shape optimization for airfoils, also with NURBS representations, so I would encourage you to at least cite some of it in the intro or related work section. To me it is sufficient to have at least this basic comparison in the paper.

I thank the authors for the provided clarifications and additional tables/experiments. With these additions I will increase my score and recommend acceptance.

Review
5

This paper presents a novel generative model for generating airfoil geometries (2D contours) based on flow matching. The proposed model includes an optional conditioning variable and supports customized editing on the airfoil geometry. The paper evaluates the proposed model on two tasks: conditional generation and freestyle editing, and compares its performance with multiple baselines on several established datasets.

Strengths and Weaknesses

Strengths:

  1. I think the paper tackles a valuable research problem through the lens of modern AI tools. Exploring airfoil geometry designs may have a significant impact, especially if the proposed AI method can discover novel geometries with high-performance aerodynamics.
  2. The technical method proposes a generative model in a function space, rather than the standard Bezier-curve shape space used in airfoil design. This is a novel and timely exploration of diffusion and flow models in this specific problem domain.
  3. I appreciate that the experiments also include aerodynamic simulations of the shapes generated by the proposed generative model. This makes the proposed method more useful in practice.

Weaknesses:

  1. The paper does not present a hardware prototype to confirm the physical fidelity of the results. For example, whether the airfoil geometry proposed by the generative model is more useful than existing designs remains unclear. This is probably beyond the scope of this work, though.

Questions

Overall, the paper appears to be in good shape, and I do not have major concerns with it at this time. I am not an expert in airfoil design, though.

That said, I still have a few questions:

  1. I didn’t find any discussions on the paper’s limitations in the main paper. Please discuss.
  2. Following up on the weakness I mentioned above, is there a way to validate the performance of these explored airfoil designs in the real world?
  3. I’d also like to ask about the potential of this method for 3D airfoil design.

Limitations

The paper seems to lack discussions on its limitations. I suggest the authors add a paragraph about it.

Final Justification

Overall, I think this is a good paper. The problem scope may be niche for a general ML conference, but the solution is interesting and novel, and the downstream application is valuable.

The rebuttal has adequately answered the weaknesses and questions raised in my original review. I am happy to maintain my original score and recommend acceptance.

Formatting Issues

None.

Author Response

Thanks for your feedback and suggestions! I'm glad you appreciated our work.

Here are our responses to your concerns:


The paper does not present a hardware prototype to confirm the physical fidelity of the results. For example, whether the airfoil geometry proposed by the generative model is more useful than existing designs remains unclear. This is probably beyond the scope of this work, though. Is there a way to validate the performance of these explored airfoil designs in the real world?

Yes, this work is part of a generative design module that has been integrated into modern commercial aircraft design systems in collaboration with aerodynamic experts. Due to commercial confidentiality, we are unable to disclose details regarding the real-world performance of generated or optimized airfoils deployed in industry. Instead, our study utilizes public datasets, on which the proposed techniques demonstrate similar performance trends. As shown in Figure 5, the simulation results for the generated airfoils exhibit trends that closely align with those observed in real-world applications.


I didn’t find any discussions on the paper’s limitations in the main paper. Please discuss.

We apologize for not including a discussion of the paper’s limitations in the main text due to the 9-page length restriction. However, we have provided this discussion in Appendix E following the main text.

The primary limitation of FuncGenFoil is that it is applicable only to curves or surfaces that can be parameterized with a suitable coordinate system. For more complex surfaces, where defining such a coordinate system is challenging, FuncGenFoil is not directly applicable. Addressing these challenges will require further theoretical advancements, which we consider an important direction for future research. In addition, future work should introduce more kinds of real-world constraints into the model, such as mechanical and manufacturing constraints, to enhance its applicability to practical engineering problems.


I’d also like to ask about the potential of this method for 3D airfoil design.

FuncGenFoil can be applied to a variety of aircraft component surfaces, such as fuselages, engine nacelles, and turbomachinery blades, as long as a suitable coordinate system can be defined for the geometry. For example, we can generate the 3D main wing surface of an aircraft from a larger dataset of 3D airfoil geometries under a cylindrical coordinate system.

For more complex and irregular shapes—such as full aircraft surfaces or automotive bodies—where it is difficult to define a parameterization or coordinate system, further advances in differential geometry and manifold theory would be needed to enhance the FuncGenFoil method. For example, generative models could be used to reconstruct the deformation of general shapes parameterized in geodesic coordinates. This remains an open and challenging problem, and is certainly a promising direction for future research.

Comments

I would like to thank the authors for their rebuttal, which has addressed the questions in my review. I'm happy to maintain my positive score.

  1. It's good to know your work's connection to commercial aircraft design systems.

  2. Yes, please include them in the main paper.

  3. Thank you for the clarification. I agree with this work's potential in 3D designs.

Review
4

This paper focuses on an important task: generating airfoils that satisfy requirements on several key parameters. Instead of generating hyperparameters that define the airfoil shape or its discrete boundary points, the paper proposes representing the airfoil as a closed-form function. Using operator flow matching and sampling from a Gaussian distribution, it then generates the function for the target airfoil step by step.

Strengths and Weaknesses

Strength:

  1. The experimental results achieve state-of-the-art performance on AFBench, with a significant boost in accuracy. As shown in Table 1, the error is reduced by orders of magnitude.

  2. The writing of this paper is clear and easy to follow. The idea of representing the airfoil as a closed-form function is well motivated, and as a result the method can produce airfoils at any resolution.

Weakness:

  1. A more thorough efficiency study of the proposed method should be provided to better demonstrate its advantages. In particular, a comparison with discrete point-based generation methods would be valuable. Intuitively, the proposed approach should offer efficiency gains over these methods, and it would strengthen the paper to provide evidence supporting this.

Questions

N/A.

Limitations

N/A.

Final Justification

Airfoil generation is an important task, and the idea of operator flow matching achieves great performance on the current benchmarks.

Formatting Issues

N/A.

Author Response

Thanks for your feedback and suggestions! I'm glad you appreciated our work.

Here are our responses to your concerns:


A more thorough efficiency study of the proposed method should be provided to better demonstrate its advantages. In particular, a comparison with discrete point-based generation methods would be valuable. Intuitively, the proposed approach should offer efficiency gains over these methods, and it would strengthen the paper to provide evidence supporting this.

We agree that a more thorough efficiency study is beneficial. In short, both PK-DIT and FuncGenFoil have comparable model sizes and utilize MSE-type losses (flow matching for FuncGenFoil, score matching for PK-DIT), resulting in similar computational costs during training. On our current setup, which includes an NVIDIA 4090 GPU, PK-DIT can be trained for 1000 epochs on the supercritical datasets in approximately 10 hours, whereas FuncGenFoil completes 1000 epochs in less than 6 hours, benefiting from the modern dataloader provided by the TorchRL library. As for the AFBench dataset, training both models takes around 10 times longer because the dataset is larger and more complex, but the relative performance remains similar.

For inference, PK-DIT incurs a higher computational cost due to the use of a large number of function evaluations (NFE = 50) with the DDIM sampling scheme in their original GitHub implementation. This could be improved by adopting more advanced diffusion inference techniques such as DPM-Solver++. In contrast, FuncGenFoil, as a flow matching model, generates flows directly and requires only 10 NFEs to achieve competitive performance.
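As background, flow-matching inference amounts to integrating the learned velocity field from t = 0 to t = 1, so the NFE equals the number of solver steps. A minimal Euler-solver sketch follows; the closed-form `toy_velocity` below is a purely illustrative stand-in for the trained neural operator.

```python
def euler_sample(velocity, x0, nfe=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with `nfe` fixed
    Euler steps; each step costs one model call, so NFE = number of steps."""
    dt = 1.0 / nfe
    x, t = list(x0), 0.0
    for _ in range(nfe):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x

# Illustrative stand-in for the learned field: on the straight (rectified-flow)
# path x_t = (1 - t) * x0 + t * x1 toward a fixed target x1, the exact velocity
# is v(x, t) = (x1 - x) / (1 - t); the denominator is clamped near t = 1.
def toy_velocity(x, t, x1=(1.0, -2.0, 0.5)):
    s = max(1.0 - t, 1e-6)
    return [(b - a) / s for a, b in zip(x, x1)]
```

On this linear toy path even 10 Euler steps land exactly on the target; a trained velocity field is nonlinear, which is why the NFE budget still matters in practice.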

Both PK-DIT and FuncGenFoil are lightweight and can be efficiently run on consumer GPUs. On a platform with an NVIDIA 4090 and Intel 13900K, the average inference time is 220 ms for PK-DIT and 50 ms for FuncGenFoil. GPU memory usage for both models is around 200 MB, which is quite small. We will include a table summarizing the exact computational costs, memory usage, and runtimes in the appendix if the paper is accepted.

Comments

Thanks to the authors for their efforts in the rebuttal; my concerns have been addressed.

Review
4

The paper presents FuncGenFoil, a novel generative model for airfoil design that operates in function space rather than traditional discrete or parametric representations. By leveraging Fourier Neural Operators (FNO) and Flow Matching, FuncGenFoil enables:

  • Resolution-free generation of smooth and continuous airfoil shapes.

  • Flexible conditional generation, guided by geometric parameters.

  • Freestyle editing capabilities via a MAP-based fine-tuning procedure.

The method demonstrates significant improvements in label error, diversity, and smoothness over existing state-of-the-art baselines (e.g., CVAE, CGAN, PK-DIT), across multiple datasets (AF-200K, UIUC, Supercritical Airfoil). The model also generalizes well to different sampling resolutions and shows physically plausible aerodynamic performance through CFD simulation.

Strengths and Weaknesses

Strength

  1. Novelty and Technical Innovation: Modeling airfoils as continuous functions in function space, rather than point clouds or fixed parametric families, is a substantial conceptual advancement. Avoids limitations in discrete point set expressiveness and parametric function rigidity. Uses Flow Matching with Neural Operators to support infinite-dimensional generative modeling. Supports conditional generation and geometric editing through a unified framework.

  2. Thorough Experiments: Strong improvements in label accuracy (e.g., 74.4% reduction on AF-200K), diversity, and smoothness. Extensive ablation studies (kernel choice, ODE solvers, Fourier modes). Evaluation includes classical metrics and physics-based simulation.

Weaknesses

  1. Complexity and Accessibility: The method requires a sophisticated combination of operator learning, Gaussian processes, and differential solvers, which may hinder reproducibility.

  2. Generality of Shape Types: Although the method is proposed in a general form, the experiments are restricted to airfoils and wings, which are relatively smooth and regular.

  3. Runtime Efficiency: While performance is strong, training involves millions of iterations and inference requires solving ODEs, which may be costly in time or resources.

Questions

  1. Generality Beyond Airfoils: Have you considered applying FuncGenFoil to more general or irregular geometries (e.g., full aircraft surfaces or automotive shapes)?

  2. Editing Scope and Flexibility: How does the method handle conflicting or dense constraints (e.g., multiple overlapping or incompatible edits)?

  3. Scalability and Efficiency: What are the training/inference times compared to point-based models like PK-DIT or diffusion models? It is suggested to include a table comparing computational costs, memory usage, and runtime to help readers assess deployment feasibility.

  4. Design Parameter Interpretability: Can the latent space or generated airfoils be interpreted in terms of classical aerodynamic design principles?

Limitations

yes

Final Justification

During the rebuttal process, the authors' responses have addressed most of my concerns. I remain generally positive about the paper, and my final decision is Weak Accept. However, I did not choose Accept because I believe the overall quality and potential impact on the community are somewhat limited.

Formatting Issues

No major formatting issues.

Author Response

Thanks for your feedback and suggestions! I'm glad you appreciated our work.

Here are some responses to your questions and concerns:


Complexity and Accessibility: The method requires a sophisticated combination of operator learning, Gaussian processes, and differential solvers, which may hinder reproducibility.

First, we have included the complete code in the supplementary material (zip file) and are committed to open-sourcing our work to ensure reproducibility. Second, our training code is built upon several widely-used open-source packages and maintains a manageable level of complexity. Specifically, our methods are implemented using a standard neural operator library [1]. Notably, we introduce a novel and tractable loss function to model functional distributions over objectives and constraints, which regularizes model outputs in function space. Additionally, we employ a learnable prior during fine-tuning, which adds some complexity compared to other generative model frameworks.


Generality of Shape Types & Generality Beyond Airfoils: Although the method is proposed in a general form, the experiments are restricted to airfoils and wings, which are relatively smooth and regular. Have you considered applying FuncGenFoil to more general or irregular geometries (e.g., full aircraft surfaces or automotive shapes)?

Yes, FuncGenFoil can theoretically be applied to a variety of aircraft component surfaces, such as fuselages, engine nacelles, and turbomachinery blades, as long as a suitable coordinate system can be defined for the geometry. In this paper, we focus on airfoils, as they are critical to an aircraft's lift and drag performance.

For more complex and irregular shapes—such as full aircraft surfaces or automotive bodies—where it is difficult to define a parameterization or coordinate system, further advances in differential geometry and manifold theory would be needed to enhance the FuncGenFoil method. For example, generative models could be used to reconstruct the deformation of general shapes parameterized in geodesic coordinates. This remains an open and challenging problem, and is certainly a promising direction for future research.


Editing Scope and Flexibility: How does the method handle conflicting or dense constraints (e.g., multiple overlapping or incompatible edits)?

For conflicting constraints, the model does not strictly satisfy all conditions but instead seeks a reasonable airfoil design that balances errors between conflicting objectives. This trade-off is quantitatively determined by the scalar value of the associated penalty term.

For overlapping or repeated constraints, the model remains effective. Technically, the multiple-constraint loss continues to perform robustly. As shown in Figure 4 and described by Equation 8, even with dense constraints—covering up to 60% of all points—repeating the constraints leads to editing errors of less than $10^{-6}$, which is negligible.
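The trade-off controlled by the penalty scalar can be illustrated with a quadratic toy objective. This is a schematic, not the paper's function-space MAP loss; the closed-form update follows by differentiating the objective per coordinate.

```python
def penalized_edit(prior, constraints, lam):
    """Minimize  ||x - prior||^2 + lam * sum_i (x[i] - c_i)^2  over x.
    The objective separates per coordinate, so the minimizer is closed-form:
    constrained points move to a lam-weighted average of prior and target,
    while unconstrained points stay at the prior."""
    x = list(prior)
    for i, c in constraints.items():
        x[i] = (prior[i] + lam * c) / (1.0 + lam)
    return x
```

As `lam` grows, constrained points converge to their targets (the residual shrinks like 1/lam), mirroring the near-zero editing errors reported above; with `lam = 0` the edit reduces to the prior.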


Runtime Efficiency: While performance is strong, training involves millions of iterations and inference requires solving ODEs, which may be costly in time or resources.

On an NVIDIA 4090 GPU, PK-DIT can be trained for 1000 epochs on the supercritical datasets in approximately 10 hours, while FuncGenFoil completes 1000 epochs in under 6 hours. This efficiency is attributed to our use of the modern flow matching training scheme, which not only accelerates convergence but also improves performance.

For inference, PK-DIT incurs a higher computational cost due to the use of a large number of function evaluations (NFE = 50) with the DDIM sampling scheme. This can potentially be reduced by employing more advanced diffusion inference techniques, such as DPM-Solver++. In contrast, FuncGenFoil, as a flow matching model, directly generates flows and requires only 10 NFEs to achieve competitive performance.

Both PK-DIT and FuncGenFoil are lightweight models that can be efficiently executed on consumer GPUs. On a system equipped with an NVIDIA 4090 and Intel 13900K, the average inference time is 220 ms for PK-DIT and 50 ms for FuncGenFoil, with GPU memory usage for both models remaining around 200 MB.


Scalability and Efficiency: What are the training/inference times compared to point-based models like PK-DIT or diffusion models? It is suggested to include a table comparing computational costs, memory usage, and runtime to help readers assess deployment feasibility.

| Model | Wall-clock for 1,000 epochs (RTX 4090) | # NFEs at test time | Mean inference time (RTX 4090 + i9-13900K) | GPU memory at test |
|---|---|---|---|---|
| PK-DIT (score matching) | ≈ 10 h | 50 (DDIM) | 220 ms | ≈ 200 MB |
| FuncGenFoil (flow matching) | < 6 h | 10 | 50 ms | ≈ 200 MB |

We will include a table summarizing these computational costs, memory usage, and runtimes in the paper.


Design Parameter Interpretability: Can the latent space or generated airfoils be interpreted in terms of classical aerodynamic design principles?

The flow generation path acts as a bridge between the data space and the latent space. The data space consists of numerous airfoils designed according to classical aerodynamic principles by human experts, while the latent space represents a function space with a Gaussian measure. Because the generation path is reversible in both directions, every expert-designed airfoil corresponds to a unique latent function. By leveraging modern interpolation techniques such as spherical interpolation (SLERP)[3] or Latent Optimal Linear combinations (LOL)[4], we can interpolate between latent representations of different airfoils. This allows us to generate new airfoils that transition smoothly between designs, all of which adhere to classical aerodynamic design principles and exhibit similar aerodynamic performance, as illustrated in Figure 5.
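The SLERP operation [3] mentioned above can be sketched on two latent vectors as follows (plain-Python and illustrative; in our setting the latents are discretized function samples, and [3]'s formula is applied elementwise over the vectorized representation):

```python
import math

def slerp(z0, z1, t):
    """Spherical linear interpolation between latent vectors z0 and z1."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    omega = math.acos(max(-1.0, min(1.0, dot / (n0 * n1))))
    if omega < 1e-8:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    s = math.sin(omega)
    w0, w1 = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]
```

Unlike straight linear interpolation, SLERP keeps equal-norm endpoints on the same sphere, which matters when interpolating Gaussian latents whose norms concentrate around a fixed radius.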

[1] Jean Kossaifi, Nikola Kovachki, Zongyi Li, David Pitt, Miguel Liu-Schiaffini, Robert Joseph George, Boris Bonev, Kamyar Azizzadenesheli, Julius Berner, and Anima Anandkumar. A Library for Learning Neural Operators. arXiv:2412.10354.

[2] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural Ordinary Differential Equations. NeurIPS 2018.

[3] Ken Shoemake. Animating Rotation with Quaternion Curves. In Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, pp. 245–254, 1985.

[4] Erik Bodin, Alexandru Stere, Dragos D. Margineantu, Carl Henrik Ek, and Henry Moss. Linear Combinations of Latents in Generative Models: Subspaces and Beyond. ICLR 2025.

Comments

Thank you to the authors for their responses in the rebuttal. Most of my concerns have been addressed by the provided response. I hope the authors can clarify these points further in the revised version.

Final Decision

The paper introduces a generative model for producing 2D airfoil designs. The novelty lies in directly generating the geometries as function curves, rather than using parametric curves or point sets. This is achieved with an architecture that optimizes a flow-matching criterion using an FNO backbone. The model operates in function space, leveraging recent advances in diffusion and flow-based methods. It also supports airfoil editing, i.e., generating shapes while preserving user-defined sections. The approach is evaluated in two contexts: generating airfoils subject to specified constraints and performing editing tasks. Comparisons are made against several generative baselines on three datasets.

The reviewers agree on the originality of the idea—directly generating 2D curves—and on the contribution of leveraging recent advances in generative models. They acknowledge the evaluation, which demonstrates good performance relative to alternative generative methods. However, they also raised several issues and requested clarifications, including details on computational and memory complexity, efficiency, comparisons with discrete point-based methods and classical design techniques, and analysis of limitations and the effective model range.

In their rebuttal, the authors provided detailed answers to all reviewer questions and included additional experiments, comparing against baselines and analyzing the model’s range. The reviewers consider that their concerns have been resolved, and they all recommend acceptance.