PaperHub
NeurIPS 2024 · Poster

Overall rating: 6.0/10 (4 reviewers; scores 6, 7, 7, 4; min 4, max 7, std 1.2)
Confidence: 3.5
Correctness: 2.8 · Contribution: 2.8 · Presentation: 2.5

Gliding over the Pareto Front with Uniform Designs

OpenReview · PDF
Submitted: 2024-05-13 · Updated: 2025-01-15

Abstract

Multiobjective optimization (MOO) plays a critical role in various real-world domains. A major challenge therein is generating $K$ uniform Pareto-optimal solutions to represent the entire Pareto front. To address this issue, this paper first introduces fill distance to evaluate the $K$ design points, which provides a quantitative metric for the representativeness of the design. However, directly specifying the optimal design that minimizes the fill distance is nearly intractable due to the nested $\min$-$\max$-$\min$ optimization problem. To address this, we propose a surrogate "max-packing" design for the fill distance design, which is easier to optimize and leads to a rate-optimal design with a fill distance at most $4\times$ the minimum value. Extensive experiments on synthetic and real-world benchmarks demonstrate that our proposed paradigm efficiently produces high-quality, representative solutions and outperforms baseline methods.
Keywords
uniform design; multiobjective optimization; algorithmic fairness

Reviews and Discussion

Official Review
Rating: 6

This paper attempts to address a challenging problem in the field of MOO: how to generate a uniform set of solutions on the PF. The authors try to directly characterize uniformity and propose fill distance to quantitatively measure it. Given that fill distance is difficult to optimize directly, the paper further introduces a surrogate problem and theoretically analyzes the relationship between the surrogate problem and the original problem. Finally, the paper presents an MOA, named UMOD, based on MOEA/D. UMOD models the PF through an MLP and adaptively adjusts preference weights by optimizing the surrogate problem. Empirical studies demonstrate the effectiveness of UMOD.

Strengths

This paper tackles a longstanding challenge for MOAs: producing uniformly distributed solutions on the PF. Generally, this paper is well written and easy to follow.

I am really impressed by the theoretical analysis of uniformity. In the fields of MOO and evolutionary algorithms, past theoretical research has mainly focused on convergence, while theoretical analysis of distribution has long lacked progress. This paper introduces new mathematical tools for the analysis of diversity, making significant theoretical contributions to improving the uniformity of solutions. Previous MOAs typically used some mechanisms based on intuition to maintain distribution, such as crowding distance. These mechanisms lack theoretical basis and often do not perform well in practical applications. This paper is among the first to propose an MOA with theoretical guarantees for diversity, paving a new path for future research in MOO.

Weaknesses

Although the theoretical part is impressive, I am not fully convinced that UMOD is a novel and outstanding MOA.

One of my major concerns is about the fill distance. I acknowledge that the theoretical analysis of fill distance is very helpful. However, fill distance does not seem like a novel concept in MOO; in fact, it can be regarded as a variant of IGD. The proposed fill distance is $\max_{y \in T} \min_{y' \in \mathbb{Y}_K} \rho(y, y')$, while IGD is $\operatorname{mean}_{y \in T} \min_{y' \in \mathbb{Y}_K} \rho(y, y')$. The only difference is that fill distance uses $\max$ whereas IGD uses $\operatorname{mean}$. Therefore, minimizing the fill distance is generally equivalent to minimizing IGD, on which there has been much research in the field of MOO. Moreover, a significant advantage of IGD over fill distance is that it reflects the overall distribution of almost all non-dominated solutions, whereas fill distance only focuses on the maximal distance. This means that almost all solutions may contribute to IGD, but only one solution contributes to fill distance. This directly leads to a drawback of the proposed UMOD algorithm: it can only update the nearest pair of weight vectors each time, while the gradients for the others are zero. This may limit the efficiency of the UMOD algorithm. This issue will be further discussed below.
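The contrast between the two indicators can be made concrete in a few lines: both aggregate the same per-point nearest-neighbor distances and differ only in the outer $\max$ versus $\operatorname{mean}$. A minimal numpy sketch (illustrative only; the toy 1-D reference set and design points below are assumptions, not the paper's setup):

```python
import numpy as np

def nearest_distances(T, Y_K):
    """For each reference point y in T, the distance to its nearest design point in Y_K."""
    # Pairwise Euclidean distances, shape (|T|, |Y_K|)
    d = np.linalg.norm(T[:, None, :] - Y_K[None, :, :], axis=-1)
    return d.min(axis=1)

def fill_distance(T, Y_K):
    # max_{y in T} min_{y' in Y_K} rho(y, y')
    return nearest_distances(T, Y_K).max()

def igd(T, Y_K):
    # mean_{y in T} min_{y' in Y_K} rho(y, y')
    return nearest_distances(T, Y_K).mean()

# Dense reference set on a toy 1-D "front", and K = 3 design points
T = np.linspace(0.0, 1.0, 101)[:, None]
Y_K = np.array([[0.0], [0.5], [1.0]])
print(fill_distance(T, Y_K))  # 0.25: distance of the worst-covered reference point
print(igd(T, Y_K))            # ≈ 0.124: the average coverage gap
```

On this toy front, only the single worst-covered point determines the fill distance, while every reference point contributes to IGD, which is exactly why the fill-distance (sub)gradient touches only the nearest pair.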

Another concern is efficiency. Although the authors claim that efficiency is one of the main highlights of this paper (L139), I did not find results regarding efficiency, such as resource consumption, convergence curves, etc. Some critical experimental settings are also missing; for example, the paper does not seem to mention how many function evaluations UMOD and the baselines used. Without this key data, I cannot be convinced that the proposed algorithm is efficient. On the contrary, for the following reasons, I think UMOD might be much more expensive than the baselines.

The first reason is that the authors do not seem to take efficiency into account when designing the optimization objective. As mentioned above, the original problem (Eq. 2) and the surrogate problem (Eq. 3) are both min/max optimization problems. Therefore, in each iteration, only the nearest pair of weights will be updated, while the others, which may also be disorganized, remain unchanged. This may make the weight updates very slow. It is like moving the two closest points slightly apart each time, which may require many iterations for all the points to achieve a uniform distribution. Additionally, the theorems and optimization objectives rely on the strong assumption that $\mathbb{Y}_K \subset T$. This means that before the solution set converges well to the PF, the theoretical analysis does not hold, making the weight updates potentially meaningless.

The second reason is the use of neural networks. In some previous studies on expensive MOO, DL models are used because fitness evaluation often takes much more time than training a model. However, in the general MOPs discussed in this paper, to balance efficiency, it is rare for MOAs to use neural networks to simply maintain distribution. In UMOD, training the PF model requires thousands of epochs each time, and this model needs to be repeatedly trained many times. Therefore, the training process of the PF model is quite expensive. Furthermore, as mentioned earlier, this model is only effective when the solution set has already converged to the PF, which raises a question: if the solutions have already converged, why incur such a high cost to repeatedly train a neural network just to make the distribution of some solutions more uniform?

Based on these reasons and the minor ones listed below, I do not recommend accepting this paper in its current form. Since some of these problems may be easy to fix, I will increase my rating if my concerns are adequately addressed.

Questions

  1. P1, Line 30. The dimension of the PF can be considered to be $m-1$, provided that the PF is not degenerate. However, the dimension of the PS has no relation to $m$.

  2. P1, Line 36. It is “crowding distance”, not “crowded distance”. Moreover, NSGA-II uses crowding distance, whereas NSGA-III does not. This mistake also appears in P2, Line 80.

  3. P3, Line 88. HV is to be maximized, while IGD is to be minimized.

  4. P5, Line 163. $\partial \delta/\partial \theta^{(j^*)}$ is $-CA$ or $C(-A)$ instead of $C-A$.

  5. P6, Footnote. I agree that DTLZ7 is not connected. But why do DTLZ5-6 not meet the requirements? DTLZ5-6 are both compact and connected.

  6. P9, Line 279. Regarding these baselines as "SOTA" might be controversial. Of course, I don't think that a novel method must be able to outperform all past ones, but the baselines selected here are much lower than the SOTAs I've seen in recent years (with HV>3 on Adult and Compass). I think using these algorithms as baselines is ok, but please do not call them "SOTA".

  7. P17, Line 586. The authors might have confused IGD and IGD+, which are two different indicators. In the caption (Line 586), the authors write IGD, on the left side of the equation they write IGD+, and on the right side of the equation it is IGD again.

  8. P25, Fig. 10. The caption should be placed below the image.

  9. The method proposed in this paper can be classified as a weight adaptation mechanism. Weight adaptation is a popular research direction in MOO, and there is a substantial amount of related work that shares some similarities with the focus of this paper. Therefore, I suggest that the authors include these related works in the literature review.

  10. Some experimental settings are missing, e.g., the reference point for HV and the reference set for IGD. These settings are important for reproducibility.

Limitations

The discussion on limitations is presented in Section D.2 of the supplementary material. The authors' discussion of the limitations is somewhat perfunctory. Many limitations are mentioned in the main text, such as the inability to solve constrained problems and the inapplicability to disconnected PFs; however, they are overlooked in the limitations section. I suggest that the authors reconsider and summarize the limitations of the proposed method, as this will help readers gain a more comprehensive understanding of the contributions of this work.

Author Response

We thank the reviewer for the exhaustive feedback and, especially, for acknowledging that our work and contributions are significant to the community. Due to the space limit, we will directly take the advice on writing and incorporate it into the next revision.

W1. (1) Minimizing the fill distance is generally equivalent to minimizing IGD. (2) Limited efficiency of the UMOD.

The comment mainly includes two points:

  1. whether FD is new given IGD (c.f. our general response 1)
  2. whether our proposed algorithm is efficient (deferred to W2).

W2. (1) Efficiency concerns. (2) If the solutions have already converged, why just make some solutions more uniform?

Firstly, we aim to address your concern with efficiency.

Conceptually, optimizing the max-min problem (Eq.7) is naturally efficient due to the following reasons:

  • This optimization problem is solved by gradient ascent, which is more efficient than evolutionary computation (EC).
  • The gradient calculation is closed-form and GPU-accelerated.
  • Although only two preferences are updated, they reflect global information, eliminating the need to calculate the rest of the gradients.
  • The PF model is lightweight, taking the preference angle ($(m-1)$-D) as input and outputting the objective ($m$-D).
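To make the nearest-pair update concrete, here is a minimal numpy sketch of the max-min subgradient dynamics on a 1-D preference space (a toy stand-in: the real update differentiates through the Pareto front model, which is elided here, and all constants are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, lr, steps = 8, 0.002, 5000
w = np.sort(rng.random(K))  # K preference "angles" in [0, 1]

def min_gap(w):
    d = np.abs(w[:, None] - w[None, :])
    np.fill_diagonal(d, np.inf)
    return d.min()

for _ in range(steps):
    d = np.abs(w[:, None] - w[None, :])
    np.fill_diagonal(d, np.inf)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    # Subgradient ascent on the max-min objective: only the closest
    # pair receives a nonzero update and is pushed apart.
    sign = np.sign(w[i] - w[j]) or 1.0
    w[i] += lr * sign
    w[j] -= lr * sign
    w = np.clip(w, 0.0, 1.0)

print(min_gap(w))  # approaches the uniform spacing 1/(K-1) ≈ 0.143
```

Updating only the closest pair per step, the gaps equalize over repeated iterations; this is the dynamics discussed in the remarks below.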

For algorithm configurations, see our response to AxUm (Q2). In bi-objective problems with a maximum of 32,000 function evaluations (4,000 × 8, with a main population of 8), preferences are updated every 1,000 iterations, resulting in at most three updates or retrainings of the Pareto front model. Thus, the Pareto front model is not trained repeatedly.

Empirically, we analyze time consumption using the bi-objective ZDT1 optimization as an example. Finding the best preference configuration typically takes 500 iterations (1 s), while training the Pareto front model requires only 10 iterations (75 ms with GPU acceleration; see the convergence curve provided in the one-page PDF). The total runtime of UMOD is 95.49 s, with preference optimization and PF model training accounting for a negligible 3.14% and 0.24% of the total time, respectively.

We also would like to make two remarks.

  • The two nearest preferences are updated repeatedly (rather than only once) to find the optimal configuration until convergence (see the convergence curve in the one-page PDF).
  • The max-min optimization can be accelerated by optimizing the max-softmin objective.
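The max-softmin acceleration mentioned above can be sketched as follows (an illustrative numpy snippet; the distances and temperatures are assumed values, not our experimental configuration). Replacing the hard $\min$ with a temperature-controlled soft minimum gives every pairwise distance a nonzero gradient weight:

```python
import numpy as np

def softmin(d, tau):
    # Smooth lower bound on min(d): -tau * log(sum(exp(-d / tau))).
    # Its gradient w.r.t. each d_k is softmax(-d / tau)_k, so every
    # pair contributes to the update instead of only the closest one.
    z = -d / tau
    m = z.max()
    return -tau * (m + np.log(np.exp(z - m).sum()))

d = np.array([0.10, 0.12, 0.30, 0.55])  # pairwise distances among preferences
for tau in (0.1, 0.01, 0.001):
    print(tau, softmin(d, tau))  # approaches min(d) = 0.10 as tau -> 0

# The implicit weight on each pair (the gradient of softmin w.r.t. d):
tau = 0.05
weights = np.exp(-d / tau); weights /= weights.sum()
print(weights)  # all strictly positive, concentrated on the closest pair
```

As $\tau \to 0$ the soft minimum recovers the hard minimum; at moderate $\tau$ the softmax weights spread the update over all pairs rather than only the nearest one.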

Secondly, we explain why finding uniform Pareto objectives is necessary.

The main purpose of this paper is to study generating $K$ uniform Pareto solutions that represent the entire Pareto front in a single run. This is meaningful because:

  • Real-world applications (e.g., product release) often require producing only $K$ Pareto products, making the study of top-$K$ uniform solutions beneficial.
  • Too many solutions can overwhelm users, especially for many-objective problems. A small number of representative solutions is helpful during the early stage.

Previous methods, such as "subset selection" (Qian et al., 2020), achieve a $1 - 1/e$ approximation guarantee by selecting the top-$K$ representative solutions. Tables 7/8 show that UMOD outperforms subset selection. Additionally, UMOD is more efficient, requiring fewer main populations (40 for subset selection vs 10 for UMOD) and less time (142.2 s for subset selection vs 95.49 s for UMOD).

Q1. However, the dimension of the PS has no relation to $m$.

The effective dimension of the Pareto set (PS) manifold in the $n$-D Euclidean space is $m-1$ (see Ye et al., 2022, p.1, line 12). For biobjective optimization, the PS is a 1-D curve embedded in the $n$-D space.

Q5. Results on DTLZ5-7?

Please refer to general response 2.

Q6. Regarding these baselines as "SOTA" might be controversial.

We will use the term "baseline" rather than "SOTA" in the revised paper.

Q7. IGD and IGD+?

It is a typo and has been fixed.

Q9. Literature review of weight adaptation methods.

We will add a new subsection to discuss weight adaptation methods. Specifically, we will follow the survey (Ma et al., 2020) as well as other recent progress to review the six existing categories of weight adaptation methods.

We remark that the main differences between our proposed UMOD and other weight adjustment methods are:

  1. the theoretical motivation to improve the uniformity of solutions.
  2. the first practical neural network implementation (UMOD) to fit the Pareto front model (with negligible training time, $<100$ ms).

Q10. Some experimental settings are missing, e.g., the reference point for HV and the reference set for IGD.

As shown in the core source code that we have already uploaded, we simply follow the conventional settings (e.g., using the default pareto_front() function in pymoo to calculate the reference set for IGD). We will detail these settings in Appendix B.3.
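For reference, the conventions can be reproduced without pymoo. The sketch below uses the analytic ZDT1 front and a reference point of (1.1, 1.1); both are illustrative assumptions (a common convention, not necessarily our exact settings), and the sweep computes the exact 2-D hypervolume by summing staircase rectangles:

```python
import numpy as np

def hv_2d(F, ref):
    """Exact 2-D hypervolume (minimization) w.r.t. reference point `ref`."""
    F = F[np.argsort(F[:, 0])]  # sort by f1 and sweep the staircase
    hv, f2_prev = 0.0, ref[1]
    for f1, f2 in F:
        if f2 < f2_prev:  # a non-dominated step extends the dominated area
            hv += (ref[0] - f1) * (f2_prev - f2)
            f2_prev = f2
    return hv

# Toy solution set sampled on the analytic ZDT1 front f2 = 1 - sqrt(f1)
f1 = np.linspace(0.0, 1.0, 11)
F = np.stack([f1, 1.0 - np.sqrt(f1)], axis=1)
print(hv_2d(F, (1.1, 1.1)))
```

For ZDT1, pymoo's pareto_front() similarly returns a dense sample of this analytic front, which then serves as the IGD reference set.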

L1. The authors' discussion of the limitations

We intentionally skipped the already mentioned limitations in Appendix D.2. As per your advice, we plan to reorganize D.2 as follows:

While we have illustrated UMOD's success, it is also crucial to understand its limitations. First, UMOD does not theoretically address disconnected Pareto fronts such as DTLZ7, as Theorems 3 and 4 both assume a connected Pareto front to ensure a uniform distribution. Second, cases with discrete decision variables or constrained MOPs are also not addressed, as a continuous Pareto front is required for UMOD. Finally, the neural network complicates the analysis of the optimization landscape, which is a common issue when using neural networks.


Reference:

  1. Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set. Ye et al. UAI. 2022.
  2. A Survey of Weight Vector Adjustment Methods for Decomposition-Based Multiobjective Evolutionary Algorithms. Ma et al. TEVC. 2020.
  3. Subset Selection by Pareto Optimization with Recombination. Qian et al. AAAI. 2020.
Comment

Thank you for your response. Some of my concerns have been adequately addressed. However, I still have some concerns about the proposed UMOD algorithm.

  1. This is a minor comment, but I have some reservations about your response. You provided a paper (Ye et al., 2022, p.1, line 12) to support your claim that "the dimension of the PS is $m-1$". Actually, this paper did not explicitly prove this claim, so I think the statement in that paper may also be incorrect. The vector function may be a projection from $\mathbb{R}^n \to \mathbb{R}^m$ ($n > m$). In this regard, I cannot find any evidence for why the dimension of the PS is $m-1$ without further assumptions, because there may be multiple $x$ resulting in the same $f(x)$.

  2. About UMOD "require fewer main populations". I think a good algorithm should work well for adjustable population size, so I am confused about what "require fewer main populations" means.

  3. About the limits of subset selection. I think "subset selection" is a problem, not an algorithm. There are many algorithms for solving subset selection. The $1-1/e$ bound is the guarantee of the naive greedy algorithm. Actually, there are many subset selection methods that achieve better results. Therefore, I do not agree with the authors' claim that the $1-1/e$ bound is a shortcoming of subset selection. There is a recent work [1] that also focuses on uniformity design but is based on subset selection. The results presented in [1] seem comparable with, or even better than, UMOD regarding uniformity. Moreover, [1] seems more efficient because it does not require training a network. [1] can effectively deal with disconnected PFs like DTLZ7, whereas UMOD cannot. [1] optimizes for uniformity by minimizing an energy function, which is similar to UMOD optimizing Eq. (3). Another representative work [2] focusing on weight adaptation also uses subset selection and archiving strategies to optimize for uniformity, and achieves very promising results on various PFs. I am not asking for new results, but to my knowledge and based on the results reported in this paper, UMOD does not seem to perform better than the real SOTA baselines regarding uniformity.

  4. The assumption $y \in T$ in the theory part is very idealized. Moreover, the PF model cannot learn the PF when the solutions are not well converged. In such situations, is the PF model meaningless?

  5. The FE budget seems very high. The maximum fitness evaluations are 32,000 for bi-objective, 126,000 for three-objective, and 350,000 for four-objective optimization. This makes me believe that UMOD needs a much larger budget than the baselines.

References

[1] Enhancing Diversity by Local Subset Selection in Evolutionary Multiobjective Optimization. IEEE TEC.

[2] What Weights Work for You? Adapting Weights for Any Pareto Front Shape in Decomposition-based Evolutionary Multi-Objective Optimisation. Evolutionary Computation.

Comment

We deeply appreciate your prompt response. We hope that the following answers can address your concerns.

Q1. Why is the PS dimension $m-1$?

This means that the effective dimension of the Pareto set is $m-1$. This conclusion can be found in Theorem 2.2, Equation (6), on page 5, line 18 of Hillermeier (2001). For completeness, we provide a rough analysis here. For a detailed proof, please refer to Hillermeier (2001), page 5.

Under a mild KKT condition (Equation (2), Hillermeier, 2001), the gradients of all solutions belonging to the Pareto set satisfy $n-m+1$ gradient constraints. Therefore, the dimension of the tangent space of the Pareto set is $n - (n-m+1) = m-1$. This indicates that the dimension of the Pareto set manifold is $m-1$.

For example, the PS for a typical 2-objective optimization problem is a 1-D curve, and the PS for a typical 3-objective problem is a 2-D surface. For the classic ZDT1, no matter how large the number of decision variables $n$ is, its PS is always the 1-D segment from $[0,0,\ldots,0]$ to $[1,0,\ldots,0]$.
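This can be checked numerically with the standard ZDT1 definition (a small sketch; the probe point $x_1 = 0.25$ is arbitrary):

```python
import numpy as np

def zdt1(x):
    """Standard ZDT1: n decision variables, 2 objectives."""
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return np.array([f1, f2])

# On the Pareto set, x_2 = ... = x_n = 0, so g = 1 regardless of n:
# the PS is the 1-D segment x_1 in [0, 1], whatever the ambient dimension.
for n in (5, 30):
    x = np.zeros(n); x[0] = 0.25
    print(zdt1(x))  # f1 = 0.25, f2 = 1 - sqrt(0.25) = 0.5 for every n
```

Whatever the ambient dimension $n$, every Pareto-optimal point has $x_2 = \cdots = x_n = 0$, so the PS stays the same 1-D segment and the PF is $f_2 = 1 - \sqrt{f_1}$.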


Reference

Generalized Homotopy Approach to Multiobjective Optimization. C. Hillermeier. Journal of Optimization Theory and Applications. 2001.

Q2. About UMOD "requiring fewer main populations".

The claim of "fewer populations" compares UMOD with subset selection in finding $K$ representative solutions. UMOD maintains a population size of $K$, whereas subset selection requires a larger initial pool of solutions from which to choose the final $K$. Specifically, we use $4K$ solutions for subset selection. Therefore, we say UMOD uses fewer populations than subset selection.
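For contrast, the second phase of such a baseline can be sketched as a greedy max-min (farthest-point) selection from the $4K$ pool (an illustrative greedy variant on a toy 1-D front, not the exact algorithm of Qian et al.):

```python
import numpy as np

def greedy_maxmin_subset(pool, K):
    """Greedily pick K representatives from `pool` by farthest-point selection."""
    chosen = [0]  # start from an arbitrary point
    d = np.linalg.norm(pool - pool[0], axis=1)
    while len(chosen) < K:
        nxt = int(np.argmax(d))  # point farthest from the current subset
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(pool - pool[nxt], axis=1))
    return pool[chosen]

K = 10
rng = np.random.default_rng(1)
t = np.sort(rng.random(4 * K))        # a 4K candidate pool on a toy front
pool = np.stack([t, 1.0 - t], axis=1)
subset = greedy_maxmin_subset(pool, K)
print(subset.shape)  # (10, 2)
```

The $4K$ candidate pool must be generated first, which is the extra cost UMOD avoids by maintaining only $K$ solutions throughout.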

UMOD does work well for a wide range of population sizes. Due to the space limit, we have reported that UMOD works for a wide range of population sizes $K$ in the one-page PDF. If needed, we can report more experimental results.

Q3. Comparison between UMOD and subset-selection.

UMOD offers two key advantages over energy-based subset selection:

  • Single-Phase Algorithm: Energy-based subset selection involves a "two-phase" process: (1) generating a large number of candidate solutions and (2) performing subset selection. This two-phase approach can be computationally expensive, especially when the candidate pool is large. Additionally, determining the appropriate size of the candidate pool to ensure representative solutions is uncertain. In contrast, UMOD operates as a single-phase algorithm, eliminating the need for an extensive candidate generation phase.

  • Efficient Optimization: Energy-based subset selection is a discrete optimization problem, which can be computationally intensive. UMOD, on the other hand, employs a gradient-based method for selecting representative solutions, leveraging continuous optimization. Gradient-based methods are generally more efficient and faster than discrete optimization, leading to more timely and resource-effective solutions.

We argue that training a neural network is not a significant burden and should be widely adopted in multi-objective optimization due to its robustness (see response to Q1, JBGb) and quick training time (75 ms).

We will soon compare UMOD with LSSA (Wang et al., 2023). While we currently lack numerical results for LSSA, Table 3 in the LSSA paper indicates that LSSA underperforms SMS-MOEA on the ZDT4, DTLZ1, DTLZ5, and DTLZ6 problems. In contrast, UMOD consistently outperforms SMS-MOEA on those tasks. The comparison with SMS-MOEA thus serves as evidence that UMOD performs better than LSSA.


Reference

Enhancing Diversity by Local Subset Selection in Evolutionary Multiobjective Optimization. Wang et al. 2023. IEEE TEC.

Q4. When the solutions are not well converged, is the PF model meaningless?

Firstly, our results show that solutions easily converge to the PF, but uniformity of those solutions is a big issue. Therefore, we use converged Pareto solutions to train PF models in our experiment. Secondly, according to our results, even if solutions are not fully converged, the fitted model remains meaningful. A rough Pareto front model still aids in finding more uniform solutions through solving the optimal preference configuration problem.

Q5. The FE budget seems very high.

This question can be answered from two aspects.

  • UMOD is the first method, to our knowledge, to reliably find uniform Pareto objectives, as shown in Table 1 of our response to AxUm, where the spacing indicator is nearly zero. Increasing the FE budget is worthwhile for achieving uniform Pareto objectives.
  • The FE budget can be largely reduced. Previously, we did not put much effort into reducing it. We will provide new results for you soon.
Comment

Thank you for your detailed response. I have increased my rating.

Comment

Thank you for increasing your score. We are conducting additional experiments to address your concerns and will report them to you as soon as possible.

Comment

Table 1. Results under different function evaluations. (K=10).

| Problem | Func. Eval. (FE) | Method | Spacing | Sparsity | HV | Uniform | Soft Uniform | IGD | FD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ZDT1 | 15,000 | UMOD | 0.0007 | 0.0266 | 1.0536 | 0.1616 | 0.0167 | 0.0403 | 0.0809 |
| ZDT1 | 15,000 | MOEAD | 0.0506 | 0.0284 | 1.0503 | 0.1175 | -0.0047 | 0.0414 | 0.1353 |
| ZDT1 | 20,000 | UMOD | 0.0009 | 0.0266 | 1.0536 | 0.1613 | 0.0167 | 0.0403 | 0.0810 |
| ZDT1 | 20,000 | MOEAD | 0.0503 | 0.0283 | 1.0505 | 0.1175 | -0.0047 | 0.0414 | 0.1353 |
| ZDT1 | 25,000 | UMOD | 0.0007 | 0.0266 | 1.0537 | 0.1620 | 0.0168 | 0.0403 | 0.0813 |
| ZDT1 | 25,000 | MOEAD | 0.0509 | 0.0285 | 1.0503 | 0.1175 | -0.0047 | 0.0414 | 0.1353 |
| ZDT1 | 40,000 | UMOD | 0.0003 | 0.0266 | 1.0537 | 0.1624 | 0.0167 | 0.0403 | 0.0816 |
| ZDT1 | 40,000 | MOEAD | 0.0504 | 0.0283 | 1.0504 | 0.1175 | -0.0047 | 0.0414 | 0.1353 |
| RE21 | 15,000 | UMOD | 0.0015 | 0.0164 | 1.2513 | 0.1241 | -0.0202 | 0.0322 | 0.0661 |
| RE21 | 15,000 | MOEAD | 0.0845 | 0.0237 | 1.2447 | 0.0721 | -0.0580 | 0.0423 | 0.1792 |
| RE21 | 20,000 | UMOD | 0.0013 | 0.0164 | 1.2512 | 0.1250 | -0.0202 | 0.0322 | 0.0648 |
| RE21 | 20,000 | MOEAD | 0.0853 | 0.0239 | 1.2452 | 0.0721 | -0.0581 | 0.0425 | 0.1812 |
| RE21 | 25,000 | UMOD | 0.0007 | 0.0164 | 1.2513 | 0.1266 | -0.0203 | 0.0322 | 0.0643 |
| RE21 | 25,000 | MOEAD | 0.0853 | 0.0239 | 1.2452 | 0.0721 | -0.0581 | 0.0425 | 0.1812 |
| RE21 | 40,000 | UMOD | 0.0011 | 0.0164 | 1.2520 | 0.1256 | -0.0202 | 0.0321 | 0.0645 |
| RE21 | 40,000 | MOEAD | 0.0853 | 0.0239 | 1.2453 | 0.0721 | -0.0582 | 0.0425 | 0.1812 |
| RE22 | 15,000 | UMOD | 0.0675 | 0.0200 | 1.1802 | 0.0050 | -0.0800 | 0.0406 | 0.0854 |
| RE22 | 15,000 | MOEAD | 0.0460 | 0.0194 | 1.1867 | 0.0601 | -0.0489 | 0.0385 | 0.1197 |
| RE22 | 20,000 | UMOD | 0.0676 | 0.0198 | 1.1809 | 0.0036 | -0.0786 | 0.0402 | 0.0866 |
| RE22 | 20,000 | MOEAD | 0.0460 | 0.0194 | 1.1867 | 0.0601 | -0.0489 | 0.0385 | 0.1197 |
| RE22 | 25,000 | UMOD | 0.0676 | 0.0198 | 1.1809 | 0.0036 | -0.0786 | 0.0402 | 0.0866 |
| RE22 | 25,000 | MOEAD | 0.0460 | 0.0194 | 1.1867 | 0.0602 | -0.0489 | 0.0385 | 0.1197 |
| RE22 | 40,000 | UMOD | 0.0677 | 0.0199 | 1.1822 | 0.0033 | -0.0792 | 0.0403 | 0.0875 |
| RE22 | 40,000 | MOEAD | 0.0460 | 0.0194 | 1.1868 | 0.0601 | -0.0489 | 0.0385 | 0.1197 |

UMOD is integrated within the Pymoo environment, which supports both evolutionary algorithms (popular in multi-objective optimization) and gradient-based ML problems. LSSA is implemented in the PlatEMO environment, and there are some differences in function evaluation (FE) counting between Pymoo and PlatEMO. For bi-objective problems, LSSA requires around 50,000 evaluations (100 subproblems × 500 iterations).

Table 1 highlights two key points:

  • Without subset selection, MOEA/D fails to produce uniform Pareto solutions even with many iterations, suggesting that it is worthwhile to spend extra FEs to find the top-$K$ uniform solutions, which is the motivation of this paper.
  • Reducing function evaluations from 40,000 (our original setting) to 15,000 (K=10) is possible without significantly impacting performance. Results remain robust when FE > 15,000. (40,000 is for K=10, while 32,000 (previously mentioned) is for K=8.)

Further reductions in the number of FEs could be made in three ways:

    1. Using the last iteration as a warm start for generating uniform solutions, as MOEA/D provides good approximations.
    2. Making the number of epochs for calculating the optimal preference configuration adaptive, to save computation.
    3. Using all populations, rather than only the main population (the method currently used), to train the Pareto front model.
Comment

Thank you for your effort in providing new results. However, the new results are not very convincing to me because the selected problems are too simple. For example, the PS of ZDT1 lies in $x_i = 0$ for $i \ge 2$. All the decision variables are clipped to $[0,1]$, so there is actually only one effective decision variable. Therefore, ZDT1 does not need much budget for searching and converging. I suggest the authors use DTLZ1 instead of ZDT1 to provide a more convincing experimental setting. I am not asking for new results; this is only my advice for improving this new experiment if the authors are considering including it in the paper.

Comment

Table 2. Comparison with LSSA (Wang et al. 2022). IGD, spacing, and sparsity are scaled by 100, while uniformity, soft uniformity, and FD are scaled by 10 for a better illustration.

| Problem | Method | HV | IGD | Spacing | Sparsity | Uniform | Soft uniform | FD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ZDT1 | UMOD | 1.04 | 5.19 | 0.12 | 4.39 | 2.07 | 0.77 | 1.04 |
| ZDT1 | LSSA | 1.04 | 5.22 | 0.1 | 4.45 | 2.09 | 0.78 | 1.04 |
| ZDT2 | UMOD | 0.71 | 5.23 | 0.25 | 4.45 | 2.05 | 0.78 | 1.07 |
| ZDT2 | LSSA | 0.71 | 5.19 | 0.54 | 4.45 | 2.03 | 0.78 | 1.05 |
| ZDT3 | UMOD | 0.89 | 39.8 | 3.57 | 3.45 | 1.2 | 0.3 | 9.22 |
| ZDT3 | LSSA | 0.91 | 38.45 | 3.27 | 3.88 | 1.53 | 0.42 | 8.87 |
| ZDT4 | UMOD | 1.04 | 5.21 | 0.22 | 4.39 | 2.05 | 0.77 | 1.06 |
| ZDT4 | LSSA | 1.02 | 5.41 | 0.34 | 4.49 | 2.06 | 0.79 | 1.05 |
| ZDT6 | UMOD | 0.66 | 4.18 | 0.12 | 2.86 | 1.67 | 0.35 | 0.81 |
| ZDT6 | LSSA | 0.66 | 4.54 | 3.04 | 3.07 | 1.01 | 0.17 | 1.2 |
| DTLZ1 | UMOD | 1.69 | 4.87 | 0.09 | 0.73 | 1.39 | -0.9 | 0.78 |
| DTLZ1 | LSSA | 1.42 | 32.77 | 3.06 | 1.93 | 1.91 | 0.47 | 3.67 |
| DTLZ2 | UMOD | 1.07 | 12.1 | 0.52 | 1.61 | 3.17 | 1.18 | 2.37 |
| DTLZ2 | LSSA | 1.07 | 12.23 | 2.2 | 2.17 | 2.96 | 1.21 | 2.22 |
| DTLZ3 | UMOD | 1.06 | 12.3 | 1.93 | 1.7 | 2.72 | 1.13 | 2.39 |
| DTLZ3 | LSSA | 0 | 1345.87 | 281.52 | 923.37 | 6.72 | 3.45 | 136.54 |
| DTLZ4 | UMOD | 1.07 | 12.1 | 0.83 | 1.77 | 3.07 | 1.18 | 2.41 |
| DTLZ4 | LSSA | 1.06 | 12.39 | 2.29 | 2.34 | 2.99 | 1.23 | 2.32 |
| DTLZ5 | UMOD | 0.72 | 0.63 | 0.7 | 0.03 | 0 | -3.62 | 0.24 |
| DTLZ5 | LSSA | 0.7 | 2.67 | 0.51 | 1.11 | 0.96 | -0.72 | 0.54 |
| DTLZ6 | UMOD | 0.69 | 3.23 | 1.06 | 0.06 | 0 | -3.65 | 0.46 |
| DTLZ6 | LSSA | 0.7 | 2.68 | 1.16 | 1.11 | 0.87 | -0.73 | 0.57 |

Thank you for mentioning LSSA as a strong baseline. The code is available on GitHub and uses PlatEMO version 3.4. Key observations from Table 2 are:

  • UMOD outperforms LSSA on most problems, particularly for the IGD indicator, where UMOD excels in 8 out of 11 cases.
  • Our reimplementation shows that LSSA struggles with a small population size, notably on ZDT5 and DTLZ3; small population sizes are one of the focuses of this paper.
  • LSSA does outperform UMOD on ZDT3 (disconnected case), and we plan to adapt UMOD for such scenarios.

Additionally, UMOD offers novel theoretical insights, including the relationship between IGD, fill distance, and max-min distance, along with error analysis. Beyond evolutionary problems, which are a part of this paper, UMOD also effectively addresses large-scale machine learning challenges using pure gradient-based methods.


Reference

[1] Enhancing Diversity by Local Subset Selection in Evolutionary Multiobjective Optimization. Wang et al. IEEE TEC. 2022.

Comment

Thank you for your response. The results are very impressive. Can you provide some details about the settings? For example, the number of solutions, the number of decision variables, the number of objective functions, and the FE budget.

Comment

Table 5. Results comparison under different FEs on three-objective problems.

| Problem | Iterations | Method | Spacing | Sparsity | HV | Uniform | Soft Uniform | IGD | MaxGD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DTLZ1 | 63,000 | UMOD | 0.0079 | 0.0067 | 1.6926 | 0.1181 | -0.0903 | 0.0491 | 0.0861 |
| DTLZ1 | 63,000 | MOEAD | 0.0002 | 0.0075 | 1.6917 | 0.1408 | -0.0899 | 0.0488 | 0.0750 |
| DTLZ1 | 84,000 | UMOD | 0.0044 | 0.0070 | 1.6929 | 0.1274 | -0.0902 | 0.0489 | 0.0857 |
| DTLZ1 | 84,000 | MOEAD | 0.0002 | 0.0075 | 1.6917 | 0.1408 | -0.0899 | 0.0488 | 0.0750 |
| DTLZ1 | 105,000 | UMOD | 0.0031 | 0.0071 | 1.6929 | 0.1315 | -0.0900 | 0.0488 | 0.0843 |
| DTLZ1 | 105,000 | MOEAD | 0.0002 | 0.0075 | 1.6914 | 0.1408 | -0.0900 | 0.0488 | 0.0750 |
| DTLZ1 | 126,000 | UMOD | 0.0025 | 0.0072 | 1.6929 | 0.1331 | -0.0900 | 0.0488 | 0.0820 |
| DTLZ1 | 126,000 | MOEAD | 0.0002 | 0.0074 | 1.6913 | 0.1404 | -0.0901 | 0.0487 | 0.0750 |
| DTLZ2 | 63,000 | UMOD | 0.0151 | 0.0174 | 1.0637 | 0.2783 | 0.1155 | 0.1223 | 0.2387 |
| DTLZ2 | 63,000 | MOEAD | 0.0545 | 0.0243 | 1.0624 | 0.2433 | 0.0947 | 0.1263 | 0.2514 |
| DTLZ2 | 84,000 | UMOD | 0.0044 | 0.0177 | 1.0625 | 0.3180 | 0.1178 | 0.1218 | 0.2370 |
| DTLZ2 | 84,000 | MOEAD | 0.0546 | 0.0243 | 1.0626 | 0.2433 | 0.0947 | 0.1263 | 0.2515 |
| DTLZ2 | 105,000 | UMOD | 0.0070 | 0.0173 | 1.0661 | 0.3100 | 0.1178 | 0.1222 | 0.2433 |
| DTLZ2 | 105,000 | MOEAD | 0.0546 | 0.0243 | 1.0625 | 0.2433 | 0.0947 | 0.1263 | 0.2515 |
| DTLZ2 | 126,000 | UMOD | 0.0042 | 0.0169 | 1.0730 | 0.3237 | 0.1204 | 0.1250 | 0.2477 |
| DTLZ2 | 126,000 | MOEAD | 0.0546 | 0.0243 | 1.0626 | 0.2433 | 0.0947 | 0.1263 | 0.2515 |
| DTLZ3 | 63,000 | UMOD | 0.0150 | 0.0184 | 1.0531 | 0.2937 | 0.1177 | 0.1237 | 0.2576 |
| DTLZ3 | 63,000 | MOEAD | 0.0545 | 0.0243 | 1.0564 | 0.2439 | 0.0958 | 0.1267 | 0.2524 |
| DTLZ3 | 94,500 | UMOD | 0.0064 | 0.0183 | 1.0586 | 0.3134 | 0.1208 | 0.1236 | 0.2266 |
| DTLZ3 | 94,500 | MOEAD | 0.0546 | 0.0244 | 1.0601 | 0.2436 | 0.0950 | 0.1264 | 0.2518 |
| DTLZ3 | 126,000 | UMOD | 0.0025 | 0.0176 | 1.0610 | 0.3213 | 0.1177 | 0.1219 | 0.2351 |
| DTLZ3 | 126,000 | MOEAD | 0.0546 | 0.0244 | 1.0602 | 0.2436 | 0.0950 | 0.1157 | 0.2749 |
| DTLZ4 | 63,000 | UMOD | 0.0142 | 0.0231 | 1.0711 | 0.3057 | 0.1194 | 0.1252 | 0.2672 |
| DTLZ4 | 63,000 | MOEAD | 0.0546 | 0.0243 | 1.0626 | 0.2434 | 0.0947 | 0.1155 | 0.2748 |
| DTLZ4 | 94,500 | UMOD | 0.0045 | 0.0178 | 1.0645 | 0.3181 | 0.1195 | 0.1233 | 0.2326 |
| DTLZ4 | 94,500 | MOEAD | 0.0547 | 0.0243 | 1.0627 | 0.2433 | 0.0947 | 0.1155 | 0.2748 |
| DTLZ4 | 126,000 | UMOD | 0.0024 | 0.0170 | 1.0631 | 0.3226 | 0.1178 | 0.1223 | 0.2544 |
| DTLZ4 | 126,000 | MOEAD | 0.0546 | 0.0243 | 1.0626 | 0.2433 | 0.0947 | 0.1155 | 0.2748 |

In the previous discussion, we identified additional methods to further reduce FEs and will incorporate them in the revised paper. Even with the original strategy, FEs can be reduced from 126,000 to approximately 63,000 for 21 solutions. Table 5 highlights key insights:

  • For the DTLZ1 problem, as shown in the main paper, uniform preferences yield uniform Pareto objectives. UMOD and MOEAD perform similarly, indicating that UMOD generates nearly uniform Pareto objectives.
  • While further improvement beyond 63,000 FE is minimal, UMOD significantly outperforms MOEAD on DTLZ2 and DTLZ4.
Comment

Thank you for your consistent support and interest in our work.

The settings compared with LSSA are the same as in our main paper (Table 5, Appendix). We will include all additional settings discussed during the rebuttal.

Table 3. Settings of the comparison between LSSA and UMOD.

| | Bi-objective | Three-objective (DTLZ 1-4) | Degenerated three-objective (DTLZ 5-6) |
| --- | --- | --- | --- |
| Number of solutions | 8 | 21 | 16 |
| Number of seeds | 31 | 31 | 31 |
| Func. Eval. (FE) | 32,000 | 126,000 | 96,000 |
| Number of decision variables | 30 | 30 | 30 |

Table 4. Software for LSSA and UMOD.

| | LSSA | UMOD (for MOEA) |
| --- | --- | --- |
| Software | PlatEMO 3.4 | Pymoo 0.6.1.1 |

The FE numbers and UMOD results follow those in our main paper. As discussed, the FE budget can be significantly reduced. For example, we reduced the FE for the bi-objective case from 32,000 to 12,000 without affecting UMOD's performance. Improved FE results for three-objective problems will be released soon.

Comment

You can view our visual results in the one-page summary, where the results on DTLZ5 and DTLZ6 are accurate. DTLZ5 and DTLZ6 were run on a new computer. However, there is a minor bug in the IGD computation of the pymoo library (line 256 in dtlz.py), where the default partition count is set to 15. This causes numerical instability in approximating the PF and the IGD. We have changed this line to `ref_dirs = UniformReferenceDirectionFactory(3, n_partitions=50).do()` (the original value is 15).
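For context on why the partition count matters: assuming `UniformReferenceDirectionFactory` follows the standard Das-Dennis (simplex-lattice) construction, the number of reference directions it generates for $m$ objectives and $p$ partitions is $\binom{p+m-1}{m-1}$, so raising `n_partitions` from 15 to 50 makes the 3-objective reference set roughly ten times denser. A quick illustrative check (not the authors' code):

```python
from math import comb

def das_dennis_count(m: int, n_partitions: int) -> int:
    """Number of reference directions the Das-Dennis (simplex-lattice)
    construction yields for m objectives and the given partition count."""
    return comb(n_partitions + m - 1, m - 1)

# Default pymoo setting vs. the adjusted one for 3-objective problems:
print(das_dennis_count(3, 15))  # 136 reference points (coarse PF approximation)
print(das_dennis_count(3, 50))  # 1326 reference points (much denser)
```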

We are aware of this issue, and all other results are correct (they were produced on our main PC). We will re-run the IGD values for DTLZ5 and DTLZ6.

New results will be updated soon.

Comment

When calculating the Pareto front (PF) for DTLZ5 and DTLZ6, we encountered an intriguing bug. After thorough investigation, we discovered that results may vary on different PCs due to a minor issue in pymoo (version 0.6.1.1). When using Remote.get_instance().load("pymoo", "pf", "dtlz5-3d.pf") to calculate the PF, the file is loaded from A\\envs\\B\\Lib\\site-packages\\pymoo\\data\\pymoo\\pf\\dtlz5-3d.pf, where 'A' is the anaconda installation directory and 'B' is the environment name.

Interestingly, the dtlz5-3d.pf file is 387KB, while dtlz6-3d.pf is only 32KB, indicating that the PF for DTLZ6 is incomplete. To resolve this issue, we used the dtlz5-3d.pf file for both DTLZ5 and DTLZ6 calculations.

The minor bug in pymoo has been fixed. We will report the DTLZ5/6 results and the performance of UMOD with large populations soon.

Comment

I was trying to run the code uploaded by the authors, but I noticed that they only uploaded the umod.py file without other critical libraries and files like xy_util and mo_util, so the code cannot actually run. Could you please provide the full program?

Comment

When we submitted the code, there were still some privacy issues in the project that we had not had time to address. Therefore, we mainly provided the main file to explain the algorithm's workflow.

For UMOD solving multitask learning problems, an open-source implementation of UMOD is now available in the libmoon framework. Code is available here by setting the solver to uniform.

Comment

Thank you for your response. I have checked the code of PyMOO. I carefully read dtlz.py at https://github.com/anyoptimization/pymoo/blob/main/pymoo/problems/many/dtlz.py. I think the setting of n_partitions does not affect the results on DTLZ5 and DTLZ6, because ref_dirs = UniformReferenceDirectionFactory is only used in DTLZ1-4. Moreover, I am also very confused as to why running on a new computer could produce incorrect results.

Comment

We chose a small population to test whether a few solutions can approximate the entire Pareto front (PF). With a larger population, fitting the network is easier. Our key point is that even with fewer solutions, UMOD with a neural model produces uniform Pareto objectives.

We report the mean in Table 2.

Comment

Thank you for your response, and thank you for your effort in improving the empirical study. However, I am confused about some of the numbers you reported. The PF of DTLZ5 and DTLZ6 is the same. However, the IGD of UMOD on DTLZ5 is 0.63, and the IGD of UMOD on DTLZ6 is 3.23. Why do they differ so much on the same PF? Moreover, is the number in the table an average or a median?

In addition, the population size used in your experiment is significantly smaller than that employed in other studies. In practical applications, a larger population is often necessary to adequately approximate an irregular or high-dimensional PF. I am curious about whether UMOD encountered difficulties when using a larger population size, such as 100 solutions on 3-objective problems.

Comment

After correcting the IGD calculation, the revised DTLZ5 and DTLZ6 results are as follows.

Table 6. Results on DTLZ5.

| Method | Spacing | Sparsity | HV | Uniform | Soft uniform | IGD | FD |
|---|---|---|---|---|---|---|---|
| UMOD | 0.0073 | 0.0131 | 0.6995 | 0.0867 | -0.0634 | 0.0294 | 0.0850 |
| MOEAD | 0.0871 | 0.0645 | 0.6352 | 0.0000 | -0.1978 | 0.1430 | 0.3002 |
| AWA | 0.0909 | 0.0312 | 0.6697 | 0.0000 | -0.1802 | 0.0712 | 0.1697 |
| SMS-MOEA | 0.0454 | 0.0147 | 0.7035 | 0.0783 | -0.0753 | 0.0307 | 0.1099 |
| NSGA3 | 0.0599 | 0.0289 | 0.6623 | 0.0005 | -0.1417 | 0.0683 | 0.1802 |

Table 7. Results on DTLZ6.

| Method | Spacing | Sparsity | HV | Uniform | Soft uniform | IGD | FD |
|---|---|---|---|---|---|---|---|
| UMOD | 0.0125 | 0.0128 | 0.7011 | 0.0738 | -0.0618 | 0.0285 | 0.0731 |
| MOEAD | 0.0843 | 0.0599 | 0.6352 | 0 | -0.2032 | 0.143 | 0.3002 |
| AWA | 0.0908 | 0.0312 | 0.6697 | 0 | -0.1694 | 0.0712 | 0.1697 |
| SMS-MOEA | 0.0426 | 0.0143 | 0.7036 | 0.072 | -0.0752 | 0.0305 | 0.1099 |
| NSGA3 | 0.0524 | 0.0415 | 0.6623 | 0 | -0.1357 | 0.0931 | 0.2972 |

Tables 6 and 7 highlight that UMOD significantly outperforms other methods in ensuring evenly distributed solutions. While NSGA3, MOEA/D, and MOEA/D-AWA produce duplicate solutions, SMS-MOEA is the only method with an HV comparable to UMOD's. However, UMOD surpasses SMS-MOEA considerably in IGD and FD.

Comment

Table 6. Results for large solution numbers (K).

| Problem | Method | Spacing | Sparsity | HV | Uniform | Soft uniform | IGD | FD |
|---|---|---|---|---|---|---|---|---|
| ZDT1 (sols=50) | UMOD | 0.0018 | 0.0009 | 1.0973 | 0.0270 | -0.2381 | 0.0072 | 0.0156 |
| | MOEAD | 0.0129 | 0.0010 | 1.0964 | 0.0215 | -0.2458 | 0.0078 | 0.0347 |
| | AWA | 0.0130 | 0.0011 | 1.0961 | 0.0215 | -0.2431 | 0.0079 | 0.0347 |
| | SMS-MOEA | 0.0052 | 0.0009 | 1.0972 | 0.0195 | -0.2390 | 0.0075 | 0.0173 |
| | NSGA3 | 0.0168 | 0.0012 | 1.0962 | 0.0217 | -0.2456 | 0.0075 | 0.0361 |
| ZDT2 (sols=50) | UMOD | 0.0016 | 0.0009 | 0.7639 | 0.0239 | -0.2374 | 0.0074 | 0.0162 |
| | MOEAD | 0.0049 | 0.0009 | 0.7632 | 0.0198 | -0.2379 | 0.0075 | 0.0167 |
| | AWA | 0.0049 | 0.0009 | 0.7632 | 0.0202 | -0.2380 | 0.0075 | 0.0167 |
| | SMS-MOEA | 0.0098 | 0.0010 | 0.7642 | 0.0207 | -0.2405 | 0.0090 | 0.0404 |
| | NSGA3 | 0.0048 | 0.0009 | 0.7636 | 0.0206 | -0.2379 | 0.0075 | 0.0172 |
| DTLZ1 (sols=91) | UMOD | 0.0009 | 0.0006 | 1.7016 | 0.0553 | -0.2839 | 0.0203 | 0.0341 |
| | MOEAD | 0.0003 | 0.0007 | 1.7002 | 0.0581 | -0.2843 | 0.0203 | 0.0328 |
| | AWA | 0.0086 | 0.0007 | 1.7002 | 0.0000 | -0.2837 | 0.0205 | 0.0531 |
| | SMS-MOEA | 0.0092 | 0.0044 | 1.6930 | 0.1064 | -0.0926 | 0.0491 | 0.1013 |
| | NSGA3 | 0.0005 | 0.0006 | 1.7017 | 0.0566 | -0.2842 | 0.0203 | 0.0333 |
| DTLZ2 (sols=91) | UMOD | 0.0054 | 0.0013 | 1.1446 | 0.1276 | -0.1607 | 0.0542 | 0.1023 |
| | MOEAD | 0.0277 | 0.0013 | 1.1406 | 0.0896 | -0.1688 | 0.0535 | 0.1131 |
| | AWA | 0.0255 | 0.0013 | 1.1414 | 0.0759 | -0.1678 | 0.0537 | 0.1130 |
| | SMS-MOEA | 0.0335 | 0.0018 | 1.1510 | 0.0443 | -0.1967 | 0.0797 | 0.1928 |
| | NSGA3 | 0.0273 | 0.0013 | 1.1419 | 0.0903 | -0.1683 | 0.0536 | 0.1134 |

From Table 6, we find that UMOD outperforms previous MOEA methods on the ZDT1, ZDT2, and DTLZ2 problems, particularly on the covering-radius and evenly-spread indicators. Since DTLZ1 is the only problem for which uniform preferences induce uniform Pareto objectives, MOEA/D with uniform preferences performs best on DTLZ1. Despite not directly optimizing IGD, UMOD significantly excels on IGD indicators, especially for ZDT1 and ZDT2. We will incorporate these results into our paper.

If you have any further concerns, please let us know.

Comment

Thank you for your results. Could you report the FE budget of this experiment?

Comment

Dear Reviewer N5xS,

For bi-objective problems, the FE count is 75,000 (50 * 1500 generations), similar to problems with fewer sub-problems. For three-objective problems, the FE count is 273,000 (91 * 3000 generations). As mentioned in our response under "Analysis of Different Iterations," three methods could significantly reduce FE, allowing for further reductions.

We also wish to clarify our paper's motivation:

  • We aim to propose an (arguably the first) optimizable indicator for MOO that, when optimized, ensures the algorithm returns uniformly distributed Pareto objectives. Optimizing such an indicator is elegant and easy to understand (just like optimizing the HV), whereas methods like LSSA need complex mechanisms. Optimizing this indicator yields uniform solutions for both evolutionary and gradient-based problems. Our experiments confirm we achieved this goal.
  • Our goal for uniform Pareto objectives is to cover the entire Pareto front, significantly outperforming in terms of FD (covering radius). The main paper focuses on using a small population to cover the whole PF (a harder case). In the rebuttal, we demonstrate that UMOD also performs well with a larger population (K).
  • While there is room for technical and engineering improvements to reduce FE, our primary focus is on the original idea, theoretical results, and designing a practical algorithm. We will actively leverage techniques to reduce FE.
  • For the machine learning problems addressed in this paper, function evaluation is relatively inexpensive, making FE less critical in gradient-based methods. UMOD is the first method to find uniform Pareto objectives with low computation.

If you find our rebuttal helpful, we kindly ask you to consider raising your score for our paper. Thank you for your encouraging and constructive feedback.

Best regards, Paper 5680 Authors

Comment

Thank you for your response. With your patient explanation and additional results, I think now I have a more comprehensive understanding of your work. I increased my rating to 6, and I think this paper can be weakly accepted. I do not require further clarification.

Moreover, I strongly suggest the authors include a more comprehensive empirical study in the camera-ready version (if accepted). Currently, most of the presented results use a very small population size. I acknowledge that it is usually a difficult task to perform well with a small population size, but I think a good MOA should be compatible with a wide range of population sizes. To my knowledge, most related work may adopt a population size of around 100 for 3-obj problems. Additionally, this work tends to adopt a very large FE. UMOD performs well with a large FE because it can make use of extensive evaluations to refine the distributions, thus achieving a better benchmark performance, but MOEA/D cannot. I think a wide range of audiences may be interested in the performance of UMOD with a larger population size and a smaller FE.

Another limitation of this work is it does not touch on the real complex PFs. The PF of the adopted problems is very simple. Although the reviewers are interested in the results of DTLZ7, the authors refused to present them. This work does not provide new insights into how to deal with complicated PFs, which I think is a very important topic with high practical value. The authors said the theories do not hold for a disconnected PF, so the proposed method is not applicable. I suspect the real limitation may lie in the PF learning. The PF learning mechanism in this paper does not seem to be able to learn a disconnected PF. The current model cannot model the boundary of the PF; thus if the weight vectors go outside the boundary, uniformity can no longer be guaranteed. I recommend an interesting test suite called IMOP, which contains many extremely complicated PFs. The authors could adopt these problems for future research.

Last, I would ask the authors to include a more comprehensive summary of the limitations of UMOD in the camera-ready version. I also suggest the authors double-check the experimental details such as the reference set for IGD. PyMOO is not a well-verified library, so it may have a lot of bugs.

Comment

Dear reviewer,

Thank you for pointing out necessary references (e.g., LSSA), for raising your score, and for your consistent support of our work. We want to make a final response to your concerns, and we guarantee that we will include these issues in our paper, given the precious time and professional suggestions you have offered.

Here is our final response to your comments:

  • Comprehensive empirical study: We will add extensive empirical studies, including all ablation and sensitivity analyses, beyond what was presented during the rebuttal.

  • Larger population and FE issues: We will include results for both small and large populations in the main paper and discuss the impact of different FEs in MOEAs to appeal to a broader audience.

  • Complex PF: Handling complex Pareto fronts (PFs) is critical in the MOEA community. UMOD is well-suited for degenerated PFs (e.g., DTLZ5/6) in the decomposition framework, as it helps find preferences corresponding to representative solutions. For disconnected PFs, the challenges include (1) improving PF model estimation with techniques like self-organizing maps, and (2) optimizing the max-min distance, which is more difficult.

  • Limitations: We will discuss limitations in detail, including the use of more FEs and the ability to handle complex PFs.

Review
7

The paper presents a new approach to effectively represent the entire Pareto front in multi-objective optimization problems. The authors propose using fill distance as a new metric for uniformity in MOO, addressing the challenge of quantifying the representativeness of design points.

Strengths

  • The paper presents a novel approach to MOO by introducing the concept of fill distance and a surrogate objective function.
  • The UMOD algorithm provides bounds on the optimization error. Besides, it demonstrates that as the size of the solution set K increases, the obtained Pareto set asymptotically converges to a uniform distribution over the Pareto front, which ensures the effectiveness of the algorithm.
  • The paper makes comprehensive empirical evaluations.

Weaknesses

  • The proposed method relies on a neural network to approximate the Pareto front. The performance of the method could be sensitive to the quality of the neural network model, its training data, and hyperparameters.
  • While the paper claims that UMOD is efficient, there is limited discussion on the method to scale to larger problems.

Questions

  • The paper mentions the use of neural networks for approximating the Pareto front. Could the authors elaborate on the choice of the neural network architecture and how it impacts the optimization process?
  • Are there any additional limitations or specific considerations for higher-dimensional Pareto fronts?
  • While the paper demonstrates the method on several benchmark problems, what are the scalability considerations when applying UMOD to larger, more complex problems, especially those with a high number of objectives or decision variables?
  • It would be interesting to know how the algorithm performs on DTLZ5-7, where the assumptions are not satisfied.

Limitations

The authors have acknowledged the complexity of the optimization landscape introduced by the neural network. They could further discuss how their method handles cases where the Pareto front is not well-approximated by the neural network or where the front is highly irregular.

Author Response

We thank the reviewer for the detailed feedback. We hope the following response addresses your concerns.

Since W1 and Q1 are related, we combine the answers here.

W1. The proposed method relies on a neural network to approximate the Pareto front. The performance of the method could be sensitive to the quality of the neural network model, its training data, and hyperparameters.
Q1. Could the authors elaborate on the choice of the neural network architecture and how it impacts the optimization process?

Neural networks in this paper serve as a universal approximator of the continuous mapping from preference angles to Pareto objectives. Although neural networks are, by nature, sensitive to architecture, training data, and hyperparameters, this same property allows substantial performance improvements through careful investigation of the problem structure, which is widely accepted in modern machine learning.

On the other hand, the experiments show that the performance of the NN is empirically robust. Here, we construct 6 NN architectures and examine the performance sensitivity.

| Model | Architecture |
|---|---|
| $M_1$ | $(m-1)$ - 256 - 256 - 256 - $m$ |
| $M_2$ | $(m-1)$ - 128 - 128 - 128 - $m$ |
| $M_3$ | $(m-1)$ - 64 - 64 - 64 - $m$ |
| $M_4$ | $(m-1)$ - 256 - 256 - $m$ |
| $M_5$ | $(m-1)$ - 128 - 128 - $m$ |
| $M_6$ | $(m-1)$ - 64 - 64 - $m$ |

Table 1. Model architecture. Non-linear activation functions are ReLU.

The results below show that the network structure minimally impacts the final results. Due to space constraints, experiments on other problems are not reported here but will be included in the revised paper.

| Architecture | Spacing $\downarrow$ | Sparsity $\downarrow$ | HV $\uparrow$ | IGD $\downarrow$ | FD $\downarrow$ |
|---|---|---|---|---|---|
| $M_1$ | 0.0030 | 0.0445 | 0.7120 | 0.0522 | 0.1051 |
| $M_2$ | 0.0015 | 0.0445 | 0.7119 | 0.0522 | 0.1045 |
| $M_3$ | 0.0011 | 0.0445 | 0.7119 | 0.0523 | 0.1054 |
| $M_4$ | 0.0016 | 0.0445 | 0.7119 | 0.0523 | 0.1052 |
| $M_5$ | 0.0013 | 0.0445 | 0.7119 | 0.0523 | 0.1043 |
| $M_6$ | 0.0054 | 0.0445 | 0.7116 | 0.0522 | 0.1078 |

Table 2. Results on various models for the ZDT1 problem.

Since W2 and Q3 are related, we answer them together.

W2. While the paper claims that UMOD is efficient, there is limited discussion on the method to scale to larger problems.
Q3. What are the scalability considerations when applying UMOD to larger, more complex problems, especially those with a high number of objectives or decision variables.

In Section 4.2 (Pages 8-9), we extend UMOD to solve large-scale machine learning problems, specifically fairness classification problems with around 2891/6811 neural network parameters as decision variables.

These experiments demonstrate that UMOD can handle very large-scale problems efficiently. During optimization, we only need to fit a model whose input is the preference angle and whose output is the Pareto objectives, with dimensions $m-1$ and $m$, respectively, independent of the number of decision variables $n$. Additionally, generating new preference angles has complexity related to $m$ rather than $n$. Since $n \geq m$, these two operations are efficient. Apart from these operations, UMOD functions similarly to standard MOEA/D, contributing to its efficiency.

Also, in Section 4.1, we conducted experiments on four-objective problems, namely RE41 and RE42. Our visual results (see Figures 8 and 9 for full results) demonstrate that UMOD substantially outperforms baseline methods in finding K uniform Pareto solutions on the entire PF.

Q2. Are there any additional limitations or specific considerations for higher-dimensional Pareto fronts?

For high-dimensional Pareto fronts, we do not have problem-specific designs, and we note three key differences: (1) more samples are needed to train the Pareto front model, so the number $K$ of candidate solutions should be larger; (2) more training iterations of the Pareto front model are needed; (3) more preference-angle adaptation iterations are needed.

Q4. It would be interesting to know how the algorithm performs on DTLZ5-7, where the assumptions are not satisfied.

For discussion on DTLZ5-7 problems, please refer to our general response 2.

Comment

Thank you for your detailed response. My concerns have been addressed. I have decided to maintain my original positive score.

Comment

We sincerely appreciate your positive feedback and are pleased that we were able to address your concerns.

Review
7

In this paper, the authors focus on finding K uniform Pareto-optimal points that are capable of representing the entire Pareto front and propose a new metric, namely fill distance to quantify the effectiveness of these K points. To minimize the fill distance easily, the authors adopted a surrogate model, called 'max-min'. Furthermore, the authors design an effective multi-objective algorithm, namely UMOD.

Strengths

The paper introduces a novel way of generating K uniform Pareto objectives, which is quite original. The paper has a good quality. The authors provided a rigorous proof process for the rationality of using the surrogate model, and the experimental results verified the effectiveness of the designed UMOD. The presentation of the paper is clear. The contribution of the paper is thus significant.

Weaknesses

Section 2.1 and 2.2: What are the drawbacks of the existing methods for generating diverse solutions? You need to have a simple summary. The motivation of your design is not very clear without saying the drawbacks of existing methods.

Questions

  1. I know the situation of solutions reflected by different metrics (such as the convergence of IGD and the uniformity of Fill Distance), but is there a consistent relationship between the classic IGD indicator and the Fill Distance indicator? (Calculated results listed in Tables 1 and 2 seem to confirm this viewpoint)
  2. Is it necessary to have the Fill Distance indicator? What impact will it have on measuring the performance of different algorithms? What adverse effects will the lack of this indicator have?
  3. I am curious whether it is reasonable to generate K uniformly distributed points from PF based on Euclidean distance and use them as representative solutions for PF. Especially for irregularly distributed PF situations. In practical problems, optimization problems often contain several complex constraint conditions, leading to scattered arrangements when the solution space is mapped to the target space, resulting in many irregular distributions of PF.

Limitations

Authors can summarize the drawbacks or shortcomings of existing indicators and delve into the necessity of the Fill Distance metric, as well as the impact it will have on evaluating the fairness of different algorithm performance.

Author Response

We thank the reviewer for pointing out key passages that could be complemented with the necessary background for a broader audience.

W1. What are the drawbacks of the existing methods for generating diverse solutions? You need to have a simple summary.
For indicators, please refer to our response to Reviewer AxUm in W1 and W2. We will include that response as a simple summary in Appendix B1.

Q1. Is there a consistent relationship between the classic IGD indicator and the Fill Distance indicator?

The detailed relationship between IGD and FD can be found in our general response 1. Thank you for the insightful question. We will add the discussion to the next revision.

Q2. Is it necessary to have the Fill Distance indicator? What impact will it have on measuring the performance of different algorithms? What adverse effects will the lack of this indicator have?

The related discussion can be found in our general response 1. We argue FD is a better metric than its $L_1$ counterpart IGD. FD has better theoretical properties than IGD, especially in ensuring that each point on the Pareto front is well represented (whereas IGD may dismiss some points).

Specifically for your first sub-question, we believe the FD indicator is necessary. Briefly:

  • The FD indicator represents the entire Pareto front. Optimizing it provides the minimal covering radius, making it a space-filling design.
  • FD is robust to the distribution of Pareto solutions approximating the whole front.
  • We establish a rate-optimal design between FD and maximizing the minimal pairwise distance, which is easier to optimize, and provide a detailed solution distribution for this design.

For your second sub-question, since the FD indicator measures uniformity (smaller is better), a smaller FD value means the solutions generated by the algorithm are generally more uniform.

For your last sub-question, the main weakness is that to measure the level of uniformity, the FD indicator assumes the Pareto front is connected, which may not be suitable for disconnected problems like DTLZ7.
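To make the FD-vs-IGD contrast concrete, the toy bi-objective example below (points and numbers are illustrative, not from the paper) shows how IGD, an average of nearest-solution distances, dilutes a badly covered region that FD, the worst-case covering radius, immediately flags:

```python
import math

def min_dists(ref, sols):
    """Distance from each reference point on the (true) front to its nearest solution."""
    return [min(math.dist(r, s) for s in sols) for r in ref]

def igd(ref, sols):
    # IGD: average covering error -- poorly covered regions can be diluted.
    d = min_dists(ref, sols)
    return sum(d) / len(d)

def fill_distance(ref, sols):
    # FD: worst-case covering radius -- every front point must be represented.
    return max(min_dists(ref, sols))

# Linear front f2 = 1 - f1, discretized by 5 reference points.
front = [(t, 1 - t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
clustered = [(t, 1 - t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]  # bunched near one end

print(round(igd(front, clustered), 4))            # 0.3111: looks moderate
print(round(fill_distance(front, clustered), 4))  # 0.8485: flags the uncovered end
```

Here the clustered set leaves the $(1, 0)$ end of the front almost unrepresented; FD reports that worst gap directly, while IGD averages it away.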

Q3. I am curious whether it is reasonable to generate K uniformly distributed points from the PF based on Euclidean distance and use them as representative solutions for the PF, especially for irregularly distributed PFs. In practical problems, optimization problems often contain several complex constraints, leading to scattered arrangements when the solution space is mapped to the objective space.

We use Euclidean distance because, when $K$ is large enough, it well approximates the geodesic distance on the manifold; in most experiments the strategy works well. However, due to the theoretical limitation (Theorems 3/4 assume compactness and connectivity), UMOD does struggle with a disconnected Pareto front. For irregular, disconnected Pareto fronts, a potential solution is to generate uniformly distributed Pareto solutions on each segment separately. We will investigate this approach in future work.

L1. Authors can summarize the drawbacks or shortcomings of existing indicators and delve into the necessity of the Fill Distance metric, as well as the impact it will have on evaluating the fairness of different algorithm performance.

Thanks for your advice. To address your concern, we conduct comparisons with the hypervolume indicator, the sparsity indicator, and the IGD indicator.

  • Hypervolume is a coarse uniformity metric as HV maximization brings uncontrollable configurations of Pareto solutions. As remarked in Line 124-125, only a linear Pareto front results in equally spaced solutions when maximizing hypervolume.
  • The sparsity indicator considers the sum of squared distances between neighboring Pareto objectives. However in problems with more than two objectives, this indicator lacks a clear interpretation.
  • For IGD, please refer to our general response 1. In summary, it is a weaker metric than FD.

For the impact of FD on evaluation, as argued for Q2 above, FD is a better metric than its $L_1$ counterpart IGD: optimizing FD provides the minimal covering radius, yielding a space-filling design (Joseph et al., 2016), and it ensures that each point on the Pareto front is well represented, whereas IGD may dismiss some points.

We will incorporate the discussion into the next revision.


Reference

Space-Filling Designs for Computer Experiments: A Review. Joseph et al. 2016.

Comment

The authors have provided detailed responses to my comments, including rigorous theoretical proofs and detailed textual supplements.

I think the revised paper is acceptable.

Additionally, I strongly encourage the authors to conduct similar research on multi-objective combinatorial optimization problems. In such problems, PF is discrete and discontinuous, and current practices generally use a set of non-dominated solutions from different algorithms to replace PF. Your research results may lead to many interesting discoveries in this field.

Comment

We thank the reviewer for the encouraging response as well as the additional suggestion.

We agree that examining the performance on multi-objective combinatorial optimization problems would be interesting. We note, however, that given the disconnected Pareto fronts of such problems, the proposed method may be theoretically inappropriate. Nevertheless, we think our method would be an interesting baseline; as mentioned in the general response, we are more than willing to explore the recipe of running UMOD on each segment separately in future work.

Review
4

This study addresses the problem of generating K uniform Pareto-optimal solutions for multi-objective optimization problems. It introduces a new metric, fill distance, to evaluate the uniformity of the solution set on the Pareto front. This metric is then used as the objective for evolutionary optimization, where a surrogate for fill distance is first proposed, and the problem is then solved using a neural network-based bi-level optimization method.

Strengths

  1. A new metric is proposed to evaluate the uniformity of the solution set on the Pareto front.
  2. Some theoretical analysis is provided for the surrogate of fill distance.

Weaknesses

  1. While there are many metrics for evaluating the uniformity of the solution set on the Pareto front, they are not discussed, and the proposed fill distance is not compared with their ability to assess uniformity.
  2. There is no discussion on how the proposed fill distance ensures that it can (at least to some extent) accurately evaluate the uniformity of the solution set on the Pareto front.
  3. The standard benchmark set only considers ZDT and DTLZ problems, omitting more complex problem instances.

Questions

  1. How sensitive is the algorithm to the initialization method for K preferences?
  2. What are the settings for the algorithms in the experiments? For instance, what are the maximum fitness evaluations and the number of independent runs for each algorithm?
  3. Were statistical tests performed to compare the algorithm's performance?

Limitations

It is recommended to provide a sensitivity analysis of the algorithm concerning different values of K.

Author Response

We sincerely appreciate your valuable reviews and hope our response addresses your concerns.

W1. Fill distance (FD) is not discussed and compared with other metrics for assessing uniformity.

We respectfully argue that some discussion has been provided.

  1. Hypervolume (HV): As noted in lines 124-125, HV is a coarse uniformity metric. It results in equally spaced solutions only for a linear Pareto front, while for a non-linear PF, the configuration of K solutions is generally unknown.
  2. Sparsity. The sparsity indicator considers the sum of squared distances between neighboring Pareto objectives in the non-dominated sorted order. However, summing in this order does not have a direct meaning for problems with three or more objectives.
  3. IGD. For the difference between FD and IGD, please refer to general response 1.

Moreover, directly using FD and IGD as indicators for a uniform configuration is challenging because they assume the true PF is known. In practice, we optimize a surrogate max-min distance for FD. The table below highlights the differences between optimizing the FD surrogate and optimizing HV to guide preference adjustment in UMOD. Most uniformity indicators show that optimizing the FD surrogate yields better results.

Table 1. Comparison between using FD and HV in UMOD.

| Problem | Method | Spacing $\downarrow$ | Sparsity $\downarrow$ | HV $\uparrow$ | Uniform $\uparrow$ | Soft Uniform $\uparrow$ | IGD $\downarrow$ | FD $\downarrow$ |
|---|---|---|---|---|---|---|---|---|
| ZDT1 | FD | 0.001 | 0.027 | 1.054 | 0.162 | 0.017 | 0.040 | 0.082 |
| ZDT1 | HV | 0.018 | 0.027 | 1.054 | 0.148 | 0.014 | 0.040 | 0.091 |
| ZDT2 | FD | 0.001 | 0.027 | 0.725 | 0.162 | 0.018 | 0.041 | 0.081 |
| ZDT2 | HV | 0.039 | 0.028 | 0.727 | 0.145 | 0.008 | 0.046 | 0.136 |
| RE21 | FD | 0.001 | 0.016 | 1.252 | 0.125 | -0.020 | 0.032 | 0.064 |
| RE21 | HV | 0.026 | 0.017 | 1.253 | 0.099 | -0.028 | 0.033 | 0.090 |
| RE22 | FD | 0.002 | 0.017 | 1.190 | 0.125 | -0.019 | 0.033 | 0.066 |
| RE22 | HV | 0.010 | 0.017 | 1.192 | 0.107 | -0.020 | 0.033 | 0.072 |
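As a toy illustration of why the max-min (packing) surrogate is convenient to optimize, the classical greedy farthest-point heuristic below repeatedly adds the candidate farthest from the current design. This generic sketch is not the UMOD preference-adjustment procedure itself, just the surrogate idea on a 1D stand-in for the front:

```python
def farthest_point_design(candidates, k):
    """Greedy max-min selection: each new point maximizes its distance
    to the already-chosen set (Gonzalez-style farthest-point traversal)."""
    chosen = [candidates[0]]
    while len(chosen) < k:
        nxt = max(candidates, key=lambda c: min(abs(c - x) for x in chosen))
        chosen.append(nxt)
    return chosen

def fill_distance(candidates, design):
    # Covering radius of the design over the candidate set.
    return max(min(abs(c - x) for x in design) for c in candidates)

# Toy 1D "front": 11 evenly spaced points on [0, 1].
grid = [i / 10 for i in range(11)]
design = farthest_point_design(grid, 3)
print(design)                                  # [0.0, 1.0, 0.5]
print(round(fill_distance(grid, design), 2))   # 0.2
```

Each greedy step only needs pairwise distances to the chosen set, which is what makes the max-min surrogate easier to drive than the nested min-max-min fill-distance objective.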

W2. How does FD evaluate uniformity?

  1. Uniformity can be measured by the discrepancy between the empirical distribution of representative points and the uniform distribution over $\mathcal{T}$ (Fang et al., 2000). FD can be rewritten exactly as the Wasserstein-$\infty$ (W-$\infty$) distance between the discrete distribution induced by the input design and the uniform distribution $U(\mathcal{T})$. Optimizing FD is therefore equivalent to minimizing the W-$\infty$ distance from a design to $U(\mathcal{T})$. Moreover, minimizing FD is also known as producing space-filling designs (Pronzato et al., 2017), which is a form of uniform design.

  2. For the proposed rate-optimal design approaching the minimal FD, Theorems 3/4 demonstrate that the design asymptotically converges to $U(\mathcal{T})$ and yields an equal-spacing design, representing uniformity (see Fig. 1, Line 123).
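For reference, the covering-radius form of FD and the max-packing surrogate discussed above can be written as follows (notation introduced here for this response; the $4\times$ factor is the rate-optimality bound stated in the abstract):

```latex
% Fill distance (covering radius) of a design X_K = {x_1, ..., x_K} on the front T
\mathrm{FD}(X_K) = \max_{z \in \mathcal{T}} \; \min_{1 \le k \le K} \lVert z - x_k \rVert

% Max-packing (separation) surrogate, which is easier to optimize than FD directly
X_K^{\mathrm{pack}} \in \arg\max_{X_K \subset \mathcal{T}} \; \min_{i \ne j} \lVert x_i - x_j \rVert

% Rate-optimality: the surrogate design is within a constant factor of optimal
\mathrm{FD}\bigl(X_K^{\mathrm{pack}}\bigr) \le 4 \, \min_{X_K \subset \mathcal{T}} \mathrm{FD}(X_K)
```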

W3. More complex problem instances besides ZDT and DTLZ in standard benchmark?

Besides ZDT and DTLZ, we experimented with real-world multiobjective test suites (RE problems, Tanabe et al., 2020) and complex multiobjective machine learning problems. The reason we consider the RE problems is that Tanabe et al. claimed that current synthetic problems "are likely to include unrealistic properties which may lead to overestimation/underestimation". Additionally, we present new results on the synthetic UF problems (Li et al., 2008) and Fi problems (Lin et al., 2022) (cf. the one-page pdf).

Q1. Sensitivity to the preference initialization? We conducted the following ablation studies (cf. the one-page pdf).

  • Initial preferences span from $[0, 1]$ to $[0.5, 0.5]$. UMOD then expands this to the full preference space, finding Pareto solutions up to $[0.9, 0.1]$.
  • Non-uniform preferences (truncated Gaussian). UMOD still achieves uniform Pareto objectives, demonstrating robustness to non-uniform initialization.

Q2. What are the algorithm settings used in the experiments, such as number of fitness evaluations and independent runs?

The experiment involves 31 independent runs for MOEA and 5 for multiobjective machine learning tasks. The maximum fitness evaluations are 32,000 for bi-objective, 126,000 for three-objective, and 350,000 for four-objective optimization (except for DTLZ5/6, which are degenerate). Detailed hyper-parameters are in Table 5.

Q3. Statistical tests?

Each algorithm is run 31 times independently to obtain robust experimental results. We conducted the Wilcoxon rank-sum test at a significance level of 5%. The p-values (%) of UMOD against the other methods are:

Table 2. p-values (%) of the Wilcoxon rank-sum test

| Metric | SMS | NSGA3 | MOEA/D | AWA |
|---|---|---|---|---|
| IGD | 1.76 | 3.92 | 0.34 | 0.01 |
| Spacing | 0.02 | 0.01 | 0.06 | 0.01 |
| Sparsity | 0.2 | 0.01 | 0.01 | 0.07 |
| Uniform | 0.02 | 0.01 | 0.01 | 0.01 |
| SUniform | 0.07 | 0.07 | 0.07 | 0.07 |
| MaxGD | 1.15 | 3.38 | 2.9 | 0.67 |

All p-values fall below the 5% level, indicating that UMOD's improvements are statistically significant.
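For reference, a Wilcoxon rank-sum test like the one above can be run with `scipy.stats.ranksums`. The indicator values below are synthetic placeholders, not our experimental data; this is only a sketch of the testing procedure:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# placeholder IGD values over 31 independent runs (illustrative only)
umod_igd = rng.normal(loc=0.05, scale=0.005, size=31)
baseline_igd = rng.normal(loc=0.07, scale=0.005, size=31)

# two-sided rank-sum test; p < 0.05 means significant at the 5% level
stat, p = ranksums(umod_igd, baseline_igd)
print(f"p-value: {p:.2e}")
```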

L1. Sensitivity analysis concerning different values of $K$.

We investigate $K$ = 8, 12, 20, and 30 (cf. the attached PDF), showing that UMOD can handle a wide range of $K$ values.


Reference

Uniform Design: Theory and applications. Fang et al. Technometrics. 2000.

Minimax and maximin space-filling designs: some properties and methods for construction. Pronzato et al. Journal de la Société Française de Statistique. 2017.

An Easy-to-use Real-world Multi-objective Optimization Problem Suite. Tanabe et al. Applied Soft Computing. 2020.

Multiobjective optimization problems with complicated Pareto sets. Li et al. TEVC. 2008.

Pareto Set Learning for Expensive Multi-Objective Optimization. Lin et al. NeurIPS 2022.

Comment

Dear reviewer,

We are more than willing to engage with you further to provide additional results to address any of your remaining concerns. Thanks for your effort on our work!

Best regards, Paper 5680 Authors

Comment

Dear Reviewer AxUm,

We sincerely thank you for your time and effort in reviewing our work, especially given your busy schedule. As the discussion phase wraps up, we kindly ask for your feedback on our responses. We hope to confirm that we've addressed your concerns and are open to any further questions or discussions.

If our answers meet your expectations, we would appreciate it if you could consider adjusting your score and confidence. We look forward to any further dialogue.

Thank you for your consideration.

Best regards, Paper 5680 Authors

Comment

Dear Reviewer,

We have new results that directly address your concerns regarding the setting of different values of $K$, which we summarize below for your convenience.

Question: Sensitivity analysis for different values of $K$?

Thank you for raising this important issue. We agree that analyzing different values of $K$ is crucial. In our main paper, we focused on a more challenging scenario using small $K$ values ($K=8$ for bi-objective and $K=21$ for three-objective problems) because:

  • Approximating the Pareto front with a small number of solutions is more difficult, while larger values of $K$ make it easier.
  • For multiobjective evolutionary algorithms (MOEAs), handling smaller $K$ values is more challenging due to complexities in the MOEA/D framework. However, we intentionally avoid delving into these technical details in the main paper, as UMOD is a general framework applicable to both evolutionary and gradient-based MOO.
  • Fitting a neural network with fewer solutions is challenging, but we demonstrated that even with small $K$ the Pareto front model fits well, highlighting the benefit of using neural networks in this context.
  • Theoretically, by Theorem 4 in the main paper, maximizing the minimum pairwise distance ensures that, as $K$ increases, the design converges asymptotically to a uniform distribution over the Pareto front.

Extending UMOD to a larger number of solutions is therefore easier. We present two new results:

  • The one-page PDF shows promising visual results for different values of $K$.
  • In Table 6 of our response to N5xS, we demonstrate that larger $K$ values ($K=50$ for bi-objective and $K=91$ for three-objective problems) yield even better results, outperforming classical methods on most uniformity indicators. For example, the fill distance (covering radius) is only half that of MOEA/D, indicating more precise coverage of the Pareto front.

We plan to include these results and comparisons with advanced methods like LSSA (Wang et al. 2023) in the revised paper.
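For intuition on the max-min pairwise-distance ("max-packing") behavior behind Theorem 4, a greedy farthest-point selection on a discretized front gives a quick illustration. This is our own sketch of a standard heuristic, not UMOD's actual optimizer, and the linear bi-objective front is an assumption for the example:

```python
import numpy as np

def farthest_point_design(candidates, k):
    """Greedy max-min selection: each new point maximizes its distance
    to the points already chosen (a classic packing heuristic)."""
    chosen = [0]  # start from an arbitrary candidate
    d = np.linalg.norm(candidates - candidates[0], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())  # farthest candidate from the current set
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(candidates - candidates[nxt], axis=1))
    return candidates[chosen]

# discretized bi-objective front y2 = 1 - y1 as the candidate pool
t = np.linspace(0.0, 1.0, 2001)
front = np.stack([t, 1.0 - t], axis=1)
design = farthest_point_design(front, k=8)

# the minimum pairwise distance of the greedy design stays within a constant
# factor of the equal-spacing optimum sqrt(2)/7 for 8 points on this front
pdist = np.linalg.norm(design[:, None] - design[None], axis=-1)
print(pdist[pdist > 0].min())
```

As $K$ grows, such max-min designs spread out toward equal spacing, consistent with the asymptotic-uniformity statement of Theorem 4.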

Question: Hyperparameters (FE, seed number) used in the MOEA?

In addition to the previously mentioned hyperparameters, we also explore the impact of varying FE (function evaluations). Detailed results can be found in Table 5 under "Results Comparison with Different FE on Three-Objective Problems." The sections "FE for Many-Solution Problems" and "Analysis of Different FEs" provide further insights, along with our response to N5xS.


Reference:

  • Wang et al., "Enhancing Diversity by Local Subset Selection in Evolutionary Multiobjective Optimization," IEEE TEC, 2023.
Author Response

We sincerely appreciate all helpful feedback and comments from the reviewers. In this part, we first address some general comments raised by the reviewers.

Q1 (Asked by 5qjZ and N5xS). What is the relationship between the convergence of IGD and the uniformity of Fill Distance (FD)? What are the advantages of the two indicators?

We first illustrate the relation between Inverted Generational Distance (IGD) and Fill Distance (FD); and then explain the reasons why we prefer FD over IGD.

We (informally) recall the definitions: $\mathrm{IGD}(\mathbb{Y}) = \mathbb{E}_{\mathbf{y} \sim U(\mathcal{T})} \min_{\mathbf{y}' \in \mathbb{Y}} \rho(\mathbf{y}, \mathbf{y}')$ and $\mathrm{FD}(\mathbb{Y}) = \max_{\mathbf{y} \in \mathcal{T}} \min_{\mathbf{y}' \in \mathbb{Y}} \rho(\mathbf{y}, \mathbf{y}')$, where $\mathcal{T}$ denotes the Pareto front and $\rho$ is the Euclidean distance. Given a set $\mathbb{Y}$, we denote by $d(\mathbf{y}; \mathbb{Y}) = \min_{\mathbf{y}' \in \mathbb{Y}} \rho(\mathbf{y}, \mathbf{y}')$ the distance function between $\mathbf{y}$ and $\mathbb{Y}$.

We can then rewrite $\mathrm{IGD}(\mathbb{Y}) = \|d(\cdot,\mathbb{Y})\|_{1, U(\mathcal{T})}$, the $L_1$ norm of the distance function $d(\cdot,\mathbb{Y})$ under the measure $U(\mathcal{T})$, and $\mathrm{FD}(\mathbb{Y}) = \|d(\cdot,\mathbb{Y})\|_{\infty, U(\mathcal{T})}$, the $L_\infty$ norm of the same function. By the classic property that the $L_\infty$ norm dominates the $L_1$ norm under a probability measure, we immediately conclude that $\mathrm{IGD}(\mathbb{Y}) \leq \mathrm{FD}(\mathbb{Y})$, and thus FD is a stronger metric than IGD. More specifically, a design with tightly controlled FD must have small IGD, while the reverse is not always true.

(Reviewer N5xS) Given the observation above, it is improper to claim that minimizing FD is generally equivalent to minimizing IGD. FD is a stronger metric, and in many fields, designing an algorithm that provably attains a small $L_\infty$ norm is harder than attaining a small $L_1$ norm. In addition, we argue that all solutions essentially contribute to FD: as the $L_\infty$ norm of the distance function, FD is naturally a uniform upper bound over all solutions, and we recall that convergence in $L_\infty$ norm is equivalent to uniform convergence. Indeed, any outlier point $\mathbf{y}$ far from the design $\mathbb{Y}$ is captured by FD, while it may be averaged away by IGD.
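This dominance argument can be checked numerically. The sketch below is our own illustration, with a discretized linear front standing in for $U(\mathcal{T})$: a design with one large gap inflates FD far more than IGD:

```python
import numpy as np

# dense discretization of the bi-objective front y2 = 1 - y1, a proxy for U(T)
t = np.linspace(0.0, 1.0, 2001)
front = np.stack([t, 1.0 - t], axis=1)

# a design with a large gap in (0.2, 0.9) on the first objective
design = np.array([[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [1.0, 0.0]])

# distance function d(y; Y) evaluated on every discretized front point
d = np.linalg.norm(front[:, None, :] - design[None, :, :], axis=-1).min(axis=1)
igd = d.mean()  # L1 norm of d under the empirical uniform measure
fd = d.max()    # L_inf norm: the covering radius
print(f"IGD ~ {igd:.3f}, FD ~ {fd:.3f}")  # the gap inflates FD much more than IGD
```

The outlier-sensitivity is visible directly: FD is set by the worst-covered front point in the gap, while IGD averages it against the well-covered regions.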

We finally summarize the reasons why we choose FD as our objective of interest:

  • The attractive theoretical properties mentioned above.
  • FD can be geometrically understood as covering radius.
  • FD induces a tractable surrogate (Thm. 1). We note that neither IGD nor FD can be directly optimized without knowing the true PF. Unlike our proposed surrogate, the practical computation of IGD relies on an empirical average over an approximation to the Pareto front, whose quality is usually unguaranteed.

Q2 (Asked by JBGb, N5xS). Results on DTLZ 5-7?

  1. UMOD performs exceptionally well on DTLZ5/6, as shown in Figure 3 of the one-page PDF. This is because DTLZ5/6 have degenerate Pareto fronts, meaning most preference vectors correspond to duplicate Pareto objectives.
  • This property makes calculating the specific preference vectors corresponding to uniform Pareto objectives more effective (by encouraging distinct solutions).
  • In contrast, using fixed preference vectors results in many duplicate Pareto solutions, which is inefficient. While MOEA/D-AWA reduces duplicated solutions through heuristic methods, UMOD significantly outperforms it.
  • Additionally, UMOD achieves a much smaller covering radius than NSGA3 and SMS-MOEA. Although SMS-MOEA is competitive with UMOD on some uniformity indicators, it takes around 2× more runtime than UMOD.

Visual results for DTLZ5 and DTLZ6 are presented in the one-page PDF.

Table 1: Results on DTLZ5 (15 solutions), max. 90,000 function evaluations.

| Method | Spacing ↓ | Sparsity ↓ | HV ↑ | IGD ↓ | FD ↓ | Runtime (min.) |
|---|---|---|---|---|---|---|
| UMOD | 0.0197 | 0.0131 | 0.6999 | 0.0291 | 0.0761 | 0.34 |
| AWA | 0.0495 | 0.0282 | 0.6783 | 0.0639 | 0.1513 | 10.01 |
| MOEAD | 0.0871 | 0.0645 | 0.6352 | 0.1430 | 0.3002 | 9.85 |
| NSGA3 | 0.0599 | 0.0289 | 0.6623 | 0.0683 | 0.1802 | 0.61 |
| SMS | 0.0454 | 0.0147 | 0.7035 | 0.0307 | 0.1099 | 16.99 |

Table 2: Results on DTLZ6 (15 solutions), max. 90,000 function evaluations.

| Method | Spacing ↓ | Sparsity ↓ | HV ↑ | IGD ↓ | FD ↓ | Runtime (min.) |
|---|---|---|---|---|---|---|
| UMOD | 0.0315 | 0.0136 | 0.7008 | 0.0505 | 0.0941 | 11.56 |
| AWA | 0.0904 | 0.0313 | 0.6697 | 0.0715 | 0.1697 | 10.34 |
| MOEAD | 0.0843 | 0.0599 | 0.6352 | 0.143 | 0.3002 | 9.34 |
| NSGA3 | 0.0524 | 0.0415 | 0.6623 | 0.0931 | 0.2972 | 0.73 |
| SMS | 0.0426 | 0.0143 | 0.7036 | 0.0305 | 0.1099 | 21.23 |

Further comparisons with more recent methods will be included in the revised paper.

  2. The current version of UMOD is not suitable for DTLZ7, which has a disconnected Pareto front. Theorem 3 and Theorem 5 assume that the Pareto front is compact and connected in order to build a uniform distribution of Pareto objectives, and DTLZ7 violates this assumption. To handle disconnected Pareto fronts, one approach is to run UMOD on each segment separately; we will explore this in future work.
Final Decision

The authors have made substantial improvements in response to the reviewers' concerns, effectively addressing several key issues raised.

One of the primary concerns involved the novelty and effectiveness of the proposed fill distance metric, specifically its distinction from existing metrics like IGD. The authors provided a thorough rebuttal, clarifying its strong theoretical properties for ensuring uniform coverage of the Pareto front. Additionally, the authors emphasized the practical benefits of their surrogate optimization approach, which has been shown to outperform traditional methods in various benchmarks.

Another significant improvement was in the detailed discussion of the algorithm's efficiency, particularly regarding its use of neural networks. The authors provided detailed analysis to demonstrate that the optimization process is competitive with other state-of-the-art methods. This addresses the reviewers' concerns and strengthens the contribution of this paper. The authors also responded well to queries regarding the scalability of UMOD.