PaperHub
Rating: 4.9/10 · Poster · 4 reviewers
Scores: 3, 2, 3, 3 (min 2, max 3, std 0.4)
ICML 2025

CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

This paper introduces CAD-Editor, a novel generative model that enables precise text-based editing of CAD designs.

Abstract

Keywords
Computer Aided Design · Generative Models · Text-based Editing · Large Language Models

Reviews and Discussion

Review (Rating: 3)

This paper introduces CAD-Editor, a framework for text-based CAD editing that leverages large language models (LLMs) through a locate-then-infill strategy for identifying modification areas and executing edits. More specifically, CAD-Editor breaks the editing process into two steps. First, it identifies the areas that need changes by creating a masked CAD sequence. Second, it fills in these masked sections with context-aware edits, ensuring seamless integration with the existing design. In addition, CAD-Editor develops an automated pipeline that generates triplet data using design variation models and Large Vision-Language Models (LVLMs) to prepare training data. Experiments show CAD-Editor outperforms existing methods in both quantitative and qualitative evaluations.

Questions for Authors

  1. Would the authors please showcase more results of editing parameters in CAD commands?
  2. The reviewer is curious why the authors chose to remove the long-tail data with more than three sketch-extrusion pairs. Could you please provide a detailed clarification?

The reviewer may adjust the original score based on the authors' feedback.

Claims and Evidence

Yes, all claims in the submission are clear.

Methods and Evaluation Criteria

For the metrics in the submission, it would be better if COV (Coverage) and MMD (Minimum Matching Distance) were reported.

Theoretical Claims

Yes, the reviewer has checked the correctness of all claims.

Experimental Design and Analyses

Yes, the reviewer has checked them all. The experimental designs are reasonable.

Supplementary Material

There is no supplementary material uploaded.

Relation to Broader Literature

Text-based CAD editing is a branch of text2CAD tasks that has rarely been discussed previously and can promote the development of the CAD community.

Essential References Not Discussed

Considering that CAD-Editor addresses a branch of the text2CAD task, it would be better to discuss more text2CAD efforts in the related work section.

For example, "CAD Translator: An Effective Drive for Text to 3D Parametric Computer-Aided Design Generative Modeling. ACM MM 2024" is also a recent text2CAD work.

Other Strengths and Weaknesses

Strengths

  1. This paper is well-structured and easy to follow.
  2. Text-based CAD editing is an interesting application and has been rarely discussed previously.
  3. The paper proposes a new method for text-based CAD editing that utilizes a locate-then-infill strategy, allowing users to edit CAD models via natural language commands.

Weaknesses

  1. The data representation and training paradigm rely heavily on a prior method [*] with only minor improvements.
  2. The edited CAD model could be useless if the editing process fails to precisely control the parameters. The current approach does not encode the actual parameters, or at best, it only encodes a limited amount of parameter information.
  3. The proposed stepwise captioning strategy is independent across different modalities, meaning there is no interaction between information from them, which may lead to suboptimal results. Additionally, the current LVLMs struggle to accurately describe complex CAD models using only the sequence modality, which can cause errors to accumulate in the later steps.
  4. The model's generalization ability remains questionable, which means it could potentially underperform in real-world scenarios. Firstly, the generation of synthetic data relies on models like HNC-CAD, which are also trained on the DeepCAD dataset. This introduces a risk of overfitting to the same distribution. Secondly, all data containing more than three sketch-extrusion pairs were excluded, but DeepCAD itself already includes a lot of simple examples. By excluding these CAD models, the resulting training set may be biased toward simpler CAD models. Consequently, the model may exhibit weak generalization and limited reliability in real-world applications.

[*] Zhang, Z., Sun, S., Wang, W., Cai, D., and Bian, J. FlexCAD: Unified and versatile controllable CAD generation with fine-tuned large language models.

Other Comments or Suggestions

  1. To enhance the generalization ability of the proposed model, complex data, such as models with more sketch-extrusion pairs, should be included in the training set.

  2. Although text-based CAD editing has not been explored before, the inability to control command parameters still leaves the current CAD-Editor far from practical application.

Author Response

1. Discussion on CAD Translator

We will include this work and other recent text-to-CAD efforts in the Related Work section. Since CAD Translator does not take existing CAD models as input, it cannot directly handle the text-based editing task discussed here. Additionally, its code is not publicly available, preventing experimental comparison.

2. Data Representation and Training Paradigm Rely on FlexCAD

While our work builds upon the existing CAD representation and seq2seq training paradigm of FlexCAD, these are not the focus of our research. Our main novelty lies in defining a new task and developing a tailored framework to address its unique challenges.

  1. New Task. We define text-based CAD editing, enabling precise modifications via textual instructions, unlike FlexCAD, which targets CAD variation.
  2. Automated Data Synthesis Pipeline. A key challenge is the lack of triplet data (original CAD, instruction, edited CAD). We propose a novel pipeline that combines design variation models with LVLMs, along with a stepwise captioning strategy to improve caption quality. This enables the creation of a high-quality dataset, which we release for future research.
  3. Locate-Then-Infill Framework. We decompose editing into two stages, with tailored solutions and ablation studies validating its effectiveness.
    • Locating: identifying modifiable regions via masked CAD sequences. We tackle the lack of supervision signals using LCS-based mask generation (a toy sketch follows this list).
    • Infilling: generating edits for the masked regions. We enhance data quality via a selective dataset.
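As an illustration only, here is a minimal sketch of LCS-style mask generation; the token vocabulary and the `<mask>` placeholder are hypothetical, and `difflib.SequenceMatcher` only approximates a true LCS alignment, so this is not the paper's exact implementation:

```python
from difflib import SequenceMatcher

def lcs_mask(orig_tokens, edit_tokens, mask_token="<mask>"):
    """Keep tokens shared with the edited sequence; collapse each changed
    span of the original sequence into a single mask token."""
    sm = SequenceMatcher(a=orig_tokens, b=edit_tokens, autojunk=False)
    masked = []
    for op, i1, i2, _, _ in sm.get_opcodes():
        if op == "equal":
            masked.extend(orig_tokens[i1:i2])   # unchanged region stays visible
        elif masked[-1:] != [mask_token]:
            masked.append(mask_token)           # one mask per edited span
    return masked

# Toy sketch-extrude token sequences (illustrative only).
orig = ["circle", "r=5", "extrude", "h=8"]
edit = ["circle", "r=9", "extrude", "h=8"]
print(lcs_mask(orig, edit))  # ['circle', '<mask>', 'extrude', 'h=8']
```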

3. Control Parameters

  1. We have included parameter information in our dataset. Specifically, the sequence-level captioning approach more frequently captures parametric cues. As shown in Figure 3, parameter-related information such as "reduce by 10 units" is included, and the model is trained to associate such expressions with corresponding geometric changes.
  2. We analyzed the dataset (a toy sketch of such cue detection follows this list) and found that:
    • 11.10% of instructions contain explicit numeric expressions,
    • 12.85% include number words that represent quantities (e.g., one, two, first); see cases 4, 6, 8 in Figure 5;
    • 31.33% feature implicit parametric cues such as "half", "double", "left", "top", "center", or "end" (see cases 1, 2, 4, 6 in Figure 5).
  3. This indicates that the model receives parameter-related supervision during training.
  4. We add qualitative examples (Figure 2 in https://anonymous.4open.science/r/CAD-Editor-MoreResult-DBDC ) demonstrating the model’s ability to interpret parameterized instructions, such as generating a correctly sized hole for "Add a 44-unit diameter circular hole" and producing a proportionally smaller cylinder for "reduce the cylinder's height by half". These examples will be included in the revised paper.
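A hypothetical sketch of how such cues could be counted; the regex and word lists below are illustrative assumptions, not the exact analysis script used for the statistics above:

```python
import re

NUMBER_WORDS = {"one", "two", "three", "four", "first", "second", "third"}
IMPLICIT_CUES = {"half", "double", "left", "top", "center", "end"}

def parametric_cues(instruction: str) -> dict:
    """Flag which kinds of parameter-related cues an instruction contains."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    return {
        "explicit_numeric": bool(re.search(r"\d", instruction)),
        "number_word": bool(words & NUMBER_WORDS),
        "implicit_cue": bool(words & IMPLICIT_CUES),
    }

print(parametric_cues("reduce the cylinder's height by half"))
# {'explicit_numeric': False, 'number_word': False, 'implicit_cue': True}
```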

4. Independent across Different Modalities

  1. Our experiments indicate that LVLMs do not struggle significantly more with either the sequence or the visual modality. We manually reviewed 100 samples each from the sequence and visual modalities, with correctness rates of 78% and 83%, respectively, indicating no significant difference in difficulty for LVLMs (a quick significance check follows this list).
  2. We tested using both modalities together as input and observed a correctness rate of 86%, which was not significantly better than single-modality inputs.
  3. As clarified in Sec. 4 of the paper, our use of both modalities aims to enhance the diversity and coverage of editing instructions, rather than to improve per-sample accuracy. For example, visual modality better captures structural changes, while the sequence modality provides fine-grained numerical edits.
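To back up point 1, a standard two-proportion z-test (an illustrative addition, not part of the original analysis) confirms that 78 vs. 83 correct out of 100 is not a statistically significant gap:

```python
from math import erf, sqrt

def two_proportion_ztest(k1: int, n1: int, k2: int, n2: int):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal CDF via erf
    return z, p_value

z, p = two_proportion_ztest(78, 100, 83, 100)
print(f"z = {z:.2f}, p = {p:.2f}")  # z ≈ 0.89, p ≈ 0.37 -> not significant
```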

5. Generalization Ability

  1. DeepCAD provides a substantial amount of data for training purposes (approximately 178k instances), whereas other datasets offer significantly less data (Fusion 360 [4] contains only about 8k instances).
  2. Related works on CAD model generation are predominantly trained on DeepCAD [1, 2, 3, 4], which is the standard in this field. To our knowledge, no other datasets comparable to DeepCAD are available.
  3. Studies have indicated that using DeepCAD is unlikely to result in overfitting due to its extensive, cross-industry data [5]. According to the research that introduced DeepCAD [1] (supplementary E), a model trained on DeepCAD generalizes well to Fusion 360, which was collected from sources different from DeepCAD's. We evaluate CAD-Editor's generalization by testing a model trained on DeepCAD directly on Fusion 360. As shown in Table 4 (in https://anonymous.4open.science/r/CAD-Editor-MoreResult-DBDC), CAD-Editor outperforms the baselines, confirming its generalization to datasets with different shape distributions.

6. 3 Sketch-Extrude Pairs

Please see "3-Steps of Sketch and Extrude" in the response to Reviewer keVC.

References

[1] DeepCAD, ICCV 2021.

[2] SkexGen, ICML 2022.

[3] HNC-CAD, ICML 2023.

[4] Text2CAD, NeurIPS 2024.

[5] ABC, CVPR 2019.

Reviewer Comment

Thanks for the authors' feedback, which addresses most of my concerns. I would like to raise my initial score.

Review (Rating: 2)

This paper introduces CAD-Editor, the first framework for text-based CAD editing. The authors frame the problem as a sequence-to-sequence generation task and propose a locate-then-infill approach that decomposes editing into two sub-tasks: locating regions requiring modification and infilling these regions with appropriate edits. To address the lack of training data, they develop an automated data synthesis pipeline combining design variation models with Large Vision-Language Models. Experimental results demonstrate that CAD-Editor outperforms baseline methods in validity, text-CAD alignment, and generation quality.

Questions for Authors

Please see the above content.

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

There is no theoretical claim.

Experimental Design and Analyses

Yes.

Supplementary Material

This submission has no supplementary material.

Relation to Broader Literature

As an application paper, this work is related to the CAD field.

Essential References Not Discussed

No.

Other Strengths and Weaknesses

Strengths

  • The paper addresses a novel and practical problem (text-based CAD editing) that has significant real-world applications.
  • The automated data synthesis pipeline is clever and well-designed, leveraging existing design variation models and LVLMs to generate paired data with editing instructions.
  • The locate-then-infill framework offers an intuitive decomposition of the editing task that aligns with how humans might approach CAD editing.
  • The qualitative results are impressive, showing the system's capability to perform a variety of complex editing operations based on natural language instructions.

Weaknesses

  • The paper lacks a detailed discussion of how the system handles ambiguous or imprecise editing instructions. While Figure 8 shows some examples of diverse outcomes for vague instructions, a more systematic analysis would be valuable to show its stability and robustness.
  • The evaluation could benefit from more comparisons with human-designed edits to assess how well the system aligns with human expectations and design standards.
  • The authors acknowledge limitations regarding complex CAD models, but it is important that they show whether the method has this potential; at the least, the figures in the paper are not that compelling.

Other Comments or Suggestions

  • Some figures (particularly Figure 3) are not aligned.
  • The explanation of evaluation metrics in Section 6.1 should be clearer, especially regarding how D-CLIP is adapted from the image domain to CAD models.
  • The paper could benefit from more explicit definitions of technical terms specific to CAD modeling for readers less familiar with the domain.

Author Response

1. Ambiguous Editing Instructions

We agree that handling ambiguous editing instructions is a critical challenge. However, ambiguity in natural language is a long-standing issue in NLP research [1]. As the first work on natural-language-driven CAD editing, our focus is on establishing a complete and effective framework; comprehensive ambiguity resolution remains an avenue for future work. Considering your concerns, we conducted a systematic analysis based on the phenomenon shown in Figure 8. Specifically:

  • (a) Dataset Construction: We used LLMs to identify ambiguous editing instructions from our dataset. We selected the top 100 instructions with the highest ambiguity scores as judged by GPT-4o.

  • (b) Inference Diversity: For each ambiguous instruction, we ran our model three times, resulting in 300 generated outputs in total.

  • (c) Human Evaluation: We asked three human annotators to assess whether each generated result aligned with the intended instruction. For each instruction, we recorded how many of the 3 outputs were judged as "semantically consistent". The distribution was as follows:

    Correct outputs (out of 3) | Percentage (%)
    0                          | 5
    1                          | 16
    2                          | 47
    3                          | 32

2. Alignment with Human Expectations

As detailed in Sec. 6.2, we have conducted human evaluation, which is specifically designed to assess both the alignment with textual instructions and the overall visual quality of the edits. Importantly, human raters were instructed to take into account how well the output aligns with human expectations and common design standards as part of their evaluation criteria.

3. Complex CAD Models

We address this concern by highlighting that our paper already presents several complex editing instructions and by providing new qualitative results. Please refer to "Complexity of Editing Instructions" in the response to Reviewer keVC.

4. More Details about D-CLIP

D-CLIP was originally proposed in the image generation domain to measure whether the semantic direction in CLIP space between two images aligns with the direction between two corresponding text prompts (e.g., "a face" (source) → "a smiling face" (target)). In our work, we adapt this idea to the CAD editing setting as follows:

  • We render the original and edited CAD models into images.
  • We define a neutral base text (e.g., “This is a 3D shape”) and concatenate it with the editing instruction to form the target text, mimicking the (source → target) text pair in the original D-CLIP formulation.
  • We then compute the CLIP-space direction between the image embeddings of the original and edited shapes and compare it to the direction between the text embeddings of the neutral and target texts (see the sketch below). This adaptation allows us to measure whether the visual change in the CAD model aligns with the semantic intention expressed in the editing instruction. We will update the metric description in the revised version to improve clarity.
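A minimal sketch of this adapted D-CLIP computation using an off-the-shelf CLIP checkpoint; the model choice, prompt wording, and the upstream rendering step are assumptions for illustration, not the exact evaluation code:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def d_clip(render_orig: Image.Image, render_edit: Image.Image,
           instruction: str) -> float:
    """Cosine similarity between image-space and text-space edit directions."""
    base = "This is a 3D shape"       # neutral source text
    target = f"{base}, {instruction}"  # source text + editing instruction
    with torch.no_grad():
        img = model.get_image_features(
            **proc(images=[render_orig, render_edit], return_tensors="pt"))
        txt = model.get_text_features(
            **proc(text=[base, target], padding=True, return_tensors="pt"))
    d_img = img[1] - img[0]   # original render -> edited render
    d_txt = txt[1] - txt[0]   # neutral text    -> target text
    return torch.nn.functional.cosine_similarity(d_img, d_txt, dim=0).item()
```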

References

[1] Promptify, UIST 2023.

Review (Rating: 3)

This paper introduces a text-based CAD editing framework. The authors propose an automated data synthesis pipeline that generates triplet data with VLMs and variation models, and design a locate-then-infill framework to perform the editing.

Questions for Authors

  1. How do you assess and mitigate errors introduced by LVLMs in the data synthesis pipeline?

Claims and Evidence

  1. Automated data synthesis pipeline: it is derived from HNC-CAD's auto-completion; therefore, the differences always come from the latter part of the sequences. I am wondering if the method can handle cases where the start of the CAD sequence needs to be edited.
  2. Locate-then-infill: this claim seems reasonable and is supported by ablation studies.

Methods and Evaluation Criteria

The proposed method makes sense to a certain extent when we need to add or replace some parts of the CAD modules. However, simpler cases where users use a text prompt merely to scale the CAD model seem unreasonable to me, since direct editing could be simpler than describing the change in text.

The evaluation criteria are reasonable but could be improved; please see Experimental Design and Analyses for details.

Theoretical Claims

For the locate-then-infill decomposition, the LCS-based mask generation assumes token-level correspondence between C_ori and C_edit, which may fail for edits involving structural reordering. The paper’s focus on simpler edits mitigates this, but it could be a problem in complex scenarios.

Experimental Design and Analyses

The experiments evaluated the framework across multiple dimensions (validity, realism, edit consistency) with VR, JSD, and D-CLIP. However, adding some reconstruction metrics may better illustrate the effectiveness of the method. For example, one could generate text descriptions for some ground-truth target CAD sequences, ask the model to edit the source into the target, and compute metrics like CD, EMD, and COV.

Supplementary Material

Yes, I reviewed the supplementary materials.

Relation to Broader Literature

This work can contribute to the industrial design community to accelerate the design process for CAD models.

Essential References Not Discussed

None

Other Strengths and Weaknesses

The D-CLIP metric, adapted from image editing, may not fully capture the nuances of CAD editing. How are the images rendered? What if the edited region is occluded in certain camera poses?

Other Comments or Suggestions

None

Author Response

1. Editing the Start of a CAD Sequence

  1. To support edits at diverse positions, we adopt a reversible annotation strategy as described in Sec. 4: for each training pair, we also include a version with the edited and original sequences swapped, encouraging the model to generalize across editing directions.
  2. We construct a dataset with 200 cases, each requiring modifications at the beginning of the CAD sequence. Test results show that 75.5% of the generated outputs successfully include edits at the start. We attribute this capability to the strong generalization ability of LLMs.

2. Direct Editing vs. Text-based Editing

We agree that direct editing is simpler for scaling CAD models, which is why we did not focus on such cases (as shown in the qualitative results). Text-based editing is more beneficial when modifying sketch-extrude components and spatial relationships.

3. LCS-based Mask Generation

  1. Our work focuses on semantically meaningful and structurally coherent edits (e.g., feature addition, deletion, or modification), which typically preserve the relative order of most operations. In such cases, LCS provides a practical and effective way to generate ground-truth masks. Empirically, this method performs robustly, and many real-world edits also fall into this category.
  2. We are not entirely certain about the precise meaning of "structural reordering" as mentioned by the reviewer. Based on our understanding, it may refer to scenarios such as:
  • Swapping the order of two sketch-extrude (SE) pairs with unchanged content. Here, LCS can still match one SE pair and identify the other as edited, resulting in a masked segment that is then infilled.
  • Or, consider a case like a flat plate with two holes. If the editing instruction is "use two separate 2D sketches followed by extrusions to create the holes, instead of creating a single 2D sketch and performing two cut operations", then the entire sequence C_edit differs from C_ori. In this case, our LCS-based mask generation still applies, as it correctly identifies that all tokens need to be masked and regenerated.

4. Reconstruction Metrics

  1. We have included JSD scores in our evaluation, measuring the similarity between generated results and ground truth at the distribution level. This serves as a distribution-level reconstruction metric.
  2. Following your suggestion, we will incorporate sample-level metrics like Chamfer Distance (CD) in the revised version to better assess the fidelity of individual edits (a reference sketch of CD follows this list). As shown in Table 3 in https://anonymous.4open.science/r/CAD-Editor-MoreResult-DBDC, CAD-Editor outperforms the baselines.
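For reference, a minimal NumPy sketch of the symmetric Chamfer distance between point clouds sampled from two CAD models (the point sampling itself is assumed done elsewhere):

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Identical clouds have zero distance.
pts = np.random.rand(256, 3)
print(chamfer_distance(pts, pts))  # 0.0
```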

5. D-CLIP Metric

We would like to address your concerns as follows.

  1. In our experiments, all CAD models are rendered using the same protocol: views are automatically scaled, centered, and captured from a fixed camera angle, consistent across all methods and samples. This ensures a fair comparison under identical visual conditions.
  2. In practice, our test set contains over 10,000 diverse examples, and we observe that aggregating results over such a large dataset helps mitigate viewpoint-specific bias. Importantly, all methods are evaluated under the same rendering protocol, ensuring that D-CLIP remains a fair and comparable metric. Finally, the D-CLIP scores align with our human evaluations (H-Eval), which directly assess the alignment between the editing instruction and the resulting CAD model.
  3. We acknowledge that any single viewpoint may fail to capture certain edited regions due to occlusion. On the other hand, naïvely averaging D-CLIP scores across multiple random viewpoints can introduce semantic bias, as some views may be unrelated to the editing instruction and show no meaningful change, thereby diluting the score. Ideally, the viewpoint should be selected adaptively based on both the editing instruction and the 3D geometry, so as to maximize visibility of the modified region; however, this is hard to achieve reliably.

6. Assess and Mitigate Errors Introduced by LVLMs in Data Synthesis Pipeline

  1. Assessment
  • In the early development stages, we relied on human evaluation to assess errors introduced by LVLMs. For example, manually reviewing 100 samples showed that stepwise captioning improved correctness from 65% (basic) to 81%. Qualitative comparisons are provided in Appendix B.
  • In the later stages, we established a test set where editing instructions were human-verified. Evaluation metrics on this set (e.g., VR, JSD, D-CLIP) indicate whether the training data generated by LVLMs contain significant errors when the model is fixed.
  2. Mitigation. LVLM-induced errors are mitigated through a stepwise captioning strategy (Sec. 4) and a selective dataset (Sec. 5.2), as demonstrated by ablation studies on the human-verified test set (Table 2).

Review (Rating: 3)

The authors propose a novel task of text-based CAD editing for sketch-and-extrude models. They demonstrate that LVLMs can be used to annotate editing instructions. A synthetic CAD editing dataset is built on the DeepCAD dataset. Finally, a locate-then-infill framework is proposed to generate the edited CAD sequence from the original sequence and text instructions. Overall, the task is well-motivated and the proposed method is solid.

Questions for Authors

It would be great if the authors could demonstrate more complicated editing abilities with this approach. A few other questions I want to ask are:

  1. What is the inference speed of the model?

  2. Why constrain models to three steps of sketch and extrude? This really limits the CAD complexity. Is there a potential scaling issue with this approach?

  3. Why use quantized parameters instead of float numbers (Figure 2) if everything is just text tokens? Is it because the sequence length becomes too long and untrainable?

  4. How novel are the generated results? I know there is overlap between the train/val DeepCAD data, and deduplication is only done within the training set. That means if the text instruction is similar, the model can just overfit and remember the results. This is a concern especially when the training data is small. Some metrics or results that demonstrate novelty would be nice to have.

Claims and Evidence

The two main contributions of this paper are well-supported. Dataset collection pipeline is clearly explained in section 4 and the supplementary. Table 2 results also demonstrate that locate-then-infill framework is better than directly generating the edited CAD sequence.

Methods and Evaluation Criteria

Fine-tuning an LLM with LoRA and using a two-stage approach, somewhat like CoT, makes sense for this problem. In terms of evaluation, both D-CLIP and user evaluations confirm the advantage of the proposed method.

Theoretical Claims

N/A

Experimental Design and Analyses

Line 148: "we use hnc-cad autocomplete to generate variants". It is a bit unclear how HNC-CAD is used to generate design variations. Which part of the original model is kept, and which parts are autocompleted? What sampling threshold is used, and how diverse or related are the completed results w.r.t. the original?

Supplementary Material

Yes, I read through all the supplementary material.

Relation to Broader Literature

CAD generation is important for manufacturing design. Being able to control the generated result with editing prompts is the next step towards AI-driven content creation. This should have a big impact for ML/HCI researchers working on CAD generation.

Essential References Not Discussed

N/A

Other Strengths and Weaknesses

A concern is the lack of complexity in the editing instructions. From the figures, most of them either 1) add or remove an extruded solid from the original model, or 2) modify a single or a few loop parameters. These are very simple instructions that don't require a professional understanding of how different parts of a CAD model relate to each other. Something more useful and commonly found in real-world CAD editing is when editing one part of the model causes another part to change its diameters along with it, as a fully-constrained model would do, but with the use of a generative model.

Overall, the paper is a good step in the right direction, but the results are very simple and mostly demonstrate understanding of geometric shape, which very likely comes from the use of LVLMs (e.g., GPT-4o).

Other Comments or Suggestions

N/A

Author Response

1. More Details on Using HNC-CAD to Generate Design Variations

We follow the original implementation of HNC-CAD when setting the retained part of the original model and the sampling threshold.

  • Only one sketch-extrude (SE) component from the original model is retained, while all other SE components are autocompleted by HNC-CAD.
  • For top-p sampling, p is set to 0.95. For completion diversity, please see Figures 9 and 10 in the original HNC-CAD paper. In short, HNC-CAD can generate variations that differ in the number of SE components, curve types (e.g., lines, arcs, circles), geometric scale and proportions, and spatial relationships.

2. Complexity of Editing Instructions

We address this concern by highlighting that our paper already presents several complex editing instructions and by providing new qualitative results:

  • Task Complexity. Text-based CAD editing is inherently challenging. As shown in Fig.1, instructions cover deletion (1st case), addition (5th case), local (2nd/3rd cases) and global changes (4th case). These require understanding not only geometry (e.g., holes, corners) but also the underlying sketch-and-extrusion (SE) structure—e.g., identifying which SE operation controls the “inward curvature” in the 2nd case.
  • Coupled Edits. While we do not explicitly model parametric constraints, our framework can implicitly handle coupled changes. For instance, in Fig.1 (4th case), increasing the wall and overall thickness leads to coordinated changes in diameters and height of four holes. In Fig.9 (3rd case), when two holes are removed, the remaining one is automatically enlarged and centered, showing learned structural adaptation.
  • Complex Editing. Our current approach supports complex edits through stepwise decomposition, inspired by chain-of-thought reasoning. For example, the 3rd case in Fig.9 could be summarized as: “Remove all cutouts, round the corners, and add four cylindrical legs.” This complex goal is achieved through a series of simpler instructions. In the future, we will explore single-step approaches to handling complex editing instructions.
  • Additional Results. See the added examples in Figure 1 at the link below.

3. Inference Speed

Inference takes 0.69s for the locating stage and 1.31s for the infilling stage per sample, measured on a single NVIDIA A800 GPU.

4. 3-Steps of Sketch and Extrude and Scaling Issue

This stems from severe data imbalance in the DeepCAD dataset. Among the 137,012 deduplicated training models, over 91.1% have three or fewer SE operations, while only 6.2% have 4–5 SEs. Models with more than 5 SEs make up less than 3%, and those exceeding 10 SEs account for just 0.4%. To show the scalability of our method, we conducted an experiment in which we artificially balanced the dataset across SE lengths. Specifically, we constructed a training set with 4,000 samples for each SE length from 1 to 5, ensuring a uniform distribution. The model was then evaluated on separate test sets for each SE length. As shown in Table 1 at the link below:

  • CAD-Editor consistently outperforms the baselines across all SE lengths.
  • Model performance declines slightly as SE length increases. Since generating longer sequences is inherently more challenging, this result demonstrates that our method generalizes well to complex CAD structures given sufficient training data.

Furthermore, Fig. 2 includes examples with more than three SEs, further showcasing our model’s generalization ability.

5. Quantized Parameters

Discretization is a common technique in geometry modeling to prevent excessively long sequences and improve training efficiency [1, 2]. Moreover, the DeepCAD dataset has already been quantized, and this format has been widely adopted in recent CAD research [3, 4, 5]. A minimal sketch of such quantization appears below.
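This sketch assumes an 8-bit grid over a normalized [-1, 1] coordinate range; the exact range and bit-width conventions in DeepCAD may differ:

```python
import numpy as np

def quantize(x, n_bits=8, lo=-1.0, hi=1.0):
    """Snap continuous coordinates to integer bins so they become short tokens."""
    levels = 2 ** n_bits - 1
    t = np.clip((np.asarray(x, dtype=float) - lo) / (hi - lo), 0.0, 1.0)
    return np.round(t * levels).astype(int)

def dequantize(b, n_bits=8, lo=-1.0, hi=1.0):
    """Map integer bins back to approximate continuous coordinates."""
    return np.asarray(b) / (2 ** n_bits - 1) * (hi - lo) + lo

print(quantize([0.5, -0.25]))   # [191  96]
print(dequantize([191, 96]))    # [ 0.498 -0.247] (approx.)
```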

6. Novelty of Generated Results

In editing tasks, novelty is not always required; the primary goal is to faithfully follow user instructions—even if the result resembles existing designs.

Considering your concern, we evaluated the novelty and uniqueness of the generated CAD sequences using the metrics from SkexGen (a minimal sketch of both metrics follows their definitions below).

  • Novelty: the percentage of generated CAD sequences that do not appear in the training set.
  • Uniqueness: the percentage of generated sequences that appear only once in the generated set.
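A minimal sketch of both metrics over serialized CAD sequences, assuming exact-match comparison on hashable token tuples (SkexGen's precise matching criterion may differ):

```python
from collections import Counter

def novelty_and_uniqueness(generated, training):
    """generated / training: iterables of hashable token tuples."""
    gen = list(generated)
    train_set = set(training)
    novelty = sum(g not in train_set for g in gen) / len(gen)
    counts = Counter(gen)
    uniqueness = sum(counts[g] == 1 for g in gen) / len(gen)
    return novelty, uniqueness

gen = [("circle", "extrude"), ("line", "extrude"), ("line", "extrude")]
train = [("circle", "extrude")]
print(novelty_and_uniqueness(gen, train))  # (0.667, 0.333) approx.
```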

As shown in Table 2 at the link below, our model achieves slightly higher novelty and uniqueness compared to the baselines, indicating diverse generation. More importantly, as shown in Table 1 of our main paper, our method significantly outperforms others in quality, instruction alignment, and validity—key factors in editing tasks.

Link

https://anonymous.4open.science/r/CAD-Editor-MoreResult-DBDC

References

[1] LayoutTransformer, CVPR 2021.

[2] LLaMA-Mesh, arXiv:2411.09595, 2024.

[3] SkexGen, ICML 2022.

[4] HNC-CAD, ICML 2023.

[5] FlexCAD, ICLR 2024.

Final Decision

This paper proposes CAD-Editor, a novel framework for text-based CAD editing with a locate-then-infill design and a high-quality synthetic dataset. Among four reviewers, three gave weak accept recommendations (keVC, PgA5, owCQ), highlighting the novelty, practical relevance, and sound technical design. One reviewer (QCF8) gave a weak reject but acknowledged the contribution as a meaningful exploration for the CAD field and is not opposed to acceptance. The authors provided a thorough rebuttal addressing the concerns. The area chair recommends acceptance.