PaperHub
Overall score: 6.0/10 · Decision: Poster · 4 reviewers
Ratings: 4, 3, 3, 5 (min 3, max 5, std 0.8)
Confidence: 3.8
Novelty: 3.0 · Quality: 2.8 · Clarity: 3.0 · Significance: 2.8
NeurIPS 2025

TokMan: Tokenize Manhattan Mask Optimization for Inverse Lithography

OpenReview | PDF
Submitted: 2025-05-11 · Updated: 2025-10-29

Abstract

Keywords
Structured Tokenization · Manhattan Representation · Inverse Rendering

Reviews and Discussion

Review (Rating: 4)

This paper presents TokMan, a novel approach for optical proximity correction (OPC) in semiconductor lithography that formulates mask optimization as a discrete sequence modeling problem. The key insight is to maintain Manhattan geometric constraints throughout the optimization process rather than treating them as post-hoc constraints.

The experimental evaluation on ICCAD2013 benchmarks shows significant improvements: >20% better edge placement error compared to state-of-the-art IL-ILT.

Strengths and Weaknesses

Strengths

Technical Quality:

  • The core technical approach is sound - using diffusion transformers for sequence modeling of Manhattan geometries is well-motivated and properly executed. The mathematical formulation clearly connects forward lithography simulation to structured noise addition, making the denoising perspective natural.
  • Experimental validation is comprehensive with proper baselines (GAN-OPC, Neural-ILT, etc.) and industry-standard metrics (EPE, mask shots, TAT). The >20% EPE improvement and 4x mask shot reduction are substantial and well-documented.
  • Authors are honest about limitations, explicitly discussing scalability issues with SRAFs, evaluation only on sliced layouts, and need for industrial dataset validation.

Clarity and Organization:

  • Paper is generally well-written with clear motivation and problem setup. The connection between Manhattan constraints and tokenization is well-articulated.
  • Method section logically progresses from segmentation → tokenization → training → post-processing.

Significance and Impact:

  • First work to apply sequence modeling to lithography mask optimization - could inspire broader adoption of tokenization approaches in computational imaging.
  • Results demonstrate practical viability for semiconductor manufacturing, not just academic benchmarks.

Originality:

  • Novel combination of diffusion models, transformers, and Manhattan geometric constraints. While individual components exist, their integration for this specific problem is original.
  • The "tokenize everything" perspective applied to semiconductor manufacturing is fresh and well-motivated.

Weaknesses

Technical Limitations:

  • The segmentation algorithm description lacks sufficient detail for reproduction. The "lithography-aware points segmentation" process needs clearer algorithmic specification and parameter settings.
  • Limited theoretical analysis of why tokenization should work better than continuous optimization. The paper relies heavily on empirical validation without deeper theoretical insights.

Experimental Concerns:

  • Evaluation limited to ICCAD2013 benchmarks which may not represent modern lithography challenges. More diverse and recent benchmarks would strengthen claims.
  • Training dataset is synthetically generated by "rearranging polygons" - unclear how well this represents real design patterns and whether the model would generalize to actual industrial layouts.
  • Some results may not align with those in the original referenced papers, such as the number of shots.

Minor issues

  • Should the text in Figure 1 read "w/ OPC" instead of "w/ OPT"?

The paper would benefit from deeper theoretical analysis of why tokenization helps, more comprehensive evaluation on diverse datasets, and clearer algorithmic specifications for key components like the segmentation algorithm.

Questions

  • What is a clearer explanation of "lithography-aware points segmentation"?
  • What are the specific criteria for "spatial alignment casting" and corner detection?
  • How sensitive is overall performance to segmentation parameters?

Limitations

Yes.

Formatting Issues

N/A

Author Response

We sincerely thank the reviewer for the detailed and thoughtful feedback. Below, we provide clarification and responses to the raised questions and concerns.

Q1&2: What is a clearer explanation of “lithography-aware points segmentation” and the specific criteria for "spatial alignment casting" and corner detection?

Thank you for highlighting the need for further clarification. The lithography-aware points segmentation module is the foundation of our sequence formulation. The key motivation stems from the fact that, under a fixed forward lithography model, the printability of a given layout pattern is influenced not only by its own shape but also by the spatial arrangement of nearby patterns.

Our segmentation algorithm converts polygonal edges into a dense sequence of axis-aligned points by incorporating both intrinsic and contextual lithographic considerations:

Corner-based refinement: Corners are known to suffer from corner rounding due to diffraction. To mitigate this, we apply fine-grained segmentation near detected corners, increasing sampling resolution in these regions to allow for more precise correction.

Spatial alignment casting: For a given edge, we detect other parallel edges in its lithographic interaction zone (a configurable spatial window). The corner vertices of these nearby features are then projected onto the edge being segmented. These projections act as segmentation anchors, embedding cross-feature contextual awareness into the point sequence.

Long-segment partitioning: For extended edges that exceed a lithography-aware printable length threshold, we insert uniformly spaced anchor points, ensuring compliance with minimum printable feature rules.

This process yields a point sequence that not only represents the geometric structure of a polygon, but also encodes interactions and proximity effects from surrounding patterns, making it lithography-aware by construction.
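For concreteness, the sketch below shows one way the three rules above could be applied to a single axis-aligned edge. This is a minimal illustration and not the authors' implementation: the function name, the 1-D representation of projected anchors, and the default values of corner_window, corner_step, and max_seg_len are our own assumptions.

```python
import numpy as np

def segment_edge(p0, p1, parallel_corner_anchors,
                 corner_window=16.0, corner_step=2.0, max_seg_len=64.0):
    """Hypothetical lithography-aware segmentation of one axis-aligned edge.

    p0, p1: endpoints of a horizontal or vertical edge (layout grid units).
    parallel_corner_anchors: 1-D coordinates of corner vertices of nearby
        parallel edges, already projected onto this edge's axis
        (the "spatial alignment casting" step).
    Returns a sorted array of break-point coordinates along the edge.
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    axis = 0 if p0[1] == p1[1] else 1          # 0: horizontal edge, 1: vertical edge
    lo, hi = sorted((p0[axis], p1[axis]))
    cuts = {float(lo), float(hi)}

    # 1) Corner-based refinement: finer sampling near the edge's own corners
    #    to counteract corner rounding.
    for corner in (lo, hi):
        for t in np.arange(corner_step, corner_window + corner_step, corner_step):
            for s in (corner - t, corner + t):
                if lo < s < hi:
                    cuts.add(float(s))

    # 2) Spatial alignment casting: projected corners of neighboring parallel
    #    edges become additional segmentation anchors.
    for a in parallel_corner_anchors:
        if lo < a < hi:
            cuts.add(float(a))

    # 3) Long-segment partitioning: split any remaining span longer than the
    #    printable-length threshold into uniform pieces.
    pts = sorted(cuts)
    out = [pts[0]]
    for a, b in zip(pts[:-1], pts[1:]):
        n = max(1, int(np.ceil((b - a) / max_seg_len)))
        out.extend(a + (b - a) * k / n for k in range(1, n + 1))
    return np.asarray(out)
```

For a long edge with no nearby anchors, this reduces to dense points within corner_window of each endpoint and uniform cuts at most max_seg_len apart in between.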

Q3: How sensitive is overall performance to segmentation parameters?

The segmentation parameters—such as corner refinement granularity, alignment radius, and minimum printable length—are selected to reflect realistic manufacturing constraints based on industrial guidelines.

While the model is relatively robust to minor variations in these settings, overly coarse segmentation (e.g., skipping fine corner refinement or disabling spatial alignment casting) can degrade model performance, particularly in terms of Edge Placement Error (EPE), due to insufficient resolution in geometrically or optically sensitive regions.

Conversely, excessive segmentation granularity may lead to longer sequences and increased computational burden, without proportional gain in accuracy. In our experiments, we tuned these parameters empirically to balance manufacturability, simulation accuracy, and training stability. We will add a supplementary ablation of segmentation granularity in the revised version.

Q4: Insufficient detail for reproducing the segmentation algorithm; lack of theoretical analysis

Our segmentation pipeline is described in detail in Section 4.1, where we explicitly introduce three key components: (1) corner rounding mitigation through dense segment splits at polygon corners, (2) spatial alignment casting that ensures connectivity across neighboring edges by projecting parallel corner anchors, and (3) minimum printable segment enforcement that guarantees DFM compliance. These rules are visually illustrated in Figure 3a and are grounded in real lithographic considerations. We also provide extensive descriptions in the Introduction (lines 13–16, 68–72) to motivate this segmentation from both geometric and manufacturing perspectives.

Our work is primarily empirical, but we do motivate the effectiveness of tokenization in Sections 1 and 4.2. The key insight is that Manhattan layouts exhibit strong geometric structure and low entropy, making them highly compatible with sequence modeling. By operating on axis-aligned point tokens rather than pixel grids, our model inherits inductive biases that align with manufacturable mask design. This results in better generalization, stability, and manufacturability, as shown empirically.

Q5: Evaluation is limited to ICCAD2013; synthetic training data; potential mismatch in referenced baseline results

These are important concerns, and we appreciate the chance to clarify.

ICCAD2013 Benchmarks: We chose ICCAD2013 because it remains the most widely used open-source benchmark in OPC literature, allowing direct comparison to prior works such as Neural-ILT and Multi-ILT. We fully agree that more diverse and modern datasets would strengthen the evaluation, and we are currently working with industry partners to validate TokMan at more advanced technology nodes.

Synthetic Dataset Generation: While our training dataset is generated by rearranging real polygons from open benchmarks, we preserve authentic design characteristics such as density, edge alignment, and spacing constraints. Empirically, our model generalizes well to unseen real layouts within the ICCAD2013 test set. These properties are governed by standard design rules, ensuring that the synthesized layouts remain representative and manufacturable.

Potential Mismatch in Baseline Results: All baseline results are obtained using either publicly available open-source implementations or reproduction code that has been acknowledged by the respective authors. Some numerical differences from the originally reported results arise due to differences in evaluation methodology. For instance, in certain official implementations, the mask shot count is computed on a downsampled version of the mask (e.g., from 2048×2048 to 512×512), which we believe does not reflect actual manufacturability and leads to underestimation. We recomputed mask shots at native resolution for fairness. Similarly, our EPE calculation differs slightly from some prior works; we follow a more precise, industry-standard method based on merit points and contour distances, as described in detail in Section 5.1. We believe these choices ensure a more accurate and consistent comparison across all methods.
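As a rough illustration of the merit-point EPE measurement mentioned above (using a nearest-point simplification rather than the exact normal-direction rule; the function name, array layout, and threshold are assumptions, not the paper's implementation), per-point EPE could be computed as follows:

```python
import numpy as np

def epe_violations(merit_points, printed_contour, threshold=15.0):
    """Illustrative merit-point EPE check: for each merit point sampled on the
    target edges, take the distance to the closest point of the simulated
    printed contour and count how many exceed the tolerance.

    merit_points:    (N, 2) array of points sampled on target polygon edges.
    printed_contour: (M, 2) array of points sampled on the printed contour.
    """
    d = np.linalg.norm(merit_points[:, None, :] - printed_contour[None, :, :], axis=-1)
    per_point_epe = d.min(axis=1)                     # distance to printed contour
    return per_point_epe, int((per_point_epe > threshold).sum())
```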

Q6: Minor issue in Figure 1 ("w/ OPT")

Thank you for spotting this. It should indeed read "w/ OPC." We will correct this in the final version.

Comment

I have read all the comments. Thank you to the authors for answering my questions. I have decided to keep my score.

Review (Rating: 3)

This paper proposes TokMan, a novel framework for inverse lithography that reframes mask optimization as a sequence modeling task. The method utilizes a Diffusion Transformer to generate masks that inherently comply with Manhattan geometry constraints. Trained in a self-supervised fashion using a differentiable simulator, TokMan demonstrates state-of-the-art performance on standard industry benchmarks in both pattern fidelity and manufacturability.

Strengths and Weaknesses

Strengths:

  1. The paper is well-written, and the proposed methodology is clearly explained.
  2. The formulation of inverse lithography as a token-based sequence modeling problem is novel and original.
  3. Extensive experiments show the outstanding effectiveness of the proposed framework over previous learning-based works.

Weaknesses:

  1. As shown in Figure 6, the framework's final performance is heavily dependent on an ILT post-processing step, which weakens the contribution of the core learning model. However, the details of the post-processing step are not explained.
  2. The paper claims to be state-of-the-art but only compares to learning models that don't all focus on optimizing manufacturability. Comparing to more relevant works, such as "Fracturing-aware Curvilinear ILT via Circular E-beam Mask Writer", would be more convincing.
  3. The ablation study only targets EPE, not mentioning #shots or TAT. The reason for the effect of hyper-parameters on EPE is not fully explained.

Questions

  1. Could you please elaborate on the details of post-processing?
  2. Why does a higher number of attention heads or decoder layers, as shown in Figure 6, result in worse performance?

Limitations

Yes.

Final Justification

Thank you for your rebuttal. I will maintain my score. My main concern is the significant reliance on the post-processing step. When a traditional method accounts for nearly half of the EPE improvement, it overshadows the contribution of the core learning model, which is a critical issue for a paper at a top learning conference. Furthermore, the evaluation feels incomplete without crucial robustness metrics like the Process Variation Band (PVB). The ablation study is similarly limited by focusing only on EPE, which doesn't provide a full picture of the trade-offs involved. Therefore, while I acknowledge the originality of the approach, these combined issues prevent me from recommending a higher score.

Formatting Issues

No

Author Response

We thank the reviewer for their thoughtful feedback and helpful questions. We address the concerns as follows.

Q1: Could you please elaborate on the details of post-processing?

We apologize for not providing sufficient details in the original manuscript. As briefly mentioned in Section 4.3, we apply a lightweight ILT refinement as a post-processing step. Specifically, as illustrated on the right side of Figure 2, we treat the predicted displacements from the DiT model as optimization variables. These displacements are first rasterized into a mask image via a differentiable renderer and then passed through our lithography simulator. The simulated wafer image is compared against the target layout using a differentiable loss, and the resulting gradient is backpropagated to refine the displacements.
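A minimal sketch of this refinement loop is given below, assuming hypothetical render_mask and litho_sim callables standing in for the differentiable renderer and lithography simulator; the optimizer, learning rate, iteration count, and MSE objective are placeholders rather than the paper's actual settings.

```python
import torch

def refine_displacements(displacements, token_points, target_image,
                         render_mask, litho_sim, n_iters=200, lr=1e-2):
    """Illustrative ILT-style post-processing: the DiT-predicted axis-aligned
    displacements are treated as optimization variables and refined with
    gradients from a differentiable rendering + lithography pipeline."""
    d = displacements.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([d], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        mask = render_mask(token_points, d)   # differentiable rasterization to a mask image
        wafer = litho_sim(mask)               # simulated resist/wafer image
        loss = torch.nn.functional.mse_loss(wafer, target_image)
        loss.backward()                       # gradients flow back to the displacements
        opt.step()
    return d.detach()
```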

Importantly, Manhattan mask optimization is an NP-hard, highly underdetermined problem with a vast solution space. Our approach alleviates this by using a learning-based model to generate high-quality initializations that significantly reduce the computational burden during refinement, while remaining fully compatible with traditional optimization-based industrial flows for seamless downstream integration. The entire pipeline is independently designed to align closely with existing industry practices, making it easy to adopt in real-world settings. While the learning-based model remains the core of our method—responsible for producing manufacturable and high-fidelity initial solutions—the hybrid formulation accelerates convergence and has been particularly well-received in early-stage industrial validation. This fast-generation-plus-refinement strategy has been preliminarily validated on industrial datasets and is actively being tested in production environments.

Q2: Why does increasing the number of decoder layers or attention heads (as in Figure 6) result in degraded performance?

This is a great question. In our ablation study, we observed that while small values of feature dimension, number of attention heads, or decoder layers can lead to underfitting (i.e., insufficient model capacity), simply increasing these parameters does not always yield better results.

When the number of attention heads or decoder layers becomes too large, the model becomes over-parameterized relative to the dataset size and task complexity. This can lead to optimization instability and difficulty in generalization, especially given the statistical nature of the training data and the limited number of correction tokens per sample. In particular, for OPC corrections, the model must resolve fine-grained, spatially localized variations—which excessive global attention capacity can dilute.

This effect is most noticeable with large feature dimensions, where the model may struggle to converge within the given training schedule and may overfit to high-frequency noise. Therefore, careful architectural balancing is critical for achieving both accuracy and generalization. We will clarify this analysis in the ablation section.

Q3: Regarding the comparison with "Fracturing-aware Curvilinear ILT via Circular E-beam Mask Writer"

We appreciate the reviewer’s suggestion to consider broader comparisons. However, we believe that a direct comparison with “Fracturing-aware Curvilinear ILT via Circular E-beam Mask Writer” may not be entirely appropriate due to fundamental differences in target mask formats and downstream manufacturing flows.

Specifically, the referenced work focuses on curvilinear masks designed for circular e-beam writers, whereas TokMan is tailored for Manhattan masks, which remain the dominant format in optical lithography pipelines due to their compatibility with existing EDA tools and manufacturing standards. Comparing across such fundamentally different mask paradigms may not yield meaningful insights, as they operate under different assumptions and optimization goals.

Moreover, from a process integration perspective, the manufacturing flow differs: curvilinear masks typically undergo fracturing before EPC (Electron Proximity Correction), while Manhattan masks require EPC before fracturing. Our work is designed to align with the latter flow, aiming to significantly improve both fidelity and efficiency within the existing Manhattan mask infrastructure widely adopted in industry.

We acknowledge the value of curvilinear ILT approaches in their respective subdomain, but we believe it is more appropriate to view them as addressing a different formulation of the ILT problem, rather than as the SOTA baseline for our target setting. We will consider including a brief discussion of curvilinear ILT approaches in the final version to clarify this distinction and position our work more clearly.

Comment

Thank you for your rebuttal. I will maintain my score. My main concern is the significant reliance on the post-processing step. When a traditional method accounts for nearly half of the EPE improvement, it overshadows the contribution of the core learning model, which is a critical issue for a paper at a top learning conference. Furthermore, the evaluation feels incomplete without crucial robustness metrics like the Process Variation Band (PVB). The ablation study is similarly limited by focusing only on EPE, which doesn't provide a full picture of the trade-offs involved. Therefore, while I acknowledge the originality of the approach, these combined issues prevent me from recommending a higher score.

Comment

On the academic expression front, as highlighted by the reviewers, we fully acknowledge the challenges in presenting the results effectively. The visualizations of accuracy (EPE) and manufacturability (SHOT) metrics, while central to our work, may not have had the intended impact due to the inherent gap between disciplines. We understand this gap and will address it by incorporating more prominent and insightful visualizations in the revised manuscript. This will include enhanced comparisons of localized optimization results to provide a clearer representation of our contributions.

We also recognize the current limitations of our work. At present, our OPC solution focuses solely on main pattern optimization, omitting sub-resolution assist features (SRAF). Without considering SRAF, comparing PVB metrics to other methods would be irrelevant, which is why we did not include PVB metrics in this study. However, we intend to extend our work in the future to incorporate SRAF optimization and include PVB metrics for a more comprehensive analysis. In relation to the ablation study, we admit that the trade-off between manufacturability (SHOT) and precision (EPE) was not clearly presented in the current manuscript. We plan to rectify this in the updated version, ensuring that the trade-off between these two critical metrics is more explicitly illustrated during the optimization process.

Architecture ablation:
  dim = 256        EPE 5.15    mask shots 340.2
  dim = 512        EPE 3.95    mask shots 338.2
  dim = 1024       EPE 7.96    mask shots 339.9
  n_head = 8       EPE 8.69    mask shots 340.35
  n_head = 16      EPE 3.95    mask shots 338.2
  n_head = 32      EPE 5.57    mask shots 339.15
  dec_layer = 3    EPE 6.40    mask shots 337.15
  dec_layer = 6    EPE 3.95    mask shots 338.2
  dec_layer = 9    EPE 8.30    mask shots 341.35

Refinement iterations ablation:
  no iter          EPE 3.95    mask shots 338.2
  50               EPE 3.00    mask shots 356.3
  100              EPE 2.13    mask shots 392.85
  200              EPE 1.95    mask shots 422.2
  400              EPE 1.87    mask shots 443.8

The table above shows the results of an ablation study performed on different architectural parameters of the model, focusing on two primary metrics: EPE (Edge Placement Error) and mask shots. These metrics are key indicators of lithographic fidelity and manufacturability, respectively.

EPE (Edge Placement Error): The EPE metric evaluates how accurately the mask aligns with the intended target layout after lithographic simulation. A lower EPE indicates better fidelity. As we observed, increasing the feature dimension (dim) consistently reduces the EPE, with significant improvements noted when going from 256 to 512 dimensions. However, beyond 512, the improvement in EPE becomes marginal, indicating diminishing returns from further increasing the feature dimension. Interestingly, varying the number of attention heads (n_head) and decoder layers (dec_layer) did not result in substantial changes in EPE, suggesting that increasing model complexity beyond a certain threshold may not always enhance accuracy. In fact, excessive complexity could even impede the model's ability to effectively capture the geometric patterns required for OPC.

Mask Shots: Mask shots represent the number of individual exposures needed to create a mask in lithography. A lower number of mask shots indicates a more efficient and manufacturable mask, leading to reduced production time and costs. The results show that increasing the feature dimension reduces the number of mask shots, particularly at dimensions of 512 and higher. Notably, the 'no iter' condition (no post-processing refinement) results in a higher number of mask shots, underscoring the importance of iterative refinement in improving manufacturability. For instance, increasing the number of iterations from 50 to 400 leads to a substantial reduction in mask shots, although beyond 200 iterations, the gains diminish. This suggests that while iterative refinement is crucial for improving manufacturability, further iterations yield diminishing returns and add computational overhead.

EPE and Mask Shots Correlation: It is important to note that EPE and mask shots are not always directly correlated. While EPE reflects the fidelity of the mask, mask shots are more closely tied to manufacturability and production efficiency. Our results show that improving EPE does not always correspond to fewer mask shots. This indicates a trade-off between achieving high fidelity and optimizing for manufacturability, which presents a complex challenge that we plan to explore further in future work. Specifically, achieving a reduction in mask shots without compromising EPE requires advanced post-processing techniques or modifications to the model architecture to better balance these two metrics.

Review (Rating: 3)

This paper introduces TokMan, a framework for Inverse Lithography Technology (ILT). The key problem the authors address is that existing deep learning-based ILT methods often produce masks with curvilinear features, which are difficult and expensive to manufacture and violate the industry-standard "Manhattan" (axis-aligned, orthogonal) geometry constraints. The main contribution is to reframe the mask optimization problem as a sequence modeling task. TokMan first segments the target IC layout into a sequence of Manhattan-aligned points. It then uses a Diffusion Transformer (DiT) model to learn how to predict precise, axis-aligned positional corrections for these points. The model is trained in a self-supervised manner, using a differentiable renderer and a lithography simulator to provide feedback, thus eliminating the need for ground-truth corrected masks. The authors demonstrate that their method achieves state-of-the-art results on standard benchmarks.

Strengths and Weaknesses

Strengths:

  1. The core idea of "tokenizing" a Manhattan layout is novel and significant. It cleverly translates a geometric correction problem into a sequence-to-sequence task.
  2. The methodology is well-designed and technically sound.
  3. The paper presents an evaluation against multiple recent state-of-the-art methods on standard benchmarks.

Weaknesses:

  1. There are no visualization results for the ablation study of each module.
  2. The current visualizations for comparison do not clearly highlight where the improvements occur, which may make it harder to interpret the gains.
  3. The visualization results comparing different methods are not included in the main text to show the improvements.

Questions

  1. Would it be possible to include visualization results for the ablation study of each module to better understand their individual contributions?

  2. Could the authors consider clarifying where the improvements occur in the visual comparisons to make the gains more interpretable? For example, highlighting the regions with more noticeable improvements in the figure using zoomed-in views.

Limitations

Yes

Final Justification

I carefully reviewed the rebuttal. While I understand that numerical improvements (e.g., in EPE) might be difficult to observe visually, the lack of compelling qualitative evidence still makes it challenging to fully assess the contribution of individual modules. Even subtle differences, if critical to manufacturability, should ideally be more clearly communicated, perhaps with domain-specific visualization strategies. A more rigorous ablation could help build confidence. Therefore, I maintain the score at this time.

Formatting Issues

None

Author Response

We sincerely thank the reviewer for the constructive comments and suggestions. Below, we address the raised concerns.

Q1: Would it be possible to include visualization results for the ablation study of each module to better understand their individual contributions? Could the authors clarify where the improvements occur in visual comparisons, for example using zoomed-in regions?

Thank you for the valuable suggestion. We initially chose not to include extensive visualizations for ablation studies because, in Manhattan mask layouts, even significant numerical improvements (e.g., in EPE) may correspond to visually subtle changes. Unlike natural images, lithographic masks are discrete and highly structured, making qualitative differences harder to perceive. This also highlights an important insight: CV-based models, when properly adapted, can be effectively transferred to non-visual, high-precision domains such as computational lithography.

That said, we agree that focused zoomed-in visualizations can improve interpretability. In the revised version, we will include annotated printed wafer images highlighting localized improvements—for instance, better reconstruction of M-shaped patterns or reduced line-edge deformation in dense regions, which can be seen in Supplementary Figures 2 and 3. These changes, although subtle, are critical for manufacturability and print fidelity.

Comment

Thank you for the responses to the raised concerns. I appreciate the discussion on the limitations of visualization in Manhattan mask layouts. That said, after carefully reviewing the rebuttal, I would like to explain why I still lean toward a borderline reject decision:

While I understand that numerical improvements (e.g., in EPE) might be difficult to observe visually, the lack of compelling qualitative evidence still makes it challenging to fully assess the contribution of individual modules. Even subtle differences, if critical to manufacturability, should ideally be more clearly communicated, perhaps with domain-specific visualization strategies. A more rigorous ablation could help build confidence.

Overall, I believe the paper is promising and could reach acceptance with further revision and stronger visual or experimental support. Therefore, I maintain the score at this time.

Comment

On the academic expression front, as highlighted by the reviewers, we fully acknowledge the challenges in presenting the results effectively. The visualizations of accuracy (EPE) and manufacturability (SHOT) metrics, while central to our work, may not have had the intended impact due to the inherent gap between disciplines. We understand this gap and will address it by incorporating more prominent and insightful visualizations in the revised manuscript. This will include enhanced comparisons of localized optimization results to provide a clearer representation of our contributions.

We also recognize the current limitations of our work. At present, our OPC solution focuses solely on main pattern optimization, omitting sub-resolution assist features (SRAF). Without considering SRAF, comparing PVB metrics to other methods would be irrelevant, which is why we did not include PVB metrics in this study. However, we intend to extend our work in the future to incorporate SRAF optimization and include PVB metrics for a more comprehensive analysis. In relation to the ablation study, we admit that the trade-off between manufacturability (SHOT) and precision (EPE) was not clearly presented in the current manuscript. We plan to rectify this in the updated version, ensuring that the trade-off between these two critical metrics is more explicitly illustrated during the optimization process.

Architecture ablation:
  dim = 256        EPE 5.15    mask shots 340.2
  dim = 512        EPE 3.95    mask shots 338.2
  dim = 1024       EPE 7.96    mask shots 339.9
  n_head = 8       EPE 8.69    mask shots 340.35
  n_head = 16      EPE 3.95    mask shots 338.2
  n_head = 32      EPE 5.57    mask shots 339.15
  dec_layer = 3    EPE 6.40    mask shots 337.15
  dec_layer = 6    EPE 3.95    mask shots 338.2
  dec_layer = 9    EPE 8.30    mask shots 341.35

Refinement iterations ablation:
  no iter          EPE 3.95    mask shots 338.2
  50               EPE 3.00    mask shots 356.3
  100              EPE 2.13    mask shots 392.85
  200              EPE 1.95    mask shots 422.2
  400              EPE 1.87    mask shots 443.8

The table above shows the results of an ablation study performed on different architectural parameters of the model, focusing on two primary metrics: EPE (Edge Placement Error) and mask shots. These metrics are key indicators of lithographic fidelity and manufacturability, respectively.

EPE (Edge Placement Error): The EPE metric evaluates how accurately the mask aligns with the intended target layout after lithographic simulation. A lower EPE indicates better fidelity. As we observed, increasing the feature dimension (dim) consistently reduces the EPE, with significant improvements noted when going from 256 to 512 dimensions. However, beyond 512, the improvement in EPE becomes marginal, indicating diminishing returns from further increasing the feature dimension. Interestingly, varying the number of attention heads (n_head) and decoder layers (dec_layer) did not result in substantial changes in EPE, suggesting that increasing model complexity beyond a certain threshold may not always enhance accuracy. In fact, excessive complexity could even impede the model's ability to effectively capture the geometric patterns required for OPC.

Mask Shots: Mask shots represent the number of individual exposures needed to create a mask in lithography. A lower number of mask shots indicates a more efficient and manufacturable mask, leading to reduced production time and costs. The results show that increasing the feature dimension reduces the number of mask shots, particularly at dimensions of 512 and higher. Notably, the 'no iter' condition (no post-processing refinement) results in a higher number of mask shots, underscoring the importance of iterative refinement in improving manufacturability. For instance, increasing the number of iterations from 50 to 400 leads to a substantial reduction in mask shots, although beyond 200 iterations, the gains diminish. This suggests that while iterative refinement is crucial for improving manufacturability, further iterations yield diminishing returns and add computational overhead.

EPE and Mask Shots Correlation: It is important to note that EPE and mask shots are not always directly correlated. While EPE reflects the fidelity of the mask, mask shots are more closely tied to manufacturability and production efficiency. Our results show that improving EPE does not always correspond to fewer mask shots. This indicates a trade-off between achieving high fidelity and optimizing for manufacturability, which presents a complex challenge that we plan to explore further in future work. Specifically, achieving a reduction in mask shots without compromising EPE requires advanced post-processing techniques or modifications to the model architecture to better balance these two metrics.

Review (Rating: 5)

The work proposes a DiT-based method for the ILT problem, which achieves SOTA performance.

Strengths and Weaknesses

  1. This framework provides an effective approach to optimizing masks for the Inverse Lithography Technology (ILT) problem, showcasing its versatility in addressing complex design challenges.

  2. It achieves state-of-the-art (SOTA) performance, setting a new benchmark in the field and demonstrating significant improvements over previous methodologies.

  3. This technique is valuable for engineers and designers in the semiconductor industry, as it streamlines the process of designing masks, ultimately improving manufacturing efficiency and yield.

  4. The research appears to benefit significantly from the implementation of a novel tokeniser module, which enhances the overall performance and flexibility of the optimization process.

Questions

How does the work customize for patterns other than piecewise constant?

Is this work derived from some ideas on inverse problems in imaging? Cite relevant works, if any.

Limitations

The work could have been written better to motivate readers.

Final Justification

The rebuttal is strong and I am inclined to accept.

Formatting Issues

nil

Author Response

We sincerely thank the reviewer for the valuable comments. Below, we address each of the raised questions and concerns.

Q1: How does the work customize for patterns other than piecewise constant?

Thank you for raising this important question. In Section 4.1 of our paper, we detail our lithography-aware segmentation algorithm, which currently focuses on Manhattan geometries—i.e., piecewise constant patterns composed of axis-aligned polygon edges. The segmentation process involves three key stages: (1) dense sampling at corners to mitigate rounding artifacts, (2) spatial alignment casting to enhance consistency across neighboring features, and (3) minimum-feature-length–based partitioning to enforce DFM constraints.

While this paper targets Manhattan mask optimization, we believe the segmentation algorithm is highly extensible to non-axis-aligned and even curvilinear masks. For example, diagonal edges can still be segmented uniformly along their linear span. In the case of curved features—such as Bézier or circular arcs—the segmentation can be performed based on curvature, control points, or arc length to ensure optical fidelity.
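As a sketch of the arc-length option mentioned above (purely illustrative and not part of TokMan: the quadratic Bézier form, oversampling factor, and spacing parameter are our assumptions), uniform arc-length sampling of a curved feature could look like:

```python
import numpy as np

def sample_bezier_by_arclength(p0, p1, p2, spacing, oversample=512):
    """Sample a quadratic Bezier curve at approximately uniform arc-length
    spacing, as one possible analogue of long-segment partitioning for
    curvilinear features."""
    t = np.linspace(0.0, 1.0, oversample)[:, None]
    pts = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2  # dense curve points
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])                   # cumulative arc length
    targets = np.arange(0.0, s[-1], spacing)
    idx = np.clip(np.searchsorted(s, targets), 0, len(pts) - 1)
    return pts[idx]
```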

Exploring such generalizations to non-piecewise-constant layouts represents an exciting direction for future work, particularly in adapting tokenized representations to richer geometric primitives while preserving compatibility with differentiable lithography pipelines.

Q2: Is this work derived from some ideas on inverse problems in imaging? Cite relevant works, if any.

Our work is conceptually grounded in the framework of inverse problems in imaging. Inverse lithography itself is a classical inverse imaging task, where the goal is to infer a mask pattern that, under a forward optical projection model, reproduces a desired resist pattern on the wafer. Beyond the classical formulation, our main novelty lies in recasting inverse lithography as a sequence modeling problem, inspired by the "tokenize everything" paradigm recently adopted in vision and geometry tasks. Works such as MeshGPT [Shue et al., 2023] and EgoEgo [Chi et al., 2023] demonstrate that structured geometric data can be effectively represented and optimized through discrete token sequences. We extend this idea to lithographic mask optimization under Manhattan constraints, enabling our model to operate directly in a manufacturable token space rather than pixel grids. To our knowledge, this is the first work to apply tokenized diffusion transformers to inverse lithography.

Q3: Limitations — The paper could be written better to motivate readers.

We appreciate this suggestion and acknowledge that the manuscript could benefit from clearer motivation and exposition. In future revisions, we will improve the introduction to better highlight the industrial need for scalable, manufacturable OPC solutions and clarify why tokenization is well-suited for lithography. Structured layouts such as Manhattan masks naturally align with discrete token sequences, making our formulation both geometrically meaningful and fabrication-aware. We will make this connection more explicit and warmly welcome any specific suggestions the reviewer may have. Our broader goal is to bring cutting-edge AI techniques—particularly tokenized generative modeling—into semiconductor lithography, and we hope this work helps bridge these two technically rich and impactful domains.

Comment

Dear Reviewers,

Thank you for your thoughtful feedback. We would like to take this opportunity to re-emphasize that deep learning models form the core of our approach.

It is important to highlight that many existing academic works, in their pursuit of achieving the highest precision in layout optimization, often overlook manufacturability as a constraint. For example, while approaches like Multi-ILT do address manufacturability, they resort to brute-force methods, such as local clipping, to improve manufacturability metrics. These post-processing methods are far removed from truly enhancing manufacturability, as they fail to incorporate manufacturability as a key factor during the optimization process. From the very beginning, our approach takes into account manufacturability based on industry experience during the optimization process, thus addressing a significant gap between industry needs and academic work. As pioneers in this area, we have successfully achieved a balance between manufacturability and optimization precision, marking a key distinction in our approach.

It is also worth noting that, even without post-processing steps, our method's precision metrics—such as EPE, which reflect layout optimization accuracy—are on par with traditional methods. This means that we can achieve better precision while maintaining manufacturability. Furthermore, the high-quality initial results provided by the deep learning model significantly improve the effectiveness of any subsequent post-processing. For instance, even though Multi-ILT searches in a more flexible, higher-dimensional space, it cannot achieve results comparable to ours under stricter Manhattan constraints, highlighting the significance of manufacturability in our method.

The need for post-processing is not arbitrary. Through discussions with industry experts, we learned that yield-critical accuracy is the primary focus. Therefore, relying solely on single-image optimization methods is unlikely to replace traditional approaches. Our work aims to accelerate the process and improve precision through a learning-based initialization, which we believe is a novel contribution.

Of course, we acknowledge the current limitations of our work. At present, our OPC solution focuses only on the main pattern optimization and does not consider SRAF. Without SRAF, comparing PVB metrics to other methods would be meaningless. Consequently, we did not include PVB metrics in this work. However, we plan to extend this in future work, where we will consider SRAF optimization and include PVB metrics. Regarding the ablation study, we recognize that the trade-off between manufacturability (SHOT) and precision (EPE) was not clearly presented. We will address this in the updated manuscript, ensuring the trade-off between these two metrics is more explicitly illustrated during optimization.

On the academic expression front, as highlighted by the reviewers, we also acknowledge the challenges in presenting the results effectively. The visualizations of accuracy (EPE) and manufacturability (SHOT) metrics may not have been as impactful as expected, which stems from the inherent gap between disciplines. We fully recognize this gap and will incorporate more prominent visualizations in the revised manuscript to better showcase our contributions, such as enhanced comparisons of localized optimization results.

Despite these limitations, we firmly believe that our approach sets a new benchmark for AI in lithography. The integration of AI into semiconductor manufacturing, particularly within the manufacturing domain, is a difficult yet essential journey. While traditional methods, such as rule-based approaches, continue to dominate the field, the industry has been relatively slow to adopt cutting-edge AI technologies. Much like Prometheus, who bore the burden of bringing fire to humanity without receiving favor from gods or mortals, we are committed to being pioneers in introducing the next generation of AI into semiconductor manufacturing. Our work may not yet receive widespread acclaim, but it is this very challenge that positions us as trailblazers, opening up new application avenues for the AI community. The friction between these two domains may create some discomfort, but it is precisely this collision—AI4SCI—that has the potential to reshape industries. As forerunners, we are eager to lead this transformation, helping to bridge the gap between traditional manufacturing practices and the immense power of AI.

We will be providing additional experimental details shortly, which we believe will help clarify our approach and further demonstrate the strength of our methodology.

Thank you once again for your valuable feedback. We look forward to your further thoughts and reconsideration.

Final Decision

This is an interesting paper that studies a novel application of Inverse Lithography Technology (ILT). The authors address a limitation of existing ILT methods that produce masks with curvilinear features, which are difficult and expensive to manufacture and violate the industry-standard "Manhattan" constraints. The main contribution of the paper is to reframe the mask optimization problem as a sequence modeling task. The pipeline combines an IC layout segmentation module that outputs a sequence of Manhattan-aligned points with a DiT module that predicts precise, axis-aligned positional corrections for these points. The model is trained in a self-supervised manner, using a differentiable renderer and a lithography simulator to provide feedback. The authors demonstrate that their method achieves state-of-the-art results on standard benchmarks.

The ratings are mixed. The major concerns are with the experimental evaluation. In particular, one reviewer commented that post-processing contributes more than 50% of the performance gains. However, the AC believes this is an issue shared by many existing methods (including VGGT). Another concern was about the presentation of results. It is recommended that, in the revised manuscript, the authors clearly highlight and discuss the specific aspects of both the comparative and ablation-study visualizations where their method demonstrates advantages.

The paper was accepted because it introduces a new application of ML that brings fresh air to the NeurIPS community and the technical contributions are above the bar for NeurIPS.