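Average rating: 4.3/10

Status: withdrawn · 3 reviewers

Lowest: 3 · Highest: 5 · Standard deviation: 0.9

Average confidence: 3.3

Soundness: 2.7

Contribution: 2.0

Presentation: 2.7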

ICLR 2025

Crafting Layered Designs from Pixels

Jingye Chen,Zhaowen Wang,Nanxuan Zhao,LI ZHANG,Difan Liu,Jimei Yang,Qifeng Chen

Submitted: 2024-09-13 · Updated: 2024-11-13

TL;DR

Generating layered designs that benefit from non-layered designs.

Keywords

Graphic Design, Layered Image Generation

Reviews and Discussion

Review

Rating: 3 · Confidence: 4 · 2024-10-31

Building on an existing vision-language model (VLM), the paper presents a layer-aware graphic design system named Accordion. Its pipeline comprises three stages: reference creation, design planning, and layer generation. The experiments demonstrate four applications: text-to-template generation, adding text to a background, text de-rendering, and design variation creation.

Strengths

The system is composed of various existing components and seems to work well. It can ease graphic design work.

The resulting images are delightful. I hope the system will be made publicly accessible in the future.

Weaknesses

The innovation is limited. The work integrates various existing tools and large models, such as SAM and VLMs, to compose the system. Section 3 (Methodology) gives only a high-level summary and no specific description of the work. Hence, I cannot identify a sufficiently specific academic contribution.

This is an engineering work. I do not consider it suitable as an ICLR research paper.

Questions

I would like to see a runtime table for Stages 1, 2, and 3 and for the various editing tasks, since waiting time has a large impact on an editing system.

Is the system adaptable to images of different resolutions? Given a high-resolution image (>5M pixels), what is the runtime performance?

Details of Ethics Concerns

The system may be used to generate fake images.

Review

Rating: 5 · Confidence: 3 · 2024-11-01

This paper proposes several components for visual design. It has stages for layer extraction, background processing, element planning, etc., to support design generation.

Strengths

The problem and research direction seem solid and have many real-world applications. Some recent commercial products, such as ChatGPT Canvas, also validate the importance of this research direction.

The paper shows many results across different application types.

Weaknesses

It is not clear which parts are proposed by this paper. The reference creation stage uses an existing vision LLM (the paper proposes a prompt that works well for it). The OCR, background removal, SAM, and inpainting components are all existing models. The vision LLM is trained on an internal dataset, Design39K, which seems to be the contribution of this work. But this contribution (presenting a fine-tuned VLM) seems relatively weak.

Questions

I think this paper may need a major revision to make the exposition much clearer. The authors may want to state at the very beginning that the intended use case is mainly converting AI-generated design images into real design files with layers and text, as a post-processing step for image generators such as Flux and SD.

Currently the first stage of the method is "reference creation", which leads readers to think that the references are outputs. But the intended application mainly uses existing references as inputs.

Details of Ethics Concerns

N/A

Review

Rating: 5 · Confidence: 3 · 2024-11-02

This paper presents a workflow for generating layered graphic designs using vision-language models (VLMs). The workflow involves three stages: reference generation, design planning, and layer generation. The proposed workflow can facilitate the creation of design variations.

Strengths

The overall workflow successfully generates nice layered graphic designs.
The workflow is versatile across different design scenarios.
The proposed method is straightforward, provided that the training data is available.

Weaknesses

Although the results seem better than those of the compared methods, it is not clear whether this is due to a better image generative model. I recommend a fair comparison, e.g., applying the planning and layering method to design images generated by other image generative models.
The novelty of the method seems limited. I think the main contribution is the process of generating a design plan before layer decomposition. But the paper lacks a comparison between combinations of previous works and the proposed unified workflow.
There are no experiments or results applying the proposed method to graphic designs in the wild.

Questions

It is still very unclear to me why the method needs to generate the reference image in the first place instead of using an existing design. I think this makes the overall proposed workflow look like a system or a product, without a concentrated technical contribution.
I think the second and third stages can be achieved by combining other existing works. Therefore, I want to ask: is there any specific reason why existing works on text de-rendering, layer decomposition, or code generation from a given design cannot work?

Withdrawal Notice

2024-11-13

We appreciate all the reviewers' efforts and valuable feedback on our manuscript, and we have decided to withdraw it.