PaperHub
Rating: 5.5/10 (Poster; 4 reviewers; min 5, max 7, std 0.9)
Individual ratings: 7, 5, 5, 5
Confidence: 4.0 | Correctness: 3.3 | Contribution: 3.0 | Presentation: 3.0
NeurIPS 2024

Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

OpenReview | PDF
Submitted: 2024-05-12 | Updated: 2024-11-06

Abstract

Keywords
Image Generation · QR Code · Stable Diffusion · Control Network

Reviews and Discussion

Review
Rating: 7

This paper is pioneering in its integration of face identity with QR codes, proposing a novel pipeline for generating customized QR codes with embedded face identity. The key idea is to leverage diffusion models and control networks to create visually appealing QR codes while preserving face identity. The pipeline introduces an ID-aware QR ReShuffle module to address conflicts between face identity and QR code patterns, and designs an ID-preserved Scannability Enhancement module to improve scannability without compromising the face identity and visual quality. The experiment results showcase a perfect balance between face identity, aesthetic quality and scannability.

Strengths

  • As the first paper to combine face identity with QR codes, the proposed pipeline for generating face embedded QR code is innovative and addresses the practical needs for social connection in real-world scenarios.

  • The IDRS module presents an interesting solution to conflicts arising from different control signals. By rearranging QR patterns to harmonize varying control conditions, the proposed pipeline leverages information from both face images and QR codes to generate customized QR codes.

  • The IDSE module significantly enhances the scannability of QR images using adaptive loss, while concurrently maintaining a faithful representation of face identity.

  • Experimental results show that the generated QR codes successfully preserve face identity, yielding impressive visual results. Notably, there is minimal interference from QR patterns in the face region.

  • The paper is well-organized, thoroughly discussing motivation and related work. Rigorous experiments enhance the credibility of the results.

Weaknesses

  • The paper mentions that the method is limited by the generative models, but does not present failure cases caused by the generative models. It is recommended to include such examples to give readers a better understanding of the algorithm's limitations.

  • Certain technical details in the paper require further elaboration. For example, the definition of the error rate is not explicitly given. In Figure 3, the image difference visualization D appears to support the claim that "adaptive loss modifies the face region more gently". However, without a comparison of D between the adaptive and uniform losses, this claim lacks substantiation.

  • The paper lacks a comparison of computational resource requirements with other methods. This omission makes it challenging to assess the practical feasibility of this algorithm.

  • Typos exist; for instance, line 174 should read "with a learning rate of 0.002".

Questions

  • Can the authors provide an analysis of failure cases caused by the generative models?

  • Could the authors provide the definition of the error rate and a comparison of the image difference visualization D between the adaptive and uniform losses?

  • While the method excels in visual quality, what are the computational resource requirements compared to other methods?

Limitations

Yes. The paper adequately discusses limitations and broader impact.

Author Response

Dear Reviewer jWT5,

Thank you for taking the time to review our paper and providing valuable feedback. Below, we address the raised concerns:

Q1: [Can the authors provide an analysis of failure cases caused by the generative models?]

  • We include some failure cases caused by the generative model in Table D of the PDF. This issue may arise from a lack of diversity in the training data, or from the model's inability to generate complex structures or understand nuanced prompts. We hope these examples provide a clearer understanding of the algorithm's limitations. Notably, Face2QR is designed so that new, better generative models can easily be plugged into our training-free framework. As generative models advance, we expect ongoing improvements to address these limitations, increasing the robustness and versatility of future models. Our framework is adaptable and will benefit from these advancements, helping to push the boundaries of what generative models can achieve.

Q2: [Could the authors provide the definition of the error rate and a comparison of the image difference visualization D between the adaptive and uniform losses?]

  • The error rate is defined as e/N_θ, where e is the number of error modules and N_θ is the total number of modules in a QR code, excluding the marker and alignment pattern regions. In our experiments, the generated QR images use QR code version 5 and thus have 37×37 modules in total. Therefore, the value of N_θ is 1197 = 37² − 3×7² − 5².
  • In Figure 3, the visualization D primarily illustrates the difference in modification between the face region and the background when using the adaptive loss. We are pleased to provide the comparison of the image difference visualization D between the adaptive and uniform losses in Table E of the PDF. It shows that the modifications in the face region are less pronounced with the adaptive loss. A more straightforward comparison between the adaptive and uniform losses can be found in Table 7, where we demonstrate how the different losses affect the nuances of the face ID.
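As a small illustration of the error-rate definition above, the module count N_θ for a version-5 code can be sketched in a few lines of Python (the helper names are ours, not the authors' code):

```python
# Sketch of the error-rate definition e / N_theta (helper names are
# assumed, not taken from the authors' implementation).

def countable_modules(version: int = 5) -> int:
    """Modules that count toward the error rate: the full grid minus the
    three 7x7 finder ("marker") patterns and the 5x5 alignment pattern."""
    size = 4 * version + 17          # version 5 -> 37 modules per side
    return size ** 2 - 3 * 7 ** 2 - 5 ** 2

def error_rate(num_error_modules: int, version: int = 5) -> float:
    """e / N_theta, following the definition in the response."""
    return num_error_modules / countable_modules(version)

# For version 5: 37^2 - 3*7^2 - 5^2 = 1369 - 147 - 25 = 1197 modules.
```

The grid-size relation (4 × version + 17 modules per side) is the standard QR code formula; for version 5 it yields the 37×37 grid quoted in the response.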

Q3: [What are the computational resource requirements compared to other methods?]

  • Face2QR can run on a single RTX 4090 GPU, so our pipeline has computational resource requirements similar to previous methods such as Text2QR [41] and ArtCoder [34]. In the first two modules (IDQR and IDRS), the pipeline runs Stable Diffusion twice, and the latent code update in the last module (IDSE) converges within 150 iterations. In terms of generation time, Face2QR outputs a QR image in about five minutes on one RTX 4090 GPU, which is about the same as Text2QR.

Q4: [Typo exists on line 174.]

  • The typos in the article will be addressed accordingly. Thanks for pointing them out.

Thanks again for your review. We hope our response has well answered your questions.

Comment

Thanks for the detailed rebuttal. All my concerns have been well addressed. I think this paper is novel and interesting. Thus, I decide to keep my original score (7: Accept).

Comment

We are glad that our rebuttal addressed all your concerns. Thanks for your valuable comments and recognition of our work.

Review
Rating: 5

The article introduces a novel pipeline designed to create customized QR codes that integrate aesthetic appeal, facial identification (ID), and scannability. The proposed approach incorporates three key components: (1) ID-refined QR Integration (IDQR), which seamlessly incorporates facial ID into the QR code background; (2) ID-aware QR ReShuffle (IDRS), which resolves conflicts between facial ID and QR code patterns; and (3) ID-preserved Scannability Enhancement (IDSE), which optimizes the robustness of QR code scanning while preserving both facial ID and aesthetic quality.

Strengths

(1) The motivation of Face2QR proposed in the paper is straightforward, and the method proves to be effective based on the quantitative and qualitative results presented. (2) There is sufficient ablation study to demonstrate the effectiveness of each module in this paper. (3) This paper is well-written and easy to follow.

Weaknesses

(1) The paper's innovation is relatively weak, with each module and its technology being a combination of previous works. (2) Although the paper includes numerous quantitative experiments, it only involves up to 20 identities, all of whom are celebrities. This limitation hinders the ability to fully demonstrate the method's effectiveness for a broader range of ordinary users. (3) The paper does not mention how to set the value of a in line 131.

Questions

Quantitative and qualitative experiments on a larger and more diverse group of ordinary users are crucial to demonstrate the scalability and effectiveness of the method.

Limitations

The authors adequately addressed the limitations of their work.

Author Response

Dear Reviewer MYMo,

Thank you for taking the time to review our manuscript and providing valuable feedback. All raised concerns are addressed below point by point:

Q1: [Each module and its technology is a combination of previous works.]

  • We acknowledge that the components—Diffusion models, Identity Preserved Generative Models, and the latest QR generation methods—are directly used. The reason for this design is to allow new, better methods to be easily plugged into our proposed training-free framework. Our innovation lies in how we control these components, which is why we proposed the three modules: IDQR, IDRS, and IDSE. The key technical contributions of our work are embodied in these three modules, which integrate the components and balance the three control signals. Specifically:

    • IDQR Module: This module integrates a face ID with an aesthetic background. It preserves the facial identity in the generated image aligned with text prompts, ensuring the luminance distribution matches that of a QR code.
    • IDRS Module: This module harmonizes the QR pattern with the face ID. It uses face masks to preserve the fidelity of the face ID and resolves conflicts between the face ID and QR patterns by reshuffling the QR code to match the brightness distribution in the face region.
    • IDSE Module: This module balances scannability and aesthetic quality. It iteratively updates the generated image in the latent space and applies adaptive loss to carefully preserve the face ID while enhancing scannability.

    In Table 1, Text2QR [41] demonstrates unsatisfactory results due to the lack of harmony between the face ID, QR pattern, and background. In summary, the innovation of our work lies not only in the modules of our model but also in the proposed training-free framework for solving complex control problems, where triplet control signals inherently conflict with each other. We believe that this new framework will facilitate subsequent research in the community on the effective control of generative models. We will clarify these contributions in the revised manuscript.

Q2: [Involved identities are all celebrities. How about the method's effectiveness for a broader range of ordinary users?]

  • Our Face2QR system is generalizable to real faces, generated realistic faces, and cartoon faces. As shown in Table C of the PDF, the experimental results demonstrate that facial identities are well preserved and seamlessly blended into the background in all generated QR images, showcasing the effectiveness of Face2QR across these three face types.

Q3: [The paper does not mention how to set the value of a in line 131.]

  • The value a is the pixel length of one module, which is determined by the QR image size and the version of the QR code. In our experiments, version 5 of the QR code is used, so there are 37×37 modules in a QR code image. For a QR image of size L×L, the value of a is therefore L/37. These details will be included in the revised manuscript.
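To make the relation concrete, the module length a = L/37 and a pixel-to-module mapping could be sketched as follows (our illustrative helpers, not the authors' code):

```python
# Illustrative sketch of the module length a = L/37 for a version-5 QR
# image of size L x L (helper names are assumed, not from the paper).

def module_length(image_size: int, modules_per_side: int = 37) -> float:
    """Pixel length a of one module for an image of size L x L."""
    return image_size / modules_per_side

def pixel_to_module(x: int, y: int, image_size: int,
                    modules_per_side: int = 37) -> tuple:
    """Map a pixel coordinate (x, y) to its (row, col) module index."""
    a = module_length(image_size, modules_per_side)
    return (int(y // a), int(x // a))
```

For example, a 1110×1110 image yields a = 30 pixels per module.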

Thanks again for your review. We hope our response has well addressed all your concerns.

Comment

Thank you for your detailed rebuttal. The reply partially addressed my questions, but there are still doubts regarding Question 2. Reviewer p8m4 also mentioned this issue. The main concern here is to understand the effectiveness of the method on images of other identities, not the type of facial images.

Comment

Thanks for joining the discussion. By proposing three modules (i.e., IDQR, IDRS, and IDSE), we introduce a training-free QR generation framework that solves the complex problem of managing inherently conflicting triplet control signals. This framework can also accommodate other types of identities. For example, we can simply replace InstantID, which is designed to preserve face identity, with components designed to preserve object identity (e.g., SSR-Encoder [53] and CustomNet [54]) to create object-preserved QR codes. Since face identity preservation is particularly challenging due to the uncanny valley effect, where even minor discrepancies in facial features or expressions can cause discomfort and appear unnatural, this paper focuses on the challenge of generating aesthetic QR codes with faces.

Also, Reviewer p8m4's Question 3 ("The illustrative results are based on the celebrities or movie stars. How about the results to common face?") asks whether our Face2QR is applicable to common faces. In Table C of the rebuttal PDF, we provide evidence that Face2QR generalizes well to real common faces, generated realistic faces, and even cartoon faces.

We apologize for misunderstanding your comments regarding "it only involves up to 20 identities, all of whom are celebrities. This limitation hinders the ability to fully demonstrate the method's effectiveness for a broader range of ordinary users". We thought you were asking about the generalizability of Face2QR to common faces, similar to Reviewer p8m4's Question 3, rather than about its application to other identities. Due to the NeurIPS 2024 policy during the author-reviewer discussion period, we currently cannot find any way to show the generation results of object-preserved QR codes, but we will add them to the revised manuscript for completeness.

Thanks again for joining the discussion and providing valuable comments.

[53] Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing. SSR-Encoder: Encoding selective subject representation for subject-driven generation. In Proc. CVPR, 2024.

[54] Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, and Ying Shan. CustomNet: Object customization with variable-viewpoints in text-to-image diffusion models. In Proc. ACM Multimedia, 2024.

Review
Rating: 5

This work proposes Face2QR, a pipeline for generating personalized QR codes that balance aesthetics, face identity, and scannability. It introduces three main components: ID-refined QR Integration (IDQR) for seamless background styling with face ID, ID-aware QR ReShuffle (IDRS) to rectify conflicts between face IDs and QR patterns, and ID-preserved Scannability Enhancement (IDSE) to boost scanning robustness through latent code optimization. Face2QR outperforms existing methods in preserving facial recognition features within custom QR code designs.

Strengths

  • Generating personalized QR codes that balance aesthetics, face identity, and scannability seems an interesting topic and application.

  • The design of the major components in this paper, including IDQR for seamless background styling with face ID, IDRS to rectify conflicts between face IDs and QR patterns, and IDSE to boost scanning robustness through latent code optimization, is well-motivated and reasonable.

  • Experiments with user studies show the proposed method has strengths compared with existing methods like Text2QR, ArtCoder, etc.

Weaknesses

  • The proposed method is mainly built upon existing works in areas like Diffusion models, Identity Preserved Generative Models, and the latest QR generation methods like [41]. It is indeed a good application work but may lack technical contributions.

  • The proposed new components are good practice for this specific application of QR code generation. It is a good application paper, but I'm not sure whether these insights are general enough to meet the requirements of a NeurIPS paper. For example, it is not clear if the proposed method can generate impact or be useful for more general topics such as Identity Preserved Generative Models.

Questions

Please refer to details in the Weakness section above.

Limitations

Some limitations have been discussed in the submission.

Author Response

Dear Reviewer ebKp,

Thank you for taking the time to review our manuscript and providing valuable feedback. All raised concerns are addressed below point by point:

Q1: [The method is built upon existing work like Diffusion models, Identity Preserved Generative Models, and the latest QR generation methods like [41].]

  • While Diffusion models, Identity Preserved Generative Models, and the latest QR generation methods are components of our framework, the key technical contributions lie in the three modules (i.e., IDQR, IDRS, and IDSE), which integrate these components and balance the three control signals. More specifically, the IDQR module integrates a face ID with an aesthetic background, the IDRS module harmonizes the QR pattern with the face ID, and the IDSE module balances scannability and aesthetic quality. In Table 1, Text2QR [41] shows unsatisfactory results because it does not harmonize the face ID with the QR pattern and the background. More importantly, our contribution also lies in the proposed training-free framework for solving complex control problems in which triplet control signals inherently conflict with each other. We believe this new framework could facilitate subsequent research in the community on the effective control of generative models.

Q2: [Whether these insights are general enough? Will the proposed method generate impact for more general topics?]

  • Since the proposed framework resolves the conflicts among triplet control signals, it is applicable to other generation tasks involving multiple controls. For example, with some modifications to the pipeline, it is feasible to control face identity, object, and background simultaneously. One possible solution is to have one module integrate the face identity with the background, another harmonize object positions with the face identity, and the final module balance the generation of objects and background. Additionally, this framework can also be applied to other tasks, such as generating videos that preserve the motion of a designated person across various backgrounds. Therefore, although our Face2QR is specifically designed for QR code generation, the resulting training-free framework with triplet controls provides valuable insights and has broader impact across various fields, especially in the control of generative models.

Thanks again for your review. We hope our response has well addressed all your concerns.

Review
Rating: 5

The paper presents an interesting framework to generate face-preserving QR codes, which is useful in social entertainment applications. To enable this application, the paper first encodes the face ID information into the QR generation process, and a refining process is applied to improve the integrity of facial features as well as the scannability of the QR code. Experimental results show the effectiveness of the proposed algorithm. The paper is well presented and would be easy to reproduce.

Strengths

  1. The proposed face ID preserved QR code generation is useful in industry applications.
  2. The presentation of the paper is clear and easy to follow. There are sufficient details to reproduce the paper.

Weaknesses

  1. The paper seems to be an engineering report. The novelty of the paper is limited, and it seems more like an application paper than a NeurIPS submission.

  2. For the experimental evaluations, there are several points which should be well improved.

  • The evaluation test set is relatively small. For example, for the scanning robustness test, there are only 20 QR codes, which may not be statistically meaningful.
  • For the ID-preserving results, it seems the ID results have been compromised compared with the original images.
  • Also, the generated face results seem not to be consistent with the QR codes, as shown in Figure 1 and Table 1. It seems to be more like a simple combination of a QR code with a face image.
  3. The illustrative results are based on the celebrities or movie stars. How about the results to common face?

Questions

The main concern is on the novelty of the paper. Please well justify the novelties of the paper.

Limitations

The paper has discussed the potential limitations in the Conclusion section.

Author Response

Dear Reviewer p8m4,

Thank you for taking the time to review our manuscript and providing valuable feedback. All raised concerns are addressed below point by point:

Q1: [The paper is more likely to be an application paper rather than a NeurIPS submission.]

  • We believe that our Face2QR is novel, since it is pioneering in the field of image-to-image generation in producing QR codes that preserve face ID, scannability, and aesthetic quality at the same time. The pipeline is able to balance three inherently conflicting control signals and achieves SOTA performance. More specifically, the IDQR module preserves the facial identity in the generated image aligned with text prompts and ensures the luminance distribution matches that of a QR code. The IDRS module uses face masks to preserve the fidelity of the face ID and resolves conflicts between the face ID and QR patterns by reshuffling the QR code to match the brightness distribution in the face region. The IDSE module iteratively updates the generated image in the latent space and applies an adaptive loss to carefully preserve the face ID while enhancing scannability. We will make these contributions clearer in the revised manuscript.

  • The innovation of our work lies not only in the modules of our model but also in the proposed training-free framework (no parameters are updated) for solving complex control problems in which triplet control signals inherently conflict with each other. We believe that this new framework will facilitate subsequent research in the community on the effective control of generative models.

Q2.1: [The evaluation test set is relatively small.]

  • This paper adopts the scanning robustness test setting from previous works [41, 34], which also use a batch of 20 samples. We conducted a scanning robustness experiment with an expanded test set of 100 QR codes, using the same settings as described in the manuscript. The test results of our Face2QR are shown in the table below, with an average success rate over 95%. The success rate of this new scanning robustness experiment is consistent with that reported in the manuscript.
Decoder   (3cm)²@45°   (3cm)²@90°   (5cm)²@45°   (5cm)²@90°   (7cm)²@45°   (7cm)²@90°
Scanner   98%          96%          100%         100%         99%          100%
WeChat    95%          99%          100%         100%         98%          98%
TikTok    100%         100%         100%         100%         100%         100%
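As a quick arithmetic check (ours, not part of the response), the per-decoder averages of the expanded scanning-robustness results above all exceed 95%:

```python
# Per-decoder success rates from the expanded 100-QR-code experiment.
results = {
    "Scanner": [98, 96, 100, 100, 99, 100],
    "WeChat":  [95, 99, 100, 100, 98, 98],
    "TikTok":  [100, 100, 100, 100, 100, 100],
}
# Average each decoder's six conditions (size x angle combinations).
averages = {name: sum(rates) / len(rates) for name, rates in results.items()}
# Every decoder's average is above 95%, consistent with the claim above.
```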

Q2.2: [The ID results have been compromised compared with the original images.]

  • In our pipeline, the InstantID [38] network is used to preserve face identity during the generation process. We compare our generation results with those of InstantID using the same prompt in Table A of the PDF. Compared with the InstantID baseline, our results with the additional QR information show little degradation in face identity quality. For the generated QR images to be practical in daily life, they must be successfully decoded by standard QR code decoders originally designed for black-and-white QR codes. Therefore, although the balance between face identity and QR pattern is carefully managed, subtle artifacts, such as color blocks in the face region, may still occur due to the compromises made for scannability. If a decoder were designed specifically for decoding aesthetic QR codes, we believe it could completely eliminate the impact of QR patterns on faces. In the current situation, our method is likely the best solution for balancing face identity, QR code patterns, and aesthetics.

Q2.3: [The face results seem not to be consistent with the QR codes. The results are like a simple combination of a QR code with a face image.]

  • In the generated QR image, the face ID is consistent with the original face image. Moreover, the decoded QR code matches the original encoding, indicating the functional consistency between the face region and the QR pattern.
  • We do not consider our generated QR images to be simple combinations of a QR code with a face image. Indeed, conventional methods such as ArtUp [44] attempted to combine them directly, but this often comes at the expense of aesthetic quality. As shown in Table B of the PDF, ArtUp directly pastes the user-provided face image onto the QR code, resulting in much lower aesthetic quality compared to ours. By harmonizing triplet control signals, our Face2QR achieves superiority in balancing aesthetics, face preservation, and scannability. It is worth noting that the clothes, pose, hairstyle, and other features in the generated images are adjusted accordingly to achieve holistic semantic consistency, going far beyond simply combining a QR code with a face image.

Q3: [How about the results to common face?]

  • Our Face2QR is generalizable to real common faces, generated realistic faces, and even cartoon faces. The experimental results are shown in Table C of the PDF. The face identities, regardless of type, are well preserved in the generated QR images and blend seamlessly into the background, demonstrating the effectiveness of Face2QR on these three face types.

Thanks again for your review. We hope our response has well addressed all your concerns.

Author Response

Dear Reviewers and Area Chairs,

We appreciate the reviewers (R1 p8m4, R2 ebKp, R3 MYMo, and R4 jWT5) for their insightful feedback. The reviewers agree that:

Novel approach:

  • R3: "The article introduces a novel pipeline designed to create customized QR codes..."
  • R4: "As the first paper to combine face identity with QR codes, the proposed pipeline for generating face embedded QR code is innovative..."

Effectiveness:

  • R1: "Experimental results show the effectiveness of the proposed algorithm."
  • R2: "Face2QR outperforms existing methods in preserving facial recognition features within custom QR code designs."
  • R3: "There is sufficient ablation study to demonstrate the effectiveness of each module in this paper."
  • R4: "Experimental results show that the generated QR codes successfully preserve face identity, yielding impressive visual results."

Interesting:

  • R1: "The paper presents an interesting framework..."
  • R2: "Generating personalized QR codes that balance aesthetics, face identity, and scannability seems an interesting topic and application."
  • R4: "The IDRS module presents an interesting solution to conflicts arising from different control signals."

Well-Written and Organized:

  • R1: "The paper is well presented and would be easy to reproduce."
  • R2: "The design of the major components in this paper, including ..., is well-motivated and reasonable."
  • R3: "This paper is well-written and easy to follow."
  • R4: "The paper is well-organized, thoroughly discussing motivation and related work."

We have responded individually to each reviewer to address any concerns.

Best Regards,

Authors

References for the PDF file:

[51] Pexels. Accessed: 2024-08-06.

[52] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proc. CVPR, 2020.

Comment

Dear reviewers,

We are pleased that our paper receives all positive ratings, and we are grateful for your valuable and insightful comments.

As the author-reviewer discussion period draws to a close, we want to ensure that any remaining issues or questions are addressed promptly. If there are additional concerns or if further clarification is needed, please do not hesitate to reach out. We are committed to resolving any concerns to facilitate this discussion.

Thank you once again for your precious time and effort in reviewing our submission.

Best regards,

Authors of Submission 4902

Final Decision

This work proposes an interesting pipeline, Face2QR, for generating personalized QR codes that balance aesthetics, face identity, and scannability. After the rebuttal, the paper received unanimous acceptance from all reviewers. All reviewers agree that the paper is well-written, the experimental results are impressive, and the proposed method is interesting.