c:["$","div",null,{"className":"container py-8 max-w-6xl mx-auto","children":["$","$e",null,{"fallback":null,"children":["$","$L16",null,{"paper":{"id":"3iGponpukH","title":"ScalePerson: Towards Good Practices in Evaluating Physical Adversarial Attacks on Person Detection","abstract":"$17","keywords":["Physical Adversarial Attack","Person Detection","Dataset"],"primary_area":"datasets and benchmarks","venue":"ICLR 2025 Conference Withdrawn Submission","conference":"ICLR","year":2025,"status":"withdrawn","is_accepted":false,"avg_rating":4.75,"avg_rating_normalized":4.75,"rating_min":3,"rating_max":6,"rating_std":1.08972,"review_count":4,"comment_count":5,"creation_date":"2024-09-26","modification_date":"2024-11-13","forum_link":"https://openreview.net/forum?id=3iGponpukH","pdf_link":"https://openreview.net/pdf?id=3iGponpukH","arxiv_id":null,"arxiv_url":null,"arxiv_match_method":null,"arxiv_matched_at":null,"tldr":"Our paper introduces ScalePerson, a new dataset and benchmark for evaluating physical adversarial attacks in person detection, providing standardized metrics and comprehensive analyses across multiple attack methods and detectors.","created_at":"2026-01-21T12:26:19.487885+00:00","updated_at":"2026-04-22T06:00:50.081236+00:00","authors":[{"id":"~Hui_Wei2","name":"Hui Wei","openreview_id":"~Hui_Wei2","position":0},{"id":"~Yuanwei_Liu4","name":"Yuanwei Liu","openreview_id":"~Yuanwei_Liu4","position":1},{"id":"~Xuemei_Jia1","name":"Xuemei Jia","openreview_id":"~Xuemei_Jia1","position":2},{"id":"~Baraa_Al-Hassani1","name":"Baraa Al-Hassani","openreview_id":"~Baraa_Al-Hassani1","position":3},{"id":"~Manhuen_Zhang1","name":"Manhuen Zhang","openreview_id":"~Manhuen_Zhang1","position":4},{"id":"~Joey_Tianyi_Zhou1","name":"Joey Tianyi Zhou","openreview_id":"~Joey_Tianyi_Zhou1","position":5},{"id":"~Zheng_Wang14","name":"Zheng 
Wang","openreview_id":"~Zheng_Wang14","position":6}]},"stats":{"ratings":[{"id":"9flD26z9gu","value":6,"confidence":3},{"id":"UtPYTinG9o","value":3,"confidence":5},{"id":"nFdoFC7Yf5","value":5,"confidence":3},{"id":"Ojo7ES8o0f","value":5,"confidence":4}],"avg_rating":4.75,"rating_min":3,"rating_max":6,"rating_std":1.2583057392117916,"detailed_scores":{"soundness":[3,2,2,3],"contribution":[3,1,2,2],"presentation":[3,3,2,4],"originality":[],"quality":[],"clarity":[],"significance":[]}},"commentTree":[{"id":"9flD26z9gu","paper_id":"3iGponpukH","replyto":"3iGponpukH","number":1,"type":"Official_Review","role":"reviewer","rating":6,"confidence":3,"soundness":3,"contribution":3,"presentation":3,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":6,"summary":"This work proposes a new person detection dataset, SCALEPERSON, for assessing existing physical adversarial attacking methods on the person detection tasks. It builds a standard benchmark and evaluation metrics to measure the performance of attacks under different settings, which is transparent and insightful for the future physical adversarial attacks works.","questions":"See weakness.","soundness":3,"strengths":"a)\tThis work is well organized and easy to follow. Its motivation is reasonable and provides a solid foundation for the proposed benchmark.\n\nb)\tThis work conducts thorough experiments across various attacks, detectors, and datasets to construct a fair benchmark for existing methods.\n\nc)\tThe quantitative analysis is detailed and uncovers weaknesses of existing datasets and methods.","confidence":3,"weaknesses":"i.\tMy main concern is the quality of the proposed dataset. How many unique persons are used in SCALEPERSON dataset? According to Fig 3, it seems like that the diversity of persons is low.\nii.\tThe AP performance is high, and ASR performance is low on the proposed dataset. Is it caused by the low difficulty and diversity of the proposed dataset? 
Except for T-SEA, the performance distinction of existing methods is lower on SCALEPERSON than on other datasets. Does it cause the proposed dataset not a qualified benchmark to evaluate these methods?\niii.\tMore statistical numbers of the proposed dataset should be provided, such as the gender ratio, occlusion levels, and ages.","contribution":3,"presentation":3,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."]},"created_at":"2024-10-29T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This work proposes a new person detection dataset, SCALEPERSON, for assessing existing physical adversarial attack methods on person detection tasks. It builds a standard benchmark and evaluation metrics to measure the performance of attacks under different settings, which is transparent and insightful for future work on physical adversarial attacks.

","questions":"

See weaknesses.

","strengths":"

a)\tThis work is well organized and easy to follow. Its motivation is reasonable and provides a solid foundation for the proposed benchmark.

b)\tThis work conducts thorough experiments across various attacks, detectors, and datasets to construct a fair benchmark for existing methods.

c)\tThe quantitative analysis is detailed and uncovers weaknesses of existing datasets and methods.

","weaknesses":"

i.\tMy main concern is the quality of the proposed dataset. How many unique persons appear in the SCALEPERSON dataset? According to Fig. 3, the diversity of persons seems low.

ii.\tThe AP is high and the ASR is low on the proposed dataset. Is this caused by the low difficulty and diversity of the proposed dataset? Except for T-SEA, the performance differences between existing methods are smaller on SCALEPERSON than on other datasets. Does this make the proposed dataset unsuitable as a benchmark for evaluating these methods?

iii.\tMore statistics on the proposed dataset should be provided, such as the gender ratio, occlusion levels, and ages.

","code_of_conduct":"

Yes

"}},{"id":"UtPYTinG9o","paper_id":"3iGponpukH","replyto":"3iGponpukH","number":2,"type":"Official_Review","role":"reviewer","rating":3,"confidence":5,"soundness":2,"contribution":1,"presentation":3,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":3,"summary":"The manuscript introduces SCALEPERSON, a novel dataset designed to evaluate physical adversarial attacks on person detection systems. Addressing limitations in existing evaluations—such as inconsistent setups and lack of a dedicated dataset—the paper establishes a comprehensive benchmark that standardizes evaluation metrics and includes critical factors like person scale, orientation, number of individuals, and capture devices. The benchmark assesses 11 state-of-the-art attack methods against 7 mainstream detectors across 3 datasets, totaling 231 experiments, providing detailed insights into the efficacy of these attacks.","questions":"Please refer to the weaknesses.","soundness":2,"strengths":"1. Originality: The paper introduces SCALEPERSON, a novel dataset specifically designed for evaluating physical adversarial attacks on person detection systems\n2. Quality: The paper features a comprehensive benchmark that systematically evaluates 11 state-of-the-art attack methods against 7 mainstream detectors on 3 datasets of person detection, ensuring robust and detailed analysis.\n3. Clarity: The writing is clear and well-structured, effectively communicating the purpose and methodology behind the dataset and benchmark.\n4. Significance: The introduction of SCALEPERSON advances the field by providing a resource for evaluating person detection systems.","confidence":5,"weaknesses":"$18","contribution":1,"presentation":3,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."]},"created_at":"2024-11-01T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

The manuscript introduces SCALEPERSON, a novel dataset designed to evaluate physical adversarial attacks on person detection systems. Addressing limitations in existing evaluations—such as inconsistent setups and lack of a dedicated dataset—the paper establishes a comprehensive benchmark that standardizes evaluation metrics and includes critical factors like person scale, orientation, number of individuals, and capture devices. The benchmark assesses 11 state-of-the-art attack methods against 7 mainstream detectors across 3 datasets, totaling 231 experiments, providing detailed insights into the efficacy of these attacks.

","questions":"

Please refer to the weaknesses.

","strengths":"

1. Originality: The paper introduces SCALEPERSON, a novel dataset specifically designed for evaluating physical adversarial attacks on person detection systems.
2. Quality: The paper features a comprehensive benchmark that systematically evaluates 11 state-of-the-art attack methods against 7 mainstream detectors on 3 datasets of person detection, ensuring robust and detailed analysis.
3. Clarity: The writing is clear and well-structured, effectively communicating the purpose and methodology behind the dataset and benchmark.
4. Significance: The introduction of SCALEPERSON advances the field by providing a resource for evaluating person detection systems.

","weaknesses":"$19","code_of_conduct":"

Yes

"}},{"id":"nFdoFC7Yf5","paper_id":"3iGponpukH","replyto":"3iGponpukH","number":3,"type":"Official_Review","role":"reviewer","rating":5,"confidence":3,"soundness":2,"contribution":2,"presentation":2,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":5,"summary":"This paper addresses the problem of evaluating physical adversarial attacks on person detection systems. The main issues highlighted are the lack of consistent experimental setups and ambiguous evaluation metrics that hinder fair comparisons, and the absence of a dedicated dataset designed for assessing physical adversarial attacks, leading to evaluations on datasets not ideally suited for this purpose.\n\nThe authors propose SCALEPERSON, the first dataset specifically designed for evaluating physical adversarial attacks in person detection. This dataset incorporates critical factors such as person scale, orientation, number of individuals, and capture devices, providing a more realistic and challenging testbed for evaluating such attacks. Additionally, they introduce a comprehensive benchmark with standardized evaluation metrics and a modular codebase to enhance reproducibility and transparency.","questions":"Pls see the weaknesses above","soundness":2,"strengths":"1. SCALEPERSON is the first dataset designed to address the uneven distribution of person scales in existing datasets, which is crucial for evaluating the effectiveness of adversarial attacks across different scales.\n2. The benchmark includes standardized evaluation metrics and a modular codebase that allows for transparent and reproducible assessments of attack effectiveness.\n3. The authors conduct an extensive evaluation of 11 state-of-the-art attacks against 7 mainstream detectors across 3 datasets, providing multidimensional quantitative analysis.\n4. 
The analysis uncovers deficiencies in current methods and offers novel insights to inspire future technological advancements.","confidence":3,"weaknesses":"1. While SCALEPERSON addresses the issue of uneven person scale distribution, it may not cover all possible real-world scenarios, which could limit the generalizability of the findings. The collection and use of images in the dataset must adhere to strict ethical guidelines to ensure personal privacy is not compromised.\n2. The effectiveness of the benchmark relies on the selection of attack methods included. If certain effective attacks are not considered, the benchmark may not fully represent the threat landscape.","contribution":2,"presentation":2,"code_of_conduct":"Yes","flag_for_ethics_review":["Yes, Privacy, security and safety"]},"created_at":"2024-11-03T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This paper addresses the problem of evaluating physical adversarial attacks on person detection systems. The main issues highlighted are the lack of consistent experimental setups and ambiguous evaluation metrics that hinder fair comparisons, and the absence of a dedicated dataset designed for assessing physical adversarial attacks, leading to evaluations on datasets not ideally suited for this purpose.

The authors propose SCALEPERSON, the first dataset specifically designed for evaluating physical adversarial attacks in person detection. This dataset incorporates critical factors such as person scale, orientation, number of individuals, and capture devices, providing a more realistic and challenging testbed for evaluating such attacks. Additionally, they introduce a comprehensive benchmark with standardized evaluation metrics and a modular codebase to enhance reproducibility and transparency.

","questions":"

Please see the weaknesses above.

","strengths":"

1. SCALEPERSON is the first dataset designed to address the uneven distribution of person scales in existing datasets, which is crucial for evaluating the effectiveness of adversarial attacks across different scales.
2. The benchmark includes standardized evaluation metrics and a modular codebase that allows for transparent and reproducible assessments of attack effectiveness.
3. The authors conduct an extensive evaluation of 11 state-of-the-art attacks against 7 mainstream detectors across 3 datasets, providing multidimensional quantitative analysis.
4. The analysis uncovers deficiencies in current methods and offers novel insights to inspire future technological advancements.

","weaknesses":"

1. While SCALEPERSON addresses the issue of uneven person scale distribution, it may not cover all possible real-world scenarios, which could limit the generalizability of the findings. The collection and use of images in the dataset must adhere to strict ethical guidelines to ensure personal privacy is not compromised.
2. The effectiveness of the benchmark relies on the selection of attack methods included. If certain effective attacks are not considered, the benchmark may not fully represent the threat landscape.

","code_of_conduct":"

Yes

"}},{"id":"Ojo7ES8o0f","paper_id":"3iGponpukH","replyto":"3iGponpukH","number":4,"type":"Official_Review","role":"reviewer","rating":5,"confidence":4,"soundness":3,"contribution":2,"presentation":4,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"rating":5,"summary":"This work introduces a novel dataset and benchmark for physical adversarial attacks on person detection task, focusing on fair comparison regarding various factors such as scale, orientation, cameras, etc. Also, this work suggests an evaluation metrics: Average Precision (AP) and Attack Success Rate (ASR) for benchmark. With the dataset and benchmark, the authors conduct an extensive evaluation with various attack methods and detectors across the existing and novel datasets.","questions":"Please refer to the weakness part.","soundness":3,"strengths":"1. This work provides a novel dataset designed for studying physical adversarial attack. The dataset consists of person images with an uniformly distributed scale, while the existing datasets (INRIAPerson, COCOPerson) do not.\n2. The presentation is good.\n3. This work provides the extensive experimental results comparing the various adversarial attack methods between datasets.","confidence":4,"weaknesses":"$1a","contribution":2,"presentation":4,"code_of_conduct":"Yes","flag_for_ethics_review":["No ethics review needed."]},"created_at":"2024-11-04T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"summary":"

This work introduces a novel dataset and benchmark for physical adversarial attacks on the person detection task, focusing on fair comparison across factors such as scale, orientation, and cameras. The work also proposes two evaluation metrics for the benchmark: Average Precision (AP) and Attack Success Rate (ASR). With the dataset and benchmark, the authors conduct an extensive evaluation of various attack methods and detectors across the existing and novel datasets.

","questions":"

Please refer to the weaknesses section.

","strengths":"

1. This work provides a novel dataset designed for studying physical adversarial attacks. The dataset consists of person images with a uniformly distributed scale, whereas the existing datasets (INRIAPerson, COCOPerson) do not have this property.
2. The presentation is good.
3. This work provides extensive experimental results comparing various adversarial attack methods across datasets.

","weaknesses":"$1b","code_of_conduct":"

Yes

"}},{"id":"X4aGVUJzLz","paper_id":"3iGponpukH","replyto":"3iGponpukH","number":1,"type":"Withdrawal","role":"author","rating":null,"confidence":null,"soundness":null,"contribution":null,"presentation":null,"originality":null,"quality":null,"clarity":null,"significance":null,"content":{"withdrawal_confirmation":"I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors."},"created_at":"2024-11-13T00:00:00+00:00","modified_at":"2024-11-13T00:00:00+00:00","replies":[],"contentHtml":{"withdrawal_confirmation":"

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.

"}}],"submissionHistory":[]}]}]}]