PaperHub

Overall score: 5.8/10 · Rejected · 4 reviewers (min 5, max 6, std dev 0.4)
Individual ratings: 6, 6, 5, 6
Confidence: 4.3 · Correctness: 2.8 · Contribution: 2.0 · Presentation: 2.8

ICLR 2025

Chronicling Germany: An Annotated Historical Newspaper Dataset

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05

Abstract

Keywords

historic newspaper processing, digital history, computer vision

Reviews and Discussion

Review (Rating: 6)

This paper presents an annotated historical newspaper dataset written in German in the 19th and 20th centuries.

The dataset contains 693 pages, each manually labelled with both text transcriptions and layout elements such as background and tables.

The authors also provide baseline methods to assess layout segmentation, OCR accuracy, and the performance of the full pipeline.

The authors hope that researchers could build upon the Chronicling Germany dataset to improve historical newspaper processing methods.

Strengths

  • The Chronicling Germany dataset introduces OCR as well as polygon layout analysis problems for historical newspaper scans
  • Compared to Dell et al. (2024), Chronicling Germany appears to have much more complex layouts. This could introduce a new line of problems in historic document analysis
  • The paper is well written and easy to follow

Weaknesses

  • The authors claim that modern human readers will struggle to read the contents of Chronicling Germany; however, they only provide examples of font differences. As layout analysis is also a primary focus of this dataset, it would be nice if the authors could also focus more on the difficulties modern readers face in understanding the layout differences.
  • The authors did not provide information regarding the image sizes or DPI. This could help researchers evaluate the usefulness of the dataset.
  • The benchmark methods don't seem very well justified. Have you explored how existing pipelines trained on modern newspapers perform on your dataset?
  • There is no information about the legibility of the data

Questions

  • As the authors mentioned that comprehending the pages is not trivial for modern German readers, why is the reading order within scope of this dataset?
  • Follow up: how was the reading order automatically assigned and evaluated to have “satisfactory results”?
Comment

We address questions and feedback below:

  1. The authors claim that modern human readers will struggle to read the contents of Chronicling Germany; however, they only provide examples of font differences. As layout analysis is also a primary focus of this dataset, it would be nice if the authors could also focus more on the difficulties modern readers face in understanding the layout differences.

Thanks for the pointer! The corresponding passage in our submission was indeed unclear on this point; we have updated it now. The issue is twofold: while humans struggle with the font but not with the layout, computers struggle with both font and layout. Newspapers set in Fraktur are particularly hard for layout recognition, since their visual appearance differs from that of Antiqua texts due to a higher rate of black pixels. This combination makes digitization complex.

  2. The authors did not provide information regarding the image sizes or DPI. This could help researchers evaluate the usefulness of the dataset.

Historical newspapers use various page formats, differing regionally as well as over time. The final image resolution depends both on the size of the original (analog typeset) pages and on the digitization process. Newspapers were digitized by a variety of libraries, archives, and external service providers using different types of scanners. Many newspapers were first scanned to microform in the 80s and 90s and later digitized from these films. The web portals providing digitized newspapers disclose neither the original page size nor information on their digitization methods, making it impossible to calculate DPI.

We thank UUca for the suggestion. Overall, 83.8% of our dataset scans fall into a range between 4,500 and 5,499 pixels in width and 6,500 and 7,499 pixels in height. The largest and smallest widths are 5,375 and 1,800; for the height we have a maximum of 7,230 and a minimum of 2,510. We have added this information to the introduction section of our paper.
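For readers who want to reproduce such statistics from the released scans, a minimal sketch follows. It assumes the scans sit in a flat directory of JPEG files (directory layout, extension, and function name are our assumptions, not the authors' tooling); Pillow reads the dimensions from the file header without decoding the full image.

```python
# Minimal sketch (not the authors' code): tally scan dimensions with Pillow.
from pathlib import Path
from PIL import Image

def main_bucket_share(scan_dir: str) -> float:
    """Share of scans with width in [4500, 5499] and height in [6500, 7499]."""
    sizes = []
    for path in sorted(Path(scan_dir).glob("*.jpg")):
        with Image.open(path) as img:
            sizes.append(img.size)  # (width, height), header-only read
    hits = sum(1 for w, h in sizes if 4500 <= w <= 5499 and 6500 <= h <= 7499)
    return hits / len(sizes) if sizes else 0.0

# Example: print(f"{100 * main_bucket_share('scans/'):.1f}% of pages in the main bucket")
```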

  3. The benchmark methods don't seem very well justified. Have you explored how existing pipelines trained on modern newspapers perform on your dataset?

Font differences do not allow us to transfer results from models trained on modern newspapers. The same differences also complicate the transfer of weights trained on historic American data, i.e., from Dell et al. (2024). Most American datasets lack baseline annotations, which datasets of European origin typically include. Our annotations follow European standards for compatibility reasons. To the extent possible, we evaluated Dell et al. for the layout and Kraken and Pero for the OCR.

  4. There is no information about the legibility of the data.

Yes, legibility is hard to measure. We could have asked our annotators to rate legibility, but this would have reduced the size of our dataset since our resources are finite.

  5. As the authors mentioned that comprehending the pages is not trivial for modern German readers, why is the reading order within the scope of this dataset?

While the average modern German reader will struggle to read the Fraktur font, this is not the case for our human annotators, who are domain experts. Having a (mostly) sensible reading order provided with the automated OCR facilitates correction by our human annotators. Reading order could also be annotated manually; currently, this is not a priority.

  6. Follow-up: how was the reading order automatically assigned and evaluated to have “satisfactory results”?

Reading order includes the order of lines within a region, as well as the order of all text regions within a single page. (Reading order in general, and with regard to multiple pages, is a future research topic.) Besides sorting lines vertically within one text region, we use an algorithm that estimates the number of columns within a page, based on the bounding-box information of all regions on that page. All elements within one column are sorted vertically, while different columns are processed from left to right. Furthermore, we consider large separators that divide a page into sections which are completely independent of one another; a sketch of the column heuristic is given below.
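The following sketch illustrates the column-based ordering described above. It is our paraphrase of the description, not the authors' released code, and it assumes the column count has already been estimated from the page's bounding-box statistics.

```python
# Illustrative sketch of the reading-order heuristic described above;
# regions are axis-aligned bounding boxes in page coordinates.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def order_regions(regions: List[Box], page_width: float, n_columns: int) -> List[Box]:
    """Sort regions column by column (left to right), then top to bottom."""
    col_width = page_width / n_columns

    def sort_key(box: Box) -> Tuple[int, float]:
        x_min, y_min, x_max, _ = box
        x_centre = (x_min + x_max) / 2.0
        column = min(int(x_centre // col_width), n_columns - 1)
        return (column, y_min)  # columns left to right, vertical order within

    return sorted(regions, key=sort_key)
```

Large separators would first split the page into independent sections, each of which is then ordered separately with a routine like this.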

Review (Rating: 6)

This paper addresses the challenges of article layout detection and text recognition in historical German newspaper pages, which are essential for NLP and machine learning in digital history. The authors present the "Chronicling Germany" dataset, containing 693 annotated pages from 1852 to 1924, and establish a baseline pipeline for layout detection and OCR tasks. They validate the model's performance on an out-of-distribution set of 112 pages from 1785-1866. Both the dataset and the baseline code are publicly available.

Strengths

The paper makes a significant contribution by providing a new, large, and annotated dataset of 693 high-resolution historical newspaper pages in German Fraktur, addressing a crucial gap in resources for processing German-language historical documents.

Weaknesses

The paper does not cite significant existing projects that contributed to newspaper recognition:

While the OCR-D project, which develops advanced tools for processing German historical documents (https://ocr-d.de/en/), is mentioned in the annex, it should be highlighted in the main text.

The study does not leverage high-performance OCR systems such as Pero-OCR (https://pero-ocr.fit.vutbr.cz/), which could have enhanced the baseline OCR results. This choice limits the impact of the paper's findings, as more modern and effective systems are overlooked.

The paper's reliance on an OCR system that requires baseline detection is questionable. These systems are typically designed for non-horizontal or curved text, which is rare in newspaper layouts. A retrained OCR model, especially one leveraging pre-existing models on platforms like HuggingFace (https://huggingface.co/Teklia/pylaia-newseye-austrian), would likely have been more suitable.

The use of a relatively older UNet model for layout detection is debatable, given that more recent and effective models like YOLO-based architectures outperform UNet in similar tasks. Table 3's comparison, which references Dell et al., highlights these limitations, suggesting the paper's choice of model may not be optimal. As a result, the evaluation of layout analysis shows poor generalization capabilities, as reflected in Table 3.

The full evaluation of the pipeline is limited; the paper only reports a Character Error Rate of 6% without providing further details. Furthermore, article separation is not assessed, making the results appear preliminary and potentially incomplete. Comprehensive evaluation metrics are needed for a stronger validation of the approach.

The legend for Table 3 lacks clarity, especially regarding what is meant by "F1 Score in distribution"

Questions

No specific question.

Comment

We have done our best to address your concerns below:

1. Cite significant existing projects

We thank CCLu for the suggestion. The paper already lists the NZZ blackletter dataset from the impresso team. Unfortunately, we cited the Mannheim University Library version, which is an OCR-D ground-truth update. To rectify the situation, we are now additionally citing [6], where the dataset originally appeared.

Thank you for bringing News Eye to our attention. We have created a new table, which appears in the paper and below:

| Dataset | Pages |
| --- | --- |
| Chronicling Germany (ours) | 693 |
| Europeana [1] | 528 |
| News Eye Finnish [7] | 200 |
| Deutscher Reichsanzeiger und Preußischer Staatsanzeiger [2] | 197 |
| News Eye French [8] | 183 |
| Neue Züricher Zeitung (impresso) [3, 4, 6] | 167 |
| News Eye Austrian [9] | 158 |
| News Eye Competition [5] | 100 (50 simple track, 50 complex track) |

[1] http://dataset.primaresearch.org/www/assets/papers/ICDAR2015_Clausner_ENPDataset.pdf

[2] https://github.com/UB-Mannheim/reichsanzeiger-gt/

[3] https://zenodo.org/records/3333627

[4] https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth

[5] https://zenodo.org/records/4943582

[6] Ströbel, Phillip and Clematide, Simon: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images

[7] NewsEye / READ AS Finnish Newspapers https://zenodo.org/records/5654858

[8] NewsEye / READ AS French Newspapers https://zenodo.org/records/5654841

[9] NewsEye / READ AS Austrian Newspapers https://zenodo.org/records/5654907

2. OCR-D in main text

Thank you for the pointer: we already mention the OCR-D guidelines in the text at the end of Section 3 (lines 297-299); a citation is included. Because of the page limit, we are forced to keep the annotation guidelines in the appendix.

3. leverage high-performance OCR systems such as Pero-OCR

We respectfully disagree. Please take a look at Section 4.3; we do compare to Pero. Table 6 lists results for a fine-tuned transformer as proposed by Kodym and Hradis (https://arxiv.org/abs/2102.11838), the original authors of the Pero-OCR system.

4. The paper’s reliance on an OCR system that requires baseline detection is questionable

Our dataset actually includes curved lines, due to low-quality paper, age, and often indirect digitization. Following Kodym & Hradis (2021) [Pero], we apply a CNN-based text baseline detection system, adding line-height and text-block-boundary predictions to the model output so that the system can extract more comprehensive layout information. This serves to create bounding boxes around each line, so that OCR can be performed on each line individually. For the transformer model that follows Kodym & Hradis (2021), the baselines themselves are not processed further; they are merely used to establish boxes around text lines (sketched below). Our model based on Kiessling (2022) [Kraken], however, does use the baselines directly, and this approach yields better results.
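To illustrate the box construction described above, here is a minimal sketch assuming a baseline given as a polyline and a single predicted line height. The function name and the descender margin are our own illustration, not code from the paper or from Pero-OCR.

```python
# Sketch: turn a predicted baseline polyline plus line height into an
# axis-aligned crop box for line-level OCR. Margin value is illustrative.
from typing import List, Tuple

def baseline_to_box(baseline: List[Tuple[int, int]], line_height: int,
                    margin: int = 5) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) enclosing the text line.

    The box reaches line_height above the baseline to cover ascenders
    and keeps a small margin below it for descenders.
    """
    xs = [x for x, _ in baseline]
    ys = [y for _, y in baseline]
    left, right = min(xs) - margin, max(xs) + margin
    top = min(ys) - line_height
    bottom = max(ys) + margin
    return left, top, right, bottom

# line_crop = page_image.crop(baseline_to_box(baseline, line_height))  # PIL-style
```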

We are currently exploring the PyLaia model and will report back if we have satisfying results.

5. The use of a relatively older UNet model for layout detection is debatable

We chose the UNet model because the segmentation approach captures our problem well and we do not need the greater flexibility of a detection model. Nevertheless, we evaluated YOLOv8 (the same model used by Dell et al.) on our dataset and obtained the results below. In some classes the YOLOv8 model performs better than our current UNet, but as the UNet performs best on the paragraph class, which is the most important one for us, we still believe the UNet is the better-suited approach. We list F1-scores:

| class | id + ood | id | ood |
| --- | --- | --- | --- |
| background | 0.786 ± 0.009 | 0.830 ± 0.008 | 0.716 ± 0.013 |
| caption | 0.575 ± 0.005 | 0.947 ± 0.002 | 0.112 ± 0.021 |
| table | 0.696 ± 0.037 | 0.881 ± 0.007 | 0.285 ± 0.042 |
| paragraph | 0.859 ± 0.013 | 0.916 ± 0.007 | 0.768 ± 0.028 |
| heading | 0.673 ± 0.007 | 0.827 ± 0.014 | 0.405 ± 0.021 |
| header | 0.622 ± 0.050 | 0.842 ± 0.071 | 0.164 ± 0.027 |
| image | 0.178 ± 0.036 | 0.503 ± 0.191 | 0.117 ± 0.050 |
| inverted_text | 0.157 ± 0.016 | 0.420 ± 0.078 | 0.000 ± 0.000 |
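For reference, a per-class F1 on segmentation output can be computed at the pixel level as sketched below. This is a minimal numpy illustration, not the authors' evaluation code, and it assumes both predictions and ground truth have been rasterized to integer label maps (for a detector like YOLOv8, the predicted boxes would first be rendered to such a map).

```python
# Minimal sketch of a per-class pixel F1 on integer label maps.
import numpy as np

def class_f1(pred: np.ndarray, target: np.ndarray, class_id: int) -> float:
    """Pixel-level F1 for one class; pred and target share the same shape."""
    pred_c = pred == class_id
    target_c = target == class_id
    tp = np.logical_and(pred_c, target_c).sum()
    fp = np.logical_and(pred_c, ~target_c).sum()
    fn = np.logical_and(~pred_c, target_c).sum()
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0
```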

6. The full evaluation of the pipeline is limited

This paper is a prerequisite for article separation in the Fraktur-letter domain; article separation itself is not within the scope of this work. Generally, this paper is about a new dataset; it does not aim to make an algorithmic contribution. Consequently, we aim to demonstrate that our dataset allows automatic text extraction on unseen samples, measured by the Character Error Rate. Beyond the Character Error Rate, Table 9 now reports numbers for the entire pipeline.

7. The legend for Table 3 lacks clarity

Thank you for the suggestion; we have reworked Table 4 (formerly Table 3), taking your feedback into account.

Comment

Dear CCLu, thank you for suggesting PyLaia. We successfully trained the model on our Chronicling Germany dataset. We report our results below:

| Model | dataset | Levenshtein | fully correct [%] | many errors [%] |
| --- | --- | --- | --- | --- |
| LSTM (UBM) | id + ood | 0.02 | 43.5 | 7.2 |
| | id only | 0.01 | 48.9 | 3.2 |
| | ood only | 0.03 | 35.5 | 13.0 |
| LSTM finetuned (ours) | id + ood | 0.02 ± 0.001 | 60.5 ± 0.34 | 8.1 ± 0.66 |
| | id only | 0.01 ± 0.001 | 71.3 ± 0.21 | 2.9 ± 0.19 |
| | ood only | 0.04 ± 0.004 | 44.6 ± 1.27 | 15.8 ± 1.96 |
| Transformer (ours) | id + ood | 0.04 ± 0.01 | 56.2 ± 1.3 | 12.5 ± 2.3 |
| | id only | 0.04 ± 0.01 | 66.1 ± 1.64 | 9.7 ± 2.7 |
| | ood only | 0.04 ± 0.01 | 41.7 ± 0.89 | 16.6 ± 2.13 |
| PyLaia (retrained, [1]) | id + ood | 0.01 ± 0.00 | 61.2 ± 0.87 | 7.3 ± 0.5 |
| | id only | 0.01 ± 0.00 | 73.5 ± 0.42 | 1.5 ± 0.15 |
| | ood only | 0.03 ± 0.00 | 43.1 ± 1.59 | 15.7 ± 1.30 |

It turns out that PyLaia performs slightly better than the LSTM and the Pero-inspired transformer. Thank you for the suggestion; we will update the paper accordingly. This experiment strengthens the paper, since it shows that our dataset allows multiple pipelines to converge, and in all cases convergence produces networks that generalize.
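For concreteness, the Levenshtein column above is an edit distance normalized by line length, and "fully correct" counts lines with zero edits. A minimal sketch of these line-level metrics follows; it is not the authors' code, and the "many errors" cut-off is not stated in the thread, so the 5% used here is a labeled placeholder assumption.

```python
# Sketch of the line-level metrics in the table above (not the authors' code).
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def line_metrics(pairs):
    """pairs: iterable of (predicted, ground_truth) line strings."""
    rates = [levenshtein(p, t) / max(len(t), 1) for p, t in pairs]
    mean_rate = sum(rates) / len(rates)
    fully_correct = 100.0 * sum(r == 0 for r in rates) / len(rates)
    many_errors = 100.0 * sum(r > 0.05 for r in rates) / len(rates)  # assumed cut-off
    return mean_rate, fully_correct, many_errors
```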

References

[1] Joan Puigcerver: Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition? In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017). (https://atr.pages.teklia.com/pylaia/usage/training/)

Review (Rating: 5)

This paper presents a new dataset called "Chronicling Germany", consisting of 693 annotated historical German newspaper pages from 1852 to 1924. The dataset includes layout annotations and ground truth text transcriptions. The authors establish baseline results for layout detection, text line recognition, and OCR tasks using the dataset. They also test generalization on an out-of-distribution test set.

Strengths

This paper presents a new historical German newspaper dataset, providing layout information and text line annotations, and also offers some baselines for layout detection and OCR tasks.

Weaknesses

  1. The novelty and contribution of this paper are limited. The main contribution of this paper is the historical German newspaper dataset itself. However, compared to existing datasets [1][2][3], this dataset does not have significant uniqueness in dataset size and diversity.
     [1] Christian Clausner, Christos Papadopoulos, Stefan Pletschacher, and Apostolos Antonacopoulos. The ENP image and ground truth dataset of historical newspapers. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 931–935. IEEE, 2015.
     [2] UB-Mannheim. Ground truth for Neue Zürcher Zeitung black letter period. https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth, 2023a.
     [3] UB-Mannheim. reichsanzeiger-gt. https://github.com/UB-Mannheim/reichsanzeiger-gt/, 2023b.

  2. Lack of quantitative comparison with existing datasets, which fails to show the superiority of the Chronicling Germany dataset.

  3. Lack of detailed assessment of the annotation quality, such as the review of annotation consistency.

  4. Section 5 does not provide detailed results of testing on OOD data, which is insufficient to reflect the generalizability of the pipeline.

  5. The paper incorrectly used the ICLR 2024 template instead of the ICLR 2025 one.

Questions

  1. In comparison to other German historical newspaper datasets, how do the quantitative aspects of your dataset fare in terms of size, annotation quality, and diversity?

  2. Have the authors explored employing more advanced baseline methods for your tasks?

Ethics Concerns

N/A

Comment

We address questions and feedback below:

  1. Novelty and contribution.

In addition to stating sizes in the text, we have now added a table illustrating the size of the dataset in comparison to related work. You are reviewing the largest dataset of historic newspaper pages set in Fraktur letters. For your convenience, we have also listed the table below.

| Dataset | Pages |
| --- | --- |
| Chronicling Germany (ours) | 693 |
| Europeana [1] | 528 |
| News Eye Finnish [7] | 200 |
| Deutscher Reichsanzeiger und Preußischer Staatsanzeiger [2] | 197 |
| News Eye French [8] | 183 |
| Neue Züricher Zeitung (impresso) [3, 4, 6] | 167 |
| News Eye Austrian [9] | 158 |
| News Eye Competition [5] | 100 (50 simple track, 50 complex track) |

[1] http://dataset.primaresearch.org/www/assets/papers/ICDAR2015_Clausner_ENPDataset.pdf

[2] https://github.com/UB-Mannheim/reichsanzeiger-gt/

[3] https://zenodo.org/records/3333627

[4] https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth

[5] https://zenodo.org/records/4943582

[6] Ströbel, Phillip and Clematide, Simon: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images, https://www.zora.uzh.ch/id/eprint/177164/1/Improving_OCR_of_Black_Letter_in_Historical_Newspapers_The_Unreasonable_Effecti.pdf

[7] NewsEye / READ AS training dataset from Finnish Newspapers (19th C.) https://zenodo.org/records/5654858

[8] NewsEye / READ AS training dataset from French Newspapers (19th, early 20th C.) https://zenodo.org/records/5654841

[9] NewsEye / READ AS training dataset from Austrian Newspapers (19th, early 20th C.) https://zenodo.org/records/5654907

  2. Lack of quantitative comparison with existing datasets, which fails to show the superiority of the Chronicling Germany dataset.

In addition to the discussion on pages 2 and 3, we have added this information in tabular form (See Table 2 in the updated paper and above).

  3. Lack of detailed assessment of the annotation quality, such as the review of annotation consistency.

Thank you for this question. We are an academic project with limited resources. Assessing annotation quality is expensive. To make more valuable annotations available, we opted to spend our domain experts' time creating and checking as many annotations as possible. To ensure high quality, we defined and communicated detailed guidelines as outlined in A.6 of the paper.

  4. Section 5 does not provide detailed results of testing on OOD data, which is insufficient to reflect the generalizability of the pipeline.

Thank you for your input. We now provide out-of-distribution (ood) performance measurements in supplementary Table 9.

  5. The paper incorrectly used the ICLR 2024 template instead of the ICLR 2025 one.

Thank you for pointing out this mistake. We have rectified the issue in the current version.

Questions

  1. In comparison to other German historical newspaper datasets, how does the quantitative aspects of your dataset fare in terms of size, annotation quality, and diversity?

See point 1.

  2. Have the authors explored employing more advanced baseline methods for your tasks?

We assume 'more advanced' refers to transformer-style architectures or a YOLO model. Table 6 lists results for a transformer on the optical character recognition task. We have added Table 8, which lists performance measurements of a YOLOv8 model trained on this dataset.

Review (Rating: 6)

This paper presents the Chronicling Germany dataset, an annotated dataset of historical newspapers in the German language. The dataset was constructed through 1,500 hours of human annotation of layout components and consists of 693 pages. It also includes 1,900 individually annotated advertisements. According to the authors, it is the largest fully annotated collection of historic German newspaper pages. The motivation for creating this dataset is that the problem of historical newspaper understanding lacks sufficient data, particularly in some languages like German. The newspapers in the dataset have some singular features: among others, the particular use of the Fraktur font, the presence of certain characters and combinations, and the dense layout. In addition to the dataset, the paper presents a processing pipeline for layout analysis and text recognition, and experimentally evaluates the pipeline on in- and out-of-domain test data.

Strengths

A useful dataset for the digital humanities community, covering a gap in low-resource data. The dataset has been rigorously constructed, with layout annotations. In addition to the language, the documents in the dataset have some particularities that make it interesting for tackling language-independent document layout analysis and recognition problems.

The complementary processing pipeline that is presented is a good way to illustrate the value of the dataset, comparing the processing with other datasets.

Weaknesses

Annotated data is important for the scientific community. But presenting a new dataset at a conference requires a solid justification of how useful the dataset is in contributing to the progress of the state of the art in the main problems addressed by the community. In this case, an annotated historical newspaper dataset is neither in the mainstream of the representation learning community nor within the scope of the conference. Its interest is addressed to a marginal audience of the representation learning community.

Questions

I believe that the contribution of this dataset is interesting; however, as indicated above, it is addressed to a narrow audience, considering the scope of the conference. I am open to being persuaded of the relevance of this contribution in the context of the representation learning area. Beyond the interest in the problem of historical document layout analysis and recognition, from a wider perspective, the authors should identify other points that make this dataset interesting for a larger audience.

Comparison with the pipeline developed by Dell et al. (2024), analyzed in Table 3: it is not clear to me whether there is cross-dataset validation, i.e., whether the proposed pipeline is tested on the American Stories dataset and the Dell et al. pipeline is tested on the proposed German dataset. It would be good to have different datasets and different methods for the comparison.

======================== AFTER THE REBUTTAL

After interacting with the authors and looking at the other reviewers' reviews, I will keep the score of my first review. I thank the authors for their clarifying responses and the effort to consider the comments. The authors provided interesting new material, and I encourage them to include it in a potential revised version of the paper.

Comment

We address questions and feedback below:

  1. The dataset is addressed to a narrow audience and is neither in the mainstream of the representation learning community nor within the scope of the conference.

The ICLR 2025 call for papers includes datasets and benchmarks in the list of relevant topics. In this regard, we contribute a dataset for the study of a low-resource computer vision problem and address a pressing need of history scholars. The mass digitization of historical newspapers allows historians and social scientists to address new questions, and to analyze old questions from a new perspective, with big data. However, this potential has not yet been realized. Key reasons for current shortcomings are insufficient layout recognition and OCR (which are often correlated) of historical newspapers. Current layout detection pipelines frequently miss article-level or even column separators. This prevents precise information retrieval, because combined search terms are applied at the page level rather than the article level, thus generating hits for unrelated search terms (false positives). However, especially when combining large databases, precision is essential; for example, many advanced Natural Language Processing methods need article-level precision of OCR data. For a more detailed discussion of why machine learning is crucial for the study of history, please see Appendix A.4 of the paper.

  2. Authors should identify other points that make this dataset interesting for a larger audience.

Following Zhang et al. (2024), we argue that low-resource problems should receive more attention in computer vision, because studying such tasks has led to significant progress in natural language processing. See pages one and two for the full argument.

  3. Cross-dataset validation with Dell's American Stories

This paper makes a datasets-and-benchmarks contribution. Consequently, its main focus is to establish the validity of the presented data using standard algorithms; we do not establish a novel pipeline. Cross-dataset validation with the American Stories dataset would currently not be feasible, since our pipeline relies on baseline information, which is not annotated in the American Stories dataset.

  4. It would be good to have different datasets and different methods for the comparison.

In addition to discussing dataset sizes in Section 1, we have added a new table (Table 2) that lists our dataset alongside related ones. We compare different methods in Tables 4, 6, and 7 of the updated paper.

Comment

I thank the authors for the detailed answers to my comments and those of the other reviewers. I now have a better view of the paper based on the rebuttal. The addition of some data, such as the new table, is valuable.

Comment

Dear Editorial Team,

Thank you for the constructive comments and insightful feedback. We were happy to read that reviewers found the paper well written (UUca) and the dataset new (FKw8) as well as rigorously constructed (KWSV), presenting an interesting document-layout analysis problem to tackle (KWSV) while closing a crucial gap in resources (CCLu). Furthermore, UUca noticed that this dataset contains pages with more complex layouts than those in the dataset by Dell et al. (2024), which opens up new research avenues for the community.

Naturally, our reviewers also voiced questions and concerns; we have done our best to address these individually and summarize the most important points here.

Chronicling Germany dataset size in comparison to related work

Reviewers asked about the overall motivation for collecting the data in the first place, as well as its relation to prior art. To address both, we have added a new table (Table 2) to the paper, which we repeat here:

| Dataset | Pages |
| --- | --- |
| Chronicling Germany (ours) | 693 |
| Europeana [1] | 528 |
| News Eye Finnish [7] | 200 |
| Deutscher Reichsanzeiger und Preußischer Staatsanzeiger [2] | 197 |
| News Eye French [8] | 183 |
| Neue Züricher Zeitung (impresso) [3, 4, 6] | 167 |
| News Eye Austrian [9] | 158 |
| News Eye Competition [5] | 100 (50 simple track, 50 complex track) |

[1] http://dataset.primaresearch.org/www/assets/papers/ICDAR2015_Clausner_ENPDataset.pdf

[2] https://github.com/UB-Mannheim/reichsanzeiger-gt/

[3] https://zenodo.org/records/3333627

[4] https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth

[5] https://zenodo.org/records/4943582

[6] Ströbel, Phillip and Clematide, Simon: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images, https://www.zora.uzh.ch/id/eprint/177164/1/Improving_OCR_of_Black_Letter_in_Historical_Newspapers_The_Unreasonable_Effecti.pdf

[7] NewsEye / READ AS training dataset from Finnish Newspapers (19th C.) https://zenodo.org/records/5654858

[8] NewsEye / READ AS training dataset from French Newspapers (19th, early 20th C.) https://zenodo.org/records/5654841

[9] NewsEye / READ AS training dataset from Austrian Newspapers (19th, early 20th C.) https://zenodo.org/records/5654907

Comment

Dear reviewers, thank you very much for your constructive feedback. If there is anything else we can do to address your concerns, please let us know. Your feedback has helped us improve this paper; if you feel the same way, please consider raising your score.

AC Meta-Review

This paper discusses the challenges of detecting layouts and recognizing characters in historical German newspapers in terms of Natural Language Processing and Machine Learning. The authors propose and discuss the Chronicling Germany dataset, which consists of 693 annotated German newspaper pages from 1852 to 1924. The paper additionally discusses baselines that address the challenges of historical document understanding by training layout and OCR models, proposing a processing pipeline, and establishing baseline results on both in- and out-of-domain test data.

Reviewers are almost unanimous in recognizing the importance of annotated datasets of the type proposed for historical document analysis, and also recognize the gap in the current offering of German-language resources of this type. However, the paper (as reviewers note) offers very little in terms of scientific contributions to the broader machine and representation learning communities. While the proposed pipeline illustrates the utility of the proposed dataset, it is based on consolidated, well-known models and techniques and thus does not represent a novel contribution. Finally, the proposed dataset, while it does address a lack of German-language resources, is comparable in size and variety with existing historical document resources, as pointed out by multiple reviewers.

The consensus that emerged from the discussion phase is that this paper is not well-aligned with the ICLR community due to the lack of significant scientific contributions related to machine and representation learning and thus the decision is to reject. The authors are encouraged to look for a better-aligned venue, for example the NeurIPS dataset track.

Additional Comments on Reviewer Discussion

Reviewers appreciated the contribution of a new dataset; however, they were fairly unanimous in recognizing that the paper has little scientific contribution to offer to the ICLR community.

Final Decision

Reject

Public Comment

We want to thank everyone for their feedback. The final version of this paper is now available at https://openreview.net/pdf?id=YHDfqvtye9 .