S.A. Krassotkin¹
¹ MIPT (Dolgoprudny, Russia)
¹ Institute of Control Sciences RAS (Moscow, Russia)
¹ krasotkin.sa@phystech.edu
This paper presents a comparative analysis of two approaches to extracting and grouping bibliographic records from the text of scientific articles: the industrial pipeline RedUpAPI and the specialized tool GROBID. We discuss the limitations and typical errors of both approaches on heterogeneous corpora formatted according to various citation standards (APA, MLA, Chicago, IEEE, and others), as well as the Russian GOST standards. The paper examines these and related issues and proposes possible solutions.
Krassotkin S.A. Challenges of extracting bibliographies from academic publications. Highly Available Systems. 2026. V. 22. № 1. P. 17–20. DOI: https://doi.org/10.18127/j20729472-202601-03 (in Russian)

