Journal: Highly Available Systems, No. 1, 2026
Article in issue:
A two-stage recommendation-based system for numerical similarity assessment of Android applications using static features
Type of article: scientific article
DOI: https://doi.org/10.18127/j20729472-202601-12
UDC: 004.4
Authors:

V.V. Petrov1

1 Kazan Federal University (Kazan, Russia)

1 valeryvpetrov.itis@gmail.com

Abstract:

Problem statement. Large mobile app repositories contain functionally related and derivative Android applications, including different versions, modified copies, and apps with injected third-party modules. A method is needed that, given the APK file of a target application, automatically identifies the most similar applications in a collection and computes a numerical similarity score that is robust to code obfuscation.

Objective. To propose and implement a prototype pipeline for numerical similarity scoring of Android applications based on static analysis of APK files, including scalable candidate generation using compact fingerprints.

Results. A two-stage architecture is introduced: 1) fast candidate generation using compact MinHash/SimHash fingerprints and approximate nearest neighbor indexing; 2) refined comparison at the function and structural levels with normalization of the final score and a structural penalty for injected or unmatched code.
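
The first stage can be illustrated with a minimal MinHash sketch. This is not the author's implementation: the feature strings, the signature length of 64, the BLAKE2b-based hash family, and all function names are assumptions made for illustration, and a brute-force scan stands in for the approximate nearest neighbor index. Each application is reduced to a set of static features, compressed into a fixed-length signature, and candidates are ranked by the fraction of matching signature slots, which estimates the Jaccard similarity of the feature sets.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

NUM_HASHES = 64  # assumed signature length; not specified in the article


def _hash64(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a feature string (hypothetical hash family)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features: Iterable[str],
                      num_hashes: int = NUM_HASHES) -> List[int]:
    """Keep the minimum hash value of the feature set for every seed."""
    feats = set(features)
    if not feats:
        raise ValueError("feature set must be non-empty")
    return [min(_hash64(f, seed) for f in feats) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of equal slots, an estimate of the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def top_candidates(query_sig: List[int],
                   index: Dict[str, List[int]],
                   k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force ranking; a real system would use an ANN index here."""
    scored = [(estimated_jaccard(query_sig, sig), name)
              for name, sig in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    # Hypothetical static features, e.g. API calls extracted from an APK.
    base = {"api:sendTextMessage", "api:getDeviceId", "perm:INTERNET"}
    injected = base | {"api:loadLibrary", "perm:READ_SMS"}
    index = {
        "app_v1": minhash_signature(base),
        "app_v1_injected": minhash_signature(injected),
    }
    print(top_candidates(minhash_signature(base), index, k=2))
```

The shortlist returned by `top_candidates` would then be passed to the second stage, where the more expensive function-level and structural comparison is applied only to those few candidates.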

Practical significance. The approach supports quality control (duplicate and version detection), security analysis (injected code detection), and semantic search and recommendation over large application repositories.

Pages: 61-64
For citation

Petrov V.V. A two-stage recommendation-based system for numerical similarity assessment of Android applications using static features. Highly Available Systems. 2026. V. 22. № 1. P. 61−64. DOI: https://doi.org/10.18127/j20729472-202601-12 (in Russian)

References
  1. Petrov V.V. Sistema avtomatizatsii chislennoi otsenki skhodstva Android-prilozhenii // Elektronnye biblioteki. 2024. DOI: https://doi.org/10.26907/1562-5419-2024-27-3-336-365
  2. Li L. et al. Understanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting. IEEE TIFS. 2017. DOI: https://doi.org/10.1109/TIFS.2017.2656460
  3. Piggybacking dataset repository (SerVal, Univ. of Luxembourg). GitHub. URL: https://github.com/serval-snt-uni-lu/Piggybacking
  4. RePack: repository of repackaged Android app pairs (SerVal, Univ. of Luxembourg). GitHub. URL: https://github.com/serval-snt-uni-lu/RePack
  5. Allix K. et al. AndroZoo: Collecting Millions of Android Apps for the Research Community. ACM MSR. 2016. DOI: https://doi.org/10.1145/2901739.2903508
  6. Broder A.Z. On the Resemblance and Containment of Documents. Compression and Complexity of Sequences. 1997. URL: https://www.cs.princeton.edu/courses/archive/spring13/cos598C/broder97resemblance.pdf
  7. Charikar M.S. Similarity Estimation Techniques from Rounding Algorithms. STOC. 2002. DOI: https://doi.org/10.1145/509907.509965
  8. Manku G.S. et al. Detecting Near-Duplicates for Web Crawling. WWW 2007. DOI: https://doi.org/10.1145/1242572.1242592
  9. Zhang Y. et al. Detecting Third-Party Libraries in Android Applications with High Precision and Recall. IEEE SANER. 2018. DOI: https://doi.org/10.1109/SANER.2018.8330204
  10. Huang J. et al. Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering. IEEE Transactions on Software Engineering. 2023. DOI: https://doi.org/10.1109/TSE.2022.3215628
  11. The Drebin Dataset. URL: https://drebin.mlsec.org/
  12. Elizarov A.M. et al. Digital Ecosystem OntoMath as an Approach to Building the Space of Mathematical Knowledge. Russian Digital Libraries Journal. 2023. DOI: https://doi.org/10.26907/1562-5419-2023-26-2-154-202
Date of receipt: 24.02.2026
Approved after review: 26.02.2026
Accepted for publication: 10.03.2026