Publishing house Radiotekhnika

Accuracy study of data recovery technique based on fuzzy clustering

T.V. Afanasyeva – Dr.Sc.(Eng.), Professor, Associate Professor, Department «Information Systems», Ulyanovsk State Technical University
E-mail: tv.afanasjeva@gmail.com
I.V. Sibirev – Post-graduate Student, Department «Information Systems», Ulyanovsk State Technical University
E-mail: ivan.sibirev@yandex.ru


One of the problems of data mining is data pre-processing: data are often presented as numerical arrays with gaps (not all parameters are available, some parameters are lost or were not measured, and not all parameters are credible). The article describes an algorithm for data recovery based on cluster analysis, specifically fuzzy clustering (the FCM technique). The main idea of the algorithm is the iterative recovery of each missing numerical value of a parameter as a linear combination of the parameter's mean values in the clusters, weighted by the object's degrees of membership in those clusters. The weighting coefficients are obtained with the FCM fuzzy clustering algorithm. Experimental studies have shown that the algorithm does not depend on the type of statistical distribution and that it also works for large data sets. The algorithm recovers data with acceptable accuracy. It is shown that using fuzzy clustering to fill data gaps gives higher accuracy than the algorithm based on the arithmetic mean.
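
The recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fcm_impute` and its parameters `n_clusters`, `m` and `n_iter` are hypothetical, and three choices are assumptions made only for the illustration (gaps are first filled with the column mean, cluster centers are seeded by a farthest-point rule, and Euclidean distances are computed on the currently filled rows). Only the membership and center updates follow the standard FCM formulas, and the recovery step writes each gap as the sum over clusters of membership times cluster mean.

```python
import math


def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=50):
    """Fill None entries with membership-weighted cluster means (sketch)."""
    n, d = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n) for j in range(d) if rows[i][j] is None]

    # Assumption: start every gap at the mean of the observed values in its
    # column (each column needs at least one observed value).
    col_mean = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean.append(sum(observed) / len(observed))
    data = [[col_mean[j] if v is None else float(v) for j, v in enumerate(r)]
            for r in rows]

    # Assumption: deterministic farthest-point seeding of the cluster centers.
    centers = [list(data[0])]
    while len(centers) < n_clusters:
        far = max(data, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(list(far))

    for _ in range(n_iter):
        # FCM membership update: u[i][k] is proportional to dist ** (-2 / (m - 1)).
        u = []
        for x in data:
            inv = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            u.append([v / total for v in inv])

        # FCM center update: mean of each parameter weighted by u ** m.
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers[k] = [sum(w[i] * data[i][j] for i in range(n)) / w_sum
                          for j in range(d)]

        # Recovery step: each gap becomes a linear combination of cluster means.
        for i, j in missing:
            data[i][j] = sum(u[i][k] * centers[k][j] for k in range(n_clusters))
    return data


# Toy table with two well-separated groups and one gap in row 3.
table = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, None],
         [9.0, 9.0], [9.1, 8.9], [8.9, 9.2], [9.0, 9.1]]
filled = fcm_impute(table)
print(filled[3])
```

In this toy table the arithmetic mean of the observed values in the second column is 5.6, which lies between the two groups. The sketch should instead pull the gap toward the mean of the low group, because the object's membership in that cluster dominates the weighted sum.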
