Radiotekhnika
Publishing house Radiotekhnika


An approach to clustering feature tree transformation into feature vectors

P.V. Dudarin – Post-graduate Student, Ulyanovsk State Technical University
E-mail: p.dudarin@ulstu.ru
N.G. Yarushkina – Dr.Sc.(Eng.), Professor, Head of Department «Information Systems», Ulyanovsk State Technical University
E-mail: jng@ulstu.ru


Almost any machine learning algorithm includes a feature selection and feature extraction phase. When the features are not vectors, they must first be transformed into feature vectors. The feature extraction algorithm determines the amount and quality of information carried by the features and, in turn, the quality of clustering, so this transformation is an important part of the clustering procedure. This paper proposes an approach to transforming a clustering feature tree into feature vectors. The approach preserves hierarchy information and reduces the dimension of the feature space. The experimental section demonstrates the effectiveness of the transformation with several clustering algorithms, and the paper concludes with an analysis of the results.
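The abstract does not describe the transformation itself, so the following is only an illustrative sketch of the general idea of turning positions in a feature tree into vectors whose distances reflect the hierarchy. It is not the authors' method. The toy hierarchy `PARENT`, the node names, and the helper functions `path_to_root`, `tree_to_vectors` and `euclidean` are all hypothetical, and the encoding (one indicator dimension per non-root node on a leaf's path) is an assumption chosen for simplicity.

```python
from math import sqrt

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "economy": "root",
    "health": "root",
    "taxes": "economy",
    "trade": "economy",
    "hospitals": "health",
}


def path_to_root(node, parent=PARENT):
    """Return the node followed by all of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_to_vectors(leaves, parent=PARENT):
    """Encode each leaf as an indicator vector over the non-root nodes on its path."""
    # One dimension per non-root node that lies on some leaf's path.
    dims = sorted({n for leaf in leaves for n in path_to_root(leaf, parent)[:-1]})
    index = {name: i for i, name in enumerate(dims)}
    vectors = {}
    for leaf in leaves:
        vec = [0.0] * len(dims)
        for n in path_to_root(leaf, parent)[:-1]:
            vec[index[n]] = 1.0
        vectors[leaf] = vec
    return dims, vectors


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dims, vectors = tree_to_vectors(["taxes", "trade", "hospitals"])
```

In this sketch, leaves that share an ancestor share a nonzero coordinate, so `taxes` and `trade` end up closer to each other than either is to `hospitals`. The resulting vectors could then be passed to any vector-based clustering algorithm. The sketch does not attempt the dimension reduction mentioned in the abstract.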
