A.M. Namestnikov – Ph.D. (Eng.), Associate Professor, Department of Information Systems, Ulyanovsk State Technical University. E-mail: firstname.lastname@example.org
A.A. Filippov – Post-graduate Student, Department of Information Systems, Ulyanovsk State Technical University. E-mail: email@example.com
An electronic archive should have the properties of an intelligent system. At present there are no mathematical methods or algorithms that structure an electronic archive of text documents by its contents while taking into account the specifics of a design organization's domain. The development of models, methods, and algorithms for building the navigation structure of an electronic archive through ontological clustering of semi-structured information resources is therefore a relevant problem.
This article describes a method for forming the navigation structure of an electronic archive of technical documentation on the basis of a domain ontology. It presents a formal model of the ontology, a model of technical documentation, and an algorithm for clustering the contents of the archive based on a modified FCM (fuzzy c-means) method. The distance measure between ontological representations of the archive's technical documents is formalized by comparing the complexity of transforming one hierarchy into another, taking into account the different types of semantic relations between ontology concepts.
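As an illustration of the clustering step, the following is a minimal sketch of the membership update from the standard fuzzy c-means method, not the modified version developed in the article. It assumes that the distances between documents and cluster prototypes have already been computed by some measure, such as the ontological distance described above; the function name `fcm_memberships`, the `dist` matrix layout, and the default fuzzifier `m = 2.0` are assumptions made for this example.

```python
def fcm_memberships(dist, m=2.0):
    """Standard fuzzy c-means membership update (illustrative sketch).

    dist[i][j] is the precomputed distance from document i to cluster
    prototype j (hypothetical input; any non-negative measure works).
    Returns a matrix u where u[i][j] lies in [0, 1] and each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for row in dist:
        zero_count = sum(1 for d in row if d == 0.0)
        if zero_count:
            # The document coincides with at least one prototype:
            # split the full membership equally among those prototypes.
            share = 1.0 / zero_count
            memberships.append([share if d == 0.0 else 0.0 for d in row])
            continue
        # u_ij = d_ij^(-e) / sum_k d_ik^(-e), with e = 2 / (m - 1)
        inverse = [d ** -exponent for d in row]
        total = sum(inverse)
        memberships.append([v / total for v in inverse])
    return memberships


if __name__ == "__main__":
    # Three documents, two cluster prototypes (made-up distances).
    distances = [[1.0, 3.0], [0.0, 2.0], [2.0, 2.0]]
    for row in fcm_memberships(distances):
        print([round(u, 3) for u in row])
```

Because the update needs only a distance matrix, it does not require documents to be vectors, which is why it can in principle be paired with a distance defined on ontological hierarchies. How the cluster prototypes themselves are recomputed in that setting is not covered by this sketch.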
Text clustering is usually treated as an objective method that delivers a single, clearly defined result that is optimal in some sense. But different people have quite different needs when clustering text documents, because they may view the same documents from completely different perspectives.
In this paper we have shown how to cluster text documents using an ontology-based approach. We compared our semantic methods with the well-known "bag of words" document representation and with Oracle Text. For practical clustering in small or medium-sized document repositories, the suggested algorithms seem better suited. We are currently trying to apply our techniques to large text document archives.