Journal Neurocomputers, No. 3, 2016
Article in issue:
Application of parsing trees for increasing the relevance of frequency text analysis results
Authors:
A.A. Melikhov - Assistant, Department of Intellectual Technologies and Systems (ITS), Moscow Technological University (MIREA). E-mail: megadelphin@mail.ru
Abstract:
This article addresses the problem of combinatorial explosion that arises when counting the possible word combinations for statistical analysis of natural-language text. The explosion leads to a situation in which the combinations found (n-grams) are irrelevant to the topic of the text yet have very high occurrence rates. The proposed solution filters the initial n-gram dataset with a heuristic algorithm. This algorithm uses parsing trees to predict possibly relevant word combinations and excludes those that do not follow the grammatical structure of the sentence. The method thus combines the standard statistical n-gram text model with preliminary data filtering based on grammar-aware heuristics. Practical study results demonstrate an increase in the relevance of frequency n-gram analysis, an outcome explained by the significant difference between the filtered and unfiltered initial datasets, which in turn results from applying grammar-aware heuristics to the n-gram search. Owing to its inherent features, related to the fundamental principles of automated syntactic parsing, a practical implementation of the method places certain minimum performance requirements on the computer system. This drawback, however, is offset by the absence of any need for a domain-specific knowledge base, which makes the method domain-independent (scope-tolerant). Thus, the method can be applied to indexing any natural-language text using only a grammar model.
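The abstract does not specify the parser or the exact heuristic, so the following is only a minimal sketch under stated assumptions. It assumes each sentence comes with a dependency parse given as a head-index array, and it assumes the filter keeps only word combinations that form one connected fragment of the parse tree; that criterion is an illustrative stand-in, not necessarily the author's rule. The toy corpus, its hand-written parses, and the names `is_connected_subtree` and `count_ngrams` are hypothetical. Because gaps between words are allowed, an n-word sentence yields C(n, k) candidate k-word combinations, which is the source of the combinatorial explosion the filter is meant to curb.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each sentence is (tokens, heads), where heads[i]
# is the index of token i's syntactic head in a dependency parse and -1
# marks the root. The parses are hand-written for illustration only.
CORPUS = [
    (["the", "parser", "builds", "a", "syntax", "tree"], [1, 2, -1, 5, 5, 2]),
    (["the", "tree", "grows"], [1, 2, -1]),
]


def is_connected_subtree(indices, heads):
    """Return True if the chosen tokens form one connected fragment of the tree.

    In a rooted tree, each connected component of a node subset has exactly
    one node whose head lies outside the subset, so counting such nodes
    gives the number of components.
    """
    chosen = set(indices)
    roots = sum(1 for i in chosen if heads[i] not in chosen)
    return roots == 1


def count_ngrams(corpus, k, use_grammar_filter):
    """Count k-word combinations (word order kept, gaps allowed) in a corpus.

    Without the filter every one of the C(n, k) combinations of an n-word
    sentence is counted; with it, only syntactically connected ones are kept.
    """
    counts = Counter()
    for tokens, heads in corpus:
        for idx in combinations(range(len(tokens)), k):
            if use_grammar_filter and not is_connected_subtree(idx, heads):
                continue
            counts[tuple(tokens[i] for i in idx)] += 1
    return counts


if __name__ == "__main__":
    raw = count_ngrams(CORPUS, 2, use_grammar_filter=False)
    filtered = count_ngrams(CORPUS, 2, use_grammar_filter=True)
    print(sum(raw.values()), sum(filtered.values()))  # 18 7
    print(raw[("the", "tree")], filtered[("the", "tree")])  # 2 1
```

In this toy corpus the unfiltered count credits "the tree" twice, once from the first sentence, where "the" actually modifies "parser"; the filter keeps only the occurrence from the second sentence. A real pipeline would obtain the head arrays from a syntactic parser rather than writing them by hand, which is where the performance cost mentioned in the abstract would arise.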
Pages: 39-46