Journal: Highly Available Systems, No. 1, 2014
Article in issue:
Methods for Unification of Non-traditional Data Models
Authors:
S. A. Stupnikov - Ph.D. (Eng.), Senior Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: ssa@ipi.ac.ru
N. A. Skvortsov - Research Scientist, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: nskv@ipi.ac.ru
V. I. Budzko - Dr.Sc. (Eng.), Professor, Deputy Director, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: vbudzko@ipiran.ru
V. N. Zakharov - Dr.Sc. (Eng.), Scientific Secretary, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: vzakharov@ipiran.ru
L. A. Kalinichenko - Dr.Sc. (Eng.), Head of Laboratory, Institute of Informatics Problems, Russian Academy of Sciences. E-mail: leonidk@ipi.ac.ru
Abstract:
At the current stage of IT development, creating facilities for manipulating and analyzing Web, social media, machine, and sensor data is regarded as being of paramount importance. Data of this scale (frequently measured in petabytes) fall into the category of Big Data. To represent and manipulate such data collections, new data models have been created that differ from the traditional (relational) data model. One of the unsolved problems of Big Data manipulation is the integration of various non-traditional data models. Solving it requires, first of all, a unified representation of the various kinds of non-traditional data models in a canonical information model (a generalized language unifying the languages of the various data models). Such a representation requires constructing, for each data model, a mapping into the canonical model that preserves the semantics of its data description and data manipulation languages. This mapping is needed both for materialized integration (creation of a data warehouse) and for virtual integration (by means of subject mediators) of the respective data collections. The paper considers the principles of mapping four kinds of non-traditional data models into the canonical model, for which the SYNTHESIS language, representing a combined object-frame data model, is used: data models based on multidimensional arrays; graph-based data models; NoSQL data models; and the triple-based RDF data model. The method of data model mapping verification, applied to prove that information and operations are preserved under the mapping, is illustrated by examples. The objective of this research is to define well-founded unifying mappings of the non-traditional data models and thereby demonstrate that such different data models can be represented uniformly for the materialized or virtual integration of the respective data collections.
Pages: 18-39