Creating a Data Warehouse for Very Large Volumes of Information: Practical Experience


A. V. Belokonnyy

JSC "NICEVT" designs and manufactures computer systems for storing and processing large amounts of information. This report briefly describes practical experience in developing and deploying computer systems for processing and storing large data volumes, built on mainframes and cartridge libraries. Cartridge libraries serve as the main data store in the systems the company produces. They were chosen as repositories for large data volumes for several reasons:
- large storage capacity per unit of floor area;
- low power consumption;
- highly reliable storage with correction of complex errors, including physical damage to the media;
- a manufacturer-guaranteed data retention period of 30 years;
- the lowest cost per 1 TB stored: for modern devices, 60 USD/TB of uncompressed data.

The libraries are IBM devices, and a distinguishing feature is that they all use the same type of media, the IBM 3592 cartridge. Storage capacity grows as read/write technology improves. A modern cartridge holds up to 4/12 TB of data (uncompressed/compressed), at a media cost of about 4,000 rubles. In terms of recording type, 3592 cartridges are an advanced form of LTO.

When a large data store is built within a computer system, including one that combines heterogeneous systems and technology platforms, the following tasks are addressed:
1) Backup and restore for mainframes.
2) Backup and recovery for open systems (UNIX, LINUX, WINDOWS, ORACLE, etc.).
3) Creation of a hierarchical file storage system based on cartridge libraries, available to all users of the complex within their access rights. This is implemented with software developed by "NICEVT": the "Hierarchical File System", which allows data sets to be placed both on disk subsystems and on cartridges, and the "Hierarchical Library System", which controls the library robots.
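The idea of a hierarchical store that keeps data sets both on disk and on cartridges can be illustrated with a toy model. The sketch below is not the NICEVT software; the class name `TwoTierStore`, its methods, and the least-recently-used migration policy are all assumptions chosen for illustration. It only shows the general principle: a small disk tier fronts a large cartridge tier, older data sets migrate to cartridge when disk fills up, and a read of a migrated data set recalls it to disk.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy hierarchical store: a small disk tier in front of a cartridge tier.

    Hypothetical illustration only; sizes are in arbitrary units.
    """

    def __init__(self, disk_capacity: int) -> None:
        self.disk_capacity = disk_capacity
        self.disk = OrderedDict()   # name -> size, least recently used first
        self.cartridge = {}         # name -> size, data sets migrated off disk

    def _disk_used(self) -> int:
        return sum(self.disk.values())

    def write(self, name: str, size: int) -> None:
        """Place a data set on disk, migrating older data sets if needed."""
        if size > self.disk_capacity:
            raise ValueError("data set is larger than the disk tier")
        # Drop any previous copy of this data set from either tier.
        self.disk.pop(name, None)
        self.cartridge.pop(name, None)
        # Migrate least recently used data sets until the new one fits.
        while self._disk_used() + size > self.disk_capacity:
            old_name, old_size = self.disk.popitem(last=False)
            self.cartridge[old_name] = old_size
        self.disk[name] = size

    def read(self, name: str) -> str:
        """Return the tier that served the read, recalling from cartridge if needed."""
        if name in self.disk:
            self.disk.move_to_end(name)  # mark as most recently used
            return "disk"
        if name in self.cartridge:
            self.write(name, self.cartridge[name])  # recall to the disk tier
            return "cartridge"
        raise KeyError(name)


store = TwoTierStore(disk_capacity=10)
store.write("jan", 6)
store.write("feb", 6)          # "jan" no longer fits and migrates to cartridge
print(store.read("feb"))       # served from disk
print(store.read("jan"))       # recalled from cartridge, "feb" migrates
print(store.read("jan"))       # now served from disk
```

A real system would also have to track which cartridge holds each data set and drive the library robot to mount it, which is the role the report assigns to the separate library control software.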
Using the software developed by JSC "NICEVT" makes it possible to create a trusted (certified) software environment for building large data warehouses. A particular feature of the implementation is that, in the existing computer system, obsolete storage media are connected to the computer via ESCON (16 MB/s), while modern media are connected via FICON channels (1/2/4/8 Gb/s). Backup and restore time for data volumes depends heavily on the type of media on which a volume resides, or to which it is restored, and on the type of connection to the computer. The following results were obtained for full and incremental backups:
- Full backup: 724 volumes totaling 500 GB, of which 70 GB came from ORACLE/SUN; duration 20 hours.
- Incremental backup: about 200 volumes; duration 3 hours.

Such a long backup time is due primarily to the large number of 100/200 MB volumes located on media connected to the computer via ESCON (a slow link). The backup time also includes the transfer of data from ORACLE/SUN over an Ethernet network to the DS8300 disk subsystem (about 3 hours).

On the hardware of the same computer system, a hierarchical file store was set up with a capacity of 10 TB on disk subsystems, extended by 5 PB on the cartridge system. The hierarchical file system is available to users over the network through an NFS interface. As a result, read access to data residing on the cartridge system takes less than one minute, while the total capacity available for the repository is 5 PB.
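The full-backup figures can be put in perspective with a little arithmetic. The sketch below is only a rough check: it assumes decimal units (1 GB = 1000 MB) and compares the observed average rate with the nominal 16 MB/s ESCON rate quoted above, ignoring the fact that only some volumes travel over ESCON. The function names are illustrative.

```python
# Figures quoted in the report; decimal units (1 GB = 1000 MB) are an assumption.
FULL_BACKUP_GB = 500
FULL_BACKUP_HOURS = 20
ESCON_MB_PER_S = 16  # nominal channel rate quoted in the report


def effective_rate_mb_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s for a job of size_gb that took hours."""
    return size_gb * 1000 / (hours * 3600)


def ideal_hours(size_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate_mb_s with no overhead."""
    return size_gb * 1000 / rate_mb_s / 3600


observed = effective_rate_mb_s(FULL_BACKUP_GB, FULL_BACKUP_HOURS)
best_case = ideal_hours(FULL_BACKUP_GB, ESCON_MB_PER_S)
print(f"observed average rate: {observed:.1f} MB/s")
print(f"ideal time at the nominal ESCON rate: {best_case:.1f} h")
```

Under these assumptions the observed average is about 6.9 MB/s, and moving 500 GB at the nominal ESCON rate would take about 8.7 hours. The gap to the measured 20 hours is consistent with the report's explanation that per-volume overhead on many small volumes, plus the Ethernet transfer, dominates the run time.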
Conclusions: Building large data warehouses on cartridge libraries with software developed by JSC "NICEVT" makes it possible to:
- scale the data warehouse from tens and hundreds of TB up to PB simply by adding capacity;
- preserve investment: every three years storage capacity increases without changing the media;
- provide a unified storage environment for mainframes and open systems (UNIX, LINUX, WINDOWS);
- keep energy consumption per unit of stored volume to a minimum;
- create a large data warehouse for processing and storing classified information that constitutes a state secret.
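To give a feel for what scaling from terabytes to petabytes means in cartridges, the following sketch uses the figures quoted in the report (4 TB uncompressed per cartridge, 60 USD per TB stored). It assumes decimal units (1 PB = 1000 TB) and treats the 60 USD/TB figure as a flat rate; the function names are illustrative.

```python
import math

CARTRIDGE_TB_UNCOMPRESSED = 4  # per-cartridge capacity quoted in the report
COST_USD_PER_TB = 60           # storage cost per TB quoted in the report


def cartridges_needed(capacity_tb: float) -> int:
    """Whole cartridges required to hold capacity_tb of uncompressed data."""
    return math.ceil(capacity_tb / CARTRIDGE_TB_UNCOMPRESSED)


def storage_cost_usd(capacity_tb: float) -> float:
    """Storage cost at the flat per-TB rate (an assumed simplification)."""
    return capacity_tb * COST_USD_PER_TB


for label, tb in [("10 TB", 10), ("5 PB", 5000)]:
    print(f"{label}: {cartridges_needed(tb)} cartridges, "
          f"{storage_cost_usd(tb):,.0f} USD")
```

Under these assumptions the 5 PB cartridge tier described in the report corresponds to 1250 cartridges of uncompressed data.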

