Radiotekhnika
Publishing house Radiotekhnika

Increasing the failure tolerance of microprocessor systems to control flow errors based on architectural redundancy

Keywords:

S.L. Podvalny – Dr. Sc. (Eng.), Professor, Head of Department of Automated and Computing Systems,
Voronezh State Technical University
E-mail: spodvalny@yandex.ru
S.V. Tyurin – Ph. D. (Eng.), Associate Professor, Professor, Department of Automated and Computing Systems, Voronezh State Technical University
E-mail: svturin@mail.ru
M.A. Khudyakov – Post-graduate Student, Department of Automated and Computing Systems,
Voronezh State Technical University
E-mail: makkhudjakv@yandex.ru


The article presents an original method for the rapid detection of random failures that disrupt execution of the working program in microprocessor systems. A program execution error is a discrepancy between the instruction sequence actually executed by the microprocessor after a failure and the intended instruction sequence of the working program. Rapid detection of such failures is made possible by architectural enhancements: tagged program memory, a "read command code" attribute generated specifically by the microprocessor, and an additional handshake interface between the microprocessor and the program memory.
The essence of the proposed detection method is as follows. An additional single-bit storage device is included in the address space of the main (program) n-bit storage device. Each of its bits is a tag (mark) that stores additional information distinguishing the kind of code held in the corresponding cell of the main (program) memory. Together, the main and additional storage devices form a tagged program memory.
On every access to main memory, the microprocessor indicates with a special signal whether it is reading an instruction code or an operand code. The additional memory then either implicitly confirms that the access is correct or reveals a failure, in which case an interrupt signal is raised to the microprocessor. A preliminary analysis of the microprocessor's working program, represented in binary code, identifies the addresses of the main-memory cells that will hold instruction codes. For each such cell, the number of ones in the instruction code is counted. If that number is even, the corresponding bit of the additional storage device must hold a one, so that the instruction code is padded to odd parity (if the number is odd, the bit holds a zero). All other code words of the working program, including unused cells of the main storage device, are padded to even parity by the additional storage device. This marking of the tagged memory detects not only control flow errors but also single-bit corruption of data read from the main and additional memory devices. The working program, in binary form, is then loaded into the main storage device, and the computed parity-complement values are loaded into the additional storage device. Both memories can be loaded either within the microprocessor system or outside it.
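The marking step described above can be illustrated with a short Python sketch. It is only a software model under stated assumptions: the function names, the 8-bit word size, and the sample program image are hypothetical and do not come from the article, which describes a hardware mechanism.

```python
from typing import List, Set


def popcount(word: int) -> int:
    """Number of one bits in a non-negative integer."""
    return bin(word).count("1")


def build_tags(memory: List[int], instruction_addrs: Set[int]) -> List[int]:
    """Return one tag bit per cell of the main (program) memory.

    Cells holding instruction codes are padded to odd parity;
    all other cells (operands, data, unused cells) are padded to even parity.
    """
    tags = []
    for addr, word in enumerate(memory):
        ones = popcount(word)
        if addr in instruction_addrs:
            # Even number of ones -> tag 1, so the (n + 1)-bit word is odd.
            tag = 1 if ones % 2 == 0 else 0
        else:
            # Odd number of ones -> tag 1, so the (n + 1)-bit word is even.
            tag = ones % 2
        tags.append(tag)
    return tags


# Hypothetical 8-bit program image: instruction codes at addresses 0 and 2,
# operand bytes at addresses 1 and 3 (values chosen only for illustration).
memory = [0x3E, 0x05, 0xC3, 0x00]
instruction_addrs = {0, 2}
tags = build_tags(memory, instruction_addrs)
print(tags)
```

In this model the set of instruction addresses plays the role of the preliminary program analysis; how that analysis is actually performed is not specified in the abstract.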
In operating mode, the main and additional memory devices form a single (n + 1)-bit memory unit, where n is the width of the processor's data bus. The address codes generated by the processor on the address bus are applied to the address inputs of both the main and the additional memory units.
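As a rough illustration of how the combined (n + 1)-bit word could be checked on each access, the following Python sketch models the comparison between the processor's "read command code" signal and the parity of the word read from memory. The function name, argument names, and sample values are assumptions made for this example; the article itself describes a hardware check that raises an interrupt.

```python
def access_ok(word: int, tag: int, is_instruction_fetch: bool) -> bool:
    """Model of the parity check on one memory access.

    word: n-bit code read from the main memory.
    tag: single bit read from the additional memory at the same address.
    is_instruction_fetch: True when the processor signals that it is
        reading an instruction code, False for an operand read.
    Returns False when the access should raise an interrupt.
    """
    total_ones = bin(word).count("1") + tag
    is_odd = total_ones % 2 == 1
    # Instruction cells are marked with odd parity, all others with even parity.
    return is_odd == is_instruction_fetch


# Correct instruction fetch: 0x3E has five ones, tag 0, odd parity.
print(access_ok(0x3E, 0, True))
# Control flow error: an operand cell (even parity) is fetched as an instruction.
print(access_ok(0x05, 0, True))
# Single-bit corruption of an instruction code (0x3E read as 0x3F).
print(access_ok(0x3F, 0, True))
```

A single parity bit cannot catch every disruption: a jump that lands on another valid instruction cell passes this check, which is consistent with the partial detection coverage reported for the method.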
According to preliminary estimates, the proposed method for detecting random failures that disrupt execution of the working program is fast (detection takes no longer than the execution time of two or three instructions), detects 50-60% of potential disruptions of program execution, and requires only modest hardware overhead.

