Journal: Highly Available Systems, No. 1, 2023
Article in issue:
Self-timed pipeline optimization
DOI: https://doi.org/10.18127/j20729472-202301-01
UDC: 621.3.049.77:004.312
Authors:

Yu.A. Stepchenkov1, Yu.G. Diachenko2, N.V. Morozov3, D.Yu. Stepchenkov4, D.V. Khilko5, D.Yu. Diachenko6

1–6 FRC “Computer Science and Control” of RAS (Moscow, Russia)
 

Abstract:

High-performance computing systems are traditionally implemented as branched pipeline architectures. Global synchronization facilitates pipelining, but it also requires a consistent clock tree and forces the design to be timed for the worst case, that is, the slowest pipeline stage.

Self-timed (ST) circuits are an alternative to synchronous circuits. Instead of global synchronization, they rely on local request-acknowledge interaction between neighboring digital units connected by common information signals. They use ST (dual-rail) data coding and a two-phase operating discipline: the working phase converts information, while the spacer phase enables simple detection of the completion of switching to either phase and ensures the absence of signal glitches and hazards. Owing to the absence of a global clock tree, the two-phase discipline, and the mandatory acknowledgement that each transition to the current phase has completed, ST circuits guarantee the detection and localization of any stuck-at fault, remain operable over a wide range of supply voltages and ambient temperatures, and work reliably at any cell delays set by the current operating conditions. These properties make ST circuits attractive for digital units that must operate reliably under extreme conditions.
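The dual-rail coding and the two phases described above can be illustrated with a minimal Python sketch. It is not taken from the article: the all-zero spacer, the rail ordering, and the names `encode_word`, `phase`, and `decode_word` are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail), an assumed ordering

# Assumption: an all-zero spacer; a design may equally use an all-one spacer.
SPACER: Rail = (0, 0)


def encode_bit(bit: int) -> Rail:
    """Working state of one bit: exactly one of the two rails is active."""
    return (1, 0) if bit else (0, 1)


def encode_word(value: int, width: int) -> List[Rail]:
    """Encode an unsigned integer, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def phase(word: List[Rail]) -> str:
    """Completion detection: report which phase the whole word has reached."""
    if all(pair == SPACER for pair in word):
        return "spacer"
    if all(pair in ((1, 0), (0, 1)) for pair in word):
        return "working"
    # Some bits have switched and others have not (or a rail pair is invalid).
    return "transition"


def decode_word(word: List[Rail]) -> Optional[int]:
    """Return the value once the working phase is complete, otherwise None."""
    if phase(word) != "working":
        return None
    return sum(pair[0] << i for i, pair in enumerate(word))
```

In this sketch a consumer reads the value only after `phase` reports "working" and sends new data only after it reports "spacer", which mirrors the request-acknowledge interaction in a very simplified form.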

Pipelining improves the performance of ST circuits by accelerating the request-acknowledge interaction. As in synchronous counterparts, the intermediate data registers added during pipelining increase the processing time for a single input data portion. In a synchronous pipeline, this increase grows with the number of pipeline stages but is partly offset by a higher clock frequency, made possible by simpler stages with shorter switching delays. In ST circuits, pipelining is effective primarily when the combinational part, which contributes most to the circuit delay, is divided into several successive smaller parts with shorter delays. Because both the combinational part and the register of an ST pipeline stage must include an indication subcircuit, the division of an ST circuit into pipeline stages is not always obvious. This article analyzes the features of ST pipelines and quantifies the effectiveness of different pipelining cases.

A stage of any pipeline that processes digital data contains two main components: combinational logic and a register. In an ST pipeline, each of them contains an indication subcircuit that acknowledges the completion of the circuit's switching to the current operation phase and controls the interaction of the stage with its environment. The indication subcircuit contributes to the delay of switching the stage to the working or spacer phase. The ST pipeline registers divide long chains of combinational cells in the serial data processing path into shorter segments, each confined to one pipeline stage. As a result, the number of indicated signals decreases, and with it the complexity and switching delay of each stage's indication subcircuit. This reduces the time each stage needs to switch to the working or spacer phase. The indication subcircuit is implemented as a pyramid of two- and three-input Muller C-elements. The indication subcircuit of the combinational logic switches concurrently with data processing or with the switching of the indicated circuit to the spacer.
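The behavior of a Muller C-element and the way the pyramid height depends on the number of indicated signals can be sketched as follows. This is a behavioral illustration under assumptions, not the authors' circuit: `CElement` and `pyramid_depth` are hypothetical names, and the depth estimate simply groups signals by up to three per element at every level.

```python
class CElement:
    """Behavioral model of a Muller C-element with any number of inputs."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def update(self, *inputs: int) -> int:
        # The output follows the inputs only when they all agree;
        # otherwise the element keeps its previous state.
        if all(inputs):
            self.state = 1
        elif not any(inputs):
            self.state = 0
        return self.state


def pyramid_depth(n_signals: int, max_fan_in: int = 3) -> int:
    """Number of C-element levels needed to merge n_signals into one output.

    Assumption: every level merges signals in groups of at most max_fan_in.
    """
    depth = 0
    while n_signals > 1:
        n_signals = -(-n_signals // max_fan_in)  # ceiling division
        depth += 1
    return depth
```

Under these assumptions, `pyramid_depth(64)` gives 4 levels and `pyramid_depth(16)` gives 3, which illustrates why a stage with fewer indicated signals has a shallower, and therefore faster, indication subcircuit.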

The output register bits of an ST pipeline stage are built on two-input C-elements. This implementation stores both the working and the spacer states of a dual-rail signal and simplifies the request-acknowledge interaction between pipeline stages. A change in the state of the dual-rail information outputs of a stage's register initiates the switching of the next stage's combinational logic to the corresponding phase before the register acknowledges the completion of its own switching at its indication output. The indication outputs of the stage's combinational logic and register control the phase switching of the previous stage's register.

Splitting an ST circuit into pipeline stages always increases its data processing time because of the introduced registers. The ST pipeline's delay grows linearly, by the switching delay of a two-input C-element for each added stage. However, the performance of the pipeline (the amount of data processed per unit of time) increases for two reasons: the "waves" of the working phase and the spacer propagating along the pipeline overlap, and the critical cell paths in the stages' combinational parts become shorter. The performance is still determined by the slowest stage. The greatest improvement in ST pipeline performance occurs when a single-stage pipeline is transformed into a two-stage one. Further increases in the number of pipeline stages are less effective in terms of the period at which results are produced.
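The trade-off between processing time and result period can be illustrated with a deliberately simple timing model. The model is an assumption made for this sketch and is not the quantitative analysis of the article: the latency is the sum of the stage logic delays plus one register delay per stage, and the result period is twice the delay of the slowest stage including its register, to account for the working and spacer phases. The function names and the delay values are hypothetical.

```python
from typing import List


def latency(stage_logic_delays: List[int], t_reg: int) -> int:
    """Time for one data portion to pass through all stages."""
    # Each stage adds one register (two-input C-element) delay: linear growth.
    return sum(stage_logic_delays) + len(stage_logic_delays) * t_reg


def period(stage_logic_delays: List[int], t_reg: int) -> int:
    """Assumed interval between successive results.

    Assumption: the slowest stage sets the pace, and every result requires
    both a working phase and a spacer phase of that stage.
    """
    return 2 * (max(stage_logic_delays) + t_reg)


T_LOGIC = 120  # total combinational delay, arbitrary units (hypothetical)
T_REG = 2      # register C-element delay, arbitrary units (hypothetical)

for n in (1, 2, 3, 4):
    stages = [T_LOGIC // n] * n  # evenly balanced split
    print(n, latency(stages, T_REG), period(stages, T_REG))
```

With these hypothetical numbers the latency grows by `T_REG` per added stage, while the period drops most sharply when going from one stage to two, and each further stage helps less. An unbalanced split such as `[90, 30]` gives a longer period than the balanced `[60, 60]`, since the slowest stage dominates.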

Thus, the performance of an ST circuit is increased by the same means as in synchronous circuits, namely by pipelining. To make the pipelined implementation of an ST circuit more efficient, the switching delays of all stages should be matched so that they are approximately equal.

Pages: 5-13
For citation

Stepchenkov Yu.A., Diachenko Yu.G., Morozov N.V., Stepchenkov D.Yu., Khilko D.V., Diachenko D.Yu. Self-timed pipeline optimization // Highly Available Systems. 2023. V. 19. No. 1. P. 5–13. DOI: https://doi.org/10.18127/j20729472-202301-01

Date of receipt: 02.02.2023
Approved after review: 14.02.2023
Accepted for publication: 01.03.2023