### Abstract

Original language | Undefined |
---|---|
Pages (from-to) | 1-22 |
Number of pages | 22 |
Journal | ACM Transactions on Architecture and Code Optimization |
Volume | 8 |
Issue number | 4 |
DOIs | https://doi.org/10.1145/2086696.2086720 |
Publication status | Published - Jan 2012 |

### Keywords

- EWI-22155
- Embedded Systems
- area
- Low power
- multiply-accumulate (mac)
- fused multiply-add (fma)
- Floating-point
- datapath
- pipeline
- digital signal processing
- METIS-296073
- IR-81219
- integer

### Cite this

Bruintjes, T., Walters, K. H. G., Gerez, S. H., Molenkamp, E., & Smit, G. J. M. (2012). Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic. *ACM Transactions on Architecture and Code Optimization*, *8*(4), 1-22. https://doi.org/10.1145/2086696.2086720

**Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic.** / Bruintjes, Tom; Walters, K.H.G.; Gerez, Sabih H.; Molenkamp, Egbert; Smit, Gerardus Johannes Maria.

Research output: Contribution to journal › Article › Academic › peer-review

TY - JOUR

T1 - Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic

AU - Bruintjes, Tom

AU - Walters, K.H.G.

AU - Gerez, Sabih H.

AU - Molenkamp, Egbert

AU - Smit, Gerardus Johannes Maria

N1 - eemcs-eprint-22155

PY - 2012/1

Y1 - 2012/1

N2 - In spite of the fact that floating-point arithmetic is costly in terms of silicon area, the joint design of hardware for floating-point and integer arithmetic is seldom considered. While components like multipliers and adders can potentially be shared, floating-point and integer units in contemporary processors are practically disjoint. This work presents a new architecture which tightly integrates floating-point and integer arithmetic in a single datapath. It is mainly intended for use in low-power embedded digital signal processors and therefore the following design constraints were important: limited use of pipelining for the convenience of the compiler; maintaining compatibility with existing technology; minimal area and power consumption for applicability in embedded systems. The architecture is tailored to digital signal processing by combining floating-point fused multiply-add and integer multiply-accumulate. It could be deployed in a multi-core system-on-chip designed to support applications with and without dominance of floating-point calculations. The VHDL structural description of this architecture is available for download under BSD license. Besides being configurable at design time, it has been thoroughly checked for IEEE-754 compliance by means of a floating-point test suite originating from the IBM Research Labs. A proof-of-concept has also been implemented using STMicroelectronics 65 nm technology. This prototype supports 32-bit signed two’s complement integers and 41-bit (8-bit exponent and 32-bit significand) floating-point numbers. Our evaluations show that over 67% energy and 19% area can be saved compared to a reference design in which floating-point and integer arithmetic are implemented separately. The area overhead caused by combining floating-point and integer is less than 5%. Implemented in ST’s general-purpose CMOS technology, the design can operate at a frequency of 1.35 GHz, while 667 MHz can be achieved in low-power CMOS. Considering that the entire datapath is partitioned in just three pipeline stages, and the fact that the design is intended for use in the low-power domain, these frequencies are adequate. They are in fact competitive with current technology low-power floating-point units. Post-layout estimates indicate that the required area of a low-power implementation can be as small as 0.04 mm². Power consumption is on the order of several milliwatts. Strengthened by the fact that clock gating could reduce power consumption even further, we think that a shared floating-point and integer architecture is a good choice for signal processing in low-power embedded systems.

KW - EWI-22155

KW - Embedded Systems

KW - area

KW - Low power

KW - multiply-accumulate (mac)

KW - fused multiply-add (fma)

KW - Floating-point

KW - datapath

KW - pipeline

KW - digital signal processing

KW - METIS-296073

KW - IR-81219

KW - integer

U2 - 10.1145/2086696.2086720

DO - 10.1145/2086696.2086720

M3 - Article

VL - 8

SP - 1

EP - 22

JO - ACM Transactions on Architecture and Code Optimization

JF - ACM Transactions on Architecture and Code Optimization

SN - 1544-3566

IS - 4

ER -
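
The abstract names two operations that the architecture combines: floating-point fused multiply-add and integer multiply-accumulate. The following is a minimal software sketch of what those two operations compute, not a model of the Sabrewing datapath. It assumes Python's 64-bit floats rather than the prototype's 41-bit format, and it assumes wrap-around on integer overflow, which the abstract does not specify. The function names `int_mac`, `fused_multiply_add`, and `unfused_multiply_add` are hypothetical.

```python
from fractions import Fraction


def wrap_int32(value: int) -> int:
    """Reduce an unbounded Python int to 32-bit signed two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def int_mac(acc: int, a: int, b: int) -> int:
    """Integer multiply-accumulate: acc + a * b, wrapped to 32 bits.

    Wrap-around on overflow is an assumption of this sketch.
    """
    return wrap_int32(acc + a * b)


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a * b + c with a single rounding at the end (finite inputs only)."""
    # Fractions hold the product and the sum exactly.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # One conversion back to a 64-bit float, so rounding happens once.
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """a * b + c with two roundings: after the product and after the sum."""
    return a * b + c


if __name__ == "__main__":
    a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
    print(fused_multiply_add(a, b, c))    # keeps the small residual term
    print(unfused_multiply_add(a, b, c))  # residual lost when the product rounds
    print(int_mac(10, 3, 4))
```

With these inputs the exact product is 1 - 2^-60, which rounds to 1.0 as a 64-bit float, so the unfused version returns zero while the fused version keeps the residual. This is the numerical reason fused multiply-add is attractive for signal-processing kernels such as dot products.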