Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic

    Research output: Contribution to journal › Article › Academic › peer-review

    Abstract

    Although floating-point arithmetic is costly in terms of silicon area, the joint design of hardware for floating-point and integer arithmetic is seldom considered. While components such as multipliers and adders can potentially be shared, floating-point and integer units in contemporary processors are practically disjoint. This work presents a new architecture that tightly integrates floating-point and integer arithmetic in a single datapath. It is mainly intended for use in low-power embedded digital signal processors, and therefore the following design constraints were important: limited use of pipelining for the convenience of the compiler; compatibility with existing technology; and minimal area and power consumption for applicability in embedded systems. The architecture is tailored to digital signal processing by combining floating-point fused multiply-add and integer multiply-accumulate. It could be deployed in a multi-core system-on-chip designed to support applications with and without dominance of floating-point calculations. The VHDL structural description of this architecture is available for download under a BSD license. Besides being configurable at design time, it has been thoroughly checked for IEEE-754 compliance by means of a floating-point test suite originating from IBM Research Labs. A proof of concept has also been implemented in STMicroelectronics 65 nm technology. This prototype supports 32-bit signed two’s complement integers and 41-bit (8-bit exponent and 32-bit significand) floating-point numbers. Our evaluations show that over 67% energy and 19% area can be saved compared to a reference design in which floating-point and integer arithmetic are implemented separately. The area overhead caused by combining floating-point and integer is less than 5%. Implemented in ST’s general-purpose CMOS technology, the design can operate at a frequency of 1.35 GHz, while 667 MHz can be achieved in low-power CMOS. Considering that the entire datapath is partitioned into just three pipeline stages, and that the design is intended for use in the low-power domain, these frequencies are adequate. They are in fact competitive with current-technology low-power floating-point units. Post-layout estimates indicate that the required area of a low-power implementation can be as small as 0.04 mm². Power consumption is on the order of several milliwatts. Given that clock gating could reduce power consumption even further, we think that a shared floating-point and integer architecture is a good choice for signal processing in low-power embedded systems.
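
    The two operations the abstract names, floating-point fused multiply-add and integer multiply-accumulate, can be illustrated with a minimal software sketch. This is not the Sabrewing datapath or its VHDL: it uses IEEE-754 doubles rather than the 41-bit prototype format, emulates the single rounding of a fused multiply-add with exact rational arithmetic, and assumes wrap-around overflow for the 32-bit integer accumulator. All function names are hypothetical.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: form a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion to a double


def int32_multiply_accumulate(acc: int, a: int, b: int) -> int:
    """Integer multiply-accumulate on 32-bit signed two's complement values.

    Wrap-around on overflow is an assumption made for this sketch.
    """
    raw = (acc + a * b) & 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


if __name__ == "__main__":
    # a*b equals 1 - 2**-60 exactly, which rounds to 1.0 in double precision,
    # so the unfused result loses the small term that the fused result keeps.
    a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
    print("unfused:", unfused_multiply_add(a, b, c))
    print("fused:  ", fused_multiply_add(a, b, c))
    print("int MAC:", int32_multiply_accumulate(10, 3, 4))
```

    The floating-point pair shows why a fused operation is more than a multiplier followed by an adder: the intermediate product is kept at full width until the final rounding. The integer function uses the same multiply-then-add shape, which hints at why the two operations are candidates for a shared datapath.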
    Original language: Undefined
    Pages (from-to): 1-22
    Number of pages: 22
    Journal: ACM Transactions on Architecture and Code Optimization
    Volume: 8
    Issue number: 4
    DOI: 10.1145/2086696.2086720
    Publication status: Published - Jan 2012

    Keywords

    • EWI-22155
    • Embedded Systems
    • area
    • Low power
    • multiply-accumulate (mac)
    • fused multiply-add (fma)
    • Floating-point
    • datapath
    • pipeline
    • digital signal processing
    • METIS-296073
    • IR-81219
    • integer

    Cite this

    @article{0b1d1d8d063d4a0e8d7966faa1cde222,
    title = "Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic",
    keywords = "EWI-22155, Embedded Systems, area, Low power, multiply-accumulate (mac), fused multiply-add (fma), Floating-point, datapath, pipeline, digital signal processing, METIS-296073, IR-81219, integer",
    author = "Tom Bruintjes and K.H.G. Walters and Gerez, {Sabih H.} and Egbert Molenkamp and Smit, {Gerardus Johannes Maria}",
    note = "eemcs-eprint-22155",
    year = "2012",
    month = "1",
    doi = "10.1145/2086696.2086720",
    language = "Undefined",
    volume = "8",
    pages = "1--22",
    journal = "ACM Transactions on Architecture and Code Optimization",
    issn = "1544-3566",
    publisher = "Association for Computing Machinery (ACM)",
    number = "4",
    }

    Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic. / Bruintjes, Tom; Walters, K.H.G.; Gerez, Sabih H.; Molenkamp, Egbert; Smit, Gerardus Johannes Maria.

    In: ACM Transactions on Architecture and Code Optimization, Vol. 8, No. 4, 01.2012, p. 1-22.

    TY - JOUR

    T1 - Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic

    AU - Bruintjes, Tom

    AU - Walters, K.H.G.

    AU - Gerez, Sabih H.

    AU - Molenkamp, Egbert

    AU - Smit, Gerardus Johannes Maria

    N1 - eemcs-eprint-22155

    PY - 2012/1

    Y1 - 2012/1

    KW - EWI-22155

    KW - Embedded Systems

    KW - area

    KW - Low power

    KW - multiply-accumulate (mac)

    KW - fused multiply-add (fma)

    KW - Floating-point

    KW - datapath

    KW - pipeline

    KW - digital signal processing

    KW - METIS-296073

    KW - IR-81219

    KW - integer

    U2 - 10.1145/2086696.2086720

    DO - 10.1145/2086696.2086720

    M3 - Article

    VL - 8

    SP - 1

    EP - 22

    JO - ACM Transactions on Architecture and Code Optimization

    JF - ACM Transactions on Architecture and Code Optimization

    SN - 1544-3566

    IS - 4

    ER -