Mapping Applications to a Coarse-Grained Reconfigurable Architecture

Y. Guo

    Research output: ThesisPhD Thesis - Research UT, graduation UT

    57 Downloads (Pure)

    Abstract

    Today the most commonly used system architectures in data processing can be divided into three categories: general purpose processors, application specific architectures and reconfigurable architectures. General purpose processors are flexible, but inefficient and for some applications do not other enough performance. Application specific architectures are efficient and give good performance, but are in flexible. Recently reconfigurable systems have drawn increasing attention due to their combination of flexibility and efficiency. Reconfigurable architectures limit their flexibility to a particular algorithm domain. Two types of reconfigurable architectures exist: fine-grained in which the functionality of the hardware is specified at the bit level and coarse-grained in which the functionality of the hardware is specified at the word level. High-level design entry tools are essential for reconfigurable systems, especially coarse-grained reconfigurable architectures. However, the tools for coarse-grained reconfigurable architectures are far from mature. This thesis proposes a method for mapping applications onto a coarse-grained reconfigurable architecture. This is a heuristic method which tackles this complex problem in four phases: translation, clustering, scheduling and allocation. In this thesis, the Montium tile processor, a coarse-grained reconfigurable architecture, is used to demonstrate the proposed mapping method. In the translation phase, an input program written in a high-level language is translated into a control data flow graph; and some transformations and simplifications are done on the control data flow graph. In the clustering phase, the data flow graph is partitioned into clusters and mapped onto an unbounded number of fully connected Arithmetic Logic Units (ALUs). The ALU structure is the main concern of this phase and we do not take the inter-ALU communication into consideration. In the scheduling phase the graph obtained from the clustering phase is scheduled taking the maximum number of ALUs into account. The scheduling algorithm tries to minimize the number of clock cycles used for the given application under the constraints of the number of distinct reconfigurations of ALUs. In the allocation phase, variables are allocated to memories or registers, and data moves are scheduled. The main concern in this phase is the details of the target architecture. There are many constraints and optimization objectives that need to be considered during the mapping and scheduling procedure. According to the above mentioned division only parts of constraints and optimization goals are considered in each phase, which simplifies the problem dramatically. Furthermore, after the division, part of our work relates to the existing research work, especially the work in the clustering and the scheduling phases. This connection is very important because on one hand, we can build our work on the results of others; on the other hand, the result of our work can also be used by others. Different levels of heuristics are used in the mapping procedure. In our opinion, using a greedy method is the only practical choice because the original mapping task is too complex to be handled in an optimal way. To decrease the negative consequences caused by the sequential order of the heuristics, when tackling each subproblem, we take the requirements of other subproblems also into consideration. In the Montium, a two-layer decoding technique is used. This technique limits the configuration space to simplify the structure of instruction fetching, which is good for reducing energy consumption and cost. However, the compiler has to face the challenge of decreasing the number of distinct configurations, i.e., generating a limited number of different instructions, which is the most difficult requirement to the Montium compiler. This is also the main difference between the Montium compiler and a compiler for other architectures. However, these constraints have a good reason: they are inherent to energy efficiency, so we believe that it will be used more and more in future energy-efficient computing architectures. Therefore, we have to find ways to cope with these constraints. To generate only a few different instructions, our approach tries to find the regularity in the algorithms. Therefore, regularity selection algorithms are used (e.g., template selection in the clustering phase, pattern selection in the scheduling phase). These algorithms find the most frequently used templates or patterns in an application. By using these templates or patterns, the number of different instructions is small.
    Original languageUndefined
    Awarding Institution
    • University of Twente
    Supervisors/Advisors
    • Smit, Gerard J.M., Advisor
    • Krol, Th., Supervisor
    Thesis sponsors
    Award date8 Sep 2006
    Place of PublicationZutphen
    Publisher
    Print ISBNs978-90-365-2390-5
    Publication statusPublished - 8 Sep 2006

    Keywords

    • IR-57121
    • METIS-238182
    • CAES-EEA: Efficient Embedded Architectures
    • EWI-6958
    • NWO 612.064.103

    Cite this