Mining Process Model Variants: Challenges, Techniques, Examples

C. Li

    Research output: ThesisPhD Thesis - Research UT, graduation UT

    166 Downloads (Pure)


    During the last years a new generation of process-aware information systems has emerged, which enables process model configurations at buildtime as well as process instance changes during runtime. Respective model adaptations result in large collections of process model variants that are derived from same process model, but slightly differ in structure. Generally, such model variants are expensive to maintain and configure. In this thesis, we present challenges, scenarios and algorithms for representing, comparing and mining such process model variants. We first introduce the notion of process distance, which corresponds to the minimal number of high-level change operations needed for transforming one process model into another. In general, we presume that the shorter the average distance between a reference process model and related process variants is, the less changes are required for adapting the variants and the less efforts are needed for (future) process configuration. In this context, we present a method based on boolean algebra to compute the distance between two process models. Starting with a collection of related process model variants, the major goal of this thesis is to discover a reference process model out of which these variants can be easily configured; i.e., a reference process model with minimal average distance to the variants. To achieve this goal we present two advanced algorithms which have their pros \& cons, and that are applicable in different scenarios. Our clustering algorithm does not presume any knowledge about the original reference process model out of which the process model variants were configured. By only looking at the process model variants, this algorithm can quickly discover a reference process model in polynomial time, which allows us to scale up when solving real-world problems. The clustering algorithm further provides information on how well each part of the discovered reference model fits to the variants. Our heuristic algorithm, in turn, can take the original reference model into account as well. In particular, the user can control to what degree the discovered model differs from the original one. This way we can avoid spaghetti-like process models and additionally control how many changes we want to perform on the original reference model. We systematically evaluate and compare the two algorithms based on simulations that comprise more than 7000 process models. Simulation results indicate good performance and make the differences between the two algorithms explicit. For example, the simulation results indicate that our clustering algorithm runs significantly faster than our heuristic algorithm. However, our heuristic algorithm can identify important changes at the beginning of the search and can discover better results than the clustering algorithm. We successfully applied the two algorithms to cases from the automotive and the healthcare domain. During these case studies, the practical relevance and benefit of our work has become evident once more. Overall, this Ph.D thesis will contribute to more intelligent information systems by learning from past adaptations and to an improved management of the variants by continuously evolving related reference process model.
    Original languageUndefined
    Awarding Institution
    • University of Twente
    • Wombacher, Andreas , Advisor
    • Reichert, M.U., Supervisor
    • Wieringa, Roelf Johannes, Supervisor
    Thesis sponsors
    Award date11 Nov 2010
    Place of PublicationEnschede, the Netherlands
    Print ISBNs978-90-365-3084-2
    Publication statusPublished - 11 Nov 2010


    • EWI-19774
    • IR-74378
    • METIS-276772

    Cite this