Realization of Random Forest for Real-Time Evaluation through Tree Framing

Sebastian Buschjager, Kuan-Hsun Chen, Jian-Jia Chen, Katharina Morik

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

5 Citations (Scopus)

Abstract

The optimization of learning has always been of particular concern for big data analytics. However, the ongoing integration of machine learning models into everyday life also demands that the evaluation be extremely fast and run in real time. Moreover, in the Internet of Things, the computing facilities that run the learned model are restricted. Hence, the implementation of the model application must take the characteristics of the executing platform into account. Although there exist some heuristics that optimize the code, principled approaches for fast execution of learned models are rare. In this paper, we introduce a method that optimizes the execution of Decision Trees (DT). Decision Trees form the basis of many ensemble methods, such as Random Forests (RF) or Extremely Randomized Trees (ET). For these methods to work best, trees should be as large as possible. This challenges the data and the instruction cache of modern CPUs and thus demands a more careful memory layout. Based on a probabilistic view of decision tree execution, we optimize the two most common implementation schemes of decision trees. We discuss the advantages and disadvantages of both implementations and present a theoretically well-founded memory layout which maximizes locality during execution in both cases. The method is applied to three computer architectures, namely ARM (RISC), PPC (Extended RISC) and Intel (CISC), and is automatically adapted to the specific architecture by a code generator. We perform over 1800 experiments on several real-world data sets and report an average speed-up by a factor of 2 to 4 across all three architectures when using the proposed memory layout. Moreover, we find that our implementation outperforms sklearn, which was used to train the models, by a factor of 1500.
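
The idea of a probability-driven memory layout can be illustrated with a minimal sketch. This is not the authors' code generator or their exact layout. It assumes a hypothetical `Node` class with an illustrative branch probability `p_left`, stores the tree as a flat node array that is evaluated by a loop, and simply places the likelier child directly after its parent so that the most frequent path visits neighbouring entries. All names and numbers below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not taken from the paper."""
    feature: int = -1               # index of the tested feature; -1 marks a leaf
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    p_left: float = 0.5             # assumed probability of taking the left branch
    prediction: int = 0             # class label, only meaningful at leaves


def flatten(root: Node) -> List[list]:
    """Store the tree in one array, placing the likelier child right after its parent."""
    layout: List[list] = []

    def visit(node: Node) -> int:
        idx = len(layout)
        # Row format: [feature, threshold, left_index, right_index, prediction]
        row = [node.feature, node.threshold, -1, -1, node.prediction]
        layout.append(row)
        if node.feature >= 0:
            if node.p_left >= 0.5:
                row[2] = visit(node.left)    # likelier child lands at idx + 1
                row[3] = visit(node.right)
            else:
                row[3] = visit(node.right)   # likelier child lands at idx + 1
                row[2] = visit(node.left)
        return idx

    visit(root)
    return layout


def predict(layout: List[list], x: List[float]) -> int:
    """Evaluate the flattened tree with a simple loop over array indices."""
    i = 0
    while layout[i][0] >= 0:
        feature, threshold, left, right, _ = layout[i]
        i = left if x[feature] <= threshold else right
    return layout[i][4]


# Toy tree with made-up probabilities: the root mostly goes right.
tree = Node(
    feature=0, threshold=0.5, p_left=0.1,
    left=Node(prediction=0),
    right=Node(feature=1, threshold=2.0, p_left=0.8,
               left=Node(prediction=1), right=Node(prediction=2)),
)
layout = flatten(tree)
```

In Python the list only models the ordering of nodes. The cache effect described in the abstract would only show up in a compiled language where the nodes are stored as contiguous structs.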
Original language: English
Title of host publication: 2018 IEEE International Conference on Data Mining (ICDM)
Place of Publication: Piscataway, NJ
Publisher: IEEE EDS
Pages: 19-28
Number of pages: 10
ISBN (Electronic): 978-1-5386-9159-5
ISBN (Print): 978-1-5386-9160-1
Publication status: Published - 20 Nov 2018
Externally published: Yes
Event: 2018 IEEE International Conference on Data Mining (ICDM) - Singapore, Singapore
Duration: 17 Nov 2018 to 20 Nov 2018
http://icdm2018.org/

Conference

Conference: 2018 IEEE International Conference on Data Mining (ICDM)
Country: Singapore
City: Singapore
Period: 17/11/18 to 20/11/18

Keywords

  • Vegetation
  • Computational modeling
  • Real-time systems
  • Decision trees
  • Computer architecture
  • Radio frequency
  • Probabilistic logic
