Effective Data Preprocessing Techniques for CNN-based Selective Sweep Detection

Hanqing Zhao, Nikolaos Alachiotis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Identifying positive selection has been cast as a classification task, with Convolutional Neural Networks (CNNs) already delivering higher accuracy than summary statistics and likelihood-based approaches. While several CNN-based methods rearrange the pixels of images representing raw genomic data as a preprocessing technique to enhance classification accuracy, the effectiveness of such pixel-rearrangement methods has not been thoroughly studied in the presence of confounding factors such as population bottlenecks and recombination hotspots. Here, we present a series of pixel-rearrangement algorithms to increase CNN classification accuracy for selective sweep detection, and evaluate the performance of four CNN models that are specifically designed for detecting selective sweeps. We find that data preprocessing based on pixel-rearrangement algorithms significantly improves the overall classification accuracy of a given CNN for diverse datasets simulating confounding factors. We observe up to 24.55% higher top-1 accuracy than using the preprocessing algorithms proposed by the authors of each CNN architecture. Furthermore, our results suggest a correlation between the stability of the rearrangement algorithms (over the different CNN architectures and confounding factors) and their performance. Based on these findings, we make suggestions for the most suitable preprocessing technique per CNN architecture used in this study. We provide the data rearrangement algorithms as a distinct module available for download at: https://github.com/Zhaohq96/Genetic-data-rearrangement.
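To make the idea of pixel rearrangement concrete, the following is a minimal sketch, assuming the raw genomic data is a binary matrix in which rows are haplotypes, columns are SNPs, and 1 marks a derived allele. The two sorting rules and the function names `sort_rows_by_derived_count` and `sort_columns_by_frequency` are illustrative assumptions only; they are not taken from the paper or from the linked repository, whose actual algorithms may differ.

```python
from typing import List

Matrix = List[List[int]]  # rows = haplotypes, columns = SNPs (assumed layout)


def sort_rows_by_derived_count(matrix: Matrix) -> Matrix:
    """Reorder rows so haplotypes carrying the most derived alleles come first.

    Hypothetical rule for illustration. sorted() is stable, so rows with
    equal counts keep their original relative order.
    """
    return sorted(matrix, key=sum, reverse=True)


def sort_columns_by_frequency(matrix: Matrix) -> Matrix:
    """Reorder columns so SNPs with the highest derived-allele count come first.

    Hypothetical rule for illustration. Ties keep their original order.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    col_counts = [sum(row[j] for row in matrix) for j in range(n_cols)]
    order = sorted(range(n_cols), key=lambda j: col_counts[j], reverse=True)
    return [[row[j] for j in order] for row in matrix]


# Toy alignment: 3 haplotypes x 4 SNPs (made-up values, not real data).
alignment = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]

rows_sorted = sort_rows_by_derived_count(alignment)
cols_sorted = sort_columns_by_frequency(alignment)
```

Either reordering changes only where pixels sit in the image passed to the CNN, not the set of values in each row or column. Note that sorting columns discards the positional order of SNPs, so whether such a rule helps or hurts would depend on the architecture and dataset.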
Original language: English
Title of host publication: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Place of Publication: Piscataway, NJ
Publisher: IEEE
Pages: 793-800
Number of pages: 8
ISBN (Electronic): 979-8-3503-3748-8
ISBN (Print): 979-8-3503-3749-5
Publication status: Published - 8 Dec 2023
Event: 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023 - Istanbul, Turkey
Duration: 5 Dec 2023 to 8 Dec 2023

Publication series

Name: IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publisher: IEEE
Volume: 2023
ISSN (Print): 2156-1125
ISSN (Electronic): 2156-1133

Conference

Conference: 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
Country/Territory: Turkey
City: Istanbul
Period: 5/12/23 to 8/12/23

Keywords

  • 2023 OA procedure
  • Data preprocessing
  • Sociology
  • Genomics
  • Stability analysis
  • Classification algorithms
  • Convolutional neural networks
  • Correlation
