
Beyond CVEs: Mapping Weaknesses in Unstructured Threat Intelligence Text

Research output: Working paper / Preprint / Academic

Abstract

In real-world cyberattacks, adversaries frequently exploit a combination of vulnerabilities, bugs, and misconfigurations to compromise systems. To systematically analyze the root causes behind these issues, the Common Weakness Enumeration (CWE) framework provides a standardized taxonomy of software weaknesses.

While vulnerability databases are central to cataloging known issues, many security-relevant descriptions first appear in informal sources such as blog posts, CTI reports, and social media. Although these sources predominantly offer broader cybersecurity insights, they occasionally yield details that may indicate underlying weaknesses not captured in formal databases.

We propose a two-step approach to extract these security-related descriptions from unstructured threat intelligence and automatically map them to their corresponding CWE categories. First, a binary classifier detects sentences resembling CVE descriptions, identifying information relevant to security teams. Then, we apply a self-supervised learning model to predict the most appropriate CWE, enabling structured analysis even in the absence of formal vulnerability tracking.
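
The control flow of the two-step approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the models' internals, so both steps are replaced here by hypothetical keyword heuristics (`looks_like_cve_description` stands in for the binary classifier and `predict_cwe` for the CWE predictor). All function names, cue words, and the small keyword table are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WeaknessFinding:
    sentence: str
    cwe_id: str


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


# Hypothetical stand-in for step 1. The paper uses a trained binary
# classifier; this keyword check only mimics its role in the pipeline.
CVE_LIKE_CUES = ("allows", "vulnerability", "overflow", "injection", "bypass")


def looks_like_cve_description(sentence: str) -> bool:
    lowered = sentence.lower()
    return any(cue in lowered for cue in CVE_LIKE_CUES)


# Hypothetical stand-in for step 2. The paper uses a learned model;
# this tiny lookup table is an assumption for illustration.
CWE_KEYWORDS = {
    "CWE-89": ("sql injection",),
    "CWE-79": ("cross-site scripting", "xss"),
    "CWE-787": ("out-of-bounds write",),
}


def predict_cwe(sentence: str) -> Optional[str]:
    lowered = sentence.lower()
    for cwe_id, keywords in CWE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return cwe_id
    return None


def map_weaknesses(
    text: str,
    is_relevant: Callable[[str], bool] = looks_like_cve_description,
    classify: Callable[[str], Optional[str]] = predict_cwe,
) -> List[WeaknessFinding]:
    findings = []
    for sentence in split_sentences(text):
        # Step 1: keep only sentences that resemble CVE descriptions.
        if not is_relevant(sentence):
            continue
        # Step 2: assign a CWE label to each retained sentence.
        cwe_id = classify(sentence)
        if cwe_id is not None:
            findings.append(WeaknessFinding(sentence, cwe_id))
    return findings
```

Passing the two steps as callables keeps the pipeline shape independent of the models behind it, so the toy heuristics could be swapped for trained classifiers without changing `map_weaknesses`.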

As no ground truth exists for this task, we conduct expert-driven validation. Our results show strong performance, with an F1-score of 98.17% for correctly assigning CWE labels, improving by at least 64 percentage points over state-of-the-art reasoning LLMs. This demonstrates the feasibility of automating weakness classification in unstructured cybersecurity text.
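
For readers unfamiliar with the metric, the following sketch shows how an F1-score over CWE labels can be computed from expert-validated labels and model predictions. The abstract does not say which averaging scheme underlies the reported figure, so macro-averaging is an assumption here, and the function names are hypothetical.

```python
from typing import Dict, List


def per_label_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    # F1 for one label is 2*TP / (2*TP + FP + FN).
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    # Assumption: unweighted mean over labels (macro-averaging).
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Here `y_true` would hold the CWE labels confirmed by the experts and `y_pred` the labels produced by the model, in the same order.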

Original language: English
Publication status: Published - 2025
