Abstract
In real-world cyberattacks, adversaries frequently exploit a combination of vulnerabilities, bugs, and misconfigurations to compromise systems. To systematically analyze the root causes behind these issues, the Common Weakness Enumeration (CWE) framework provides a standardized taxonomy of software weaknesses.
While vulnerability databases are central to cataloging known issues, many security-relevant descriptions first appear in informal sources such as blog posts, cyber threat intelligence (CTI) reports, and social media. Although these sources predominantly offer broader cybersecurity insights, they occasionally yield details that may indicate underlying weaknesses not captured in formal databases.
We propose a two-step approach to extract these security-related descriptions from unstructured threat intelligence and automatically map them to their corresponding CWE categories. First, a binary classifier detects sentences resembling CVE descriptions, identifying information relevant to security teams. Then, we apply a self-supervised learning model to predict the most appropriate CWE, enabling structured analysis even in the absence of formal vulnerability tracking.
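The two-step flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-word filter stands in for the trained binary classifier, and the keyword-overlap scorer stands in for the self-supervised CWE model. The names `VULN_CUES`, `CWE_PROFILES`, `looks_like_cve_description`, `predict_cwe`, and `map_weaknesses` are hypothetical, and the keyword sets are illustrative rather than taken from the CWE specification.

```python
import re

# Hypothetical cue words: a toy stand-in for the trained binary classifier.
VULN_CUES = {"vulnerability", "overflow", "injection", "bypass",
             "exploit", "arbitrary", "unauthenticated"}

# Illustrative keyword profiles for three real CWE identifiers.
# The keyword sets are assumptions made for this sketch only.
CWE_PROFILES = {
    "CWE-79": {"cross", "site", "scripting", "xss", "script", "html"},
    "CWE-89": {"sql", "injection", "query", "database"},
    "CWE-787": {"buffer", "overflow", "out", "bounds", "write", "memory"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def looks_like_cve_description(sentence, min_cues=2):
    """Step 1 (stand-in): keep sentences containing enough cue words."""
    return len(tokenize(sentence) & VULN_CUES) >= min_cues


def predict_cwe(sentence):
    """Step 2 (stand-in): pick the CWE whose keywords overlap most (Jaccard)."""
    tokens = tokenize(sentence)
    scores = {
        cwe: len(tokens & keywords) / len(tokens | keywords)
        for cwe, keywords in CWE_PROFILES.items()
    }
    return max(scores, key=scores.get)


def map_weaknesses(sentences):
    """Run both steps: filter CVE-like sentences, then label each with a CWE."""
    return [(s, predict_cwe(s)) for s in sentences if looks_like_cve_description(s)]


if __name__ == "__main__":
    blog_sentences = [
        "We had a great time at the conference this year.",
        "An unauthenticated attacker can exploit a SQL injection in the login query to read the database.",
        "A heap buffer overflow lets attackers write out of bounds and run arbitrary code.",
    ]
    for sentence, cwe in map_weaknesses(blog_sentences):
        print(cwe, "<-", sentence)
```

The point of the sketch is the separation of concerns: the first stage discards general commentary so that the second stage only has to choose among CWE labels for sentences that already read like vulnerability descriptions.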
As no ground truth exists for this task, we conduct expert-driven validation. Our results show strong performance, with an F1-score of 98.17% for correctly assigning CWE labels, an improvement of at least 64 percentage points over state-of-the-art reasoning LLMs. This demonstrates the feasibility of automating weakness classification in unstructured cybersecurity text.
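For readers less familiar with the metric, the following short Python sketch shows how an F1-score is computed from true positives, false positives, and false negatives, and what the reported gap implies arithmetically. The counts passed to `f1_score` are hypothetical and are not results from the paper; only the 98.17% figure and the 64-point gap come from the abstract.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not data from the paper).
example_f1 = f1_score(tp=8, fp=2, fn=2)

# Figures taken from the abstract.
reported_f1 = 98.17      # percent
min_gain_points = 64.0   # percentage points

# A gain of at least 64 points means the best compared baseline
# scored at most this F1 (in percent).
baseline_upper_bound = round(reported_f1 - min_gain_points, 2)

if __name__ == "__main__":
    print(f"example F1: {example_f1:.2f}")
    print(f"implied baseline F1 upper bound: {baseline_upper_bound}%")
```

Read this way, the abstract's claim bounds the F1-score of the best compared reasoning LLM at 34.17% or lower.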
| Original language | English |
|---|---|
| Title of host publication | Cryptology and Network Security |
| Subtitle of host publication | 24th International Conference, CANS 2025, Osaka, Japan, November 17–20, 2025, Proceedings |
| Editors | Yongdae Kim, Atsuko Miyaji, Mehdi Tibouchi |
| Place of Publication | Singapore |
| Publisher | Springer |
| Pages | 493-517 |
| Number of pages | 25 |
| Edition | 1 |
| ISBN (Electronic) | 978-981-95-4434-9 |
| ISBN (Print) | 978-981-95-4433-2 |
| Publication status | Published - 14 Nov 2025 |
| Event | 24th International Conference on Cryptology and Network Security, CANS 2025, Osaka International Convention Center, Osaka, Japan. Duration: 17 Nov 2025 → 20 Nov 2025. Conference number: 24. https://cy2sec.comm.eng.osaka-u.ac.jp/miyaji-lab/event/cans2025/ |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 16351 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 24th International Conference on Cryptology and Network Security, CANS 2025 |
|---|---|
| Abbreviated title | CANS 2025 |
| Country/Territory | Japan |
| City | Osaka |
| Period | 17/11/25 → 20/11/25 |
Keywords
- CTI
- CVE-like extraction
- CWE mapping
- Security blog posts
- Vulnerability analysis
Fingerprint
Dive into the research topics of 'Beyond CWEs: Mapping Weaknesses in Unstructured Threat Intelligence Text'. Together they form a unique fingerprint.