Flow-based intrusion detection

Research output: Thesis › PhD Thesis - Research UT, graduation UT

Abstract

The spread of 1-10 Gbps technology has in recent years paved the way for a flourishing landscape of new, high-bandwidth Internet services. As users, we depend on the Internet in our daily lives, not only for simple tasks such as checking e-mail but also for managing private and financial information. Entrusting such information to the Internet, however, also makes the network an alluring place for hackers. The research community has responded to this threat with increased interest in intrusion detection. With the number of attacks increasing almost exponentially, and attackers' motivations shifting from ideological to economic, researchers are focusing on new techniques to detect intruders in a timely manner and prevent damage. Our studies in the field of intrusion detection, however, made us realize that additional research is needed, in particular on the creation of shared data sets to validate Intrusion Detection Systems (IDSs) and on the development of automatic procedures to tune the parameters of IDSs.

This thesis contributes a structured approach to intrusion detection that focuses on (i) shared ground-truth data sets and (ii) automatic parameter tuning. We build this approach around network flows. Flows offer an aggregated view of network traffic by reporting the number of packets and bytes exchanged over the network; as a result, they drastically reduce the amount of data to be analyzed. In this thesis, we aim to detect anomalies in flow-based time series, which describe how the numbers of flows, packets and bytes change over time.

Ground-truth data sets are fundamental in the development phase, both for validation and, if publicly available, for comparing different IDSs. We tackle the problem of ground-truth generation in two complementary ways. First, we create ground truth for flow-based intrusion detection manually. We do so by means of a honeypot-based data collection and monitoring setup, specifically tuned to (i) offer an attractive platform for attackers and (ii) provide enhanced logging capabilities that support the labeling of the collected data. The outcome of this work is a publicly released, labeled flow-based data set; to the best of our knowledge, no such data set existed before. Second, we generate ground truth automatically, by producing artificial flow, packet and byte time series for both benign and attack traffic. For this purpose we rely on Hidden Markov Models, which provide probabilistic and compact representations of flow-based time series and can be used to generate new ones.

Finally, we address the problem of automatically tuning IDSs. The performance of an IDS is governed by a trade-off between detecting all anomalies (at the expense of raising alarms too often) and missing some anomalies (but issuing few false alarms). We developed an optimization procedure that treats this trade-off mathematically and systematically by automatically tuning the system parameters.
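To make the path from flows to time series to a tuned detector concrete, here is a minimal sketch. Nothing in it is taken from the thesis; it rests on several assumptions:

- a hypothetical, simplified flow record holding only a start time, a packet count and a byte count;
- fixed-width time bins;
- a plain "alarm if the flow count exceeds a threshold" detector;
- a tuning criterion that minimizes a weighted sum of false alarms and missed anomalies over labeled bins.

The names `FlowRecord`, `to_time_series`, `weighted_cost` and `tune_threshold` are illustrative only. The detector and the cost function are simple stand-ins, not the detection and optimization methods developed in this work.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical, simplified flow record (real NetFlow/IPFIX records carry many more fields)."""
    start: float    # flow start time, in seconds
    n_packets: int  # packets carried by the flow
    n_bytes: int    # bytes carried by the flow


def to_time_series(flows, bin_seconds):
    """Aggregate flows into per-bin (flows, packets, bytes) tuples, one tuple per bin from bin 0."""
    bins = defaultdict(lambda: [0, 0, 0])
    for f in flows:
        counts = bins[int(f.start // bin_seconds)]
        counts[0] += 1
        counts[1] += f.n_packets
        counts[2] += f.n_bytes
    if not bins:
        return []
    # Bins without flows are kept as zeros so the series has a regular time axis.
    return [tuple(bins.get(b, (0, 0, 0))) for b in range(max(bins) + 1)]


def weighted_cost(values, labels, threshold, w_fp, w_fn):
    """Cost of alarming whenever value > threshold, given per-bin ground-truth labels."""
    fp = fn = 0
    for value, is_attack in zip(values, labels):
        alarm = value > threshold
        fp += int(alarm and not is_attack)  # false alarm
        fn += int(is_attack and not alarm)  # missed anomaly
    return w_fp * fp + w_fn * fn


def tune_threshold(values, labels, w_fp=1.0, w_fn=1.0):
    """Return the threshold with the lowest weighted cost (values must be non-empty).

    Ties are resolved in favour of the lowest threshold.
    """
    # Every observed value, plus one value below the minimum (alarm on every bin),
    # covers every distinct alarm pattern that a '>' threshold can produce.
    candidates = sorted(set(values))
    candidates.insert(0, candidates[0] - 1)
    return min(candidates, key=lambda t: weighted_cost(values, labels, t, w_fp, w_fn))


if __name__ == "__main__":
    flows = [FlowRecord(0, 3, 180), FlowRecord(10, 2, 120), FlowRecord(65, 1, 60),
             FlowRecord(130, 5, 300), FlowRecord(131, 5, 300), FlowRecord(132, 5, 300),
             FlowRecord(133, 5, 300), FlowRecord(190, 2, 100)]
    series = to_time_series(flows, bin_seconds=60)
    flow_counts = [s[0] for s in series]
    labels = [False, False, True, False]  # hypothetical ground truth: bin 2 is an attack
    print(series)
    print(tune_threshold(flow_counts, labels))  # 2
```

The two weights make the trade-off explicit. Raising `w_fn` relative to `w_fp` pushes the selected threshold down, giving more alarms and fewer misses. Raising `w_fp` pushes it up. This is only one simple way to frame the tuning problem and does not reproduce the thesis's procedure.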
Language: Undefined
Awarding Institution
  • University of Twente
Supervisors/Advisors
  • Haverkort, Boudewijn R.H.M., Supervisor
  • Pras, Aiko, Advisor
Award date: 14 Oct 2010
Place of Publication: Zutphen
Print ISBNs: 978-90-365-3089-7
DOIs: 10.3990/1.9789036530897
State: Published - 14 Oct 2010

Keywords

  • IR-73630
  • METIS-271091
  • EWI-18649

Cite this

@phdthesis{9b6ea741e8a8499fb87b6d2eeb6b0108,
title = "Flow-based intrusion detection",
keywords = "IR-73630, METIS-271091, EWI-18649",
author = "Anna Sperotto",
note = "10.3990/1.9789036530897",
year = "2010",
month = "10",
day = "14",
doi = "10.3990/1.9789036530897",
language = "Undefined",
isbn = "978-90-365-3089-7",
school = "University of Twente",

}

Sperotto, A 2010, 'Flow-based intrusion detection', University of Twente, Zutphen. DOI: 10.3990/1.9789036530897

Flow-based intrusion detection. / Sperotto, Anna.

Zutphen, 2010. 166 p.

TY - THES

T1 - Flow-based intrusion detection

AU - Sperotto, Anna

N1 - 10.3990/1.9789036530897

PY - 2010/10/14

Y1 - 2010/10/14

KW - IR-73630

KW - METIS-271091

KW - EWI-18649

U2 - 10.3990/1.9789036530897

DO - 10.3990/1.9789036530897

M3 - PhD Thesis - Research UT, graduation UT

SN - 978-90-365-3089-7

CY - Zutphen

ER -

Sperotto A. Flow-based intrusion detection. Zutphen, 2010. 166 p. DOI: 10.3990/1.9789036530897