Advanced validation of cluster analysis is expected to increase confidence and allow reliable implementations. In this work, we describe and test CluReAL, an algorithm for refining clustering irrespective of the method used in the first place. Moreover, we present ideograms that enable summarizing and properly interpreting problem spaces that have been clustered. The presented techniques are built on absolute cluster validity indices. Experiments cover a wide variety of scenarios and six of the most popular clustering techniques. Results show the potential of CluReAL for enhancing clustering and the suitability of ideograms to understand the context of the data through the lens of the cluster analysis. Refinement and interpretability are both crucial to reduce failure and increase performance control and operational awareness in unsupervised analysis.
Félix Iglesias, Tanja Zseby, Arthur Zimek. Clustering refinement. International Journal of Data Science and Analytics 2021, 1-21.
The increased interest in secure and reliable communications has turned the analysis of network traffic data into a predominant topic. A high number of research papers propose methods to classify traffic, detect anomalies, or identify attacks. Although the goals and methodologies are commonly similar, we lack initiatives to categorize the data, methods, and findings systematically. In this paper, we present Network Traffic Analysis Research Curation (NTARC), a data model to store key information about network traffic analysis research. We additionally use NTARC to perform a critical review of the field of research conducted in the last two decades. The collection of descriptive research summaries enables the easy retrieval of relevant information and a better reuse of past studies through the application of quantitative analysis. Among other benefits, it enables the critical review of methodologies, the detection of common flaws, the derivation of baselines, and the consolidation of best practices. Furthermore, it provides a basis to achieve reproducibility, a key requirement that has long been undervalued in the area of traffic analysis. Thus, besides reading hard copies of papers, researchers can use NTARC as a digital environment that facilitates queries and reviews over a comprehensive field corpus.
Félix Iglesias, Daniel C. Ferreira, Gernot Vormayr, Maximilian Bachl, Tanja Zseby. NTARC: A Data Model for the Systematic Review of Network Traffic Analysis Research. Applied Sciences 2020, 10(12), 4307.
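A minimal sketch of what such a research-summary record could look like in code. The field names here are hypothetical and chosen for illustration; the actual NTARC schema is richer and is defined in the paper:

```python
from dataclasses import dataclass, field

@dataclass
class NTARCRecord:
    """Illustrative research-summary record; these field names are
    assumptions for the sketch, not the official NTARC data model."""
    title: str
    year: int
    datasets: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    goals: list = field(default_factory=list)
    reproducible: bool = False

# A curated summary of one hypothetical study:
record = NTARCRecord(
    title="Example traffic-analysis study",
    year=2019,
    datasets=["MAWI"],
    methods=["k-means"],
    goals=["anomaly detection"],
)
print(record.year, record.datasets)
```

Storing summaries as structured records like this is what makes the quantitative queries and reviews mentioned above possible.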
Among network analysts, “anomaly” and “outlier” are terms commonly associated with network attacks. Attacks are outliers (or anomalies) in the sense that they exploit communication protocols with novel infiltration techniques against which there are no defenses yet. But due to the dynamic and heterogeneous nature of network traffic, attacks may look like normal traffic variations, and attackers also try to make attacks indistinguishable from normal traffic. Are network attacks, then, actual anomalies? This paper tries to answer this important question from analytical perspectives. To that end, we test the outlierness of attacks in a recent, complete dataset for evaluating intrusion detection, using five different feature vectors for network traffic representation and five different outlier ranking algorithms. In addition, we craft a new feature vector that maximizes the discrimination power of outlierness. Results show that attacks are significantly more outlying than legitimate traffic (especially in representations that profile network endpoints), although attack and non-attack outlierness distributions strongly overlap. Given that network spaces are noisy and show density variations in non-attack spaces, algorithms that measure outlierness locally are less effective than algorithms that measure outlierness with global distance estimations. Our research confirms that unsupervised methods are suitable for attack detection, but also that they must be combined with methods that leverage pre-knowledge to prevent high false positive rates. Our findings expand the basis for using unsupervised methods in attack detection.
Félix Iglesias, Alexander Hartl, Tanja Zseby, Arthur Zimek. Are Network Attacks Outliers? A Study of Space Representations and Unsupervised Algorithms. Communications in Computer and Information Science 2020, 159-175.
The application of clustering involves the interpretation of objects placed in multi-dimensional spaces. The task of clustering itself is inherently subject to subjectivity; the optimal solution can be extremely costly to discover and is sometimes even unreachable or nonexistent. This fact introduces a trade-off between accuracy and computational effort, especially since engineering applications usually work well with suboptimal solutions. In such applied scenarios, cluster validation is mandatory to refine algorithms and ensure that solutions are meaningful. Validity indices are commonly intended to benchmark diverse clustering setups; therefore, they are coefficients with a relative nature, i.e., useful when compared to one another. In this paper, we propose a validation methodology that enables absolute evaluations of clustering results. Our method performs geometric measurements of the solution space and provides a coherent interpretation of the data structure by using indices based on inter- and intra-cluster distances, density, and multimodality within clusters. Conducted tests and comparisons with well-known indices show that our validation methodology improves the robustness of the clustering application for knowledge discovery. While clustering is often performed as a black-box technique, our index is interpretable and therefore allows for the implementation of systems enriched with self-checking capabilities.
Felix Iglesias, Tanja Zseby, Arthur Zimek. Absolute Cluster Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019, 42(9), 2096-2112.
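A toy illustration of geometry-based validity in the spirit of the paper, using a made-up index (smallest inter-centroid distance divided by largest mean intra-cluster radius). The paper's actual indices are more elaborate and also account for density and multimodality:

```python
import math
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def separation_ratio(clusters):
    """Illustrative geometric validity index: smallest centroid-to-centroid
    distance over the largest mean intra-cluster radius. Values well above 1
    suggest compact, well-separated clusters; values near or below 1 suggest
    overlap."""
    cents = [centroid(c) for c in clusters]
    inter = min(math.dist(a, b) for a, b in combinations(cents, 2))
    intra = max(
        sum(math.dist(p, c) for p in cl) / len(cl)
        for cl, c in zip(clusters, cents)
    )
    return inter / intra

good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
print(separation_ratio(good))  # well above 1: clearly separated clusters
```

Because the ratio has an interpretable scale, it can be read in absolute terms rather than only compared across clustering setups, which is the central idea of the paper.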
We present MDCGen, a tool for generating multidimensional synthetic datasets for testing, evaluating, and benchmarking unsupervised classification algorithms. Our proposal fills a gap observed in previous approaches with regard to the underlying distributions used for creating multidimensional clusters. As a novelty, normal and non-normal distributions can be combined for either independently defining values feature by feature (i.e., multivariate distributions) or establishing overall intra-cluster distances. Being highly flexible, parameterizable, and randomizable, MDCGen also implements the classic sought-after features: (a) customization of cluster separation, (b) overlap control, (c) addition of outliers and noise, (d) definition of correlated variables and rotations, (e) flexibility for allowing or avoiding isolation constraints per dimension, (f) creation of subspace clusters and subspace outliers, (g) importing of arbitrary distributions for value generation, and (h) dataset quality evaluations, among others. As a result, the proposed tool offers an improved range of potential datasets to perform a more comprehensive testing of clustering algorithms.
Félix Iglesias, Tanja Zseby, Daniel Ferreira, Arthur Zimek. MDCGen: Multidimensional Dataset Generator for Clustering. Journal of Classification 2019, 36(3), 599-618.
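A minimal sketch of the kind of dataset such a generator produces, assuming Gaussian clusters plus uniform background outliers. This is an illustration in the spirit of MDCGen, not its actual implementation, which supports many more distributions and controls:

```python
import random

def make_blobs(centers, points_per_cluster=50, spread=0.5, n_outliers=5, seed=0):
    """Toy synthetic-dataset generator: Gaussian clusters around the given
    centers plus uniformly scattered outliers, with reproducible seeding."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(points_per_cluster):
            data.append(tuple(rng.gauss(c, spread) for c in center))
            labels.append(label)
    # Scatter outliers over a box that comfortably contains all centers.
    lo = min(min(c) for c in centers) - 5
    hi = max(max(c) for c in centers) + 5
    dim = len(centers[0])
    for _ in range(n_outliers):
        data.append(tuple(rng.uniform(lo, hi) for _ in range(dim)))
        labels.append(-1)  # -1 marks outliers/noise
    return data, labels

data, labels = make_blobs([(0, 0), (8, 8)])
print(len(data), labels.count(-1))  # prints: 105 5
```

Ground-truth labels generated alongside the points are what make such datasets useful for benchmarking: a clustering algorithm's output can be compared directly against the known structure.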
Clock synchronization has become essential to modern societies since many critical infrastructures depend on a precise notion of time. This paper analyzes security aspects of high-precision clock synchronization protocols, particularly their alleged protection against delay attacks when clock synchronization traffic is encrypted using standard network security protocols such as IPsec, MACsec, or TLS. We use the Precision Time Protocol (PTP), the most widely used protocol for high-precision clock synchronization, to demonstrate that statistical traffic analysis can identify properties that support selective message delay attacks even for encrypted traffic. We furthermore identify a fundamental conflict in secure clock synchronization between the need for deterministic traffic to improve precision and the need to obfuscate traffic in order to mitigate delay attacks. A theoretical analysis of clock synchronization protocols isolates the characteristics that make these protocols vulnerable to delay attacks and argues that such attacks cannot be prevented entirely but only mitigated. Knowledge of the underlying communication network in terms of one-way delays, together with knowledge of the physical constraints of these networks, can help to compute guaranteed maximum bounds for slave clock offsets. These bounds are essential for detecting delay attacks and minimizing their impact. In the general case, however, the precision that can be guaranteed in adversarial settings is orders of magnitude lower than required for high-precision clock synchronization in critical infrastructures, which, therefore, must not rely on a precise notion of time when using untrusted networks.
Robert Annessi, Joachim Fabini, Felix Iglesias, Tanja Zseby. Encryption is Futile: Delay Attacks on High-Precision Clock Synchronization. 2018, 1.
Adversarial machine learning addresses the development of methods that prevent machine learning algorithms from being misled by malicious users. This field is especially relevant for applications where machine learning lies at the core of security systems. In the field of network security, adversarial samples are actually novel network attacks or old attacks with tuned properties. This paper proposes to blur classification boundaries in order to enhance machine learning robustness and improve the detection of adversarial samples that exploit learning weaknesses. We test this concept with an experimental setup on network traffic in which linear decision trees are wrapped by a one-class-membership scoring algorithm. We benchmark our proposal against plain linear decision trees and fuzzy decision trees. Results show that evasive attacks (i.e., false negatives) tend to be ranked with low class-membership levels, meaning that they are located in zones close to classification thresholds. In addition, classification performance improves when membership scores are added as new features. Using fuzzy class boundaries is highly consistent with the interpretation of many network traffic features used for malware detection; moreover, it prevents network attackers from exploiting classification boundaries as attack objectives.
Félix Iglesias, Jelena Milosevic, Tanja Zseby. Fuzzy classification boundaries against adversarial network attacks. Fuzzy Sets and Systems 2018, 368, 20-35.
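The idea of scoring samples by how close they fall to a classification threshold can be sketched as follows. This is a simplified stand-in for the one-class-membership scoring used in the paper, assuming a single numeric feature and a hypothetical width parameter:

```python
def membership(x, threshold, width):
    """Fuzzy class membership around a crisp decision threshold.
    Returns a value in [0, 1]: 1.0 deep inside the 'attack' side,
    0.0 deep inside the 'normal' side, and about 0.5 near the
    boundary, where evasive samples tend to fall."""
    z = (x - threshold) / width
    return min(1.0, max(0.0, 0.5 + z / 2))

# Samples near the threshold get mid-range scores and can be flagged
# for closer inspection instead of being trusted as clean.
print(membership(100, threshold=100, width=20))  # 0.5 at the boundary
print(membership(140, threshold=100, width=20))  # 1.0 well inside
```

The practical value is the middle range: a crisp classifier silently accepts a borderline false negative, whereas a membership score exposes it as suspiciously close to the boundary.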
The consolidation of encryption and big data in network communications has made deep packet inspection no longer feasible in large networks. Early attack detection requires feature vectors that are easy to extract, process, and analyze, allowing their generation also from encrypted traffic. So far, experts have selected features based on their intuition, previous research, or uncritically adopted standards, but there is no general agreement about the features to use for attack detection in a broad scope. We compared five lightweight feature sets that have been proposed in the scientific literature over the last few years and evaluated them with supervised machine learning. For our experiments, we used the UNSW-NB15 dataset, recently published as a new benchmark for network security. Results showed three remarkable findings: (1) analysis based on source behavior instead of classic flow profiles is more effective for attack detection; (2) meta-studies on past research can be used to establish satisfactory benchmarks; and (3) features based on packet length are clearly determinant for capturing malicious activity. Our research showed that the vectors currently used for attack detection are oversized, that their accuracy and speed can be improved, and that they must be adapted to deal with encrypted traffic.
Fares Meghdouri, Tanja Zseby, Félix Iglesias. Analysis of Lightweight Feature Vectors for Attack Detection in Network Traffic. Applied Sciences 2018, 8(11), 2196.
The detection of covert channels in communication networks is a current security challenge. By clandestinely transferring information, covert channels are able to circumvent security barriers, compromise systems, and facilitate data leakage. A set of statistical methods called DAT (Descriptive Analytics of Traffic) has been previously proposed as a general approach for detecting covert channels. In this paper, we implement and evaluate DAT detectors for the specific case of covert timing channels. Additionally, we propose machine learning models to induce classification rules and enable the fine parameterization of DAT detectors. A testbed has been created to reproduce the main timing techniques published in the literature; consequently, the testbed allows the evaluation of covert channel detection techniques. We specifically applied decision trees to infer DAT rules, achieving high accuracy and detection rates. This paper is a step toward the practical implementation of effective covert channel detection plugins in modern network security devices.
Félix Iglesias, Valentin Bernhardt, Robert Annessi, Tanja Zseby. Decision Tree Rule Induction for Detecting Covert Timing Channels in TCP/IP Traffic. Lecture Notes in Computer Science 2017, 105-122.
This paper studies the temporal behavior of communication flows in the Internet. Characterization of flows by temporal patterns supports traffic classification and filtering for network management and network security in situations where full packet data is not accessible (e.g., obfuscated or encrypted traffic) or cannot be analyzed due to privacy concerns or resource limitations. We define a time-activity feature vector that describes the temporal behavior of flows and then use cluster analysis to capture the most common time-activity patterns in real Internet traffic using traces from the MAWI dataset. We discovered a set of 7 time-activity footprints and show that 95.3% of the analyzed flows can be characterized based on such footprints, which represent different behaviors for the three main protocols (4 in TCP, 1 in ICMP and 2 in UDP). In addition, we found that the majority of the observed flows consisted of short, one-time bursts. An in-depth inspection revealed, besides some DNS traffic, the preponderance of a large amount of scanning, probing, DoS attack and backscatter traffic in the network. Flows transmitting meaningful data became outliers among short, one-time bursts of unwanted traffic.
Félix Iglesias, Tanja Zseby. Time-activity footprints in IP traffic. Computer Networks 2016, 107, 64-75.
Covert channels provide means to conceal information transfer between hosts and bypass security barriers in communication networks. Hidden communication is of paramount concern for governments and companies, because it can conceal data leakage and malware communication, which are crucial building blocks used in cyber crime. We propose detectors based on descriptive analytics of traffic (DAT) to facilitate revealing network- and transport-layer covert channels originating from a wide spectrum of published data-hiding techniques. DAT detectors transform communication data into flexible feature vectors that represent traffic by a set of extracted calculations and estimations. For the case of covert channels, the core of the detection is performed by the combined application of autocorrelation calculations and multimodality measures built upon kernel density estimations and Pareto charts. DAT detectors are devised to be embedded as extensions of network intrusion detection systems, being able to perform fast, lightweight analysis of numerous flows. The present paper focuses specifically on TCP/IP traffic and provides suitable classifications of TCP/IP fields and related covert channel techniques from the perspective of statistical detection. The proposed methodology is evaluated with public traffic datasets as well as covert channels generated according to the main techniques described in the related literature.
Felix Iglesias, Robert Annessi, Tanja Zseby. DAT detectors: uncovering TCP/IP covert channels by descriptive analytics. Security and Communication Networks 2016, 9(15), 3011-3029.
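The autocorrelation part of the detection can be illustrated with a plain sample-autocorrelation function. This is a sketch, not the DAT implementation; the full method combines autocorrelation with kernel density estimates and Pareto charts:

```python
from statistics import mean

def autocorr(xs, lag):
    """Sample autocorrelation at a given lag; values near 1 indicate a
    strongly periodic sequence, values near 0 an unstructured one."""
    m = mean(xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# A binary covert timing channel often yields inter-arrival times that
# alternate between two delay values; high autocorrelation at the period
# betrays the pattern even though each individual delay looks plausible.
covert = [0.1, 0.5] * 50          # regular alternation (hidden bits)
print(autocorr(covert, lag=2))    # close to 1: strong periodicity
```

Legitimate traffic rarely shows such clean periodic structure in its inter-arrival times, which is why this statistic is a cheap, flow-level indicator suitable for intrusion detection extensions.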
Network security requires real-time monitoring of network traffic in order to detect new and unexpected attacks. Attack detection methods based on deep packet inspection are time consuming and costly due to their high computational demands. This paper proposes a fast, lightweight method to distinguish different attack types observed in an IP darkspace monitor. The method is based on entropy measures of traffic-flow features and machine learning techniques. The explored data belong to a portion of the Internet background radiation from a large IP darkspace, i.e., real traffic captures that exclusively contain unsolicited traffic: ongoing attacks, attack preparation activities, and attack aftermaths. Results from an in-depth traffic analysis based on packet headers and content are used as a reference to label data and to evaluate the quality of the entropy-based classification. Full IP darkspace traffic captures from a three-week observation period in April 2012 are used to compare the entropy-based classification with the in-depth traffic analysis. Results show that several traffic types present a high correlation to the respective traffic-flow entropy signals and can even fit polynomial regression models. Therefore, sudden changes in traffic types caused by new attacks or attack preparation activities can be identified based on entropy variations.
Félix Iglesias, Tanja Zseby. Entropy-Based Characterization of Internet Background Radiation. Entropy 2014, 17(1), 74-101.
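The core measurement, Shannon entropy of a traffic-flow feature over an observation window, can be sketched as follows. This is a generic illustration of the entropy measure, not the paper's full classification pipeline:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a
    traffic-flow feature, e.g. destination ports seen in a time window."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# A horizontal scan touches many different ports (high entropy), while
# normal traffic to one service concentrates on a single port (low entropy).
scan_ports = list(range(1, 1025))   # 1024 distinct ports, once each
web_ports = [443] * 1024            # one port, 1024 times
print(shannon_entropy(scan_ports))  # 10.0 bits (1024 equiprobable values)
print(shannon_entropy(web_ports))   # 0.0 bits
```

Tracking such entropy signals over consecutive windows is what allows sudden shifts in traffic composition, such as a new scanning campaign, to stand out as abrupt entropy variations.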
Anomaly detection in communication networks provides the basis for uncovering novel attacks, misconfigurations and network failures. Resource constraints for data storage, transmission and processing make it beneficial to restrict input data to features that are (a) highly relevant for the detection task and (b) easily derivable from network observations without expensive operations. Removing strongly correlated, redundant and irrelevant features also improves the detection quality for many algorithms that are based on learning techniques. In this paper we address the feature selection problem for network-traffic-based anomaly detection. We propose a multi-stage feature selection method using filters and stepwise regression wrappers. Our analysis is based on 41 widely adopted traffic features that are present in several commonly used traffic data sets. With our combined feature selection method we could reduce the original feature vectors from 41 to only 16 features. We tested our results with five fundamentally different classifiers, observing no significant reduction of the detection performance. In order to quantify the practical benefits of our results, we analyzed the costs of generating individual features from standard IP Flow Information Export records, available at many routers. We show that we can eliminate 13 very costly features, thus reducing the computational effort for on-line feature generation from live traffic observations at network nodes.
Félix Iglesias, Tanja Zseby. Analysis of network traffic features for anomaly detection. Machine Learning 2014, 101(1-3), 59-84.
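The filter stage of such a pipeline can be sketched with a simple pairwise-correlation filter. This is an illustrative stand-in, not the paper's exact method, which combines several filters with stepwise regression wrappers:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not almost perfectly correlated with a
    feature that was already kept; redundant columns are filtered out."""
    kept = []
    for name, col in features.items():
        if all(abs(pearson(col, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical flow features: byte_count is fully redundant with pkt_count.
features = {
    "pkt_count": [1, 2, 3, 4, 5],
    "byte_count": [100, 200, 300, 400, 500],  # exactly 100 * pkt_count
    "duration": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))  # ['pkt_count', 'duration']
```

Cheap filters like this prune obviously redundant features first, so the more expensive wrapper stage only has to search over a reduced candidate set.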
An IP darkspace is an unused IP address range. Addresses are announced by routing, but no hosts are attached; therefore, all traffic directed to IP darkspace addresses is unsolicited and usually originates from attacks, attack preparation activities or misconfigurations. Most of the observed traffic belongs to known phenomena (e.g., horizontal scanning targeting a specific port) and is of limited interest to security analysts. But hidden in the vast amount of common attacks, smaller unusual events may indicate new malicious activities. In this paper we present a methodology to distinguish IP darkspace sources with common traffic patterns from sources that show uncommon behavior and may be the origin of novel attacks. For this, we model IP darkspace sources based on clustering techniques. We extract data from one complete month of a large /8 darkspace capture and use a very simple feature vector. Our analysis is purely based on clustering techniques and does not require any pre-knowledge about phenomena in darkspace traffic. We found that about 75% of the darkspace IP sources contribute to a set of very stable clusters, 4% to less stable clusters, and 21% to outliers. This allows us to concentrate the search for new attacks on just 21% of the sources.
Félix Iglesias, Tanja Zseby. Modelling IP darkspace traffic by means of clustering techniques. 2014 IEEE Conference on Communications and Network Security 2014, 166-174.
Félix Iglesias Vázquez. Comparison of standard and case-based user profiles in building’s energy performance simulation. 2021, 1.