
Dr. Georgios Douzas
NOVA Information Management School

Basic Info


Research Keywords & Expertise

Deep Learning
Machine Learning
Mathematics
Physics
Standard Model


Feed

Journal article
Published: 04 July 2021 in Remote Sensing

In remote sensing, Active Learning (AL) has become an important technique to collect informative ground truth data “on-demand” for supervised classification tasks. Despite its effectiveness, it is still significantly reliant on user interaction, which makes it both expensive and time consuming to implement. Most of the current literature focuses on the optimization of AL by modifying the selection criteria and the classifiers used. Although improvements in these areas will result in more effective data collection, the use of artificial data sources to reduce human–computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework, the data generator, a source of artificial data to reduce the amount of user-labeled data required in AL. The implementation of the proposed AL framework is done using Geometric SMOTE as the data generator. We compare the new AL framework to the original one using similar acquisition functions and classifiers over three AL-specific performance metrics in seven benchmark datasets. We show that this modification of the AL framework significantly reduces cost and time requirements for a successful AL implementation in all of the datasets used in the experiment.
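The loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the nearest-centroid classifier, the margin-based acquisition function, and the interpolation-based `generate` function are simple stand-ins (the paper uses Geometric SMOTE as the data generator), and all function names are hypothetical. The sketch assumes the initial labeled set contains at least two classes.

```python
import math
import random


def centroids(X, y):
    """Mean point of each class in the labeled set."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return out


def uncertainty(x, cents):
    """A small margin between the two nearest centroids means high uncertainty."""
    d = sorted(math.dist(x, c) for c in cents.values())
    return -(d[1] - d[0])


def generate(X, y, n_per_class, rng):
    """Stand-in data generator: interpolate between random same-class points."""
    new_X, new_y = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(n_per_class):
            a, b = rng.choice(rows), rng.choice(rows)
            t = rng.random()
            new_X.append([u + t * (v - u) for u, v in zip(a, b)])
            new_y.append(label)
    return new_X, new_y


def active_learning(X_lab, y_lab, pool, oracle, rounds, batch, rng):
    """Run the loop; oracle(x) stands in for the human annotator."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(pool)
    for _ in range(rounds):
        # New component: augment the labeled set before fitting the classifier.
        gen_X, gen_y = generate(X_lab, y_lab, n_per_class=5, rng=rng)
        cents = centroids(X_lab + gen_X, y_lab + gen_y)
        # Acquisition: query the pool points the classifier is least sure about.
        pool.sort(key=lambda x: uncertainty(x, cents), reverse=True)
        queried, pool = pool[:batch], pool[batch:]
        X_lab += queried
        y_lab += [oracle(x) for x in queried]
    return X_lab, y_lab, pool
```

Only the queried points cost annotator effort; the generated points are used for fitting and then discarded, so the labeled set grows by `batch` points per round.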

ACS Style

Joao Fonseca; Georgios Douzas; Fernando Bacao. Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing 2021, 13, 2619.

AMA Style

Joao Fonseca, Georgios Douzas, Fernando Bacao. Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing. 2021; 13 (13):2619.

Chicago/Turabian Style

Joao Fonseca; Georgios Douzas; Fernando Bacao. 2021. "Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification." Remote Sensing 13, no. 13: 2619.

Journal article
Published: 29 June 2021 in Information

Land cover maps are a critical tool to support informed policy development, planning, and resource management decisions. With significant upsides, the automatic production of Land Use/Land Cover (LULC) maps has been a topic of interest for the remote sensing community for several years, but it is still fraught with technical challenges. One such challenge is the imbalanced nature of most remotely sensed data. The asymmetric class distribution negatively impacts the performance of classifiers and adds a new source of error to the production of these maps. In this paper, we address the imbalanced learning problem by combining K-means and the Synthetic Minority Oversampling Technique (SMOTE) into an improved oversampling algorithm. K-means SMOTE improves the quality of newly created artificial data by addressing both the between-class imbalance, as traditional oversamplers do, and the within-class imbalance, avoiding the generation of noisy data while effectively overcoming data imbalance. The performance of K-means SMOTE is compared to three popular oversampling methods (Random Oversampling, SMOTE and Borderline-SMOTE) using seven remote sensing benchmark datasets, three classifiers (Logistic Regression, K-Nearest Neighbors and Random Forest Classifier) and three evaluation metrics, using a five-fold cross-validation approach with three different initialization seeds. The statistical analysis of the results shows that the proposed method consistently outperforms the remaining oversamplers, producing higher quality land cover classifications. These results suggest that LULC data can benefit significantly from the use of more sophisticated oversamplers, as spectral signatures for the same class can vary according to geographical distribution.

ACS Style

Joao Fonseca; Georgios Douzas; Fernando Bacao. Improving Imbalanced Land Cover Classification with K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures. Information 2021, 12, 266.

AMA Style

Joao Fonseca, Georgios Douzas, Fernando Bacao. Improving Imbalanced Land Cover Classification with K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures. Information. 2021; 12 (7):266.

Chicago/Turabian Style

Joao Fonseca; Georgios Douzas; Fernando Bacao. 2021. "Improving Imbalanced Land Cover Classification with K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures." Information 12, no. 7: 266.

Journal article
Published: 06 June 2021 in Expert Systems with Applications

Traditional supervised machine learning classifiers struggle to learn from highly skewed data distributions, as they are designed to expect classes to contribute equally to the minimization of the classifier's cost function. Moreover, the classifier's design expects equal misclassification costs, causing a bias towards overrepresented classes. Different strategies have been proposed to correct this issue. The modification of the data set has become a common practice since the procedure is generalizable to all classifiers. Various algorithms to rebalance the data distribution through the creation of synthetic instances have been proposed in the past. In this paper, we propose a new oversampling algorithm named G-SOMO. The algorithm identifies optimal areas to create artificial data instances in an informed manner and utilizes a geometric region during the data generation process to increase their variability. Our empirical results on 69 datasets, validated with different classifiers and metrics against a benchmark of commonly used oversampling methods, show that G-SOMO consistently outperforms competing oversampling methods. Additionally, the statistical significance of our results is established.

ACS Style

Georgios Douzas; Rene Rauch; Fernando Bacao. G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE. Expert Systems with Applications 2021, 183, 115230.

AMA Style

Georgios Douzas, Rene Rauch, Fernando Bacao. G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE. Expert Systems with Applications. 2021; 183:115230.

Chicago/Turabian Style

Georgios Douzas; Rene Rauch; Fernando Bacao. 2021. "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE." Expert Systems with Applications 183: 115230.

Technical note
Published: 17 December 2019 in Remote Sensing

The automatic production of land use/land cover maps continues to be a challenging problem, with important impacts on the ability to promote sustainability and good resource management. The ability to build robust automatic classifiers and produce accurate maps can have a significant impact on the way we manage and optimize natural resources. The difficulty in achieving these results comes from many different factors, such as data quality and uncertainty. In this paper, we address the imbalanced learning problem, a common and difficult conundrum in remote sensing that affects the quality of classification results, by proposing Geometric-SMOTE, a novel oversampling method. Geometric-SMOTE is a sophisticated oversampling algorithm which improves on the quality of the instances generated by previous methods, such as the synthetic minority oversampling technique. The performance of Geometric-SMOTE on the LUCAS (Land Use/Cover Area Frame Survey) dataset is compared to other oversamplers using a variety of classifiers. The results show that Geometric-SMOTE significantly outperforms all the other oversamplers and improves the robustness of the classifiers. These results indicate that, when using imbalanced datasets, remote sensing researchers should consider the use of these new generation oversamplers to increase the quality of the classification results.

ACS Style

Georgios Douzas; Fernando Bacao; João Fonseca; Manvel Khudinyan. Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing 2019, 11, 3040.

AMA Style

Georgios Douzas, Fernando Bacao, João Fonseca, Manvel Khudinyan. Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing. 2019; 11 (24):3040.

Chicago/Turabian Style

Georgios Douzas; Fernando Bacao; João Fonseca; Manvel Khudinyan. 2019. "Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm." Remote Sensing 11, no. 24: 3040.

Journal article
Published: 03 June 2019 in Information Sciences

Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. The SMOTE algorithm, as well as any other oversampling method based on the SMOTE mechanism, generates synthetic samples along line segments that join minority class instances. In this paper, we propose Geometric SMOTE (G-SMOTE) as an enhancement of the SMOTE data generation mechanism. G-SMOTE generates synthetic samples in a geometric region of the input space, around each selected minority instance. While in the basic configuration this region is a hyper-sphere, G-SMOTE allows its deformation to a hyper-spheroid. The performance of G-SMOTE is compared against SMOTE as well as baseline methods. We present empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm. An implementation of G-SMOTE is made available in the Python programming language.
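The contrast drawn in this abstract, sampling on a line segment versus sampling inside a hyper-sphere, can be illustrated with a short sketch. It is not the published G-SMOTE implementation: it covers only the basic hyper-sphere configuration, omits the deformation to a hyper-spheroid, and assumes (for illustration only) that the sphere radius equals the distance from the selected instance to one chosen neighbor. All function names are hypothetical.

```python
import math
import random


def smote_sample(x, neighbor, rng):
    """SMOTE-style sample: a random point on the segment joining x and neighbor."""
    t = rng.random()  # interpolation weight in [0, 1)
    return [a + t * (b - a) for a, b in zip(x, neighbor)]


def hypersphere_sample(center, radius, rng):
    """Uniform random point inside the hyper-sphere of given radius around center."""
    dim = len(center)
    # A Gaussian vector, once normalised, gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Scaling by u**(1/dim) makes the point uniform in volume, not just in radius.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [c + scale * d for c, d in zip(center, direction)]


def gsmote_like_sample(x, neighbor, rng):
    """Sample around x inside a sphere whose radius is the distance to neighbor."""
    radius = math.dist(x, neighbor)  # assumed radius rule, for illustration
    return hypersphere_sample(x, radius, rng)
```

With `x = [0.0, 0.0]` and `neighbor = [3.0, 4.0]`, `smote_sample` can only return points on the segment between the two, whereas `gsmote_like_sample` can return any point within distance 5 of `x`, which is the extra variability the abstract refers to.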

ACS Style

Georgios Douzas; Fernando Bacao. Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE. Information Sciences 2019, 501, 118-135.

AMA Style

Georgios Douzas, Fernando Bacao. Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE. Information Sciences. 2019; 501:118-135.

Chicago/Turabian Style

Georgios Douzas; Fernando Bacao. 2019. "Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE." Information Sciences 501: 118-135.

Journal article
Published: 01 October 2018 in Information Sciences

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 90 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language.
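The idea of combining clustering with SMOTE can be sketched as below. This is a simplified illustration, not the authors' implementation: it takes cluster assignments as given (for example from any k-means run), keeps only clusters where the minority class is in the majority, spreads new samples round-robin across those clusters, and interpolates between random minority pairs rather than nearest neighbors. The selection threshold, the allocation rule, and the function name `cluster_smote` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def cluster_smote(points, labels, cluster_ids, minority, n_new, rng):
    """Generate n_new synthetic minority points, interpolating only within clusters.

    cluster_ids holds one cluster index per point, e.g. from any k-means run.
    """
    minority_by_cluster = defaultdict(list)
    size_by_cluster = defaultdict(int)
    for point, label, cid in zip(points, labels, cluster_ids):
        size_by_cluster[cid] += 1
        if label == minority:
            minority_by_cluster[cid].append(point)

    # Keep clusters where the minority class dominates and a pair can be formed.
    selected = [
        cid for cid, members in sorted(minority_by_cluster.items())
        if len(members) >= 2 and len(members) > size_by_cluster[cid] / 2
    ]
    if not selected:
        return []

    synthetic = []
    for i in range(n_new):
        # Round-robin allocation: a simplification of density-based weighting.
        members = minority_by_cluster[selected[i % len(selected)]]
        a, b = rng.sample(members, 2)
        t = rng.random()
        synthetic.append([u + t * (v - u) for u, v in zip(a, b)])
    return synthetic
```

Because interpolation never crosses cluster boundaries and majority-dominated clusters are skipped, synthetic points are less likely to land in majority regions, which is the noise-avoidance property the abstract describes.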

ACS Style

Georgios Douzas; Fernando Bacao; Felix Last. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences 2018, 465, 1-20.

AMA Style

Georgios Douzas, Fernando Bacao, Felix Last. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences. 2018; 465:1-20.

Chicago/Turabian Style

Georgios Douzas; Fernando Bacao; Felix Last. 2018. "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE." Information Sciences 465: 1-20.

Journal article
Published: 01 January 2018 in Expert Systems with Applications
ACS Style

Georgios Douzas; Fernando Bacao. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications 2018, 91, 464-471.

AMA Style

Georgios Douzas, Fernando Bacao. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications. 2018; 91:464-471.

Chicago/Turabian Style

Georgios Douzas; Fernando Bacao. 2018. "Effective data generation for imbalanced learning using conditional generative adversarial networks." Expert Systems with Applications 91: 464-471.

Journal article
Published: 01 October 2017 in Expert Systems with Applications

New method for generating artificial data using self-organizing maps. Provides a simple and safe way to deal with imbalanced datasets. Generates within-cluster and between-cluster synthetic samples. Improves performance of classifiers and outperforms various oversampling methods.

Learning from imbalanced datasets is challenging for standard algorithms, as they are designed to work with balanced class distributions. Although there are different strategies to tackle this problem, methods that address the problem through the generation of artificial data constitute a more general approach compared to algorithmic modifications. Specifically, they generate artificial data that can be used by any algorithm, not constraining the options of the user. In this paper, we present a new oversampling method, Self-Organizing Map-based Oversampling (SOMO), which, through the application of a Self-Organizing Map, produces a two-dimensional representation of the input space, allowing for an effective generation of artificial data points. SOMO comprises three major stages: initially, a Self-Organizing Map produces a two-dimensional representation of the original, usually high-dimensional, space. Next, it generates within-cluster synthetic samples, and finally it generates between-cluster synthetic samples. Additionally, we present empirical results that show the improvement in the performance of algorithms when artificial data generated by SOMO are used, and also show that our method outperforms various oversampling methods.

ACS Style

Georgios Douzas; Fernando Bacao. Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning. Expert Systems with Applications 2017, 82, 40-52.

AMA Style

Georgios Douzas, Fernando Bacao. Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning. Expert Systems with Applications. 2017; 82:40-52.

Chicago/Turabian Style

Georgios Douzas; Fernando Bacao. 2017. "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning." Expert Systems with Applications 82: 40-52.