Electrocardiographic (ECG) signals have long been used for clinical purposes. They may also be used as the input for a biometric identification system, and several studies, as well as some prototypes, are already based on this principle. One of the methods already used for biometric identification relies on a measure of similarity based on the Kolmogorov Complexity, called the Normalized Relative Compression (NRC); this approach evaluates the similarity between two ECG segments without the need to delineate the signal wave. This methodology is the basis of the present work. We have collected a dataset of ECG signals from twenty participants in two different sessions, making use of three different kits simultaneously: one of them using dry electrodes, placed on their fingers; the other two using wet sensors placed on their wrists and chests. The aim of this work was to study the influence of the ECG collection protocol on the biometric identification system’s performance. Several variables in the data acquisition are not controllable, so some of them are inspected to understand their influence on the system. Movement, data collection point, time interval between train and test datasets and ECG segment duration are examples of variables that may affect the system, and they are studied in this paper. This study concluded that the biometric identification system needs at least 10 s of data to guarantee that it learns the essential information. It was also observed that “off-the-person” data acquisition led to better performance over time when compared to “on-the-person” placements.
Mariana Ramos; João Carvalho; Armando Pinho; Susana Brás. On the Impact of the Data Acquisition Protocol on ECG Biometric Identification. Sensors 2021, 21 (14), 4645.
Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models to the highest-context orders. Compared to the previous compressor (AC), we show gains of 2–9% and 6–7% in reference-free and reference-based modes, respectively. These gains come at the cost of three times slower computations. AC2 also improves memory usage against AC, with requirements about seven times lower, without being affected by the sequences’ input size. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence with each viral protein sequence from the whole UniProt database. The results consistently show higher similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing with critical results to a current controversial subject. AC2 is available for free download under GPLv3 license.
Milton Silva; Diogo Pratas; Armando Pinho. AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models. Entropy 2021, 23 (5), 530.
Emotional responses are associated with distinct body alterations and are crucial to foster adaptive responses, well-being, and survival. Emotion identification may improve people’s emotion regulation strategies and interaction with multiple life contexts. Several studies have investigated emotion classification systems, but most of them are based on the analysis of only one, a few, or isolated physiological signals. Understanding how informative the individual signals are and how their combination works would allow the development of more cost-effective, informative, and objective systems for emotion detection, processing, and interpretation. In the present work, electrocardiogram, electromyogram, and electrodermal activity were processed in order to find a physiological model of emotions. Both a unimodal and a multimodal approach were used to analyze which signal, or combination of signals, may better describe an emotional response, using a sample of 55 healthy subjects. The method was divided into: (1) signal preprocessing; (2) feature extraction; (3) classification using random forest and neural networks. Results suggest that the electrocardiogram (ECG) signal is the most effective for emotion classification. Yet, the combination of all signals provides the best emotion identification performance, with all signals providing crucial information for the system. This physiological model of emotions has important research and clinical implications, by providing valuable information about the value and weight of physiological signals for emotional classification, which can critically drive effective evaluation, monitoring and intervention, regarding emotional processing and regulation, considering multiple contexts.
Gisela Pinto; João M. Carvalho; Filipa Barros; Sandra C. Soares; Armando J. Pinho; Susana Brás. Multimodal Emotion Evaluation: A Physiological Model for Cost-Effective Emotion Classification. Sensors 2020, 20 (12), 3510.
Summary: Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analyses of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. This toolkit combines novel tools with a modular architecture, being an excellent platform for experimental scientists, as well as a useful resource for teaching bioinformatics inquiry to students in life sciences. Availability and implementation: GTO is implemented in C language and it is available, under the MIT license, at http://bioinformatics.ua.pt/[email protected] Supplementary information: Supplementary data are available at publisher’s Web site.
Joao Rafael Almeida; Armando J. Pinho; José Luis Oliveira; Olga Fajarda; Diogo Pratas. GTO: a toolkit to unify pipelines in genomic and proteomic research. 2020, 1.
Background: The development of high-throughput sequencing technologies and, as a result, the production of huge volumes of genomic data, has accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial due to their role in chromosomal evolution, genetic disorders and cancer. Results: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between two DNA sequences. This computational solution extracts the information content of the two sequences, exploiting a data compression technique, in order to find rearrangements. We also present the Smash++ visualizer, a tool that allows the visualization of the detected rearrangements along with their self- and relative complexity, by generating an SVG (Scalable Vector Graphics) image. Conclusions: Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions agreed with previous studies which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was ~1 GB, which makes Smash++ feasible to run on present-day standard computers.
Morteza Hosseini; Diogo Pratas; Burkhard Morgenstern; Armando J. Pinho. Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements. 2019, 1.
The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license.
Diogo Pratas; Morteza Hosseini; Jorge M. Silva; Armando J. Pinho. A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models. Entropy 2019, 21 (11), 1074.
Identifying the emotion that someone is feeling makes it possible to improve that person's experience when interacting with environments, devices, and contents. Our body responds to events around us through emotional responses, reflected in cognitive, behavioral and physiological dimensions. In the present work, we target the electrocardiogram (ECG) response as a means to express emotions. Its processing is performed using information-theoretical measures, allowing true exploratory data mining. Participants recruited for the experiment watched three video sets on three different days, with a different emotion being induced on each day: fear, happiness, and a neutral condition. The method is divided into: (1) conversion of the real-valued ECG record into a symbolic time-series; (2) relative compression of the symbolic representation of the ECG, using the symbolic ECG records stored in the database as a reference; (3) identification of the ECG record class, using a 1-NN (nearest neighbor) classifier. An accuracy of 90% was obtained. A posteriori analysis of the false negative results indicated that there was a relation between the relative dissimilarity measure and the self-reported emotions.
Susana Brás; João M. Carvalho; Filipa Barros; Cláudia Figueiredo; Sandra Soares; Armando J. Pinho. An Information-Theoretical Method for Emotion Classification. VI Latin American Congress on Biomedical Engineering CLAIB 2014, Paraná, Argentina 29, 30 & 31 October 2014. 2019, 253-261.
Morteza Hosseini; Diogo Pratas; Armando J. Pinho. A Probabilistic Method to Find and Visualize Distinct Regions in Protein Sequences. 2019 27th European Signal Processing Conference (EUSIPCO) 2019, 1.
Image segmentation lies at the heart of multiple image processing chains, and achieving accurate segmentation is of utmost importance as it affects later processing. Image segmentation has recently gained interest in the field of remote sensing, mostly due to the widespread availability of remote sensing data. This increased availability poses the problem of transmitting and storing large volumes of data. Compression is a common strategy to alleviate this problem. However, lossy or near-lossless compression prevents a perfect reconstruction of the recovered data. This letter investigates the image segmentation performance in data reconstructed after a near-lossless or a lossy compression. Two image segmentation algorithms and two compression standards are evaluated on data from several instruments. Experimental results reveal that segmentation performance over previously near-lossless and lossy compressed images is not markedly reduced at low and moderate compression ratios (CRs). In some scenarios, accurate segmentation performance can be achieved even for high CRs.
Joaquin Garcia-Sobrino; Armando J. Pinho; Joan Serra-Sagrista. Competitive Segmentation Performance on Near-Lossless and Lossy Compressed Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters 2019, 17 (5), 834-838.
The development of efficient DNA data compression tools is fundamental for reducing storage, given the increasing availability of DNA sequences. It is equally important for analysis purposes, given the search for optimized and new tools for anthropological and biomedical applications. In this paper, we describe the characteristics and impact of the GeCo2 tool, an improved version of the GeCo tool. In the proposed tool, we enhanced the mixture of models, where each context model or tolerant context model now has a specific decay factor. Additionally, specific cache-hash sizes and the ability to run only a context model with inverted repeats were developed. A new command line interface, twelve new pre-computed levels, and several optimizations in the code were also included. The results show a compression improvement using fewer computational resources (RAM and processing time). This new version permits more flexibility for compression and analysis purposes, namely a greater ability to address different characteristics of the DNA sequences. The decompression is performed using symmetric computational resources (RAM and time). GeCo2 is freely available, under GPLv3 license, at https://github.com/pratas/geco2.
Diogo Pratas; Morteza Hosseini; Armando J. Pinho. GeCo2: An Optimized Tool for Lossless Compression and Analysis of DNA Sequences. Advances in Intelligent Systems and Computing 2019, 137-145.
Primer and adapter sequences are synthetic DNA or RNA oligonucleotides used in the process of amplification and sequencing. In theory, while similar primer sequences can be present in assembled genomes, adapter sequences should be trimmed (filtered) and, hence, absent from assembled genomes. However, given ambiguity problems, inefficient parameterization of trimming tools, and other factors, they can occasionally be found in assembled genomes, in an exact or approximate form. In this paper, we investigate the occurrence of exact and approximate primer-adapter subsequences in assembled genomes, specifically in the whole archaeal genomes of the NCBI database. We present a new method that combines data compression with custom signal processing operations, namely filtering and segmentation, to localize and visualize these regions given a defined similarity threshold. The program is freely available, under GPLv3 license, at https://github.com/pratas/maple.
Diogo Pratas; Morteza Hosseini; Armando J. Pinho. Visualization of Similar Primer and Adapter Sequences in Assembled Archaeal Genomes. Advances in Intelligent Systems and Computing 2019, 129-136.
To explore the inverted repeats regularities along the genome sequences, we propose a sliding window method to extract the concentration scores of inverted repeats periodic regularities and the total mass of possible inverted repeats pairs. We apply the method to the human genome and locate the regions with the potential for the formation of large number of hairpin/cruciform structures. The number of found windows with periodic regularities is small and the patterns of occurrence are chromosome specific.
Carlos A. C. Bastos; Vera Afreixo; João Manuel Rodrigues; Armando J. Pinho. Detection and Characterization of Local Inverted Repeats Regularities. Advances in Intelligent Systems and Computing 2019, 113-120.
Due to its characteristics, there is a trend in biometrics to use the ECG signal for personal identification. There are different applications for this, namely, adapting entertainment systems to personal settings automatically. Recent works based on compression models have shown that these approaches are suitable for ECG biometric identification. However, the best results are usually achieved by the methods that rely on at least one point of interest of the ECG, called fiducial methods. In this work, we propose a compression-based non-fiducial method that uses a measure of similarity called the Normalized Relative Compression, a measure related to the Kolmogorov complexity of strings. Our method uses extended-alphabet finite-context models (xaFCMs) on the quantized first-order derivative of the signal, instead of using the original signal directly, as other methods do. We were able to achieve state-of-the-art results on a database collected at the University of Aveiro, which was used in previous works, making it a good preliminary benchmark for the method.
João M. Carvalho; Susana Brás; Armando J. Pinho. Compression-Based Classification of ECG Using First-Order Derivatives. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019, 27-36.
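The preprocessing idea in the entry above, modelling the quantized first-order derivative of the ECG rather than the raw signal, can be illustrated with a minimal Python sketch. The abstract does not give the quantizer, so the uniform binning, the `n_levels` parameter, and the function name `quantize_derivative` are all assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def quantize_derivative(signal, n_levels=8):
    """Turn a real-valued signal into a symbol string via its first difference.

    Hypothetical helper: uniform quantization into n_levels bins (at most 26,
    since symbols are lowercase letters) is an assumption of this sketch.
    """
    if not 2 <= n_levels <= 26:
        raise ValueError("n_levels must be between 2 and 26")
    x = np.asarray(signal, dtype=float)
    d = np.diff(x)                       # discrete first-order derivative
    if d.size == 0:
        return ""
    lo, hi = d.min(), d.max()
    if hi == lo:                         # constant slope: a single symbol
        return "a" * d.size
    # Interior bin edges of a uniform grid over [lo, hi].
    edges = np.linspace(lo, hi, n_levels + 1)[1:-1]
    idx = np.digitize(d, edges)          # bin index in 0 .. n_levels - 1
    return "".join(chr(ord("a") + int(i)) for i in idx)

symbols = quantize_derivative([0, 0, 1, 1, 0], n_levels=3)
```

The resulting string can then be fed to any symbolic model; working on differences removes the baseline level of the signal, which is one plausible reason for preferring the derivative.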
Finding DNA sites with high potential for the formation of hairpin/cruciform structures is an important task. Previous works studied the distances between adjacent reversed complement words (symmetric word pairs) and also for non-adjacent words. It was observed that for some words a few distances were favoured (peaks) and that in some distributions there was strong peak regularity. The present work extends previous studies, by improving the detection and characterization of peak regularities in the symmetric word pairs distance distributions of the human genome. This work also analyzes the location of the sequences that originate the observed strong peak periodicity in the distance distribution. The results obtained in this work may indicate genomic sites with potential for the formation of hairpin/cruciform structures.
Carlos A. C. Bastos; Vera Afreixo; João Manuel Rodrigues; Armando J. Pinho; Raquel Silva. Distribution of Distances Between Symmetric Words in the Human Genome: Analysis of Regular Peaks. Interdisciplinary Sciences: Computational Life Sciences 2019, 11 (3), 367-372.
Advancement of protein sequencing technologies has led to the production of a huge volume of data that needs to be stored and transmitted. This challenge can be tackled by compression. In this paper, we propose AC, a state-of-the-art method for lossless compression of amino acid sequences. The proposed method works based on the cooperation between finite-context models and substitutional tolerant Markov models. Compared to several general-purpose and specific-purpose protein compressors, AC provides the best bit-rates. This method can also compress the sequences nine times faster than its competitor, paq8l. In addition, employing AC, we analyze the compressibility of a large number of sequences from different domains. The results show that viruses are the most difficult sequences to be compressed. Archaea and bacteria are the second most difficult ones, and eukaryota are the easiest sequences to be compressed.
Morteza Hosseini; Diogo Pratas; Armando J. Pinho. AC: A Compression Tool for Amino Acid Sequences. Interdisciplinary Sciences: Computational Life Sciences 2019, 11 (1), 68-76.
In this paper, we address handwritten digit classification as a special problem of data compression modeling. The creation of the models—usually known as training—is just a process of counting. Moreover, the model associated to each class can be trained independently of all the other class models. Also, they can be updated later with new examples, even if the old ones are not available anymore. Under this framework, we show that it is possible to attain a classification accuracy consistently above 99.3% on the MNIST dataset, using classifiers trained in less than one hour on a common laptop.
Armando J. Pinho; Diogo Pratas. An Application of Data Compression Models to Handwritten Digit Classification. Computer Vision 2018, 487-495.
The sequencing of ancient DNA samples provides a novel way to find, characterize, and distinguish exogenous genomes of endogenous targets. After sequencing, computational composition analysis enables filtering of undesired sources in the focal organism, with the purpose of improving the quality of assemblies and subsequent data analysis. More importantly, such analysis allows extinct and extant species to be identified without requiring a specific or new sequencing run. However, the identification of exogenous organisms is a complex task, given the nature and degradation of the samples, and the evident necessity of using efficient computational tools, which rely on algorithms that are both fast and highly sensitive. In this work, we relied on a fast and highly sensitive tool, FALCON-meta, which measures similarity against whole-genome reference databases, to analyse the metagenomic composition of an ancient polar bear (Ursus maritimus) jawbone fossil. The fossil was collected in Svalbard, Norway, and has an estimated age of 110,000 to 130,000 years. The FASTQ samples contained 349 GB of nonamplified shotgun sequencing data. We identified and localized, relative to the FASTQ samples, the genomes with significant similarities to reference microbial genomes, including those of viruses, bacteria, and archaea, and to fungal, mitochondrial, and plastidial sequences. Among other striking features, we found significant similarities between modern-human, some bacterial and viral sequences (contamination) and the organelle sequences of wild carrot and tomato relative to the whole samples. For each exogenous candidate, we ran a damage pattern analysis, which in addition to revealing shallow levels of damage in the plant candidates, identified the source as contamination.
Diogo Pratas; Morteza Hosseini; Gonçalo Grilo; Armando J. Pinho; Raquel M. Silva; Tânia Caetano; João Carneiro; Filipe Pereira. Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard. Genes 2018, 9 (9), 445.
Diogo Pratas; Armando J. Pinho. Metagenomic composition analysis of sedimentary ancient DNA from the Isle of Wight. 2018 26th European Signal Processing Conference (EUSIPCO) 2018, 1.
The Normalized Relative Compression (NRC) is a recent dissimilarity measure, related to the Kolmogorov Complexity. It has been successfully used in different applications, such as DNA sequences, images or even ECG (electrocardiographic) signals. It uses a compressor that compresses a target string using exclusively the information contained in a reference string. One possible approach is to use finite-context models (FCMs) to represent the strings. A finite-context model calculates the probability distribution of the next symbol, given the previous k symbols. In this paper, we introduce a generalization of the FCMs, called extended-alphabet finite-context models (xaFCMs), which calculate the probability of occurrence of the next d symbols, given the previous k symbols. We perform experiments on two different sample applications using the xaFCMs and the NRC measure: ECG biometric identification, using a publicly available database; and estimation of the similarity between DNA sequences of two different, but related, species, chromosome by chromosome. In both applications, we compare the results against those obtained by the FCMs. The results show that the xaFCMs use less memory and computational time to achieve the same or, in some cases, even more accurate results.
João M. Carvalho; Susana Brás; Diogo Pratas; Jacqueline Ferreira; Sandra C. Soares; Armando J. Pinho. Extended-alphabet finite-context models. Pattern Recognition Letters 2018, 112, 49-55.
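To make the two ideas in the entry above concrete, here is a minimal Python sketch of an order-k finite-context model trained only on a reference string, and of an NRC-style score computed from it. It covers the plain FCM case (d = 1), not the extended-alphabet variant. The normalization by `len(target) * log2(alphabet size)`, the additive smoothing parameter `alpha`, the uniform cost charged for the first k symbols, and all function names are assumptions of this sketch rather than details taken from the abstract.

```python
import math
from collections import defaultdict

def train_fcm(reference, k):
    """Count, for each length-k context in the reference, which symbol follows."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(reference)):
        counts[reference[i - k:i]][reference[i]] += 1
    return counts

def relative_bits(target, counts, k, size, alpha=1.0):
    """Bits needed to code the target using only the frozen reference counts."""
    bits = 0.0
    for i in range(len(target)):
        if i < k:
            bits += math.log2(size)      # no full context yet: uniform cost
            continue
        ctx = counts.get(target[i - k:i], {})
        total = sum(ctx.values())
        # Additive smoothing keeps unseen symbols at non-zero probability.
        p = (ctx.get(target[i], 0) + alpha) / (total + alpha * size)
        bits -= math.log2(p)
    return bits

def nrc(target, reference, k=2, alpha=1.0):
    """NRC-style score: low when the reference predicts the target well."""
    size = len(set(target) | set(reference))
    if not target or size < 2:
        raise ValueError("need a non-empty target and at least two symbols")
    counts = train_fcm(reference, k)
    return relative_bits(target, counts, k, size, alpha) / (len(target) * math.log2(size))

x = "abab" * 20
same = nrc(x, x)              # reference shares the target's structure
other = nrc(x, "aabb" * 20)   # reference with different structure
```

In this sketch the model is never updated with the target, which mirrors the requirement that the target be compressed using exclusively the information in the reference. Because of that, the score can exceed 1 when the reference is misleading.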
It is important to develop methods for finding DNA sites with high potential for the formation of hairpin/cruciform structures. In a previous work, we studied the distances between adjacent reversed complement words (symmetric words), and we observed that for some words some distances were favored. In the work presented here, we extended the study to the distance between non-adjacent reversed complement words and we observed strong periodicity in the distance distribution of some words. This may be an indication of potential for the formation of hairpin/cruciform structures.
Carlos A. C. Bastos; Vera Afreixo; João Manuel Rodrigues; Armando J. Pinho. An Analysis of Symmetric Words in Human DNA: Adjacent vs Non-adjacent Word Distances. Advances in Intelligent Systems and Computing 2018, 80-87.
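The adjacent-distance analysis described in the entry above can be sketched in Python as follows. The abstract does not define the distance precisely, so this sketch assumes it is the offset from each occurrence of a word to the nearest later occurrence of its reversed complement; the function names are hypothetical and only the four-letter DNA alphabet is handled.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]

def find_all(seq, word):
    """Start positions of every (possibly overlapping) occurrence of word."""
    positions = []
    i = seq.find(word)
    while i != -1:
        positions.append(i)
        i = seq.find(word, i + 1)
    return positions

def symmetric_pair_distances(seq, word):
    """Histogram of distances from each word to the next reversed complement."""
    rc_pos = find_all(seq, reverse_complement(word))
    dist = Counter()
    j = 0
    for p in find_all(seq, word):
        # Advance to the first reversed-complement occurrence strictly after p.
        while j < len(rc_pos) and rc_pos[j] <= p:
            j += 1
        if j == len(rc_pos):
            break
        dist[rc_pos[j] - p] += 1
    return dist

hist = symmetric_pair_distances("AACGGTTAACGTT", "AAC")
```

Peaks in such a histogram, computed over a whole chromosome, are the kind of favored distances the abstract refers to. Extending the sketch to non-adjacent pairs would mean counting every later occurrence instead of only the nearest one.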