Cybersecurity threats are on the rise, and small- and medium-sized enterprises (SMEs) struggle to cope with these developments. To combat threats, SMEs must first be willing and able to assess their cybersecurity posture. Cybersecurity risk assessment, generally performed with the help of metrics, provides the basis for an adequate defense. Significant challenges remain, however, especially in the complex socio-technical setting of SMEs. Seemingly basic questions, such as how to aggregate metrics and ensure solution adaptability, are still open to debate. Aggregation and adaptability are vital topics to SMEs, as they require the assimilation of metrics into actionable advice adapted to their situation and needs. To address these issues, we systematically review socio-technical cybersecurity metric research in this paper. We analyse aggregation and adaptability considerations and investigate how current findings apply to the SME situation. To ensure that we provide valuable insights to researchers and practitioners, we integrate our results into a novel socio-technical cybersecurity framework geared towards the needs of SMEs. Our framework revealed a glaring need for intuitive, threat-based cybersecurity risk assessment approaches for the least digitally mature SMEs. In the future, we hope our framework will help to offer SMEs some deserved respite by guiding the design of suitable cybersecurity assessment solutions.
Max van Haastrecht; Bilge Yigit Ozkan; Matthieu Brinkhuis; Marco Spruit. Respite for SMEs: A Systematic Review of Socio-Technical Cybersecurity Metrics. Applied Sciences 2021, 11, 6909.
The Summary of Product Characteristics from the European Medicines Agency is a reference document on medicines in the EU. It contains textual information for clinical experts on how to safely use medicines, including adverse drug reactions. Using natural language processing (NLP) techniques to automatically extract adverse drug reactions from such unstructured textual information helps clinical experts to use them effectively and efficiently in daily practice. Such techniques have been developed for Structured Product Labels from the Food and Drug Administration (FDA), but no research has focused on extracting them from the Summary of Product Characteristics. In this work, we built a natural language processing pipeline that automatically scrapes Summaries of Product Characteristics online and then extracts adverse drug reactions from them. In addition, we have made the method and its output publicly available so that it can be reused and further evaluated in clinical practice. In total, we extracted 32,797 common adverse drug reactions for 647 common medicines scraped from the Electronic Medicines Compendium. A manual review of 37 commonly used medicines indicated good performance, with a recall and precision of 0.99 and 0.934, respectively.
Zhengru Shen; Marco Spruit. Automatic Extraction of Adverse Drug Reactions from Summary of Product Characteristics. Applied Sciences 2021, 11, 2663.
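The extraction step described above can be pictured with a small sketch. Everything below is hypothetical and greatly simplified (the paper's pipeline works on full scraped SmPC documents and a much richer set of patterns); it only illustrates the idea of pulling reaction terms with frequency qualifiers out of the "Undesirable effects" section:

```python
import re

# Invented SmPC fragment; the section heading follows the SmPC convention
# ("4.8 Undesirable effects"), but the text itself is made up.
smpc_text = """
4.8 Undesirable effects
The following adverse reactions have been reported: headache (very common),
nausea (common), dizziness (common), rash (uncommon).
4.9 Overdose
...
"""

def extract_adrs(text: str) -> list[str]:
    """Extract reaction terms listed with a frequency qualifier."""
    # Limit the search to the 'Undesirable effects' section.
    section = re.search(r"4\.8 Undesirable effects(.*?)(?:\n4\.9|\Z)", text, re.S)
    if not section:
        return []
    # Match 'term (frequency)' pairs, e.g. 'nausea (common)'.
    return re.findall(
        r"([a-z][a-z ]*?)\s*\((?:very common|common|uncommon|rare)\)",
        section.group(1))

print(extract_adrs(smpc_text))  # ['headache', 'nausea', 'dizziness', 'rash']
```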
The cost of recovery after a cybersecurity attack is likely to be high and may, in the extreme, result in the loss of business. Evaluating acquired cybersecurity capabilities and evolving them towards a desired state in consideration of risks is therefore essential. This research proposes the CYberSecurity Focus Area Maturity (CYSFAM) Model for assessing cybersecurity capabilities. In this design science research, CYSFAM was evaluated at a large financial institution. From the many cybersecurity standards, 11 encompassing focus areas were identified. An assessment instrument containing 144 questions was developed. The in-depth single case study demonstrates how and to what extent cybersecurity-related deficiencies can be identified. The novel scoring metric proved adequate, but can be further improved upon. The evaluation results show that the assessment questions suit the case study target audience, that the assessment can be performed within four hours, and that the organization recognizes itself in the result.
Bilge Yigit Ozkan; Sonny van Lingen; Marco Spruit. The Cybersecurity Focus Area Maturity (CYSFAM) Model. Journal of Cybersecurity and Privacy 2021, 1, 119-139.
Healthcare is a data-intensive industry in which data mining has great potential for improving the wellbeing of patients. However, a multitude of barriers impedes the application of machine learning. This work focuses on medical adverse event prediction by domain experts. We present AutoCrisp, a self-service data science prototype for multivariate sequential classification on electronic healthcare records, which enables domain experts to perform data science without requiring any sophisticated data mining knowledge. We performed an empirical case study with the objective of predicting bleedings using AutoCrisp. Our results show that multivariate sequential classification for medical adverse event prediction can indeed be made accessible to healthcare professionals by providing appropriate tooling support.
Marco Spruit; Niels de Vries. Self-Service Data Science for Adverse Event Prediction in Electronic Healthcare Records. Designing Networks for Innovation and Improvisation 2021, 517-535.
This research assesses education quality factors in secondary schools using a business intelligence approach. We operationalize each layer of the business intelligence framework to identify the stakeholders and components relevant to education quality. The resulting Education Quality Indicator (EQI) framework consists of seven Critical Success Factors (CSFs) measured through twenty-eight Key Performance Indicators (KPIs). The EQI framework was evaluated through expert interviews and a survey, revealing that the most important factor in assuring education quality is a teacher's ability to communicate with students. Furthermore, a feasibility analysis was conducted in a Dutch student monitoring information system. The results pave the way for attainable, data-driven innovation in secondary education: personalized student and teacher performance management using business intelligence technologies, which may ultimately integrate a wide variety of data sources, from environmental sensors to wearables, to optimally understand each individual student and teacher.
Marco Spruit; Tiffany Adriana. Business Intelligence in Secondary Education. Research Anthology on Preparing School Administrators to Lead Quality Education Programs 2021, 565-597.
There are various challenges regarding the development and use of cybersecurity standards for SMEs. In particular, SMEs need guidance in interpreting and implementing cybersecurity practices and in adapting the standards to their specific needs. As an empirical study, the workshop "Cybersecurity Standards: What Impacts and Gaps for SMEs" was co-organized by the StandICT.eu and SMESEC Horizon 2020 projects with the aim of identifying cybersecurity standardisation needs and gaps for SMEs. The workshop participants came from key stakeholder groups, including policymakers, standards developing organisations, SME alliances, and cybersecurity organisations. This paper highlights the key discussions and outcomes of the workshop and presents the themes, current initiatives, and plans towards cybersecurity standardisation for SMEs. The findings from the workshop and multivocal literature searches were used to formulate an agenda for future research.
Bilge Yigit Ozkan; Marco Spruit. Cybersecurity Standardisation for SMEs. Research Anthology on Artificial Intelligence Applications in Security 2021, 1252-1278.
This paper identifies the effects of small and medium-sized enterprises’ (SME) characteristics on the general design principles for maturity models in the information security domain. The purpose is to guide research on information security maturity modelling for SMEs so that the resulting models fit SMEs in form and function for their capability assessment and development purposes, and promote organizational learning and development. This study reviews the established frameworks of general design principles for maturity models and projects the design requirements of our envisioned information security maturity model for SMEs. Maturity models have different purposes of use (descriptive, prescriptive, and comparative) and design principles corresponding to these purposes. The mapping of SME characteristics and design principles facilitates the development of an information security maturity model that systematically integrates the desired qualities and components addressing SME characteristics and requirements.
Bilge Yigit Ozkan; Marco Spruit. Addressing SME Characteristics for Designing Information Security Maturity Models. Collaboration in a Hyperconnected World 2020, 161-174.
Various tasks in natural language processing (NLP) suffer from a lack of labelled training data, which deep neural networks are hungry for. In this paper, we relied upon features learned for generating relation triples in the open information extraction (OIE) task. First, we studied how transferable these features are from one OIE domain to another, such as from a news domain to a bio-medical domain. Second, we analyzed their transferability to a semantically related NLP task, namely relation extraction (RE). We thereby contribute to answering the question: can OIE help us achieve adequate NLP performance without labelled data? Inductive transfer learning achieved promising, comparable performance in both experiments while relying on only a very small amount of target data. When transferring to the OIE bio-medical domain, we achieved an F-measure of 78.0%, only 1% lower than traditional learning. Additionally, transferring to RE using an inductive approach scored an F-measure of 67.2%, which was 3.8% lower than training and testing on the same task. Our analysis hereby shows that OIE can act as a reliable source task.
Injy Sarhan; Marco Spruit. Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction. Applied Sciences 2020, 10, 5758.
In a time when the employment of natural language processing techniques in domains such as biomedicine, national security, finance, and law is flourishing, this study takes a deep look at their application in policy documents. Besides providing an overview of the current state of the literature that treats these concepts, we implement a set of natural language processing techniques on internal bank policies. The implementation of these techniques, together with the results that derive from the experiments and expert evaluation, introduces a meta-algorithmic modelling framework for processing internal business policies. This framework relies on three natural language processing techniques, namely information extraction, automatic summarization, and automatic keyword extraction. For the reference extraction and keyword extraction tasks, we calculated precision, recall, and F-scores. For the former, we obtained 0.99, 0.84, and 0.89; for the latter, we obtained 0.79, 0.87, and 0.83, respectively. Finally, the summary extraction approach was positively evaluated using a qualitative assessment.
Marco Spruit; Drilon Ferati. Text Mining Business Policy Documents. International Journal of Business Intelligence Research 2020, 11, 28-46.
There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model that differentiates the four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.
Wienand A. Omta; Roy G. Van Heesbeen; Ian Shen; Jacob De Nobel; Desmond Robers; Lieke M. Van Der Velden; René H. Medema; Arno P. J. M. Siebes; Ad J. Feelders; Sjaak Brinkkemper; Judith S. Klumperman; Marco René Spruit; Matthieu J. S. Brinkhuis; David A. Egan. Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening. SLAS DISCOVERY: Advancing the Science of Drug Discovery 2020, 25, 655-664.
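Cohen's kappa, reported alongside accuracy above, corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). As a sanity check on the reported numbers (accuracy 91.1%, kappa 0.85), the two together imply a chance-agreement rate of roughly 0.41, i.e. the class marginals across the four phenotypes were not uniform:

```python
def cohens_kappa(p_o: float, p_e: float) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    return (p_o - p_e) / (1 - p_e)

# Solve the kappa formula backwards for the chance-agreement rate p_e
# implied by the reported accuracy (0.911) and kappa (0.85):
p_e = (0.911 - 0.85) / (1 - 0.85)
print(round(p_e, 2))                       # 0.41
print(round(cohens_kappa(0.911, p_e), 2))  # recovers 0.85
```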
Mobile phone data are a novel source for generating mobility information from Call Detail Records (CDRs). Although mobile phone data can provide us with valuable insights into human mobility, they often show a biased picture of the traveling population. This research, therefore, focuses on correcting these biases and suggests a new method to scale mobile phone data to the true traveling population. Moreover, the scaled mobile phone data are compared to roadside measurements at 100 different locations on Dutch highways. We infer vehicle trips from the mobile phone data and compare the scaled counts with roadside measurements. The results are evaluated for October 2015. The proposed scaling method shows very promising results, with near-identical vehicle counts from both data sources in terms of monthly, weekly, and hourly vehicle counts. This indicates that the scaling method, in combination with mobile phone data, is able to correctly measure traffic intensities on highways, and thereby to produce calibrated estimates of human mobility behaviour. Nevertheless, there are still some discrepancies, for instance during weekends, calling for more research. This paper serves researchers in the field of mobile phone data by providing a proven method to scale the sample to the population, a crucial step in creating unbiased mobility information.
Johan Meppelink; Jens Van Langen; Arno Siebes; Marco Spruit. Beware Thy Bias: Scaling Mobile Phone Data to Measure Traffic Intensities. Sustainability 2020, 12, 3631.
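The core of any such scaling step is a correction factor relating observed devices to the full traveling population. The sketch below is a deliberate simplification with invented numbers (the paper's method corrects for multiple biases, not just operator market share), but it conveys the basic idea:

```python
# Hypothetical hourly trip counts inferred from the CDRs of one operator.
cdr_trips = {"08:00": 4200, "09:00": 3100, "10:00": 1800}

def scale_to_population(counts: dict[str, int], market_share: float,
                        devices_per_traveller: float = 1.0) -> dict[str, float]:
    """Scale operator-level trip counts to the full traveling population."""
    factor = 1.0 / (market_share * devices_per_traveller)
    return {hour: n * factor for hour, n in counts.items()}

# Assuming a 30% market share, each observed trip represents ~3.3 travellers.
print(scale_to_population(cdr_trips, market_share=0.3))
```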
(1) Background: This work investigates whether and how researcher-physicians can be supported in their knowledge discovery process by employing Automated Machine Learning (AutoML). (2) Methods: We take a design science research approach and select the Tree-based Pipeline Optimization Tool (TPOT) as the AutoML method based on a benchmark test and requirements from researcher-physicians. We then integrate TPOT into two artefacts: a web application and a notebook. We evaluate these artefacts with researcher-physicians to examine which approach suits researcher-physicians best. Both artefacts have a similar workflow, but different user interfaces because of a conflict in requirements. (3) Results: Artefact A, a web application, was perceived as better for uploading a dataset and comparing results. Artefact B, a Jupyter notebook, was perceived as better regarding the workflow and being in control of model construction. (4) Conclusions: A hybrid artefact would thus be best for researcher-physicians. However, both artefacts lacked model explainability and an explanation of variable importance for their created models. Hence, deployment of AutoML technologies in healthcare currently remains limited to the exploratory data analysis phase.
Richard Ooms; Marco Spruit. Self-Service Data Science in Healthcare with Automated Machine Learning. Applied Sciences 2020, 10, 2992.
It is becoming more challenging for health professionals to keep up to date with current research. To save time, many experts perform evidence syntheses on systematic reviews instead of primary studies. Subsequently, there is a need to update reviews to include new evidence, which requires a significant amount of effort and delays the update process. These efforts can be significantly reduced by applying computer-assisted techniques to identify relevant studies. In this study, we followed a “human-in-the-loop” approach by engaging medical experts through a controlled user experiment to update systematic reviews. The primary outcome of interest was to compare the performance levels achieved when judging full abstracts versus single sentences accompanied by Natural Language Inference labels. The experiment included post-task questionnaires to collect participants’ feedback on the usability of the computer-assisted suggestions. The findings lead us to conclude that sentence-level relevance assessment achieves higher recall.
Noha S. Tawfik; Marco Spruit. Computer-Assisted Relevance Assessment: A Case Study of Updating Systematic Medical Reviews. Applied Sciences 2020, 10, 2845.
Background: Several approaches to medication optimisation by identifying drug-related problems (DRPs) in older people have been described. Although some interventions have shown reductions in DRPs, evidence supporting the effectiveness of medication reviews on clinical and economic outcomes is lacking. Application of the STOPP/START (version 2) explicit screening tool for inappropriate prescribing has decreased inappropriate prescribing and significantly reduced adverse drug reactions (ADRs) and associated healthcare costs in older patients with multi-morbidity and polypharmacy. Therefore, application of STOPP/START criteria during a medication review is likely to be beneficial. Incorporation of explicit screening tools into clinical decision support systems (CDSS) has gained traction as a means to improve both quality and efficiency in the rather time-consuming medication review process. Although CDSS can generate more potential inappropriate medication recommendations, some of these have been shown to be less clinically relevant, resulting in alert fatigue. Moreover, explicit tools such as STOPP/START do not cover all relevant DRPs on an individual patient level. The OPERAM study aims to assess the impact of a structured drug review on the quality of pharmacotherapy in older people with multi-morbidity and polypharmacy. The aim of this paper is to describe the structured, multi-component intervention of the OPERAM trial and compare it with the approach in the comparator arm.
Method: This paper describes a multi-component intervention, integrating interventions that have demonstrated effectiveness in defining DRPs. The intervention involves a structured history-taking of medication (SHiM), a medication review according to the Systematic Tool to Reduce Inappropriate Prescribing (STRIP) method, assisted by a clinical decision support system (STRIP Assistant, STRIPA) with integrated STOPP/START criteria (version 2), followed by shared decision-making with both patient and attending physician. The developed method integrates patient input, patient data, involvement of other healthcare professionals, and CDSS assistance into one structured intervention.
Discussion: The clinical and economic effectiveness of this experimental intervention will be evaluated in a cohort of hospitalised, older patients with multi-morbidity and polypharmacy in the multicentre, randomized controlled OPERAM trial (OPtimising thERapy to prevent Avoidable hospital admissions in the Multi-morbid elderly), which will be completed in the last quarter of 2019.
Trial registration: Universal Trial Number: U1111-1181-9400. Clinicaltrials.gov: NCT02986425, registered 08 December 2016. FOPH (Swiss national portal): SNCTP000002183. Netherlands Trial Register: NTR6012 (07-10-2016).
Erin K. Crowley; Bastiaan T. G. M. Sallevelt; Corlina J. A. Huibers; Kevin D. Murphy; Marco Spruit; Zhengru Shen; Benoît Boland; Anne Spinewine; Olivia Dalleur; Elisavet Moutzouri; Axel Löwe; Martin Feller; Nathalie Schwab; Luise Adam; Ingeborg Wilting; Wilma Knol; Nicolas Rodondi; Stephen Byrne; Denis O’Mahony. Intervention protocol: OPtimising thERapy to prevent avoidable hospital Admission in the Multi-morbid elderly (OPERAM): a structured medication review with support of a computerised decision support system. BMC Health Services Research 2020, 20, 1-12.
Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast developmental pace of new sentence embedding methods, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems, such as semantic similarity, question answering, and citation sentiment analysis, with binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on language models, which account for the context-dependent nature of words, usually outperform others in terms of performance. Nonetheless, there is no single embedding model that perfectly represents biomedical and clinical texts with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings, and examples have been made available on GitHub.
Noha S. Tawfik; Marco R. Spruit. Evaluating sentence representations for biomedical text: Methods and experimental results. Journal of Biomedical Informatics 2020, 104, 103396.
Research data management planning (RDMP) is the process through which researchers first get acquainted with research data management (RDM) matters. In recent years, public funding agencies have implemented governmental policies for removing barriers to accessing scientific information. Researchers applying to public funding agencies need to define a strategy for guaranteeing that the acquired funds also yield high-quality and reusable research data. To achieve that, funding bodies ask researchers to elaborate on data management needs in documents called data management plans (DMPs). In this study, we explore several organizational and technological challenges occurring during the planning phase of research data management, more precisely during the grant submission process. By doing so, we deepen our understanding of a crucial process within research data management and broaden our understanding of the current stakeholders, practices, and challenges in RDMP.
Armel Lefebvre; Baharak Bakhtiari; Marco Spruit. Exploring research data management planning challenges in practice. it - Information Technology 2020, 62, 29-37.
Big data analysis is increasingly becoming a crucial part of many organizations, popularizing the distributed computing paradigm. Within the emerging research field of Applied Data Science, multiple notable methods are available that help analysts and scientists to create their analytical processes. However, for distributed computing problems such methods are not yet available. Therefore, to support data analysts, scientists, and software engineers in the creation of distributed computing processes, we present the CRoss-Industry Standard Process for Distributed Computing Workflows (CRISP-DCW) method. The CRISP-DCW method lets users create distributed computing workflows by following a predefined cycle and using reference manuals, in which the critical elements of such a workflow are developed for the context at hand. Using our method’s reference manuals and predefined steps, data scientists can spend less time on developing big data processing workflows, thus increasing efficiency. Results were evaluated with experts and found to be satisfactory. Therefore, we argue that the CRISP-DCW method provides a good starting point for applied data scientists to develop and document their distributed computing workflows, making their processes both more efficient and effective.
Marco Spruit; Stijn Meijers. The CRISP-DCW Method for Distributed Computing Workflows. First Complex Systems Digital Campus World E-Conference 2015 2019, 325-341.
In a time when the employment of Natural Language Processing techniques in domains such as biomedicine, national security, finance and law is flourishing, this study takes a deep look at their application to policy documents. Besides providing an overview of the current state of the literature on these concepts, the study at hand applies a set of Natural Language Processing techniques, not previously used in this setting, to internal bank policies. The implementation of these techniques, together with the results of the experiment and the experts’ evaluation, introduces a Meta-Algorithmic Modelling framework for processing internal business policies. This framework relies on three Natural Language Processing techniques, namely information extraction, automatic summarization and automatic keyword extraction. For the reference extraction and keyword extraction tasks we calculated precision, recall and F-scores: for the former we obtained 0.99, 0.84, and 0.89; for the latter we obtained 0.79, 0.87 and 0.83, respectively. Finally, our summary extraction approach was positively evaluated through a qualitative assessment.
Marco Spruit; Drilon Ferati. Applied Data Science in Financial Industry. First Complex Systems Digital Campus World E-Conference 2015 2019, 351-367.
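The F-scores reported above combine precision and recall via their harmonic mean. A minimal sketch of that calculation (the helper name is ours, not from the paper), using the keyword extraction figures:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Keyword extraction figures reported above: P = 0.79, R = 0.87
print(round(f1_score(0.79, 0.87), 2))  # → 0.83
```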
Natural language processing (NLP) has become essential for the secondary use of clinical data. Over the last two decades, many clinical NLP systems have been developed in both academia and industry. However, nearly all existing systems are restricted to specific clinical settings, mainly because they were developed for and tested with specific datasets, and they often fail to scale up. Using existing NLP systems for one’s own clinical purposes therefore requires substantial resources and long-term time commitments for customization and testing. Moreover, maintenance is troublesome and time-consuming. This research presents a lightweight approach for building clinical NLP systems with limited resources. Following the design science research approach, we propose a lightweight architecture designed to be composable, extensible, and configurable. It treats NLP as an external component which can be accessed independently and orchestrated in a pipeline via web APIs. To validate its feasibility, we developed a web-based prototype for clinical concept extraction with six well-known NLP APIs and evaluated it on three clinical datasets. In comparison with available benchmarks for the datasets, three high F1 scores (0.861, 0.724, and 0.805) were obtained. The prototype also obtained a low F1 score (0.373) on one of the tests, probably due to the small size of that test dataset. The development and evaluation of the prototype demonstrate that our approach has great potential for building effective clinical NLP systems with limited resources.
Zhengru Shen; Hugo van Krimpen; Marco Spruit. A Lightweight API-Based Approach for Building Flexible Clinical NLP Systems. Journal of Healthcare Engineering 2019, 2019, 1-11.
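The composable, configurable pipeline idea described above can be illustrated with a hypothetical sketch (all names and the stand-in vocabulary are ours, not the paper's prototype): each stage is a callable that could wrap an external NLP web API, stages are registered by name, and a pipeline is just a configured ordering of stage names, so one API can be swapped for another without touching the rest.

```python
from typing import Callable, Dict, List

# Each stage transforms a document dict; in the real architecture a stage
# would wrap an HTTP call to an external NLP service.
Stage = Callable[[dict], dict]

REGISTRY: Dict[str, Stage] = {}

def register(name: str):
    """Register a stage under a name so pipelines can be configured as lists of names."""
    def wrap(fn: Stage) -> Stage:
        REGISTRY[name] = fn
        return fn
    return wrap

@register("tokenize")
def tokenize(doc: dict) -> dict:
    doc["tokens"] = doc["text"].split()
    return doc

@register("extract_concepts")
def extract_concepts(doc: dict) -> dict:
    # Stand-in for a call to an external clinical concept-extraction API.
    vocab = {"fever", "cough"}
    doc["concepts"] = [t for t in doc["tokens"] if t.lower() in vocab]
    return doc

def run_pipeline(text: str, stages: List[str]) -> dict:
    """Orchestrate the named stages in order over a fresh document."""
    doc = {"text": text}
    for name in stages:
        doc = REGISTRY[name](doc)
    return doc

result = run_pipeline("Patient reports fever and cough", ["tokenize", "extract_concepts"])
print(result["concepts"])  # → ['fever', 'cough']
```

Swapping NLP providers then amounts to registering a different stage under the same name, which is one way to read the paper's "composable, extensible, and configurable" claim.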
Recently, the topic of research data management (RDM) has emerged at the forefront of Open Science. Funders and publishers place new expectations on data management planning and the transparent reporting of research. At the same time, laboratories rely on undocumented files to record data, process results and submit manuscripts, which hinders repeatable and replicable management of experimental resources. In this study, we design a forensic process to reconstruct and evaluate data management practices in scientific laboratories. We name this process Laboratory Forensics (LF), as it combines digital forensic techniques with the systematic study of experimental data. We evaluate the effectiveness and usefulness of Laboratory Forensics with laboratory members and data managers. Our preliminary evaluation indicates that LF is a useful approach for assessing data management practices, but that it needs further development to be integrated into the information systems of scientific laboratories.
Armel Lefebvre; Marco Spruit. Designing Laboratory Forensics. Lecture Notes in Computer Science 2019, 238-251.
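The forensic reconstruction idea can be pictured with a hypothetical sketch (the function name and fields are ours, not the paper's process): walk a laboratory project directory and collect per-file metadata, from which a rough timeline of data management activity can be inferred.

```python
import time
from pathlib import Path

def inventory(root: str) -> list:
    """Collect name, extension, and modification date for every file under root."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            records.append({
                "file": str(path.relative_to(root)),
                "ext": path.suffix,
                "modified": time.strftime("%Y-%m-%d", time.gmtime(stat.st_mtime)),
            })
    return records

# e.g. inventory("/path/to/lab/project") yields one record per file,
# which can then be grouped by extension or date to profile practices.
```

This only captures the technical half of LF; the paper pairs such evidence with the systematic study of how the files relate to the experiments they document.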