ASReview parallelization design

Optimizing ASReview simulations: A generic multiprocessing solution for ‘light-data’ and ‘heavy-data’ users

Active learning can optimize and speed up the screening phase of systematic reviews. Simulation studies that mimic the screening process can be used to test the performance of different machine-learning models or to study the impact of different training data. This paper presents an architecture design with a multiprocessing computational strategy for running many such simulation studies in parallel, using the ASReview Makita workflow generator and Kubernetes for deployment in the cloud. We provide a technical explanation of the proposed cloud architecture and its usage. In addition, we conducted 1140 simulations to investigate computation time under various numbers of CPUs and RAM settings. Our analysis demonstrates the degree to which multiprocessing can accelerate simulations. The parallel computation strategy and architecture design developed in this paper can help future research reduce simulation time while ensuring that the required processes complete safely.
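As a rough illustration of the idea of running many independent simulations in parallel, the sketch below starts each job as its own operating-system process and limits how many run at once. It is a minimal sketch under stated assumptions, not the architecture from the paper: `run_job`, `run_jobs_in_parallel`, and `placeholder_command` are hypothetical names, and the placeholder command only prints a number where a real simulation command would go.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one command as a separate OS process; return (exit code, stdout)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def run_jobs_in_parallel(commands, max_workers):
    """Run independent commands, at most `max_workers` at a time.

    Each thread only waits on its child process, so the actual work is done
    in parallel processes. Results come back in the order of `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


def placeholder_command(seed):
    """Hypothetical stand-in for one simulation run (assumption, not a real job)."""
    return [sys.executable, "-c", f"print({seed} * {seed})"]


if __name__ == "__main__":
    commands = [placeholder_command(seed) for seed in range(8)]
    for exit_code, output in run_jobs_in_parallel(commands, max_workers=4):
        print(exit_code, output)
```

Because the jobs do not depend on each other, the number of worker slots can be matched to the number of available CPUs, which is the setting the timing experiments vary.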

Kubernetes cluster implementation

Diagram of the ASReview parallelization design for a cloud Kubernetes cluster implementation. It describes the manual setup steps and how the two Docker components (Worker and Tasker) communicate using the ‘tasker.sh’ and ‘worker.sh’ bash scripts, together with a RabbitMQ message broker.
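The Tasker and Worker roles in the diagram follow a work-queue pattern, which can be sketched in a single process. This is only an illustration under stated assumptions: Python's in-memory `queue.Queue` stands in for the RabbitMQ broker, threads stand in for the Worker containers, and the function names and job strings are hypothetical rather than taken from ‘tasker.sh’ or ‘worker.sh’.

```python
import queue
import threading


def tasker(job_queue, jobs, n_workers):
    """Publish every job, then one stop signal (None) per worker."""
    for job in jobs:
        job_queue.put(job)
    for _ in range(n_workers):
        job_queue.put(None)


def worker(name, job_queue, results, lock):
    """Take jobs from the queue until a stop signal arrives."""
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        # Placeholder for running one simulation (assumption).
        with lock:
            results.append((job, name))
        job_queue.task_done()  # acknowledge the job as completed


def run(jobs, n_workers):
    """Start the workers, publish the jobs, and wait until all are done."""
    job_queue = queue.Queue()  # in-memory stand-in for the message broker
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", job_queue, results, lock))
        for i in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    tasker(job_queue, jobs, n_workers)
    job_queue.join()  # block until every queued item has been acknowledged
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for job, name in run([f"simulation-{i}" for i in range(6)], n_workers=3):
        print(f"{name} completed {job}")
```

The point of the queue is that each job is handed to exactly one worker, and a worker picks up the next job as soon as it finishes the previous one, so no worker sits idle while jobs remain.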

Documentation and Instructions

The architecture comes with explicit documentation describing the steps necessary to run a large number of simulations in cloud and local environments.

Usage Instructions

Scripts, data and output

This study is publicly available: all scripts used and all data generated are freely available under the MIT license.

GitHub Scripts

Publication

Sergei Romanov, Abel Soares Siqueira, Jonathan de Bruin, Jelle Teijema, Laura Hofstee, Rens van de Schoot; Optimizing ASReview simulations: A generic multiprocessing solution for ‘light-data’ and ‘heavy-data’ users. Data Intelligence 2024; DOI:10.1162/dint_a_00244

Open-Access

This work was supported by the Netherlands eScience Center under grant number ODISSEI.2022.023.

Check out similar projects

The FORAS project

The FORAS project will replicate and extend an original review integrating advanced machine-learning techniques via the OpenAlex database.

The Noisy Label Filter procedure: a case study to address replication issues in systematic reviews

In this study, we addressed the lack of replicability of systematic review datasets. Using a case study format, we developed a procedure to optimize and finalize the reconstructed dataset, which is as a rule imperfect.

Reproducibility and Data storage for Active Learning-Aided Systematic Reviews

Simulation Study Switching between Models

Simulation Study on Risk Analysis Documents

ASReview conducted a simulation study on risk analysis documents to evaluate the time-benefit for the Royal Dutch Pharmacists Association.

The Mega Meta project: Substance use, anxiety and depression

The Mega Meta project is a large-scale project to review factors that contribute to substance use, anxiety and depressive disorders. Read more about the search and screening protocol, hyperparameter tuning and post-processing in this post.

Systematic Review on Studies Evaluating the Performance of Active Learning within Systematic Reviews

This systematic review focused on synthesizing information on studies that evaluated the performance of Active Learning compared to human reading.

Systematic Review on the Implementation of AI-aided Systematic Reviews in Clinical Guideline Development

The ASReview research team conducted a systematic review on the implementation of AI-aided Systematic Reviews within Clinical Guideline Development.

AI-aided literature screening in medical guideline development

In a time of exponential growth of new evidence supporting clinical decision making, combined with a labor-intensive process of selecting this evidence, there is a need for methods to speed up current processes in order to keep medical guidelines up-to-date.

Systematic reviews performed within the UU and UMC Utrecht

This dataset contains an overview of 117 systematic reviews published in 2020 by corresponding authors affiliated with Utrecht University (UU) or UMC Utrecht.

Paper introducing the ASReview project

We show that by using active learning, ASReview can lead to far more efficient reviewing than manual reviewing, while exhibiting adequate quality. Furthermore, the presented software is fully transparent and open source.

Systematic review data about depression

Explore the systematic review dataset that was used for the publication “Psychological theories of depressive relapse and recurrence” from Brouwer et al., 2019. From pre-processing to the final dataset, a look into the complete systematic review process behind this publication.