Reproducibility and Data storage for AI-Aided Systematic Reviews

Reproducibility and Data storage Checklist for Active Learning-Aided Systematic Reviews

In the screening phase of a systematic review, screening prioritization via active learning effectively reduces the workload. However, the PRISMA guidelines are not sufficient for reporting the screening phase in a reproducible manner. Text screening with active learning is an iterative process, but the labeling decisions and the training of the active learning model can happen independently of each other in time. It is therefore not trivial to store the data from both events in such a way that one can still determine which iteration of the model was used for each labeling decision. Moreover, many iterations of the active learning model are trained throughout the screening process, producing an enormous amount of data (many gigabytes or even terabytes), and machine learning models are continually becoming larger. Naively storing all the data produced at every iteration of the active learning pipeline can therefore add up to an undesirable amount of data. This article clarifies the steps in an active learning-aided screening process and the data produced at every step. We show how this data can be stored efficiently in terms of size. Most notably, the data produced by the model is where a balance must be struck between reproducibility and storage size. Finally, we created the RDAL-Checklist (Reproducibility and Data storage for Active Learning-aided systematic reviews checklist), which helps users and creators of active learning software make their screening process reproducible.
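The core storage problem described in the abstract, knowing which model iteration was behind each labeling decision without keeping the full model output, can be illustrated with a small sketch. It is not the format used by the paper or by ASReview: the table names, column names, and the example classifier name are all hypothetical. It stores one compact metadata row per model iteration and lets each labeling decision point to the iteration whose ranking was shown to the reviewer.

```python
import sqlite3


def create_store(path=":memory:"):
    """Create a hypothetical two-table store for a screening session."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- One compact row per trained model, instead of its full output.
        CREATE TABLE model_iterations (
            iteration INTEGER PRIMARY KEY,
            classifier TEXT NOT NULL,
            training_set_size INTEGER NOT NULL,
            trained_at TEXT NOT NULL
        );
        -- One row per labeling decision; iteration is NULL for records
        -- labeled before any model existed (prior knowledge).
        CREATE TABLE labeling_decisions (
            record_id INTEGER PRIMARY KEY,
            label INTEGER NOT NULL CHECK (label IN (0, 1)),
            labeled_at TEXT NOT NULL,
            iteration INTEGER REFERENCES model_iterations(iteration)
        );
        """
    )
    return conn


def record_iteration(conn, iteration, classifier, training_set_size, trained_at):
    conn.execute(
        "INSERT INTO model_iterations VALUES (?, ?, ?, ?)",
        (iteration, classifier, training_set_size, trained_at),
    )


def record_label(conn, record_id, label, labeled_at, iteration=None):
    conn.execute(
        "INSERT INTO labeling_decisions VALUES (?, ?, ?, ?)",
        (record_id, label, labeled_at, iteration),
    )


def decisions_with_model(conn):
    """Return every decision together with the model iteration behind it."""
    return conn.execute(
        """
        SELECT d.record_id, d.label, d.iteration,
               m.classifier, m.training_set_size
        FROM labeling_decisions AS d
        LEFT JOIN model_iterations AS m ON m.iteration = d.iteration
        ORDER BY d.labeled_at, d.record_id
        """
    ).fetchall()


conn = create_store()
# Two prior-knowledge labels, made before any model was trained.
record_label(conn, 17, 1, "2023-01-19T10:00:00")
record_label(conn, 42, 0, "2023-01-19T10:00:05")
record_iteration(conn, 1, "naive_bayes", 2, "2023-01-19T10:00:10")
record_label(conn, 8, 1, "2023-01-19T10:01:00", iteration=1)
# Iteration 2 finishes training at 10:01:20, but record 23 was labeled at
# 10:01:15 from the ranking of iteration 1, so it still points to iteration 1.
record_iteration(conn, 2, "naive_bayes", 3, "2023-01-19T10:01:20")
record_label(conn, 23, 0, "2023-01-19T10:01:15", iteration=1)

for row in decisions_with_model(conn):
    print(row)
```

Because labeling and training run independently in time, the link is stored explicitly on each decision rather than inferred from timestamps. How much per-iteration model output to keep beyond such metadata is the reproducibility versus storage trade-off the article discusses.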

Lombaers, P., de Bruin, J., & van de Schoot, R. (2023, January 19). Reproducibility and Data storage Checklist for Active Learning-Aided Systematic Reviews. DOI: 10.31234/osf.io/g93zf

Data generated using ASReview LAB is stored in an ASReview project file (extension .asreview). The ASReview Python API offers two ways to access the data in this file: the project API and the state API. The project API retrieves general project settings, the imported dataset, the feature matrix, etc. The state API retrieves data related directly to the reviewing process, such as the labels, the time of labeling, and the classifier used. See the ASReview documentation for detailed instructions.

PsyArXiv

Categories: Data, Featured, Scientific Papers, Simulation Studies

Check out similar projects

Optimizing ASReview simulations

Optimizing ASReview simulations with multiprocessing solutions for ‘light-data’ and ‘heavy-data’ users via a Kubernetes cluster.

The FORAS project

The FORAS project will replicate and extend an original review integrating advanced machine-learning techniques via the OpenAlex database.

The Noisy Label Filter procedure: a case study to address replication issues in systematic reviews

In this study, we addressed the lack of replicability of systematic review datasets. We used a case study format and developed a procedure to optimize and finalize the reconstructed dataset, which is by rule imperfect.

Simulation Study Switching between Models

This systematic review focused on synthesizing information on studies that evaluated the performance of Active Learning compared to human reading.

Simulation Study on Risk Analysis Documents

ASReview conducted a simulation study on risk analysis documents to evaluate the time-benefit for the Royal Dutch Pharmacists Association.

The MegaMeta project: Substance use, anxiety and depression

The MegaMeta project is a large-scale project to review factors that contribute to substance use, anxiety, and depressive disorders. Read more in this post about the search and screening protocol, hyperparameter tuning, and post-processing used.

Systematic Review on Studies Evaluating the Performance of Active Learning within Systematic Reviews

This systematic review focused on synthesizing information on studies that evaluated the performance of Active Learning compared to human reading.

Systematic Review on the Implementation of AI-aided Systematic Reviews in Clinical Guideline Development

The ASReview research team conducted a systematic review on the implementation of AI-aided Systematic Reviews within Clinical Guideline Development.

AI-aided literature screening in medical guideline development

In a time of exponential growth of new evidence supporting clinical decision making, combined with a labor-intensive process of selecting this evidence, there is a need for methods to speed up current processes in order to keep medical guidelines up-to-date.

Systematic reviews performed within the UU and UMC Utrecht

This dataset contains an overview of 117 systematic reviews published by corresponding authors affiliated to Utrecht University (UU) or UMC Utrecht in 2020.

Paper introducing the ASReview project

We show that by using active learning, ASReview can lead to far more efficient reviewing than manual reviewing, while exhibiting adequate quality. Furthermore, the presented software is fully transparent and open source.

Systematic review data about depression

Explore the systematic review dataset that was used for the publication “Psychological theories of depressive relapse and recurrence” by Brouwer et al. (2019). From pre-processing to the final dataset, this is a look into the complete systematic review process behind the publication.
