ASReview LAB Class 101
Welcome to ASReview LAB Class 101, an introduction to the most important aspects of ASReview LAB. This class serves as a mini-guide to help you understand what the software does, why it works, and how you can use it. The flowcharts below depict (part of) the processes surrounding ASReview LAB and its Electronic Learning Assistant: Elas. We will go through the following topics one by one:
The vocabulary of Elas
Elas: Our Electronic Learning Assistant, the mascot of ASReview, who helps to explain the machinery behind ASReview.
Dataset: A file consisting of records.
Record: A record contains at least a single piece of text and can be unlabeled or labeled. The model is trained on the labeled record(s), and the unlabeled records are presented to the user to be screened and labeled. In the case of scientific systematic reviewing, a record contains the meta-data of a scientific publication, such as the title, abstract, and DOI. If the DOI is available, it will be presented as a clickable hyperlink to the full text.
Text: The part of a record used for training the model. In the case of systematic reviewing, it is the title and abstract of a scientific paper. Be aware of the impact of missing information; the performance of ASReview is optimal if your meta-data is complete.
Labeling: Deciding whether a record is relevant or irrelevant.
The general idea behind ASReview LAB: The Active Learning cycle
Traditionally, going through a large number of records takes a lot of time. The order in which you screen the records is arbitrary: they may be sorted alphabetically by title or author, or simply listed in the order in which they were found. You start at the top of the stack and gradually work your way through it until you reach the very last record.
The essence of active learning as implemented in ASReview LAB is ‘simply’ to continuously reorder the stack of records based on relevance scores computed by a machine learning model. Because the records are reordered by predicted relevance, you will find the relevant records much sooner than with the traditional approach. Moreover, you can stop before you reach the last record. This saves time, which you can use for other work, for screening even larger datasets, or for relaxing!
How does this work? At the beginning of the screening process, you are asked to provide Elas with at least one relevant and one irrelevant record. Based on this information, Elas learns what is considered to be important information and presents you with the record that is predicted to be most relevant (this is called certainty-based sampling). You are then asked to screen the text and decide whether or not the record is relevant. Based on this decision, Elas updates its knowledge (i.e., re-trains the model on all labeled records, including the one you just labeled), reorders the unlabeled records, and presents you with the most relevant one. Now it is your turn again to decide whether the record is relevant or not. Thereafter, Elas again (and again) updates its knowledge, reorders the records, and presents you with the next record to be labeled.
This means that while you keep screening, Elas continuously updates its predictions in the background and presents you with the record that is most relevant given your previous decisions. This iterative process is called active learning, or researcher-in-the-loop machine learning.
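To make the cycle concrete, here is a minimal sketch in plain Python. It is not how ASReview LAB is implemented: the word-count model, the function names (`train`, `relevance_score`, `screen`), and the toy records are all assumptions chosen for illustration. The `oracle` function stands in for you, the human who labels each record.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into words."""
    return text.lower().split()


def train(labeled):
    """Count word occurrences per label (1 = relevant, 0 = irrelevant)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in labeled:
        counts[label].update(tokenize(text))
    return counts


def relevance_score(counts, text):
    """Log-odds that a text is relevant, using add-one smoothing."""
    vocab = len(set(counts[0]) | set(counts[1]))
    totals = {label: sum(counts[label].values()) for label in (0, 1)}
    score = 0.0
    for word in tokenize(text):
        p_rel = (counts[1][word] + 1) / (totals[1] + vocab)
        p_irr = (counts[0][word] + 1) / (totals[0] + vocab)
        score += math.log(p_rel / p_irr)
    return score


def screen(records, prior_labels, oracle, max_records=None):
    """Run the cycle: train, rank, ask the human, repeat.

    records      -- dict mapping record id to text
    prior_labels -- dict mapping record id to 0/1 (at least one of each)
    oracle       -- function(record_id) -> 0/1, standing in for the human
    max_records  -- optional cap on the number of records to screen
    """
    labels = dict(prior_labels)
    order = []
    while len(labels) < len(records):
        if max_records is not None and len(order) >= max_records:
            break
        # Re-train on everything labeled so far.
        model = train([(records[i], y) for i, y in labels.items()])
        unlabeled = [i for i in records if i not in labels]
        # Certainty-based sampling: show the highest-scoring record next.
        best = max(unlabeled, key=lambda i: relevance_score(model, records[i]))
        labels[best] = oracle(best)
        order.append(best)
    return order, labels


# Toy data (hypothetical): "a" and "b" are the prior knowledge.
records = {
    "a": "active learning for systematic reviews",
    "b": "soil chemistry of alpine meadows",
    "c": "machine learning to screen abstracts in systematic reviews",
    "d": "alpine soil erosion and meadows",
    "e": "active learning reduces screening workload",
}
truth = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1}
order, labels = screen(records, {"a": 1, "b": 0}, oracle=truth.get)
print(order)  # the relevant records "c" and "e" are shown before "d"
```

Note that the model only decides the order in which records are shown; every label still comes from the oracle.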
It is important to realize that the software does not make any decisions; it only reorders the records. A human (you!) decides whether to mark a record as relevant or irrelevant. This is called Oracle Mode in ASReview, because you are the oracle, the expert in your field, who knows which label to provide to each record.
When to get started with the software
So, when should you use ASReview during your systematic review? Let’s start at the beginning, the top of the flowchart below. For a systematic review, a researcher needs to find all potentially relevant articles to answer the research question at hand. Using prespecified search terms, articles are found through different search engines (e.g., Web of Science, PubMed, etc.). The records are saved in a reference manager, such as Mendeley, Zotero, or EndNote, or simply in Excel. Thereafter, the cleaning of the records begins: removing duplicates and making sure that the record of each article contains a title and abstract (see this blogpost) and ideally also a DOI, so that we can refer you to the full text in case you want to read it. All of these steps result in a file containing the records of the potentially relevant articles.
To arrive at a file that can be read by ASReview LAB, you have to convert your records to RIS, CSV, or Excel format. This can be achieved by exporting your records from your reference manager. More information on which files are supported by ASReview LAB can be found here. With this file you can now start screening the abstracts in ASReview LAB!
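The cleaning step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `clean_records` and the keys `"title"`, `"abstract"`, and `"doi"` are hypothetical, and real reference managers offer their own, more thorough deduplication tools.

```python
def clean_records(records):
    """Drop duplicates and collect records that lack a title or abstract.

    records -- list of dicts with the (hypothetical) keys
               "title", "abstract", and "doi"
    Returns (unique_records, incomplete_records).
    """
    seen = set()
    unique, incomplete = [], []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        # Normalize case and whitespace so near-identical titles match.
        title = " ".join((record.get("title") or "").lower().split())
        if doi or title:
            # Prefer the DOI as identity; fall back to the normalized title.
            key = ("doi", doi) if doi else ("title", title)
            if key in seen:
                continue
            seen.add(key)
        unique.append(record)
        if not title or not (record.get("abstract") or "").strip():
            incomplete.append(record)
    return unique, incomplete


# Hypothetical example records.
example = [
    {"title": "Active Learning", "abstract": "Some text", "doi": "10.1/abc"},
    {"title": "Active learning (copy)", "abstract": "Some text", "doi": "10.1/ABC"},
    {"title": "No abstract here", "abstract": "", "doi": ""},
    {"title": "no  abstract HERE", "abstract": "", "doi": None},
]
unique, incomplete = clean_records(example)
```

One limitation of this sketch: a record with a DOI and a copy of the same record without one are not recognized as duplicates, so a manual check remains worthwhile.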
When to stop screening and how to export your results
Now that it is clear when to get started with ASReview LAB, the question remains when to stop. At which point can you be sure that you have found all relevant records? This is a fundamental question that is still being discussed. There is no golden answer available. Therefore, here are some possible strategies that you could adopt. Make sure to choose a strategy or define a stopping rule before you begin screening.
- Predetermined: Screen only X% of all records;
- Data-driven: Stop after finding X irrelevant papers in a row;
- Time-based: Stop after X hours.
The decision depends on how much time you are willing to spend screening, in combination with how problematic it is to fail to find relevant papers. Note that humans typically have an error rate of 10% due to screening fatigue. Simulations help estimate such error rates, but these require labeled data (note that ASReview also has a simulation mode).
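The three strategies above can be written down as a single check, which may help when you define your stopping rule up front. The sketch below is an assumption-laden illustration, not a feature of ASReview LAB: the function `should_stop` and all of its parameter names are hypothetical.

```python
def should_stop(labels, total_records, elapsed_hours=0.0,
                max_fraction=None, max_irrelevant_run=None, max_hours=None):
    """Return True once any configured stopping rule is met.

    labels             -- screening decisions so far, in order (1 or 0)
    total_records      -- number of records in the dataset
    elapsed_hours      -- time spent screening so far
    max_fraction       -- predetermined rule: share of records to screen
    max_irrelevant_run -- data-driven rule: irrelevant records in a row
    max_hours          -- time-based rule: hours of screening
    """
    if max_fraction is not None and len(labels) >= max_fraction * total_records:
        return True
    if max_hours is not None and elapsed_hours >= max_hours:
        return True
    if max_irrelevant_run is not None:
        # Count how many of the most recent decisions were irrelevant.
        run = 0
        for label in reversed(labels):
            if label == 1:
                break
            run += 1
        if run >= max_irrelevant_run:
            return True
    return False
```

For example, `should_stop([1, 0, 0, 0], total_records=100, max_irrelevant_run=3)` returns `True`, because the last three decisions were all irrelevant.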
For unlabeled data, the software offers some insightful graphs to keep track of your screening process so far. The pie chart at the top gives an overview of how many relevant (green) and irrelevant (orange) records you have screened so far. The plot in the middle is a progress plot, which shows a so-called moving average: it depicts the ratio between relevant and irrelevant records throughout the screening process. The bottom plot is a recall curve, which shows you the benefit of ASReview over manual screening.
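If you want to get a feel for what these plots summarize, the numbers behind them can be approximated from your list of decisions. The sketch below is an assumption about the general idea, not the exact computation used by ASReview LAB: it takes the moving average to be the share of relevant records among your most recent decisions, and the window size and function names are hypothetical.

```python
def moving_average(labels, window=10):
    """Share of relevant records among the last `window` decisions, per step."""
    averages = []
    for i in range(len(labels)):
        recent = labels[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


def cumulative_relevant(labels):
    """Number of relevant records found after each decision."""
    found, curve = 0, []
    for label in labels:
        found += label
        curve.append(found)
    return curve


# Hypothetical decisions in screening order (1 = relevant, 0 = irrelevant).
decisions = [1, 1, 0, 1, 0, 0]
print(moving_average(decisions, window=2))  # [1.0, 1.0, 0.5, 0.5, 0.5, 0.0]
print(cumulative_relevant(decisions))       # [1, 2, 2, 3, 3, 3]
```

A moving average that drops toward zero while the cumulative curve flattens suggests that few relevant records are still turning up.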
You can export the results from ASReview LAB in Excel or CSV format at any time you like. All your screening decisions are automatically saved on your computer! The export file contains all the records ranked from most relevant to least relevant and a column with the classification of each record (1 = relevant, 0 = irrelevant, empty = not labeled). You can use this information to start computing the analyses based on the included papers (e.g., for a meta-analysis); to read the full texts of the relevant records (i.e., you used ASReview merely for the abstract screening phase of the PRISMA steps); or to merge your results with those of a colleague, compute Cronbach’s alpha (reviewer consensus), and discuss the articles on which you disagree. Next to a file with the results, you can also export the project file. With the project file you can easily send the whole project to a colleague so they can continue the screening process, for example, or you can upload it to a trusted data repository for reproducibility purposes.
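As a starting point for working with a CSV export, the sketch below splits the records by their classification. The column names `title` and `included` and the function `split_export` are assumptions made for this example; check the header of your own export file and adjust `label_column` accordingly.

```python
import csv
import io


def split_export(csv_text, label_column="included"):
    """Group exported records by decision: 1, 0, or empty (not labeled)."""
    groups = {"relevant": [], "irrelevant": [], "unlabeled": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = (row.get(label_column) or "").strip()
        if label == "1":
            groups["relevant"].append(row)
        elif label == "0":
            groups["irrelevant"].append(row)
        else:
            groups["unlabeled"].append(row)
    return groups


# A tiny, made-up export with hypothetical column names.
sample = "title,included\nPaper A,1\nPaper B,0\nPaper C,\n"
groups = split_export(sample)
print([row["title"] for row in groups["relevant"]])  # ['Paper A']
```

To read a file from disk instead, you could pass the contents of the file, for example `split_export(open(path, newline="", encoding="utf-8").read())`, where `path` points to your export.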
How to cite ASReview?
Cite our project through this publication in Nature Machine Intelligence. For citing the software, please refer to the specific release of the ASReview software on Zenodo.
Contributing to ASReview
Do you have any ideas or suggestions that could improve ASReview? Create an issue, submit a feature request, or report a bug on GitHub! Also take a look at the development fund to help ASReview continue on its journey to easier systematic reviewing.