This blog post is meant to give insight into the most important aspects of ASReview LAB. It serves as a mini-guide to understand what it does, why it works, and how you can apply it. The flowcharts below depict (part of) the processes surrounding ASReview LAB and its Electronic Learning Assistant, Elas. We will go through the following topics one by one:
- The vocabulary of Elas;
- How to get started with the software;
- The general idea behind ASReview LAB: The active learning cycle;
- When to stop in screening (spoiler alert: this is up to you!);
- How to export your results;
- In which contexts ASReview LAB can be used.
The vocabulary of Elas
- Elas: Our Electronic Learning Assistant, the mascot of ASReview, who helps to explain the machinery behind ASReview.
- Dataset: A file consisting of records.
- Record: A record contains at least a single piece of text and can be unlabeled or labeled. The model is trained on the labeled record(s) and the unlabeled records are presented to the user to be screened and labeled. In the case of scientific systematic reviewing, a record contains the meta-data of scientific publications such as the title, abstract, and DOI number. If the DOI is available it will be presented as a clickable hyperlink to the full text.
- Text: the part of a record used for training the model. In the case of systematic reviewing it is the title and abstract of a scientific paper. Be aware of the impact of missing information; the performance of ASReview is optimal if your meta-data is complete.
- Labeling: Deciding whether a record is relevant or irrelevant.
How to get started with the software
The text below and the flowchart on the left are meant as an example in the context of doing a systematic review. This process can vary depending on the context in which you wish to use ASReview (see the last part of this blog post), as in principle any type of text can be processed by the software as long as it is uploaded in a specific format.
In a systematic review, a researcher needs to find all potentially relevant articles to answer the research question at hand. Using prespecified search terms, articles are found through different search engines (e.g., Web of Science, PubMed, etc.). The records are saved in a reference manager, such as Mendeley, Zotero, or EndNote, or simply in Excel. Thereafter, the cleaning of the records begins: removing duplicates and making sure that the record of each article contains a title and abstract (see this blogpost) and ideally also a DOI, so that we can refer you to the full text in case you want to read it. All of these steps result in a file containing the records of the possibly relevant articles.
To arrive at a file that can be read by ASReview LAB, you have to convert your records to RIS, CSV, or Excel format. This can be achieved by exporting your records from your reference manager. More information on which file formats are supported by ASReview LAB can be found here.
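As a hedged illustration of what such a preparation step could look like in Python (the file name and the original column names below are assumptions; check your own export), the sketch renames the columns to the lowercase names recognized by ASReview LAB, removes duplicates, and writes a CSV:

```python
# A minimal sketch, assuming your reference manager exported an Excel file with
# "Title", "Abstract", and "DOI" columns (names vary per tool). It writes a CSV
# with lowercase column names, which ASReview LAB recognizes.
import pandas as pd

df = pd.read_excel("my_search_results.xlsx")  # hypothetical file name
df = df.rename(columns={"Title": "title", "Abstract": "abstract", "DOI": "doi"})
df = df.drop_duplicates(subset=["title", "abstract"])  # basic deduplication
df[["title", "abstract", "doi"]].to_csv("records_for_asreview.csv", index=False)
```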
A quick guide on how to install ASReview is available on our website, and our quick tour will get you started in 5 minutes! Extensive documentation is available, including a description of features (like dark mode and changing the font size), FAQs, troubleshooting, running the software on a server, all technical details under the hood, and much more!
The general idea behind ASReview: Active learning cycle
Traditionally, going through a large number of records would take a lot of time (left part of the flowchart above). The order in which you screen these records is arbitrary: alphabetical by title or author, or simply the order in which the records were found. You would start at the top of the stack of records and gradually work your way through them until you have reached the very last one.
The essence of active learning as implemented in ASReview LAB is ‘simply’ to continuously reorder the stack of records based on relevance scores computed by a machine learning model. By reordering the records based on predicted relevance, you will find the relevant records much sooner than with the traditional approach. Moreover, you can stop before you reach the last of the records. This means that you can save time, which you can use for other work, for screening even larger datasets, or for relaxing!
How does this work? At the beginning of the screening process you are asked to provide Elas with at least one relevant and one irrelevant record. Based on this information, Elas learns what is considered to be important information and presents you with the record that is predicted to be the most relevant (this is called certainty-based sampling). You are then asked to screen the text and decide whether or not the record is relevant. Based on this decision, Elas updates its knowledge (i.e., re-trains the model on all labeled records, including the one you just labeled), reorders the unlabeled records, and presents you with the most relevant one. Now it is your turn again to decide whether the record is relevant or not. Thereafter, Elas again (and again) updates its knowledge, reorders the records, and presents you with the next record to be labeled. This means that while you keep screening, Elas continuously updates its predictions in the background and presents you with the record that is most relevant given your previous decisions. This iterative process is called active learning, or researcher-in-the-loop machine learning. It is important to realize that our software does not make any decisions; it only reorders the records.
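For readers who like to see the cycle in code, below is a minimal, self-contained sketch of the retrain–reorder–screen loop described above. It uses scikit-learn rather than ASReview's actual model code, and the example records and the automatic "oracle" at the end are made up for illustration:

```python
# A toy sketch of the active learning cycle (not ASReview's actual implementation):
# retrain a classifier after every label and re-rank the unlabeled records.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

records = [  # hypothetical titles/abstracts
    "meta-analysis of mindfulness interventions",
    "randomized trial of mindfulness for anxiety",
    "survey of smartphone usage among teenagers",
    "case study on bridge construction materials",
    "systematic review of mindfulness and depression",
]
labels = {0: 1, 3: 0}  # prior knowledge: one relevant (1) and one irrelevant (0) record

X = TfidfVectorizer().fit_transform(records)

while len(labels) < len(records):
    # 1. (Re)train the model on all records labeled so far
    train_idx = list(labels)
    model = MultinomialNB().fit(X[train_idx], [labels[i] for i in train_idx])

    # 2. Rank the unlabeled records by predicted relevance (certainty-based sampling)
    unlabeled = [i for i in range(len(records)) if i not in labels]
    scores = model.predict_proba(X[unlabeled])[:, 1]
    next_record = unlabeled[scores.argmax()]

    # 3. The human (the oracle) screens the record and provides the label;
    #    here a crude keyword check stands in for your expert decision.
    print(f"Please screen: {records[next_record]}")
    labels[next_record] = 1 if "mindfulness" in records[next_record] else 0
```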
A human (you!) makes the decision whether to mark a record relevant or irrelevant. We call this implementation in the software the Oracle Mode, because you are the oracle who knows which label to provide to each record (or at least you should know, because you are an expert in your field!).
When to stop screening and how to export your results
Now that it is clear how to get started with ASReview LAB, and how the screening works, the question remains when to stop screening records. Do you have to screen all the records? Or is there perhaps a certain point after which you can be sure that you have found all relevant records? This is a fundamental question that is still being researched. For now, if you are doing a systematic review, we suggest that you define a stopping rule before you begin screening. For example, you could decide to screen only 70% (or 5%, or 42%) of all records, to stop after you have found 50, 150, or 200 irrelevant records in a row, or to simply stop after the number of hours you have available for screening. The decision depends on how much time you are willing to spend screening, in combination with how problematic it is to fail to find relevant papers. Note that humans typically have an error rate of 10% due to screening fatigue. Simulations help to estimate such error rates, but these require labeled data (note that ASReview also has a simulation mode).
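As an illustration, a stopping rule such as "stop after N consecutive irrelevant records" can be written down very simply. The sketch below is a hypothetical helper you could apply to your own labeling history, not a built-in feature of ASReview LAB:

```python
# A minimal sketch of a user-defined stopping rule:
# stop screening after a chosen number of consecutive irrelevant labels.
def should_stop(decisions, n_consecutive_irrelevant=100):
    """decisions: labels in screening order, 1 = relevant, 0 = irrelevant."""
    tail = decisions[-n_consecutive_irrelevant:]
    return len(tail) == n_consecutive_irrelevant and not any(tail)

# Example: 30 relevant hits followed by 100 irrelevant records in a row -> stop.
print(should_stop([1] * 30 + [0] * 100))  # True
```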
For unlabeled data, the software offers some insightful graphs to keep track of your screening progress so far. The pie chart on the left gives an overview of how many relevant (green) and irrelevant (orange) records you have screened so far. The plot in the middle is a progress plot showing a so-called moving average: throughout the screening process, it depicts the ratio of relevant to irrelevant records. The third plot presents the recall curve, which is the same as in the GIF above.
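To make the idea of a moving average concrete, the toy sketch below (with a made-up labeling history) computes the proportion of relevant records within a sliding window of your most recent decisions, which is the quantity such a progress plot tracks over time:

```python
# A rough sketch of the moving average behind a progress plot.
import numpy as np

decisions = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0])  # made-up labeling history
window = 4
moving_avg = np.convolve(decisions, np.ones(window) / window, mode="valid")
print(moving_avg)  # drops toward 0 as relevant records become scarce
```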
After you have reached the point defined by the stopping rule (or actually any time you want, because your results are automatically saved!), you can export the results in Excel or CSV format. The export file contains all the records ranked from most relevant to least relevant, plus a column with the classification of each record (1 = relevant, 0 = irrelevant, empty = not labeled). You can now use this information to compute the analyses based on the included papers (e.g., for a meta-analysis), to read the full texts of the relevant records (i.e., you used ASReview merely for the abstract screening phase of the PRISMA steps), or to merge your results with the results of a colleague, compute Cronbach's alpha (reviewer consensus), and discuss the articles on which you disagree. In addition to a file with the results, you can also export the project file. With the project file you can easily send the whole project to a colleague so that they can, for example, continue the screening, or you can upload the project file to a trusted data repository for reproducibility purposes.
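If you prefer to post-process the exported file programmatically, a hedged pandas example could look like the following. The file name is hypothetical and the label column name is an assumption; check the header of your own export:

```python
# A sketch of post-processing the exported results with pandas; the column
# name "included" is an assumption — check the header of your own export.
import pandas as pd

results = pd.read_csv("asreview_result_myproject.csv")  # hypothetical file name

relevant = results[results["included"] == 1]      # records you marked relevant
irrelevant = results[results["included"] == 0]    # records you marked irrelevant
unlabeled = results[results["included"].isna()]   # never screened (stopped early)

print(f"{len(relevant)} relevant, {len(irrelevant)} irrelevant, {len(unlabeled)} unseen")
relevant.to_csv("records_for_full_text_screening.csv", index=False)
```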
In which contexts ASReview LAB can be used
ASReview LAB was originally designed for the screening of abstracts within a systematic review (hence the name!); however, ancient Greek texts, legal texts, and news articles have also been imported into the software for screening. In general, the software can be used by anyone who has to screen a large pile of text documents to find a few relevant ones. Some other possible examples can be found in the colored boxes in the flowchart below. As long as your dataset is in the right format, you can put it into ASReview LAB. Let the screening process commence!
How to cite ASReview?
The preprint on arXiv can be used to cite this project, and we expect the paper to be published any moment now (curious to know in which journal? Follow Rens on Twitter or LinkedIn to be the first to find out). For citing the software, please refer to the specific release of the ASReview software on Zenodo. More papers on ASReview will be listed on our website, including a statement on why we value Open Science so much.
Contributing to ASReview
Although the software is free (and Open Source), its development is not… We are research-focused and not a commercial company (which we do not want to become). So, if you want to contribute, please donate some money via our crowdfunding (even small donations are highly appreciated) or support the development!