Systematic Review: Systematic Reviews and AI
Introduction
Systematic Reviews, Scoping Reviews, and other types of literature reviews are widely recognized for their rigorous methodologies and the considerable time required for their completion. Recent advancements in artificial intelligence (AI) have led to the recognition of its potential in enhancing and streamlining various aspects of the review process. AI tools are capable of assisting in tasks such as search strategy development, identification of relevant articles, screening, deduplication, data extraction, and synthesis.
Despite their promise, it is essential to acknowledge that AI tools may introduce errors or unforeseen biases into the review process. As such, it is crucial to apply these tools in conjunction with established, validated methods to ensure the quality and reliability of the review outcomes.
The AI tools listed here are examples of those available. We do not promote one tool over another; rather, reviewers are encouraged to assess and select tools based on their specific needs and the requirements of the review. Furthermore, AI tools continue to evolve, and it is advisable to remain informed about new developments that could enhance the systematic review process.
Transparency is a fundamental component of any review. Therefore, if AI tools are employed during the review process, their use should be clearly documented and reported.
For further guidance, please refer to the Responsible AI in Evidence Synthesis (RAISE) guidance and recommendations, which offer best practices for the ethical application of AI in evidence synthesis.
Is it Safe to use ChatGPT to Write your Search?
Large language models (LLMs), such as ChatGPT, are increasingly being used to help develop search strategies for reviews. However, while they do have a role in discovering keywords for search strategies, they are not yet reliable enough to be used without verification and analysis.
The University of Toronto has written a detailed guide on the pros and cons of using tools like ChatGPT to create search strategies. The guide points out several shortfalls of these tools, such as:
- ChatGPT inserts imaginary MeSH terms (i.e. it uses MeSH headings that do not exist as part of the search)
- ChatGPT does not reproduce the same search even when the exact same prompts are used
- ChatGPT cannot currently produce search strategies for proprietary databases (e.g. Ovid MEDLINE, Embase, PsycINFO)
- ChatGPT cannot produce phrases with proximity operators, which can help reduce imprecision while maintaining recall
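One practical safeguard against the first shortfall above is to check every MeSH term an LLM suggests against an authoritative vocabulary before running the search. A minimal sketch in Python; the term list and "LLM output" below are invented for illustration, and in practice you would load the full MeSH vocabulary (e.g. from the NLM MeSH Browser):

```python
def check_mesh_terms(candidates, valid_mesh):
    """Split LLM-suggested MeSH terms into verified and suspect lists."""
    verified = [t for t in candidates if t in valid_mesh]
    suspect = [t for t in candidates if t not in valid_mesh]
    return verified, suspect

# Tiny illustrative sample of real MeSH headings, not the full vocabulary.
valid_mesh = {"Diabetes Mellitus", "Exercise", "Quality of Life"}

# Hypothetical LLM output: the last term does not exist in MeSH.
llm_terms = ["Diabetes Mellitus", "Exercise", "Glycemic Wellness"]

verified, suspect = check_mesh_terms(llm_terms, valid_mesh)
print("Verified:", verified)
print("Needs manual review:", suspect)
```

Any term landing in the "needs manual review" list should be checked by hand rather than pasted into the search.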
The guide is informed by an article by Wang et al., listed in the reading list below.
We recommend reading both the University of Toronto's guide and the article by Wang et al. to inform yourself of the limitations of using tools like ChatGPT to create search strategies.
Reading list
Guimarães, N. S., Joviano-Santos, J. V., Reis, M. G., Chaves, R. R. M. and Observatory of Epidemiology, N. H. R. (2024) 'Development of search strategies for systematic reviews in health using ChatGPT: a critical analysis', Journal of Translational Medicine, 22(1), pp. 1. DOI: 10.1186/s12967-023-04371-5.
Parisi, V. and Sutton, A. (2024) 'The role of ChatGPT in developing systematic literature searches: an evidence summary', Journal of EAHIL, 20(2), pp. 30-34.
Wang, S., Scells, H., Koopman, B. and Zuccon, G. (2023) 'Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?', Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1426–1436. DOI: 10.1145/3539618.3591703.
Data Extraction
Data extraction is the stage of a systematic review that occurs between identifying eligible studies and analysing the data. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and, for quantitative synthesis, to collect the necessary data to carry out meta-analysis (Taylor, Mahtani & Aronson, 2021). A variety of AI tools can assist with this step. However, as ever, researchers need to be cautious and check the results of any work completed by AI tools during data extraction to ensure accuracy and relevance.
Elicit
Elicit can extract data from PDFs you upload, saving you time and allowing you to then synthesise the information. The free basic version allows you to upload your own papers and extract data from them. However, only the paid versions of the product will give you summaries of papers and allow you to export the extracted information in CSV and BibTeX formats.
Users should be aware that Elicit does not remove the need for human oversight: while it may replace some of the lower-level tasks in data extraction, it “is not able to perform high-level cognitive functions that are required to create an understanding and synthesize the literature” (Whitfield and Hofmann, 2023, p. 204).
SysRev
SysRev is an online platform that assists with systematic reviews, including machine learning tools for screening and data extraction, and can be used for basic or complex data extraction. Note that it does not offer a deduplication feature at this time. Free and paid versions are available.
Large language model (LLM) tools
LLMs like ChatGPT are increasingly seen as having a role to play in data extraction. Recent research found “agreement between ChatGPT-4o and human reviewers to be greater than 80% for data extraction in systematic reviews, with a 92.4% agreement between ChatGPT-4o and human reviewers” (Motzfeldt et al., 2025, p. 10).
Once again, it should be noted that LLMs should be used to support data extraction and not as the primary rater.
Success when using LLMs to extract data depends on the quality of the prompts developed. An iterative development process is recommended to make sure that the prompts are sufficiently robust. Check out our AI Guide for information on prompt engineering best practice.
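The kind of agreement figure reported by Motzfeldt et al. can be computed on your own pilot data with a simple percent-agreement calculation between the LLM's extractions and a human reviewer's. A minimal sketch; the field names and values below are invented for illustration:

```python
def percent_agreement(human, llm):
    """Percentage of extracted fields where the human and LLM values match."""
    if not human:
        raise ValueError("no fields to compare")
    matches = sum(1 for field in human if llm.get(field) == human[field])
    return 100.0 * matches / len(human)

# Hypothetical extraction results for one included study.
human_extraction = {"sample_size": 120, "design": "RCT",
                    "outcome": "HbA1c", "follow_up_weeks": 12}
llm_extraction = {"sample_size": 120, "design": "RCT",
                  "outcome": "HbA1c", "follow_up_weeks": 24}

# 3 of the 4 fields agree, so agreement is 75.0%.
print(percent_agreement(human_extraction, llm_extraction))
```

Running a check like this on a sample of studies before trusting the LLM as a second rater makes the iterative prompt refinement described above measurable: revise the prompt, re-extract, and see whether agreement rises.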
Motzfeldt Jensen, M., Brix Danielsen, M., Riis, J., Assifuah Kristjansen, K., Andersen, S., Okubo, Y. and Jørgensen, M. G. (2025) 'ChatGPT-4o can serve as the second rater for data extraction in systematic reviews', PLOS ONE, 20(1), pp. e0313401. Available at: https://doi.org/10.1371/journal.pone.0313401.
Taylor, K. S., Mahtani, K. R. and Aronson, J. K. (2021) 'Summarising good practice guidelines for data extraction for systematic reviews and meta-analysis', BMJ Evidence-Based Medicine, 26(3), pp. 88. Available at: http://ebm.bmj.com/content/26/3/88.abstract.
Whitfield, S. and Hofmann, M. A. (2023) 'Elicit: AI literature review research assistant', Public Services Quarterly, 19(3), pp. 201-207. Available at: https://doi.org/10.1080/15228959.2023.2224125.
Citation Analysis
Citation analysis is an integral element of systematic reviews. Its importance has been highlighted by the recent development of the TARCiS statement, which details how researchers can consistently undertake and report citation searching (Hirt et al., 2024). Databases like Scopus and Web of Science allow for citation analysis, and AI tools are increasingly being used for this purpose as well.
ResearchRabbit
ResearchRabbit is a free online citation-based literature mapping tool that visually maps the literature and allows you to track citations.
Lens.org
Lens.org provides tools to explore the linkages between documents and entities by viewing and analysing the citing and cited scholarly works or patents. It offers two scholarly citation sets to explore: citing scholarly works (forward citations) and cited scholarly works (backward citations).
Dimensions.ai
Dimensions.ai facilitates citation analysis by providing a database of over 106 million publications and 1.2 billion citations, enabling users to search, extract, and analyse research data, including citation metrics.
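The distinction between forward and backward citation searching that these tools support can be illustrated with a tiny citation map; the papers and citation links below are invented for illustration:

```python
# cites[paper] = set of papers that this paper cites (its reference list).
cites = {
    "Paper A": {"Paper B", "Paper C"},
    "Paper B": {"Paper C"},
    "Paper D": {"Paper A", "Paper C"},
}

def backward_citations(paper):
    """Papers cited by `paper` -- found by reading its reference list."""
    return cites.get(paper, set())

def forward_citations(paper):
    """Papers that cite `paper` -- found by searching a citation index."""
    return {p for p, refs in cites.items() if paper in refs}

print(sorted(backward_citations("Paper A")))  # ['Paper B', 'Paper C']
print(sorted(forward_citations("Paper C")))   # ['Paper A', 'Paper B', 'Paper D']
```

Backward citations are fixed at publication, while forward citations grow over time, which is why citation indexes like those above are needed to find them.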
Hirt, J., Nordhausen, T., Fuerst, T., Ewald, H. and Appenzeller-Herzog, C. (2024) 'Guidance on terminology, application, and reporting of citation searching: the TARCiS statement', BMJ, 385, pp. e078384. Available at: https://www.bmj.com/content/bmj/385/bmj-2023-078384.full.pdf.