AI for social good: Making AI work for health systems

Photo credit: Jahangir Alam Onuchcha, Shutterstock.com

Why AI matters for health systems

Health systems around the world are overburdened, with more than half of the global population lacking full access to essential health services and an estimated shortage of up to 11 million health workers by 2030. These challenges are more pronounced in low- and middle-income countries, where over 8 million people die every year from treatable conditions. Sixty percent of these deaths occur due to poor quality of care. 

AI has the potential to improve access to and quality of care by, for example, reducing the cost of training health workers, enabling triage and improved diagnosis, automating administrative tasks, and offering timely, personalized guidance to patients. Innovators are already building and deploying many AI use cases in low-income settings. 

However, evidence on the real-world impacts of AI solutions on providers and patients is scarce. Closing these evidence gaps is key to ensuring policymakers can responsibly scale AI that supports healthcare workers and patients.

Importance of evidence in health care

Evidence from survey experiments suggests that simply giving access to a large language model (LLM) that performs well on its own is unlikely to be enough to improve health outcomes in the real world. For example, neither physicians in the United States nor radiologists located around the world improved the accuracy of their diagnoses when given access to an AI tool that had outperformed humans on its own. 

These studies found that physicians struggled to interact with the AI tools effectively: radiologists incorrectly trusted their own judgment more than the AI, and acted as if their own assessments were completely independent of the AI’s predictions, even though they were not. The problem was not a failure of the technology itself, but the limitations of how humans interacted with it.

Similarly, in the United Kingdom, LLMs achieved high accuracy when tested on hypothetical medical scenarios, correctly identifying conditions in 95 percent of cases. Without LLMs, laypeople correctly identified conditions in 47 percent of these cases. However, when laypeople used the LLMs to assess the same scenarios, their diagnoses were less accurate than those of people not using LLMs, correctly identifying conditions in only 35 percent of cases. The drop in accuracy appeared to stem from poor communication between users and the LLMs: users gave the LLMs incomplete information, and the LLMs communicated poorly in return.

These findings indicate that, even in hypothetical scenarios, AI tools can fail to improve human decisions, underscoring the importance of evaluating the impacts of AI tools in the hands of the people using them, rather than relying solely on technical benchmarks.  

Evidence from AI interventions in health care

While evidence on the impacts of AI solutions is still scarce, a handful of evaluations highlight the potential for well-designed, context-specific interventions to improve health outcomes. For example, in Kenya, pregnant women in randomly assigned antenatal care facilities received access to PROMPTS, an SMS-based, AI-enabled virtual helpdesk that answered questions and connected them to health professionals in high-priority cases. Women with access to the platform reported a range of modest improvements in knowledge, birth preparedness, and newborn care behaviors such as sleep positioning and parental engagement, demonstrating the potential for low-cost, scalable tools to strengthen routine maternal care.

When explicitly designed for use in real-world settings, AI tools have the potential to improve frontline workers’ diagnostic capabilities. For example, a social enterprise called Penda Health, which operates a network of medical centers throughout Nairobi, developed an AI-based clinical decision support tool for clinicians. The tool operated within Penda Health’s electronic medical record system and was integrated into regular workflows for a randomly assigned half of clinicians in participating medical centers. Clinicians with access to the tool made fewer diagnostic and treatment errors and reported that it improved the quality of care they provided, highlighting the promise of well-integrated AI tools.

In a J-PAL-funded study, researchers are evaluating an AI-based referral tool that allows volunteer community health workers to predict the risk of a “silent,” or hard-to-detect, heart attack during household visits using a small, portable electrocardiogram device. This tool has the potential to be a scalable, evidence-backed model for delivering diagnostic care through government-run health systems to underserved populations.

Photo credit: CatherineLProd, Shutterstock.com

Beyond physical health, AI is also being applied for mental health and emotional well-being. Among adults in the United States, one early study found that access to a generative AI therapy chatbot led to reductions in symptoms of major depressive disorder and generalized anxiety disorder. Also in the United States, researchers are exploring the impact of an AI-driven mobile app on mental health for low-income Hispanic immigrants. 

Despite this encouraging early evidence across health domains, AI tools can also involve significant risks: exhibiting or exacerbating bias, responding inappropriately to sensitive mental health discussions, or lacking the human qualities needed to build trusted patient-provider relationships. Continued scrutiny of these risks is an important avenue for future research and implementation.

Using AI to enhance proven health interventions

Decades of rigorous evaluation have identified effective approaches to improving health outcomes, and AI has the potential to improve the efficiency, impact, and reach of these programs. For example, research in the J-PAL network shows that AI tools may improve the targeting of conditional cash transfers aimed at boosting child health to ensure they reach the households where they are most needed. It also has the potential to enhance the monitoring of health workers to support pay-for-performance programs, and to strengthen information campaigns and behavioral nudges by delivering personalized content promoting healthy behaviors and responding to questions in real time.

Gaps in research and barriers to AI adoption

Many urgent questions remain unanswered at the intersection of AI and health. With lives at stake, priorities for future research include:

  1. Behavior change and equity: How can AI-assisted care and mobile platforms increase the adoption of evidence-based health behaviors and reduce inequality in service delivery, particularly for underserved groups?
  2. Provider efficiency and quality of care: How can AI best improve the efficiency and effectiveness of health care providers through, for example, diagnostic support, automated note summaries, or chatbot-based patient intake?
  3. Adoption and workforce capacity: What factors influence the adoption of promising health AI tools, and how can AI augment the skills of health workers in low-resource or rural settings to enhance decision-making and job satisfaction?
  4. Costs and outcomes: How does the cost-effectiveness of health AI tools compare to alternatives, and how can costs be minimized without affecting patient health outcomes?
  5. Scalability of high-tech solutions: Beyond specific use cases, research should explore how to overcome barriers that could limit scaling AI-based health tools. For example, access to the internet and a mobile phone is a prerequisite for using many AI solutions; failure to consider these issues risks deepening exclusion of already marginalized groups. In addition, ensuring the availability of clear escalation and referral pathways can help to prevent harmful overreliance that could put patients at risk.
  6. Building trust for sustained impact: How can AI solutions ensure user trust so that people continue to adhere to and follow up on AI-based health advice? Trust can be undermined if LLM-based systems produce inconsistent or inaccurate responses. For instance, models might make biased medical recommendations based on patients' sociodemographic characteristics that are not supported by clinical reasoning or guidelines.
  7. Guardrails and data privacy: Because the health sector involves private and sensitive data, ensuring the safety, reliability, and integrity of these tools is essential.

As the use of AI in health care expands rapidly, a deliberate focus on evaluation is critical to determine whether specific tools truly strengthen health systems or simply add complexity. J-PAL’s upcoming AI Evidence Playbook will delve deeper into these questions, guiding policymakers and practitioners on how to harness AI responsibly and effectively for better health outcomes.

Read other posts in this series.
