AI for Social Good
We need rigorous impact evaluations of AI in the social sector to ensure that it promotes social welfare.
People in many countries face persistent challenges to employment. Even when jobs exist, specific opportunities can be hard to identify, difficult to get, and might not pay enough. Workers looking for jobs need employable skills and knowledge of where jobs are. Businesses and entrepreneurs, on the other hand, often struggle to grow, limiting how many jobs they can create. How can AI tools support workers and businesses to address these challenges?
Researchers in the J-PAL network have been using randomized evaluations to learn how to help people get jobs and how to help businesses grow for over two decades, drawing out insights to inform policies and programs from Brazil to sub-Saharan Africa and beyond. Researchers have also started testing how new AI interventions can support workers and firms to improve economic outcomes for people in low- and middle-income countries (LMICs).
AI has the potential to support workers and businesses in LMICs by both uncovering new evidence-informed solutions and making existing solutions more effective.
One clear pathway is by increasing worker performance. Early research on equipping workers with AI tools tends to show positive impacts on productivity.
For example, in ongoing research in the Czech Republic, small and mid-sized businesses whose workers received training on using AI in operations, marketing, customer management, product development, and financial management saw their profits grow compared to similar firms. Across Boston Consulting Group offices around the world, providing strategy consultants with access to a company-specific AI platform improved productivity and quality on knowledge-intensive tasks.
Within businesses, this early evidence suggests that AI tools lead to larger gains for workers with lower skills or less experience. In the Boston Consulting Group example above, AI-driven improvements were larger for consultants who were performing below average before the study began.
Similarly, a quasi-experimental study with customer support agents in the Philippines and the United States finds that providing agents with chatbot-based suggestions for client responses and links to internal documentation increased their performance. Improvements were largest among lower-skilled agents, while higher-skilled agents receiving chatbot access experienced a small dip in work quality.
In the US, giving software developers at large companies access to AI assistants increased the number of tasks they completed compared to developers without access. Less-experienced developers saw larger productivity gains.
AI tools may help match workers to suitable jobs more efficiently by reducing search barriers for both job-seekers and employers.
For example, in Ghana, an automated application review tool screened prospective teachers more effectively than “human-only” and “human-with-AI-assistance” alternatives, resulting in a higher rate of offers and hires at the interview stage. Three ongoing projects will evaluate tools designed to improve matching: a career counseling and recommendation tool in Kenya; a platform to help hiring managers assess applicants in Mexico; and a system using administrative data to recommend applicants for existing vacancies in France.
However, improving performance and matching only go so far if businesses aren’t growing and expanding their hiring.
AI and other innovative tools can help banks and other lenders identify entrepreneurs and businesses with high growth potential, providing them with resources to grow and create jobs. For example, in Egypt, an AI-based credit scoring system that incorporated borrowers’ personality traits helped lenders allocate larger loans to client businesses that were more likely to grow with more capital, while flagging others who were unlikely to see their profits grow with the larger loan.
AI may help business owners and leaders make better decisions through customized advice, creating more jobs as they grow their existing enterprises or launch new ones. While giving traditional business training to entrepreneurs tends to have modest effects, evidence shows that more customized or consulting approaches can have greater impacts. Using AI may enable personalized and more effective support to help improve business practices. Among new sellers on a Chinese e-commerce site, for example, those receiving business training materials that an AI tool selected for them based on real-time performance data experienced higher revenues than sellers who were not offered training.
However, emerging evidence also shows not all entrepreneurs may benefit from general purpose tools. For example, an early study from Kenya found that small business owners who were offered access to an “AI business assistant” chatbot did not improve business performance on average. But researchers also found that, among this group of entrepreneurs, businesses that were already more successful before being introduced to the AI tool increased their revenues and profits—yet the chatbot reduced revenues and profits for those who were less successful. Researchers suggest that the difference in outcomes stemmed from how the business owners picked and implemented specific pieces of AI advice.
How will the role of workers change in economies shaped by AI? Researchers within and beyond the J-PAL network are assessing AI’s broader economic implications, including the possibility that AI’s widespread use could cause firms to hire fewer people in LMICs or lower their wages. For example, one study identified relative declines in employment for early-career workers in AI-exposed fields in the United States. As AI adoption spreads in LMICs, automation may reduce the need for workers with certain skills, with some groups benefiting while others lose out.
If not designed and monitored with care, AI-based matching and targeting systems to improve jobs and business growth may also exacerbate inequality. These AI tools identify hidden patterns in data that predict successful trainees or candidates. Depending on how the underlying model is trained, it may reinforce biases against marginalized groups. The existing evidence is mixed: in one case, giving reviewers an AI-generated score of applicants reduced discrimination against women. In another study that leveraged a Fortune 500 company’s historical hiring data, a tool that predicted candidate quality worsened discrimination unless it was carefully designed to consider target groups who are less represented in the data.
Careful design may allow AI to instead close gaps—helping underserved groups access credit, prepare for disasters, and maintain housing.
Further research should explore how to leverage AI tools for improved skills, productivity, and wellbeing, and how to avoid inadvertent harm.
AI adoption presents both opportunities and challenges for LMICs trying to support workers and businesses and foster economic growth that benefits more people. To chart this emerging frontier, J-PAL’s forthcoming AI Evidence Playbook will summarize what we know—and what we still need to learn—about AI’s role across sectors.
Beyond labor markets and business growth, AI offers enormous potential in many other sectors. The next blog in this series will focus on applications relevant for health outcomes—stay tuned!
Read other posts in this series.
Artificial intelligence is reshaping how we learn, work, and govern, much like earlier general-purpose technologies that rewired entire economies. The social sector is no exception: governments, NGOs, and social enterprises are rapidly weaving AI into programs, from education to financial inclusion to social protection.
In addition to questions about AI’s big-picture impacts, pressing questions remain about the effects specific use cases will have on outcomes in the real world.
History gives us reason to be careful. Over the past two decades, new technologies launched with great promise often fell short when context, delivery, and incentives were overlooked—from laptops in classrooms to clean cookstoves. To avoid repeating those mistakes, policymakers and funders should invest in evidence-informed programming and real-world evaluations that measure what changes for people and surface unintended effects. That’s how we leverage both evidence and innovation to learn what actually works, and for whom.
This post, the first in a series, outlines how we can bring this discipline to AI in the social sector. Drawing on a growing base of evidence and J-PAL’s earlier article in the Stanford Social Innovation Review, we outline six promising opportunities for AI to contribute to social good, highlighting emerging and ongoing research:
Governments and NGOs often struggle to identify who needs support the most. Traditional methods for collecting this information, such as surveys and administrative data, can be slow, expensive, and inaccurate (see Chapter 4 of the Social Protection Initiative Evidence Review for a synthesis of existing evidence on targeting).
AI can improve targeting by analyzing vast data to predict needs more accurately, allowing for earlier and/or more targeted action. For example, in Togo, using algorithms to analyze satellite and phone data enabled rapid, cost-effective, targeted cash transfers during COVID-19. Researchers are working with Google in India to test strategies for disseminating AI-powered flood forecasts that alert households of likely flooding in their area. In another example, researchers are building a predictive model to identify families at risk of eviction in Missouri so that timely assistance reaches them before displacement.
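The broad logic behind these targeting systems is to predict a welfare measure from alternative data, then direct limited resources to those predicted to need them most. The sketch below is purely illustrative: the households, phone-usage features, and model weights are all invented, standing in for a model that in practice would be trained against ground-truth survey data.

```python
# Illustrative sketch of phone-data-based targeting: rank households by a
# predicted-consumption score built from phone-metadata features, then direct
# cash transfers to the poorest fraction. All values are hypothetical.

# Each household: simple phone-usage aggregates (invented for illustration).
households = {
    "hh1": {"calls_per_day": 1.2, "topup_monthly": 0.5, "contacts": 4},
    "hh2": {"calls_per_day": 6.0, "topup_monthly": 9.0, "contacts": 40},
    "hh3": {"calls_per_day": 0.4, "topup_monthly": 0.2, "contacts": 2},
    "hh4": {"calls_per_day": 3.1, "topup_monthly": 4.0, "contacts": 18},
}

# Hypothetical weights: lower phone activity predicts lower consumption.
# A real model would be fit to household survey data, not set by hand.
WEIGHTS = {"calls_per_day": 0.5, "topup_monthly": 1.0, "contacts": 0.1}

def predicted_consumption(features):
    """Linear consumption proxy from phone-usage features."""
    return sum(WEIGHTS[k] * v for k, v in features.items())

def target_transfers(households, budget_share=0.5):
    """Select the poorest `budget_share` fraction by predicted consumption."""
    ranked = sorted(households, key=lambda h: predicted_consumption(households[h]))
    k = max(1, int(len(ranked) * budget_share))
    return ranked[:k]

print(target_transfers(households))  # → ['hh3', 'hh1']
```

The design choice that matters in practice is the accuracy of the prediction step: targeting errors fall directly on the households the model misranks, which is why evaluations like the Togo study compare algorithmic targeting against survey-based alternatives.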
AI tools can expand access to context-specific information, benefits, and programs through personalized content. In Kenya, providing pregnant women with informational messages and an AI-powered helpdesk led to improved knowledge, birth preparedness, and newborn and postpartum care. In the United States, an AI-powered chatbot to support admitted students in the summer before their first year of university increased on-time enrollment in one context—but increased enrollment only for a subset of students in another, emphasizing the need for continued evaluation as interventions expand to new contexts.
Through triaging and personalized training, AI can help frontline workers, including health workers, teachers, and law enforcement officers, focus their limited time on the highest-value work. For example, automated essay grading and feedback enabled teachers in Brazil to spend more time engaging with students, improving student performance on a standardized exam. Through J-PAL funding, researchers are also assessing the impact of an AI-based low-cost referral tool that predicts the risk of silent heart attacks in India.
AI-enabled solutions can improve the efficiency of organizations by augmenting capabilities, optimizing service delivery, and automating labor-intensive processes. In Mexico, informing workers about conciliation services and machine learning predictions of case outcomes increased immediate settlements when workers personally received the information, resulting in reduced caseloads for courts. In Chile, providing procurement officers with AI-assisted monthly performance reports improved efficiency, but only when the reports were also visible to managers.
Researchers have also identified opportunities for AI to improve efficiency in hiring at both a nonprofit in Ghana and a recruitment firm in the Philippines. Researchers are now exploring related questions with a recruitment software platform in Mexico, and also with the Public Employment Service in France.
AI can amplify bias if trained on skewed data or deployed without safeguards such as independent audits and human oversight. However, when carefully designed, it has the potential to reduce human bias in decision-making.
J-PAL affiliated researchers have developed new, diversity-enhancing algorithms for resume screening during hiring processes. Ongoing research is evaluating whether algorithms can help teachers in Italy reduce gender discrimination in educational opportunities.
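One approach in this literature treats screening as an exploration problem: candidates from groups the firm has rarely evaluated receive an uncertainty bonus on top of their predicted quality, so the rule keeps learning about underrepresented groups rather than only reproducing historical hiring patterns. The sketch below is a stylized, hypothetical version of such an upper-confidence-bound (UCB) rule, not the researchers' actual algorithm; names and numbers are invented.

```python
import math

# Stylized UCB-style screening: rank candidates by predicted quality plus an
# exploration bonus that is larger for groups with few past observations.
# All candidates, group sizes, and the bonus scale are hypothetical.

def ucb_score(pred_quality, n_group_observed, n_total_observed, c=1.0):
    """Predicted quality plus an uncertainty bonus for thinly observed groups."""
    bonus = c * math.sqrt(math.log(n_total_observed) / max(1, n_group_observed))
    return pred_quality + bonus

candidates = [
    {"name": "A", "pred": 0.70, "group_n": 500},  # well-represented group
    {"name": "B", "pred": 0.65, "group_n": 10},   # rarely observed group
]
N_TOTAL = 510  # total past candidates observed

ranked = sorted(
    candidates,
    key=lambda cand: ucb_score(cand["pred"], cand["group_n"], N_TOTAL),
    reverse=True,
)
print([cand["name"] for cand in ranked])  # → ['B', 'A']
```

Here candidate B ranks first despite a slightly lower predicted quality, because the model has seen so few candidates from B's group that its prediction is uncertain; a purely exploitative ranking would always pick A.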
Many low- and middle-income countries face severe fiscal constraints, making it harder to address pressing challenges like poverty and climate change. AI can help governments mobilize resources more effectively and support a shift toward fairer taxation by improving compliance, detecting fraud and leakages, and strengthening revenue collection and management. For instance, a “digital property tax census” in Senegal led to more accurate and progressive property value assessments than relying on bureaucrats’ in-person assessments.
It’s essential to build these tools to be useful within real-world systems. In India, a machine learning tool flagged tax fraud more accurately than existing methods, but tax collection did not improve, in part due to institutional processes that made it difficult to act on the improved recommendations. Examples like this highlight the importance of context-appropriate interventions and continued evaluation.
AI offers an opportunity to help governments, donors, and civil society organizations do more with less while staying evidence-informed. But responsible adoption demands concurrent impact evaluation and transparency about results.
In the coming months, we will feature sector deep dives, from education and health to labor and firms, highlighting where AI is showing promise, where evidence is mixed, and what it takes to deliver impact in practice. Over time, we will also write about using AI for social science research and host guest blogs from researchers within our network. Stay tuned for the launch of J-PAL’s AI Evidence Playbook, a guide that distills what we know and still need to learn about AI’s role across the social sector.
Are you a funder interested in being part of this effort? Contact us—and to stay up to date on the latest AI research and policy efforts, subscribe to our eNews and select “artificial intelligence” as an interest area in the subscription form.
Artificial intelligence (AI) has the potential to support teachers in completing time-intensive, subjective tasks, such as grading essays, and to provide individualized feedback to students on their writing. However, evidence on whether and how AI impacts students’ writing skills remains limited. To fill this gap, researchers evaluated the learning impacts of using AI systems to score and comment on essays written for Brazil’s national post-secondary admission exam. In schools where AI technology was introduced, teachers were able to provide more frequent individualized feedback to students, and students’ essay scores improved as a result.
Improving learning is one of the most pressing goals for educational policy in low- and middle-income countries. A common barrier to learning is that teachers are time-constrained and often have to balance providing individualized assistance to students with performing routine tasks, like grading. This is especially true for the development of writing skills, which requires teachers to spend long hours grading.
Educational technologies (ed techs) could help alleviate this challenge by performing the operational parts of teaching and allowing teachers to reallocate time from less to more complex tasks (e.g., one-on-one feedback to students). For example, automated writing evaluation (AWE) systems can potentially help improve writing by using natural language processing and machine learning algorithms to predict scores and allocate feedback, thereby reducing the amount of time teachers need to spend on grading. However, there is little evidence on the effectiveness of ed techs or artificial intelligence (AI) focused on writing. Are AWE systems effective in improving students’ writing skills?
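At its core, an AWE system maps features of an essay to a predicted score. The toy sketch below uses a few surface features (length, lexical diversity, average word length) with invented weights; real AWE systems instead train NLP and machine learning models on large sets of human-graded essays, so both the features and the weights here are assumptions for illustration only.

```python
import re

# Toy sketch of the idea behind automated writing evaluation (AWE): extract
# surface features from an essay and map them to a score with a linear model.
# Features and weights are invented; real systems fit them to graded essays.

def essay_features(text):
    """Extract simple surface features from an essay."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    n = len(words)
    return {
        "length": n,                                       # word count
        "vocab_ratio": len(set(words)) / n if n else 0.0,  # lexical diversity
        "avg_word_len": sum(map(len, words)) / n if n else 0.0,
    }

# Hypothetical weights; a trained model would estimate these from data.
WEIGHTS = {"length": 0.02, "vocab_ratio": 3.0, "avg_word_len": 0.5}

def predict_score(text):
    """Linear combination of features: a stand-in for a trained scoring model."""
    feats = essay_features(text)
    return sum(WEIGHTS[k] * feats[k] for k in WEIGHTS)

short_essay = "The cat sat."
longer_essay = ("Education policy requires sustained investment, careful "
                "evaluation, and attention to equity across regions.")
print(predict_score(longer_essay) > predict_score(short_essay))  # → True
```

The value of such a system in the classroom does not hinge on the model outgrading humans; it hinges on freeing teacher time from routine scoring, which is the mechanism the evaluation below tests.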
Providing high-quality education is a challenge in Brazil, especially when it comes to language skills. According to the 2018 PISA exam, a worldwide study of students’ scholastic performance, the average 15-year-old Brazilian student scored 413 points on reading, compared to an average of 487 points in all OECD countries.1
In response to the need for higher-quality education in Brazil, the implementing partner in this evaluation was launched in 2015 with the mission of improving literacy and writing skills among school-aged youth by applying artificial intelligence to linguistics in public schools. Its main product was a pedagogical program that provided feedback on writing skills to students, using an automated writing evaluation (AWE) system combined with validation of feedback by human essay graders. The AWE system was embedded in an online platform that granted students access to in-classroom practice opportunities for the essay of the National Secondary Education Exam (Exame Nacional do Ensino Médio, ENEM).
ENEM is the second largest college admission exam in the world. It has been increasingly used as an entry exam by many post-secondary institutions in Brazil, and the essay portion of it accounts for the largest share of the public-private school achievement gap in ENEM scores. In allowing teachers to reallocate their time from grading to providing more tailored assistance to students, the AWE system could potentially improve students’ writing skills and their performance on the exam, helping close the opportunity gap between public and private-school students.
One of the advantages of the provider’s AWE technologies was that they were based on a platform that worked well with poor internet connections. Given the low cost of sharing online access to automated essay scoring, this algorithm could represent a cost-effective way of improving writing skills among school-aged youth, even in contexts of low internet connectivity.
Researchers partnered with the implementer to measure the impacts of two AWE-based programs on students’ writing skills. The evaluation took place in 178 public schools with about 19,000 students who had computer access in the state of Espírito Santo. Schools were randomly assigned to one of three groups: an enhanced AWE group, in which the AWE system’s scores and feedback were validated by human essay graders; a pure AWE group, which received the system’s automated scores and feedback alone; or a comparison group, which continued with standard instruction.
The State’s Education Department selected schools to participate in the evaluation based on a 2017 survey on proneness to online technology adoption. These 178 schools received 8,000 laptops between February and April of 2019 to ensure adequate computer availability for the implementation of the ed techs, regardless of whether they received the intervention.
The primary goal of the evaluation was to document the impacts of the two ed tech systems on ENEM essay scores. By comparing the two interventions, researchers assessed whether incorporating additional inputs from humans improved grading and feedback quality on aspects in which AI may fall short. Given that the human grading component is expensive and challenging to scale up, it was important to understand whether the potential benefits of the enhanced AWE system relative to the pure AWE intervention were worth the cost.
Teachers shifted their work hours from routine (e.g., searching for orthographic mistakes) toward nonroutine tasks (e.g., providing individual assistance on essay consistency) with the adoption of both the enhanced and pure AWE systems, and students' essay scores improved as a result. Having human graders as a resource to improve grading and feedback quality did not lead to larger improvements in essay scores than the AWE system alone, despite increasing perceived feedback quality.
Take-up: Take-up was high and similar across the two ed tech systems for both teachers and students. In the two intervention groups, more than 95 percent of teachers used the ed techs to assign and collect essays in each of the five essay practices. Student compliance was also similar across interventions and relatively high. For each writing activity, 75 to 80 percent of students enrolled in the intervention schools submitted essays through the platform.
Impact on essay practice and quantity and quality of feedback received: Students in both intervention groups wrote more training essays, received more feedback on their essays, and perceived the feedback received as higher-quality. Students increased their essay practice and wrote 1.4 more training essays when using the enhanced AWE ed tech (a 29 percent increase relative to the baseline of 4.9 essays) and 1.6 more essays when using the pure AWE ed tech (a 32 percent increase) relative to the comparison group. In addition to practicing more, students in both interventions benefited from comments and notes in about 1.3 more essays (a 40 percent increase relative to 3.4 essays on average in the comparison group) and received a grade on an additional 1.6-1.7 essays (an increase of about 45 percent).
Students exposed to either of the two ed techs were also 6-7 percentage points more likely to find comments and annotations somewhat useful (from an average of 81 percent in the comparison group), but only those using the enhanced AWE ed tech were more likely to find the comments on their essays very useful (a 6 percentage point increase relative to a base of 44 percent in the comparison group). The difference in effects on perceived feedback quality between the two groups was meaningful, suggesting that the human graders in the enhanced AWE system did contribute to higher feedback quality.
Impact on teachers’ pedagogy and time allocation: Students using either of the two ed techs discussed roughly 35 percent more essays individually with their teachers after they received grades. However, only teachers in the enhanced AWE group reported having more time to cover the topics of writing, grammar, and literature in school. In line with these results, the share of teachers who felt their time was very insufficient dropped from 23 percent in the comparison group to 9 percent for teachers using the enhanced AWE system, but remained roughly unchanged for teachers using the pure AWE ed tech. Taken together, these findings suggest that teachers in the enhanced AWE group delegated part of their grading tasks to human graders, while teachers in the pure AWE arm were able to keep pace by taking over some of these tasks without increasing their usual workload.
Impact on essay scores and writing skills: The enhanced and the pure AWE interventions had positive and almost identical effects on the full ENEM essay score and led to similar improvements in scores on the writing skills evaluated by official graders, except for analytic skills. Both interventions increased the full essay score by about 0.09 standard deviations, mitigating an estimated 9 percent of the public-private school gap in essay scores. Specifically, syntactic skills scores improved by 0.07 and 0.06 standard deviations for students in the enhanced and pure AWE interventions, respectively; the policy proposal grade increased by 0.16 standard deviations with the enhanced AWE ed tech and 0.14 standard deviations with the pure AWE ed tech; and the analytic skills scores increased by 0.06 standard deviations with the pure AWE ed tech, but were not affected by the enhanced AWE intervention.
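Effects reported "in standard deviations," like the 0.09 standard deviation gain above, are computed by dividing the difference in group means by the comparison group's standard deviation. A minimal illustration, with made-up essay scores rather than the study's data:

```python
import statistics

# Standardized effect size: difference in group means divided by the
# comparison group's standard deviation. Scores below are invented.

def standardized_effect(treated, comparison):
    """Mean difference scaled by the comparison group's (sample) std. dev."""
    diff = statistics.mean(treated) - statistics.mean(comparison)
    return diff / statistics.stdev(comparison)

comparison_scores = [500, 520, 480, 510, 490]
treated_scores = [505, 526, 484, 517, 496]  # slightly higher essay scores

effect = standardized_effect(treated_scores, comparison_scores)
print(round(effect, 2))  # → 0.35
```

Expressing effects in standard deviations lets researchers compare interventions measured on different test scales, which is why results like these can be benchmarked against the public-private school score gap.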
The similarity in effects across interventions suggests that the additional inputs from human graders did not change the extent to which the ed techs were able to improve scores. While this was expected for syntactic skills, which are graded based on standardized processes (e.g., parsing content and catching errors), it was not anticipated for the more complex parts of the exam (e.g., analytical skills and the policy proposal grade). Likely, the shifts in time allocation allowed teachers to fill in some of the gaps or limitations of the pure AWE ed tech relative to the enhanced platform.
In short, the study presents evidence that artificial intelligence can help overcome bottlenecks that prevent the accumulation of writing skills; in particular, teachers’ time constraints to provide individualized assistance to students. Pure and enhanced AWE ed techs led to similar effects on essay scores and writing skills, suggesting that including human scorers was not necessary to attain the same results. This makes the intervention less costly and easier to scale.
From 2020 onwards, the results from this study were used by the State Education Department of Espírito Santo to justify procuring the pure AWE tool. The program is currently a public policy for senior high school students in the state, potentially benefiting close to 30,000 senior high school students every year.
“PISA 2018 Results: Combined Executive Summaries, Volume I, II and III”, p. 18 (Table I.1, 2/2) Available at: https://www.oecd.org/pisa/Combined_Executive_Summaries_PISA_2018.pdf