
August 2024 interim update to ‘Talking through depression: The cost-effectiveness of psychotherapy in LMICs, revised and expanded’

by Joel McGuire, Samuel Dupret, Ryan Dwyer, and Michael Plant | August 2024

In November 2023, we published Version 3 of our psychotherapy analysis, a working report in which we estimated the effects of psychotherapy in low- and middle-income countries, as well as the cost-effectiveness of two psychotherapy charities: StrongMinds and Friendship Bench. In the first part of 2024, we have updated several parts of the analysis. This interim report, Version 3.5, describes the changes we have made so far. Our analysis suggests that both StrongMinds and Friendship Bench are among the most cost-effective charities we have evaluated to date. Friendship Bench has a cost-effectiveness of 53 WELLBYs per $1,000 donated and StrongMinds has a cost-effectiveness of 47 WELLBYs per $1,000 donated.

This is the summary of the report. Click the button above to read the PDF of the full report (130 pages).

Summary

In November 2023, we published Version 3 (V3) of our psychotherapy analysis. This was a working report in which we estimated the effects of psychotherapy in low- and middle-income countries (LMICs), as well as the cost-effectiveness of two psychotherapy charities: StrongMinds (SM) and Friendship Bench (FB). In the first part of 2024, we have updated several parts of the analysis. This interim report, Version 3.5 (V3.5), describes the changes we have made so far, and our current funding recommendations for StrongMinds and Friendship Bench. The goal of this report is to provide a timely update on our thinking, so it does not reiterate our methodology from Version 3; it only notes where we update or expand upon it. We plan to publish Version 4 later this year, after we have completed a second risk of bias assessment, double-checked the data, integrated any additional information the charities provide us with, and received external academic review. We do not expect major changes between Version 3.5 and Version 4 in terms of our results, but we cannot rule out changes arising from that review. The aim of Version 4 is to produce a standalone report that comprehensively explains our full methodology and results in one place and does not require readers to be familiar with our previous psychotherapy reports.

Our analysis suggests that both StrongMinds and Friendship Bench are among the most cost-effective charities we have evaluated to date. Friendship Bench has a cost-effectiveness of 53 WELLBYs (see Endnote 1) per $1,000 donated (hereafter ‘WBp1k’) and StrongMinds has a cost-effectiveness of 47 WBp1k. Because the two charities’ cost-effectiveness estimates are similar, and because of the uncertainty around them, we do not consider one a strictly better opportunity for improving global wellbeing than the other. In the rest of this text, we mention various differences between the organisations’ programmes that donors may consider relevant when deciding whether to donate to one, the other, or to split their allocation. See our website for more up-to-date information about our recommendations across all the charities we have evaluated.

Main Updates for Version 3.5

We extracted results from 44 underpowered studies we had postponed extracting in Version 3 due to time constraints. After reviewing studies against our inclusion criteria, we added 2 further studies and removed 3 others. Overall, this led to an initial dataset of 128 studies. Our collaborators at Oxford rated these studies for risk of bias. Following this, we removed the 46 studies (of the 128) classified as ‘high’ risk of bias, in compliance with our protocol (McGuire et al., 2024). As in Version 3, we removed outliers – effect sizes above 2 standard deviations (g > 2 SDs) – as is done in other meta-analyses (Cuijpers et al., 2018; Cuijpers et al., 2020c; Tong et al., 2023). Otherwise, the effects of psychotherapy would be overestimated, because some studies report implausibly large effect sizes (up to 10 SDs). After removing 10 outlier studies, we arrive at the final sample of 72 studies, with 215 effect sizes, used in this analysis. Overall, the risk of bias analysis and our updated methodology have led to a decline in the estimated total effect of psychotherapy in LMICs on the individual (2.6 → 1.9 WELLBYs).
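
To make this step concrete, here is a minimal sketch, in R with the metafor package, of how outlier removal and multilevel pooling of this kind can be implemented. It is purely illustrative: the toy data, column names, and model specification are our simplified stand-ins, not the actual analysis script (which the full report describes).

```r
# Illustrative sketch only (not our analysis script): outlier removal and
# multilevel pooling with the metafor package. Toy data; column names are
# hypothetical stand-ins.
library(metafor)

dat <- data.frame(
  study = c("A", "A", "B", "C", "C", "D"),         # study identifiers
  g     = c(0.45, 0.60, 0.30, 2.80, 0.55, 0.70),   # Hedges' g effect sizes
  vi    = c(0.02, 0.03, 0.01, 0.05, 0.04, 0.03)    # sampling variances
)

# Drop outlying effect sizes (g > 2 SDs), as in the meta-analyses cited above
dat_trim <- subset(dat, abs(g) <= 2)
dat_trim$es_id <- seq_len(nrow(dat_trim))  # unique id per effect size

# Multilevel random-effects model: effect sizes nested within studies
fit <- rma.mv(yi = g, V = vi, random = ~ 1 | study/es_id, data = dat_trim)
summary(fit)
```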

We made several further methodological improvements to our analysis, the most important of which was updating our system for weighting and aggregating different pieces of evidence. We no longer rely solely on the weights suggested by a formal Bayesian analysis, which reflect only statistical uncertainty. Instead, we use subjective weights that are informed by the Bayesian analysis and by a structured assessment of relevant characteristics based on the GRADE criteria (Schünemann et al., 2013). This does introduce (more) subjectivity into the analysis, but it is the best way we are aware of to account for higher-level, hard-to-quantify uncertainty – notably, the direct relevance of the different sources of evidence. Hence, Version 3.5 places a greater emphasis than Version 3 did on the more relevant pieces of evidence about a charity’s effects – principally, the randomised controlled trials (RCTs) based on the programmes StrongMinds and Friendship Bench implement. We have also incorporated charity monitoring and evaluation (‘M&E’) pre-post results as an additional source of evidence. We do not put much weight on the M&E data for two reasons: (1) it is not causal evidence; and (2) we are uncertain about which method to use to adjust pre-post results to account for the lack of a control group; we could not find a clear, precedented methodology for our specific analysis, and the multiple methods we tried produce differing results. We hope to improve on all of these methodological points in the future. We understand that not everyone will agree with our informed weights, so we describe how the results would change under different weights in our robustness section (see Section 7.3.1).
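
As a stylised illustration of what this kind of aggregation looks like, the snippet below combines per-source effect estimates using informed weights. Every number here is a hypothetical placeholder, not a figure from the report; the report’s actual weights and estimates are discussed in the charity-specific sections.

```r
# Stylised sketch of weighted evidence aggregation; every number is a
# hypothetical placeholder, not a report figure.
effects <- c(meta_analysis = 2.0, charity_rcts = 1.1, me_prepost = 2.4)    # WELLBYs per person
weights <- c(meta_analysis = 0.60, charity_rcts = 0.25, me_prepost = 0.15)
stopifnot(isTRUE(all.equal(sum(weights), 1)))  # informed weights must sum to 1
sum(effects * weights)                         # weighted-average total effect (~1.8 here)
```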

This version also improves the flow of our analysis: we now separately estimate, adjust, and present the effect estimates from each evidence source before combining them. This helps show the similarities and differences in estimated effects between evidence sources.

We now present a revised and expanded set of factors that influence our confidence in our cost-effectiveness analysis figures. These include an assessment of:

  • The depth of the analysis (see Endnote 2), based on a combination of how extensively we have reviewed the literature and how comprehensive our analysis is.
  • The evidence quality, which we assess using an approach based on GRADE, with a few minor adjustments to fit the charity evaluation context (see Endnote 3). Note that our criteria for evidence quality are stringent. Our assessment has also become more stringent since the last version because we now account more precisely for how different sources of evidence have different ratings; notably, spillovers play an important part in the analysis but are supported by lower-quality evidence.
  • Robustness checks, which test whether alternative analytic choices would produce a decision-relevant change in our results. What counts as a decision-relevant change? We think one important decision is whether the intervention is more (i.e., robust) or less (i.e., not robust) cost-effective than GiveDirectly cash transfers. We currently estimate the cost-effectiveness of GiveDirectly at 8 WBp1k, so we use this as our lower robustness threshold. To provide a stricter test, we also use a higher threshold of 20 WBp1k, which represents 2.5x the cost-effectiveness of GiveDirectly.
  • Site visits to the charities, which we have now conducted, and which have added to our confidence.

Previously, we only formally considered evidence quality and depth. We think the additional factors that inform our interpretation of the quantitative analysis are now much more legible.

We also updated specific details of the implementation of the StrongMinds and Friendship Bench programmes – such as the costs, the number of people treated, and the average dosage received per person – to reflect more up-to-date 2023 figures. Additionally, we took a closer look at the RCT evidence supporting Friendship Bench.

Finally, we also made a number of smaller updates and changes to our analysis, which we describe throughout this report.

Friendship Bench cost-effectiveness

Our updated estimate of Friendship Bench’s overall effect (the effect on the individual and on the household) per person treated decreased (1.34 → 0.87 WELLBYs), primarily due to two factors. First, a decrease in the modelled total effect on the individual in both the general evidence prior and the charity-related RCTs. Second, and most importantly, we apply a larger adjustment for low dosage (0.37 → 0.33) because the latest, more precise information from Friendship Bench suggests that participants attend, on average, 1.12 of the 6 possible psychotherapy sessions (previously, 1.95). The costs, however, have also decreased ($20.87 → $16.50), counterbalancing some of the decline in effectiveness. Overall, this has led to a decrease in the cost-effectiveness of Friendship Bench (58 → 53 WBp1k, or $19 to produce one WELLBY).
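
The headline arithmetic can be reproduced directly from the figures above; a minimal sketch in R:

```r
# Friendship Bench headline arithmetic (figures from the paragraph above)
fb_effect <- 0.87            # total WELLBYs per person treated (individual + household)
fb_cost   <- 16.50           # USD per person treated
fb_effect / fb_cost * 1000   # ~53 WELLBYs per $1,000 donated (WBp1k)
fb_cost / fb_effect          # ~$19 to produce one WELLBY
```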

In Version 3, we had categorised Friendship Bench as a ‘promising charity’ because it appeared to be highly cost-effective but we had only evaluated it in moderate depth. We now rate our evaluation as ‘high’ depth to reflect the additional analysis and review. This means that we believe we have reviewed most of the relevant available evidence and have completed nearly all (e.g., 90%+) of the analyses we think are useful. We have reviewed the Friendship Bench data in more depth and added the 2023 pre-post data as a source of evidence in our analysis. Based on the GRADE criteria (Schünemann et al., 2013), we evaluate the overall quality of evidence for Friendship Bench as ‘low to moderate’, though readers should know these are very stringent standards (and labels) for evaluating evidence quality. Friendship Bench is robust to all individual plausible robustness checks at the 20 WBp1k threshold; combining all the adjustments together reduces the cost-effectiveness to 14 WBp1k. We have also been reassured by our site visit that Friendship Bench is operating an effective programme.

Nevertheless, while we have finished an in-depth evaluation, we still have some concerns about the implementation of the Friendship Bench programme in practice: in reality, participants attend 1.12 sessions, far fewer than the intended 6 sessions (or the nearly 6 attended in the relevant RCTs). We discuss this topic in depth in Section 4.2.2. In the points below, we summarise the reasons why we think it is still plausible that Friendship Bench is cost-effective at improving global wellbeing:

  • Despite applying a severe attendance adjustment of 0.33 (a 67% discount), Friendship Bench is still cost-effective at 53 WBp1k.
  • Even with a more severe adjustment of 0.16 (an 84% discount) in our robustness checks (see Section 7.3.4), Friendship Bench is still cost-effective at 31 WBp1k.
  • Research by Schleider and colleagues (Schleider & Weisz, 2017; Schleider et al., 2022; Fitzpatrick et al., 2023) shows that even single-session therapy can be effective, and our adjusted effects for Friendship Bench are close in magnitude to the effects found in this literature.
  • Our dosage adjustment mixes concerns about the ‘intended’ number of sessions (6 in this case) with concerns about the number of sessions ‘actually attended’ (1.12 / 6 = 19% in this case).
  • We explore and present different plausible alternative calculations for the dosage adjustment and their limitations. We think our chosen calculation is plausible and evidence-based. Moreover, the harshest possible calculation is the 0.16 adjustment we use in our robustness checks, which leads to a cost-effectiveness of 31 WBp1k. Hence, our overall conclusion that Friendship Bench is cost-effective is robust to the type of calculation selected.
  • We think it is plausible that low attendance can still be impactful because the first few sessions can play an important psychoeducative role (as witnessed in our site visit). The first session of problem-solving therapy (the programme Friendship Bench uses) involves a whole process of discussing a problem and making a plan to address it; it is not just an introduction.
  • The Friendship Bench 2023 pre-post data source (with all the caveats of using this data source) suggests a higher cost-effectiveness than the other data sources, at 64 WBp1k, even though these participants also attended very few sessions (1.16 on average). Furthermore, we have seen similar evidence of effectiveness in a wider range (2021-2024) of pre-post data from Friendship Bench. We use the 2023 data because it is the latest complete year and the most relevant for our purposes.
  • Friendship Bench have told us that they believe low attendance is not necessarily a problem: some clients only do a few sessions because they feel the therapy has helped them and do not find more sessions necessary. Other clients, however, encounter barriers like transport, which suggests attendance could be improved for some clients. Friendship Bench have told us that they plan to improve uptake and mental health awareness. We are keen to see improvements in these areas in future data reports.

We have attempted to adjust for this issue in our estimates, but we are still left with some uncertainty about the magnitude of the effects. We believe that if Friendship Bench improved attendance (for those in need, as some clients may only need a few sessions), it could increase its effectiveness – and likely its cost-effectiveness – as well as reduce our uncertainty.

StrongMinds cost-effectiveness

The overall effect of StrongMinds has decreased slightly (2.09 → 2.03 WELLBYs) because more weight is now placed on the charity-related RCT evidence from Baird et al. (2024) (16% → 25%), which finds a very small effect. The weighting has changed mainly because in Version 3 we used a placeholder, whereas now we can directly use the results from Baird et al. (2024), which are finally available as a working paper. We now also place weight on the M&E pre-post data (17%), which has a larger effect than the two other evidence sources. However, the costs have declined more than we expected ($63 → $43). Overall, this led to an increase in the cost-effectiveness of StrongMinds (30 → 47 WBp1k, or $21 to produce one WELLBY).
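
The same headline arithmetic applies to StrongMinds, sketched below in R. Note that the residual weight on the general meta-analytic evidence (roughly 58%) is our inference from the two stated weights, not a number quoted in this summary.

```r
# StrongMinds headline arithmetic (figures from the paragraph above)
sm_effect <- 2.03            # total WELLBYs per person treated
sm_cost   <- 43              # USD per person treated (2023 figures)
sm_effect / sm_cost * 1000   # ~47 WBp1k
sm_cost / sm_effect          # ~$21 to produce one WELLBY

# Evidence-source weights: 25% (Baird et al., 2024) and 17% (M&E pre-post)
# are stated above; the ~58% residual on the general meta-analytic evidence
# is implied, not quoted.
weights <- c(general_meta = 0.58, baird_rct = 0.25, me_prepost = 0.17)
sum(weights)  # 1
```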

We do not think Baird et al. (2024) should be given more weight (arguably, it could receive much less) amongst the different sources of evidence for StrongMinds (see Section 7.3.1 for more detail on how the results are affected by these weights). We discuss in detail in Section 5.2.1 why it is only of limited relevance, even though it is the only RCT of StrongMinds’ programme with a partner (BRAC). Briefly, some considerations about Baird et al.’s (2024) relevance to StrongMinds are that it involved:

  • Different population: Baird et al. (2024) treated adolescents and used youth facilitators; StrongMinds mainly treats adults (81% of the time) and no longer uses youth facilitators.
  • Different control group: the control group in Baird et al. (2024) was more ‘active’ than what we expect StrongMinds’ clients would have access to if they did not receive psychotherapy. The control group attended Empowerment and Livelihood for Adolescents (ELA) clubs, which could themselves improve wellbeing, whereas most people would not have access to a comparable intervention if they did not have access to psychotherapy.
  • Different context: the long-term data collection occurred during COVID-19, so COVID may have overpowered the effects of the intervention; Baird et al. (2024) should be seen as more informative about the long-run effects of therapy when a pandemic strikes than about its effects in general.
  • Different/worse implementation quality: We think that the implementation in Baird et al. (2024) was worse than what StrongMinds would provide today. Factors suggesting this are the use of youth facilitators, the low compliance, the limited involvement from StrongMinds, and the improvements made by StrongMinds since then (discussed below).
  • Different levels of compliance: there was unusually low compliance in Baird et al. (2024) – 44% of participants attended no sessions – which we do not think is representative of StrongMinds’ general compliance rates.
  • Limited involvement: StrongMinds have communicated to us that there were constraining factors that meant they could not be as involved as they would be with partners. Notably, they told us that, to accommodate the school schedules of many clients, group therapy sessions were hosted on weekends, which limited StrongMinds’ ability to supervise and provide feedback to the BRAC facilitators.
  • Growing pains: this was the first time StrongMinds attempted to implement its programme via a partner. StrongMinds (2024) and Baird et al. (2024) acknowledge that many improvements have been made since then in StrongMinds’ work with partners and with adolescents. Therefore, this RCT is not fully representative of StrongMinds’ current direct- and partner-implemented programmes.
  • Unexpectedly small results: Baird et al. (2024) comment that the effect they found was unusually small compared to a study using the same intervention as StrongMinds – Bolton et al. (2003) – and this merits explanation. We provide further examples of how these results differ from similar studies. Furthermore, we expect that relatively worse implementation (see above) was one of several factors that may explain the lower-than-usual effects.

For these reasons, we do not think it appropriate to base our evaluation of StrongMinds solely or primarily on one RCT of limited relevance. Instead, we also draw on the other sources of evidence: the general psychotherapy meta-analysis (the largest of the sources with 72 RCTs) and the M&E pre-post data (the most relevant of the sources).

We rate our evaluation as ‘high’ depth. This means that we believe we have reviewed most of the relevant available evidence and have completed nearly all (e.g., 90%+) of the analyses we think are useful. Based on the GRADE criteria (Schünemann et al., 2013), we evaluate the overall quality of evidence for StrongMinds as ‘low to moderate’, though readers should know these are very stringent standards (and labels) for evaluating evidence quality. StrongMinds is robust to individual plausible robustness checks at the 20 WBp1k threshold, except for giving 100% weight to the least cost-effective source of evidence (i.e., Baird et al., 2024), which reduces the cost-effectiveness to 9 WBp1k. Combining the adjustments together reduces the cost-effectiveness to 7 WBp1k, which is largely driven by the evidence weighting. Note, again, that we do not consider this outcome – giving all (or even most) of the weight to Baird et al. (2024) – very plausible. We have also been reassured by our site visit that StrongMinds is operating an effective programme.

As in earlier analyses, our main source of uncertainty is the lack of high-quality, relevant RCTs of the StrongMinds programmes (as noted, Baird et al., 2024, has limited relevance). Moreover, as mentioned above, the results from Baird et al. (2024), if taken alone, imply much lower cost-effectiveness than the other sources of evidence for StrongMinds. We now view the robustness of results across data sources as more important than we did before, as unaccounted-for differences across reasonable data sources warrant increased uncertainty. Nevertheless, our weighted average of the different sources finds StrongMinds to be a cost-effective way of improving global wellbeing. We hope to see more current and relevant RCTs of StrongMinds’ programme.

Comparing charities

StrongMinds and Friendship Bench are among the best giving opportunities we have found so far for donors who want to support the most cost-effective, evidence-based ways of improving wellbeing by improving the quality of life of recipients. StrongMinds is now 5.7 times (previously 3.7 times) more cost-effective than GiveDirectly (GD), an NGO that provides cash transfers to very poor households, which we have examined in another major analysis and take to be an important point of comparison. Friendship Bench is now 6.4 times (previously 7.0 times) more cost-effective than GiveDirectly. Our results suggest that delivering psychotherapy to people in Sub-Saharan Africa (SSA) who have common mental disorders is more cost-effective at improving global wellbeing than providing $1,000 cash transfers to people in SSA in poverty: while the per-person effects of the psychotherapy charities are smaller than those of GiveDirectly, the delivery of psychotherapy is much cheaper per person (see Endnote 4). As the cost-effectiveness of StrongMinds and Friendship Bench is similar, we think both provide good giving opportunities for donors. See our website for the most up-to-date recommendations amongst our different evaluations.

Notes

Updates note: This is Version 3.5, an update to the Version 3 working paper. New versions will be uploaded over time.

External appendix and summary spreadsheet note: There is no external appendix for this update (refer to Version 3 for more detail). A summary spreadsheet is available, but note that our analysis is conducted in R and explained in the report.

Author note: Joel McGuire, Samuel Dupret, and Ryan Dwyer contributed to the conceptualization, investigation, analysis, data curation, and writing of the project. Michael Plant contributed to the conceptualization, supervision, and writing of the project. Ben Stewart contributed to the writing.

Note that the views of collaborators, reviewers, and employees from the different charities evaluated do not necessarily align with the views reported in this document.

Collaborator note: We thank Maxwell Klapow, Deanna Giraldi, and Benjamin Olshin for their work on the Risk of Bias analysis.

Reviewer note: We thank, in chronological order, the following reviewers or people who have answered technical questions for us: Lily Yu (general; HLI), Peter Brietbart (general; HLI), Lara Watson (general; HLI), Lingyao Tong (meta-analysis methods and results), Clara Miguel Sanz (meta-analysis methods and results), Sven Kepes (questions about heterogeneity, publication bias, and outliers).

Charity information note: We thank Jess Brown, Andrew Fraker, Rasa Dawson, Elly Atuhumuza, and Roger Nokes for providing information about StrongMinds. We also thank Lena Zamchiya, Ephraim Chiriseri, and Tapiwa Takaona for providing information about Friendship Bench.

Summary Endnotes

(1) One WELLBY (or wellbeing-adjusted life year) is the equivalent of a 1-point increase on a 0-10 wellbeing scale for one year. See our methodology page for more detail.

(2) The depth of our analysis is based on a combination of how extensively we have reviewed the literature and how comprehensive our analysis is.

  • High: We believe we have reviewed most or all of the relevant available evidence on the topic, and we have completed nearly all (e.g., 90%+) of the analyses we think are useful.
  • Moderate: We believe we have reviewed most of the relevant available evidence on the topic, and we have completed the majority (e.g., 60-90%) of the analyses we think are useful.
  • Low: We believe we have only reviewed some of the relevant available evidence on the topic, and we have completed only some (e.g., 10-60%) of the analyses we think are useful.

(3) Our assessment of the quality of evidence is based on a holistic evaluation of the quantity and quality of the data, combined across the different sources of evidence for the charity. This is based on the GRADE criteria (Schünemann et al., 2013): Study design, Risk of bias, Imprecision, Inconsistency, Indirectness, and Publication bias. We provide a rough example of how this can work:

  • High: To be rated as high, an evidence source would have multiple relevant, low risk of bias, high-powered RCTs that consistently demonstrate effectiveness and have little to no signs of publication bias.
  • Moderate: If the evidence source moderately deviates on some of the criteria above, it would be downgraded to moderate. For example, if it has some moderate issues of risk of bias or publication bias, evidence from only a single well-conducted RCT, or evidence from multiple well-designed but non-randomised studies that consistently demonstrate effectiveness.
  • Low: If the evidence deviates more severely on these criteria, it could be downgraded to low. For example, if it relies on non-causal studies (pre-post, correlations, etc.).
  • Very low: If the evidence deviates even more severely on these criteria, or is low on many criteria, it can be downgraded to very low.

For more detail, please consult our page on quality of evidence and Section 7.2 of the report.

(4) Note that psychotherapy is provided to individuals with common mental disorders like depression, who, because they live in SSA, also happen to be poor. Cash transfers are provided to individuals in SSA because they are poor; whether they also have problems like depression is unknown. Hence, we are not saying that giving psychotherapy to a randomly selected poor person in SSA is better than giving them a cash transfer, only that funding psychotherapy for the individuals that need it is more cost-effective at improving global wellbeing than funding cash transfers. We expand on this below.

We estimate that a GiveDirectly cash transfer has an overall effect of 10.01 WELLBYs (McGuire et al., 2022b), which is 5-12x greater than the overall effect of a course of psychotherapy (StrongMinds: 2.03 WELLBYs; Friendship Bench: 0.87 WELLBYs). However, the cost to provide a $1,000 cash transfer with GiveDirectly is $1,220, which is 28-74x more costly than psychotherapy ($43.3 for StrongMinds and $16.5 for Friendship Bench). For $1,220, one could therefore fund 28-74 courses of psychotherapy. To put it another way – in the context we are considering and with some linear assumptions about dosage – a course of psychotherapy for depressed person A, which costs $43 (as is the case for StrongMinds), would have about the same effect on total wellbeing as providing a cash transfer of $243 to person B.
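
For readers who want to check the numbers, here is a minimal sketch of this endnote's arithmetic in R, using only the figures quoted above. The small gap between the computed ~$247 and the quoted $243 comes from rounding at intermediate steps in the report.

```r
# Worked arithmetic behind Endnote 4 (all inputs quoted in the text above)
gd_effect <- 10.01; gd_cost <- 1220   # GiveDirectly: WELLBYs and cost (USD) per $1,000 transfer
sm_effect <- 2.03;  sm_cost <- 43.3   # StrongMinds, per person treated
fb_effect <- 0.87;  fb_cost <- 16.5   # Friendship Bench, per person treated

gd_effect / c(StrongMinds = sm_effect, FriendshipBench = fb_effect)  # ~4.9x and ~11.5x (the "5-12x")
gd_cost / c(StrongMinds = sm_cost, FriendshipBench = fb_cost)        # ~28x and ~74x (the "28-74x")

# Equivalent-spending illustration: StrongMinds' cost-effectiveness ratio
# relative to GiveDirectly, times the cost of one course of therapy
ce_ratio <- (sm_effect / sm_cost) / (gd_effect / gd_cost)  # ~5.7
ce_ratio * sm_cost                                         # ~$247 (the report quotes ~$243)
```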