Introduction

Snakebite envenomation, categorized as a neglected tropical disease (NTD), disproportionately affects vulnerable rural communities in tropical and subtropical regions1,2. Despite its considerable morbidity and mortality, snakebite has historically received limited attention and resources compared to other NTDs3. Factors contributing to this neglect include limited awareness, inadequate surveillance systems, scarce access to antivenoms and healthcare services, and a deficit in research and development for improved treatments1,4. Global collaboration, increased funding, and research initiatives are essential to formulate and execute comprehensive strategies for prevention, treatment, and rehabilitation5. Addressing snakebite as a neglected tropical disease holds the promise of significantly reducing its burden and improving the lives of affected communities.

The World Health Organization (WHO) has devised a comprehensive strategy to tackle snakebite mortality and morbidity, emphasizing four pillars: prevention, treatment, strengthening health systems, and empowering communities6. Prevention efforts include raising awareness, educating about snakebite risk factors, and promoting behavioral changes. Ensuring access to treatment involves guaranteeing the availability and affordability of antivenoms and enhancing healthcare infrastructure. Strengthening health systems encompasses capacity building through training healthcare workers, implementing surveillance systems, and integrating snakebite management into existing health programs. Empowering communities involves engagement, participatory decision-making, and community-led initiatives for snakebite prevention and management7. This comprehensive strategy aims to significantly reduce snakebite-related deaths and disabilities, improve treatment outcomes, and alleviate the socioeconomic burden worldwide.

Understanding snakebite risk factors is pivotal for prevention efforts and empowering local communities. Climate factors significantly shape the distribution, behavior, and abundance of venomous snakes, influencing snakebite incidence8,9,10,11. As climate change continues to impact ecosystems globally, it has the potential to alter the patterns of snakes and human distribution and behavior12. Rising temperatures, changing rainfall patterns, and shifts in habitat availability may result in variations in the geographical range and abundance of venomous snake species13,14, potentially bringing humans into closer contact with snakes and increasing the risk of snakebite incidents15,16. Understanding this complex relationship is crucial for effective preventive strategies, healthcare systems, and mitigating health risks associated with snake envenomation.

Colombia, with its unique geographic features and varied climate, provides an ideal habitat for a diverse range of venomous snakes17. The country experiences notable spatial and temporal variations in rainfall, with distinct wet and dry seasons across different regions9. Additionally, the El Niño Southern Oscillation (ENSO), a large-scale climate phenomenon, influences regional climate patterns, including precipitation18. These climatic factors can affect snake behavior and reproduction, potentially leading to variations in snakebite incidence10,12. While rainfall drives incidence dynamics in areas with marked seasonality, such as the Caribbean coast and the Orinoco plains9, the interaction between ENSO, forest coverage, and rainfall, and its potential effect on snakebite dynamics in Colombia remains unclear and has not been formally evaluated with causal methods.

Randomized controlled trials are the gold standard for assessing treatment/exposure effects. However, ethical concerns or practical limitations may favor causal inference in epidemiology19. Unlike statistical inference, which examines predictor variables in relation to outcomes, causal inference focuses on estimating the causal association between treatment/exposure and outcome, adjusting for confounders, instrumental variables, mediators, and effect modifiers20. Basic estimations of treatment effects in causal inference often rely on parametric methods like linear or generalized linear regression, assuming linearity and additivity. These assumptions can introduce bias through model misspecification, even without confounding factors, impacting the correct estimation of treatment/exposure effects21.

This study aims to assess the effect of rainfall on snakebite incidence in Colombia, employing a causal inference approach to appropriately adjust the causal model and machine learning methods to avoid assumptions about the linear form of the effect. By analyzing long-term spatiotemporal snakebite data alongside rainfall, ENSO-associated atmospheric and oceanic indices, and environmental and socio-economic variables, this research seeks to elucidate the relationship between these factors and provide valuable insights into the association between rainfall and snakebite incidence in Colombia.

Results

Among the 300 municipalities scrutinized, 50,334 snakebite cases were recorded from 2007 to 2021, constituting 78.2% of the nation’s reported cases (64,338). Predominant municipalities with the highest average cases were San Vicente del Caguán (59.5 average cases per year), Tibú (51.2 average cases per year), and Tierralta (50.9 average cases per year) (Fig. 1A).

Fig. 1
figure 1

Colombian dataset for analysis during the period 2007–2021: (A) Averaged yearly snakebite cases in the top 300 reporting municipalities. (B) Averaged rural gross domestic product (GDP) percentage in selected municipalities. (C) Averaged monthly rainfall in selected municipalities. (D) Averaged forest percentage in selected municipalities during the study period. The base map was obtained from DIVA-GIS (https://diva-gis.org/data.html), and the maps were generated using QGIS version 3.32.3 (https://qgis.org).

Conditional independencies within the directed acyclic graph (DAG) indicated values approaching zero, affirming the coherence between the DAG and dataset utilized for effect estimation (Fig. 2). The process of finding the correct adjustment of the causal model allowed us to identify the variable rural GDP as a collider, and for this reason, it was not included in the estimations of the causal effect of rainfall on excess snakebite cases20,22,23,24. This exclusion allows us to evade collider bias, which occurs when two or more variables (Forest and Rainfall) are associated with a third variable, the collider (rural GDP)25. Employing the linear double machine learning algorithm, as determined by the RScorer, yielded an estimation of the average treatment effect (ATE) of rainfall on excess snakebite cases of 0.021 (95% confidence interval (CI): 0.013–0.029). This implies a 2.1% rise in excess cases for each increase of one standard deviation in monthly rainfall (134.65 mm).

Fig. 2
figure 2

Evaluation of conditional independencies identified in the DAG. Note that the value of each conditional independence should be close to zero, to evidence no inconsistency between the DAG and the dataset used for the estimation of the ATE.

Figure 3 illustrates the constant marginal effect across forest cover, showing how the ATE changes with spatial variability in forest cover across municipalities, based on the spatiotemporal training data used in our machine learning model. A negative trend exists between the effect of rainfall on excess snakebite cases and forest coverage, where at lower forest coverage (< 50%) the effect of rainfall on excess cases is positive, meaning that high rainfall will likely generate an excess cases. Conversely, higher values of forest coverage > 50% show a negative effect; thus, high rainfall will decrease the excess cases (Fig. 3). Interestingly, most Colombians live in areas where high rainfall is more likely to cause excess snakebite cases, as municipalities with less than 50% forest cover have both a larger population and higher population density (20,521,292 vs. 2,956,796 people, and 247.32 vs. 16.84 people/km², respectively).

Fig. 3
figure 3

Constant marginal effect of rainfall on excess snakebite cases, conditioned by forest coverage (% of forest). The black line shows the effect of rainfall on the excess cases of snakebite, and the gray band represents the 95% CI. The effect changes its direction from a positive to a negative association around 50% of forest coverage.

The spatial pattern of the heterogeneous effect of rainfall on snakebite conditioned by forest coverage shows that the zones with the average highest effect (ATE > 0.0319) are in regions with low forest coverage, such as the plains of Caribe (Northern), Magdalena River (Center), and Orinoco (Eastern) (Fig. 4). Meanwhile, the regions of Pacific (Western) and Amazon (Southern) characterized by to have the highest forest coverage in the country, show average effect values relatively lower (ATE < 0.0142). This pattern is consistent with the estimation of the constant marginal effect (Fig. 3), and the average spatial distribution of forest coverage along the country (Fig. 1D).

Fig. 4
figure 4

Spatial pattern of the average heterogeneous effect conditioned by forest coverage for the top 300 reporting municipalities, during the study period 2007–2021. The base map was obtained from DIVA-GIS (https://diva-gis.org/data.html), and the map was generated using QGIS version 3.32.3 (https://qgis.org).

Figure S1 depicts the effect of rainfall on excess snakebite cases conditioned by the 9 atmospheric and oceanic indices. Non-homogeneous effects are observed, with positive relationships noted for SOI, ESOI, SST3, SST4, and NATL, while SST34, SST12, SATL, and TROP exhibit negative associations.

The three refutation methods applied to scrutinize the estimation and validate the causal relationship between rainfall and excess cases yielded p-values > 0.05, allowing us to rely on or causal estimation. The p-values for testing the differences between the estimated effect, and the same estimation with a random common cause and a random subset replacement, were > 0.05. Similarly, the p-value comparing the expected placebo treatment effect (0.000), with our estimated placebo treatment effect (-0.002) was equal to 0.45.

Discussion

The identification of a causal relationship between rainfall and the surge in snakebite cases is a significant advancement in the understanding of snakebite epidemiology. Our estimation, supported by robustness and refutation tests25,26, demonstrates a statistically significant causal relationship between rainfall and excess snakebite cases across municipalities. This causal machine learning approach contrasts with the prevalent statistical and mathematical models used in countries like Sri Lanka27, Costa Rica10, Colombia9, Cyprus28, Brazil29 and Ghana30, which often neglect the relationship between covariates leading to confusion or collider bias in exposure-outcome associations20. The explicit consideration of confounders, such as forest coverage and oceanic indices, while avoiding colliders (like rural GDP), contributes to obtaining unbiased estimates of the effect of rainfall on excess snakebite cases.

The intricate interplay between climatic phenomena, particularly El Niño, and its influence on snakebite dynamics has been previously reported by using SST4 to construct ENSO timeseries10. In our study, we extend this understanding by showing that oceanic indices modulate how rainfall influences excess snakebite cases, underscoring their relevance for predictive modeling (Fig. S1)31. Developing such models aligns with WHO global strategy to reduce snakebite mortality by 50% before 20306, highlighting the practical application of knowledge derived from ocean indices in strengthening health systems.

The conditioning effects of rural GDP and forest coverage on the association between rainfall and snakebite cases present intriguing revelations. While rural GDP causes collider bias, high forest coverage alters drastically the effect of rainfall causing excess snakebite cases. This highlights two distinct but interacting influences: forest coverage as an ecological moderator, and rainfall as a climatic driver of snakebite excess cases, unveiling a spatial pattern (Fig. 5) that corresponds with Bravo-Vega et al., (2022), where a significant positive association between rainfall and snakebite incidence was seen in the Orinoco basin and Caribbean coast9. Whilst that work attributes this non-homogeneous spatial pattern to areas with marked dry seasons, here we show that forest coverage is behind this phenomenon. The pattern that we found, absent in Bravo-Vega et al., (2022), suggests their model may not have accounted for inverse rainfall effects due to forest modulation.

Our results show that forest coverage modifies the direction of the rainfall-snakebite excess cases association: in highly forested municipalities (> 50% forest cover), rainfall is negatively associated with excess cases (ATE < 0), while in less forested areas (< 50%), the association is positive. Given that 20.5 million people live in municipalities with < 50% forest cover where rainfall positively affects excess cases (ATE > 0.0319), this relationship may have public health implications in heavily deforested areas, which are increasing in the country32. Sadly, there are scarce information around venomous snakes’ natural history in Colombia, which has proved to be useful to unveil the mechanisms behind climate and snakebite risk association in countries like Costa Rica33 and Brazil34. Human intervention reduces forest cover, which in turn decreases evapotranspiration and leads to lower rainfall35. Although our analysis does not directly model these mechanisms, we hypothesize that in areas with low forest cover monthly rainfall distribution becomes more heterogeneous, intensifying dry seasons and forcing snakes to be more active during the rainy season, when resources are more abundant34. In addition, low forest coverage can modify venomous snakes’ reproductive cycles, facilitate colonization of new areas by these animals, and can change the structure of snakes’ communities, thus affecting snakebite risk14,36,37. Low forest coverage also increases flooding, thus reducing the shared area between venomous snakes and humans (unflooded land), which increase the likelihood of a human-snake encounter and snakebite risk11,38.

The negative association found in areas with high forest coverage may be explained by the social context of several communities, where communities in the amazon basin and in the pacific coast tend to use more transport by rivers because of the lack of availability of roads, or because pathways can be flooded39,40. In highly forested areas like the Amazon basin (Fig. 2D), reliance on river transport reduces ground mobility and human-snake encounters during rainy seasons. During dry periods, decreased river flow forces people to travel on foot, increasing snakebite exposure41. These ecological and socio-economic factors can explain this complex association, and our work highlights the need of a one-health perspective on snakebite to finally unveil these macro complex associations at a local scale: We propose that community-based science is important to generate this new biological and socioeconomical knowledge to finally address this important issue in Colombia.

Factors that influence snakebite dynamics in other regions are the Monsoon in Asia42,43 and temperature in temperate areas of the United States and Europe44. The absence of seasons in tropical Colombia mitigates the influence of temperature on interannual snakebite seasonality9. However, the impact of temperature on the way rainfall drives snakebite cases remains unexplored, underlining a potential avenue for future research. Furthermore, recognizing the role of microclimatic variables in shaping snake activity adds a layer of complexity to our macroecological understanding, emphasizing the need for comprehensive analyses45.

Despite the strengths of our causal inference framework, several methodological limitations must be acknowledged. Unmeasured confounding remains a key concern in observational studies, as omitted variables may introduce bias in effect estimation46. Although our DAG incorporated nine atmospheric and oceanic indices and forest coverage, relevant factors such as La Niña-driven landslides affecting healthcare access and reporting47 were not explicitly modeled. Additionally, the monthly and municipal-level aggregation of data may obscure finer-scale temporal and spatial dynamics48. Spatial causal inference also faces challenges from potential inference across treatment units, where rainfall effects in one municipality may influence neighboring aeras via spillover49. Finally, our model assumes temporal stability in covariate relationships, yet dynamic climate-ecosystem interactions and human behaviors may violate this assumption50.

Our findings suggest the need for context-specific, climate-adaptive public health strategies in deforested regions, where rainfall extremes exacerbate snakebite risk. Surveillance systems should prioritize these areas during high precipitation periods, and targeted interventions such as antivenom stockpiling and healthcare worker training must be designed with future climate scenarios in mind13,51. In Colombia, where deforestation have been intensified, geographically informed resource allocation may be increasingly important52. Integrating environmental and health policies to address both land use change and snakebite prevention is essential to mitigate the burden of this neglected tropical disease. As climate change accelerates, efforts aligned with the WHO’s goal to reduce snakebite-related deaths and disabilities by 50% before 2030 will require context-specific intervention, particularly related to specific patterns of climate and land use change53.

Methods

Data sources

Epidemiological data

The monthly snakebite cases from January 2007 to December 2021 were sourced from the National Public Health Surveillance System of Colombia. We selected the top 300 municipalities with the highest reported cases, ensuring data consistency by excluding entries with municipality or date discrepancies (Fig. 1A). Monthly incidence per 10,000 inhabitants was computed for each municipality, aligning cases data with the National Department of Statistics population data54. The identification of excess cases was based on the standardized incidence ratio (SIR), categorizing SIR > 1 as excess and transforming it into a binary variable (1 for SIR > 1 and 0 otherwise) following Blangiardo et al.‘s approach (2013)55.

The Standardized Incidence Ratio (SIR) is a widely used epidemiological tool that compares the observed incidence in a specific population/municipality to the expected incidence in a standard population (the country)55. The SIR was estimated by dividing the observed number of cases in each municipality by the expected number of cases in the standard population. Note that the SIR is centered around 1 (areas with the same observed number of cases than expected) for ease of interpretation and comparison, serving to identify the excess snakebite cases.

Environmental and socio-economic data

Monthly values of nine atmospheric and oceanic indices for the period between January 2007 and December 2021 (Table 1), averaged domestic gross domestic product percentage (rural GDP) (Fig. 1B), monthly rainfall data (Fig. 1C), and annual forest coverage (Fig. 1D) were obtained at a municipality scale from the National Oceanic and Atmospheric Administration56, National Aeronautics and Space Administration (product GLDAS_NOAH025_M version 2.1)57 and the Terridata repository58. We estimated annual forest coverage and rainfall for each municipality by utilizing the R package MODIStsp59 and spatial matching techniques in the R package raster60.

Table 1 List of atmospheric and oceanic indices included in the study.

Causal inference analysis

We constructed a DAG to model the exposure (monthly rainfall) and outcome (excess snakebite cases). To tackle confounding bias, the DAG included nine atmospheric and oceanic indices and forest coverage as potential confounders, and additionally, we included the rural GDP as an effect modifier (Fig. 5). The DAG shows an asterisk to signify global teleconnections, which emphasize the interdependencies among atmospheric and oceanic indices61. We assumed associations among these variables based on established literature, acknowledging their potential confounding effects.

Fig. 5
figure 5

Directed Acyclic Graph (DAG) depicting rainfall’s effect on excess snakebite cases. The black arrow indicates the association of rainfall (exposure) with excess cases (outcome). The Oceanic and atmospheric and indices node include the 9 indices detailed in Table 1, the asterisk (*) on the node highlights the global teleconnections and interdependencies among the indices. Dashed arrows from the U node signify potential latent confusion bias from unmeasured confounders.

To elucidate the impact of atmospheric and oceanic indices as potential confounders, consider the scenario of estimating the effect of rainfall on snakebite cases. The Southern Oscillation Index (SOI) acts as a potential confounder, altering both the exposure variable (rainfall)62 and concurrent temperature and humidity patterns, thereby influencing snakebite incidence29 and introducing a dual effect on treatment and outcome. This dual influence establishes an indirect association between the exposure variable and snakebite excess cases, leading to confusion bias in the estimation.

Our analysis assumes associations between atmospheric and oceanic indices and rainfall, following the proposition of Costa et al. (2009)63. Similarly, we posit associations among forest coverage, rainfall, and the gross domestic product linked to rural activities in Colombia. This assumption aligns with historical patterns where regions with high rainfall and extensive forest coverage coincide with economically underdeveloped areas characterized by rural activities64,65.

Machine learning implementation

To estimate the association between the exposure and outcome, we employed the double machine learning algorithm, addressing linearity and additivity assumptions inherent in standard regression models66,67. Using a gradient boosting algorithm with 1,500 trees, cross-validation with k = 3, polynomial features with degree = 3, and lasso regularization, we estimated the ATE and its 95% CI. Additionally, we calculated the constant marginal effect to understand variations in the heterogeneity of the effect conditioned by forest coverage.

Note that the ATE represents the average difference in excess snakebite cases between municipalities with and without an increase of one standard deviation in rainfall (see Results section), while the constant marginal effect is the average treatment effect among all municipalities that share the same covariate values, i.e., municipalities that share the same values of forest coverage.

We tested several variations of the double machine learning algorithm, including linear, sparse linear, kernel, standard double machine, and causal forest, to select the machine learning model with the least overfitting during the training steep. Algorithm selection was facilitated through the RScorer test68, leveraging the RLearner loss to ensure optimal performance69.

Refutation tests

We assessed the estimate of the effect of rainfall on excess snakebite cases through three refutation methods, namely, adding a random common cause, replacing a random subset (10%), and introducing a placebo treatment. These tests help us validating the estimated association, where the results from the two first tests were expected to align with the ATE, whilst the inclusion of a placebo should give an ATE close to cero, thus providing confidence in the robustness of the causal estimation.

Continuous variables were transformed into standard deviation (sd) units, excluding the forest coverage variable, to estimate and visualize the conditional effect for each one. This transformation facilitated model convergence. The appropriateness of DAG adjustment and tests of conditional independence were performed using R packages ggdag version 0.2.6 and DAGitty version 0.3–170,71. Causal inference analysis was implemented using the DoWhy version 0.6 and EconML version 0.1368,72 modules in Python. The dataset and code are available for reproducibility at: https://github.com/juandavidgutier/rainfall_snakebite.