D-FENSE

The D-FENSE project addresses Dengue Virus (DENV) epidemics in Brazil, providing predictive modeling and data visualization to support decision-making in public health.

View the Project on GitHub americocunhajr/D-FENSE

Dynamics for Epidemic Surveillance and Evaluation

D-FENSE: Dynamics for Epidemic Surveillance and Evaluation is an initiative to deal with Dengue Virus (DENV) epidemics in Brazil.

This repository stores and shares surveillance and climate data related to DENV epidemics since 2010. It also presents predictive models for DENV outbreaks in the country. This work seeks to address emerging demands for dengue monitoring and forecasting, contributing to detailed analysis and supporting decision-making in public health. The objectives of this initiative include:

Team for 2025 challenge

Collaborators

Repository structure

D-FENSE/
│
├── DengueSprint2024_ChallengeRules/              # Official 2024 challenge docs (scope, submission rules, etc)
├── DengueSprint2024_DataAggregated/              # 2024 data after basic aggregation and harmonization
├── DengueSprint2024_DataProcessed/               # 2024 data after spurious values cleaning and noise filtering
├── DengueSprint2024_DataProcessingCode/          # Codes used for data processing in 2024 Sprint
│
├── DengueSprint2025_ChallengeRules/              # Official 2025 challenge docs (scope, submission rules, etc)
├── DengueSprint2025_DataAggregated/              # 2025 data after basic aggregation and harmonization
├── DengueSprint2025_DataProcessed/               # 2025 data after spurious values cleaning and noise filtering
├── DengueSprint2025_DataProcessingCode/          # Codes used for data processing in 2025 Sprint
│
├── DengueSprint2025_DataVisualization/           # Graphs to visualize surveillance and climate variables
│
├── DengueSprint2025_Model1_LNCC-ARp-2025-1/      # Codes and results obtained with the model LNCC-ARp (version 2025-1)
├── DengueSprint2025_Model2_UERJ-SARIMAX-2025-1/  # Codes and results obtained with the model UERJ-SARIMAX (version 2025-1)
├── DengueSprint2025_Model3_LNCC-CLiDENGO-2025-1/ # Codes and results obtained with the model LNCC-CLiDENGO (version 2025-1)
├── DengueSprint2025_Model4_LNCC-SURGE-2025-1/    # Codes and results obtained with the model LNCC-SURGE (version 2025-1)
├── DengueSprint2025_Model5_UERJ-SARIMAX-2025-2/  # Codes and results obtained with the model UERJ-SARIMAX (version 2025-2)

Data Source

The raw data used here were obtained from the Mosqlimate platform: https://sprint.mosqlimate.org/data/

Reference:

Data Processing

Author:

This data processing framework involves a two-step, reproducible MATLAB pipeline that converts the Mosqlimate raw files into UF-level weekly time series ready for visualization and modeling.
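As a rough illustration of what aggregating to UF-level weekly series means, the sketch below sums finer-grained weekly records into one total per UF and week. It is not the MATLAB code in this repository: the function name, the tuple layout `(uf, epiweek, cases)`, and the sample values are all hypothetical, and it assumes the raw records are reported at a finer level than the UF.

```python
from collections import defaultdict

def aggregate_to_uf(records):
    """Sum weekly case counts into UF-level weekly totals.

    `records` is an iterable of (uf, epiweek, cases) tuples. This layout is
    hypothetical and does not mirror the Mosqlimate column names.
    """
    totals = defaultdict(int)
    for uf, epiweek, cases in records:
        totals[(uf, epiweek)] += cases
    # One row per UF x week, sorted by UF and then by week.
    return [(uf, week, n) for (uf, week), n in sorted(totals.items())]

# Made-up records: two sub-UF entries for RJ in the same week, plus others.
rows = aggregate_to_uf([
    ("RJ", 202441, 12),
    ("RJ", 202441, 5),
    ("RJ", 202442, 7),
    ("SP", 202441, 30),
])
for row in rows:
    print(row)
```

Keying the accumulator on the (UF, week) pair gives the "one row per UF × week" layout directly, without a separate grouping pass.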

Raw data files (download from https://sprint.mosqlimate.org/data):

Put these CSV files into the directory ‘DengueSprint2025_DataProcessingCode/DataRaw/’.

Step 1 — Aggregate (DFENSE_DataAggregation.m):

Step 2 — Filter & Smooth (DFENSE_DataFilteringSmoothing.m):

Output schema (columns in CSV, one row per UF × week):

Data Visualization

Authors:

Soon!

Data Statistics

Authors:

Soon!

Model 1: LNCC-ARp-2025-1

LNCC-AR_p
ARp is a forecasting model for DENV dynamics through an autoregressive process of order p.

Repository structure:

D-FENSE/DengueSprint2025_Model1_LNCC-ARp-2025-1/
│
|── Aggregated_Data: surveillance data aggregated at the state level
│
|── DFense_ARp: codes and results for the 3 validation challenges
  │
  |── validation1: material related to validation 1 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v1_predictor_ARp.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots  
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.
  │
  |── validation2: material related to validation 2 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v2_predictor_ARp.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots  
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.
  │
  |── validation3: material related to validation 3 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v3_predictor_ARp.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots  
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.

Author:

Data and Variables:

Only the time series of raw dengue case counts per state, indexed by epidemic week, is used. The data are available in the ‘Aggregated_Data’ directory.

Model Structure and Training:

For each state (UF), the log2-transformed time series of raw dengue cases, restricted to the range defined for each validation, is used to estimate an AR(p) model with p = 92 (chosen experimentally) via the function armcov.m. Initial conditions for the AR(p) model at epidemic week (EW) 25 of 2022, 2023, or 2024 are obtained by a simple scheme of inverse filtering of the time series, followed by direct filtering of the modeling error. The modeling error sequence is organized in a matrix with 52 columns, each row holding the modeling error sequence for a single year. Assuming that the modeling error ensemble is zero-mean Gaussian white noise, the standard deviation of a typical model excitation is estimated. A Monte Carlo simulation with 10k runs then generates predictions of dengue cases: the AR(p) coefficients and initial conditions are held fixed, and only the model excitation is drawn from a Gaussian distribution. Each excitation has 79 samples, covering a forecast from EW 26 of a given year to EW 52 of the subsequent year. The results are mapped back to the original amplitude domain (via the inverse of the log2 function). From the set of 10k case predictions, the median and the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals are calculated. Finally, the resulting curves are smoothed with an SSA (Singular Spectrum Analysis) filter and cropped to the range from EW 41 of a given year to EW 40 of the subsequent year.
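The inverse-filtering step and the zero-mean noise estimate can be sketched as follows. This is a minimal illustration, not the repository's MATLAB code: it assumes the coefficient convention a = [1, a1, ..., ap] with e[n] = x[n] + a1·x[n-1] + ... + ap·x[n-p], uses a toy order p = 1 with a made-up coefficient instead of p = 92 estimated by armcov.m, and uses made-up case counts. Zero counts, which log2 cannot handle, are left out of the sketch.

```python
import math

def ar_residuals(x, a):
    """Inverse-filter x with A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p (a[0] = 1).

    Returns the modeling error e[n] for n = p, ..., len(x) - 1.
    """
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1)) for n in range(p, len(x))]

def zero_mean_sigma(residuals):
    """Standard deviation estimate under the assumption that the mean is zero."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Made-up weekly case counts (powers of two keep the arithmetic easy to follow).
cases = [4, 8, 16, 32, 64, 128]
log_cases = [math.log2(c) for c in cases]

a = [1.0, -0.5]  # hypothetical AR(1) coefficients, for illustration only
residuals = ar_residuals(log_cases, a)
sigma = zero_mean_sigma(residuals)
print(residuals, sigma)
```

Because the noise is assumed zero-mean, the estimate divides the sum of squared errors by the sample count without subtracting a sample mean.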

Forecasting:

From the trained/estimated model, we run a Monte Carlo simulation with 10k runs to generate the dengue case predictions: the AR(p) coefficients and initial conditions are held fixed, and only the model excitation is drawn from a zero-mean Gaussian distribution whose standard deviation is estimated from the modeling error. Each of these artificially generated excitations has 79 samples, covering a forecast range from EW 26 of a given year to EW 52 of the subsequent year. The results are mapped back to the original amplitude domain (via the inverse of the log2 function, 2^(predictions)). From the set of 10k case predictions, the median and the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals are calculated. Finally, the resulting curves are smoothed with an SSA (Singular Spectrum Analysis) filter and cropped to the range from EW 41 of a given year to EW 40 of the subsequent year.
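A minimal sketch of this Monte Carlo forecast is given below, under stated assumptions: the AR recursion is written as x[n] = -(a1·x[n-1] + ... + ap·x[n-p]) + e[n], the coefficients, initial condition, and sigma are made-up toy values (order 1 rather than 92), and the SSA smoothing and cropping steps are omitted. The horizon of 79 weeks corresponds to 27 weeks (EW 26 to EW 52) plus the 52 weeks of the following year.

```python
import random

def simulate_ar_paths(a, history, sigma, horizon, n_runs, seed=0):
    """Simulate AR(p) forecasts in the log2 domain and map them back to counts.

    a       : [1, a1, ..., ap], fixed AR coefficients
    history : last p log2 values (oldest first), used as fixed initial conditions
    sigma   : standard deviation of the zero-mean Gaussian excitation
    """
    rng = random.Random(seed)
    p = len(a) - 1
    paths = []
    for _ in range(n_runs):
        state = list(history[-p:])
        path = []
        for _ in range(horizon):
            e = rng.gauss(0.0, sigma)
            # x[n] = -sum_{k=1..p} a[k] * x[n-k] + e[n]
            x_new = -sum(a[k] * state[-k] for k in range(1, p + 1)) + e
            state.append(x_new)
            path.append(2.0 ** x_new)  # inverse of the log2 mapping
        paths.append(path)
    return paths

# Toy run: 200 realizations instead of 10k, hypothetical AR(1) coefficient.
paths = simulate_ar_paths(a=[1.0, -0.9], history=[10.0], sigma=0.1,
                          horizon=79, n_runs=200)
print(len(paths), len(paths[0]))
```

Only the excitation changes between runs, so the spread across the simulated paths reflects the estimated noise level alone.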

Predictive Uncertainty:

From the set of 10k case predictions (for each state and each validation), we use the MATLAB function prctile.m (percentiles of a sample) to obtain the median, as well as the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals. The median of the case predictions is the 50th percentile. The lower bounds of the 50%, 80%, 90%, and 95% prediction intervals are, respectively, the 25th, 10th, 5th, and 2.5th percentiles. The upper bounds are, respectively, the 75th, 90th, 95th, and 97.5th percentiles.
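The mapping from interval levels to percentiles can be sketched as below. The percentile helper uses plain linear interpolation between order statistics, which is an assumption of this sketch and may differ slightly from the interpolation convention of prctile.m; the function names and the sample data are illustrative only.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile, q in [0, 100], on an ascending list."""
    n = len(sorted_vals)
    pos = q * (n - 1) / 100.0
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def interval_summary(samples):
    """Median plus (lower, upper) bounds of the 50/80/90/95% central intervals."""
    s = sorted(samples)
    bounds = {50: (25, 75), 80: (10, 90), 90: (5, 95), 95: (2.5, 97.5)}
    summary = {"median": percentile(s, 50)}
    for level, (lo_q, hi_q) in bounds.items():
        summary[level] = (percentile(s, lo_q), percentile(s, hi_q))
    return summary

# Made-up sample: the integers 0..100, so each percentile equals its own index.
summary = interval_summary(range(0, 101))
print(summary)
```

In practice this summary would be computed separately for each forecast week, across the Monte Carlo realizations for that week.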

Model Output:

Libraries and Dependencies (MATLAB):

Model 2: UERJ-SARIMAX-2025-1

UERJ-SARIMAX
SARIMAX is a forecasting model for DENV dynamics through a seasonal autoregressive integrated moving average with exogenous inputs.

Repository structure:

DengueSprint2025_Model2_UERJ-SARIMAX-2025-1/
  │
  |── validation_X_sarimax_ZZ.csv: model output files for validation challenge X in the state ZZ
  │
  |── DengueSprint2025_SARIMAX_ZZ.R: code to run the model for state ZZ

Author:

Data and Variables:

Model Structure and Training:

Forecasting:

Predictive Uncertainty:

Model Output:

Libraries and Dependencies (R):

Model 3: LNCC-CLiDENGO-2025-1

LNCC-CLiDENGO
CLiDENGO (CLimate Logistic DENGue Outbreak Simulator) is a forecasting model for DENV dynamics based on a mechanistic, stochastic, climate-modulated β-logistic growth model for weekly dengue cases at the state (UF) level. It couples a flexible epidemic growth core with a climate response so that periods of favorable weather (e.g., warm, humid, rainy) accelerate epidemic growth in a data-driven way.

Repository structure:

DengueSprint2025_Model3_LNCC-CLiDENGO-2025-1/
│
|── DengueSprint2025_DataAggregated: surveillance and climate data aggregated at the state level
│
|── DengueSprint2025_DataValidation1: model output files for validation challenge 1
|── DengueSprint2025_DataValidation2: model output files for validation challenge 2
|── DengueSprint2025_DataValidation3: model output files for validation challenge 3
│
|── logo: D-FENSE team logo files

Authors:

Data and Variables:

We use surveillance data (weekly probable cases) together with climate covariates aggregated at the UF level: temperature (min/mean/max), precipitation (min/mean/max), and relative humidity (min/mean/max). Data are arranged as seasons of 52 weeks, from EW 41 of year Y to EW 40 of year Y+1. Climate series are min–max normalized on the training set and lightly denoised to form a baseline seasonal signal; case series are also denoised for quantity-of-interest (QoI) preparation while keeping values non-negative and integer when reported. Training uses multiple past seasons (e.g., 2010–2011 to 2020–2021); the next season (e.g., 2022–2023) is held out for validation. These inputs come from the ‘DengueSprint2025_DataAggregated’ directory.
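Normalizing "on the training set" means that the minimum and maximum come from the training seasons only and are then reused for held-out data. A minimal sketch of that idea follows, with hypothetical function names and made-up temperature values; it is not the repository's MATLAB code.

```python
def fit_min_max(train):
    """Return the (min, max) of the training series."""
    return min(train), max(train)

def apply_min_max(values, lo, hi):
    """Scale values with training-set bounds; a constant series maps to zeros."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Made-up weekly mean temperatures (degrees Celsius).
train_temp = [20.0, 25.0, 30.0]
heldout_temp = [20.0, 25.0, 30.0, 35.0]

lo, hi = fit_min_max(train_temp)
scaled = apply_min_max(heldout_temp, lo, hi)
print(scaled)
```

Held-out values outside the training range fall outside [0, 1], as the last entry shows; whether and how such values are clipped is not stated in this document.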

Model Structure:

CLiDENGO forecasts weekly dengue incidence by integrating a β-logistic growth ODE whose effective growth rate can be modulated by climate (temperature, precipitation, relative humidity). The model is trained per state (UF) and produces median and 50/80/90/95% prediction intervals.
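The sketch below illustrates the general idea of integrating a climate-modulated logistic-type growth ODE over a 52-week season. Everything specific in it is an assumption made for illustration: the growth law r·C^q·(1 - (C/K)^α)^p is one common generalized logistic form and is not confirmed here as the CLiDENGO equation, the modulation r(t) = r0·(1 + weight·climate) is hypothetical, and the explicit Euler scheme and all parameter values are arbitrary choices.

```python
def growth_rhs(c, r, K, q=1.0, alpha=1.0, p=1.0):
    """Assumed generalized logistic growth law: r * C^q * (1 - (C/K)^alpha)^p."""
    saturation = max(0.0, 1.0 - (c / K) ** alpha)
    return r * (c ** q) * (saturation ** p)

def simulate_season(c0, r0, K, climate, weight=0.5, weeks=52, substeps=10):
    """Integrate cumulative cases with explicit Euler; return weekly incidence.

    climate : one normalized value per week; larger values speed up growth
              through the hypothetical modulation r = r0 * (1 + weight * climate).
    """
    dt = 1.0 / substeps
    c = c0
    weekly = []
    for week in range(weeks):
        r = r0 * (1.0 + weight * climate[week])
        start = c
        for _ in range(substeps):
            c += dt * growth_rhs(c, r, K)
        weekly.append(c - start)  # new cases accumulated during this week
    return weekly

# Made-up parameters: neutral climate versus uniformly favorable climate.
base = simulate_season(c0=10.0, r0=0.3, K=10000.0, climate=[0.0] * 52)
warm = simulate_season(c0=10.0, r0=0.3, K=10000.0, climate=[1.0] * 52)
print(sum(base[:10]), sum(warm[:10]))
```

With the favorable climate signal the early-season cumulative count grows faster, which is the qualitative behavior described above.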

Model Training:

Each season spans 52 weeks, from EW 41 of year Y to EW 40 of year Y+1 (consistent with the Sprint evaluation windows). Validation is one season ahead. For instance, for validation challenge 1, training uses seasons 2010–2011 to 2020–2021, and validation uses the 2022–2023 season. Prior (probabilistic) model parameters are identified by least squares, using historical seasons as observations. Fitting is performed per UF, yielding a climate-response and logistic growth structure tailored to each state.

Forecasting:

With the identified parameters and lags, we re-run the simulator with a larger ensemble (thousands of realizations) and integrate the ODE 52 weeks into the future (EW 41 → EW 40 of the next year). For each realization, we obtain weekly incidence and cumulative trajectories driven by the climate modulators. Reported weekly cases are kept non-negative and rounded to integers.

Predictive Uncertainty:

From the Monte Carlo simulation, run with 1024 realizations by sampling from the learned parameter priors (and perturbing climate inputs), we compute the mean and the central prediction intervals at 50%, 80%, 90%, and 95% using prctile.m. Lower bounds use the 25th, 10th, 5th, and 2.5th percentiles; upper bounds use the 75th, 90th, 95th, and 97.5th percentiles, respectively.

Model Output:

Libraries and Dependencies (MATLAB):

Model 4: LNCC-SURGE-2025-1

LNCC-SURGE
SURGE is a forecasting model for DENV dynamics through an average surge model.

Repository structure:

D-FENSE/DengueSprint2025_Model4_LNCC-SURGE-2025-1/
│
|── Aggregated_Data: surveillance data aggregated at the state level
│
|── DFense_SurgeModel: codes and results for the 3 validation challenges
  │
  |── validation1: material related to validation 1 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v1_predictor_Surge_Model.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.
  │
  |── validation2: material related to validation 2 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v2_predictor_Surge_Model.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.
  │
  |── validation3: material related to validation 3 challenge
      |── matlab: Matlab scripts needed to run (run_batch_v3_predictor_Surge_Model.m) the simulation and generate the CSV and PDF files, related to dengue case predictions for each state. CSV files are stored in planilhas, and related plots (in PDF) are stored in plots
      |── planilhas: stores CSV files, one for each state, with predictions of dengue cases
      |── plots: stores PDF files, one for each state, with 4 subplots related to predictions of dengue cases: median prediction, 50%, 80%, 90%, and 95% prediction intervals.

Author:

Data and Variables:

Only the time series of raw dengue case counts per state, indexed by epidemic week, is used. The data are available in the ‘Aggregated_Data’ directory.

Model Structure and Training:

For each state (UF), the time series of raw dengue cases, restricted to the range defined for each validation, is organized in blocks of 52 samples (one year), from EW 41 to EW 40 of the next year. Training uses multiple past seasons (e.g., 2010–2011 to 2020–2021); the next season (e.g., 2022–2023) is held out for validation. Assuming that dengue surges happen at about the same time each year (around EW 15), an average, or typical, surge (outbreak) curve is obtained. Assuming the surge is symmetrical with respect to its local maximum, a version of the surge centered on its peak is obtained. From this typical centered surge, we estimate the parameters (L, k, x0) of the derivative of the logistic model using a nonlinear estimator (lsqcurvefit.m, with the ‘trust-region-reflective’ algorithm). Then, we use a template-matching filter scheme to find the local maxima of the cross-correlation coefficient sequence between the model surge (template) and the observed surges over time. After time-synchronizing the model with a given observed surge, we calculate the amplitude gain that, when applied to the model, matches it to that observed surge. Repeating this for each surge yields a set of amplitude gains, all positive. The dengue case prediction is simply the surge model multiplied by a gain. Assuming that the gains follow a log-normal distribution, we use the set of gains to estimate the mean and sigma of that distribution. To predict dengue cases, we generate 10k gains from the estimated log-normal distribution and apply them to the model surge, properly placed in time. From the set of 10k case predictions, the median and the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals are calculated. Finally, the predictions are cropped to the range from EW 41 of a given year to EW 40 of the subsequent year.
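The surge template, its time alignment, and the amplitude gain can be sketched as follows. The template is the derivative of the logistic function L/(1 + exp(-k(x - x0))), which peaks at x0 with height L·k/4. Two parts are assumptions of this sketch rather than documented choices: the alignment uses a plain Pearson correlation over sliding windows, and the gain is computed by least squares. All parameter values and the synthetic observed series are made up.

```python
import math

def surge_template(x, L, k, x0):
    """Derivative of the logistic curve L / (1 + exp(-k (x - x0)))."""
    z = math.exp(-k * (x - x0))
    return L * k * z / (1.0 + z) ** 2

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0

def best_lag(template, observed):
    """Window start at which the template correlates best with the data."""
    m = len(template)
    lags = range(len(observed) - m + 1)
    return max(lags, key=lambda lag: correlation(template, observed[lag:lag + m]))

def amplitude_gain(template, segment):
    """Least-squares gain g minimizing sum((segment - g * template)^2)."""
    return sum(t * s for t, s in zip(template, segment)) / sum(t * t for t in template)

# Synthetic season: the template scaled by 3 and placed 15 weeks into 52 weeks.
template = [surge_template(x, L=100.0, k=0.5, x0=10.0) for x in range(21)]
observed = [0.0] * 15 + [3.0 * t for t in template] + [0.0] * 16

lag = best_lag(template, observed)
gain = amplitude_gain(template, observed[lag:lag + len(template)])
print(lag, gain)
```

Correlation is insensitive to amplitude, so the alignment and the gain can be estimated in two separate steps, as in the description above.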

Forecasting:

From the trained/estimated typical surge model, after time-synchronizing the surge model with a given observed surge, we calculate the amplitude gain that, when applied to the model, matches that observed surge. Repeating this for each surge yields a set of amplitude gains, all positive. The dengue case prediction is simply the surge model multiplied by a gain. Assuming that the gains follow a log-normal distribution, we use the set of gains to estimate the mean and sigma of that distribution. To predict dengue cases, we generate 10k gains from the estimated log-normal distribution and apply them to the model surge, properly placed in time. From the set of 10k case predictions, the median and the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals are calculated. Finally, the predictions are cropped to the range from EW 41 of a given year to EW 40 of the subsequent year.

Predictive Uncertainty:

From the set of 10k case predictions (for each state and each validation), we use the MATLAB function prctile.m (percentiles of a sample) to obtain the median, as well as the lower and upper bounds of the 50%, 80%, 90%, and 95% prediction intervals. The median of the case predictions is the 50th percentile. The lower bounds of the 50%, 80%, 90%, and 95% prediction intervals are, respectively, the 25th, 10th, 5th, and 2.5th percentiles. The upper bounds are, respectively, the 75th, 90th, 95th, and 97.5th percentiles.

Model Output:

Libraries and Dependencies (MATLAB):

Model 5: UERJ-SARIMAX-2025-2

UERJ-SARIMAX
SARIMAX is a forecasting model for DENV dynamics through a seasonal autoregressive integrated moving average with exogenous inputs.

Repository structure:

DengueSprint2025_Model5_UERJ-SARIMAX-2025-2/
  │
  |── validation_X_sarimax_ZZ.csv: model output files for validation challenge X in the state ZZ
  │
  |── DengueSprint2025_SARIMAX_ZZ.R: code to run the model for state ZZ

Author:

Data and Variables:

Model Structure and Training:

Forecasting:

Predictive Uncertainty:

Model Output:

Libraries and Dependencies (R):

How to Cite This Repository

If you wish to cite this repository in a document, please use the following reference:

In BibTeX format:

@misc{D-FENSE-GitHub,
   author       = {A. {Cunha~Jr} and {et al.}},
   title        = { {D-FENSE: Dynamics for Epidemic Surveillance and Evaluation} },
   year         = {2025},
   publisher    = {GitHub},
   howpublished = {https://github.com/americocunhajr/D-FENSE},
   doi          = {10.5281/zenodo.20182707},
}

License

All material available in this repository is licensed under the terms of the CC-BY-NC-ND 4.0 license.

Institutional support

   

Funding

         

Contact

For any questions or further information, please contact:

Americo Cunha Jr: americo@lncc.br