Prediction Models for Cardiovascular Disease Deaths Using Environmental Variables and an Exploratory Study Based on Climate Change Scenarios
Department of Computer Engineering and Industrial Automation - FEEC/UNICAMP, Brazil
Environmental variables play an important role in human health, and climate change can alter these relationships. Cardiovascular diseases (CVDs) are the leading cause of death worldwide. In addition to lifestyle, CVDs are influenced by environmental variables and may therefore be affected by climate change. Developing strategies for adaptation, mitigation, and prevention of the effects of climate change requires tools capable of simulating different future scenarios, especially at the regional level. The general objective of this work is therefore to generate and compare models of the associations between environmental variables and the number of deaths from CVDs in the city of Campinas, São Paulo. We also propose an exploratory study of the impact of different climate change scenarios on the number of deaths from CVDs by 2050. We integrated and curated the databases of deaths from all causes provided by the Health Department of Campinas. The environmental data came from the meteorological stations of the Environmental Company of the State of São Paulo, the Agronomic Institute of Campinas, the Center for Meteorological and Climatic Research Applied to Agriculture, and Viracopos International Airport. The environmental database includes daily values for temperature, carbon monoxide, particulate matter, relative humidity, and atmospheric pressure. From the integrated database, we developed predictive models for daily, weekly, and monthly deaths using two approaches: linear regression with SARIMA errors (LR-SARIMAX) and a long short-term memory (LSTM) recurrent neural network. We used grid search to minimize prediction error, systematically varying the intrinsic parameters of the models together with different combinations of the predictor variables and the number of lags of these variables. In total, 441 models were evaluated using the root mean square error (RMSE) and mean absolute percentage error (MAPE) metrics.
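The grid search over predictor combinations and lags, scored by RMSE and MAPE, can be illustrated with a minimal sketch. This is not the authors' implementation: an ordinary least-squares regression stands in for the LR-SARIMAX and LSTM models, the data are synthetic, and every name here (`rmse`, `mape`, `lagged_design`, `grid_search`, the variable labels, the hold-out size) is an assumption made for illustration only.

```python
import itertools
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if y_true has zeros)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def lagged_design(X, n_lags):
    """Stack X[t], X[t-1], ..., X[t-n_lags] column-wise; the first n_lags rows are dropped."""
    n = X.shape[0]
    return np.hstack([X[n_lags - k : n - k] for k in range(n_lags + 1)])

def grid_search(y, predictors, max_lag, n_test):
    """Try every non-empty predictor subset and every lag count from 0 to max_lag.

    Each candidate is fitted by least squares (a stand-in model, assumed here)
    on all but the last n_test observations and scored on those n_test.
    Returns (combo, n_lags, rmse, mape) tuples sorted by RMSE.
    """
    results = []
    names = sorted(predictors)
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            X = np.column_stack([predictors[c] for c in combo])
            for n_lags in range(max_lag + 1):
                A = lagged_design(X, n_lags)
                A = np.column_stack([np.ones(len(A)), A])  # intercept column
                target = y[n_lags:]                        # align with lagged rows
                coef, *_ = np.linalg.lstsq(A[:-n_test], target[:-n_test], rcond=None)
                pred = A[-n_test:] @ coef
                results.append((combo, n_lags,
                                rmse(target[-n_test:], pred),
                                mape(target[-n_test:], pred)))
    return sorted(results, key=lambda row: row[2])

# Synthetic example: deaths depend on the previous day's temperature only.
rng = np.random.default_rng(0)
temperature = rng.normal(size=200)
humidity = rng.normal(size=200)
deaths = 50 + 3 * np.concatenate([[0.0], temperature[:-1]]) + 0.01 * rng.normal(size=200)

results = grid_search(deaths,
                      {"temperature": temperature, "humidity": humidity},
                      max_lag=2, n_test=20)
for combo, n_lags, err_rmse, err_mape in results[:3]:
    print(combo, n_lags, round(err_rmse, 4), round(err_mape, 4))
```

In this toy setting the lowest-error candidates are those that include temperature with at least one lag, which is the kind of comparison the grid search is meant to surface. A real study would replace the least-squares fit with the chosen model class and add that model's own hyperparameters to the grid.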
The models were compared with respect to data periodicity, model type, combination of variables, and the number of lags of the environmental variables. The models using monthly data presented prediction errors up to five times smaller than those using daily or weekly data. Even though the different approaches did not show significant differences in prediction errors (