BioScience Trends. 2017;11(3):292-296. (DOI: 10.5582/bst.2017.01035)

Time series analysis of weekly influenza-like illness rate using a one-year period of factors in random forest regression.

Wu H, Cai Y, Wu Y, Zhong R, Li Q, Zheng J, Lin D, Li Y


SUMMARY

Influenza, a disease caused by a respiratory virus, sickened over 5,043,127 citizens in Shenzhen, China, from January 2014 to April 2016. Accurate forecasting of outbreaks of influenza-like illness (ILI; here, ILI refers to upper respiratory infection) could help public health officials recommend public health actions earlier. In this study, a random forest regression constructed with a one-year period of factors was adopted to forecast the weekly ILI rate using clinical data from the Shenzhen Health Information Center. The following conclusions were drawn based on this method: i) Compared to prediction with 52 (one-year) history observations, the accuracy of the prediction was improved by adding another 52 first-order difference variables: mean absolute percentage error (MAPE) decreased from 5.04% to 4.35% and mean squared error (MSE) decreased from 2.85E-04 to 1.97E-04. ii) The first-order difference variables seemed more significant than the original history observations for the prediction. In addition, both the recent observations and the later observations seemed important in the prediction procedure. iii) Weather conditions, the influence of which could already have been implied by the history observations, seemed insignificant for the prediction. However, Pearson correlation analysis showed that the weekly ILI rate was correlated with the weekly average temperature and the weekly maximum temperature, with correlation coefficients of -0.3656 and -0.3583, respectively.
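The feature construction and error measures described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (lag_features, mape, mse, pearson) are hypothetical, and the exact definition of the 52 first-order difference variables is an assumption (here, the difference between each lagged observation and the one before it). MAPE, MSE, and the Pearson coefficient follow their standard definitions.

```python
from math import sqrt


def lag_features(series, n_lags=52, with_diffs=True):
    """Build (X, y) from a weekly series.

    Each row of X holds the previous n_lags observations and, optionally,
    n_lags first-order differences of those observations (an assumed
    definition; the abstract does not spell it out).
    """
    # One extra earlier point is needed so every lag has a difference.
    start = n_lags + 1 if with_diffs else n_lags
    X, y = [], []
    for t in range(start, len(series)):
        row = [series[t - k] for k in range(1, n_lags + 1)]
        if with_diffs:
            row += [series[t - k] - series[t - k - 1]
                    for k in range(1, n_lags + 1)]
        X.append(row)
        y.append(series[t])
    return X, y


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


def mse(actual, predicted):
    """Mean squared error."""
    terms = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(terms) / len(terms)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

Under these assumptions, the rows returned by lag_features(ili_rates) could be passed to any random forest regressor (for example, scikit-learn's RandomForestRegressor), with mape and mse computed on held-out weeks and pearson applied to the ILI rate and a weekly temperature series. Here ili_rates is a placeholder for a list of weekly ILI rates.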


KEYWORDS: Time series analysis, random forest regression, influenza-like illness (ILI), mean absolute percentage error (MAPE), mean squared error (MSE), correlation
