Investigation of the efficiency of methods of infilling missing data in relation to the precipitation parameter in arid regions of Iran

Document Type : Research Article

Authors

1 Assistant Professor, Soil Conservation and Watershed Management Research Institute (SCWMRI), Agricultural Research, Education and Extension Organization (AREEO), Tehran, Iran

2 Ph.D. Student, Department of Arid Land And Desert Management, Faculty of Natural Resources & Desert Studies, Yazd University, Yazd, Iran

Abstract

Missing data are a common issue in climate records. Precipitation is a key component of the hydrological cycle, and meteorological and hydrological studies of watersheds depend first of all on the quantity and quality of recorded rainfall data and on their spatial distribution. Complete and reliable climatic and hydrological data sets are required for the planning and design of related projects. Various methods have therefore been developed and applied to treat missing precipitation data. The normal ratio method, linear regression, multivariate regression and inverse distance weighting (IDW) are widely used in natural resources studies in Iran. It is therefore necessary to assess the capability of these methods, especially for precipitation, which plays a crucial role in the study of natural resources. In this study, the capability of each of these methods for infilling missing data in daily, monthly and annual precipitation time series in the arid regions of Iran was investigated for proportions of missing data ranging from 5 to 50% of the total. The main purpose of the study is to determine which of the four methods is most effective for infilling missing precipitation data.
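As a rough illustration of two of the methods named above, the following Python sketch estimates one missing value at a target station from neighbouring stations using the normal ratio method and IDW. The function names, the IDW power of 2 and the sample numbers are assumptions made for illustration only; the exact formulations used in the study are not given in this abstract.

```python
def normal_ratio(target_mean, neighbor_means, neighbor_values):
    """Normal ratio estimate for one missing value.

    Each neighbour's observation is scaled by the ratio of the target
    station's long-term mean to that neighbour's long-term mean, and the
    scaled values are averaged.
    """
    n = len(neighbor_values)
    scaled = [target_mean / m * p for m, p in zip(neighbor_means, neighbor_values)]
    return sum(scaled) / n


def idw(distances, neighbor_values, power=2.0):
    """Inverse distance weighting estimate for one missing value.

    Weights are 1 / distance**power; power=2 is a common default and an
    assumption here, not a value taken from the study.
    """
    weights = [1.0 / d ** power for d in distances]
    weighted = sum(w * p for w, p in zip(weights, neighbor_values))
    return weighted / sum(weights)


# Hypothetical example: two neighbours reporting 10 mm and 5 mm.
nr_estimate = normal_ratio(100.0, [200.0, 50.0], [10.0, 5.0])
idw_estimate = idw([10.0, 20.0], [10.0, 5.0])
print(nr_estimate, idw_estimate)
```

With these made-up inputs the normal ratio estimate is 7.5 mm and the IDW estimate is about 9.0 mm, which shows how the two methods can disagree for the same neighbours: the first weights by climatological similarity, the second by proximity.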
Daily data from Iran's synoptic meteorological stations were used in the present study. Data homogeneity was checked with the run test. Outliers were identified through graphical data exploration, in particular boxplots, and flagged as missing. The mean annual precipitation and temperature of 400 stations were determined, and their de Martonne coefficients were computed from these values. Stations with a de Martonne coefficient below 10 were then classified as arid. Of these, 73 stations with sufficient data from 1986 to 2017 were selected. To evaluate each reconstruction method, part of the observed data was deliberately removed from the original series and then reconstructed. Because of the high volume of calculations, this process was programmed in MATLAB.
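The station selection and the withholding experiment described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the de Martonne coefficient is taken in its usual form P / (T + 10), with P the mean annual precipitation in mm and T the mean annual temperature in °C; the error measure (RMSE), the random selection of withheld values and all function names are hypothetical choices, since the abstract does not specify them.

```python
import math
import random


def de_martonne(annual_precip_mm, mean_temp_c):
    """De Martonne aridity coefficient, I = P / (T + 10)."""
    return annual_precip_mm / (mean_temp_c + 10.0)


def is_arid(annual_precip_mm, mean_temp_c, threshold=10.0):
    """Arid if the coefficient is below 10, the threshold stated in the text."""
    return de_martonne(annual_precip_mm, mean_temp_c) < threshold


def withhold_and_score(series, estimator, fraction, seed=0):
    """Hide a fraction of the observed values, re-estimate them, return RMSE.

    `estimator(i)` is a hypothetical callable returning the reconstructed
    value at index i (for example from neighbouring stations). RMSE is an
    assumed metric for this sketch.
    """
    rng = random.Random(seed)
    n_missing = max(1, round(fraction * len(series)))
    hidden = rng.sample(range(len(series)), n_missing)
    squared_errors = [(estimator(i) - series[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared_errors) / n_missing)


# Hypothetical station: 120 mm per year at a mean temperature of 20 degrees C.
print(de_martonne(120.0, 20.0), is_arid(120.0, 20.0))
```

Calling `withhold_and_score` repeatedly with `fraction` set from 0.05 to 0.50 would mirror the range of missing-data proportions examined in the study.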
The results showed that the performance of each method depends on the conditions. Daily data are not well estimated by the normal ratio method, which estimates the missing values as lower than the observed ones. On the daily time scale, linear regression reconstructs the data more accurately than the normal ratio method. For the linear regression approach, the distance between the line fitted to the observed and estimated data and the 1:1 line is small at low precipitation and grows as precipitation increases, indicating that the model is less accurate in estimating extreme values. Because the fitted line lies below the 1:1 line, the linear regression method underestimates the actual values. The same holds for the IDW procedure. The multivariate regression method is more accurate for daily time series when the proportion of missing data is small, but it is generally very sensitive to that proportion. The normal ratio method is not suitable for reconstructing daily missing values; however, it is more stable than the other methods as the proportion of missing data increases. In monthly time series, the IDW method performs best, followed by the normal ratio method. In annual series, linear regression, normal ratio and IDW perform best, in that order.
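The comparison with the 1:1 line can be made concrete with a small Python sketch. It fits an ordinary least-squares line to pairs of observed and estimated values; a slope below 1 means the fitted line falls further below the 1:1 line as precipitation grows. The function names and the sample data are hypothetical and serve only to illustrate the reasoning.

```python
def fit_line(observed, estimated):
    """Ordinary least-squares line: estimated = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def gap_to_identity(x, slope, intercept):
    """Vertical distance from the fitted line up to the 1:1 line at value x."""
    return x - (slope * x + intercept)


# Made-up example in which every estimate is 80% of the observation.
observed = [0.0, 5.0, 10.0, 20.0, 40.0]
estimated = [0.8 * x for x in observed]
slope, intercept = fit_line(observed, estimated)
print(slope, gap_to_identity(5.0, slope, intercept), gap_to_identity(40.0, slope, intercept))
```

In this artificial case the gap is about 1 mm at 5 mm of precipitation and about 8 mm at 40 mm, the same pattern of growing underestimation at extreme values that the results describe.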
The findings show that, in general, reconstruction accuracy is higher on the annual scale than on the monthly scale, and higher on the monthly scale than on the daily scale. This is because monthly and annual time series are smoother than daily ones. It should also be noted that this study was carried out at the scale of Iran as a whole. If data from the reserved rain-gauge stations of the Meteorological Organization and the Ministry of Energy are added to this data set, the accuracy of the methods is expected to increase. As the results of the present study show, the accuracy of the models decreases as the proportion of missing data increases. Therefore, if new data are included in missing-data processing, each of these methods can be expected to perform better. Finally, each method should be applied according to the given conditions, and it is therefore recommended that a software package be developed for infilling missing data in Iran.
