**Authors**

**Abstract**

Air pollution has become a major environmental issue in large cities. In Tehran, 90% of air pollutants are generated by traffic; among them, carbon monoxide (CO) is the most important because it constitutes more than 75% by weight of total air pollutants. This study aims to predict the daily CO concentration in the urban area of Tehran using a hybrid forward selection-ANFIS (adaptive neuro-fuzzy inference system) model based on atmospheric stability analysis.

Atmospheric stability is the most important parameter affecting the dilution of air pollutants and plays a central role in investigating the parameters that affect ambient pollutant concentrations. It can therefore be used as an input parameter in air pollution prediction models. Although the methods used to determine stability vary in complexity, most of them account for both mechanical and buoyant turbulence. In this study, two approaches to atmospheric stability analysis are considered, and thus two models are developed.

ANFIS1: wind velocity and solar radiation are considered as the indicators of mechanical and buoyant turbulence, respectively. For predicting the CO concentration at a given time step, CO(t), there are 5 candidate inputs: CO(t-1), u10(t), u10(t-1), rad(t), rad(t-1).

ANFIS2: frictional wind velocity and temperature gradient are used to represent mechanical and buoyant turbulence, respectively. For predicting CO(t), there are 9 candidate inputs: CO(t-1), u10(t), u10(t-1), u24(t), u24(t-1), temp(t), temp(t-1), dtemp(t), dtemp(t-1).
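As an illustration of how such candidate inputs could be assembled, the following minimal sketch builds current and one-day-lagged columns from daily series. The function name `build_lagged_candidates`, the dictionary layout, and the toy numbers are assumptions made for this example and are not taken from the study.

```python
def build_lagged_candidates(series, target="CO"):
    """Build candidate inputs for predicting target(t).

    series: dict mapping a variable name to a list of daily values
            (all lists must have the same length).
    Returns (names, X, y): X[i] holds the candidates for day i+1 and
    y[i] holds target on day i+1. Day 0 is dropped because it has no
    t-1 value.
    """
    n = len(series[target])
    names = [f"{target}(t-1)"]
    columns = [series[target][:-1]]          # lagged target only
    for var, values in series.items():
        if var == target:
            continue
        if len(values) != n:
            raise ValueError(f"{var} has a different length than {target}")
        names += [f"{var}(t)", f"{var}(t-1)"]
        columns += [values[1:], values[:-1]]  # current and lagged value
    X = [list(row) for row in zip(*columns)]
    y = series[target][1:]
    return names, X, y


# Toy data (made up for illustration only)
toy = {"CO": [1, 2, 3, 4], "u10": [10, 20, 30, 40], "rad": [5, 6, 7, 8]}
names, X, y = build_lagged_candidates(toy)
print(names)  # ['CO(t-1)', 'u10(t)', 'u10(t-1)', 'rad(t)', 'rad(t-1)']
```

Only the lagged target enters the candidate set, since CO(t) itself is the quantity being predicted.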

Input selection is a crucial step in ANFIS implementation, because ANFIS itself is not designed to eliminate superfluous inputs. With a large number of input variables, the data set may include irrelevant, redundant, and noisy variables, while meaningful variables may be hidden. Moreover, a large number of input variables may prevent ANFIS from finding the optimal model. Reducing the number of input variables is therefore recommended, even though some information is lost. In this research, input selection is carried out using the forward selection (FS) procedure. When the number of candidate covariates (N) is small, one can choose a prediction model by computing a reasonable criterion (e.g., RMSE, SSE, FPE, or cross-validation error) for all possible subsets of the predictors. However, as N increases, the computational burden of this approach grows very quickly (there are 2^N - 1 non-empty subsets), which is one of the main reasons why stepwise algorithms such as forward selection are popular.

In this approach, which is based on a linear regression model, the explanatory variables are first ordered by their correlation with the dependent variable, from the most to the least correlated. The explanatory variable best correlated with the dependent variable is selected as the first input. Each of the remaining variables is then added in turn as a second input, and the variable that most increases the coefficient of determination (R2) is selected as the second input. This step is repeated N-1 times to evaluate the effect of each variable on the model output. Finally, among the N subsets obtained, the subset with the optimum R2 is selected as the model input subset. The optimum R2 corresponds to the set of variables beyond which adding a new variable does not significantly increase R2.
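The forward selection procedure described above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names and the `min_gain` threshold (standing in for "does not significantly increase R2") are assumptions, since the text does not state a numerical stopping criterion.

```python
import numpy as np


def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_selection(X, y, names, min_gain=0.01):
    """Greedily add the variable that raises R2 the most.

    Stops when the best remaining variable improves R2 by less than
    min_gain (an assumed threshold). For the first variable, the highest
    single-variable R2 is the same as the highest absolute correlation
    with y.
    """
    remaining = list(range(X.shape[1]))
    selected = []
    best_r2 = 0.0
    while remaining:
        scores = [(r_squared(X[:, selected + [j]], y), j) for j in remaining]
        r2, j = max(scores)
        if selected and r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return [names[j] for j in selected], best_r2


# Synthetic example (made-up data): y depends only on columns "a" and "c".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)
chosen, r2 = forward_selection(X, y, ["a", "b", "c", "d"])
print(chosen)
```

Each step fits at most N linear models, so the whole search needs on the order of N^2 fits instead of one fit per subset.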

FS is applied to the input sets of this study, which reduces the inputs of the models to 5 and 4 for ANFIS1 and ANFIS2, respectively. To identify the effect of FS on the modeling results, the complete input sets are also considered. Thus, 4 models are defined: ANFIS1, ANFIS2, FS-ANFIS1, and FS-ANFIS2. The selected inputs are then used in the neuro-fuzzy modeling approach. Neuro-fuzzy modeling refers to applying learning techniques developed in the neural network literature to fuzzy modeling, that is, to a fuzzy inference system (FIS). A specific neuro-fuzzy approach is ANFIS (adaptive neuro-fuzzy inference system), which has shown significant results in modeling nonlinear functions. ANFIS uses a feedforward network to optimize the parameters of a given FIS so that it performs well on a given task. Its learning algorithm is a hybrid that combines gradient descent and least-squares methods. The FIS used here is the first-order Sugeno fuzzy model with its equivalent ANFIS architecture.
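To make the first-order Sugeno structure concrete, the sketch below computes normalized rule firing strengths and fits the linear rule consequents by least squares, which is the least-squares half of the hybrid algorithm. It is an illustrative sketch under stated assumptions, not the model used in the study: Gaussian membership functions, the product T-norm, one rule per row of `centers`, and all function names are choices made for this example, and the gradient-descent update of the premise parameters (`centers`, `sigmas`) is omitted.

```python
import numpy as np


def normalized_firing(X, centers, sigmas):
    """Normalized firing strengths of each rule.

    X: (n, d) inputs. centers, sigmas: (r, d), one Gaussian membership
    function per rule and input. Returns an (n, r) array whose rows sum to 1.
    """
    mu = np.exp(-0.5 * ((X[:, None, :] - centers[None]) / sigmas[None]) ** 2)
    w = mu.prod(axis=2)                       # product T-norm over inputs
    return w / w.sum(axis=1, keepdims=True)


def design_matrix(X, wbar):
    """Columns wbar_i * [x_1, ..., x_d, 1] for every rule i."""
    n = X.shape[0]
    X1 = np.column_stack([X, np.ones(n)])     # append the constant term
    return (wbar[:, :, None] * X1[:, None, :]).reshape(n, -1)


def fit_consequents(X, y, centers, sigmas):
    """Least-squares estimate of consequent parameters, premises held fixed."""
    A = design_matrix(X, normalized_firing(X, centers, sigmas))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta


def predict(X, centers, sigmas, theta):
    """Model output: firing-strength-weighted sum of linear rule outputs."""
    return design_matrix(X, normalized_firing(X, centers, sigmas)) @ theta


# Made-up example with two inputs and two rules.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5
centers = np.array([[-1.0, -1.0], [1.0, 1.0]])
sigmas = np.ones((2, 2))
theta = fit_consequents(X, y, centers, sigmas)
print(np.abs(predict(X, centers, sigmas, theta) - y).max())
```

Because the normalized firing strengths sum to one, a purely linear target such as the one in this example is reproduced exactly when every rule shares the same consequent; nonlinear targets rely on the rules having different consequents in different regions of the input space.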

Results show that forward selection reduces not only the computational burden but also the output error. The FS-ANFIS models produce more accurate results, with R2 of 0.52 and 0.41 for FS-ANFIS1 and FS-ANFIS2, respectively. Moreover, although both models can satisfactorily predict trends in the CO concentration level, FS-ANFIS2, which is based on temperature and wind speed gradients, is the superior model.


**Keywords**

September 2012

Pages 183-201