Dynamic Modeling of Streptomyces hygroscopicus Fermentation Broth Microfiltration by Artificial Neural Networks

Artificial neural networks (ANNs) were used to dynamically model cross-flow microfiltration of Streptomyces hygroscopicus fermentation broth. The aim was to predict permeate flux as a function of temperature, feed flow, transmembrane pressure and processing time. Dynamic modeling of the microfiltration performance of complex systems such as fermentation broths is very important for the design of new processes and for a better understanding of existing ones. The ANN models gave high values of the coefficient of determination. Bayesian regularization improved the performance of the neural network compared to the Levenberg-Marquardt algorithm. The optimal number of neurons in the hidden layer was eight. Analysis of the absolute relative error showed excellent permeate flux estimates for microfiltration in the presence of a turbulence promoter, where 100 % of the data points had an error of less than 5 %, whereas for microfiltration without a turbulence promoter 90 % of the predictions had an error of less than 10 %. The results for microfiltration of Streptomyces hygroscopicus fermentation broth with and without a turbulence promoter clearly show the validity of the proposed method for simulation and prediction of microfiltration experimental results.


Introduction
Biological control against spoilage organisms currently represents a fast-growing sector in the crop protection and food industry [1]. Fermentates, i.e., numerous components produced during fermentations by a variety of microorganisms, are referred to as 'biocontrol' as they exhibit antibiosis to pathogenic (spoilage) microorganisms [2,3].
Streptomyces is a genus of Gram-positive bacteria that grows in various environments and resembles filamentous fungi in shape. Streptomyces species are the source of thousands of bioactive compounds used in biological control [2]. The first stage of downstream processing is the separation of microbial cells and other insoluble materials from the fermentation broth by filtration or centrifugation.
Cross-flow membrane microfiltration (CFMF) can be used for these purposes. In CFMF, the fermentation broth flows tangentially to the membrane surface and perpendicularly to the permeate flux, so the accumulation of filtered cells and solids can be minimized by the shearing action of the flow [4,5].
Although CFMF is an attractive option for harvesting cells and removing other insoluble materials from the fermentation broth, the deposition of dissolved and suspended solutes onto the membrane surface during operation leads to permeate flux decline [6]. One way to lessen membrane fouling is to place turbulence promoters near the membrane surface; the increased shear rate at the surface can greatly improve the permeate flux [7].
Membrane technology and its applications are a significant aspect of chemical and biochemical engineering. Mathematical models for membrane separation processes are crucial in membrane science and technology, as they play an important role in planning economical and effective membrane separation processes [8]. Given the complex nature of fermentation broth and the complicated relationship between filtration performance and operating conditions, it is difficult to describe the non-linear behavior using a mathematical model [7]. Conventional mathematical models rely on rather complex equations to predict permeate flux decline with time during membrane filtration, and these models have certain limitations [8].
An artificial neural network (ANN) is an effective predictive model for non-linear systems and is used as an alternative modeling approach. In recent years, ANNs have been widely used to predict membrane performance in various filtration processes [8]. Compared to theoretical or mathematical models, ANN modeling of CFMF flux decline is much simpler.
ANNs have been employed in a wide range of membrane separation processes such as reverse osmosis, nanofiltration, ultrafiltration, microfiltration, gas separation, membrane bioreactors, and fuel cells [8]. Numerous applications of ANNs in CFMF processes can be found in the literature. Some of them focus on the prediction of permeate flux and membrane fouling during CFMF of bentonite [9,10], protein [11,12] and colloid [13] suspensions, removal of phosphate from aqueous solution with fly ash [14], suspensions of baker's yeast [15], etc.
To the best of our knowledge, the application of ANN dynamic modeling to the turbulence promoter-assisted CFMF process is still limited. Nidal et al. [9] established two ANN models, using feed temperature, transmembrane pressure (TMP), feed concentration and cross-flow velocity as input variables, for CFMF of bentonite suspensions with and without a turbulence promoter. An ANN model with the Bayesian regularization training algorithm was successfully developed for the turbulence promoter-assisted CFMF of wheat starch industry wastewater [16]. Liu et al. [7] developed an ANN model for turbulence promoter-assisted cross-flow microfiltration in which the inlet velocity, TMP and feed concentration were taken as inputs and the flux improvement efficiency (FIE) achieved by the turbulence promoter was taken as output.
Accurate flux decline modeling is essential for optimization, simulation, and scale-up [8]. Therefore, in this study, an ANN model using feed flow, TMP, temperature and time as input variables was established to predict the flux decline during turbulence promoter-assisted CFMF of Streptomyces hygroscopicus fermentation broth. After the optimum network parameters were determined and the network was trained with a set of microfiltration experimental data, the prediction capabilities of the ANN models were validated. Using the trained ANN model, the effects of the CFMF operating conditions on the flux decline were studied, and the relative importance of each operating condition was analyzed.

Material and methods

Materials
Streptomyces hygroscopicus cultivation broth was used to study CFMF flux decline in this paper. The microorganism was isolated from a soil sample from the locality of Novi Sad, Serbia. Cultivation was carried out in Woulff bottles (capacity 2 L) with 500 mL of cultivation medium. The cultivation medium was seeded with 10 % of inoculum. The bioprocess lasted for 10 days at 27 °C, with stirring at 150 rpm on a rotating microbiological shaker and with spontaneous aeration.
Cross-flow microfiltration of the cultivation broth was carried out on a laboratory apparatus; a detailed description of the apparatus can be found in our previous work [17]. A ceramic membrane (TAMI, France) with one channel, 200 nm pore diameter, 25 cm length and 0.0043 m² active surface was used in the experiments.
The Kenics static mixer used in the experiments was made of stainless steel, with a diameter of 6 mm and a length of 23 cm, which corresponds to the active length of the membrane. The permeate flux during microfiltration was calculated from the time needed to collect 10 mL of permeate.
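The flux calculation from the collection time can be illustrated with a short sketch. It assumes that the flux is expressed in L/(m² h) (the units are not stated in the text), and the 60 s collection time in the example is purely hypothetical; only the 10 mL volume and the 0.0043 m² membrane area come from the description above.

```python
def permeate_flux(volume_ml, time_s, area_m2):
    """Permeate flux in L/(m^2 h) from a collected volume and its collection time.

    The choice of L/(m^2 h) as the unit is an assumption made for this sketch.
    """
    if time_s <= 0 or area_m2 <= 0:
        raise ValueError("time and membrane area must be positive")
    volume_l = volume_ml / 1000.0   # mL -> L
    time_h = time_s / 3600.0        # s -> h
    return volume_l / (area_m2 * time_h)


# Hypothetical reading: 10 mL of permeate collected in 60 s
# on the 0.0043 m^2 membrane.
flux = permeate_flux(volume_ml=10.0, time_s=60.0, area_m2=0.0043)
print(f"{flux:.1f} L/(m2 h)")
```

A longer collection time for the same 10 mL gives a proportionally lower flux, which is how flux decline shows up in the raw measurements.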

Methods
A Box-Behnken design was selected for the experimental part of the work. The design variables and their levels were transmembrane pressure (0.3, 0.6 and 0.9 bar), feed flow rate (40, 100 and 160 L/h) and temperature (30, 40 and 50 °C).
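As an illustration of what a three-factor Box-Behnken plan looks like for these levels, the following sketch enumerates the runs. The function name `box_behnken`, the dictionary keys and the number of center points (`n_center=1`) are assumptions made for the example; the number of center-point replicates used in the experiments is not stated in the text.

```python
from itertools import combinations, product


def box_behnken(levels, n_center=1):
    """Enumerate Box-Behnken runs for factors given as {name: (low, mid, high)}."""
    names = list(levels)
    center = {name: levels[name][1] for name in names}
    runs = []
    # For every pair of factors, run the 2x2 factorial at their low/high
    # levels while all remaining factors are held at the middle level.
    for a, b in combinations(names, 2):
        for va, vb in product((levels[a][0], levels[a][2]),
                              (levels[b][0], levels[b][2])):
            run = dict(center)
            run[a], run[b] = va, vb
            runs.append(run)
    # Center point(s): all factors at their middle level.
    runs.extend(dict(center) for _ in range(n_center))
    return runs


levels = {
    "tmp_bar": (0.3, 0.6, 0.9),
    "feed_flow_l_per_h": (40, 100, 160),
    "temperature_c": (30, 40, 50),
}
design = box_behnken(levels, n_center=1)
for run in design:
    print(run)
```

With three factors this gives 12 edge-midpoint runs plus the chosen number of center points; no run places all three factors at an extreme level simultaneously.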
The permeate flux was monitored over time for each of the experimental conditions defined by the experimental plan until the desired volume concentration factor (VCF) was reached. In this study, the VCF value was two. This value was selected because, in our preliminary studies, the permeate flux decreased with increasing VCF up to values close to two, whereas for VCF values of two and higher the permeate flux was practically constant.

Artificial neural network architecture
Reliable experimental data are of principal importance for training artificial neural network models. The data from the CFMF experiments were divided into two groups: permeate flux without static mixer (NSM) and permeate flux with static mixer (SM). Prior to network training, the data sets were normalized to bring all data within a specific range [18]:

$$J_{norm} = (1 - \Delta_U - \Delta_L)\,\frac{J_P - J_{min}}{J_{max} - J_{min}} + \Delta_L \qquad (1)$$

where $J_{norm}$ and $J_P$ are the normalized and the measured permeate flux at a given time, respectively; $J_{min}$ and $J_{max}$ are the minimum and maximum permeate flux in the data set, respectively; and $\Delta_L$ and $\Delta_U$ are the lower and upper limits for the normalization (with a value of 0.01 for each limit).

Multi-layer feed-forward neural networks with backpropagation are the most popular and most widely used models in many practical applications [8]. Because of its convergence rate and the performance of the network in seeking an optimal solution, the Levenberg-Marquardt (LM) training algorithm was selected in this study. The Levenberg-Marquardt algorithm uses an early stopping criterion to improve network training speed and efficiency.

The data for each experimental mode (220 and 211 data points for NSM and SM, respectively) were divided into three randomized sets. The first set is the training set, used for determining the weights and biases of the network; it consists of 70 % of all data. The second set is the validation set (15 % of the data), used for evaluating the weights and biases and for deciding when to stop training. The validation error generally decreases at the beginning of the training process, but when the network starts to overfit the data, the validation error begins to increase. Training is stopped when the validation error begins to increase, and the weights and biases are then taken at the minimum validation error. The maximum number of validation failures was set to the default value of five epochs.
The last data set is the test set (15 % of the data); it is used to verify the capability of the stopping criterion and to estimate the expected network performance on new data. The randomization was performed with the built-in Matlab function 'randperm'.
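The preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the authors' Matlab code: it assumes the linear min-max form J_norm = (1 − Δ_U − Δ_L)(J_P − J_min)/(J_max − J_min) + Δ_L for the normalization, replaces Matlab's 'randperm' with a seeded shuffle from the Python standard library, and rounds the 70/15/15 split to whole data points. The function names are hypothetical.

```python
import random


def normalize(values, delta_l=0.01, delta_u=0.01):
    """Map values linearly onto [delta_l, 1 - delta_u] (assumed form of Eq. 1)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("values must not all be equal")
    scale = 1.0 - delta_u - delta_l
    return [scale * (v - v_min) / (v_max - v_min) + delta_l for v in values]


def denormalize(norm_values, v_min, v_max, delta_l=0.01, delta_u=0.01):
    """Invert normalize() given the original minimum and maximum."""
    scale = 1.0 - delta_u - delta_l
    return [(n - delta_l) / scale * (v_max - v_min) + v_min for n in norm_values]


def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly split range(n) into training, validation and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # stand-in for Matlab's randperm
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# 220 data points, as in the NSM data set.
train_idx, val_idx, test_idx = split_indices(220)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the normalized values away from exactly 0 and 1 is a common choice with sigmoid-type transfer functions, since those saturate near the ends of their range.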
Internal ANN features such as the number of hidden layers, the number of neurons in each layer, the momentum factor, the learning rate, the transfer functions, and the initial weight distribution have a great impact on ANN model building. Default values were selected for some of these factors (momentum factor and learning rate), since they only affect the training time [19]. In our study, the maximum number of epochs, the target error goal MSE (mean square error), and the minimum performance gradient were set to 1500, 0, and 10⁻¹⁰, respectively. Training stops when the maximum number of epochs is reached or when either the MSE or the performance gradient falls to its predetermined goal.
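The stopping logic can be summarized in a small sketch. It is a simplified reading of the criteria stated in the text (1500 epochs, MSE goal of 0, minimum gradient of 10⁻¹⁰, five validation failures) and not a reproduction of the Matlab training routine; the function names and the way validation failures are counted here (consecutive epochs without a new best validation error) are assumptions.

```python
def count_validation_failures(val_errors):
    """Consecutive epochs since the validation error last reached a new minimum."""
    best = float("inf")
    failures = 0
    for err in val_errors:
        if err < best:
            best, failures = err, 0
        else:
            failures += 1
    return failures


def stop_reason(epoch, mse, gradient, val_failures,
                max_epochs=1500, mse_goal=0.0, min_gradient=1e-10, max_fail=5):
    """Return the name of the first stopping criterion that is met, else None."""
    if mse <= mse_goal:
        return "goal"
    if gradient <= min_gradient:
        return "min_gradient"
    if val_failures >= max_fail:
        return "validation"
    if epoch >= max_epochs:
        return "max_epochs"
    return None


# Hypothetical validation-error history: improves, then rises for five epochs.
history = [0.50, 0.40, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35]
fails = count_validation_failures(history)
print(stop_reason(epoch=len(history), mse=3e-4, gradient=1e-3, val_failures=fails))
```

With an MSE goal of exactly 0, training in practice ends through one of the other three criteria, most often the validation check.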
The number of neurons in the hidden layer is crucial for ANN model performance [20]. With too few hidden neurons the ANN is unable to predict the experimental data precisely, whereas with too many hidden neurons over-fitting may occur. Moreover, too many neurons do not propagate errors back efficiently [20] and therefore worsen the ability of the neural network to learn. A trial-and-error method was used to determine the number of neurons in the hidden layer.
Neurons in adjacent network layers are connected through activation (transfer) functions. The transfer function between the hidden layer and the output layer was linear (purelin). For the connection between the input neurons and the hidden layer, two types of activation functions were examined: log-sigmoid (logsig) and hyperbolic tangent sigmoid (tansig).
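To make the role of the transfer functions concrete, the sketch below evaluates a single-hidden-layer network with four inputs, eight hidden neurons and one linear output. The weights are random placeholders, not trained values, and the input vector is a hypothetical set of scaled operating conditions; `tansig` and `logsig` are written here as plain Python functions that follow the usual mathematical definitions of those transfer functions.

```python
import math
import random


def tansig(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Log-sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out, activation=tansig):
    """Forward pass: nonlinear hidden layer followed by a linear (purelin) output."""
    hidden = [activation(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder weights for a 4-8-1 network (random, NOT trained values).
rng = random.Random(0)
n_in, n_hidden = 4, 8
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Hypothetical scaled inputs: TMP, feed flow, temperature, filtration time.
x = [0.99, 0.50, 0.01, 0.25]
flux_norm = forward(x, w_hidden, b_hidden, w_out, b_out)
print(flux_norm)
```

Swapping `activation=logsig` into the call is all that is needed to compare the two hidden-layer transfer functions on the same weights.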
ANN accuracy was expressed in terms of the coefficient of determination, R², and the mean square error, MSE, according to Eq. (2) and Eq. (3), respectively, where n is the number of data points, $J_{exp,i}$ and $J_{pred,i}$ are the normalized experimental values and the values predicted by the neural network model, respectively, and $J_{pred,avg}$ is the average of the predicted values.
Two ANN models were generated, one for the NSM and one for the SM filtration data. In both models the input layer consists of four neurons (TMP, feed flow, temperature and filtration time), while the output layer has one neuron, the permeate flux in NSM or SM mode. The ANN simulations were carried out using the mathematical software Matlab R2015b.
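The two accuracy measures can be computed as in the following sketch. The MSE is the usual mean of squared residuals. For R², the sketch uses the textbook form 1 − SS_res/SS_tot with the total sum of squares taken about the mean of the experimental values; this is an assumption, and the definition used in the paper (which refers to the average of the predicted values) may differ in that detail. The sample numbers are made up.

```python
def mse(exp, pred):
    """Mean square error between experimental and predicted values."""
    if len(exp) != len(pred) or not exp:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp)


def r_squared(exp, pred):
    """Coefficient of determination, textbook form 1 - SS_res / SS_tot.

    SS_tot is taken about the mean of the experimental values (assumption).
    """
    mean_exp = sum(exp) / len(exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(exp, pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot


# Made-up normalized flux values, for illustration only.
j_exp = [0.99, 0.70, 0.45, 0.20, 0.01]
j_pred = [0.97, 0.72, 0.44, 0.22, 0.02]
print(mse(j_exp, j_pred), r_squared(j_exp, j_pred))
```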
Since the results of a neural network depend strongly on the initial weight values, each network was run 30 times, and the average values of the statistical indicators (MSE and R²) were used to compare network performances [21]. Fig. 1 shows the variation of the mean square error (MSE) with the number of hidden neurons and the type of activation function for the training and testing data. It can be seen that the MSE decreases up to 8 hidden neurons for all simulated ANN models.

Results and discussion

Artificial neural network architecture
In NSM mode, if few neurons (up to five) are used in the hidden layer, the ANN model has poor predictive ability, i.e. the test data set has a higher error for both transfer functions. A further increase in the number of hidden neurons improves the predictive capacity of the ANN model, as the MSE for the test data set comes closer to the training error.
The optimal results for the NSM data are achieved with 8 hidden neurons. The MSE for the tansig transfer function is 2.93 × 10⁻⁴ for training and 3.48 × 10⁻⁴ for testing. For the logsig transfer function, slightly higher MSE values are obtained, 3.01 × 10⁻⁴ and 4.12 × 10⁻⁴ for training and testing, respectively. A further increase in the number of hidden layer neurons results in an insignificant decrease in MSE.
In SM mode, similar results are obtained. With an increasing number of neurons in the hidden layer, the MSE values decrease for all ANN models. There is no significant difference between the activation functions used, so the tansig activation function is chosen for this model as well. The optimal results with regard to the number of neurons are achieved for the network with 8 hidden neurons. In this case the MSE values are 7.84 × 10⁻⁵ and 1.24 × 10⁻⁴ for training and testing, respectively. A further increase in the number of hidden layer neurons results in a minor increase in the MSE values. Fig. 2 illustrates the variation of the coefficient of determination with the number of hidden neurons for the different transfer functions.
As can be seen, the coefficient of determination has high values for both the training and the testing data sets. In SM mode, high values are obtained even with fewer neurons. As already indicated by the MSE values, the optimal values are reached for 8 hidden neurons and the tansig transfer function. A further increase in the number of neurons in the hidden layer results in an insignificant increase or even a decrease in the coefficient of determination. As the goal of network architecture optimization is to keep the network as simple as possible, an increase in the number of neurons that yields only an insignificant growth in R² is dismissed. On the other hand, a decline in R² values can indicate over-fitting.
The values of coefficient of determination are 0.9953 and 0.9947, for training and testing data in NSM mode, respectively. In SM mode, the values are 0.9997 for training and 0.9989 for testing data.
Thus, the optimal network topology can be summarized as the 4-8-1 type for both NSM and SM mode, with the tansig transfer function and the Levenberg-Marquardt training algorithm.

Artificial neural network model validation
The networks with their saved weights and biases were used to assess the predictive capacity of the models in both NSM and SM mode. Fig. 3 depicts the regression lines for all data used in ANN model creation, i.e. training, validation and testing. The solid line indicates the perfect fit. The values of R² for the NSM and SM data are 0.9988 and 0.9993, respectively. This shows a good linear fit between the ANN-predicted and the experimental permeate flux values, with less than 1 % of the total variation left unexplained by the linear model for both ANN models.
Error analysis was carried out for better comparison and understanding of the results of each model (Table 1). The ANN models developed herein were able to predict the great majority of observations with a small absolute relative error for both sets of data.
Analysis of the absolute relative error showed excellent permeate flux estimates for microfiltration in the presence of a turbulence promoter, where 100 % of the data points had an error of less than 5 %. For microfiltration without a turbulence promoter, 90 % of the predictions had an error of less than 10 %, and only 2 data points (1 %) had an absolute relative error greater than 20 %. The slightly worse prediction in NSM mode can probably be attributed to the fact that the permeate flux decline in systems with a turbulence promoter, i.e. static mixer, is slow and almost linear, and therefore easier to model (Fig. 4).
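The kind of error analysis summarized above can be sketched as follows. The absolute relative error is taken here as |J_pred − J_exp| / |J_exp| × 100 %, which is the usual definition and is assumed to match the one used for Table 1; the flux values in the example are invented for illustration.

```python
def absolute_relative_errors(exp, pred):
    """Absolute relative error in percent for each data point."""
    return [abs(p - e) / abs(e) * 100.0 for e, p in zip(exp, pred)]


def share_below(errors, threshold_pct):
    """Percentage of data points whose error is below the given threshold."""
    return 100.0 * sum(1 for err in errors if err < threshold_pct) / len(errors)


# Invented experimental and predicted fluxes, for illustration only.
j_exp = [100.0, 100.0, 100.0, 100.0]
j_pred = [101.0, 104.0, 108.0, 125.0]

are = absolute_relative_errors(j_exp, j_pred)
for limit in (5, 10, 20):
    print(f"error < {limit} %: {share_below(are, limit):.0f} % of points")
```

Reporting the share of points under several thresholds, rather than a single average error, shows whether a few large outliers are hidden behind an otherwise good fit.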
To confirm the generalization capacity of the ANN models, new experimental data (not included in the data sets used for training, validation and testing) were presented to the neural networks. The input variables used for the verification of the developed models were a transmembrane pressure of 0.9 bar, a feed flow of 100 L/h and a temperature of 30 °C. The microfiltration experiments were carried out in two modes, NSM and SM. The results of model verification for the denormalized flux decline are presented in Fig. 4.
As can be seen, for both data sets, NSM and SM, the ANN-predicted and the experimental permeate fluxes practically overlap. This indicates the high capability of neural networks to model the complicated and non-linear flux decline phenomena.
The adopted network structure thus allows simulation of the permeate flux decline over time.

Relative importance of input variables
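An ANN need not be used simply as a black-box model; cause-effect information can be quantitatively extracted from the network connection weights to assist in model development and experimental design [21]. The relative importance of an input variable is calculated according to Eq. (4):

$$RI_v = \frac{\sum_{j=1}^{n_h}\left[\left(\dfrac{i_{vj}}{\sum_{k=1}^{n_v} i_{kj}}\right) O_j\right]}{\sum_{m=1}^{n_v}\left\{\sum_{j=1}^{n_h}\left[\left(\dfrac{i_{mj}}{\sum_{k=1}^{n_v} i_{kj}}\right) O_j\right]\right\}} \qquad (4)$$

where $RI_v$ is the relative importance of input variable v, $n_v$ is the number of input neurons, $n_h$ is the number of hidden neurons, $i_{vj}$ and $i_{kj}$ are the absolute values of the connection weights between the input and hidden layer neurons, and $O_j$ is the absolute value of the connection weight between hidden neuron j and the output layer.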
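The weight-partitioning calculation described above can be sketched in code. The sketch assumes the widely used Garson-type form, in which each hidden neuron's output weight is shared among the inputs in proportion to the absolute input-to-hidden weights and the shares are then summed and normalized to 100 %. The function name and the small weight matrices are hypothetical and do not come from the trained networks.

```python
def relative_importance(w_input_hidden, w_hidden_output):
    """Relative importance (%) of each input from absolute connection weights.

    w_input_hidden[j][v] is the weight from input v to hidden neuron j;
    w_hidden_output[j] is the weight from hidden neuron j to the output.
    """
    n_hidden = len(w_input_hidden)
    n_inputs = len(w_input_hidden[0])
    contribution = [0.0] * n_inputs
    for j in range(n_hidden):
        abs_row = [abs(w) for w in w_input_hidden[j]]
        row_sum = sum(abs_row)
        for v in range(n_inputs):
            # Share of hidden neuron j attributed to input v,
            # scaled by that neuron's absolute output weight.
            contribution[v] += abs_row[v] / row_sum * abs(w_hidden_output[j])
    total = sum(contribution)
    return [100.0 * c / total for c in contribution]


# Hypothetical 2-input, 2-hidden-neuron network.
w_ih = [[1.0, -1.0],
        [3.0, 1.0]]
w_ho = [1.0, -1.0]
print(relative_importance(w_ih, w_ho))
```

Because only absolute values enter the calculation, the result ranks the inputs by influence but says nothing about the direction of each effect.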
As seen in Table 2, in both NSM and SM mode time played an important role in determining flux decline (67 % and 54 %, respectively). Similar results were reported by Aydiner et al. [14], who found that the contribution of filtration time as an input variable to the flux values provided by ANNs during cross-flow microfiltration of a mixture containing phosphate and fly ash was in the range of 40-50 %, owing to the increase in membrane fouling with time. In a study of flux decline during cross-flow microfiltration of polydispersed suspensions, time was the most significant input to the ANN model, with an influence ranging from 20 % to 40 % in all cases [21].
The other input variables investigated in the study (transmembrane pressure, feed flow rate and temperature) have a contribution of about 7.43-13.36 % in NSM mode, whilst in SM mode the contribution range is 10.94-12.90 %.
Besides filtration time, feed flow rate has the greatest contribution to the ANN model of dynamic flux behavior in both modes of microfiltration. The influence of feed flow rate is around 40 % higher when a static mixer is placed in the membrane channel. At the same time, the influence of TMP is about 32 % higher when the turbulence promoter is inserted into the membrane channel. CFMF in the presence of a turbulence promoter, i.e. the Kenics static mixer, is more sensitive to changes in feed flow rate and TMP because of the change in the flow pattern of the suspension through the membrane channel. In contrast to the system without a static mixer, the helical component of the flow in the presence of the Kenics static mixer enhances radial mixing and the creation of secondary flows, which in turn increase the shear rate at the membrane surface. This reduces filtration cake buildup, and thus the effect of TMP is more pronounced [16,17]. The increased influence of TMP in the presence of a Kenics static mixer has been confirmed in the literature by applying response surface methodology [22].
The same reasoning can be applied to the increase in feed flow influence, as in SM mode turbulent flow is achieved even at lower feed flow rates. The influence of temperature is also slightly higher in SM mode.

Conclusion
The dynamics of the rate of permeate flux decline during broth cross-flow microfiltration was captured accurately by ANNs. The input parameters were transmembrane pressure, feed flow rate, temperature and filtration time. The observed output variable was permeate flux. Two modes of cross-flow microfiltration in single channel tubular ceramic membrane were modeled by ANNs: NSM, without turbulence promoter, and the turbulence promoter assisted process (SM). For each mode, a single neural network was created.
The adopted network structure (4-8-1 type, with the tansig transfer function and the Levenberg-Marquardt training algorithm) resulted in excellent agreement with the experimental flux results.
Quantitative interpretation of ANN connection weights revealed that filtration time has the most influence on the filtration in both the NSM as well as SM mode. Besides filtration time, TMP and feed flow rate have more influence in the case of the turbulence promoter assisted process (SM). In contrast, temperature influence is only weakly increased in SM mode.
It can be concluded that ANNs are a suitable methodology for predicting the permeate flux decline that occurs in CFMF of Streptomyces hygroscopicus fermentation broth under various hydrodynamic conditions. The results confirm the applicability of neural network modeling to complex feeds such as fermentation broth. Although ANNs are considered black-box models, the results of this study extend the knowledge of flux decline modeling without the need to consider the highly complex interactions between flux attenuation and membrane fouling caused by the various components present in the broth, even when the feed flow hydrodynamics are altered by the presence of a turbulence promoter.

Acknowledgement
This work was supported by the grant from the Ministry of Education, Science and Technological Development of Republic of Serbia. Project number: TR-31002.