STATISTICAL ANALYSIS OF REINFORCED CONCRETE BRIDGES IN ESTONIA

This paper introduces a possible way to use a multivariate methodology, principal component analysis, to reduce the dimensionality of a condition state database of bridge elements collected during visual inspections. Attention is paid to the condition assessment of bridges on Estonian national roads and to the collected data, which plays an important role in selecting the correct statistical technique and obtaining reliable results. Additionally, a detailed overview of typical road bridges and examples of the collected information are provided. Statistical analysis is carried out on the most common reinforced concrete bridges in Estonia, and a comparison is made among different typologies. The algorithms of the introduced multivariate technique are presented and compared in two different formulations, contrasting the unevenness in variables and taking missing data into account. The principal components and weighing factors calculated for bridges of different typology also differ in their results and in the element groups where variation is retained.


Introduction
Bridges play an essential role in an infrastructure network. A famous Serbian writer has introduced them as follows: "Of everything that man erects and builds in his urge for living nothing is in my eyes better and more valuable than bridges. They are more important than houses, more sacred than shrines. Belonging to everyone and being equal to everyone, useful, always built with a sense, on the spot where most human needs are crossing, they are more durable than other buildings, and they do not serve for anything secret or bad." - Andrić (2015), Nobel Prize winner in 1961.
Due to the economic and societal risk of bridge failure, it is vital for asset managers and stakeholders to implement adequate management systems to ensure that the risk of failure stays within stated performance criteria (Mueller, Stewart 2011). Many authorities, including the Estonian Road Administration, have implemented a management system to monitor the condition of their existing bridge network (Lauridsen et al. 1998). A Bridge Management System (BMS) is a systematic and rational approach to performing all management activities related to a bridge stock (Scherer, Glagola 1994). It usually includes planning intervention activities to fulfil the serviceability requirements and using a computational tool to track, record and process the results of management actions (Lauridsen et al. 1998). A Bridge Management System comprises coordinated activities to realize the optimal value of the assets, which involves balancing costs, risks, opportunities and performance requirements. This framework allows asset managers to plan further assessment and intervention (Matos et al. 2015), and it often relies on collecting condition ratings of structures through visual inspections (Das 1998; Estes, Frangopol 2005).
Unified visual inspections, as the most basic level in the assessment of existing road structures (Rücker et al. 2006), have been used on Estonian national road bridges for more than 10 years, and more than 40 000 element condition states have been recorded during that period. Although there are only 995 bridges, it is difficult to discern which bridge is in the worst state or what kind of intervention can be carried out. One solution to make decision-making more efficient is to exploit modern computational methods to perform "big data" analysis, which allows the extraction of information from a large dataset for descriptive and predictive purposes using statistical techniques (Manyika et al. 2011). The purpose of data reduction in multivariate analysis is to represent the original data in a suitable lower-dimensional space, which helps to enable visualisation and to discover data structures and patterns (Martinez et al. 2010). This paper investigates the use and applicability of a linear multivariate analysis method called Principal Component Analysis (PCA). Different algorithms are applied to the collected visual inspection database of the most common reinforced concrete bridge typologies in Estonia to explore the possibilities of statistical analysis and to point out elements with higher variance. Hanley et al. (2015, 2016) have previously investigated the applicability of Principal Component Analysis to the condition rating data of a BMS by applying the technique to networks of road bridges in Ireland and Portugal. They concluded that PCA is a viable tool in the assessment of large data sets relating to engineering applications and that the results can be converted directly to weighing factors of condition ratings. It was also suggested to investigate other typologies with a more detailed description (Hanley et al. 2016).
In this paper, the visual inspection database and the most common reinforced concrete bridges of the Estonian national road network are introduced. The difference from previous research is an additional comparison of two PCA algorithms and the identification of element groups with greater variance. Two different algorithms are compared to show the possibility of making wrong decisions based on the results of PCA. This paper attempts to address this issue by choosing appropriate indicators and giving a rational basis for the modelling technique and decision-making. From an engineering perspective, decisions are naturally the result of well-structured reasoning which justifies the selection of the final solution. Decisions made as a result of a logical, scientifically structured process are considered rational decisions (Sánchez-Silva, Klutke 2016).
Since there is more than one possible decision, the use of statistical techniques should also be considered to reach a better outcome. Many techniques have been developed for this purpose, but PCA is one of the oldest and most widely used (Jolliffe, Cadima 2016). It was first formalized by Pearson (1901) and described in its algebraic form by Hotelling (1933). The technique identifies the component ratings which are more important than the others in terms of explained variability and reassesses the relative importance of the different factors from an engineering point of view, based on additional data (Hanley et al. 2015). Principal Component Analysis is a dimensionality-reduction method in which a set of original variables is replaced by an optimal set of derived variables, called Principal Components (PCs) (Jolliffe, Cadima 2016).
One solution is to use the results as an input for the bridge condition calculation or for predictive models for different typologies of bridges, by calculating the importance of the relevant elements which define the performance of an individual bridge based on visual inspections. The results are also useful for comparing the relative components of bridges with different typology, age, traffic intensity, exposure or other environmental conditions.

Description of dataset
The average age of Estonian national road bridges is 40 years, which indicates the need to have an overview of their condition and to make correct decisions to preserve them as long as possible. The implementation of unified visual inspections and management on Estonian national roads started in 2003; the system was based on the United States system PONTIS, which was initially used to monitor the state of the existing bridge stock and to predict the future condition of bridges (Thompson et al. 1998). The first official inspections were carried out in 2005, and since then every bridge has been visually inspected twice and 400 of them already three times.
Assessment of a bridge consists of inspecting every element unit of the bridge and assigning each a condition rating based on a scale of the damage present and the necessary rehabilitation method (Table 1). An overall element condition index is calculated based on the overall quantity of units and state factors, as shown in Eq (1), where $H_e$ is the condition state of the element; $s$ is the condition state; $k_s$ is the coefficient of the state; and $q_s$ is the number of units in the current state. The bridge condition index is calculated based on the element condition index and a weight factor. The overall condition rating of the bridge, which is often the primary decision criterion for investments, can be misleading because different states of element deterioration may yield equal overall condition ratings. A central element in life-cycle assessment (LCA) involves making predictions about the degradation of the system. It requires a clear understanding of the physical laws which define the system behaviour and of possible uncertainties. The degradation of a bridge condition describes the process by which one or a set of elements lose value with time (Sánchez-Silva, Klutke 2016). In PONTIS, the future condition states of visual inspections are processed with the Markov chain method, depending on the assumption of whether interventions are performed during the time frame between inspections (Thompson et al. 1998).
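Eq (1) itself is not reproduced in this text. The following is a minimal sketch only, assuming that the element condition index is the quantity-weighted average of the state coefficients expressed in percent, as in PONTIS-style health indices. The coefficient values in `K_S` and the example element are hypothetical and are not taken from the Estonian BMS.

```python
def element_condition_index(quantities, coefficients):
    """Return a quantity-weighted condition index H_e in percent.

    quantities   -- {state s: number of units q_s recorded in that state}
    coefficients -- {state s: state coefficient k_s between 0 and 1}
    """
    total_units = sum(quantities.values())
    if total_units == 0:
        raise ValueError("element has no recorded units")
    weighted = sum(coefficients[s] * q for s, q in quantities.items())
    return 100.0 * weighted / total_units


# Hypothetical coefficients for the four condition states (1 = best).
K_S = {1: 1.0, 2: 0.67, 3: 0.33, 4: 0.0}

# Hypothetical element: 30 units in state 1, 10 in state 2, 10 in state 3.
girder = {1: 30.0, 2: 10.0, 3: 10.0}
print(round(element_condition_index(girder, K_S), 1))
```

Under these assumed coefficients, two elements with very different distributions of units over the states can produce the same index, which is the masking effect described above.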
Of the 995 Estonian national road bridges, 778 are constructed of reinforced concrete. Although the number of reinforced concrete bridges has decreased in the last decade as they have been replaced with soil-steel composite bridges, reinforced concrete is still the most popular construction material in Estonia. There are 4 main typologies of reinforced concrete bridges, which are divided into 7 groups according to main girder and construction year:
1. Mounted simply supported beams with diaphragms (constructed after 1956).
2. Mounted simply supported beams (constructed after 1963).
3. Cast in situ simply supported slabs (constructed before 1960).
4. Mounted or cast in fragments simply supported slabs (constructed after 1960).
5. Cast in situ simply supported beams (constructed before 1956).
6. Mounted frames (constructed after 1978).
7. Cast in situ simply supported cantilevers (constructed before 1956).
These typologies make up 98% of all reinforced concrete bridges. The primary focus is on bridges that were constructed or repaired before 2005, when the first unified visual inspections were carried out, because every kind of intervention changes the natural degradation process. In the last 10 years, the average number of bridges built or reconstructed is 41, and since the main criterion of the multivariate analysis is to exclude bridges with interventions, the data were filtered before analysing the most common typologies. After excluding repaired and reconstructed structures, only 5 main typologies with a dataset of 501 reinforced concrete bridges remain (Table 2). To be clear, Group No. 1 consists of both simple span bridge typologies constructed after 1956, and Group No. 4 consists of the simple span and cantilever bridges.
As mentioned before, the visual inspection results are recorded on an element basis. It is possible to select from 124 different elements, which are distinguished by type and overall dimensions. For example, there are 5 different elements for piers with different width.
Within this analysis, the elements are divided into 16 groups (Table 3) to investigate overall patterns in a specific typology. This division into element groups was based on the available element observations and on structural integrity, which describes bridge performance in that structural components have a bigger effect on the overall load-bearing capacity than non-structural ones. The condition rating of each group is calculated from the condition of every represented element and is on a scale of 1-4 in increasing order, as is the condition rating for elements (Table 1). No weight factors are used in the calculation of group ratings; the group condition rating is the average of the element results.
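The unweighted averaging described above can be sketched as follows. The group names and ratings are hypothetical, and returning `None` for a group without elements is an assumption of this sketch that mirrors the later treatment of absent elements as missing data.

```python
from statistics import mean


def group_rating(element_ratings):
    """Unweighted mean of element ratings (scale 1-4); None if the group is absent."""
    present = [r for r in element_ratings if r is not None]
    return mean(present) if present else None


# Hypothetical bridge: element ratings keyed by element group name.
bridge = {
    "piers": [1.0, 2.0, 1.5],
    "handrails": [3.0, 4.0],
    "bearings": [],  # no such element on this bridge
}
ratings = {group: group_rating(values) for group, values in bridge.items()}
print(ratings)
```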
Another limitation of data usage is due to a peculiarity of bridge inspections, which are carried out in different time frames. Bridges have been inspected seasonally or in so-called cycles, where the assessment of all structures is completed in 3 or 4 years. The first cycle was in 2005-2007, the second in 2010-2013, and the third started in 2015. An example of the average condition indexes calculated for all inspection cycles is shown in Fig. 1.
Given the nature of the deterioration process, the condition state of an element should rise over time when there is no intervention; between the first two assessments, the condition index is higher or only slightly better. An improvement of the condition is normally caused by the subjectivity of the inspector or by unrecorded maintenance works. Nevertheless, the differences between the condition states of the second and third cycle are much more significant, and they describe a situation where most of the elements move into a better condition state without any intervention. These differences arise mainly from insufficient information, since all the bridges have been inspected twice but only half of them three times. In further analysis, the primary attention is paid to the 501 bridges with two different results.

Formulation of principal component analysis and algorithms in case of missing data
The primary purpose of PCA is to reduce the dimensionality of a set of data and to redefine the input variables as principal components (PCs). Each PC is a linear combination of the original variables, and the retained PCs are fewer in number than the original variables while preserving most of the information (Hotelling 1933; Jolliffe 2002). The first principal component $Y_1$ is defined as Eq (2):
$$Y_1 = \alpha_1' x = \sum_{j=1}^{p} \alpha_{1,j} x_j, \tag{2}$$
where $\alpha_1' x$ is a linear function of the elements of $x$ having maximum variance and $\alpha_1$ is a vector of $p$ coefficients $\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,p}$.
The first principal component is the direction along which the data set shows the largest variation (Ringnér 2008), and the second component is determined under the constraint of being orthogonal to the first component while having the largest variance (Abdi, Williams 2010). The second principal component $\alpha_2' x$ is found in a similar manner to the first, and so on for the subsequent principal components up to $p$ PCs. It is recommended that the number of PCs retained to account for the variance in the data set be significantly lower than the number of calculated components (Jolliffe 2002).
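As an illustration of these definitions, the sketch below computes the first two PCs from the eigenvectors of the sample covariance matrix using NumPy. The data are randomly generated stand-ins for condition ratings, not inspection results, and the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ratings: 200 bridges x 4 element groups on the 1-4 scale.
X = np.clip(rng.normal(2.0, [0.9, 0.5, 0.3, 0.2], size=(200, 4)), 1.0, 4.0)

Xc = X - X.mean(axis=0)                 # centre each variable
S = np.cov(Xc, rowvar=False)            # (p x p) sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

alpha_1, alpha_2 = eigvecs[:, 0], eigvecs[:, 1]   # coefficient vectors
Y1 = Xc @ alpha_1                       # scores of the first PC
Y2 = Xc @ alpha_2                       # scores of the second PC

print("variance of Y1:", Y1.var(ddof=1), "largest eigenvalue:", eigvals[0])
print("alpha_1 . alpha_2 =", alpha_1 @ alpha_2)
```

The variance of the first score vector equals the largest eigenvalue, and the two coefficient vectors are orthogonal, which matches the constraints stated above. The sign of each eigenvector is arbitrary.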
As described by Hanley et al. (2015), it is possible to use the squares of the PC coefficients $\alpha_i$ for each variable, because the sum of the squared coefficients is equal to unity, which makes them a better indicator when comparing results. Weighing factors are derived from the coefficients as shown in Eq (3), where $\zeta$ is a combination of the weighing factors $\lambda_j = \alpha_{1,j}^2 \cdot 100\%$ and the original condition ratings $x_j$. In matrix terms, a PCA is conducted through an Eigenvalue Decomposition (EVD) or a more robust and generalized Singular Value Decomposition (SVD) (Chambers 1977).
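A small numerical sketch of the weighing factors follows. The coefficient vector and ratings are hypothetical, and because Eq (3) is not reproduced in this text, the weighted sum used for `zeta` is an assumed form rather than the published formula.

```python
import numpy as np

# Hypothetical first-PC coefficients for five element groups.
alpha_1 = np.array([0.70, -0.50, 0.40, 0.30, -0.10])
alpha_1 = alpha_1 / np.linalg.norm(alpha_1)   # enforce unit length

lam = 100.0 * alpha_1**2                      # weighing factors in percent
x = np.array([3.0, 2.0, 2.5, 1.0, 1.5])       # hypothetical condition ratings x_j

# Assumed form of the combination: ratings weighted by lambda_j / 100.
zeta = np.sum(lam / 100.0 * x)

print(lam.round(1), lam.sum())
print(zeta)
```

Squaring removes the sign of each coefficient, and because the coefficient vector has unit length the factors sum to 100%.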
For a data matrix $X$ of $n$ observations on $p$ variables measured about their means, the SVD is given by Eq (4):
$$X = U L A', \tag{4}$$
where $L$ is an $(r \times r)$ diagonal matrix; $U$ and $A$ are $(n \times r)$ and $(p \times r)$ matrices, respectively, with orthonormal columns; and $r$ is the rank of $X$. The SVD approach to PCA has been shown to be a computationally efficient and general method for determining the PCs.
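The decomposition can be checked numerically with `numpy.linalg.svd`. This is a sketch on random data, with names chosen to follow the symbols in the text; it is not the script used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))            # hypothetical data: n = 50, p = 6
Xc = X - X.mean(axis=0)                 # measure variables about their means

# Thin SVD: Xc = U @ diag(l) @ At, with orthonormal columns in U and At.T
U, l, At = np.linalg.svd(Xc, full_matrices=False)
A = At.T                                # columns of A hold the PC coefficients
scores = U * l                          # PC scores, identical to Xc @ A
explained_variance = l**2 / (Xc.shape[0] - 1)

print(np.allclose(U @ np.diag(l) @ At, Xc))
print(explained_variance / explained_variance.sum())
```

The squared singular values divided by n - 1 equal the eigenvalues of the sample covariance matrix, so the SVD and EVD routes lead to the same PCs.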
As in previous investigations by Hanley et al. (2016) on using PCA for analysing BMS data, there is a problem with "missing data" in data subsets, which describes a situation where the number of observations differs among element groups. In multivariate analysis, it is often possible to use the existing structure of the data to estimate the missing data and complete a dataset, for example using the Alternating Least Squares (ALS) algorithm.
The algorithm alternates between imputing the missing values and applying standard PCA to the in-filled (complete) data matrix (Jolliffe 2002). Initially, the missing values are replaced by the row-wise means of the previous matrix. The covariance matrix of the complete data is then estimated without the problems of principal components being more abundant than estimated variances or of the covariance matrix being negative, as in Eq (5)-(6) (Ilin, Raiko 2010).
The Alternating Least Squares algorithm alternates among the updates of Eq (5)-(6), where $X$ is the $(p \times n)$ matrix of principal components and $U$ and $A$ are the $(n \times r)$ and $(p \times r)$ matrices stated in Eq (4). This iteration is efficient when only a few principal components are needed, so the number of principal components must be significantly lower than the dimensionality of the data vectors (Roweis 1998). The algorithm alternates with imputing the missing values in the updated matrix, replacing them with the row-wise mean values of the original matrix. The covariance matrix of the updated data is estimated using the bias term $M$, and the matrix $A$ is computed using EVD. Principal components are calculated using Eq (7) (Ilin, Raiko 2010), where $Y_c$ denotes the centred principal component and $M$ the row-wise mean values of the original matrix. For better estimation of the missing values, the computation is reconstructed as in Eq (8) (Ilin, Raiko 2010), where $Y$ is used for observed values and $AX + M$ for missing values. In the case of condition ratings of bridge elements, it is considered inappropriate to complete the dataset with such algorithms, because an unfilled rating usually indicates a missing element (Hanley et al. 2016).
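Since Eqs (5)-(8) are not reproduced in this text, the following is only a simplified sketch of the imputation idea: missing values start at the variable means, a rank-limited PCA is fitted to the in-filled matrix, and the missing entries are overwritten with the reconstruction while observed values are kept. It is not the Matlab routine used in the study. The function name `pca_impute` and its parameters are hypothetical, observations are stored in rows here (so the means are taken column-wise), and every variable is assumed to have at least one observed value.

```python
import numpy as np


def pca_impute(Y, n_components, n_iter=200, tol=1e-8):
    """Iteratively fill NaNs in Y (n x p) with a rank-limited PCA reconstruction."""
    Y = np.asarray(Y, dtype=float)
    missing = np.isnan(Y)
    # Initial guess: replace each missing value by the mean of its variable.
    filled = np.where(missing, np.nanmean(Y, axis=0), Y)
    for _ in range(n_iter):
        M = filled.mean(axis=0)                       # bias (mean) term
        U, l, At = np.linalg.svd(filled - M, full_matrices=False)
        recon = (U[:, :n_components] * l[:n_components]) @ At[:n_components] + M
        updated = np.where(missing, recon, Y)         # observed values are kept
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled, At[:n_components].T


# Hypothetical example: three correlated variables, about 10% of entries removed.
rng = np.random.default_rng(2)
t = rng.normal(size=(100, 1))
Y_full = 2.5 + t @ np.array([[0.8, 0.5, 0.3]]) + 0.05 * rng.normal(size=(100, 3))
Y_obs = Y_full.copy()
Y_obs[rng.random(Y_obs.shape) < 0.1] = np.nan
Y_hat, A = pca_impute(Y_obs, n_components=1)
print("filled entries:", int(np.isnan(Y_obs).sum()))
```

The filled entries are artificial values derived from the structure of the observed data, which is why this approach is questionable when a blank rating means that the element does not exist.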
It has also been suggested that PCA should only be conducted on continuous variables conforming to a Gaussian distribution (Qian et al. 1994) and that its application to discrete data, such as element condition state ratings, is inaccurate. However, so long as inferential techniques requiring the assumption of multivariate normality are not used, there is no necessity for the variables in the data set to have any associated probability distribution (Jackson 2003).
It is often considered wise to use the correlation matrix for a PCA, as the standardized variates are dimensionless and more readily compared (Jolliffe 2002). However, when the variables are measured in the same units and have a low variance, using the covariance matrix is sometimes appropriate, and it is beneficial when statistical inference is essential. When the condition ratings are already dimensionless, it is unnecessary to standardise the variables (Hanley et al. 2015). Since the condition ratings in the Estonian BMS are already dimensionless, the raw data is used as input for the PCA.
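The effect of this choice can be illustrated with a short sketch on hypothetical ratings that share one scale but differ in spread. The helper `leading_coefficients` is introduced for this example only.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical ratings on a common 1-4 scale with unequal spread per variable.
X = np.clip(rng.normal(2.0, [1.0, 0.4, 0.2], size=(300, 3)), 1.0, 4.0)


def leading_coefficients(matrix):
    """Return the unit eigenvector belonging to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs[:, np.argmax(eigvals)]


a_cov = leading_coefficients(np.cov(X, rowvar=False))        # raw ratings
a_corr = leading_coefficients(np.corrcoef(X, rowvar=False))  # standardised variates

print("covariance-based PC 1: ", a_cov.round(2))
print("correlation-based PC 1:", a_corr.round(2))
```

With the covariance matrix, the first PC is dominated by the variable with the largest spread, so differences in variability between element groups are preserved; the correlation matrix rescales every variable to unit variance before the analysis.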

Results
The Principal Component Analysis was conducted on the five most common reinforced concrete bridge types shown in Table 2. Each typology had a different number of element groups present, and in some typologies only a few records were available on the condition of drainage, deformation joints or bearings, so these elements were excluded from the analysis.
The comparison was made in two steps. First, two different PCA algorithms were compared, concentrating on the useful PCs and coefficients. Second, the PCA results of different typologies were compared, and a possible input for weighing factors was suggested.

Comparison of algorithms
To compare the algorithms on identical data, the script suggested by Matlab® was used. An example of the results of the ALS and SVD algorithms is presented in Fig. 2 and Table 4, which show the mean values of the condition states and the included observations of the element groups. As stated in the previous chapter, the ALS algorithm fills missing data with row-mean values, so the input for calculating the PCs is based on artificial data, unlike in the SVD algorithm.
The differences in Fig. 2 are small considering that the condition state lies within the limits of 1-4, but between two inspections the condition state changes by a similar magnitude, which means that the decision-making process is influenced by artificial results.
In Table 4, only the results of the element group "Other" have remained unchanged. The reason is the same amount of
In conclusion, following the arguments of Hanley et al. (2016) and the overall results of the comparison, it is correct in asset management to use the original data, to avoid using false information about elements which are missing. With incorrect information, interventions would also have to be carried out on missing elements.

Number of principal components
The number of useful components was visualized with a scree plot of the eigenvalues (Cattell 1966). A scree plot is an effective tool for determining which PCs are essential and which must be discarded from the data set. As the components become less influential, the slope of the scree plot begins to flatten, because they retain less variance than the previous components (Hanley et al. 2016). In the example of beam bridges shown in Fig. 3, the plot begins to flatten out at the fourth PC for SVD and at the third PC for ALS. There is also a difference in the retained percentage of the variation in the data, which is 48% and 66%, respectively, for the two methods with the substantial PCs. For Singular Value Decomposition, the retained variation is thus still under half of all data variation. Without artificial data, it is necessary to use the results of six PCs to describe a similar amount of variation. In both cases, the inclusion of these PCs would not violate the practice of retaining eigenvalues higher than the average value.
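The rule of retaining eigenvalues above the average, together with the cumulative retained variation, can be sketched as follows. The eigenvalues are hypothetical and do not correspond to Fig. 3, and `retained_components` is a helper introduced for this example.

```python
import numpy as np


def retained_components(eigenvalues):
    """Count PCs whose eigenvalue exceeds the average eigenvalue."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    keep = int(np.sum(ev > ev.mean()))
    cumulative = np.cumsum(ev) / ev.sum() * 100.0   # cumulative variance in percent
    return keep, cumulative


# Hypothetical eigenvalues of a covariance matrix with eight variables.
eigenvalues = [0.40, 0.25, 0.15, 0.08, 0.05, 0.04, 0.02, 0.01]
keep, cumulative = retained_components(eigenvalues)
print(keep, cumulative.round(1))
```

In this made-up case three PCs exceed the average eigenvalue and together retain 80% of the variation; plotting the sorted eigenvalues against their index gives the scree plot.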
The comparison of different typologies, shown in Table 5, is made on the assumption that the variation of the deterioration model is described by the PCs which have higher eigenvalues than average. Although in all cases it is possible to get minimal results with fewer PCs using ALS, some patterns stand out. Group No. 1 has 15 element groups represented, so the lack of principal components is understandable. In Group No. 5, only nine element groups are represented and the subset of missing data is smaller, which gives similar results from both algorithms. The differences and similarities in variables are explained by a peculiarity of the ALS algorithm: missing values are filled with row-mean values of the original matrix, and as a result, the variation of less represented components is lower.
If the ALS algorithm makes the whole database artificial, then it is necessary to make some changes in data collection during the visual inspection of different bridges by levelling the amount of data in different groups. One possibility is to split these groups based on the amount of data: one with more elements and another with fewer. Since the ALS algorithm retains most of the variation in false data, further results are based only on the SVD algorithm.
The coefficients of a principal component indicate the relationship between the bridge elements and that principal component (Hanley et al. 2016). The results vary from positive to negative: positive values indicate advanced damage for the element groups in the bridge, and negative values indicate element groups which are in favourable condition. For the first principal component, the largest coefficients in every typology were for non-structural elements, indicating the most advanced damage for types in handrails, for the old slab typology in barriers and for the frame typology in other elements. This is explained by the fact that most bridges have old, deteriorated handrails or barriers. As most of the element groups with the largest coefficients in PC 1 are elements with a shorter life cycle, there is also more variation in their visual inspection results. Unfortunately, these elements are non-structural and are considered irrelevant elements of a bridge, because they pose minimal risk to the structural load-carrying capability. Since the percentage of retained variation is relatively high in the following components, it is necessary to include their results as well.
For the second principal component, the largest coefficients were in most cases also for non-structural elements, indicating the most correlation to the previous elements in slopes, deformation joints and waterproofing.
Since different typologies retain variation differently and do not show a correlation in the respective principal components, an example of only one bridge typology is presented (Figs 4-5), and possible weighing factors for every typology are discussed further in subchapter 4.3.
The example of principal components is based on slab bridges of Group No. 2, which represents the most common typology constructed before the 1960s, since the lengths to be spanned were relatively small and it was easier to cast slabs than beams. The results are influenced by non-structural elements in every principal component. This is explained by the adequate building quality of the structural components and by the fact that non-structural elements naturally deteriorate faster. The most variance is retained in safety barriers. The first structural component is waterproofing. Results of the same element group can have different signs; for the overlay, for example, the results of the first cycle show deterioration, but those of the second cycle show a better condition. The reason is that during the first inspection the attention was on other structural elements and this element group was inspected superficially. The same pattern applies to other element groups.

Weight factors of element groups
According to Eq (3) and the results of PCA, it is possible to obtain weighing factors for every specified bridge typology. The weighing factors were calculated from the combined results of two inspection cycles, where the PC coefficients were squared before the results were averaged. In this way, the different signs of the coefficients were eliminated, and the element groups with higher variation remained as principal elements.
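The order of operations matters here: squaring before averaging prevents coefficients of opposite sign from cancelling. A minimal sketch with hypothetical first-PC coefficients for two cycles is given below; whether further PCs enter the published weights is not specified in this text, so only PC 1 is used, and `combined_weights` is a name introduced for this example.

```python
import numpy as np


def combined_weights(coeff_cycle_1, coeff_cycle_2):
    """Square the coefficients of each cycle, then average; result in percent."""
    squares = np.array([coeff_cycle_1, coeff_cycle_2], dtype=float) ** 2
    return 100.0 * squares.mean(axis=0)


# Hypothetical first-PC coefficients of four element groups in two cycles.
cycle_1 = np.array([0.8, -0.4, 0.4, 0.2])      # already unit length
cycle_2 = np.array([0.6, 0.6, -0.4, 0.4])
cycle_2 = cycle_2 / np.linalg.norm(cycle_2)    # rescale to unit length

weights = combined_weights(cycle_1, cycle_2)
naive = 100.0 * ((cycle_1 + cycle_2) / 2.0) ** 2   # averaging first, for contrast

print(weights.round(1), weights.sum())
print(naive.round(1))
```

For the second and third groups, whose coefficients change sign between cycles, averaging first nearly cancels the contribution, whereas squaring first keeps it and the weights still sum to 100%.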
Overall results are presented in Table 6. Results are similar for most of the typologies: weight factors are higher in non-structural element groups, and only some basic element groups influence the overall results.
Weighting factors also represent the overall results of PC 1, which shows that the most variance of visual inspection data is retained in safety barriers,
Another critical finding from the comparison of element groups among typologies is that the weight factors differ. It is incorrect to use the same weighing factors for all the bridges in one network; even when the overall typology is the same, as for Groups No. 2 and No. 3, the results are different.
The findings of this work should not be used directly in predictive models, because the results do not take into account the risk of failure or the impact on load-carrying capacity. One opportunity is to combine the results with initial weights or expert judgement using weighted PCA, but this topic needs additional investigation. On the other hand, the results endorse PCA as a useful data reduction tool, because assigning the same weight factors to all structural components influences the decision-making process.

Conclusions
There are three meaningful conclusions to be drawn from the statistical analysis of visual inspection data of reinforced concrete bridges in Estonian national roads.
1. In a comparison of different Principal Component Analysis algorithms, the Singular Value Decomposition algorithm is suggested because it uses the original data. The Alternating Least Squares algorithm gives usable results with fewer principal components and with more specific coefficients. However, the main variation in the Alternating Least Squares results is hidden under missing elements, and the condition states of element groups differ by up to 15.6% from the original state and the Singular Value Decomposition results. The difference in the mean condition states of the compared algorithms increases as the amount of missing information grows.
2. Regarding the different coefficients in the first principal components, the main variation is retained in non-structural elements. According to this investigation, it is inappropriate to use Principal Component Analysis results directly in predictive models, and for decision-making it is important to consider additional circumstances such as the risk of failure and the influence on overall load capacity.
3. Reinforced concrete bridges in Estonia have similarities in principal components, but the importance of the same elements is unequal. Although the main variance in the principal components is retained in the same element, the results show that different typologies also retain variance in different elements, and due to this fundamental distinction, it is incorrect to make decisions based only on construction material or typology.
A general conclusion is that Principal Component Analysis is suitable as a statistical tool for data reduction and additional analysis of visual inspection data to filter out the most significant components, but it is necessary to use additional weighing factors to emphasize the real influence of structural elements.
For future research, it is essential to cluster bridges based on a similar number of variables and to add circumstances as additional weighting factors. This helps to provide relevant information without prioritizing common elements.