INVESTIGATION OF FACTORS THAT HAVE AFFECTED THE OUTCOMES OF ROAD TRAFFIC ACCIDENTS ON LITHUANIAN ROADS

The purpose of this paper is to analyse the possibility of predicting the outcome of a road traffic accident from the traffic environment, the personal traits of the traffic participant and the vehicle, i.e. to answer the question whether specific values of the analysed factors increase the likelihood of a fatal accident. The logistic regression model, which allows identifying the relationship between the dependent and independent variables, was used in the research. Other methods for describing and analysing categorical variables were also used alongside the logistic regression. When


Introduction
The prevention of road traffic accidents (hereinafter referred to as accidents) receives a lot of attention and funding around the world. This area is addressed primarily because accident statistics remain largely unchanged. As reported by the World Health Organization (2013), approximately 1.24 million people die every year on the world's roads, and another 20 to 50 million sustain injuries as a result of accidents. These injuries and deaths have an immeasurable impact on the affected families, whose lives are often changed irrevocably by these tragedies, and on the communities in which these people lived and worked (World Health Organization, 2013). It has been estimated that annual losses from accidents amount to about 3.8 billion euro in Lithuania (Pukalskas, Pečeliūnas, Sadauskas, Kilikevičienė, & Bogdevičius, 2015). Understandably, the massive losses that accidents inflict on the global economy are difficult to quantify. In addition to the prevention of accidents, the most relevant issues are those related to the analysis of accidents, i.e. the investigation of individuals involved in traffic accidents and the identification of the factors that most often affect their consequences. Analysis of accumulated information has the potential to reveal solutions to both technical and social problems. Such analysis is relevant throughout the world, and Lithuania is no exception in seeking to improve its accident statistics.
Issues related to accident statistics are often dealt with in scientific publications dedicated to transport and accident analysis. Finnish researchers, comparing the accident rates of Finland and neighbouring Sweden, distinguished between urban and highway zones (Peltola & Luoma, 2017). Although accident rates in Swedish cities are lower because of advanced urban planning, it is acknowledged that head-on collisions and single-vehicle accidents are the two most dangerous accident types. Unlike Finland, however, Sweden has developed an extensive network of centre-barrier roads. Following the model of Sweden, it is recommended that data from the road register be used to analyse the safety of the road environment. The process of collecting data on fatal accidents has been identified as the sole limitation to obtaining a detailed accident analysis and determining accident causes. However, accident analysis did allow the identification of an apparent decline in accident rates after the introduction of stringent road safety management measures in the Slovak Republic in 2009 (Brazinova & Majdan, 2016). Another measure, related to the more accurate collection of accident data, is analysed in research conducted by Greek scientists Yannis, Papadimitriou, Chaziris, & Broughton (2014). A methodology for linking and matching police and hospital accident data from several European countries indicated the areas on which authorities need to focus their efforts (Yannis, Papadimitriou, Chaziris, & Broughton, 2014). Merging hospital records and police records improves the collection of accident statistics and reduces injury misclassification. Fatal accidents involving all-terrain vehicles are the object of research by Denning & Jennissen (2016) from Iowa City, United States, who emphasise that off-road crashes were not included in any of the national police reporting systems.
The relevance of such research is related to the much higher popularity of all-terrain vehicles (special purpose vehicles) in the US than in Europe, which leads to more frequent accidents involving them. It is noted that from 1998 to 2007 the rate of increase in fatal accidents on paved roads was nearly twice that on unpaved roads. As all-terrain vehicle design is unsuitable for paved roads (high centre of gravity, limited-slip rear differential, and all-terrain tyres), manufacturers, safety agencies and stakeholders are under pressure to convey the message about the higher risk of this type of vehicle on paved roads.
Returning to Swedish road safety policy, interest groups were identified: some advocate higher speeds, promoting mobility and the economy, while others prioritise road safety improvement and speed reduction (Svensson, Summerton, & Hrelja, 2014). The compromise is partly achieved by leaving speed limits on regional and local roads to the competence of municipal governments. In this way, road safety policy is also brought closer to the regions by avoiding state-only regulation.
As the accident rate in urban areas is rising, all kinds of accidents and vehicles are considered in detailed analysis regardless of the small statistical sample (Russo & Comi, 2017). The limited amount of data requires specific mobility and safety simulation models to be implemented, because sustainable urban mobility without injuries and fatal accidents is the crucial target for the future transport network. A specific case study of the city of Zagreb was carried out to find the relations between external factors and accident blackspots involving pedestrians (Ćosić, Šimunović, & Jakovljević, 2019). Accident data collection and statistical analysis were carried out before the influential factors of pedestrian-vehicle accidents were determined. The external factors related to road infrastructure, speed limits and traffic volume showed an association with the accident rate. At the same time, the period of the year and of the day, as well as meteorological conditions, did not show a significant relation. However, the visibility of drivers on rural roads is significantly affected by the period of the day and by conditions, as was tested experimentally using an eye-tracking camera in real traffic conditions (Madleňák, Hoštáková, Madleňáková, Drozdziel, & Török, 2018). During night driving, the number of fixations on traffic signs was 38.50% lower than in daylight conditions. In addition, visual smog has an even stronger negative effect (81.70%), which is a significant issue for ensuring road safety. On the other hand, motorways witness road events with even more serious results. Road catastrophes involving several vehicles take place frequently, confirming the statistics that they are caused by speeding, driver fatigue, insufficient distance between vehicles, or incorrect lane changes (Drozdziel & Wrona, 2018).
Compared with the European Union (EU) context, the number of accidents in Lithuania is still very high (22nd position in the EU by deaths). After a significant decrease in 2008−2010, the numbers of road accidents and injuries over the next decade remained similar or even deteriorated, although the decrease in fatal accidents continued (Lithuanian Road Administration…, 2019). With the increase in the number of vehicles and drivers, non-registered accidents have become part of our lives. If they were the only type of accident, there would be no need to analyse the problems of accidents. However, there are also many reported accidents resulting in injuries to people. A significant number of accidents are particularly painful as they involve human death or serious injuries. The amount of research summarising long-term observations of the road safety situation in Lithuania is low. The available studies analyse only the road safety issues of vehicles, roads and traffic participants (Antov & Smirnovs, 2016; Bureika, Gaidamauskas, Kupinas, Bogdevičius, & Steišūnas, 2017; Bureika, Žuraulis, & Sadauskas, 2012; Gailienė & Laurinavičius, 2017; Žuraulis, Nagurnas, Pečeliūnas, Pumputis, & Skačkauskas, 2018). Two hundred forty-six non-professional Lithuanian drivers participated in a study of psychological factors contributing to risky driving, which is one of the few studies of this type from middle-income Eastern European countries (Šeibokaitė, Endriulaitienė, Sullman, Markšaitytė, & Žardeckaitė-Matulaitienė, 2017). The cross-sectional survey showed that cognitive processing of emotions complicates the decision-making process while driving, leading to an increase in errors and lapses. This study encourages testing the fitness of drivers to drive in terms of emotional readiness.
Questionnaire surveys are often used to study accident issues. In one such survey in the city of Ankara, the log-linear model showed the best fit for such variables as gender, driver fault and time (Olmuş & Erbaş, 2012). Additionally, the survey showed that these variables are significantly associated with accident severity. Unfortunately, a variable such as the gender of drivers does not relate directly to accidents, and a much more comprehensive analysis has to be done.
To reduce the number of fatal accidents, it is essential to summarise the information about accidents, attempting to find specific prevailing trends, to reveal the causes and to anticipate potential risk factors. Once the main hazards have been identified and methodologies have been developed, targeted work is done with road users to reduce the impact of the human factor:
• to improve their training and examination programmes,
• to integrate road safety topics into the school general education course,
• to draw attention in social advertising.
Since road users, who inevitably tend to make mistakes, are identified as the main perpetrators of accidents, the likelihood and consequences of human errors can be reduced by improving the road infrastructure. The Vision Zero programme successfully implemented in Sweden is based on such a plan (Belin, Tillgren, & Vedung, 2012).
This work analyses the possibility of predicting the outcome of an accident from the traffic environment, the personal traits of the traffic participant and the vehicle, i.e. it aims to answer the question whether specific values of the analysed factors (variables) increase the likelihood of a fatal accident.

Research methodology
Many studies have analysed various data mining technologies to extract the most useful information from accident data (Gupta, Solanki, & Singh, 2017; Li, Shrestha, & Hu, 2017; Shokohyar, Taati, & Zolfaghari, 2017). Data clustering and association rule mining were used to obtain valuable information from heterogeneous accident data (Kumar & Toshniwal, 2015).
Determining the cause is the foremost and final task of accident data mining, where computational intelligence algorithms are applied (Xi, Gao, Niu, Ding, & Ning, 2013). There, the importance of each value is analysed with the association rules method for selected accident data layers.

When exploring big accident data sets, such as those from China, data cleaning steps are necessary so that invalid data is deleted and dirty data is removed (Chen, 2017). After that, the Fisher linear discriminant, decision tree and random forest models are compared to find the most accurate prediction model.
This publication presents the results of research carried out in collaboration with the Ministry of Transport and Communications of the Republic of Lithuania. Accidents involving deaths and injuries of people on the roads of the Republic of Lithuania from 2010 to 2015 are described and analysed in the research. The data set was compiled from records of 30 853 accidents (injuries and deaths). In this research, the focus was only on the perpetrators of accidents, which gave a data set of 19 831 records. Only 18 175 of these records were recognised as suitable for the research, because the other records contained insufficient data (no information about the perpetrators). Information on accidents is presented in several groups: general characteristics of the event and information on the persons and vehicles involved in the accident. General event characteristics include the accident number, date, time of the day, place (municipality, road number, type of road surface), number of vehicles and persons involved in the accident, number of injured and killed, and meteorological conditions. Personal information covers gender, age, driving experience, nature of injuries, sobriety, use of seat belt and airbag deployment. Vehicles are characterised by type, make, year of manufacture, country of registration and nature of the accident. Two additional variables are used in this research to distinguish accidents. One of these indicates the year in which the accident occurred, and the other provides information on fatalities resulting from the accident. With this classification, the total number of variables describing each accident is 29. However, some variables have been excluded because the share of values entered was below 5.00%, or because they were assessed as not affecting the outcome of the accidents (for example, accident number or date). Thus, a total of 22 variables were used in the research.
Each of the variables analysed in the research takes from several to dozens of different values. For example, the variable Time of the accident takes the values day, dusk and dark, while the variable indicating the Municipality of an accident has the highest number of values, 62. Understandably, such a large number of different values is inappropriate for the research. Therefore, at the beginning of the research, the values of many variables were merged into groups. Groups were formed without loss of information, and the probability of occurrence of the values of each group differs statistically significantly from zero. For example, the variable indicating the municipality of the accident was reorganised by merging municipalities.
The logistic regression model, which allows identifying the relationship between the dependent and independent variables, was used in the research. Other methods for describing and analysing categorical variables were also used alongside the logistic regression. The second section discusses in detail the logistic regression model and its application in evaluating the odds ratio. The third section describes the set of research data and evaluates the influence of various factors on the possible outcome of the accident by using logistic regression tools. The fourth section outlines the results of the research and discusses the possibilities for further research.

Logistic regression model
Logistic regression is one of the most popular analysis methods for categorical data. The general principles of logistic regression are the same as those of linear regression. Logistic regression describes the relationship between a response variable ($Y$) and one or more explanatory variables ($X$), i.e. there is one dependent and one or more independent variables. The fitted model is the one that best fits the data. In logistic regression, the dependent variable is dichotomous (here its values are 0 for a non-fatal accident and 1 for a fatal accident). When the dependent variable is influenced by more than one independent variable, multiple regression is applied. In the general case, the independent variables are on different measurement scales. The general form of the logistic model (Eqs (1)−(2)) is the conditional probability (Hosmer, Lemeshow, & Sturdivant, 2013):

$$P(Y = 1 \mid x_1, \dots, x_k) = \frac{e^{z}}{1 + e^{z}}, \qquad (1)$$

$$z = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k, \qquad (2)$$

where $x_1, x_2, \dots, x_k$ are the independent variables and $a, b_1, b_2, \dots, b_k$ are the coefficients of the model. The coefficients $b_1, b_2, \dots, b_k$ define the odds ratio (OR) for the dichotomous variables (Eq. (3)). An odds ratio is a measure of association. For the dichotomous variable $x_i$ coded 0 and 1, the relationship between the OR and the regression coefficient is (Hosmer, Lemeshow, & Sturdivant, 2013):

$$OR = e^{b_i}. \qquad (3)$$

The odds ratio approximates how much more likely (or unlikely) it is for the outcome to be present among those with $x_i = 1$ than among those with $x_i = 0$. The influence of other variables is not analysed, i.e. the model with

$$z = a + b_i x_i \qquad (4)$$

is analysed. If $b_i > 0$, then $OR > 1$, and if $b_i < 0$, then $OR < 1$.
In logistic regression, the model parameters $a, b_1, b_2, \dots, b_k$ are estimated by the maximum likelihood method, i.e. the maximisation problem (Eq. (5)) for the likelihood function (Eq. (6)) with probability (Eq. (7)) is solved:

$$\max_{a, b_1, \dots, b_k} L(a, b_1, \dots, b_k), \qquad (5)$$

$$L(a, b_1, \dots, b_k) = \prod_{j=1}^{n} p_j^{y_j} (1 - p_j)^{1 - y_j}, \qquad (6)$$

$$p_j = P(Y = 1 \mid x_{1j}, \dots, x_{kj}), \qquad (7)$$

where $y_j$ and $x_{1j}, \dots, x_{kj}$ are the values of the dependent and independent variables in record $j$, $j = 1, \dots, n$.
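The link between the maximum likelihood estimate and the odds ratio can be illustrated with a minimal sketch. It assumes the single-predictor model with $z = a + b x$ for one dichotomous variable and fits it with Newton-Raphson iterations on grouped counts; the function names and the counts are hypothetical and are not taken from the research data. For such a model, $e^{b}$ should coincide with the cross-product odds ratio of the 2×2 table.

```python
import math

def fit_single_binary_logit(n00, n01, n10, n11, iters=25):
    """Maximum likelihood fit of P(Y=1|x) = e^z / (1 + e^z), z = a + b*x.

    n_xy is the number of records with predictor value x and outcome y.
    Newton-Raphson is used; this is an illustrative sketch only.
    """
    a, b = 0.0, 0.0
    group0, group1 = n00 + n01, n10 + n11
    for _ in range(iters):
        p0 = 1.0 / (1.0 + math.exp(-a))        # P(Y=1 | x=0)
        p1 = 1.0 / (1.0 + math.exp(-(a + b)))  # P(Y=1 | x=1)
        # Gradient of the log-likelihood with respect to a and b
        g_a = (n01 - group0 * p0) + (n11 - group1 * p1)
        g_b = n11 - group1 * p1
        # Information matrix (negative Hessian)
        w0 = group0 * p0 * (1.0 - p0)
        w1 = group1 * p1 * (1.0 - p1)
        h_aa, h_ab, h_bb = w0 + w1, w1, w1
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system explicitly
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

def cross_product_odds_ratio(n00, n01, n10, n11):
    """Odds of Y=1 when x=1 divided by odds of Y=1 when x=0."""
    return (n11 * n00) / (n10 * n01)

# Hypothetical counts, e.g. x = 1 for "seat belt not fastened"
a, b = fit_single_binary_logit(n00=9000, n01=500, n10=1600, n11=200)
print(math.exp(b), cross_product_odds_ratio(9000, 500, 1600, 200))
```

In practice a statistical package would be used for the multi-variable model; the sketch only shows why a coefficient translates into an odds ratio through the exponential function.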

Data analysis
This section contains an overview of the research data and applies the logistic regression model described in the second section to determine the impact of each variable on the outcome of the accident.
As mentioned in Section 1, accident information is presented with an indication of the general characteristics, the persons and the vehicles involved in the accident. Of particular interest in this research is the prediction of the values of the variable describing the outcome of the accident in light of the values of the other variables. The variable that describes the outcome of the accident has two values: 0 for a non-fatal accident and 1 for a fatal accident.
Many research variables are categorical and take many different values. To reduce the number of categories, it is necessary to rate the significance of each category, taking into account the probability with which the variable takes the values of that category. In this research, the categories of variables with probabilities below 0.05 were considered insignificant and were either combined with other categories of similar meaning or placed in a newly created category. After such data rearrangement, the number of different categories per variable ranges between 2 and 10 (Table 1).
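The category reduction described above can be sketched as follows. This is a simplified illustration under stated assumptions: every category with a relative frequency below the 0.05 threshold is placed into a single new category labelled "other", whereas in the research some rare categories were merged with categories of similar meaning. The function name, the label and the sample values are hypothetical.

```python
from collections import Counter

def merge_rare_categories(values, threshold=0.05, other_label="other"):
    """Replace categories whose relative frequency is below `threshold`."""
    counts = Counter(values)
    total = len(values)
    rare = {cat for cat, n in counts.items() if n / total < threshold}
    return [other_label if v in rare else v for v in values]

# Hypothetical sample of a "meteorological conditions" variable
weather = (["clear"] * 60 + ["rain"] * 25 + ["cloudy"] * 10
           + ["fog"] * 3 + ["strong wind"] * 2)
merged = merge_rare_categories(weather)
shares = {cat: n / len(merged) for cat, n in Counter(merged).items()}
print(shares)
```

After merging, the share of the new category should be checked again, since a group built from very rare categories may still fall below the threshold.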
First of all, all variables were briefly described. Then the logistic regression model was applied to assess the odds ratios.
The analysed data, as already mentioned, describe various aspects related to the accident. In this data set, 7.38% of the accidents are fatal. Among the accidents that occurred between 2010 and 2015, the most common type was vehicle collisions, which account for 48.83% of all accidents. An accident usually involved 1 injured person (75.10% of accidents). Most accidents were recorded in Kaunas, Klaipėda and Vilnius districts, including the cities themselves. The dominant type of road surface was asphalt or cement concrete. The road surface was usually dry during accidents (62.50%). Accidents during daylight dominate the data set (55.50%). Most often, males were the perpetrators of road accidents. According to the research data, males were the perpetrators in 76.90% of fatal accidents.
The relations between the variable describing the outcome of the accident and the other variables listed in Table 1 were examined using cross-tabulation tables. By applying the χ² criterion, it was found that all variables listed in Table 1 affect the outcome of the accident, except for the following variables:
• number of the participants on the road traffic accident,
• number of injured persons on the road traffic accident,
• road surface indicator in the place of accident 2,
• the sobriety of a vehicle driver,
• driver age.
Although the impact of the listed variables on the outcome of the accident based on the χ² criterion is questionable, they are included in the initial logistic regression model. This was decided after analysing the relationships between the categorical variables of this research. The relationships between multiple variables are shown in Figures 1 and 2. The presented association plots (Zeileis, Meyer, & Hornik, 2007) show the relations between the categorical variables gender of the perpetrator on the road traffic accident and outcome of the road traffic accident (Figure 1), and number of the participants on the road traffic accident and outcome of the road traffic accident (Figure 2). The association plots indicate where the data deviate from the expected values and the direction of such deviation.
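The χ² criterion applied to a cross-tabulation table can be sketched in a few lines. The example below computes the Pearson χ² statistic for a table of observed counts and, for a 2×2 table only (one degree of freedom), the corresponding p-value. The function names and the counts are hypothetical illustrations and are not the values of the research data.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows = gender (female, male), columns = (non-fatal, fatal)
table = [[4100, 200], [12700, 1100]]
stat = chi_square_statistic(table)
print(stat, p_value_df1(stat))
```

A p-value below 0.05 would lead to rejecting the hypothesis that the two variables are independent; tables with more than two rows or columns need the χ² distribution with the matching number of degrees of freedom.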
When a box is above one of the horizontal dotted lines, the observed count is higher than the expected count. When a box is below one of the horizontal dotted lines, the observed count is lower than the expected count. Figure 1 shows that the variable outcome of the road traffic accident depends on the variable gender of the participant in the road traffic accident. For non-fatal accidents, the value observed in the female group was higher than predicted (the black box above the 1st dotted line). For fatal accidents, the value observed in the male group was higher than predicted (the black box above the 2nd dotted line). Red boxes below the 1st and 2nd lines correspond to the cases where the observed values were lower than the predicted values.
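The reading of the boxes described above corresponds to the sign of the Pearson residual of each cell, (observed − expected) / √expected, on which association plots are based. The following minimal sketch computes these residuals; the counts are hypothetical and were chosen only to reproduce the pattern described for Figure 1, not to match the research data.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Hypothetical counts: rows = gender (female, male), columns = (non-fatal, fatal)
table = [[4100, 200], [12700, 1100]]
for label, row in zip(("female", "male"), pearson_residuals(table)):
    # Positive residual: box above the line; negative residual: box below it
    print(label, [round(r, 2) for r in row])
```

The size of a residual also indicates how strongly a cell contributes to the χ² statistic, which is the sum of the squared residuals.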
When the number of the participants on the road traffic accident is below 2, the observed count is higher than the count predicted for non-fatal accidents (the black box above the 1st dotted line in Figure 2). When the number of the participants on the road traffic accident is no less than 2, the observed count is higher than the count predicted for fatal accidents (the black box above the 2nd dotted line). Red boxes below the 1st and 2nd lines correspond to the cases where the observed values were lower than the predicted values. The relationships between the other categorical variables and the variable outcome of the road traffic accident were investigated similarly. However, Figures 1 and 2 do not answer the question of how strongly the values of each categorical variable affect the variable outcome of the road traffic accident. The logistic regression model described in Section 2 was used to reveal the relationships among the variables. These relationships were evaluated by analysing the odds ratios.
Because the variables in the logistic regression model (both dependent and independent) are categorical, the independent variables were first transformed by introducing dummy (design) variables. Tables 2 and 3 present the schemes for rearranging two variables (number of the participants on the road traffic accident and road surface indicator in the place of accident 2) and for introducing the dummy variables. Similar actions were performed with all research variables.
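The introduction of dummy (design) variables can be sketched as follows, assuming reference-cell coding: a variable with m categories is replaced by m − 1 indicator columns, and the reference category is coded by zeros in all of them. The function name and the choice of "day" as the reference category are illustrative assumptions; the category names follow the values of the variable Time of the accident mentioned earlier.

```python
def dummy_code(values, reference):
    """Reference coding: one 0/1 column per non-reference category."""
    categories = sorted(set(values) - {reference})
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows

# Values of the variable "Time of the accident"; "day" is the assumed reference
time_of_day = ["day", "dusk", "dark", "day"]
columns, design = dummy_code(time_of_day, reference="day")
print(columns)
for value, row in zip(time_of_day, design):
    print(value, row)
```

With this coding, the coefficient of each dummy variable compares its category with the reference category, which is why every odds ratio is reported relative to a base category.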
The likelihood ratio test statistic is 6909.4 with a p-value below 0.05, which means that not all coefficients in the model are equal to zero. The model correctly predicts outcomes for 99.80% of non-fatal accidents and 67.90% of fatal accidents. The total percentage of correct forecasts is 97.00%. Therefore, the model reflects the available data well (it correctly predicts more than 50.00% of both fatal and non-fatal accidents), but it is insufficient. However, the model is more relevant to the other issue, i.e. the impact of each variable on the outcome of the accident. The effects of the identified factors on the outcome of the road traffic accident were studied using the ORs (Eq. (3)). Each factor in the logistic regression model was described by several dummy variables, one of which must serve as the reference (Table 4). Table 4 does not include the variables whose impact on the outcome variable is statistically insignificant (χ² test and likelihood ratio test p-values above 0.05):
• number of the participants on the road traffic accident (p-value is equal to 0.14),
• number of injured persons on the road traffic accident (p-value is equal to 0.23),
• road surface indicator in the place of accident 2 (p-value is equal to 0.08),
• driver age (p-value is equal to 0.96).
Analysis of the results in Table 4 shows that not all calculated OR values are statistically significant (not all p-values are below 0.05). However, as these are separate categories of statistically significant variables, there is no reason to exclude them from the analysis.
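The percentages of correct forecasts quoted above come from a classification table. A minimal sketch of how such percentages are computed is given below; the four counts are hypothetical and were chosen only so that the class-wise rates equal 99.80% and 67.90%, so the overall percentage in the sketch need not equal the 97.00% reported for the research data.

```python
def classification_summary(tn, fp, fn, tp):
    """Return (specificity, sensitivity, accuracy) from a 2x2 classification table."""
    specificity = tn / (tn + fp)   # share of non-fatal accidents predicted correctly
    sensitivity = tp / (tp + fn)   # share of fatal accidents predicted correctly
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    return specificity, sensitivity, accuracy

# Hypothetical counts: 10 000 non-fatal and 1 000 fatal accidents
spec, sens, acc = classification_summary(tn=9980, fp=20, fn=321, tp=679)
# Share of correct forecasts if every accident were predicted as non-fatal
baseline = (9980 + 20) / (9980 + 20 + 321 + 679)
print(spec, sens, acc, baseline)
```

Because fatal accidents are a small share of all records, the overall percentage is dominated by the non-fatal class, so the percentage of correctly predicted fatal accidents is the more informative figure.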

Discussion and results
This section discusses the results of the data analysis performed in this research and, based on the available data, identifies the variables that have the most significant impact on the outcome of the accident. In Table 4, an OR above 1 indicates higher odds of being involved in a fatal accident for a representative of the category in question than for a representative of the base category. The odds ratios calculated for the accident type show that the odds of a fatal accident are statistically significantly affected by rollovers or driving into an obstacle, compared to vehicle collisions. Regarding the impact of the number of vehicles involved in the road traffic accident on its outcome, the odds of a fatal accident involving more than 2 vehicles are higher (OR = 1.87) than in the case of 2 vehicles. The odds of a fatal accident are also higher when only one vehicle is involved in the accident (OR = 1.74). For the districts, odds ratios above 1 with p-values below 0.05 were obtained relative to Vilnius district: the risk of a fatal accident is higher in the districts of Marijampolė, Panevėžys, Tauragė, Telšiai, and Utena. The outcome of the road traffic accident is also strongly influenced by the time of the day. Nighttime or twilight traffic conditions are more complicated than daytime conditions, as illustrated by the OR values of 2.27 and 1.67, respectively. Meteorological conditions such as fog, snow, strong wind and others have more impact on fatal accidents than sunny weather (OR = 1.86). On slopes or horizontal curves of the road, the OR increases to 1.55 compared to a straight section of the road. The highest odds ratios also include the OR calculated for males compared to females (the odds of a fatal accident are more than 2 times higher for male drivers).
Unfastened or missing seat belts, compared to fastened seat belts, increase the odds of a fatal accident (OR = 2.21). Airbags that are not fitted or not deployed also have a statistically significant effect on the outcome of the road traffic accident compared to airbag deployment. Compared to sober drivers, drivers intoxicated with alcohol or narcotic substances have higher odds of being involved in a fatal accident (OR = 1.34). Noticeably higher odds ratios are observed for vehicles registered in other countries (OR above 3.0 compared to vehicles registered in Lithuania). An odds ratio higher than 3.0 was also obtained for trucks compared to passenger cars.
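When interpreting the reported values, it helps to keep in mind that an odds ratio multiplies odds, not probabilities. The sketch below converts some of the ORs quoted above into the regression coefficient b = ln(OR) and into the probability of a fatal outcome implied for the compared category. The probability of 0.05 for the base category is a purely hypothetical assumption, and the function name is illustrative.

```python
import math

def probability_from_odds_ratio(p_ref, odds_ratio):
    """Probability in the compared category, given the reference probability and the OR."""
    odds_ref = p_ref / (1.0 - p_ref)
    odds = odds_ratio * odds_ref
    return odds / (1.0 + odds)

# ORs quoted in the text; the reference probability is a hypothetical assumption
reported_or = {
    "nighttime vs daytime": 2.27,
    "twilight vs daytime": 1.67,
    "seat belt not fastened vs fastened": 2.21,
    "intoxicated vs sober": 1.34,
}
p_ref = 0.05
for label, odds_ratio in reported_or.items():
    b = math.log(odds_ratio)  # regression coefficient corresponding to the OR
    p = probability_from_odds_ratio(p_ref, odds_ratio)
    print(f"{label}: b = {b:.3f}, probability = {p:.3f}, ratio = {p / p_ref:.2f}")
```

For a rare outcome the ratio of probabilities is close to, but always smaller than, an OR above 1, so an OR of about 2 means roughly, not exactly, a doubled probability.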

Conclusions
Summarising the results obtained in this statistical analysis, it can be stated that most of the research factors have an impact on the outcome of the road traffic accident. The influence of some factors is higher, i.e. they increase the odds of involvement in a fatal accident more than others.
1. Regarding the impact of the number of vehicles involved in the road traffic accident on its outcome, the odds of a fatal accident involving more than 2 vehicles are higher (OR = 1.87) than in the case of 2 vehicles. The odds of a fatal accident are also higher when only one vehicle is involved in the accident (OR = 1.74).
2. The outcome of the road traffic accident is also strongly influenced by the time of the day. Nighttime or twilight traffic conditions are more complicated than daytime conditions, as illustrated by the odds ratio values of 2.27 and 1.67, respectively.
3. Meteorological conditions such as fog, snow, strong wind and others have more impact on fatal accidents than sunny weather (odds ratio is equal to 1.86).
4. The highest odds ratios also include the odds ratio calculated for males compared to females (the odds of a fatal accident are more than 2 times higher for male drivers).
5. Unfastened or missing seat belts, compared to fastened seat belts, increase the odds of a fatal accident (odds ratio is equal to 2.21).
6. Compared to sober drivers, drivers intoxicated with alcohol or narcotic substances have higher odds of being involved in a fatal accident (odds ratio is equal to 1.34).
7. The logistic regression model presented in this research correctly predicts about 67.90% of fatal accidents, and it can be used as an auxiliary tool to identify the factors affecting the outcome of the road traffic accident, focusing on the impact of each factor separately.