HOW TO COMPARE DIFFERENT NATIONAL DATABASES OF HGV ACCIDENTS TO IDENTIFY ISSUES FOR SAFETY IMPROVEMENTS

The objective of this paper is to present a methodological approach and a case study for an international comparison of accident data coming from different national databases. Safety levels and the characteristics of severe crashes involving heavy goods vehicles in different European countries (Italy, France, Germany, Great Britain and Spain) are analysed. Considering that all the countries involved have different inventory structures for the variables reported in their national accident databases, taxonomy theory was used in order to create a comparable structure for the database used in the analysis. The taxonomy is non-exclusive and the codes are categorical, denoting the absence or presence of a certain feature. Based on the data available in each national database, the five European Union databases of accidents involving heavy goods vehicles have been referenced to a single one, composed of 11 items (casualty class, injury number and severity, location, light conditions, road conditions, junction, vehicle type, driver age, driver gender, accident type and manoeuvres), which capture common features of heavy goods vehicle accidents. A statistical analysis was carried out in order to highlight significant differences in the proportions of heavy goods vehicle crash categories.


Introduction
The objective of this paper is to compare heavy goods vehicle (HGV) safety levels and characteristics in different European countries. The European Union (EU) was originally composed of 15 countries (EU-15) now extended to 27 (EU-27) including new countries from the East.
In the EU-15, HGV fatal crashes fell from 4988 in 1995 to 3114 in 2006, a fall of more than 30%, although they still represent about 13% of the overall fatalities occurring in road crashes (Broughton et al. 2008). Despite the relevance of the phenomenon, few detailed statistics are currently available regarding accidents involving HGVs and even less is known about differences or similarities between different European countries. This lack of comparable data is due to an absence of homogeneity among accident databases at international level. To overcome this problem, in 1993 the Community Road Accident Database (CARE) was created as a useful tool for comparing accidents in EU countries, but, after 15 years of application it has not been able to harmonize the different national accident databases. With particular reference to commercial vehicles in CARE there is a lack of details for a more in-depth analysis.
For this reason, at European level in-depth analyses of accidents involving HGVs are only carried out by specific investigation systems, which include a high degree of detail but, consequently, a limited number of available cases. In the European Truck Accident Causation (ETAC) study of 2007, a common database made up of "only" 600 truck accident reports for seven European countries was used. In those accidents, the main cause (85.2%) was linked to human error by one of the road participants (truck driver, car driver, pedestrian, etc.). Other factors such as weather conditions (4.4%), infrastructure conditions (5.1%) or technical failures of the vehicle (5.3%) played only a minor role. Accidents at intersections (27%) were the most frequent accident typology, followed by accidents in queues (21%).
Accident data analyses immediately highlight some peculiarities characterizing the accident phenomenon. Fig. 1 shows the accident rate (accidents/HGV fleet) and the fatality rate (fatalities/accidents) for each European country. The reference year for all the accident data is 2006, except for France, where 2005 was the last available year.
Based on national databases, it is difficult to conduct a more in-depth analysis due to the difference in the variables considered in each national database. For this reason, the aim of the paper is to attempt grouping comparable countries by way of national dataset management and then to compare the countries within a specific group or class.
For this purpose, the paper is divided into two logical parts: the taxonomy approach for managing the different national databases; and the statistical analysis of HGV data collected in different EU countries. Considering all EU-15 or EU-27 countries would be time-consuming and costly, and is not necessary for the aim of the present work. Therefore, only five representative countries were selected. The transport of goods by road is prevalent in Europe, with peaks of about 90% in Spain, Denmark, Greece, Ireland, Italy, Luxembourg and Portugal. In 2006 the fleet of commercial vehicles in the EU-15 countries amounted to about 22 million vehicles. Five of these countries (EU-5: Italy, France, Germany, Great Britain and Spain), representing about 70% of the overall EU HGV fleet, were selected.

Methodological approaches
As regards road crashes, all national traffic accident databases contain a rich source of information on the different circumstances in which the accidents have occurred: cause of the accident (type of collision, road users, injuries, etc.), traffic conditions (max speed, priority regulations, etc.), environmental conditions (weather, light conditions, time of the accident, etc.), road conditions (road surface, obstacles, etc.), human conditions (fatigue, alcohol, etc.) and geographical conditions (location, physical characteristics, etc.) (Geurts et al. 2003). Unfortunately, each national database reports the accidents occurring throughout the country following its own particular choice of dataset.
In the present research, the accident data consist of statistical databases from five European countries (Italy, Germany, Spain, France and Great Britain).
In Italy, the source statistics for the detection of accidents are provided by ISTAT (National Institute of Statistics). Any injury and/or fatal accident should be reported by the police authorities in the jurisdiction of the crash using the Model CTT.INC ISTAT. In Germany, the source statistics for the detection of accidents are provided by the Statistisches Bundesamt (DESTATIS). The accident information is based on a monthly collection, by the public authorities, of road accidents occurring over the national territory. The Spanish national accident statistics database is run by the General Directorate of Traffic (DGT), which comes under the Spanish Ministry of the Interior. The database is fed by the police reports for all road accidents where at least one casualty was registered. The French National Road Administration's accident database requires that any accident involving injuries should be reported and coded in a Bulletin d'Analyse d'Accident Corporel de la Circulation (BAAC) by the gendarmerie or police in the jurisdiction of the crash. The STATS19 national database, run by the UK Government, contains comprehensive information about UK road accidents on the public highway which involve human injury or death. The data contain highway, vehicle and human information compiled at the time of the accident by the police.
Each national crash database reports accidents occurring throughout the country that meet specific criteria for inclusion and classification. These criteria are different for each country and are not necessarily comparable. Therefore, for a comparison analysis, it was necessary to create a common structure to harmonize the individual differences into one consistent reporting system. For this purpose, the taxonomy approach can be used (Wallace, Ross 2007).

Taxonomy
The hierarchical structure of the data taxonomy represents a convenient way of classifying data in order to verify that it is unique and not redundant (Bryce 2005). A taxonomy is a hierarchical structure of classifications for a given set of objects. At the top of this structure there is a single classification, the root node, that applies to all the objects. Nodes below this root are more specific classifications that apply to subsets of the total set of classified objects. The reasoning progresses from the general to the more specific. Classifying events using taxonomies designed for that purpose is a common technique in the human sciences (e.g. psychology, sociology, psychiatry) and studies have also been presented for its application to traffic accident analysis (Donnell et al. 2010; Elvik 2010; Gstalter, Fastenmeier 2010; Johnson et al. 2009; Regan et al. 2011; af Wåhlberg 2002). In traffic accident analysis, the point of any taxonomy is simply to help classify the factors that contribute to accidents or injuries and thus establish a starting point to study the causes of accidents. For any taxonomy, the categories must strike a balance between incorporating too much and too little. For this reason, any taxonomy of traffic accidents is necessarily incomplete, as there are always categories which could be included or excluded. A taxonomy has been shown to be highly useful when the decision to include a category in the database or exclude it is based on three criteria (Ross et al. 2004; Stanton, Salmon 2009; Yeraguntla et al. 2005): the importance of the variable in the analysis of the phenomenon (its role in causing accidents and/or the usefulness of the category in accident analysis and prevention); the availability of the kind of data needed to code for a variable; and the balance between the number of variables used and the size of the resulting samples.
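As an illustration of the root node and its more specific sub-classifications, the following minimal Python sketch builds a small tree and lists every path from the general to the specific. The class name `Node`, the helper `paths` and the junction labels are hypothetical and are not taken from the national databases or from the figures of this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One classification in the taxonomy tree."""
    name: str
    children: list[Node] = field(default_factory=list)

    def add(self, name: str) -> Node:
        """Attach a more specific sub-classification and return it."""
        child = Node(name)
        self.children.append(child)
        return child


def paths(node: Node, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path, from the general to the specific."""
    here = prefix + (node.name,)
    if not node.children:
        return [here]
    result = []
    for child in node.children:
        result.extend(paths(child, here))
    return result


# Hypothetical item: the labels below are illustrative only.
root = Node("junction")
at_junction = root.add("at junction")
at_junction.add("crossroads")
at_junction.add("roundabout")
root.add("not at junction")

leaf_paths = paths(root)
for path in leaf_paths:
    print(" > ".join(path))
```

Each printed line is one branch of the tree, so aggregating at a higher level simply means cutting the path at an earlier node.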
With these main guiding principles the accident taxonomy was developed for this study, using the procedure described below, to compare HGV safety levels and characteristics in different European countries (Italy, France, Germany, Great Britain, Spain).

Procedure
With the aim of harmonizing national databases and thus obtaining useful information for crash analysis and comparison, different items were defined (root node). Then for every node, starting from the different structure of each national database, more specific sub-classifications were defined according to the variables characterizing the datasets. In order to univocally characterize the property matched to each variable the attributes were defined with reference to every sub-category.
Finally, an identification code (ID) was assigned to each sub-category in order to easily codify information taken from different databases. If there was not enough information to decide on the applicability of the variable, it was not used and was marked as "missing". If the evidence for the interpretation of an attribute into a variable was not clear enough, it was marked as "not coded".
A taxonomy root structure was built for each item using the five European databases (Italy, Germany, Spain, France and Great Britain) in order to define the list of attributes.
For example, to identify crashes involving heavy goods vehicles (HGVs), Fig. 2 shows item E, related to the vehicle type. Another example of hierarchical taxonomy is shown in Fig. 3 for the junction/no junction definition (item D), highlighting the necessity for a high level of aggregation based on data availability. The variable names are reported in the national language for easier reference to the original database.
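The coding step can be sketched in Python as a lookup from a national value to a common ID. This is only an illustrative sketch: the lookup table, the national value names and the IDs `X1` and `X2` are hypothetical, and treating an unmapped value as "not coded" is an assumption of the example rather than a rule stated in the procedure.

```python
MISSING = "missing"
NOT_CODED = "not coded"

# Hypothetical mapping from (country, national value) to a common ID.
LOOKUP = {
    ("IT", "national_value_1"): "X1",
    ("DE", "national_value_7"): "X1",
    ("ES", "national_value_3"): "X2",
}

# Hypothetical national values whose interpretation is judged unclear.
AMBIGUOUS = {("FR", "national_value_9")}


def common_id(country, value):
    """Translate one national value into the common coding."""
    if value is None or value == "":
        # No information recorded: the variable cannot be applied.
        return MISSING
    key = (country, value)
    if key in AMBIGUOUS:
        # Evidence too unclear to interpret the attribute.
        return NOT_CODED
    # Assumption of this sketch: unmapped values are also left uncoded.
    return LOOKUP.get(key, NOT_CODED)


records = [
    ("IT", "national_value_1"),
    ("DE", "national_value_7"),
    ("FR", "national_value_9"),
    ("ES", None),
]
coded = [common_id(country, value) for country, value in records]
print(coded)
```

Keeping "missing" and "not coded" as distinct outcomes makes it possible to report separately how many records lacked the information and how many could not be interpreted.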
From the data available in each national database eleven tree structures were composed, like those shown in the previous figures, using the taxonomic approach referring to the same number of items (casualty class, injury number and severity, location, light conditions, road conditions, junction, vehicle type, driver age, driver gender, accident type and manoeuvres) which capture common features of road accidents. Table 1 reports the list of attributes included in the new common database drawn up by referring each national dataset to only one.

HGV accident data comparison in EU countries
A simple comparison of the number of accidents in different categories does not lead to interesting results because of the variability among the countries in terms of exposure (vehicle fleet, travelled km). Instead, the proportions of occurrence of the different crash typologies are not influenced by the sample size and can therefore be used to compare the characteristics of HGV crashes in the analysed countries. As in any analysis of accident data, a simple comparison of the data could be biased by the stochastic nature of the phenomenon.

The Bayes theorem for proportions
The "proportions" method compares the proportions of an accident type among different samples (Heydecker, Wu 1991; Lyon et al. 2007), taking into account the random nature of the phenomenon.
The proportion of a specific collision type for sample "i" is defined as $\mu_i$:

$$\mu_i = \frac{x_i}{n_i}$$

where $x_i$ is the total number of target collisions in sample "i" during the study period, and $n_i$ is the total number of collisions of all types in sample "i" during the same period. Considering m different samples, the mean proportion of the target collision type across the samples is denoted $\bar{\mu}$.

Table 1 (excerpt, items F and G)
Item F
  F1: Any person who was killed outright or who died within 30 days as a result of the accident.
  F2: Injured. Any person who was not killed, but sustained one or more serious or slight injuries as a result of the accident.
Item G: Accident type
  G1: Accident between vehicle and pedestrian. Accidents involving one or several vehicles and pedestrians, irrespective of whether the pedestrian was involved in the first or a later phase of the accident and of whether the pedestrian was injured or killed on or off the road.
  G2: Single vehicle accidents. Accidents involving no collision with other users, even though they may be involved, or accidents caused by collision with obstructions or animals on the road.
  G3: Rear-end collisions. Accident caused by a rear-end collision with another vehicle using the same lane of a carriageway and moving in the same direction or temporarily stopping due to the traffic conditions.
  G4: Front side and sideswipe collisions. Accident caused by a collision with another vehicle moving in a lateral direction due to leaving or entry from/to another lane, road, or premises.
  Head-on collisions. Accident caused by a head-on collision with another vehicle using the same lane of a carriageway and moving in the opposite direction or temporarily stopping due to traffic conditions.

The premise of the "proportions" method is that if the true proportion of sample "i" is $\mu_i$, then the probability of observing $x_i$ target accidents out of $n_i$ total accidents is given by the Binomial distribution:

$$P(x_i \mid n_i, \mu_i) = \binom{n_i}{x_i}\,\mu_i^{x_i}\,(1-\mu_i)^{n_i-x_i}$$

Moreover, the parameter $\mu_i$ will vary between similar sites and is assumed to follow the Beta distribution, defined as:

$$g(\mu_i) = \frac{\mu_i^{\alpha-1}\,(1-\mu_i)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 \le \mu_i \le 1$$

where $B(\alpha,\beta)$ is the Beta function. The parameters $\alpha$ and $\beta$ of the Beta distribution can be estimated from the sample mean and the variance of a reference population using the following equations:

$$\alpha = \bar{\mu}\left[\frac{\bar{\mu}\,(1-\bar{\mu})}{s^2} - 1\right], \qquad \beta = (1-\bar{\mu})\left[\frac{\bar{\mu}\,(1-\bar{\mu})}{s^2} - 1\right]$$

where $s^2$ is the variance of the proportions in the reference population ($n > 2$).
Using Bayes' theorem, the prior Beta distribution is combined with the accident data specific to sample "i" ($n_i$, $x_i$) to derive the adjusted posterior distribution, which is again a Beta distribution:

$$g(\mu_i \mid x_i, n_i) = \frac{\mu_i^{\alpha'_i-1}\,(1-\mu_i)^{\beta'_i-1}}{B(\alpha'_i,\beta'_i)}$$

where $\alpha'_i$ and $\beta'_i$ are the posterior parameters, defined as:

$$\alpha'_i = \alpha + x_i, \qquad \beta'_i = \beta + n_i - x_i$$

For the posterior distribution, the mean value and variance for each site "i" can be calculated with the following equations:

$$E(\mu_i) = \frac{\alpha'_i}{\alpha'_i+\beta'_i}, \qquad Var(\mu_i) = \frac{\alpha'_i\,\beta'_i}{(\alpha'_i+\beta'_i)^2\,(\alpha'_i+\beta'_i+1)}$$

Defining $\mu_m$ and $\mu_{mi}$ as the medians of the prior and posterior distributions respectively, the probability that the proportion in sample "i" exceeds $\mu_m$ is:

$$P(\mu_i > \mu_m \mid x_i, n_i) = 1 - \int_0^{\mu_m} g(\mu \mid x_i, n_i)\, d\mu$$

Given the large sample size, a probability of 99% can be assumed as the threshold for considering the difference significant.
If $\mu_m$ is assumed as the reference value of the proportion for the accident type to be screened, the Potential for Safety (PfS) can be defined as the difference between the median in sample "i", $\mu_{mi}$, and the reference value of the proportion, $\mu_m$:

$$PfS_i = \mu_{mi} - \mu_m$$

Based on the definition of PfS, the potential reduction in the number of accidents, $\Delta x_i$, can be calculated as the product of PfS and the observed number of accidents $x_i$:

$$\Delta x_i = PfS_i \cdot x_i$$

A positive value of $\Delta x_i$ represents the potential reduction in the number of crashes of the analysed category, due to the abnormal proportion in sample "i" with respect to the reference population.
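The screening steps described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, the prior parameters are obtained with the usual method-of-moments formulas for a Beta distribution, the Beta distribution function is approximated by Simpson integration (assuming both parameters are greater than 1), and the input values (mean 0.40, variance 0.01, 550 target crashes out of 1000) are invented for illustration.

```python
import math


def beta_prior_from_moments(mean, var):
    """Method-of-moments estimate of the prior Beta parameters."""
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0.0:
        raise ValueError("variance is too large for a Beta distribution")
    return mean * common, (1.0 - mean) * common


def beta_cdf(p, a, b, steps=2000):
    """P(mu <= p) for mu ~ Beta(a, b), by composite Simpson integration.

    Assumes a > 1 and b > 1, so the density is zero at both ends.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log1p(-t))

    p = min(max(p, 0.0), 1.0)
    h = p / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(p)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * pdf(k * h)
    return total * h / 3.0


def beta_median(a, b):
    """Median of Beta(a, b), found by bisection on the distribution function."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def screen_sample(x_i, n_i, alpha, beta):
    """Posterior update, exceedance probability and PfS for one sample."""
    alpha_post = alpha + x_i
    beta_post = beta + n_i - x_i
    prior_median = beta_median(alpha, beta)            # reference value
    post_median = beta_median(alpha_post, beta_post)   # sample value
    pfs = post_median - prior_median
    return {
        "alpha_post": alpha_post,
        "beta_post": beta_post,
        "prob_higher": 1.0 - beta_cdf(prior_median, alpha_post, beta_post),
        "pfs": pfs,
        # Following the text: product of PfS and the observed target count.
        "delta_x": pfs * x_i,
    }


# Hypothetical reference population and sample (illustrative values only).
alpha, beta = beta_prior_from_moments(mean=0.40, var=0.01)
result = screen_sample(x_i=550, n_i=1000, alpha=alpha, beta=beta)
print(result)
```

In this invented example, the sample proportion is well above the reference value, so the exceedance probability is above the 99% threshold and the potential reduction is positive. With SciPy available, the hand-written integration could be replaced by a library Beta distribution; it is written out here only to keep the sketch free of dependencies.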

Study results
Accident types showing significantly higher proportions ($\mu_{mi}$) than the reference value $\mu_m$ are the best candidates for improvement interventions. In this sense, significant potential reductions in the number of accidents at junctions may be achievable. By accident type, the potential reduction in the number of accidents is as follows: in Germany, a reduction of 267 pedestrian and 353 single accidents per year is expected; in Italy, 37 rear and 40 side crashes; and in Spain, 353 single HGV accidents. Specifically, if the reference proportion $\mu_m$ is considered as "normal", the appropriate 4E safety strategies (Engineering, Education, Enforcement, Emergencies) could be particularly effective on accident types that have significantly higher proportions than expected. It is generally assumed that new technologies can improve safety. In particular, a great deal of attention has been paid to the effects of driver assistance systems on driver performance (Cafiso, Di Graziano 2012; Lin et al. 2008). However, challenges still remain in quantifying the benefits of these systems in terms of their impact on reliability, profitability and safety (Cafiso et al. 2013).
Another example of the use of the proportion method is reported with reference to accident type. In this case, data from Great Britain and France were not available as disaggregated variables, even though they are part of the national data source; the analysis is therefore presented only for the other three countries (Fig. 4). Fig. 4 shows that front/sideswipe (33.2%) and single crashes (26.5%) have the highest frequencies in terms of accident type.

Conclusions
In the present research, five European countries (Italy, France, Germany, Great Britain and Spain) were considered, representing about 70% of the overall EU HGV fleet. Due to an absence of homogeneity in national accident databases, taxonomy was used to create a common structure to harmonize the individual differences into one consistent reporting system. The five EU databases were referenced to a single structure composed of 11 items (casualty class, injury number and severity, location, light conditions, road conditions, junction, vehicle type, driver age, driver gender, accident type and manoeuvres), which captures common features of HGV accidents. Referring to this new common source, it was possible to carry out comparable analyses of accidents involving HGVs using the proportion method to avoid the influence of exposure factors. At European level, on average, 40.1% of HGV crashes can be expected to occur at intersections, while front/sideswipe (33.2%) and single crashes (26.5%) have the highest frequencies in terms of accident type. With reference to the EU median values, Spain has a particularly high percentage of crashes at intersections, with a potential reduction of 1282 crashes per year; Germany is characterized by significantly high proportions of single and pedestrian accidents, with a potential reduction of 353 and 267 accidents per year respectively; Italy is characterized by significantly high proportions of rear and side HGV crashes, with a potential reduction of 37 and 40 accidents per year respectively.
Due to the limited availability of data, only a few comparisons were performed. Nevertheless, the data structure defined using the taxonomy has identified eleven items that can be used as a reference for future studies, and the proposed methodology can be used to compare crash proportions while avoiding statistical bias.