STATISTICAL INTERPRETATION OF STR PROFILES
from most forensic scientists is yes – because we can empirically measure the predicted genotype frequencies under HWE and detect if there is a significant amount of deviation. Many statistical tests have been developed to calculate the deviation of the allelic frequencies from HWE. These include the goodness-of-fit test (also called the chi-square test), homozygosity test, likelihood ratio test and the exact tests. However, when analysing polymorphic STR loci these tests do not have the required sensitivity because there are many undetected genotypes, and numerous genotypes that are detected at very low frequencies, at each locus. The multi-locus exact test was developed and can detect deviation from HWE when a large dataset is tested [17, 18]. Significant deviations from HWE have not been detected in the vast majority of populations. An exact test will not detect variations from HWE in small datasets, unless the deviation is extreme, and therefore conclusions from performing the exact test should not be over-interpreted.
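As an illustration of the simplest of these tests, the goodness-of-fit statistic for a single locus can be sketched as follows. This is a minimal sketch, not a validated forensic tool: the function name `hwe_chi_square` and the input layout (a dictionary of genotype counts) are assumptions made for the example, and, as noted above, this kind of test lacks sensitivity for highly polymorphic STR loci.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Chi-square goodness-of-fit statistic for HWE at one locus.

    genotype_counts maps a pair of alleles, e.g. (12, 13), to the number
    of individuals observed with that genotype (hypothetical input layout).
    Returns (chi-square statistic, degrees of freedom).
    """
    n = sum(genotype_counts.values())  # number of individuals
    allele_counts = Counter()
    observed = Counter()
    for (a, b), count in genotype_counts.items():
        # Each individual contributes two alleles to the allele counts.
        allele_counts[a] += count
        allele_counts[b] += count
        observed[tuple(sorted((a, b)))] += count

    # Allele frequencies estimated from the sample (2n alleles in total).
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p, q = freqs[a], freqs[b]
        # Expected count under HWE: n * p^2 (homozygote) or n * 2pq (heterozygote).
        expected = n * (p * p if a == b else 2 * p * q)
        chi2 += (observed[(a, b)] - expected) ** 2 / expected

    k = len(freqs)
    # Genotype classes minus 1, minus (k - 1) estimated allele frequencies.
    return chi2, k * (k - 1) // 2


# Invented counts for a two-allele locus, for illustration only.
stat, dof = hwe_chi_square({(12, 12): 36, (12, 13): 48, (13, 13): 16})
```

The statistic is compared against a chi-square distribution with the returned degrees of freedom. Genotypes that were never observed still contribute their expected count to the sum, which is one reason the test behaves poorly when many genotypes are rare or absent.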
Estimating the frequencies of STR profiles
In forensic DNA analysis the HWE is used along with an allele frequency database to calculate genotype frequencies. An allelic frequency database is constructed by measuring the occurrence of alleles within the defined population. It has been recommended that a database of at least 200 alleles per locus (or 100 individuals) be used for a particular population when using the database for generating the statistical estimates of the strength of DNA evidence. The larger the database, the more representative of the population it will be, and current practice dictates that several hundred individuals should be sampled when creating an allelic frequency database. These people should not be direct relations; combinations such as siblings, or mother and child, should therefore not be incorporated into an allele frequency database. Using the HWE, the expected genotype frequency at each locus is calculated from the observed allele frequencies, and from these genotype frequencies we can calculate the frequency of an STR profile. If we take the profile that was analysed in Chapter 6, the genotype proportions for each locus are calculated using p² for the homozygous loci and 2pq for the heterozygous loci (Table 8.1). The overall profile frequency is calculated by multiplying together the genotype frequencies at each locus. This multiplication is termed the product rule – it is possible because the inheritance of alleles at each locus is independent of the other loci. There have been some challenges to the approach presented above, namely that the inaccurate estimation of allelic frequencies can lead to inaccurate profile frequency estimates. To overcome this problem several methods have been employed that take into consideration the limitations in allele frequency estimates.
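The genotype and product rule calculations described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `profile_frequency`, the locus names and the allele frequencies are all invented for the example and are not the values from Table 8.1.

```python
from math import prod


def profile_frequency(profile, allele_freqs):
    """Estimate a profile frequency with the HWE and the product rule.

    profile maps a locus name to its pair of alleles, and allele_freqs
    maps a locus name to a dictionary of allele frequencies
    (both layouts are assumptions made for this example).
    """
    per_locus = []
    for locus, (a, b) in profile.items():
        p = allele_freqs[locus][a]
        q = allele_freqs[locus][b]
        # Homozygote: p^2; heterozygote: 2pq.
        per_locus.append(p * p if a == b else 2 * p * q)
    # Product rule: multiply the independent per-locus genotype frequencies.
    return prod(per_locus)


# Hypothetical two-locus profile with invented allele frequencies.
example_profile = {"locus_1": (15, 15), "locus_2": (16, 18)}
example_freqs = {
    "locus_1": {15: 0.20},
    "locus_2": {16: 0.25, 18: 0.10},
}
estimate = profile_frequency(example_profile, example_freqs)
```

Here the homozygous locus contributes 0.20² = 0.04, the heterozygous locus contributes 2 × 0.25 × 0.10 = 0.05, and the product is 0.002, or about 1 in 500. Each additional independent locus multiplies the estimate by a further genotype frequency, which is why full profiles give very small values.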
Corrections to allele frequency databases
Allelic frequencies are calculated by measuring a number of alleles in the target population. The more alleles that are measured as a part of the allelic frequency database, the more accurate it will be. However, it is impractical to measure all of the alleles in a large population, and the frequencies are only estimates, prone to inaccuracies due to the limited size of the database. For common alleles the impact is small, but for rare alleles, which can easily be under-represented in a frequency database, limited sampling can have a large effect. It should be noted that the deficiencies in the frequency databases can also lead to over-representation of allele frequencies but, as a general principle, when we are estimating the significance of forensic evidence the emphasis is on not overstating the strength of the evidence. Different approaches have been taken to overcome the limitations of allele frequency databases. These include the allele frequency ceiling principle, the Balding size bias correction, allowing for the effects of subpopulations and using a maximum profile frequency.
Table 8.1 The profile frequency is estimated using the principles of the Hardy-Weinberg law and an allele frequency database that was constructed using 400 alleles. Because the loci are all on different chromosomes there is no genetic linkage and the product rule can be used, multiplying each genotype frequency to calculate the overall profile frequency
Allele ceiling principle
Very rare alleles may not appear at all in the frequency database. If a rare allele not previously represented on the frequency database is detected in a crime scene sample then the frequency of the allele would be 0 – which cannot be the case! A mechanism must be put in place to deal with this situation. One approach is to set a minimal allele frequency. The minimum frequency values that are used vary from country to country but are typically around 0.01 (1%). Any allele occurring with a frequency of less than 0.01 will be adjusted to this figure. An alternative approach is to use a minimal allele count, for example five alleles being the smallest number of alleles that is considered: the allele frequency is simply calculated using the formula 5/(2N), where N is the number of individuals in the database.
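Both adjustments can be sketched in a few lines. This is a minimal sketch: the function names are invented for the example, the defaults of 0.01 and five alleles are the illustrative figures quoted above rather than a standard, and a database of 200 individuals is assumed for the usage lines.

```python
def apply_minimum_frequency(observed_freq, minimum=0.01):
    """Raise any allele frequency below the minimum up to the minimum."""
    return max(observed_freq, minimum)


def minimum_count_frequency(allele_count, n_individuals, min_count=5):
    """Treat an allele seen fewer than min_count times as seen min_count times.

    The database holds 2N alleles for N individuals, so the smallest
    frequency that can be reported is min_count / (2N).
    """
    return max(allele_count, min_count) / (2 * n_individuals)


# An allele never seen in an assumed database of 200 individuals (400 alleles).
unseen_by_frequency = apply_minimum_frequency(0.0)    # 0.01
unseen_by_count = minimum_count_frequency(0, 200)     # 5 / 400 = 0.0125
```

Either way the unseen allele receives a small non-zero frequency, so a genotype containing it no longer yields a profile frequency of zero. Alleles that are already common are left unchanged by both functions.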