Mathematics is the basis of modern technology, but it is also the basis of many false assertions. I have written previously about how statistics are (mis)used in stock market analysis. This article looks at the misapplication of math in DNA sampling.
When DNA experts testify in court, they typically describe the probability of a false match as 1 in 100 trillion (1/100,000,000,000,000). That seems like a virtual certainty. But where did they get this number?
First, a little background on DNA sampling: technicians extract DNA, use enzymes to cut it into pieces, process it, and then compare the different segments, known as loci (singular: locus). To be admissible in court, there must be matches on 9 loci. DNA analysts typically use 13 loci, and empirical evidence suggests a random match occurs about 1 in 10 times for a single locus. These two pieces of information are where the figure above comes from.
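A minimal sketch of the product rule behind figures like this, assuming the loci are statistically independent so that their individual match probabilities simply multiply, and using the article's round 1-in-10 rate at every locus. The function name `random_match_probability` is illustrative, not taken from any forensic software.

```python
import math

def random_match_probability(per_locus_probs):
    """Product rule: multiply the per-locus match probabilities.

    Assumes the loci are independent of one another.
    """
    return math.prod(per_locus_probs)

# Assumed: the same round "1 in 10" match rate at every locus.
for n_loci in (9, 13):
    p = random_match_probability([0.1] * n_loci)
    print(f"{n_loci} loci: about 1 in {1 / p:,.0f}")
```

With the round 1-in-10 rate this prints roughly 1 in a billion for 9 loci and 1 in 10 trillion for 13 loci; reaching 1 in 100 trillion over 13 loci would take an average per-locus rate nearer 0.084. Real per-locus probabilities differ from locus to locus, which is why the function takes a list rather than a single number.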
This is pure mathematics: the probability of two DNA samples matching exactly is 1 in 100 trillion. But there is a problem: the premise that “empirical evidence suggests a random match occurs about 1 in 10 times” at a single locus. This is one of innumerable cases of confusing subjective probability with frequentist probability. The former measures our knowledge of an event; the latter measures its mathematical probability. The problem with this particular mix-up is that we don’t know for a fact that random matches occur at exactly 1 in 10. Information is missing: do random matches always occur at this rate, or are there circumstances we haven’t encountered where they don’t?
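To see why uncertainty in the per-locus rate matters, here is a small sketch that raises a few hypothetical per-locus rates, bracketing the quoted "about 1 in 10", to the 13th power under the same independence assumption. The rates 0.05 and 0.2 are illustrative values, not measurements.

```python
N_LOCI = 13  # loci typically compared, per the article

def combined_probability(per_locus: float, n_loci: int = N_LOCI) -> float:
    """Product rule with the same (assumed) match probability at every locus."""
    return per_locus ** n_loci

# Hypothetical per-locus rates around the quoted "about 1 in 10".
for per_locus in (0.05, 0.1, 0.2):
    p = combined_probability(per_locus)
    print(f"per-locus {per_locus:.2f}: about 1 in {1 / p:.3g}")

# Doubling the per-locus rate multiplies the 13-locus probability by 2**13.
print(f"2**{N_LOCI} = {2 ** N_LOCI:,}")
```

A factor-of-four spread in the per-locus rate moves the 13-locus figure from about 1 in 1.2 billion to about 1 in 82 quadrillion, so a small error in the empirical input becomes an enormous error in the headline number.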
The Empirical Case
A study of the Arizona CODIS DNA database found that 1 in every 228 profiles matched another profile in the database at nine or more loci, in a database containing only 65,493 entries.
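To give a sense of the scale of a database-wide search, a short sketch using only the two figures reported above counts the pairwise comparisons among 65,493 profiles and the number of profiles the 1-in-228 rate corresponds to. The variable names are illustrative.

```python
from math import comb

DB_SIZE = 65_493  # profiles in the Arizona database, as reported above
ONE_IN = 228      # "1 in every 228 profiles" matched another profile

# Every profile is compared against every other one: n choose 2 comparisons.
pairwise_comparisons = comb(DB_SIZE, 2)

# Number of profiles the reported rate corresponds to.
matching_profiles = DB_SIZE / ONE_IN

print(f"pairwise comparisons: {pairwise_comparisons:,}")
print(f"profiles with a 9+ locus match: about {matching_profiles:.0f}")
```

This reports about 2.1 billion pairwise comparisons and roughly 287 matching profiles.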
So the misapplied mathematical probability of a false DNA match is claimed to be 1 in trillions (depending on the number of loci). Real-world practice reveals a rate of 1 in 228 (or 1 in even fewer, with a larger dataset). Which is correct? Which number should be used in court? Definitely not the mathematical one, because it has been misapplied.
What is the frequentist probability of finding an exact false match in a database containing 10 million entries? You may be surprised to learn that it is 51%.
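Questions like this are usually answered with the "birthday problem" calculation. Below is a minimal sketch, assuming every pair of profiles false-matches independently with the quoted probability of 1 in 100 trillion; that independence is a simplifying assumption, and the function names are hypothetical.

```python
import math

def prob_any_match(n_profiles: int, p_pair: float) -> float:
    """Chance that at least one pair among n_profiles matches,
    assuming each pair matches independently with probability p_pair."""
    pairs = math.comb(n_profiles, 2)
    # 1 - (1 - p)^pairs, via log1p/expm1 to keep precision when p is tiny
    return -math.expm1(pairs * math.log1p(-p_pair))

def size_for_even_odds(p_pair: float) -> float:
    """Approximate database size at which a chance match becomes 50% likely,
    from n^2 / 2 * p = ln 2."""
    return math.sqrt(2 * math.log(2) / p_pair)

P_PAIR = 1e-14  # the quoted 1 in 100 trillion, taken at face value
print(f"10 million profiles: {prob_any_match(10_000_000, P_PAIR):.1%}")
print(f"50% reached near {size_for_even_odds(P_PAIR):,.0f} profiles")
```

With these inputs the sketch gives roughly 39% for 10 million profiles and crosses 50% at roughly 11.8 million; the result climbs quickly if the true per-pair probability is larger than the quoted figure.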