Mathematics is the basis of modern technology. But it is
also the basis of many false assertions. I have written previously about how
statistics are (mis)used in stock market analysis. This article looks at the
misapplication of math in DNA sampling.
When DNA experts testify in court, they typically describe
the probability of a false match as 1 in 100 trillion (1/100,000,000,000,000).
That seems like a virtual certainty. But where did they get this number?
First, a little background on DNA sampling: technicians
extract DNA, use enzymes to cut it into pieces, process it, and then compare
particular segments, called loci (singular: locus). To be admissible in court, there
must be matches on 9 loci. DNA analysts typically use 13 loci, and
empirical evidence suggests a random match occurs about 1 in 10 times for one
locus. Multiplying that rate across all 13 loci is where numbers of this size
come from: (1/10)^13, or 1 in 10 trillion.
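The arithmetic behind that figure can be checked with a short sketch. It assumes only what the text states (13 loci, a 1-in-10 chance of a coincidental match per locus) plus the independence between loci that the multiplication quietly relies on. The function name is illustrative, not taken from any forensic software.

```python
from fractions import Fraction

# Figures from the text: 13 loci, 1-in-10 coincidental match per locus.
# Assumption: loci are independent, so the probabilities simply multiply.
PER_LOCUS_MATCH = Fraction(1, 10)
NUM_LOCI = 13

def random_match_probability(per_locus, num_loci):
    """Multiply the per-locus match probability across independent loci."""
    return per_locus ** num_loci

p = random_match_probability(PER_LOCUS_MATCH, NUM_LOCI)
print(f"1 in {p.denominator:,}")
```

With these inputs the product is 1 in 10,000,000,000,000 (10 trillion), one zero short of the 1 in 100 trillion usually quoted.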
This is pure mathematics. On this reasoning, the probability of two DNA samples
matching exactly by chance is on the order of 1 in trillions. But there is a
problem: “empirical evidence suggests a random match occurs about 1 in 10 times
for one locus”. This is one of innumerable cases of getting subjective
probability mixed up with frequentist probability. The former measures our state
of knowledge about an event; the latter measures how often the event actually
occurs over many trials. The problem with this particular mix-up is
that we don’t know for a fact that
random matches occur at an exact frequency of 1 in 10. There is missing
information: do random matches always occur at this rate, and independently at
each locus, or are there circumstances we haven’t encountered where this is not the case?
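To see how much rides on that single per-locus number, here is a minimal sketch that raises a few candidate rates to the 13th power. Only the 1-in-10 rate comes from the text. The 1-in-5 and 1-in-20 rates are hypothetical values chosen for illustration, not measurements.

```python
NUM_LOCI = 13  # from the text

def combined_one_in(per_locus_one_in, num_loci=NUM_LOCI):
    """Return N such that the all-loci match probability is 1 in N.

    Assumes every locus has the same rate and that loci are independent.
    """
    return per_locus_one_in ** num_loci

# 10 is the article's figure; 5 and 20 are hypothetical alternatives.
for per_locus in (5, 10, 20):
    overall = combined_one_in(per_locus)
    print(f"1 in {per_locus} per locus -> 1 in {overall:,} overall")
```

Halving or doubling the per-locus rate moves the final answer by a factor of 8,192 in either direction, so a modest error in the empirical estimate becomes an enormous error in the headline figure.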
The Empirical Case
A study of the Arizona CODIS DNA database
found that 1 in every 228 profiles matched another profile in the
database at nine or more loci. This was in a database containing only 65,493
entries.
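A rough sketch of the scale involved, using only the two figures quoted above. The pair count assumes every profile is compared against every other profile, which may not be how the study was actually run.

```python
from math import comb

DATABASE_SIZE = 65_493    # entries, from the text
MATCH_RATE_ONE_IN = 228   # 1 in 228 profiles matched at 9+ loci, from the text

# Assumption: an all-against-all search, i.e. every unordered pair is compared.
pairs = comb(DATABASE_SIZE, 2)
profiles_with_match = DATABASE_SIZE / MATCH_RATE_ONE_IN

print(f"pairwise comparisons: {pairs:,}")
print(f"profiles with a 9+ locus match: about {profiles_with_match:.0f}")
```

Under that assumption, even this small database offers over two billion opportunities for a coincidental match, and the quoted rate corresponds to roughly 287 profiles.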
Conclusion
So the misapplied mathematical probability of false DNA
matches is claimed to be 1 in trillions (depending on the number of loci).
Real-world practice reveals a rate of 1 in 228 (or fewer, with a larger
dataset). Which is correct? Which number should be used in court? Definitely
not the mathematical one, because it is falsely applied.
Addendum
What is the frequentist probability of finding an exact
false match in a database containing 10 million entries? You may be surprised
to find out that the answer is 51%.
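The article does not say how this figure was derived. One plausible route is a birthday-problem calculation, sketched below under two assumptions that are mine rather than the article's: every pair of profiles in the database is compared, and each pair matches independently with the same small probability.

```python
import math

def prob_any_match(num_profiles, pair_match_prob):
    """Approximate P(at least one matching pair) in a database.

    Treats all unordered pairs as independent trials, which is the usual
    birthday-problem simplification rather than an exact model.
    """
    num_pairs = math.comb(num_profiles, 2)
    # 1 - (1 - p)^num_pairs, computed in a numerically stable way for tiny p.
    return -math.expm1(num_pairs * math.log1p(-pair_match_prob))

DATABASE_SIZE = 10_000_000  # from the text

# Per-pair probabilities: the quoted figure and the (1/10)^13 product.
for one_in in (10**14, 10**13):
    p = prob_any_match(DATABASE_SIZE, 1 / one_in)
    print(f"per-pair 1 in {one_in:,}: P(any match) = {p:.1%}")
```

Under these assumptions a per-pair figure of 1 in 100 trillion gives roughly 39%, and 1 in 10 trillion gives over 99%. Neither reproduces 51% exactly, which presumably rests on slightly different inputs, but both show the same point: a match probability that sounds negligible for one pair becomes likely across a whole database.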